From: Ihor Radchenko <firstname.lastname@example.org>
To: Tom Gillespie <email@example.com>
Cc: org-mode-email <firstname.lastname@example.org>, Nicolas Goaziou <email@example.com>, Timothy <firstname.lastname@example.org>
Subject: Re: Org Syntax Specification
Date: Tue, 18 Jan 2022 20:09:59 +0800
Message-ID: <87r195nt2g.fsf@localhost>
In-Reply-To: <CA+G3_PM15pru_zRgT=t-gzbd6qTOw7xzXCqWd3VMb8ciH3D77g@mail.gmail.com>

Tom Gillespie <email@example.com> writes:

> Extremely in favor of removing switches. There are so many better ways
> to do this now that aren't like some eldritch unix horror crawling up
> out of the abyss and into the eBNF :)

I also agree that switches and $$-style equations may be deprecated. We
can:

1. Not mention them in the document.
2. Add org-lint warnings marking them as obsolete.

As for your other comments, you seem to be suggesting a number of
changes to the existing Org syntax. Some of them look fine; some do not.
However, please keep in mind that we have to deal with backward
compatibility, third-party compatibility, and not breaking existing Org
documents unless we have a very strong justification.

I suggest branching a number of new threads from here, one for each
concrete suggestion where you want to change the Org syntax (as opposed
to just the document wording). Otherwise, this discussion will become a
total mess.

More details below.

> +Elements are further divided into "[[#Headings][headings]]", "[[#Sections][sections]]"[fn::sections are not elements], "[[#Greater_Elements][greater

Nope. Sections are actually elements. See =org-element-all-elements=.

> +other headings. [fn:tom2:I would not discuss strata here because it is
> +not related to the syntax of the document. It is related to how that
> +syntax is interpreted by org mode.
> +The strata are nesting rules that
> +are independent of the syntax, and discussing that here in the syntax
> +document is confusing, because the nesting is not something that can be
> +parsed directly because it depends on the number of asterisks.]

I disagree. Nesting rules are an important part of the syntax. We have
restrictions on which elements can be inside other elements, and the
same pattern may or may not be recognised depending on where it is
nested. For example, links that you put into property drawers are not
considered link objects.

> +citation references and [[#Table_Cells][table cells]].[fn:tom3:Table cells should
> +be treated in a way that is entirely separate from objects. This document has included
> +them as such as has org-element (iirc) however since they can never appear in a paragraph
> +and because tables are completely separate syntactically, we should probably drop the
> +idea that table cells are objects. I realize that this might mean the creation of a
> +distinction between paragraph-objects, title-objects, table-objects etc.]

Again, I disagree. While your idea about table cells is reasonable (and
similarly for citation references inside citations), I am against
decoupling the Org syntax from the org-element implementation. In
org-element.el, table cells are just another object. If we let
org-element and the syntax document drift out of sync, confusion and
errors will follow during future maintenance.

> A line containing only spaces, tabs, newlines, and line feeds (=\t\n\r=)
> -is considered a /blank line/. Blank lines can be used to separate
> +is considered a /blank line/. Blank lines separate
> paragraphs and other elements.

This actually reads slightly confusingly. "Blank lines separate
paragraphs and other elements" sounds like blank lines are only relevant
before/after paragraphs. However, they also matter for footnote
definitions and lists. Maybe we can try something like:

  Blank lines can be used to indicate the end of some elements.

I say "can" because a single blank line usually does not separate
anything.

> +considered part of the paragraph.[fn:tom4:I don't think we need to discuss
> +nesting scope here, it is confusing, it is always the immediately prior
> +(lesser?) element.]

Then where can we put it? This is one of the tricky conventions we use
in the parser.

> ++ STARS :: A string consisting of one or more asterisks[fn::removed
> + note about inline tasks because it is still a heading, any mention
> + of a concrete number should not appear in the specification of
> syntax.]

I am not sure here. Inline tasks are special: a one-line inline task
must not contain any text below it and cannot have planning or
properties.

> + contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local
> + variable and can be set by users in an org file using =#+todo:=.].

If we mention this, we also need to elaborate on what kind of element
#+todo: is, where it can be located, and how to parse multiple instances
of #+todo: in the document.

> -A heading contains directly one section (optionally), followed by
> -any number of deeper level headings.
> +The level of a heading can be used to construct a nested structure.
> +All content following a heading that appears before the next heading
> +(regardless of the level of that next heading) is a section. In addition,
> +text before the first heading in an org document is also a section.

Note that this is not true for one-line inline tasks.

> +considered a section), sections only occur within headings.[fn:: The
> +choice to call this syntactic component a section is confusing because
> +it is at odds with the usual notion of a section, namely that the
> +usual concept of a section implies that it includes nested content. I
> +personally didn't realize that it ended at the next heading until
> +writing this comment (as can be seen from reading my comments in the
> +laundry implementation).
> +Therefore I suggest that we look for an
> +alternate name for this syntactic component. Maybe "segment" or
> +something similar that indicates that it is truncated?]

Sounds reasonable. However, we may also need to make this change at the
Elisp level, which is tricky when you think about backward compatibility.

> +however, contain [[Planning][planning]].[fn::This is wrong? If it is not
> +wrong, then it should be. Property drawers are already annoying to implement
> +because they share syntax with regular drawers, and allowing a property drawer
> +at the top of a file without a heading means that it should be a regular drawer
> +not a property drawer, otherwise you have to special case the handling of drawers
> +in the zeroth section. What is the use case for a property drawer as opposed to
> +a #+property: line in the zeroth section? I may come around on this at some point,
> +but right now it seems more complex, however it might actually be more consistent
> +if we imagine the zeroth section as being nested inside a single heading that has
> +level zero implicitly at the top of a document. Unfortunately that means that such
> +property drawers cannot be determined from a homogeneous syntax but instead require
> +some operations on the internal representation. Note also that if this were allowed
> +then the property drawer should only be allowed as the very first line of a file
> +because newlines at the start of a file need to be preserved. More though required.]

The statement about property drawers in the first section (that is how
we refer to it in org-element) is correct. The first section and the
location of its property drawer are special. I agree that this is
inconsistent with normal property drawers. However, we cannot change it
without breaking existing Org files. If we decide to change the syntax
in this area, we should think carefully about the possible consequences.
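To make the first-section special case concrete for external parser authors, here is a minimal Python sketch (not a transcription of org-element.el) that decides whether a :PROPERTIES: drawer at the top of a document counts as a property drawer. It assumes that only comment lines may precede the drawer (blank lines are treated as disqualifying, which is a simplification), that :END: is matched case-insensitively, and that a single line inside that does not look like a node property turns the whole thing into a plain drawer. The function name first_section_property_drawer and the simplified regexps are hypothetical.

```python
import re

# Hypothetical, simplified patterns; they approximate but do not copy
# the regexps used by org-element.el.
COMMENT_RE = re.compile(r"^[ \t]*#(?: |$)")  # "# text" or a bare "#"
BEGIN_RE = re.compile(r"^[ \t]*:PROPERTIES:[ \t]*$", re.IGNORECASE)
END_RE = re.compile(r"^[ \t]*:END:[ \t]*$", re.IGNORECASE)
NODE_PROPERTY_RE = re.compile(r"^[ \t]*:\S+:(?:[ \t].*)?$")  # ":NAME: value"


def first_section_property_drawer(lines):
    """Return (begin, end) indices of a top-of-document property drawer,
    or None when the first section does not start with one.

    Assumptions: only comment lines may precede the drawer (blank lines
    disqualify it), and a single malformed line inside the drawer makes
    it a plain drawer rather than a property drawer.
    """
    i = 0
    # Skip leading comment lines; "#+KEYWORD:" lines do not match.
    while i < len(lines) and COMMENT_RE.match(lines[i]):
        i += 1
    if i == len(lines) or not BEGIN_RE.match(lines[i]):
        return None
    for j in range(i + 1, len(lines)):
        if END_RE.match(lines[j]):  # check :END: before node properties
            return (i, j)
        if not NODE_PROPERTY_RE.match(lines[j]):
            return None  # malformed line: treat as a plain drawer
    return None  # unterminated drawer


if __name__ == "__main__":
    doc = ["# comment", ":PROPERTIES:", ":ID: abc", ":end:", "Text"]
    print(first_section_property_drawer(doc))  # (1, 3)
    print(first_section_property_drawer(["Text"] + doc))  # None
```

Putting any non-comment line (here "Text") before the drawer makes the sketch return None, which is exactly the kind of position-dependent rule that a context-free "drawer" production cannot express on its own.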
> + [fn::Without going into to much detail, affiliated keywords should
> +not be distinguished from other keywords at the level of the syntax.
> +The fact that they are is an artifact of the elisp implementation.
> +The determination of the behavior of a keyword with regard to
> +affiliating behavior should be determined in a later pass, even if in
> +some cases some implementations may want to materialize them into the
> +parser for performance reasons. Allowing users to promote a keyword to
> +be an affiliated keyword would be incredibly powerful for attaching
> +metadata to parts of org-files in a way that is user extensible. It
> +may still be desirable to describe the behavior of affiliated keywords
> +here, but they are not in any way distinct from other keywords at the
> +level of org syntax and trying to implement them as such is usually a
> +mistake (that I have made).]

I generally support this idea. Handling keywords in org-element is not
pretty, and having them in the parse tree would make things easier.
However, we again need to consider backward compatibility. I can imagine
third-party ox-* packages breaking if we make this change, so we should
double-check before deciding.

> +property of the element they apply to. [fn::While it is tempting to try
> +to do this at the level of the grammar it induces a number of nasty
> +ambiguities in practice. It is saner to have a single unified keyword
> +syntax and then to determine affiliation behavior in a later pass.]

Yes, it is saner. However, our syntax document is supposed to be a
human-readable description of what org-element does. We cannot introduce
differences between the grammar document and the de facto parser
implementation. That would defeat the purpose of providing a reference
syntax: we would end up with inconsistencies between Emacs Org mode and
external parsers.

> + ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].[fn::
> + All keywords should allow OPTVAL, it regularizes and simplifies the syntax.]
I support this idea.

> + [fn:: ~:end:~ may be capitalized (legacy support)]

Both :END: and :end: are supported by the Org parser. What do you mean
by "legacy"?

> + [fn::I suggest that we remove inlinetasks from this document.
> +They are a hack that cannot be implemented as part of a grammar
> +because they require a concrete value to be specified which breaks
> +the arbitrary nesting depth of headings. I think I wrote this somewhere
> +else as well, but inline tasks can only be a layer on top of headings,
> +they cannot displace them.]

I disagree. Inline tasks are de facto part of the syntax, and they can
be encountered in Org documents in the wild. If you treat inline tasks
as ordinary headings, parsing may break in unpredictable ways. Instead,
we may consider making the inline task level constant.

> +indicate that it should, which is misleading. Further, it is actually
> +not possible to implement contents as specified because grammars
> +cannot track the indentation level that is required to reconstruct
> +list items correctly. Therefore CONTENTS should not be defined as such
> +but should only specify that they can be anything except a newline. I
> +think that the intent of this document is somewhat a conflation of the
> +syntax for org and of the semantics as determined by export backends
> +and/or org-element, however it makes it extremely confusing because it
> +is not actually possible to parse CONTENTS, they must be reconstructed
> +from the parse tree.]

Could you elaborate on why grammars cannot track the indentation level?
AFAIU, if that were the case, Python would not be parseable.

> + [fn::The failure mode for malformed contents needs to be
> +determined more clearly here. We don't want property draws to suddenly
> +become plain drawers just because a user has a malformed line, that
> +could be disastrous if certain settings in the property drawer mask
> +settings from further up the tree.
> +In short, malformed contents
> +should not poison the whole property drawer.]

Yet, that is exactly what happens in Org: malformed property drawers
become ordinary drawers.

> + SWITCHES :: Any number of SWITCH patterns, separated by a single
> - space character
> + space character [fn::For the love of all that is sane can we
> + please just remove this from the spec or mark it as legacy.]

I support this idea.

> +PLANNING must directly follow HEADING without any blank lines in between.
> +
> + [fn::Need a spec for how to handle multiple instances of the same keyword with different values.]

The last one wins (as in org-element-planning-parser).

> + [fn::As I think I mention elsewhere, the concrete names here
> +should NOT be part of the syntax, it makes the parser brittle
> +and hard to maintain. Differentiation between entities and fragments
> +should be handled at the syntax level for cases where the fragment
> +has brackets, and then in a second pass for values that are
> +syntactically entity-or-fragment and must be determined after
> +the fact.]

How would you define the entity object then? First/second pass is an
implementation detail. Our current description follows how org-element
handles entities.

> + [fn::We probably want to node that BACKEND can be the empty string
> +per that thread on how to deal with intra-word markup. Again this
> +also touches on the general principle of wanting to close over the
> +empty string so that users aren't surprised when ~@@:lol@@~ suddenly
> +appears in plain text just because no backend was specified.]

While I am not opposed to the idea, your principle is not followed by
the org-element parser. We may consider changing it, but that is again
a whole separate discussion where we need to weigh the pros and cons.

> Note that the first pattern may not occur on an /unindented/ line, as it
> -is then a [[#Footnote_Definitions][footnote definition]].
> +is then a [[#Footnote_Definitions][footnote definition]].
> +[fn::I'm not sure this is quite right?
> +the font locking code is not consistent with actual behavior, need to
> +review the laundry test cases and example files.]

Do not look at font-locking. You can safely assume that fontification
is wrong in all non-trivial cases. Always check org-element-at-point and
org-element-context.

> - [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]].
> + [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]]. [fn::Like for the comma in
> + macros, I think it would be safe to add ~\|~ as an escape character.
> + The issue in the elisp implementation is not actually at the level
> + of the syntax, but is actually in the export backends or somewhere
> + deeper, because even using a macro that expands to be a pipe ~|~
> + breaks the table (which is really bad).]

I am not sure it is needed. We can already use \vert.

> + [fn::I have some suggestions for extensions to timestamp syntax to
> +support historical and far future dates, as well timezone offsets (NOT
> +the 3 letter ambiguous disaster) and seconds and sub-second times.]

That would be welcome, but someone™ should implement time zone support
at the Elisp level. We have had several discussions about this in the
past.

> +The four =*/_+= may be arbitrarily nested to any depth. Verbatim and
> +code ==~= may be nested inside any other markup, but no other markup
> +will be interpreted inside of them since they are interpreted exactly.

That's not accurate. You cannot nest, say, bold inside bold. You also
cannot freely put code inside other markup; consider:

  *bold =asd*asd= not bold*

Best,
Ihor