From 3527331f02e593ec6ba6cb4c8bde3f64de3ad216 Mon Sep 17 00:00:00 2001 From: Tom Gillespie Date: Mon, 17 Jan 2022 19:34:21 -0500 Subject: [PATCH] Tom's comments and modifications to org syntax edited I removed any mention of markdown because it is a distraction in this document and is not something we want anyone attending to here. I change "top level section" to "zeroth section" which I think is more consistent terminology because level is often used to refer to the depth of parsing at any given point in the file and the top level refers to anything that can be parsed without context. Zeroth makes it clear that we are talking about the actual zeroth occurrence of a section in a file/buffer/stream. --- dev/org-syntax-edited.org | 399 +++++++++++++++++++++++++++++++------- 1 file changed, 331 insertions(+), 68 deletions(-) diff --git a/dev/org-syntax-edited.org b/dev/org-syntax-edited.org index c3259473..2e99070d 100644 --- a/dev/org-syntax-edited.org +++ b/dev/org-syntax-edited.org @@ -19,9 +19,7 @@ under the GNU General Public License v3 or later. Org is a plaintext format composed of simple, yet versatile, forms which represent formatting and structural information. It is designed to be both intuitive to use, and capable of representing complex -documents. Like [[https://datatracker.ietf.org/doc/html/rfc7763][Markdown]], Org may be considered a lightweight markup -language. However, while Markdown refers to a collection of similar -syntaxes, Org is a single syntax. +documents. This document describes and comments on Org syntax as it is currently read by its parser (=org-element.el=) and, therefore, by the export @@ -32,14 +30,13 @@ framework. ** Objects and Elements The components of this syntax can be divided into two classes: -"[[#Objects][objects]]" and "[[#Elements][elements]]". To better understand these classes, -consider the paragraph as a unit of measurement. 
/Elements/ are -syntactic components that exist at the same or greater scope than a -paragraph, i.e. which could not be contained by a paragraph. -Conversely, /objects/ are syntactic components that exist with a smaller -scope than a paragraph, and so can be contained within a paragraph. - -Elements can be stratified into "[[#Headings][headings]]", "[[#Sections][sections]]", "[[#Greater_Elements][greater +"[[#Elements][elements]]" and "[[#Objects][objects]]". Elements are +syntactic components that have the same priority as or greater +priority than a paragraph. Objects are syntactic components that are +only recognized inside a paragraph or other paragraph-like elements +such as heading titles. + +Elements are further divided into "[[#Headings][headings]]", "[[#Sections][sections]]"[fn::sections are not elements], "[[#Greater_Elements][greater elements]]", and "[[#Lesser_Elements][lesser elements]]", from broadest scope to narrowest. Along with objects, these sub-classes define categories of syntactic environments. Only [[#Headings][headings]], [[#Sections][sections]], [[#Property_Drawers][property drawers]], and @@ -52,7 +49,12 @@ elements that cannot contain any other elements. As such, a paragraph is considered a lesser element. Greater elements can themselves contain greater elements or lesser elements. Sections contain both greater and lesser elements, and headings can contain a section and -other headings. +other headings. [fn:tom2:I would not discuss strata here because it is +not related to the syntax of the document. It is related to how that +syntax is interpreted by org mode. The strata are nesting rules that +are independent of the syntax, and discussing that here in the syntax +document is confusing, because the nesting is not something that can be +parsed directly because it depends on the number of asterisks.] 
** The minimal and standard sets of objects @@ -60,25 +62,33 @@ To simplify references to common collections of objects, we define two useful sets. The /<<<minimal set>>> of objects/ refers to [[#Plain_Text][plain text]], [[#Emphasis_Markers][text markup]], [[#Entities][entities]], [[#LaTeX_Fragments][LaTeX fragments]], [[#Subscript_and_Superscript][superscripts and subscripts]]. The /<<<standard set>>> of objects/ refers to the entire set of objects, excluding -citation references and [[#Table_Cells][table cells]]. +citation references and [[#Table_Cells][table cells]].[fn:tom3:Table cells should +be treated in a way that is entirely separate from objects. This document has included +them as such, as has org-element (iirc). However, since they can never appear in a paragraph +and because tables are completely separate syntactically, we should probably drop the +idea that table cells are objects. I realize that this might mean the creation of a +distinction between paragraph-objects, title-objects, table-objects etc.] ** Blank lines A line containing only spaces, tabs, newlines, and line feeds (=\t\n\r=) -is considered a /blank line/. Blank lines can be used to separate +is considered a /blank line/. Blank lines separate paragraphs and other elements. With the exception of [[#Items][list items]], blank lines belong to the preceding element with the narrowest possible scope. For example, if at the end of a section we have a paragraph and a blank line, that blank line is -considered part of the paragraph. +considered part of the paragraph.[fn:tom4:I don't think we need to discuss +nesting scope here, it is confusing, it is always the immediately prior +(lesser?) element.] ** Indentation Indentation consists of a series of space and tab characters at the beginning of a line. Most elements can be indentated, with the exception of [[#Headings][headings]], [[#Inlinetasks][inlinetasks]], [[#Footnote_Definitions][footnote definitions]], and [[#Diary_Sexp][diary -sexps]].
[fn::Maybe a note that indentation is only meaningful in plain lists +and for greater blocks is aligned to the indentation of the #+end_ block?] ** Syntax patterns @@ -97,7 +107,8 @@ meaning, For instance, "KEY" and "VALUE" when describing elements or objects. Unless otherwise specified, a space in a pattern represents one or -more horizontal whitespace characters. +more horizontal whitespace characters.[fn::This should be in bold +so that people don't miss it.] Patterns will often also contain static structures that serve to differentiate a particular element or object type from others, but @@ -141,25 +152,34 @@ In this document, unless specified otherwise, case is insignificant. :CUSTOM_ID: Headings :END: -A Heading is a /unindented/ line structured according to the following pattern: +A Heading is an /unindented/ line structured according to the following pattern: #+begin_example STARS KEYWORD PRIORITY TITLE TAGS #+end_example -+ STARS :: A string consisting of one or more asterisks (up to - ~org-inlinetask-min-level~ if the =org-inlinetask= library is loaded) ++ STARS :: A string consisting of one or more asterisks[fn::removed + note about inline tasks because it is still a heading, any mention + of a concrete number should not appear in the specification of syntax.] and ended by a space character. The number of asterisks is used to - define the level of the heading. + define the level of the heading. [fn::Implementation note: when parsing + stars the space following the stars MUST NOT BE CONSUMED and the next + phase of parsing MUST start with the space so that it is possible to have + a heading with no title that also has tags.] + KEYWORD (optional) :: A string which is a member of ~org-todo-keywords-1~[fn:otkw1:By default, ~org-todo-keywords-1~ only - contains =TODO= and =DONE=, however this is liable to change.]. Case is - significant. This is called a "TODO keyword". 
- -+ PRIORITY (optional) :: A single alphanumeric character preceded by a - hash sign =#= and enclosed within square brackets (e.g. =[#A]= or =[#1]=). This - is called a "priority cookie". + contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local + variable and can be set by users in an org file using =#+todo:=.]. + Case is significant. This is called a "TODO keyword". [fn::Implementation note: + TODO keywords cannot be hardcoded in a tokenizer, the tokenizer must + be configurable at runtime so that in-file TODO keywords are properly interpreted.] + ++ PRIORITY (optional) :: A single letter preceded by a + hash sign =#= and enclosed within square brackets (e.g. =[#A]= or =[#D]=). This + is called a "priority cookie".[fn::Numeric values are not supported + it is a quirk of the elisp implementation that they appear to work, + however they break in nasty and unexpected ways.] + TITLE (optional) :: A series of objects from the standard set, excluding line break objects. It is matched after every other part. @@ -180,15 +200,17 @@ STARS KEYWORD PRIORITY TITLE TAGS If the first word appearing in the title is =COMMENT=, the heading will be considered as "commented". Case is significant. -If its title is the value of ~org-footnote-section~ (=Footnotes= by -default), it will be considered as a "footnote section". Case is -significant. +If the title of a heading is exactly the value of ~org-footnote-section~ +(=Footnotes= by default), it will be considered as a "footnote section". +Case is significant. If =ARCHIVE= is one of the tags given, the heading will be considered as "archived". Case is significant. -A heading contains directly one section (optionally), followed by -any number of deeper level headings. +The level of a heading can be used to construct a nested structure. +All content following a heading that appears before the next heading +(regardless of the level of that next heading) is a section. 
In addition, +text before the first heading in an org document is also a section. *** Sections :PROPERTIES: @@ -197,7 +219,15 @@ any number of deeper level headings. Sections contain one or more non-heading elements. With the exception of the text before the first heading in a document (which is -considered a section), sections only occur within headings. +considered a section), sections only occur within headings.[fn:: The +choice to call this syntactic component a section is confusing because +it is at odds with the usual notion of a section, namely that the +usual concept of a section implies that it includes nested content. I +personally didn't realize that it ended at the next heading until +writing this comment (as can be seen from reading my comments in the +laundry implementation). Therefore I suggest that we look for an +alternate name for this syntactic component. Maybe "segment" or +something similar that indicates that it is truncated?] *Example* @@ -224,31 +254,67 @@ Its internal structure could be summarized as: (heading)))) #+end_example -*** The top level section +*** The zeroth section :PROPERTIES: -:CUSTOM_ID: Top_level_section +:CUSTOM_ID: Zeroth_section :END: All elements before the first heading in a document lie in a special -section called the /top level section/. It may be preceded by blank -lines. Unlike a normal section, the top level section can immediately +section called the /zeroth section/. It may be preceded by blank +lines. Unlike a normal section, the zeroth section can immediately contain a [[#Property_Drawers][property drawer]], optionally preceded by [[#Comments][comments]]. It cannot -however, contain [[Planning][planning]]. +however, contain [[Planning][planning]].[fn::This is wrong? If it is not +wrong, then it should be. 
Property drawers are already annoying to implement +because they share syntax with regular drawers, and allowing a property drawer +at the top of a file without a heading means that it should be a regular drawer +not a property drawer, otherwise you have to special case the handling of drawers +in the zeroth section. What is the use case for a property drawer as opposed to +a #+property: line in the zeroth section? I may come around on this at some point, +but right now it seems more complex. However, it might actually be more consistent +if we imagine the zeroth section as being nested inside a single heading that has +level zero implicitly at the top of a document. Unfortunately that means that such +property drawers cannot be determined from a homogeneous syntax but instead require +some operations on the internal representation. Note also that if this were allowed +then the property drawer should only be allowed as the very first line of a file +because newlines at the start of a file need to be preserved. More thought required.] ** Affiliated Keywords :PROPERTIES: :CUSTOM_ID: Affiliated_Keywords :END: + [fn::Without going into too much detail, affiliated keywords should +not be distinguished from other keywords at the level of the syntax. +The fact that they are is an artifact of the elisp implementation. +The determination of the behavior of a keyword with regard to +affiliating behavior should be determined in a later pass, even if in +some cases some implementations may want to materialize them into the +parser for performance reasons. Allowing users to promote a keyword to +be an affiliated keyword would be incredibly powerful for attaching +metadata to parts of org-files in a way that is user extensible. It +may still be desirable to describe the behavior of affiliated keywords +here, but they are not in any way distinct from other keywords at the +level of org syntax and trying to implement them as such is usually a +mistake (that I have made).]
+ With the exception of [[#Comments][comments]], [[#Clocks][clocks]], [[#Headings][headings]], [[#Inlinetasks][inlinetasks]], [[#Items][items]], [[#Node_Properties][node properties]], [[#Planning][planning]], [[#Property_Drawers][property drawers]], [[#Sections][sections]], and [[#Table_Rows][table rows]], every other element type can be assigned attributes. + [fn::Technically tables can be assigned attributes, if you try to affiliate to a table +row you are accidentally creating a new table. Also, comments probably shouldn't be +in this list, but I need to review what the behavior was when trying to affiliate +to a paragraph where there is a comment in between, I'm pretty sure it doesn't work +though some of the reordering via org-element does .... Being able to affiliate to +comments could be quite powerful for some specialized use cases.] This is done by adding specific [[#Keywords][keywords]], named /affiliated/ keywords, immediately above the element considered (a blank line cannot lie between the affiliated keyword and element). Structurally, affiliated keyword are not considered an element in their own right but a -property of the element they apply to. +property of the element they apply to. [fn::While it is tempting to try +to do this at the level of the grammar it induces a number of nasty +ambiguities in practice. It is saner to have a single unified keyword +syntax and then to determine affiliation behavior in a later pass.] Affiliated keywords are structured according to one of the following pattern: @@ -268,19 +334,42 @@ Affiliated keywords are structured according to one of the following pattern: + OPTVAL (optional) :: A string consisting of any characters but a newline. This term is only valid when KEY is a member of ~org-element-dual-keywords~[fn:oedkw:By default, - ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.]. 
+ ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].[fn:: + All keywords should allow OPTVAL, it regularizes and simplifies the syntax.] + VALUE :: A string consisting of any characters but a newline, except in the case where KEY is member of ~org-element-parsed-keywords~[fn:oepkw:By default, ~org-element-parsed-keywords~ contains =CAPTION=.] in which case VALUE is a series of objects from the standard set, excluding footnote - references. - -Repeating an affiliated keyword before an element will usually result -in the prior VALUEs being overwritten by the last instance of KEY. -There are two situations under which the VALUEs will be concatenated: + references (and line breaks ???).[fn::This is confusing. A + =#+caption:= cannot contain a @@export: snippet@@ with a newline in + it, which this text seems to imply. A better wording would be to + state that there are some keywords where the contents of VALUE will + be further parsed as paragraphs (or whatever we are calling that + thing now. I think we are still missing the term for "object + containing syntax component")] + + [fn::The behavior of affiliated keywords with respect to shadowing +needs to be fully specified because it has major semantic implications, +and for org babel headers it has security implications.] +By default, when there are multiple affiliated keywords, the last occurrence +of a given keyword is the one that has priority. Normally users should not +specify more than a single instance of an affiliated keyword per element, but +if they do, the last one on the page wins. + +The default behavior is NOT followed for the ~#+header:~ keyword that +is used for org-babel blocks. ~#+header:~ keywords combine header +fields and resolve conflicts by having the top rightmost (first line, +last instance on the line) instance of a field take priority. [fn::This +behavior is critical for org babel and code execution security.
If +there are cases where aff keywords are not following this behavior +then they need to be fixed. The reason to do first one wins in cases +like this is so that users do not have to insert lines below which +lead to hard to understand diffs.] + +In addition, there are two situations in which the VALUEs will be concatenated: 1. If KEY is a member of ~org-element-dual-keywords~[fn:oedkw]. -2. If the affiliated keyword is an instance of the patten +2. If the affiliated keyword is an instance of the pattern =#+attr_BACKEND: VALUE=. The following example contains three affiliated keywords: @@ -296,16 +385,20 @@ The following example contains three affiliated keywords: :CUSTOM_ID: Greater_Elements :END: -Unless specified otherwise, greater elements can contain directly +Unless otherwise specified, greater elements can directly contain any greater or [[#Lesser_Elements][lesser element]] except: + Elements of their own type. + [[#Planning][Planning]], which may only occur in a [[#Headings][heading]]. -+ [[#Property_Drawers][Property drawers]], which may only occur in a [[#Headings][heading]] or the [[#Top_level_section][top level ++ [[#Property_Drawers][Property drawers]], which may only occur in a [[#Headings][heading]] or the [[#Zeroth_section][zeroth section]]. + [[#Node_Properties][Node properties]], which can only be found in [[#Property_Drawers][property drawers]]. + [[#Items][Items]], which may only occur in [[#Plain_Lists][plain lists]]. + [[#Table_Rows][Table rows]], which may only occur in [[#Tables][tables]]. + [fn::This is somewhat confusing because it lists combinations that +should already be impossible by default because e.g. items are meaningless +outside plain lists and should not even be mentioned outside of that context.] + *** Greater Blocks :PROPERTIES: :CUSTOM_ID: Greater_Blocks @@ -329,10 +422,14 @@ CONTENTS than a newline. + CONTENTS :: A collection of zero or more elements, subject to two conditions: - - No line may start with =#+end_NAME=. 
+ - No line in the block may start with =#+end_NAME=. - Lines beginning with an asterisk must be quoted by a comma (=,*=). Furthermore, lines starting with =#+= may be quoted by a comma (=,#+=). + [fn::Implementation note: ~#+begin_name~ to ~#+end_name~ usually needs to +be implemented in the tokenization step. The substructure discussed here +is thus usually handled in a second pass.] + *** Drawers and Property Drawers :PROPERTIES: :CUSTOM_ID: Drawers @@ -349,6 +446,8 @@ CONTENTS and underscores (=-_=). + CONTENTS :: A collection of zero or more elements, except another drawer. + [fn:: ~:end:~ may be capitalized (legacy support)] + *** Dynamic Blocks :PROPERTIES: :CUSTOM_ID: Dynamic_Blocks @@ -366,12 +465,20 @@ CONTENTS + CONTENTS :: A collection of zero or more elements, except another dynamic block. + [fn::The spec needs to clarify how to handle ~#+begin:~ alone on a line or followed by +only whitespace. It is quite nasty to have the behavior of ~#+begin:~ change if it is +or is not followed by invisible whitespace. I suggest that we change the behavior of +~#+begin:~ without whitespace to regularize it so that it is _always_ the start of a +dynamic block since the ~#+begin:~ keyword by itself is pretty much completely useless +since if you put anything after it, it becomes the start of a dynamic block anyway.] + *** Footnote Definitions :PROPERTIES: :CUSTOM_ID: Footnote_Definitions :END: -Footnote definitions must occur at the start of an /unindented/ line, +Footnote definitions must occur at the start of an /unindented/ line +(they must be preceded by only a newline, nothing else), and are structured according to the following pattern: #+begin_example [fn:LABEL] CONTENTS @@ -401,6 +508,13 @@ It even contains a single blank line. :CUSTOM_ID: Inlinetasks :END: + [fn::I suggest that we remove inlinetasks from this document.
+They are a hack that cannot be implemented as part of a grammar +because they require a concrete value to be specified which breaks +the arbitrary nesting depth of headings. I think I wrote this somewhere +else as well, but inline tasks can only be a layer on top of headings, +they cannot displace them.] + Inlinetasks are syntactically a [[#Headings][heading]] with a level of at least ~org-inlinetask-min-level~[fn:oiml:The default value of ~org-inlinetask-min-level~ is =15=.], i.e. starting with at least that @@ -448,8 +562,8 @@ BULLET COUNTER-SET CHECK-BOX TAG CONTENTS character, or a hyphen enclosed by square brackets (i.e. =[ ]=, =[X]=, or =[-]=). + TAG (optional) :: An instance of the pattern =TAG-TEXT ::= where =TAG-TEXT= represents a string consisting of non-newline characters - that does not contain the substring "\nbsp{}::\nbsp{}" (two colons surrounded by - whitespace). + that does not contain the substring ~" :: "~ (two colons surrounded by + whitespace without the quotes). + CONTENTS (optional) :: A collection of zero or more elements, ending at the first instance of one of the following: - The next item. @@ -457,6 +571,22 @@ BULLET COUNTER-SET CHECK-BOX TAG CONTENTS not counting lines within other elements or [[#Inlinetasks][inlinetask]] boundaries. - Two consecutive blank lines. + [fn:: The description of CONTENTS is confusing since it cannot contain +a heading, which is implicit in the indentation rule but not +obvious. In addition, contents may not actually contain zero or more +elements because many elements must start on their own line. So +e.g. 1. #+begin_src does not work, however, the wording seems to +indicate that it should, which is misleading. Further, it is actually +not possible to implement contents as specified because grammars +cannot track the indentation level that is required to reconstruct +list items correctly. Therefore CONTENTS should not be defined as such +but should only specify that they can be anything except a newline. 
I +think that the intent of this document is somewhat a conflation of the +syntax for org and of the semantics as determined by export backends +and/or org-element, however it makes it extremely confusing because it +is not actually possible to parse CONTENTS, they must be reconstructed +from the parse tree.] + *Examples* #+begin_example @@ -471,11 +601,17 @@ BULLET COUNTER-SET CHECK-BOX TAG CONTENTS :END: A /plain list/ is a set of consecutive [[#Items][items]] of the same indentation. + [fn::This is confusing because the definition of contents above is +confusing, it also implies that plain lists cannot be nested, or are +not somehow nested, which is also confusing. Maybe a line to the effect +that plain lists may be nested along with any other element that is +properly indented or something?] If first item in a plain list has a COUNTER in its BULLET, the plain list will be an "ordered plain-list". If it contains a TAG, it will be a "descriptive list". Otherwise, it will be an "unordered list". -List types are mutually exclusive. +List types are mutually exclusive at the same level of indentation; if +both types are present consecutively then they parse as separate lists. For example, consider the following excerpt of an Org document: @@ -524,6 +660,13 @@ CONTENTS + CONTENTS :: A collection of zero or more [[#Node_Properties][node properties]], not separated by blank lines. + [fn::The failure mode for malformed contents needs to be +determined more clearly here. We don't want property drawers to suddenly +become plain drawers just because a user has a malformed line, that +could be disastrous if certain settings in the property drawer mask +settings from further up the tree. In short, malformed contents +should not poison the whole property drawer.] + *Example* #+begin_example @@ -537,11 +680,24 @@ CONTENTS :CUSTOM_ID: Tables :END: + [fn::I think that this section needs to be split into two separate +sections one for each grammar.
It will make it much easier to specify +each grammar, and it will also make it clear that they are not +syntactic elements that are trivially interchangeable since only a +subset of table.el tables can be converted to org table syntax (at +the moment). I'm willing to take a shot at it.] + Tables are started by a line beginning with either: + A vertical bar (=|=), forming an "org" type table. + The string =+-= followed by a sequence of plus (=+=) and minus (=-=) signs, forming a "table.el" type table. + [fn::Consider whether table.el tables should be supported by the +syntax outside of elisp org mode. There are some slightly divergent +use cases and features and we likely need/want to explore some of +the alternatives proposed for how to allow pure org tables to support +the features that are currently only possible for table.el tables.] + Tables cannot be immediately preceded by such lines, as the current line would the be part of the earlier table. @@ -577,6 +733,11 @@ blocks]], [[#Paragraphs][paragraphs]] or [[#Table_Rows][table rows]] can contain :CUSTOM_ID: Babel_Call :END: + [fn::As with the other keyword-like things and syntax, I suggest that +we ultimately move babel calls to live under a section on keyword +content parsers so that it is clear that they should not be treated as +separate syntactic components.] + Babel calls are structured according to one of the following patterns: #+begin_example ,#+call: NAME(ARGUMENTS) @@ -593,11 +754,20 @@ Babel calls are structured according to one of the following patterns: non-newline characters. Opening and closing square brackets must be balanced. + [fn::Nesting rules for the parens and square brackets need review here +and elsewhere. The "must be balanced" requirement is implemented with +an extremely nasty materialized regex which only works for 3 or 4 levels +of nesting and thus is really likely to not be what we want.]
+ *** Blocks :PROPERTIES: :CUSTOM_ID: Blocks :END: + [fn::These probably should not actually be distinct from greater blocks. +The syntax is the same, the only difference is that there are 5 types that +have special specified handling.] + Like [[#Greater_Blocks][greater blocks]], blocks are structured according to the following pattern: #+begin_example @@ -622,7 +792,8 @@ CONTENTS the pattern =LANGUAGE SWITCHES ARGUMENTS= with: + LANGUAGE :: A string consisting of any non-whitespace characters + SWITCHES :: Any number of SWITCH patterns, separated by a single - space character + space character [fn::For the love of all that is sane can we + please just remove this from the spec or mark it as legacy.] - SWITCH :: Either the pattern =-l "FORMAT"= where =FORMAT= represents a string consisting of any characters but a double quote (="=) or newline, or the pattern =-S= or =+S= where =S= @@ -631,7 +802,7 @@ CONTENTS + CONTENTS (optional) :: A string consisting of any characters (including newlines) subject to the same two conditions of greater block's CONTENTS, i.e. - - No line may start with =#+end_NAME=. + - No line in the block may start with =#+end_NAME=. - Lines beginning with an asterisk must be quoted by a comma (=,*=). As with greater blocks, lines starting with =#+= may be quoted by a comma (=,#+=). @@ -655,6 +826,10 @@ CONTENTS :CUSTOM_ID: Clocks :END: + [fn::This section seems to have been made way too simple? Or is this +specifically the clocking-clock? If it is the clocking-clock then +that should be clarified.] + A clock element is structured according to the following pattern: #+begin_example @@ -674,7 +849,6 @@ clock: INACTIVE-TIMESTAMP-RANGE DURATION clock: [2024-10-12] #+end_example - *** Diary Sexp :PROPERTIES: :CUSTOM_ID: Diary_Sexp @@ -701,6 +875,9 @@ A diary sexp[fn::A common abbreviation for S-expression] element is an :CUSTOM_ID: Planning :END: + [fn::I think this and property drawers should be moved to be closer +to the heading spec section?]
+ A planning element is structured according to the following pattern: #+begin_example @@ -709,13 +886,15 @@ PLANNING #+end_example + HEADING :: A [[#Headings][heading]] element. -+ PLANNING :: A line consisting of a series of =KEYWORD: TIMESTAMP= ++ PLANNING :: A line consisting of one or more =KEYWORD: TIMESTAMP= patterns (termed "info" patterns). - - KEYWORD :: Either the string =DEADLINE=, =SCHEDULED=, or =CLOSED=. + - KEYWORD :: Either the string =DEADLINE=, =SCHEDULED=, or =CLOSED=. [fn:: + Request to add the =OPENED= keyword to track when a task was first known/entered into a file.] - TIMESTAMP :: A [[#Timestamps][timestamp]] object. -It is not permitted for any blank lines to lie between HEADING and -PLANNING. +PLANNING must directly follow HEADING without any blank lines in between. + + [fn::Need a spec for how to handle multiple instances of the same keyword with different values.] *Example* @@ -742,7 +921,6 @@ Comments consist of one or more consecutive comment lines. # Over multiple lines #+end_example - *** Fixed Width Areas :PROPERTIES: :CUSTOM_ID: Fixed_Width_Areas @@ -773,6 +951,9 @@ consecutive hyphens (=-----=). :CUSTOM_ID: Keywords :END: + [fn::Reminder about regularizing keyword syntax so that it +always supports ~#+key[opt]:value~ syntax.] + Keywords are structured according to the following pattern: #+begin_example @@ -780,7 +961,11 @@ Keywords are structured according to the following pattern: #+end_example + KEY :: A string consisting of any non-whitespace characters, other - than =call= (which would forms a [[#Babel_Call][babel call]] element). + than =call= (which would form a [[#Babel_Call][babel call]] element). [fn::This is + why I have the note on the ~#+call:~ section. If someone tries to + implement this they are going to be in a world of pain because there + is a concrete value here. This is because that distinction is not in + the syntax but instead should be in a later stage.]
+ VALUE :: A string consisting of any characters but a newline. When KEY is a member of ~org-element-parsed-keywords~[fn:oepkw], VALUE can contain @@ -791,7 +976,9 @@ Note that while instances of this pattern are preferentially parsed as keyword may occur so long as it is not immediately preceding a valid element that can be affiliated. For example, an instance of =#+caption: hi= followed by a blank line will be parsed as a keyword, -not an affiliated keyword. +not an affiliated keyword. [fn::A full spec for user defined aff keywords +will require a bit more clarity for how lonely affiliated keywords should +behave.] *** LaTeX Environments :PROPERTIES: @@ -838,6 +1025,16 @@ according to one of the following patterns: which does not end in a plus characters (=+=). + VALUE (optional) :: A string containing any characters but a newline. + [fn::This spec is not consistent with the behavior and has bad design. +Name should be allowed to be empty, same as with heading tags. It is +critical to include the empty string as part of a grammar like this so +that it is closed, otherwise we get nasty edge cases. For example it +should be the case that ~:+:~ is syntactically valid as a node +property. The fact that it won't apply to anything is ok, it might +also be useful if we regularize ~#+begin_NAME~ to allow the empty +string for NAME. Note that ~:+:~ is already treated as syntactically +valid for font locking and for property drawer detection (I think).] + *** Paragraphs :PROPERTIES: :CUSTOM_ID: Paragraphs @@ -850,11 +1047,21 @@ Empty lines and other elements end paragraphs. Paragraphs can contain the standard set of objects. + [fn::Implementation note: it is possible to define +paragraphs constructively instead of as they are defined +here as the negation or fall through of all other things. +We should update this section with the positive definition +once I have it nailed down.] 
+ *** Table Rows :PROPERTIES: :CUSTOM_ID: Table_Rows :END: + [fn::I suggest we roll this up into the org tables section +spec so that we don't have to worry about making a note that +these only occur in tables.] + A table row consists of a vertical bar (=|=) followed by: + Any number of [[#Table_Cells][table cells]], forming a "standard" type row. + A hyphen (=-=), forming a "rule" type row. Any non-newline characters @@ -892,6 +1099,13 @@ such as a paragraph. :CUSTOM_ID: Entities :END: + [fn::As I think I mention elsewhere, the concrete names here +should NOT be part of the syntax, it makes the parser brittle +and hard to maintain. Differentiation between entities and fragments +should be handled at the syntax level for cases where the fragment +has brackets, and then in a second pass for values that are +syntactically entity-or-fragment and must be determined after +the fact.] Entities are structured according to the following pattern: #+begin_example @@ -987,6 +1201,7 @@ ought to be removed. They are slow to parse, fragile, redundant and imply false positives. --- ngz #+end_quote + [fn::Strong support for removing these.] ** Export Snippets :PROPERTIES: @@ -1002,6 +1217,12 @@ Export snippets are structured according to the following pattern: + BACKEND :: A string consisting of alphanumeric characters and hyphens. + VALUE (optional) :: A string containing anything but the string =@@=. + [fn::We probably want to note that BACKEND can be the empty string +per that thread on how to deal with intra-word markup. Again this +also touches on the general principle of wanting to close over the +empty string so that users aren't surprised when ~@@:lol@@~ suddenly +appears in plain text just because no backend was specified.] + ** Footnote References :PROPERTIES: :CUSTOM_ID: Footnote_References @@ -1019,14 +1240,17 @@ Footnote references are structured according to one of the following patterns: hyphens and underscores (=-_=). 
+ DEFINITION (optional) :: A series of objects from the standard set, so long as opening and closing square brackets are balanced within - DEFINITION. + DEFINITION. [fn::As noted elsewhere, the balanced brackets + requirement is a nightmare and needs a review.] If the reference follows the second pattern, it is called an "inline footnote". If it follows the third pattern, i.e. if LABEL is omitted, it is called an "anonymous footnote". Note that the first pattern may not occur on an /unindented/ line, as it -is then a [[#Footnote_Definitions][footnote definition]]. +is then a [[#Footnote_Definitions][footnote definition]]. [fn::I'm not sure this is quite right? +the font locking code is not consistent with actual behavior, need to +review the laundry test cases and example files.] ** Citations :PROPERTIES: @@ -1127,7 +1351,8 @@ src_LANG[HEADERS]{BODY} + LANG :: A string consisting of any non-whitespace characters. + HEADERS (optional), BODY (optional) :: A string consisting of any characters but a newline. Opening and closing square brackets must - be balanced. + be balanced. [fn::Nesting issues need review. Suggestion to do + something like what Racket scribble does.] ** Line Breaks :PROPERTIES: @@ -1366,6 +1591,10 @@ SIGN CHARS FINAL :CUSTOM_ID: Table_Cells :END: + [fn::Need to condense this with tables and table rows because +spreading these out makes it super hard to understand the table syntax +for basically no reason.] + Table cells are structured according to the following pattern: #+begin_example @@ -1375,7 +1604,12 @@ CONTENTS SPACES| + CONTENTS :: A series of objects not containing the vertical bar character (=|=). It can contain the minimal set of objects, [[#Citations][citations]], [[#Export_Snippets][export snippets]], [[#Footnote_References][footnote references]], [[#Links][links]], [[#Macros][macros]], - [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]]. 
+ [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]]. [fn::Like for the comma in + macros, I think it would be safe to add ~\|~ as an escape character. + The issue in the elisp implementation is not actually at the level + of the syntax, but is actually in the export backends or somewhere + deeper, because even using a macro that expands to be a pipe ~|~ + breaks the table (which is really bad).] + SPACES :: A string consisting of zero or more of space characters, used to align the table columns. @@ -1386,6 +1620,10 @@ The final vertical bar (=|=) may be omitted in the last cell of a row. :CUSTOM_ID: Timestamps :END: + [fn::I have some suggestions for extensions to timestamp syntax to +support historical and far future dates, as well as timezone offsets (NOT +the 3 letter ambiguous disaster) and seconds and sub-second times.] + Timestamps are structured according to one of the seven following patterns: #+begin_example @@ -1470,6 +1708,10 @@ BORDER BODY BORDER + [[#Special_Tokens][POST]] :: Either a whitespace character, =-=, =.=, =,=, =;=, =:=, =!=, =?=, ='=, =)=, =}=, =[=, ="=, or the end of a line. +The four =*/_+= may be arbitrarily nested to any depth. Verbatim and +code ==~= may be nested inside any other markup, but no other markup +will be interpreted inside of them since they are interpreted exactly. + *Examples* #+begin_example @@ -1483,12 +1725,16 @@ functions starting with ~org-element-~. :CUSTOM_ID: Plain_Text :END: + [fn::I'm not sure I would add this, the fall through is sloppy +and it is better to specify values constructively.] + Any string that doesn't match any other object can be considered a plain text object.[fn::In ~org-element.el~ plain text objects are abstracted away to strings for performance reasons.] Within a plain text object, all whitespace is collapsed to a single space. For instance, =hello\n there= is equivalent to =hello there=. 
+ * Footnotes [fn:1] In particular, the parser requires stars at column 0 to be @@ -1497,7 +1743,10 @@ quoted by a comma when they do not define a heading. [fn:2] It also means that only headings and sections can be recognized just by looking at the beginning of the line. Planning lines and property drawers can be recognized by looking at one or two lines -above. +above. [fn::This is incorrect. There are many elements that can be +recognized by looking at the start of a line, however the conflation +between pure syntax level and other levels of parsing and processing +obscures this.] As a consequence, using ~org-element-at-point~ or ~org-element-context~ will move up to the parent heading, and parse top-down from there @@ -1571,6 +1820,20 @@ until context around the original location is found. :CUSTOM_ID: Entities_List :END: + [fn::The org entities section is useful, but I suggest not including +that section at all right now. There is a way to define an abstract +syntax that does not require the parser to pull in all those concrete +forms which reflects how org mode implements that functionality but +should not be specified as part of the syntax document. There are some +significant edge cases that need to be worked out in the grammar for +this that having a hardcoded list masks. I suggest we work that +portion out before committing any of that to a spec doc. It is also a +bad idea to list all of those in the spec doc because it will +likely get out of sync with the code that implements such detection in +elisp (despite the fact that the list is being auto generated via a +code block). Maybe it makes sense to include the code block so that +devs and users can discover it for themselves?] + #+begin_src emacs-lisp :results raw :exports results (concat "| Name | Character |\n|-\n" (mapconcat -- 2.34.1