Org Syntax Specification

emacs-orgmode@gnu.org archives

* Org Syntax Specification
@ 2022-01-09 18:02 Timothy
  2022-01-15 12:40 ` Sébastien Miquel
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Timothy @ 2022-01-09 18:02 UTC (permalink / raw)
  To: org-mode-email, mail


Hi All,

I’ve talked about adding citation syntax to the org-syntax document before, and
previously expressed the thought that it could be generally improved quite a
bit. This has culminated in me spending the last few days straight working on a
rewrite of org-syntax.org to try to bring it closer to the point where we can
knock “(draft)” out of the title 🙂.

Ihor has been a tremendous help pointing out inaccuracies and explaining some of
the parsing behaviour (thanks!), which has allowed me to get it to a point where
I think it would benefit from wider feedback.

I’ve just pushed my latest revision to worg as
<https://orgmode.org/worg/dev/org-syntax-edited.html>. Personally though, I think
it’s best viewed as a PDF, so I’ve also uploaded the PDF export to
<https://0x0.st/oiM5.pdf>.

It would be great if those of you with an interest/understanding of Org’s syntax
could have a look and let me know what you think. I think the best way to
compare to the current org-syntax.org would be to put them side-by-side. I’ve
attempted to list the main changes I’ve made in the appendix, however I’ve
likely missed things.

Lastly, having spent a while looking at the syntax, I’m wondering if we should
take this opportunity to mark some of the syntactic elements we’ve become less
happy with as *(deprecated)*. I’m specifically thinking of the TeX-style LaTeX
fragments which have been a bit of a pain. To quote Nicolas in org-syntax.org:
      It would introduce incompatibilities with previous Org versions,
      but support for `$...$' (and for symmetry, `$$...$$') constructs
      ought to be removed.

      They are slow to parse, fragile, redundant and imply false
      positives.  — ngz

Marking this as deprecated would have no effect on Org’s current behaviour, but
we could:
1. Mark as deprecated now-ish
2. Add a utility to convert from TeX-style to LaTeX-style
3. Add org lint/fontification warnings
4. A while later (half a decade? more?) actually remove support
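
As a rough illustration of step 2, a purely regexp-based converter might look
like the sketch below. The function name `my/org-tex-to-latex-fragments' is
hypothetical, and this is not how Org itself recognises fragments: the sketch
treats /every/ pair of dollar signs as math, so prose such as "$5 and $10"
would be rewritten too. A real utility would presumably need to go through the
parser to avoid that.

#+begin_src emacs-lisp
  ;; Hypothetical helper, for illustration only (not part of Org).
  (defun my/org-tex-to-latex-fragments (text)
    "Return TEXT with TeX-style math delimiters replaced by LaTeX-style ones.
  Naive sketch: every pair of dollar signs is assumed to delimit math."
    ;; Display math first, so that $$ is not read as two inline delimiters.
    (setq text (replace-regexp-in-string
                "\\$\\$\\([^$]+\\)\\$\\$" "\\\\[\\1\\\\]" text))
    ;; Then inline math, which is kept to a single line here.
    (replace-regexp-in-string
     "\\$\\([^$\n]+\\)\\$" "\\\\(\\1\\\\)" text))

  ;; (my/org-tex-to-latex-fragments "Let $x$ be such that $$x^2 = 2$$.")
  ;; => "Let \\(x\\) be such that \\[x^2 = 2\\]."
#+end_src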

The other component of the syntax which feels particularly awkward to me is
source block switches. They seem a bit odd, and since arguments exist,
completely redundant.

――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

That’s all for now, I hope you all had a great Christmas and new year!

All the best,
Timothy


* Re: Org Syntax Specification
  2022-01-09 18:02 Org Syntax Specification Timothy
@ 2022-01-15 12:40 ` Sébastien Miquel
  2022-01-15 16:36   ` Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification) Timothy
  2022-01-18  0:54 ` Org Syntax Specification Tom Gillespie
  2022-09-25  9:09 ` Bastien
  2 siblings, 1 reply; 22+ messages in thread
From: Sébastien Miquel @ 2022-01-15 12:40 UTC (permalink / raw)
  To: Timothy, org-mode-email


Hi,

The new document seems much clearer. It makes a nice complement to the
manual and we should definitely lose the (draft). Thank you Timothy
for the work.

> Lastly, having spent a while looking at the syntax, I’m wondering if
> we should take this opportunity to mark some of the syntactic elements
> we’ve become less happy with as *(deprecated)*. I’m specifically
> thinking of the TeX-style LaTeX fragments which have been a bit of a
> pain. To quote Nicolas in org-syntax.org:
>
>     It would introduce incompatibilities with previous Org versions,
>     but support for |$...$| (and for symmetry, |$$...$$|) constructs
>     ought to be removed.
>
>     They are slow to parse, fragile, redundant and imply false
>     positives. — ngz
>

This quote has been mentioned a few times lately, and no one has yet
spoken in favor of the $…$ syntax, so I'll have a quick go.

It is easier to use, easier to read and more commonly used (and known)
in tex documents (a quick web search for sample tex documents confirms
the latter). Removing this syntax would make org slightly harder to
pick up, with respect to writing scientific documents.

As for the listed shortcomings, I don't think we know whether its
slowness is significant and false positives can be avoided by using
the \dollar entity (possibly ?). In my own use, the only usability
issue I can think of is false negatives while writing : inserting a
space or other such characters at the end of a snippet removes the
fontification (I solve this by modifying the fontification regexps).

Regards,

-- 
Sébastien Miquel


* Re: Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification)
  2022-01-15 12:40 ` Sébastien Miquel
@ 2022-01-15 16:36   ` Timothy
  2022-01-16  8:08     ` Sébastien Miquel
  2022-01-16 12:10     ` Eric S Fraga
  0 siblings, 2 replies; 22+ messages in thread
From: Timothy @ 2022-01-15 16:36 UTC (permalink / raw)
  To: sebastien.miquel; +Cc: org-mode-email



Hi Sebastien,

Thanks for your comments, and your thoughts on the proposed deprecation.

It’s worth explicitly considering why we wouldn’t want to steer people away from
the TeX-syntax LaTeX fragments, so I am glad you have brought up some reasons.
I do not find myself agreeing with them however, and will endeavour to explain
why below.

⁃ It is easier to use
  • Hmm. Not sure about this. Keystroke wise we’re comparing `$$' to `\('. The
    latter can be completed by smartparens, but since single dollars are
    reasonable Org content the former can’t. At this point the only argument is
    muscle memory, and if you’re a LaTeX user (a good target audience for LaTeX
    fragments I think), I’d expect LaTeX-style `\(' to be more familiar.
⁃ Easier to read
  • I had a quick look at a document to gauge this for myself, and if anything I
    found the opposite (see <https://0x0.st/o-32.png>). This may be influenced by
    a minor fontification tweak I made to LaTeX style input though.
⁃ more commonly used (and known) in tex documents (a quick web search for sample
  tex documents confirms the latter).
⁃ Removing this syntax would make org slightly harder to pick up, with respect
  to writing scientific documents.
  • With respect to writing scientific documents, I think we can reasonably
    expect people to be familiar with `\(', particularly given the points I raise
    below.

These points seem to have a common thread in wanting to have Org be like LaTeX.
I find this sensible, but I think this is a good opportunity to point out that
$/$$ are very much second class citizens in LaTeX now, no matter what you may
see in old documents.

To quote from David Carlisle (one of the main members of the LaTeX3 team) on [tex.stackexchange]:
> $$ is TeX primitive syntax, which, as others have commented is hard to
> redefine (in classic TeX there is no command name which triggers entering or
> leaving display math).
> LaTeX doesn’t officially support $$. The most noticeable failure if you use
> the syntax is that the fleqn option will no longer affect the display of the
> mathematics, it will remain centered rather than being set flush left.

Another member of the LaTeX3 team, Joseph Wright, has made even stronger
comments about $-syntax on [tex.stackexchange]:
> I’d note with my ’LaTeX3’ hat on that there is a strong chance we’ll favour `\(
> ... \)' to the point of not supporting `$...$' for LaTeX3. So in the long term it
> might be best to get used to `\(...\)'.

In further comments Joseph goes on to say that it is likely that $-syntax will
/not/ be dropped outright, but that $$ likely will be. Among other
things the $-syntax produces worse error reporting and spacing.

So, to sum up, LaTeX currently prefers `\(...\)' / `\[...\]' over `$' / `$$', and it
looks like people will be pushed more strongly in this direction in future.

More than anything else, I think this demonstrates why, annoyances with the
parsing aside, it would make sense purely from a user perspective to favour
LaTeX-syntax LaTeX fragments.

All the best,
Timothy


[tex.stackexchange] <https://tex.stackexchange.com/questions/503/why-is-preferable-to>

[tex.stackexchange] <https://tex.stackexchange.com/questions/510/are-and-preferable-to-dollar-signs-for-math-mode?noredirect=1&lq=1#comment2607_513>


* Re: Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification)
  2022-01-15 16:36   ` Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification) Timothy
@ 2022-01-16  8:08     ` Sébastien Miquel
  2022-01-16  9:23       ` Depreciating TeX-style LaTeX fragments Martin Steffen
  2022-01-16  9:46       ` Colin Baxter 😺
  2022-01-16 12:10     ` Eric S Fraga
  1 sibling, 2 replies; 22+ messages in thread
From: Sébastien Miquel @ 2022-01-16  8:08 UTC (permalink / raw)
  To: Timothy; +Cc: org-mode-email

Hi,

With respect to readability, I only mean to point out that the $…$
syntax is one less character, and that the \(\) characters are quite
overloaded.

> this is a good opportunity to point out that $/$$ are very much second 
> class citizens in LaTeX now, no matter what you may see in old documents. 

The posts that you quote are 10 years old. As per [0] (2020), there
will be no LaTeX3. Nor is it only old documents that use the $…$
syntax : looking for learning resources (see [1]), everything that I
find uses it. That includes The Not So Short Introduction to LaTeX [2]
(2021) and https://en.wikibooks.org/wiki/LaTeX/Mathematics.

Although I have no evidence of this, my expectation is that the
majority of tex users use the $…$ syntax (it is in fact widely used
outside of tex: in most markdown flavors and texmacs for example). I
also expect that a significant proportion of tex users are not aware
of the \(…\) syntax. I think here of users that are less tech literate
than most of this mailing list.

Regards,

[0]: 
https://www.latex-project.org/publications/2020-FMi-TUB-tb128mitt-quovadis.pdf
[1]: 
https://tex.stackexchange.com/questions/11/what-are-good-learning-resources-for-a-latex-beginner
[2]: https://ctan.tetaneutral.net/info/lshort/english/lshort.pdf

-- 
Sébastien Miquel


* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  8:08     ` Sébastien Miquel
@ 2022-01-16  9:23       ` Martin Steffen
  2022-01-16  9:46       ` Colin Baxter 😺
  1 sibling, 0 replies; 22+ messages in thread
From: Martin Steffen @ 2022-01-16  9:23 UTC (permalink / raw)
  To: emacs-orgmode

Hi

To add my two cents: I am a LaTeX user of _many_ years (as well as a user
of Emacs + Org), and I often use it for math-heavy texts.

I do use $ (I actually did not even know that \( \) is (supposed to be)
the new way until I saw it generated by Org).

As for $$ (or \[), I basically don't use it. I use
begin/end{displaymath}.

I don't care that it's a lot to type in, as I use an editor, that
assists me (said emacs ;-) resp. auc-tex mode).

I like the keybindings for environments there (and with the usual prefix
C-u C-x C-e, one can for instance turn a display-math into an equation,
should one decide later).

$$ I never used. The display-math simply looks nicer and is better
supported by auc-tex in that it uses standard indentation for
environments. For me it's likewise important that the text is properly
indented, and highlighted, so I can read the source file with ease,
while working on it.

Also \[ \] does proper indentation, but as said, I got used to C-x C-e
and that produces for me displaymath (probably it can be customized, but
I am happy with it as is).

Martin



* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  8:08     ` Sébastien Miquel
  2022-01-16  9:23       ` Depreciating TeX-style LaTeX fragments Martin Steffen
@ 2022-01-16  9:46       ` Colin Baxter 😺
  2022-01-16 11:11         ` Tim Cross
                           ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Colin Baxter 😺 @ 2022-01-16  9:46 UTC (permalink / raw)
  To: Sébastien Miquel; +Cc: org-mode-email, Timothy

>>>>> Sébastien Miquel <sebastien.miquel@posteo.eu> writes:

    > Hi, With respect to readability, I only mean to point out that the
    > $…$ syntax is one less character, and that the \(\) characters are
    > quite overloaded.

Indeed. Compare something like

$g=\lim_{\delta m\to 0}(\delta F/\delta m)$

with

\(g=\lim_{\delta m\to 0}(\delta F/\delta m)\)

Backslash city! I know which one I'd prefer to read.

    >> this is a good opportunity to point out that $/$$ are very much
    >> second class citizens in LaTeX now, no matter what you may see in
    >> old documents.

    > The posts that you quote are 10 years old. As per [0] (2020),
    > there will be no LaTeX3. Nor is it only old documents that use the
    > $…$ syntax : looking for learning resources (see [1]), everything
    > that I find uses it. That includes The Not So Short Introduction
    > to LaTeX [2] (2021) and
    > https://en.wikibooks.org/wiki/LaTeX/Mathematics.

Ah, LaTeX3 - whatever happened to that?

    > Although I have no evidence of this, my expectation is that the
    > majority of tex users use the $…$ syntax (it is in fact widely
    > used outside of tex: in most markdown flavors and texmacs for
    > example). I also expect that a significant proportion of tex users
    > are not aware of the \(…\) syntax. I think here of users that are
    > less tech literate than most of this mailing list.

Agreed.

Best wishes,



* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  9:46       ` Colin Baxter 😺
@ 2022-01-16 11:11         ` Tim Cross
  2022-01-16 13:26         ` Juan Manuel Macías
  2022-01-16 17:45         ` Rudolf Adamkovič
  2 siblings, 0 replies; 22+ messages in thread
From: Tim Cross @ 2022-01-16 11:11 UTC (permalink / raw)
  To: emacs-orgmode

While I can see the advantages of $..$ for equations, I think we also
need to keep in mind that org mode is NOT a latex or tex editing mode.
While it is excellent at providing a higher level abstraction which
works well with Latex, other considerations also need to come into play,
especially with respect to efficient and consistent parsing of org mode
syntax. From that perspective, $...$ seems to add complexity which is
making it much harder to get consistency and efficiency in parsing and
processing things like font locking, indentation etc.

The question then becomes: "Is the slight reduction in typing and/or
possibly more readable $..$ syntax sufficient justification for more
complex and difficult to maintain code for parsing, font-locking and
indentation/filling?" Furthermore, could readability not be improved
even further with the \[...\] syntax if we were able to parse the
contents more reliably/efficiently and possibly provide other
mechanisms to improve the readability of maths and formulas (e.g. better
font-locking, hiding of delimiters, etc.)?

I'm not convinced that arguments about what authors used to writing in
TeX/LaTeX are familiar with are terribly relevant to org mode. There
are already things in org mode which are inconsistent with what you
would write in pure Tex/Latex and as mentioned, org mode is not just a
front-end for writing Tex/Latex documents. Org has its own flavoured
markup and we should work towards making the syntax of that markup as
consistent, clean and verifiable as possible. 
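
To make the parsing concern concrete, here is a small sketch comparing two
deliberately naive regexps on the same sentence. The helper name
`my/naive-math-match' and both regexps are assumptions made for illustration.
They are not the patterns Org's parser uses, which adds extra rules about the
characters around the delimiters. The point is only that identical opening and
closing markers let ordinary dollar amounts pair up, whereas directional
markers do not have that problem.

#+begin_src emacs-lisp
  ;; Hypothetical helper, for illustration only (not part of Org).
  (defun my/naive-math-match (regexp text)
    "Return the first substring of TEXT matching REGEXP, or nil."
    (when (string-match regexp text)
      (match-string 0 text)))

  (let ((text "It costs $5, or $10 with \\(x^2\\) included."))
    (list
     ;; Symmetric delimiters: two prices are mistaken for one fragment.
     (my/naive-math-match "\\$[^$\n]+\\$" text)
     ;; Directional delimiters: only the real fragment matches.
     (my/naive-math-match "\\\\(.+?\\\\)" text)))
  ;; => ("$5, or $" "\\(x^2\\)")
#+end_src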


* Re: Depreciating TeX-style LaTeX fragments
  2022-01-15 16:36   ` Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification) Timothy
  2022-01-16  8:08     ` Sébastien Miquel
@ 2022-01-16 12:10     ` Eric S Fraga
  2022-01-16 14:30       ` Anthony Cowley
  1 sibling, 1 reply; 22+ messages in thread
From: Eric S Fraga @ 2022-01-16 12:10 UTC (permalink / raw)
  To: Timothy; +Cc: sebastien.miquel, org-mode-email

On Sunday, 16 Jan 2022 at 00:36, Timothy wrote:
>         Hmm. Not sure about this. Keystroke wise we’re comparing $$
>         to \(. The latter can be completed by smartparens, but since
>         single dollars are reasonable Org content the former can’t.
>         At this point the only argument is muscle memory, and if

As an aside, I will suggest including the following code in your Emacs
customization:

#+begin_src emacs-lisp :tangle "esf-org.el"
  ;; from Nicolas Richard <theonewiththeevillook@yahoo.fr>
  ;; Date: Fri, 8 Mar 2013 16:23:02 +0100
  ;; Message-ID: <87vc913oh5.fsf@yahoo.fr>
  (defun yf/org-electric-dollar ()
    "When called once, insert \\(\\) and leave point in between.
  When called twice, replace the previously inserted \\(\\) by one $."
    (interactive)
    ;; Point sits inside an empty \(\) pair only right after a first call.
    (if (and (looking-at "\\\\)")
             ;; Bound the backward search to the current line.
             (looking-back "\\\\(" (line-beginning-position)))
        (progn (delete-char 2)
               (delete-char -2)
               (insert "$"))
      (insert "\\(\\)")
      (backward-char 2)))
  ;; Bind the key only once Org is loaded, so that `org-mode-map' exists.
  (with-eval-after-load 'org
    (define-key org-mode-map (kbd "$") #'yf/org-electric-dollar))
#+end_src

I've been using this for years now and it works very well: I also had
$...$ in my muscle memory.

The only time it can be annoying is if you wish to edit/write org table
expressions directly instead of using org's features for this, such as
editing the equation (C-c ') or inserting one (C-c = with or without
C-u).

-- 
: Eric S Fraga, with org release_9.5.2-306-g9623da in Emacs 29.0.50



* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  9:46       ` Colin Baxter 😺
  2022-01-16 11:11         ` Tim Cross
@ 2022-01-16 13:26         ` Juan Manuel Macías
  2022-01-16 14:43           ` Colin Baxter 😺
  2022-01-16 17:45         ` Rudolf Adamkovič
  2 siblings, 1 reply; 22+ messages in thread
From: Juan Manuel Macías @ 2022-01-16 13:26 UTC (permalink / raw)
  To: Colin Baxter, Timothy, Sébastien Miquel; +Cc: orgmode

Colin Baxter writes:

> Ah, LaTeX3 - whatever happened to that?

If you're a LaTeX user, you're already using LaTeX3 to a very high
extent, even if you don't see it. The current idea is not to replace
LaTeX2e with LaTeX3 as a new version, but to gradually incorporate
elements of LaTeX3 into the LaTeX kernel, like the new syntax, xparse,
etc. LaTeX3 is already present in many aspects of LaTeX, and that is an
undeniable advance. If anyone is interested in the state of the art,
this short talk by Frank Mittelbach at TUG 2020 is very illustrative:

https://invidious.snopyta.org/watch?v=zNci4lcb8Vo

Best regards,

Juan Manuel 


* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16 12:10     ` Eric S Fraga
@ 2022-01-16 14:30       ` Anthony Cowley
  0 siblings, 0 replies; 22+ messages in thread
From: Anthony Cowley @ 2022-01-16 14:30 UTC (permalink / raw)
  To: Eric S Fraga; +Cc: sebastien.miquel, org-mode-email, Timothy



> On Jan 16, 2022, at 7:13 AM, Eric S Fraga <e.fraga@ucl.ac.uk> wrote:
> 
> On Sunday, 16 Jan 2022 at 00:36, Timothy wrote:
>>        Hmm. Not sure about this. Keystroke wise we’re comparing $$
>>        to \(. The latter can be completed by smartparens, but since
>>        single dollars are reasonable Org content the former can’t.
>>        At this point the only argument is muscle memory, and if
> 
> As an aside, I will suggest including the following code in your Emacs
> customization:
> 
> #+begin_src emacs-lisp :tangle "esf-org.el"
>  ;; from Nicolas Richard <theonewiththeevillook@yahoo.fr>
>  ;; Date: Fri, 8 Mar 2013 16:23:02 +0100
>  ;; Message-ID: <87vc913oh5.fsf@yahoo.fr>
>  (defun yf/org-electric-dollar nil
>    "When called once, insert \\(\\) and leave point in between.
>  When called twice, replace the previously inserted \\(\\) by one $."
>         (interactive)
>         (if (and (looking-at "\\\\)") (looking-back "\\\\("))
>             (progn (delete-char 2)
>                    (delete-char -2)
>                    (insert "$"))
>           (insert "\\(\\)")
>           (backward-char 2)))
>  (define-key org-mode-map (kbd "$") 'yf/org-electric-dollar)
> #+end_src
> 
> I've been using this for years now and it works very well: I also had
> $...$ in my muscle memory.

This is a really helpful snippet, but I tried it out for a while a previous time this issue came up and found the readability of equations took too much of a hit. The “backslash city” really is tough to visually parse. Backslash density is already an unfortunate bit of the LaTeX experience, and I didn’t get used to the extra slashed characters as bookends over a two week trial.

I still wanted to express my appreciation for you sharing this!

Anthony


* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16 13:26         ` Juan Manuel Macías
@ 2022-01-16 14:43           ` Colin Baxter 😺
  2022-01-16 15:16             ` Greg Minshall
  0 siblings, 1 reply; 22+ messages in thread
From: Colin Baxter 😺 @ 2022-01-16 14:43 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: Sébastien Miquel, orgmode, Timothy

>>>>> Juan Manuel Macías <maciaschain@posteo.net> writes:

    > Colin Baxter writes:
    >> Ah, LaTeX3 - whatever happened to that?

    > If you're a LaTeX user, you're already using LaTeX3 to a very high
    > extent, even if you don't see it. The current idea is not to
    > replace LaTeX2e with LaTeX3 as a new version, but to gradually
    > incorporate elements of LaTeX3 into the LaTeX kernel, like the new
    > syntax, xparse, etc. LaTeX3 is already present in many aspects of
    > LaTeX, and that is an undeniable advance. If anyone is interested
    > in the state of the art, this short talk by Frank Mittelbach at
    > TUG 2020 is very illustrative:

Yes, I know. My remark was tongue in cheek.



* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16 14:43           ` Colin Baxter 😺
@ 2022-01-16 15:16             ` Greg Minshall
  0 siblings, 0 replies; 22+ messages in thread
From: Greg Minshall @ 2022-01-16 15:16 UTC (permalink / raw)
  To: Colin Baxter 😺
  Cc: Juan Manuel Macías, Sébastien Miquel, orgmode, Timothy

Colin,

>     > Colin Baxter writes:
>     >> Ah, LaTeX3 - whatever happened to that?
...
> Yes, I know. My remark was tongue in cheek.

which leaves open whether your tongue was already in your cheek at:

> Indeed. Compare something like
> 
> $g=\lim_{\delta m\to 0}(\delta F/\delta m)$
> 
> with
> 
> \(g=\lim_{\delta m\to 0}(\delta F/\delta m)\)

?

additionally, fwiw, i was a long time '$...$'-user.  at one point i was
betrayed, and switched to '\(...\)'.  it may be more to type (i hadn't
noticed the suggestion Eric just sent in), but i liked the
repeatability.  and, in terms of parsing, i'm very sympathetic to having
"directional" end markers.

cheers, Greg



* Re: Depreciating TeX-style LaTeX fragments
  2022-01-16  9:46       ` Colin Baxter 😺
  2022-01-16 11:11         ` Tim Cross
  2022-01-16 13:26         ` Juan Manuel Macías
@ 2022-01-16 17:45         ` Rudolf Adamkovič
  2 siblings, 0 replies; 22+ messages in thread
From: Rudolf Adamkovič @ 2022-01-16 17:45 UTC (permalink / raw)
  To: Colin Baxter 😺, Sébastien Miquel; +Cc: org-mode-email, Timothy

Colin Baxter 😺 <m43cap@yandex.com> writes:

> \(g=\lim_{\delta m\to 0}(\delta F/\delta m)\)
>
> Backslash city! I know which one I'd prefer to read.

Further, in-text single-letter variables permeate mathematical
writing, and I think everyone would agree that $k$ reads well.  Alas, as
soon as one needs to write $k$-th, it stops working and one must rewrite
it as \(k\)-th.  So, one often ends up using both ways anyway, right?

Rudy

-- 
"Logic is a science of the necessary laws of thought, without which no
employment of the understanding and the reason takes place." -- Immanuel
Kant, 1785

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia



* Re: Org Syntax Specification
  2022-01-09 18:02 Org Syntax Specification Timothy
  2022-01-15 12:40 ` Sébastien Miquel
@ 2022-01-18  0:54 ` Tom Gillespie
  2022-01-18 12:09   ` Ihor Radchenko
  2022-09-25  9:09 ` Bastien
  2 siblings, 1 reply; 22+ messages in thread
From: Tom Gillespie @ 2022-01-18  0:54 UTC (permalink / raw)
  To: Timothy; +Cc: org-mode-email, Nicolas Goaziou


Hi Timothy,
    I have attached a patch with some modifications and a bunch of
comments (as footnotes). More replies in line. Thank you for all your
work on this!
Tom

> Marking this as deprecated would have no effect on Org’s current behaviour, but we could:
>
> 1. Mark as deprecated now-ish
> 2. Add a utility to convert from TeX-style to LaTeX-style
> 3. Add org lint/fontification warnings
> 4. A while later (half a decade? more?) actually remove support

In favor of this. There are good alternatives for this now.

> The other component of the syntax which feels particularly awkward to me is source block switches. They seem a bit odd, and since arguments exist, completely redundant.

Extremely in favor of removing switches. There are so many better ways
to do this now that aren't like some eldritch unix horror crawling up
out of the abyss and into the eBNF :)

[-- Attachment #2: 0001-Tom-s-comments-and-modifications-to-org-syntax-edite.patch --]
[-- Type: text/x-patch, Size: 41243 bytes --]

From 3527331f02e593ec6ba6cb4c8bde3f64de3ad216 Mon Sep 17 00:00:00 2001
From: Tom Gillespie <tgbugs@gmail.com>
Date: Mon, 17 Jan 2022 19:34:21 -0500
Subject: [PATCH] Tom's comments and modifications to org syntax edited

I removed any mention of markdown because it is a distraction in this
document and is not something we want anyone attending to here.

I change "top level section" to "zeroth section" which I think is more
consistent terminology because level is often used to refer to the
depth of parsing at any given point in the file and the top level
refers to anything that can be parsed without context. Zeroth makes it
clear that we are talking about the actual zeroth occurrence of a
section in a file/buffer/stream.
---
 dev/org-syntax-edited.org | 399 +++++++++++++++++++++++++++++++-------
 1 file changed, 331 insertions(+), 68 deletions(-)

diff --git a/dev/org-syntax-edited.org b/dev/org-syntax-edited.org
index c3259473..2e99070d 100644
--- a/dev/org-syntax-edited.org
+++ b/dev/org-syntax-edited.org
@@ -19,9 +19,7 @@ under the GNU General Public License v3 or later.
 Org is a plaintext format composed of simple, yet versatile, forms
 which represent formatting and structural information.  It is designed
 to be both intuitive to use, and capable of representing complex
-documents.  Like [[https://datatracker.ietf.org/doc/html/rfc7763][Markdown]], Org may be considered a lightweight markup
-language.  However, while Markdown refers to a collection of similar
-syntaxes, Org is a single syntax.
+documents.
 
 This document describes and comments on Org syntax as it is currently
 read by its parser (=org-element.el=) and, therefore, by the export
@@ -32,14 +30,13 @@ framework.
 ** Objects and Elements
 
 The components of this syntax can be divided into two classes:
-"[[#Objects][objects]]" and "[[#Elements][elements]]".  To better understand these classes,
-consider the paragraph as a unit of measurement.  /Elements/ are
-syntactic components that exist at the same or greater scope than a
-paragraph, i.e. which could not be contained by a paragraph.
-Conversely, /objects/ are syntactic components that exist with a smaller
-scope than a paragraph, and so can be contained within a paragraph.
-
-Elements can be stratified into "[[#Headings][headings]]", "[[#Sections][sections]]", "[[#Greater_Elements][greater
+"[[#Elements][elements]]" and "[[#Objects][objects]]".  Elements are
+syntactic components that have the same priority as or greater
+priority than a paragraph. Objects are syntactic components that are
+only recognized inside a paragraph or other paragraph-like elements
+such as heading titles.
+
+Elements are further divided into "[[#Headings][headings]]", "[[#Sections][sections]]"[fn::sections are not elements], "[[#Greater_Elements][greater
 elements]]", and "[[#Lesser_Elements][lesser elements]]", from broadest scope to
 narrowest.  Along with objects, these sub-classes define categories of
 syntactic environments.  Only [[#Headings][headings]], [[#Sections][sections]], [[#Property_Drawers][property drawers]], and
@@ -52,7 +49,12 @@ elements that cannot contain any other elements.  As such, a paragraph
 is considered a lesser element.  Greater elements can themselves
 contain greater elements or lesser elements. Sections contain both
 greater and lesser elements, and headings can contain a section and
-other headings.
+other headings. [fn:tom2:I would not discuss strata here because it is
+not related to the syntax of the document. It is related to how that
+syntax is interpreted by org mode. The strata are nesting rules that
+are independent of the syntax, and discussing that here in the syntax
+document is confusing, because the nesting is not something that can be
+parsed directly because it depends on the number of asterisks.]
 
 ** The minimal and standard sets of objects
 
@@ -60,25 +62,33 @@ To simplify references to common collections of objects, we define two
 useful sets.  The /<<<minimal set>>> of objects/ refers to [[#Plain_Text][plain text]], [[#Emphasis_Markers][text
 markup]], [[#Entities][entities]], [[#LaTeX_Fragments][LaTeX fragments]], [[#Subscript_and_Superscript][superscripts and subscripts]].  The
 /<<<standard set>>> of objects/ refers to the entire set of objects, excluding
-citation references and [[#Table_Cells][table cells]].
+citation references and [[#Table_Cells][table cells]].[fn:tom3:Table cells should
+be treated in a way that is entirely separate from objects. This document has included
+them as such, as has org-element (iirc); however, since they can never appear in a paragraph
+and because tables are completely separate syntactically, we should probably drop the
+idea that table cells are objects. I realize that this might mean the creation of a
+distinction between paragraph-objects, title-objects, table-objects etc.]
 
 ** Blank lines
 
 A line containing only spaces, tabs, newlines, and line feeds (=\t\n\r=)
-is considered a /blank line/.  Blank lines can be used to separate
+is considered a /blank line/.  Blank lines separate
 paragraphs and other elements.
 
 With the exception of [[#Items][list items]], blank lines belong to the preceding
 element with the narrowest possible scope.  For example, if at the end
 of a section we have a paragraph and a blank line, that blank line is
-considered part of the paragraph.
+considered part of the paragraph.[fn:tom4:I don't think we need to discuss
+nesting scope here, it is confusing, it is always the immediately prior
+(lesser?) element.]
 
 ** Indentation
 
 Indentation consists of a series of space and tab characters at the
 beginning of a line. Most elements can be indentated, with the
 exception of [[#Headings][headings]], [[#Inlinetasks][inlinetasks]], [[#Footnote_Definitions][footnote definitions]], and [[#Diary_Sexp][diary
-sexps]].
+sexps]]. [fn::Maybe a note that indentation is only meaningful in plain lists
+and for greater blocks is aligned to the indentation of the #+end_ block?]
 
 ** Syntax patterns
 
@@ -97,7 +107,8 @@ meaning, For instance, "KEY" and "VALUE" when describing
 elements or objects.
 
 Unless otherwise specified, a space in a pattern represents one or
-more horizontal whitespace characters.
+more horizontal whitespace characters.[fn::This should be in bold
+so that people don't miss it.]
 
 Patterns will often also contain static structures that serve to
 differentiate a particular element or object type from others, but
@@ -141,25 +152,34 @@ In this document, unless specified otherwise, case is insignificant.
 :CUSTOM_ID: Headings
 :END:
 
-A Heading is a /unindented/ line structured according to the following pattern:
+A Heading is an /unindented/ line structured according to the following pattern:
 
 #+begin_example
 STARS KEYWORD PRIORITY TITLE TAGS
 #+end_example
 
-+ STARS :: A string consisting of one or more asterisks (up to
-  ~org-inlinetask-min-level~ if the =org-inlinetask= library is loaded)
++ STARS :: A string consisting of one or more asterisks[fn::removed
+  note about inline tasks because it is still a heading, any mention
+  of a concrete number should not appear in the specification of syntax.]
   and ended by a space character.  The number of asterisks is used to
-  define the level of the heading.
+  define the level of the heading. [fn::Implementation note: when parsing
+  stars the space following the stars MUST NOT BE CONSUMED and the next
+  phase of parsing MUST start with the space so that it is possible to have
+  a heading with no title that also has tags.]
 
 + KEYWORD (optional) :: A string which is a member of
   ~org-todo-keywords-1~[fn:otkw1:By default, ~org-todo-keywords-1~ only
-  contains =TODO= and =DONE=, however this is liable to change.].  Case is
-  significant.  This is called a "TODO keyword".
-
-+ PRIORITY (optional) :: A single alphanumeric character preceded by a
-  hash sign =#= and enclosed within square brackets (e.g. =[#A]= or =[#1]=).  This
-  is called a "priority cookie".
+  contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local
+  variable and can be set by users in an org file using =#+todo:=.].
+  Case is significant.  This is called a "TODO keyword". [fn::Implementation note:
+  TODO keywords cannot be hardcoded in a tokenizer, the tokenizer must
+  be configurable at runtime so that in-file TODO keywords are properly interpreted.]
+
++ PRIORITY (optional) :: A single letter preceded by a
+  hash sign =#= and enclosed within square brackets (e.g. =[#A]= or =[#D]=).  This
+  is called a "priority cookie".[fn::Numeric values are not supported;
+  it is a quirk of the elisp implementation that they appear to work,
+  however they break in nasty and unexpected ways.]
 
 + TITLE (optional) :: A series of objects from the standard set,
   excluding line break objects.  It is matched after every other part.
@@ -180,15 +200,17 @@ STARS KEYWORD PRIORITY TITLE TAGS
 If the first word appearing in the title is =COMMENT=, the heading
 will be considered as "commented".  Case is significant.
 
-If its title is the value of ~org-footnote-section~ (=Footnotes= by
-default), it will be considered as a "footnote section".  Case is
-significant.
+If the title of a heading is exactly the value of ~org-footnote-section~
+(=Footnotes= by default), it will be considered as a "footnote section".
+Case is significant.
 
 If =ARCHIVE= is one of the tags given, the heading will be considered as
 "archived".  Case is significant.
 
-A heading contains directly one section (optionally), followed by
-any number of deeper level headings.
+The level of a heading can be used to construct a nested structure.
+All content following a heading that appears before the next heading
+(regardless of the level of that next heading) is a section. In addition,
+text before the first heading in an org document is also a section.
 
 *** Sections
 :PROPERTIES:
@@ -197,7 +219,15 @@ any number of deeper level headings.
 
 Sections contain one or more non-heading elements.  With the exception
 of the text before the first heading in a document (which is
-considered a section), sections only occur within headings.
+considered a section), sections only occur within headings.[fn:: The
+choice to call this syntactic component a section is confusing because
+it is at odds with the usual notion of a section, namely that the
+usual concept of a section implies that it includes nested content.  I
+personally didn't realize that it ended at the next heading until
+writing this comment (as can be seen from reading my comments in the
+laundry implementation). Therefore I suggest that we look for an
+alternate name for this syntactic component. Maybe "segment" or
+something similar that indicates that it is truncated?]
 
 *Example*
 
@@ -224,31 +254,67 @@ Its internal structure could be summarized as:
    (heading))))
 #+end_example
 
-*** The top level section
+*** The zeroth section
 :PROPERTIES:
-:CUSTOM_ID: Top_level_section
+:CUSTOM_ID: Zeroth_section
 :END:
 
 All elements before the first heading in a document lie in a special
-section called the /top level section/.  It may be preceded by blank
-lines.  Unlike a normal section, the top level section can immediately
+section called the /zeroth section/.  It may be preceded by blank
+lines.  Unlike a normal section, the zeroth section can immediately
 contain a [[#Property_Drawers][property drawer]], optionally preceded by [[#Comments][comments]].  It cannot
-however, contain [[Planning][planning]].
+however, contain [[Planning][planning]].[fn::This is wrong? If it is not
+wrong, then it should be. Property drawers are already annoying to implement
+because they share syntax with regular drawers, and allowing a property drawer
+at the top of a file without a heading means that it should be a regular drawer
+not a property drawer, otherwise you have to special case the handling of drawers
+in the zeroth section. What is the use case for a property drawer as opposed to
+a #+property: line in the zeroth section? I may come around on this at some point,
+but right now it seems more complex, however it might actually be more consistent
+if we imagine the zeroth section as being nested inside a single heading that has
+level zero implicitly at the top of a document. Unfortunately that means that such
+property drawers cannot be determined from a homogeneous syntax but instead require
+some operations on the internal representation. Note also that if this were allowed
+then the property drawer should only be allowed as the very first line of a file
+because newlines at the start of a file need to be preserved. More thought required.]
 
 ** Affiliated Keywords
 :PROPERTIES:
 :CUSTOM_ID: Affiliated_Keywords
 :END:
 
+ [fn::Without going into too much detail, affiliated keywords should
+not be distinguished from other keywords at the level of the syntax.
+The fact that they are is an artifact of the elisp implementation.
+The determination of the behavior of a keyword with regard to
+affiliating behavior should be determined in a later pass, even if in
+some cases some implementations may want to materialize them into the
+parser for performance reasons. Allowing users to promote a keyword to
+be an affiliated keyword would be incredibly powerful for attaching
+metadata to parts of org-files in a way that is user extensible. It
+may still be desirable to describe the behavior of affiliated keywords
+here, but they are not in any way distinct from other keywords at the
+level of org syntax and trying to implement them as such is usually a
+mistake (that I have made).]
+
 With the exception of [[#Comments][comments]], [[#Clocks][clocks]], [[#Headings][headings]], [[#Inlinetasks][inlinetasks]],
 [[#Items][items]], [[#Node_Properties][node properties]], [[#Planning][planning]], [[#Property_Drawers][property drawers]], [[#Sections][sections]], and
 [[#Table_Rows][table rows]], every other element type can be assigned attributes.
+ [fn::Technically tables can be assigned attributes; if you try to affiliate to a table
+row you are accidentally creating a new table. Also, comments probably shouldn't be
+in this list, but I need to review what the behavior was when trying to affiliate
+to a paragraph where there is a comment in between, I'm pretty sure it doesn't work
+though some of the reordering via org-element does .... Being able to affiliate to
+comments could be quite powerful for some specialized use cases.]
 
 This is done by adding specific [[#Keywords][keywords]], named /affiliated/ keywords,
 immediately above the element considered (a blank line cannot lie
 between the affiliated keyword and element). Structurally, affiliated
 keyword are not considered an element in their own right but a
-property of the element they apply to.
+property of the element they apply to. [fn::While it is tempting to try
+to do this at the level of the grammar it induces a number of nasty
+ambiguities in practice. It is saner to have a single unified keyword
+syntax and then to determine affiliation behavior in a later pass.]
 
 Affiliated keywords are structured according to one of the following pattern:
 
@@ -268,19 +334,42 @@ Affiliated keywords are structured according to one of the following pattern:
 + OPTVAL (optional) :: A string consisting of any characters but a
   newline.  This term is only valid when KEY is a member of
   ~org-element-dual-keywords~[fn:oedkw:By default,
-  ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].
+  ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].[fn::
+  All keywords should allow OPTVAL, it regularizes and simplifies the syntax.]
 + VALUE :: A string consisting of any characters but a newline, except
   in the case where KEY is member of
   ~org-element-parsed-keywords~[fn:oepkw:By default,
   ~org-element-parsed-keywords~ contains =CAPTION=.] in which case VALUE
   is a series of objects from the standard set, excluding footnote
-  references.
-
-Repeating an affiliated keyword before an element will usually result
-in the prior VALUEs being overwritten by the last instance of KEY.
-There are two situations under which the VALUEs will be concatenated:
+  references (and line breaks ???).[fn::This is confusing.  A
+  =#+caption:= cannot contain a @@export: snippet@@ with a newline in
+  it, which this text seems to imply. A better wording would be to
+  state that there are some keywords where the contents of VALUE will
+  be further parsed as paragraphs (or whatever we are calling that
+  thing now. I think we are still missing the term for "object
+  containing syntax component")]
+
+ [fn::The behavior of affiliated keywords with respect to shadowing
+needs to be fully specified because it has major semantic implications,
+and for org babel headers it has security implications.]
+By default, when there are multiple affiliated keywords, the last occurrence
+of a given keyword is the one that has priority. Normally users should not
+specify more than a single instance of an affiliated keyword per element, but
+if they do, the last one on the page wins.
+
+The default behavior is NOT followed for the ~#+header:~ keyword that
+is used for org-babel blocks. ~#+header:~ keywords combine header
+fields and resolve conflicts by having the top right most (first line
+last instance on the line) instance of field take priority. [fn::This
+behavior is critical for org babel and code execution security. If
+there are cases where aff keywords are not following this behavior
+then they need to be fixed. The reason to do first one wins in cases
+like this is so that users do not have to insert lines below which
+lead to hard to understand diffs.]
+
+In addition, there are two situations in which the VALUEs will be concatenated:
 1. If KEY is a member of ~org-element-dual-keywords~[fn:oedkw].
-2. If the affiliated keyword is an instance of the patten
+2. If the affiliated keyword is an instance of the pattern
    =#+attr_BACKEND: VALUE=.
 
 The following example contains three affiliated keywords:
@@ -296,16 +385,20 @@ The following example contains three affiliated keywords:
 :CUSTOM_ID: Greater_Elements
 :END:
 
-Unless specified otherwise, greater elements can contain directly
+Unless otherwise specified, greater elements can directly contain
 any greater or [[#Lesser_Elements][lesser element]] except:
 + Elements of their own type.
 + [[#Planning][Planning]], which may only occur in a [[#Headings][heading]].
-+ [[#Property_Drawers][Property drawers]], which may only occur in a [[#Headings][heading]] or the [[#Top_level_section][top level
++ [[#Property_Drawers][Property drawers]], which may only occur in a [[#Headings][heading]] or the [[#Zeroth_section][zeroth
   section]].
 + [[#Node_Properties][Node properties]], which can only be found in [[#Property_Drawers][property drawers]].
 + [[#Items][Items]], which may only occur in [[#Plain_Lists][plain lists]].
 + [[#Table_Rows][Table rows]], which may only occur in [[#Tables][tables]].
 
+ [fn::This is somewhat confusing because it lists combinations that
+should already be impossible by default because e.g. items are meaningless
+outside plain lists and should not even be mentioned outside of that context.]
+
 *** Greater Blocks
 :PROPERTIES:
 :CUSTOM_ID: Greater_Blocks
@@ -329,10 +422,14 @@ CONTENTS
   than a newline.
 + CONTENTS :: A collection of zero or more elements, subject to two
   conditions:
-  - No line may start with =#+end_NAME=.
+  - No line in the block may start with =#+end_NAME=.
   - Lines beginning with an asterisk must be quoted by a comma (=,*=).
   Furthermore, lines starting with =#+= may be quoted by a comma (=,#+=).
 
+ [fn::Implementation note: ~#+begin_name~ to ~#+end_name~ usually needs to
+be implemented in the tokenization step. The substructure discussed here
+is thus usually handled in a second pass.]
+
 *** Drawers and Property Drawers
 :PROPERTIES:
 :CUSTOM_ID: Drawers
@@ -349,6 +446,8 @@ CONTENTS
   and underscores (=-_=).
 + CONTENTS :: A collection of zero or more elements, except another drawer.
 
+ [fn:: ~:end:~ may be capitalized (legacy support)]
+
 *** Dynamic Blocks
 :PROPERTIES:
 :CUSTOM_ID: Dynamic_Blocks
@@ -366,12 +465,20 @@ CONTENTS
 + CONTENTS :: A collection of zero or more elements, except another
   dynamic block.
 
+ [fn::The spec needs to clarify how to handle ~#+begin:~ alone on a line or followed by
+only whitespace. It is quite nasty to have the behavior of ~#+begin:~ change if it is
+or is not followed by invisible whitespace. I suggest that we change the behavior of
+~#+begin:~ without whitespace to regularize it so that it is _always_ the start of a
+dynamic block since the ~#+begin:~ keyword by itself is pretty much completely useless
+since if you put anything after it, it becomes the start of a dynamic block anyway.]
+
 *** Footnote Definitions
 :PROPERTIES:
 :CUSTOM_ID: Footnote_Definitions
 :END:
 
-Footnote definitions must occur at the start of an /unindented/ line,
+Footnote definitions must occur at the start of an /unindented/ line
+(they must be preceded by only a newline, nothing else),
 and are structured according to the following pattern:
 #+begin_example
 [fn:LABEL] CONTENTS
@@ -401,6 +508,13 @@ It even contains a single blank line.
 :CUSTOM_ID: Inlinetasks
 :END:
 
+ [fn::I suggest that we remove inlinetasks from this document.
+They are a hack that cannot be implemented as part of a grammar
+because they require a concrete value to be specified which breaks
+the arbitrary nesting depth of headings. I think I wrote this somewhere
+else as well, but inline tasks can only be a layer on top of headings,
+they cannot displace them.]
+
 Inlinetasks are syntactically a [[#Headings][heading]] with a level of at least
 ~org-inlinetask-min-level~[fn:oiml:The default value of
 ~org-inlinetask-min-level~ is =15=.], i.e. starting with at least that
@@ -448,8 +562,8 @@ BULLET COUNTER-SET CHECK-BOX TAG CONTENTS
   character, or a hyphen enclosed by square brackets (i.e. =[ ]=, =[X]=, or =[-]=).
 + TAG (optional) :: An instance of the pattern =TAG-TEXT ::= where
   =TAG-TEXT= represents a string consisting of non-newline characters
-  that does not contain the substring "\nbsp{}::\nbsp{}" (two colons surrounded by
-  whitespace).
+  that does not contain the substring ~" :: "~ (two colons surrounded by
+  whitespace without the quotes).
 + CONTENTS (optional) :: A collection of zero or more elements, ending
   at the first instance of one of the following:
   - The next item.
@@ -457,6 +571,22 @@ BULLET COUNTER-SET CHECK-BOX TAG CONTENTS
     not counting lines within other elements or [[#Inlinetasks][inlinetask]] boundaries.
   - Two consecutive blank lines.
 
+ [fn:: The description of CONTENTS is confusing since it cannot contain
+a heading, which is implicit in the indentation rule but not
+obvious. In addition, contents may not actually contain zero or more
+elements because many elements must start on their own line. So
+e.g. 1. #+begin_src does not work, however, the wording seems to
+indicate that it should, which is misleading. Further, it is actually
+not possible to implement contents as specified because grammars
+cannot track the indentation level that is required to reconstruct
+list items correctly. Therefore CONTENTS should not be defined as such
+but should only specify that they can be anything except a newline. I
+think that the intent of this document is somewhat a conflation of the
+syntax for org and of the semantics as determined by export backends
+and/or org-element, however it makes it extremely confusing because it
+is not actually possible to parse CONTENTS, they must be reconstructed
+from the parse tree.]
+
 *Examples*
 
 #+begin_example
@@ -471,11 +601,17 @@ BULLET COUNTER-SET CHECK-BOX TAG CONTENTS
 :END:
 
 A /plain list/ is a set of consecutive [[#Items][items]] of the same indentation.
+ [fn::This is confusing because the definition of contents above is
+confusing, it also implies that plain lists cannot be nested, or are
+not somehow nested, which is also confusing. Maybe a line to the effect
+that plain lists may be nested along with any other element that is
+properly indented or something?]
 
 If first item in a plain list has a COUNTER in its BULLET, the plain
 list will be an "ordered plain-list".  If it contains a TAG, it will
 be a "descriptive list".  Otherwise, it will be an "unordered list".
-List types are mutually exclusive.
+List types are mutually exclusive at the same level of indentation; if
+different types are present consecutively then they parse as separate lists.
 
 For example, consider the following excerpt of an Org document:
 
@@ -524,6 +660,13 @@ CONTENTS
 + CONTENTS :: A collection of zero or more [[#Node_Properties][node properties]], not
   separated by blank lines.
 
+ [fn::The failure mode for malformed contents needs to be
+determined more clearly here. We don't want property drawers to suddenly
+become plain drawers just because a user has a malformed line, that
+could be disastrous if certain settings in the property drawer mask
+settings from further up the tree.  In short, malformed contents
+should not poison the whole property drawer.]
+
 *Example*
 
 #+begin_example
@@ -537,11 +680,24 @@ CONTENTS
 :CUSTOM_ID: Tables
 :END:
 
+ [fn::I think that this section needs to be split into two separate
+sections one for each grammar. It will make it much easier to specify
+each grammar, and it will also make it clear that they are not
+syntactic elements that are trivially interchangeable since only a
+subset of table.el tables can be converted to org table syntax (at
+the moment). I'm willing to take a shot at it.]
+
 Tables are started by a line beginning with either:
 + A vertical bar (=|=), forming an "org" type table.
 + The string =+-= followed by a sequence of plus (=+=) and minus (=-=)
   signs, forming a "table.el" type table.
 
+ [fn::Consider whether table.el tables should be supported by the
+syntax outside of elisp org mode. There are some slightly divergent
+use cases and features and we likely need/want to explore some of
+the alternatives proposed for how to allow pure org tables to support
+the features that are currently only possible for table.el tables.]
+
 Tables cannot be immediately preceded by such lines, as the current
 line would the be part of the earlier table.
 
@@ -577,6 +733,11 @@ blocks]], [[#Paragraphs][paragraphs]] or [[#Table_Rows][table rows]] can contain
 :CUSTOM_ID: Babel_Call
 :END:
 
+ [fn::As with the other keyword-like things and syntax, I suggest that
+we ultimately move babel calls to live under a section on keyword
+content parsers so that it is clear that they should not be treated as
+separate syntactic components.]
+
 Babel calls are structured according to one of the following patterns:
 #+begin_example
 ,#+call: NAME(ARGUMENTS)
@@ -593,11 +754,20 @@ Babel calls are structured according to one of the following patterns:
   non-newline characters.  Opening and closing square brackets must be
   balanced.
 
+ [fn::Nesting rules for the parens and square brackets need review here
+and elsewhere. The "must be balanced" requirement is implemented with
+an extremely nasty materialized regex which only works for 3 or 4 levels
+of nesting and thus is really likely to not be what we want.]
+
 *** Blocks
 :PROPERTIES:
 :CUSTOM_ID: Blocks
 :END:
 
+ [fn::These probably should not actually be distinct from greater blocks.
+The syntax is the same, the only difference is that there are 5 types that
+have special specified handling.]
+
 Like [[#Greater_Blocks][greater blocks]], blocks are structured according to the following pattern:
 
 #+begin_example
@@ -622,7 +792,8 @@ CONTENTS
     the pattern =LANGUAGE SWITCHES ARGUMENTS= with:
     + LANGUAGE :: A string consisting of any non-whitespace characters
     + SWITCHES :: Any number of SWITCH patterns, separated by a single
-      space character
+      space character [fn::For the love of all that is sane can we
+      please just remove this from the spec or mark it as legacy.]
       - SWITCH :: Either the pattern =-l "FORMAT"= where =FORMAT=
         represents a string consisting of any characters but a double
         quote (="=) or newline, or the pattern =-S= or =+S= where =S=
@@ -631,7 +802,7 @@ CONTENTS
 + CONTENTS (optional) :: A string consisting of any characters
   (including newlines) subject to the same two conditions of greater
   block's CONTENTS, i.e.
-  - No line may start with =#+end_NAME=.
+  - No line in the block may start with =#+end_NAME=.
   - Lines beginning with an asterisk must be quoted by a comma (=,*=).
   As with greater blocks, lines starting with =#+= may be quoted by a
   comma (=,#+=).
@@ -655,6 +826,10 @@ CONTENTS
 :CUSTOM_ID: Clocks
 :END:
 
+ [fn::This section seems to have been made way too simple? Or is this
+specifically the clocking-clock? If it is the clocking-clock then
+that should be clarified.]
+
 A clock element is structured according to the following pattern:
 
 #+begin_example
@@ -674,7 +849,6 @@ clock: INACTIVE-TIMESTAMP-RANGE DURATION
 clock: [2024-10-12]
 #+end_example
 
-
 *** Diary Sexp
 :PROPERTIES:
 :CUSTOM_ID: Diary_Sexp
@@ -701,6 +875,9 @@ A diary sexp[fn::A common abbreviation for S-expression] element is an
 :CUSTOM_ID: Planning
 :END:
 
+ [fn::I think this and property drawers should be moved to be closer
+to the heading spec section?]
+
 A planning element is structured according to the following pattern:
 
 #+begin_example
@@ -709,13 +886,15 @@ PLANNING
 #+end_example
 
 + HEADING :: A [[#Headings][heading]] element.
-+ PLANNING :: A line consisting of a series of =KEYWORD: TIMESTAMP=
++ PLANNING :: A line consisting of one or more =KEYWORD: TIMESTAMP=
   patterns (termed "info" patterns).
-  - KEYWORD :: Either the string =DEADLINE=, =SCHEDULED=, or =CLOSED=.
+  - KEYWORD :: Either the string =DEADLINE=, =SCHEDULED=, or =CLOSED=. [fn::
+    Request to add the =OPENED= keyword to track when a task was first known/entered into a file.]
   - TIMESTAMP :: A [[#Timestamps][timestamp]] object.
 
-It is not permitted for any blank lines to lie between HEADING and
-PLANNING.
+PLANNING must directly follow HEADING without any blank lines in between.
+
+ [fn::Need a spec for how to handle multiple instances of the same keyword with different values.]
 
 *Example*
 
@@ -742,7 +921,6 @@ Comments consist of one or more consecutive comment lines.
 # Over multiple lines
 #+end_example
 
-
 *** Fixed Width Areas
 :PROPERTIES:
 :CUSTOM_ID: Fixed_Width_Areas
@@ -773,6 +951,9 @@ consecutive hyphens (=-----=).
 :CUSTOM_ID: Keywords
 :END:
 
+ [fn::Reminder about regularizing keyword syntax so that it
+always supports ~#+key[opt]:value~ syntax.]
+
 Keywords are structured according to the following pattern:
 
 #+begin_example
@@ -780,7 +961,11 @@ Keywords are structured according to the following pattern:
 #+end_example
 
 + KEY :: A string consisting of any non-whitespace characters, other
-  than =call= (which would forms a [[#Babel_Call][babel call]] element).
+  than =call= (which would form a [[#Babel_Call][babel call]] element). [fn::This is
+  why I have the note on the ~#+call:~ section. If someone tries to
+  implement this they are going to be in a world of pain because there
+  is a concrete value here. This is because that distinction is not in
+  the syntax but instead should be in a later stage.]
 + VALUE :: A string consisting of any characters but a newline.
 
 When KEY is a member of ~org-element-parsed-keywords~[fn:oepkw], VALUE can contain
@@ -791,7 +976,9 @@ Note that while instances of this pattern are preferentially parsed as
 keyword may occur so long as it is not immediately preceding a valid
 element that can be affiliated.  For example, an instance of
 =#+caption: hi= followed by a blank line will be parsed as a keyword,
-not an affiliated keyword.
+not an affiliated keyword. [fn::A full spec for user defined aff keywords
+will require a bit more clarity for how lonely affiliated keywords should
+behave.]
 
 *** LaTeX Environments
 :PROPERTIES:
@@ -838,6 +1025,16 @@ according to one of the following patterns:
   which does not end in a plus characters (=+=).
 + VALUE (optional) :: A string containing any characters but a newline.
 
+ [fn::This spec is not consistent with the behavior and has bad design.
+Name should be allowed to be empty, same as with heading tags. It is
+critical to include the empty string as part of a grammar like this so
+that it is closed, otherwise we get nasty edge cases. For example it
+should be the case that ~:+:~ is syntactically valid as a node
+property.  The fact that it won't apply to anything is ok, it might
+also be useful if we regularize ~#+begin_NAME~ to allow the empty
+string for NAME. Note that ~:+:~ is already treated as syntactically
+valid for font locking and for property drawer detection (I think).]
+
 *** Paragraphs
 :PROPERTIES:
 :CUSTOM_ID: Paragraphs
@@ -850,11 +1047,21 @@ Empty lines and other elements end paragraphs.
 
 Paragraphs can contain the standard set of objects.
 
+ [fn::Implementation note: it is possible to define
+paragraphs constructively instead of as they are defined
+here as the negation or fall through of all other things.
+We should update this section with the positive definition
+once I have it nailed down.]
+
 *** Table Rows
 :PROPERTIES:
 :CUSTOM_ID: Table_Rows
 :END:
 
+ [fn::I suggest we roll this up into the org tables section
+spec so that we don't have to worry about making a note that
+these only occur in tables.]
+
 A table row consists of a vertical bar (=|=) followed by:
 + Any number of [[#Table_Cells][table cells]], forming a "standard" type row.
 + A hyphen (=-=), forming a "rule" type row.  Any non-newline characters
@@ -892,6 +1099,13 @@ such as a paragraph.
 :CUSTOM_ID: Entities
 :END:
 
+ [fn::As I think I mention elsewhere, the concrete names here
+should NOT be part of the syntax, it makes the parser brittle
+and hard to maintain. Differentiation between entities and fragments
+should be handled at the syntax level for cases where the fragment
+has brackets, and then in a second pass for values that are
+syntactically entity-or-fragment and must be determined after
+the fact.]
 Entities are structured according to the following pattern:
 
 #+begin_example
@@ -987,6 +1201,7 @@ ought to be removed.
 They are slow to parse, fragile, redundant and imply false
 positives.  --- ngz
 #+end_quote
+ [fn::Strong support for removing these.]
 
 ** Export Snippets
 :PROPERTIES:
@@ -1002,6 +1217,12 @@ Export snippets are structured according to the following pattern:
 + BACKEND :: A string consisting of alphanumeric characters and hyphens.
 + VALUE (optional) :: A string containing anything but the string =@@=.
 
+ [fn::We probably want to note that BACKEND can be the empty string
+per that thread on how to deal with intra-word markup. Again this
+also touches on the general principle of wanting to close over the
+empty string so that users aren't surprised when ~@@:lol@@~ suddenly
+appears in plain text just because no backend was specified.]
+
 ** Footnote References
 :PROPERTIES:
 :CUSTOM_ID: Footnote_References
@@ -1019,14 +1240,17 @@ Footnote references are structured according to one of the following patterns:
   hyphens and underscores (=-_=).
 + DEFINITION (optional) :: A series of objects from the standard set,
   so long as opening and closing square brackets are balanced within
-  DEFINITION.
+  DEFINITION. [fn::As noted elsewhere, the balanced brackets
+  requirement is a nightmare and needs a review.]
 
 If the reference follows the second pattern, it is called an "inline
 footnote".  If it follows the third pattern, i.e. if LABEL is omitted,
 it is called an "anonymous footnote".
 
 Note that the first pattern may not occur on an /unindented/ line, as it
-is then a [[#Footnote_Definitions][footnote definition]].
+is then a [[#Footnote_Definitions][footnote definition]]. [fn::I'm not sure this is quite right?
+the font locking code is not consistent with actual behavior, need to
+review the laundry test cases and example files.]
 
 ** Citations
 :PROPERTIES:
@@ -1127,7 +1351,8 @@ src_LANG[HEADERS]{BODY}
 + LANG :: A string consisting of any non-whitespace characters.
 + HEADERS (optional), BODY (optional) :: A string consisting of any
   characters but a newline.  Opening and closing square brackets must
-  be balanced.
+  be balanced. [fn::Nesting issues need review. Suggestion to do
+  something like what Racket scribble does.]
 
 ** Line Breaks
 :PROPERTIES:
@@ -1366,6 +1591,10 @@ SIGN CHARS FINAL
 :CUSTOM_ID: Table_Cells
 :END:
 
+ [fn::Need to condense this with tables and table rows because
+spreading these out makes it super hard to understand the table syntax
+for basically no reason.]
+
 Table cells are structured according to the following pattern:
 
 #+begin_example
@@ -1375,7 +1604,12 @@ CONTENTS SPACES|
 + CONTENTS :: A series of objects not containing the vertical bar
   character (=|=).  It can contain the minimal set of objects,
   [[#Citations][citations]], [[#Export_Snippets][export snippets]], [[#Footnote_References][footnote references]], [[#Links][links]], [[#Macros][macros]],
-  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]].
+  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]]. [fn::Like for the comma in
+  macros, I think it would be safe to add ~\|~ as an escape character.
+  The issue in the elisp implementation is not actually at the level
+  of the syntax, but is actually in the export backends or somewhere
+  deeper, because even using a macro that expands to be a pipe ~|~
+  breaks the table (which is really bad).]
 + SPACES :: A string consisting of zero or more of space characters,
   used to align the table columns.
 
@@ -1386,6 +1620,10 @@ The final vertical bar (=|=) may be omitted in the last cell of a row.
 :CUSTOM_ID: Timestamps
 :END:
 
+ [fn::I have some suggestions for extensions to timestamp syntax to
+support historical and far future dates, as well as timezone offsets (NOT
+the 3 letter ambiguous disaster) and seconds and sub-second times.]
+
 Timestamps are structured according to one of the seven following patterns:
 
 #+begin_example
@@ -1470,6 +1708,10 @@ BORDER BODY BORDER
 + [[#Special_Tokens][POST]] :: Either a whitespace character, =-=, =.=, =,=, =;=, =:=, =!=, =?=, ='=, =)=, =}=,
   =[=, ="=, or the end of a line.
 
+The four =*/_+= may be arbitrarily nested to any depth. Verbatim and
+code ==~= may be nested inside any other markup, but no other markup
+will be interpreted inside of them since they are interpreted exactly.
+
 *Examples*
 
 #+begin_example
@@ -1483,12 +1725,16 @@ functions starting with ~org-element-~.
 :CUSTOM_ID: Plain_Text
 :END:
 
+ [fn::I'm not sure I would add this, the fall through is sloppy
+and it is better to specify values constructively.]
+
 Any string that doesn't match any other object can be considered a
 plain text object.[fn::In ~org-element.el~ plain text objects are
 abstracted away to strings for performance reasons.]
 Within a plain text object, all whitespace is collapsed to a single
 space. For instance, =hello\n there= is equivalent to =hello there=.
 
+
 * Footnotes
 
 [fn:1] In particular, the parser requires stars at column 0 to be
@@ -1497,7 +1743,10 @@ quoted by a comma when they do not define a heading.
 [fn:2] It also means that only headings and sections can be recognized
 just by looking at the beginning of the line.  Planning lines and
 property drawers can be recognized by looking at one or two lines
-above.
+above. [fn::This is incorrect. There are many elements that can be
+recognized by looking at the start of a line, however the conflation
+between pure syntax level and other levels of parsing and processing
+obscures this.]
 
 As a consequence, using ~org-element-at-point~ or ~org-element-context~
 will move up to the parent heading, and parse top-down from there
@@ -1571,6 +1820,20 @@ until context around the original location is found.
 :CUSTOM_ID: Entities_List
 :END:
 
+ [fn::The org entities section is useful, but I suggest not including
+that section at all right now. There is a way to define an abstract
+syntax that does not require the parser to pull in all those concrete
+forms which reflects how org mode implements that functionality but
+should not be specified as part of the syntax document. There are some
+significant edge cases that need to be worked out in the grammar for
+this that having a hardcoded list masks. I suggest we work that
+portion out before committing any of that to a spec doc. It is also a
+bad idea to list all of those in the spec doc because it will
+likely get out of sync with the code that implements such detection in
+elisp (despite the fact that the list is being auto generated via a
+code block). Maybe it makes sense to include the code block so that
+devs and users can discover it for themselves?]
+
 #+begin_src emacs-lisp :results raw :exports results
 (concat "| Name | Character |\n|-\n"
         (mapconcat
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Org Syntax Specification
  2022-01-18  0:54 ` Org Syntax Specification Tom Gillespie
@ 2022-01-18 12:09   ` Ihor Radchenko
  2022-01-19  1:22     ` Tom Gillespie
  0 siblings, 1 reply; 22+ messages in thread
From: Ihor Radchenko @ 2022-01-18 12:09 UTC (permalink / raw)
  To: Tom Gillespie; +Cc: org-mode-email, Nicolas Goaziou, Timothy

Tom Gillespie <tgbugs@gmail.com> writes:

> Extremely in favor of removing switches. There are so many better ways
> to do this now that aren't like some eldritch unix horror crawling up
> out of the abyss and into the eBNF :)

I also agree that switches and $$-style equations may be deprecated.
We can:
1. Not mention them in the document
2. Add org-lint warnings about their obsolescence

As for your other comments, you seem to be suggesting a number of
changes to the existing Org syntax. Some of them look fine, some do
not. However, please keep in mind that we have to deal with backward
compatibility, third-party compatibility, and not breaking existing Org
documents unless we have a very strong justification. I suggest
branching a number of new threads from here for each concrete suggestion
where you want to make changes to Org syntax, as opposed to just
document wording. Otherwise, this discussion will become a total mess.

More details below.

> +Elements are further divided into "[[#Headings][headings]]", "[[#Sections][sections]]"[fn::sections are not elements], "[[#Greater_Elements][greater

Nope. Sections are actually elements. See =org-element-all-elements=.

> +other headings. [fn:tom2:I would not discuss strata here because it is
> +not related to the syntax of the document. It is related to how that
> +syntax is interpreted by org mode. The strata are nesting rules that
> +are independent of the syntax, and discussing that here in the syntax
> +document is confusing, because the nesting is not something that can be
> +parsed directly because it depends on the number of asterisks.]

I disagree. Nesting rules are an important part of syntax. We have
restrictions on what elements can be inside other elements. The same
patterns may or may not be recognised in Org depending on their nesting. For
example, links that you put into property drawers are not considered
link objects.

> +citation references and [[#Table_Cells][table cells]].[fn:tom3:Table cells should
> +be treated in a way that is entirely separate from objects. This document has included
> +them as such as has org-element (iirc) however since they can never appear in a paragraph
> +and because tables are completely separate syntactically, we should probably drop the
> +idea that table cells are objects. I realize that this might mean the creation of a
> +distinction between paragraph-objects, title-objects, table-objects etc.]

Again I disagree. While your idea about table cells is reasonable
(similar for citation-references inside citations), I am against
decoupling Org syntax from org-element implementation. In
org-element.el, table-cells are just yet another object. If we make
things in org-element and syntax document out of sync, confusion and
errors will follow during future maintenance.

>  A line containing only spaces, tabs, newlines, and line feeds (=\t\n\r=)
> -is considered a /blank line/.  Blank lines can be used to separate
> +is considered a /blank line/.  Blank lines separate
>  paragraphs and other elements.

This actually reads as slightly confusing. "Blank lines separate paragraphs
and other elements" sounds like blank lines are only relevant
before/after paragraphs. However, there are also footnote references and
lists. Maybe we can try something like:

Blank lines can be used to indicate the end of some elements.

"can" because a single blank line usually does not separate anything.

> +considered part of the paragraph.[fn:tom4:I don't think we need to discuss
> +nesting scope here, it is confusing, it is always the immediately prior
> +(lesser?) element.]

Then where can we put it? This is one of the tricky conventions we use
in the parser.

> ++ STARS :: A string consisting of one or more asterisks[fn::removed
> +  note about inline tasks because it is still a heading, any mention
> +  of a concrete number should not appear in the specification of
> syntax.]

I am not sure here. Inline tasks are special because a one-line inline
task must not contain any text below it, and cannot have planning or
properties.

> +  contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local
> +  variable and can be set by users in an org file using =#+todo:=.].

If we mention this, we also need to elaborate on what kind of element
#+todo: is, where it can be located, and how to parse multiple instances
of #+todo: in the document.

> -A heading contains directly one section (optionally), followed by
> -any number of deeper level headings.
> +The level of a heading can be used to construct a nested structure.
> +All content following a heading that appears before the next heading
> +(regardless of the level of that next heading) is a section. In addition,
> +text before the first heading in an org document is also a section.

Note that it is not true for one-line inline tasks.

> +considered a section), sections only occur within headings.[fn:: The
> +choice to call this syntactic component a section is confusing because
> +it is at odds with the usual notion of a section, namely that the
> +usual concept of a section implies that it includes nested content.  I
> +personally didn't realize that it ended at the next heading until
> +writing this comment (as can be seen from reading my comments in the
> +laundry implementation). Therefore I suggest that we look for an
> +alternate name for this syntactic component. Maybe "segment" or
> +something similar that indicates that it is truncated?]

Sounds reasonable. However, we may also need to make this change at the
Elisp level, which is tricky when you think about
backward-compatibility.

> +however, contain [[Planning][planning]].[fn::This is wrong? If it is not
> +wrong, then it should be. Property drawers are already annoying to implement
> +because they share syntax with regular drawers, and allowing a property drawer
> +at the top of a file without a heading means that it should be a regular drawer
> +not a property drawer, otherwise you have to special case the handling of drawers
> +in the zeroth section. What is the use case for a property drawer as opposed to
> +a #+property: line in the zeroth section? I may come around on this at some point,
> +but right now it seems more complex, however it might actually be more consistent
> +if we imagine the zeroth section as being nested inside a single heading that has
> +level zero implicitly at the top of a document. Unfortunately that means that such
> +property drawers cannot be determined from a homogeneous syntax but instead require
> +some operations on the internal representation. Note also that if this were allowed
> +then the property drawer should only be allowed as the very first line of a file
> +because newlines at the start of a file need to be preserved. More though required.]

The statement about property drawers in the first section (that is how we
refer to it in org-element) is correct. The first section and the
location of its property drawer are special.

I agree that it's inconsistent with normal property drawers. However, we
cannot change it without breaking existing Org files. If we decide to
change syntax in this area, we should think carefully about possible
consequences.

> + [fn::Without going into to much detail, affiliated keywords should
> +not be distinguished from other keywords at the level of the syntax.
> +The fact that they are is an artifact of the elisp implementation.
> +The determination of the behavior of a keyword with regard to
> +affiliating behavior should be determined in a later pass, even if in
> +some cases some implementations may want to materialize them into the
> +parser for performance reasons. Allowing users to promote a keyword to
> +be an affiliated keyword would be incredibly powerful for attaching
> +metadata to parts of org-files in a way that is user extensible. It
> +may still be desirable to describe the behavior of affiliated keywords
> +here, but they are not in any way distinct from other keywords at the
> +level of org syntax and trying to implement them as such is usually a
> +mistake (that I have made).]

I generally support this idea. Handling keywords in org-element is not
pretty. Having them in the parse tree would make things easier. However,
we again need to consider back-compatibility. I can imagine third-party
ox-* packages breaking if we make this change - we should double check
if we decide to change this.

> +property of the element they apply to. [fn::While it is tempting to try
> +to do this at the level of the grammar it induces a number of nasty
> +ambiguities in practice. It is saner to have a single unified keyword
> +syntax and then to determine affiliation behavior in a later pass.]

Yes, it is saner. However, our syntax document is supposed to be a
human-readable description of what org-element does. We cannot introduce
differences between the grammar document and the de facto parser
implementation. This would defeat the purpose of providing a reference
syntax: we would get inconsistency between Emacs Org mode and external
parsers.

> +  ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].[fn::
> +  All keywords should allow OPTVAL, it regularizes and simplifies the syntax.]

I support this idea.

> + [fn:: ~:end:~ may be capitalized (legacy support)]

Both :END: and :end: are supported by Org parser. What do you mean by
legacy?

> + [fn::I suggest that we remove inlinetasks from this document.
> +They are a hack that cannot be implemented as part of a grammar
> +because they require a concrete value to be specified which breaks
> +the arbitrary nesting depth of headings. I think I wrote this somewhere
> +else as well, but inline tasks can only be a layer on top of headings,
> +they cannot displace them.]

I disagree. Inlinetasks are a de facto part of the syntax and they can be
encountered in Org documents in the wild. If you treat inlinetasks as
ordinary headings, things may be broken unpredictably during parsing.

Instead, we may consider making the inlinetask level constant.

> +indicate that it should, which is misleading. Further, it is actually
> +not possible to implement contents as specified because grammars
> +cannot track the indentation level that is required to reconstruct
> +list items correctly. Therefore CONTENTS should not be defined as such
> +but should only specify that they can be anything except a newline. I
> +think that the intent of this document is somewhat a conflation of the
> +syntax for org and of the semantics as determined by export backends
> +and/or org-element, however it makes it extremely confusing because it
> +is not actually possible to parse CONTENTS, they must be reconstructed
> +from the parse tree.]

Could you elaborate on why grammars cannot track the indentation level?
AFAIU, if that were the case, Python would not be parseable.
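
For reference, Python-style languages usually handle indentation outside
the grammar proper: the tokenizer keeps a stack of indentation widths and
emits INDENT/DEDENT tokens, and the grammar only ever sees those tokens.
A minimal sketch of that technique follows. The function name and token
shapes are illustrative assumptions, not code from org-element or CPython:

```python
def indent_tokens(lines):
    """Yield (kind, value) tokens, tracking indentation with a stack.

    Hypothetical helper: only leading spaces are counted, and a dedent to
    a width that was never opened is silently accepted rather than being
    reported as an error.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for line in lines:
        if not line.strip():
            continue  # blank lines do not change the indentation level
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        while width < stack[-1]:
            stack.pop()
            yield ("DEDENT", width)
        yield ("LINE", line.strip())
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", 0)
```

The point of the sketch is only that the indentation state lives in the
tokenizer; whether that counts as "the grammar" depends on how narrowly
the term is defined.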

> + [fn::The failure mode for malformed contents needs to be
> +determined more clearly here. We don't want property draws to suddenly
> +become plain drawers just because a user has a malformed line, that
> +could be disastrous if certain settings in the property drawer mask
> +settings from further up the tree.  In short, malformed contents
> +should not poison the whole property drawer.]

Yet that is exactly what happens in Org: malformed property drawers
become ordinary drawers.
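
A rough sketch of that behaviour, assuming (as a simplification) that a
block counts as a property drawer only when every line between
:PROPERTIES: and :END: looks like a node property. The regex and the
function name are illustrative, and the positional rules for property
drawers are ignored entirely:

```python
import re

# Assumed shape of a node property line: ":NAME:" optionally followed by
# whitespace and a value.  This is a simplification for illustration.
NODE_PROPERTY = re.compile(r"^[ \t]*:\S+:(?:[ \t].*)?$")


def classify_drawer(lines):
    """Classify a :PROPERTIES: ... :END: block as a property drawer or a
    plain drawer.  `lines` includes the opening and closing lines."""
    if len(lines) < 2:
        raise ValueError("need at least the opening and closing lines")
    first, *body, last = lines
    if first.strip().upper() != ":PROPERTIES:" or last.strip().upper() != ":END:":
        raise ValueError("not a :PROPERTIES: ... :END: block")
    # A single malformed line demotes the whole block to a plain drawer.
    if all(NODE_PROPERTY.match(line) for line in body):
        return "property-drawer"
    return "drawer"
```

In this sketch one stray line is enough to change the classification of
the whole block, which is the failure mode the footnote worries about.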

>      + SWITCHES :: Any number of SWITCH patterns, separated by a single
> -      space character
> +      space character [fn::For the love of all that is sane can we
> +      please just remove this from the spec or mark it as legacy.]

I support this idea.

> +PLANNING must directly follow HEADING without any blank lines in between. 
> +
> + [fn::Need a spec for how to handle multiple instances of the same keyword with different values.]

The last one wins (as in org-element-planning-parser).
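
As a hedged illustration of "last one wins", here is a small sketch that
scans a planning line left to right and lets later matches overwrite
earlier ones. The regex is a simplified assumption about what a planning
entry looks like and the function name is hypothetical; it is not the
actual org-element code:

```python
import re

# Simplified assumption: a planning keyword, a colon, then an active
# (<...>) or inactive ([...]) timestamp on the same line.
PLANNING = re.compile(
    r"\b(DEADLINE|SCHEDULED|CLOSED):[ \t]*(<[^>\n]*>|\[[^\]\n]*\])"
)


def parse_planning(line):
    """Return a dict mapping lowercase planning keyword to its timestamp."""
    result = {}
    for keyword, timestamp in PLANNING.findall(line):
        # Assigning unconditionally means a repeated keyword keeps only
        # its last value.
        result[keyword.lower()] = timestamp
    return result
```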

> + [fn::As I think I mention elsewhere, the concrete names here
> +should NOT be part of the syntax, it makes the parser brittle
> +and hard to maintain. Differentiation between entities and fragments
> +should be handled at the syntax level for cases where the fragment
> +has brackets, and then in a second pass for values that are
> +syntactically entity-or-fragment and must be determined after
> +the fact.]

How would you define the entity object then? First/second pass is an
implementation detail. Our current description follows how org-element
handles entities.

> + [fn::We probably want to note that BACKEND can be the empty string
> +per that thread on how to deal with intra-word markup. Again this
> +also touches on the general principle of wanting to close over the
> +empty string so that users aren't surprised when ~@@:lol@@~ suddenly
> +appears in plain text just because no backend was specified.]

While I am not opposed to the idea, your principle is not followed by
the org-element parser. We may consider changing it, but it is again a
whole separate discussion where we need to consider pros and cons.

>  Note that the first pattern may not occur on an /unindented/ line, as it
> -is then a [[#Footnote_Definitions][footnote definition]].
> +is then a [[#Footnote_Definitions][footnote definition]]. [fn::I'm not sure this is quite right?
> +the font locking code is not consistent with actual behavior, need to
> +review the laundry test cases and example files.]

Do not look at font-locking. You can safely consider that fontification
is wrong in all non-trivial cases. Always check org-element-at-point and
org-element-context.

> -  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]].
> +  [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]]. [fn::Like for the comma in
> +  macros, I think it would be safe to add ~\|~ as an escape character.
> +  The issue in the elisp implementation is not actually at the level
> +  of the syntax, but is actually in the export backends or somewhere
> +  deeper, because even using a macro that expands to be a pipe ~|~
> +  breaks the table (which is really bad).]

I am not sure if it is needed. We can already use \vert.

> + [fn::I have some suggestions for extensions to timestamp syntax to
> +support historical and far future dates, as well timezone offsets (NOT
> +the 3 letter ambiguous disaster) and seconds and sub-second times.]

That would be welcome, but someone™ should implement timezone support at
the Elisp level. We have had several discussions about this in the past.

> +The four =*/_+= may be arbitrarily nested to any depth. Verbatim and
> +code ==~= may be nested inside any other markup, but no other markup
> +will be interpreted inside of them since they are interpreted exactly.

That's not accurate. You cannot nest, say, bold inside bold. You cannot
put code inside any other markup freely: consider *bold =asd*asd= not bold*

Best,
Ihor

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Org Syntax Specification
  2022-01-18 12:09   ` Ihor Radchenko
@ 2022-01-19  1:22     ` Tom Gillespie
  2022-01-19 11:58       ` Ihor Radchenko
  0 siblings, 1 reply; 22+ messages in thread
From: Tom Gillespie @ 2022-01-19  1:22 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: org-mode-email, Nicolas Goaziou, Timothy

Hi Ihor,
  Thank you very much for the detailed responses. Let me start with
some context.

1. A number of the comments that I made fall into the brainstorming
   category, so they don't need to make their way into the document at
   this time. I agree that it is critical for this document to capture
   how org is parsed right now and that we should not put the
   pie-in-the-sky changes in until the behavior of org-element matches
   (if such a change is made at all).
2. Though I haven't been hacking on it, I fully intend to contribute
   test cases and exploratory work on org-element in the future, so
   please don't interpret some of what I am writing as requests for
   other people to write code (unless they want to :)
3. When I say grammar in this context I mean specifically an eBNF that
   generates a LALR(1) or LR(1) parser. This is narrower than the
   definition used in the document, which includes things that have to
   be implemented in the tokenizer, or in a pass after the grammar has
   been applied, or are related to some other aspect beyond the pure
   surface syntax.
4. A number of my comments are about the structure of the document
   more than the structure of the syntax or the implementation. I
   think that most of them are trying to ask whether we want to
   clearly delineate pure surface syntax from semantics to make the
   document easier to understand.

More replies in line.
Best!
Tom

> As for your other comments, you seem to be suggesting a number of
> changes to the existing Org syntax. Some of them look fine, some do
> not. However, please keep in mind that we have to deal with backward
> compatibility, third-party compatibility, and not breaking existing Org
> documents unless we have a very strong justification. I suggest
> branching a number of new threads from here for each concrete suggestion
> where you want to make changes to Org syntax, as opposed to just
> document wording. Otherwise, this discussion will become a total mess.

Agreed. I put many of these in here as notes from my experiences, I
will branch those off into separate discussions so that we don't
pollute this thread.

> Nope. Sections are actually elements. See =org-element-all-elements=.

I realized this at a slightly later date but missed cleaning up this
comment.  See my response on section vs segment below.

> I disagree. Nesting rules are an important part of syntax. We have
> restrictions on what elements can be inside other elements. The same
> patterns may or may not be recognised in Org depending on their nesting. For
> example, links that you put into property drawers are not considered
> link objects.

When I wrote this comment I was still confused about sections. I think
discussion of nesting in most contexts is ok, but there are some cases
where nesting cannot be determined from the grammar, and there I think
we need to make a distinction.

In my thinking I separate the context sensitive nature of parsing from
the nesting structure of the resulting sexpressions, org elements,
etc. The most obvious example of this is that the sexpression
representation for headings nests based on the level of the heading,
but heading level cannot be determined by the grammar so it must be
reconstructed from a flat sequence of headings that have varying levels.
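
A minimal sketch of that reconstruction step, assuming the headings
arrive as a flat list of (level, title) pairs with level >= 1. The
dictionary layout and the function name are made up for illustration and
do not reflect how org-element or laundry represent headings:

```python
def nest_headings(flat):
    """Build a tree from a flat sequence of (level, title) pairs.

    A stack holds the chain of currently open ancestors; each new heading
    closes every open heading at the same or a deeper level and then
    attaches itself to whatever remains on top.
    """
    root = {"level": 0, "title": None, "children": []}
    stack = [root]
    for level, title in flat:
        node = {"level": level, "title": title, "children": []}
        # The root has level 0, so it is never popped for level >= 1.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root
```

Note that the nesting is decided by comparing asterisk counts after the
fact, so a heading that skips a level (a level 3 directly under a level
1) still lands under the nearest shallower heading.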

> Again I disagree. While your idea about table cells is reasonable
> (similar for citation-references inside citations), I am against
> decoupling Org syntax from org-element implementation. In
> org-element.el, table-cells are just yet another object. If we make
> things in org-element and syntax document out of sync, confusion and
> errors will follow during future maintenance.

Org element treats all elements and objects as a single homogeneous
type.  This is fine. However, to help people understand the syntax it
seems easier to define things in a positive way so that we don't say
"all except these two."  Therefore, despite the fact that the
implementation of org-element treats table rows and cells no differently
from any other node in the parse tree, we don't need to burden the
reader with that information at this point in time, and could provide
that information as an implementation note for cells.  I think the
other issue I was having here is that the spec for tables is spread
all over the place, and it would be much easier to understand and
implement if it were all in one place.

> This actually reads slightly confusing. "Blank lines separate paragraphs
> and other elements" sounds like blank lines are only relevant
> before/after paragraphs. However, there are also footnote references and
> lists. Maybe we can try something like:
>
> Blank lines can be used to indicate end of some elements.
>
> "can" because a single blank line usually does not separate anything.

I think your version is quite a bit more readable.  Can we list the
set of all the elements that can be ended by a new line as well as
those that cannot (iirc they are elements such as footnotes that can
only be ended by a double blank line or a heading)?

> Then where can we put it? This is one of the tricky conventions we use
> in the parser.

After discussing with Timothy, I realized that I totally missed the
greater/lesser parallelism between blocks and elements. I'll see if I
can come up with some wording that will avoid similar confusion for
other readers.

> I am not sure here. Inline tasks are special because a one-line inline
> task must not contain any text below, cannot have planning or
> properties.

Then they are no longer inline tasks, but instead parse as headings, correct?

> > +  contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local
> > +  variable and can be set by users in an org file using =#+todo:=.].
>
> If we mention this, we also need to elaborate on what kind of element
> #+todo: is, where it can be located, and how to parse multiple instances
> of #+todo: in the document.

Yes. What I have written for laundry is that only #+todo: declarations
that appear in the zeroth section will be applied (this is true for
all document level configuration keywords). There is also a
possibility that we might be able to support including #+todo:
keywords (and #+link: definitions or similar) in further sections, but
that they would only apply to headings that occur after that line in
the file. Such behavior is likely to be confusing to users so probably
best to only guarantee correct behavior if they are put in the zeroth
section.

The reason it is confusing/problematic is that there could be
a #+todo: buried halfway down a file, the buffer configuration is
updated, and then a user can use those keywords further up the file in
the elisp implementation. Another implementation that parses a file
incrementally would not encounter the buried #+todo: keyword until
after it has already emitted a heading, changing how that heading is
parsed. There is a similar issue with the #+link: keyword.
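
To make the difference concrete, here is a small sketch under
simplifying assumptions: headings are lines starting with asterisks,
the first word after the stars is checked against the known keywords,
and #+todo: lines add to a default set of TODO and DONE (the real
keyword handling is more involved). All names are hypothetical:

```python
import re

HEADING = re.compile(r"^\*+ +(\S+)")
TODO_LINE = re.compile(r"^#\+todo:(.*)$", re.IGNORECASE)


def declared_keywords(line):
    """Return the keywords declared on a #+todo: line, or an empty set."""
    match = TODO_LINE.match(line)
    if not match:
        return set()
    return {word for word in match.group(1).split() if word != "|"}


def heading_keywords(lines, incremental):
    """Return, per heading, the TODO keyword recognised on it (or None).

    incremental=False collects every #+todo: line before looking at any
    heading; incremental=True only knows the declarations seen so far.
    """
    keywords = {"TODO", "DONE"}
    if not incremental:
        for line in lines:
            keywords |= declared_keywords(line)
    found = []
    for line in lines:
        if incremental:
            keywords |= declared_keywords(line)
        heading = HEADING.match(line)
        if heading:
            word = heading.group(1)
            found.append(word if word in keywords else None)
    return found
```

With a #+todo: line placed between two headings that both start with a
newly declared keyword, the whole-buffer variant recognises the keyword
on both headings while the incremental variant misses it on the first.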

> > -A heading contains directly one section (optionally), followed by
> > -any number of deeper level headings.
> > +The level of a heading can be used to construct a nested structure.
> > +All content following a heading that appears before the next heading
> > +(regardless of the level of that next heading) is a section. In addition,
> > +text before the first heading in an org document is also a section.
>
> Note that it is not true for one-line inline tasks.

I'm not quite sure which part you are referring to here.

> Sounds reasonable. However, we may also need to make this change in
> Elisp level, which is tricky when you think about
> backward-compatibility.

Let's look into how much work it will be and how disruptive it might
be?  We are already changing to heading in the elisp so maybe now
would be a good time to also change from section to segment?
Alternatively we could start by updating the documentation and include
a note that segments are currently called sections by org element?

> The statement about property drawers in the first section (that is how we
> refer to it in org-element) is correct. The first section and the
> location of its property drawer are special.
>
> I agree that it's inconsistent with normal property drawers. However, we
> cannot change it without breaking existing Org files. If we decide to
> change syntax in this area, we should think carefully about possible
> consequences.

I've since come around on this. I think that we can make it consistent
by thinking of the zeroth section as an invisible heading with zero
asterisks at the start of a file. This is extremely useful for making
org-transclusion work transparently with whole files. The only
modification that I might suggest in the context of org-transclusion
would be to disallow empty lines before the property drawer. This
allows files to represent single sections (segments) which might be
very useful for implementations that want to store sections in a
database or something like that.

> I generally support this idea. Handling keywords in org-element is not
> pretty. Having them in the parse tree would make things easier. However,
> we again need to consider back-compatibility. I can imagine third-party
> ox-* packages breaking if we make this change - we should double check
> if we decide to change this.

I'm happy to put in the time to submit code fixes for consumers of the
API so we can make this change. I have usually limited my thinking
about compatibility concerns to the document syntax and semantics but
this made me realize that in terms of actual labor the API consumers
are likely to be affected as well.

> Yes, it is saner. However, our syntax document is supposed to be
> human-readable description of what org-element does. We cannot introduce
> differences between grammar document and de-facto parser implementation.
> This will defeat the purpose of providing reference syntax - we will get
> inconsistency between Emacs Org mode and external parsers.

To achieve this can we have an implementation note for org element
specifically? There shouldn't be any divergence between
implementations if we get the abstract variant of this specified
correctly, where correctly means "exactly matches org-element
behavior."

Another note: I think this difference is arising because I'm using
a narrower definition for what counts as syntax while still wanting to
specify that the resulting transformed ast should be the same.

I think it could make the document more useful if we have examples of
how to get to the same endpoint with slightly different decisions
about surface syntax.

One final note here is that part of my objective in this was to
simplify the org-element implementation while opening the possibility
for user defined keyword behavior. You of course are the expert on
org-element so my thinking may very well be misguided on this
point. This is another area where I would be happy to contribute when
the time comes.

> Both :END: and :end: are supported by Org parser. What do you mean by
> legacy?

I seem to recall a statement that things like #+BEGIN_SRC and friends
are being retained for legacy support. This is also related to a
standardization conversation which we aren't quite ready to have,
which is that for things like :end: and :END: the lowercase version is
the "canonical" representation when normalizing a document (related to
being able to specify levels of conformance for an org parser, namely
that there is a level that would only accept fully normalized
documents that i.e. use :end: and not :END:). The elisp implementation
of course supports :END:, but I don't recall whether it falls into the
same category as #+BEGIN_SRC being on legacy support and #+begin_src
being the preferred version.

> I disagree. inlinetasks are a part of syntax de facto and they can be
> encountered in Org documents in the wild. If you treat inlinetasks as
> ordinary headings, things may be broken unpredictably during parsing.

This comment in particular was about whether we talk about things
beyond the surface syntax in this document and/or whether we move them
to a section on semantics and transformations that are deeper than the
surface syntax. I'm fine to keep this section in the document, but we
should make it clear that it is not part of the surface syntax (this
is also related to my question about property drawers and planning
following an inline task being parsed as a heading above).

I'm using the term syntax very narrowly here to refer specifically to
the pure surface syntax. Inline tasks don't introduce any novel
restrictions on syntax so they don't have to be implemented as part of
the surface syntax, they are a reinterpretation of headings and
otherwise follow all the usual rules such as not allowing new headings
inside them etc.

The reason I bring this up is because when implementing an org parser
we would like to communicate to developers which parts of this
document should be implemented directly in the parser and which ones
should be deferred to a later step. Inlinetasks are a good example of
this because they are entirely consistent with regular old org syntax
for headings, and can be implemented as a transformation on the ast
for headings that have a level that is deeper than the inlinetask min
level.
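
As a rough illustration of the two-step approach described above, the
following Python sketch first parses every starred line as a plain
heading and then reclassifies deep headings as inline tasks. The
function names are hypothetical, and the threshold of 15 is assumed to
be the default value of org-inlinetask-min-level. The sketch does not
try to decide which of the following lines belong to an inline task.

```python
import re

# Assumption: 15 is the default of `org-inlinetask-min-level`.
INLINETASK_MIN_LEVEL = 15

HEADING_RE = re.compile(r"^(\*+) (.*)$")


def parse_headings(text):
    """First pass: every run of stars followed by a space is a heading."""
    nodes = []
    for line in text.splitlines():
        m = HEADING_RE.match(line)
        if m:
            nodes.append(
                {"type": "heading", "level": len(m.group(1)), "title": m.group(2)}
            )
    return nodes


def mark_inlinetasks(nodes, min_level=INLINETASK_MIN_LEVEL):
    """Second pass: reinterpret sufficiently deep headings as inline tasks."""
    return [
        dict(node, type="inlinetask") if node["level"] >= min_level else node
        for node in nodes
    ]


sample = "* Normal heading\n" + "*" * 15 + " Inline task\n** Subheading\n"
kinds = [node["type"] for node in mark_inlinetasks(parse_headings(sample))]
```

With this split, the first pass never needs to know the min level, so
changing it only affects the second pass.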

Said another way, we want to communicate that trying to introduce a
node in an eBNF grammar for inline tasks is not a good idea because it
makes org syntax extremely non-regular and breaks countless use cases
that need nesting of headings beyond the inlinetask min level.

> Instead, we may consider making inlinetask level constant.

I don't think this is necessary, or at least is orthogonal to my
concerns.

> Could you elaborate why grammars cannot track the indentation level?
> AFAIU, If it were the case, python would not be parseable.

Python maintains a separate stack for handling leading whitespace.
https://docs.python.org/3/reference/lexical_analysis.html#indentation
Thus it is effectively tracked as part of the tokenizer which goes on
to emit the indent and dedent tokens. However Org cannot take this
approach because it allows much more permissive use of leading
whitespace and in plain lists deals with a minimum dedent relative
to the bullet, which may itself be arbitrarily indented. I think I
might be able to implement a stack that could track dedents like that
in the tokenizer, but I'm not 100% sure.

Regardless, my (perhaps overly technical) point is that it is not
something that can be done in the grammar, it must be done in the
tokenizer, and the tokenizer would have to emit a control token that
maps to the space between two characters in order for the dedent to
be usable by the grammar.
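
For comparison, the stack-based scheme that the Python documentation
describes can be sketched in a few lines. This is only an illustration
of a tokenizer emitting INDENT and DEDENT control tokens. The token
names and the function are made up for this example, it counts spaces
only, and it does not handle the more permissive indentation rules of
Org plain lists.

```python
def indent_tokens(lines):
    """Emit INDENT/DEDENT control tokens from leading spaces, Python style."""
    stack = [0]  # indentation widths of the currently open blocks
    tokens = []
    for line in lines:
        if not line.strip():
            continue  # blank lines do not affect indentation
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("INDENT", ""))
        else:
            while width < stack[-1]:
                stack.pop()
                tokens.append(("DEDENT", ""))
            if width != stack[-1]:
                # The line dedents to a width that was never opened.
                raise ValueError("inconsistent dedent: %r" % line)
        tokens.append(("LINE", line.strip()))
    # Close every block that is still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens


tokens = indent_tokens(["a", "  b", "    c", "d"])
```

The grammar then only sees the control tokens and never has to count
whitespace itself.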

Somehow this reminds me that I need to check on the behavior of spaces
vs tabs for plain lists (joy).

> Yet, it is exactly what happens in Org. malformed property drawers will
> become ordinary drawers.

Yes, but ideally a property drawer would only be defined by its
location in a document and the use of :properties: to start the drawer
rather than also be defined by the well-formedness of its
contents. This would mean that we would have regular drawers, property
drawers, and malformed property drawers that were recognizable by the
parser. I have a sense that org-lint may already be doing this?

> > +PLANNING must directly follow HEADING without any blank lines in between.
> > +
> > + [fn::Need a spec for how to handle multiple instances of the same keyword with different values.]
>
> The last one wins (as in org-element-planning-parser)

Perfect.
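
A minimal sketch of that "last one wins" rule, in Python. The regular
expression is a deliberately simplified assumption about what a
planning line looks like (keyword, colon, one bracketed timestamp) and
is not taken from org-element; only the overwrite behavior is the
point here.

```python
import re

# Simplified assumption: a keyword followed by a <...> or [...] timestamp.
PLANNING_RE = re.compile(
    r"\b(SCHEDULED|DEADLINE|CLOSED): *([<\[][^>\]\n]*[>\]])"
)


def parse_planning(line):
    """Collect planning keywords; a repeated keyword keeps its last value."""
    info = {}
    for keyword, timestamp in PLANNING_RE.findall(line):
        info[keyword] = timestamp  # later occurrences overwrite earlier ones
    return info


planning = parse_planning(
    "SCHEDULED: <2022-01-19> DEADLINE: <2022-01-20> SCHEDULED: <2022-01-21>"
)
```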

> How would you define entities object then? First/second pass is an
> implementation detail. Our current description follows how org-element
> handles entities.

At the level of the syntax there is no pure entity object. At the
level of semantics (deeper pass) there is. My objective here is to
create a syntax that is invariant to a long and changeable list of
entities. Imagine that a user wants to add a new custom entity, they
need to be able to do that without changing org syntax and in the
laundry case having to recompile the whole parser.

One way that I think about the distinction is that the syntax is the
subset of things that you cannot change at runtime. Of course in emacs
you can change almost everything at runtime so by convention we have
to pick which things we declare to be part of an immutable concrete
syntax.

With that context, the way I would define entities is as
entity-fragment objects where the name is contained in the entities
list. Note that this could lead to a slight change of interpretation
for something like \alpha[] which needs to be explored. I did some
experiments with it but don't remember the results.
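
To make the idea concrete, here is a small Python sketch in which the
syntax only knows about backslash fragments and a later lookup decides
which of them are entities. The table below is a hypothetical stand-in
for the real entity list (org-entities in Emacs), and the fragment
pattern is a simplification that ignores the exact rules about what
may follow the name.

```python
import re

# Hypothetical, user-extensible table standing in for the entity list.
ENTITIES = {"alpha": "α", "vert": "|"}

# Simplified assumption: a backslash, letters, and an optional "{}".
FRAGMENT_RE = re.compile(r"\\([A-Za-z]+)(\{\})?")


def classify_fragments(text, entities=ENTITIES):
    """Return (kind, name) pairs; the kind depends on a runtime lookup."""
    result = []
    for m in FRAGMENT_RE.finditer(text):
        name = m.group(1)
        kind = "entity" if name in entities else "fragment"
        result.append((kind, name))
    return result


before = classify_fragments(r"\alpha and \foo")
# Adding a custom entity changes the lookup table, not the pattern.
after = classify_fragments(r"\alpha and \foo", {**ENTITIES, "foo": "x"})
```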

> While I am not opposing the idea, your principle is not followed by
> org-element parser. We may consider changing it, but it is again a whole
> separate discussion where we need to consider pros and cons.

I agree this is one of the deeper discussions that we need to have in
a separate place (consolidating some of my earlier points from the
thread on intra-word markup). I'm happy to work on the changes to org
element to make this possible.

> Do not look at font-locking. You can safely consider that fontification
> is wrong in all non-trivial cases. Always check org-element-at-point and
> org-element-context.

We are in agreement here. This was more of a note for me to check back
in on the behavior because my brain thought that [fn::asdf] could not
start a line but that may not be correct.

> I am not sure if it is needed. We can already use \vert

This should be a side thread, likely started by a working
implementation. Some immediate thoughts are recorded here.

\vert breaks cases where you want the table to also be data, for
example I wanted to create a table that had various syntactic elements
such as =|= in cells and rows and I wanted to be able to ctrl-f for
=|= in the table. \vert breaks this case and it is quite confusing if
you need the exact character for clarity in developer
documentation. Here is an example of the table and me trying with
macros to work around the issue
https://github.com/tgbugs/sxpyr/blob/master/docs/sexp.org#reading-behavior

There is an additional point here which is that the restriction on =|=
has nothing to do with surface syntax at all in the elisp
implementation due to the order in which macros are resolved relative
to table elements. Clarifying how macros interact (or hopefully do not
interact) with other parts of syntax should probably be included at
some point.

> That would be welcome, but someone™ should implement timezone support in
> Elisp level. We have several discussions about this in the past.

Definitely on my list. I have the proposed extensions implemented in
laundry that I can use as a guide.

> That's not accurate. you cannot nest, say, bold inside bold. You cannot
> put code inside any other markup freely: consider *bold =asd*asd= not bold*

I think it is accurate. I've tested this fairly extensively for my
laundry implementation to match the org export behavior. Arbitrary
nesting of those 4 is supported and the other 2 can be at the bottom
of any level.

I see *bold =asd*asd= bold* for ox-html/ox-latex and for font locking.

You can also have ******bold****** and it renders the same as *bold*.

Consider these monstrosities as well:
 *b /i _u +s =v /*_+lol+_*/= ~c /*_+lol+_*/~ s+ u_ i/ b*
 */_+bius+_ _+bius+_ bi/*


* Re: Org Syntax Specification
  2022-01-19  1:22     ` Tom Gillespie
@ 2022-01-19 11:58       ` Ihor Radchenko
  0 siblings, 0 replies; 22+ messages in thread
From: Ihor Radchenko @ 2022-01-19 11:58 UTC (permalink / raw)
  To: Tom Gillespie; +Cc: org-mode-email, Nicolas Goaziou, Timothy

Tom Gillespie <tgbugs@gmail.com> writes:

> 3. When I say grammar in this context I mean specifically an eBNF that
>    generates a LALR(1) or LR(1) parser. This is narrower than the
>    definition used in the document, which includes things that have to
>    be implemented in the tokenizer, or in a pass after the grammar has
>    been applied, or are related to some other aspect beyond the pure
>    surface syntax.

I feel that we should not be trying to fit into LR at the expense of
complicating the document. When looking at earlier versions of the
grammar, I mostly had GLR in mind.

> In my thinking I separate the context sensitive nature of parsing from
> the nesting structure of the resulting sexpressions, org elements,
> etc. The most obvious example of this is that the sexpression
> representation for headings nests based on the level of the heading,
> but heading level cannot be determined by the grammar so it must be
> reconstructed from a flat sequence of headings that have varying level.

1. I think that the resulting sexpression is important to describe. We
   eventually plan to provide a reference test set to verify external
   parsers against org-element.el [1]. It is important to describe the
   nesting with this consideration.

2. You actually can determine the end of heading if you are allowed to
   do lookaheads (which is anyway necessary to parse
   #+begin_blah..#+end_blah). The end of current heading is
   "eof|^\*{,N-current-heading} "

[1] https://list.orgmode.org/spmq6a$2s5$1@ciao.gmane.io/T/#t
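
A sketch of the lookahead from point 2, written in Python: the current
heading of level N ends at end of file or at the next line that starts
with at most N stars followed by a space. The function name and its
offset-based interface are assumptions made for this example, and it
ignores inline tasks entirely.

```python
import re


def heading_end(text, start, level):
    """Return the offset where the heading that begins at `start` ends."""
    # Skip the heading's own line before searching for the boundary.
    newline = text.find("\n", start)
    if newline == -1:
        return len(text)
    # A heading with at most `level` stars closes the current one.
    boundary = re.compile(r"^\*{1,%d} " % level, re.MULTILINE)
    m = boundary.search(text, newline + 1)
    return m.start() if m else len(text)


doc = "* A\ntext\n** B\nmore\n* C\n"
end_of_a = heading_end(doc, 0, 1)
end_of_b = heading_end(doc, doc.index("** B"), 2)
end_of_c = heading_end(doc, doc.index("* C"), 1)
```

Here "* A" and "** B" both end where "* C" starts, and "* C" runs to
the end of the text.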

> ... I think the
> other issue I was having here is that the spec for tables is spread
> all over the place, and it would be much easier to understand and
> implement if it were all in one place.

That sounds fine for me. Though your next suggestion appears to be
exactly opposite:

> I think your version is quite a bit more readable.  Can we list the
> set of all the elements that can be ended by a new line as well as
> those that cannot (iirc they are elements such as footnotes that can
> only be ended by a double blank line or a heading)?

The intention behind listing the exceptions for table cells was exactly
what you are thinking of with open-ended elements.

>> I am not sure here. Inline tasks are special because a one-line inline
>> task must not contain any text below, cannot have planning or
>> properties.
>
> Then they are no longer inline tasks, but instead parse as headings, correct?

They are still inline tasks. Consider the below example:

* Normal heading

Paragraph
************************************************** Inline task
SCHEDULED: <2022-01-19> <- this is an ordinary paragraph, not a part of inline task
Continuing "SCHEDULED" paragraph, not a part of inline task

* Next heading

The parsed sexp will be
(heading
  (paragraph)
  (inlinetask)
  (paragraph))
(heading)

>> If we mention this, we also need to elaborate what kind of element is
>> #+todo:, where it can be located, and how to parse multiple instances of
>> #+todo in the document.
>
> Yes. What I have written for laundry is that only #+todo: declarations
> that appear in the zeroth section will be applied (this is true for
> all document level configuration keywords). There is also a
> possibility that we might be able to support including #+todo:
> keywords (and #+link: definitions or similar) in further sections, but
> that they would only apply to headings that occur after that line in
> the file. Such behavior is likely to be confusing to users so probably
> best to only guarantee correct behavior if they are put in the zeroth
> section.
>
> The reason it is confusing/problematic is that there could be
> a #+todo: buried half way down a file, the buffer configuration is
> updated, and then a user can use keywords up the file in the elisp
> implementation. Another implementation that parses a file
> incrementally would not encounter the buried #+todo: keyword until
> after they have already emitted a heading, changing how a heading is
> parsed. There is a similar issue with the #+link: keyword.

That's why it was initially not included into the syntax document. If we
fall into this rabbit hole, we also need to describe things like
CATEGORY, PROPERTY, OPTIONS, PRIORITIES, SEQ_TODO, STARTUP,
TYP_TODO, etc.
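
For reference, the zeroth-section rule from the quoted text (only
document-level keywords that appear before the first heading are
applied) could look roughly like this in Python. This illustrates the
rule as described for laundry, not what org-element does, and the
keyword pattern is a simplified assumption.

```python
import re

HEADING_RE = re.compile(r"^\*+ ")
# Simplified assumption about the shape of a "#+key: value" line.
KEYWORD_RE = re.compile(r"^[ \t]*#\+(\w+):[ \t]*(.*)$")


def zeroth_section_keywords(text):
    """Collect #+KEY: VALUE lines that occur before the first heading."""
    keywords = []
    for line in text.splitlines():
        if HEADING_RE.match(line):
            break  # the first heading ends the zeroth section
        m = KEYWORD_RE.match(line)
        if m:
            keywords.append((m.group(1).upper(), m.group(2)))
    return keywords


org_text = (
    "#+todo: TODO | DONE\n"
    "#+title: Demo\n"
    "* Heading\n"
    "#+todo: LATER | DONE\n"
)
found = zeroth_section_keywords(org_text)
```

An incremental parser can stop collecting at the first heading, so the
buried #+todo: line never changes how earlier headings are parsed.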

>> > +All content following a heading that appears before the next heading
>> > +(regardless of the level of that next heading) is a section.
>>
>> Note that it is not true for one-line inline tasks.
>
> I'm not quite sure which part you are referring to here.

I only left the relevant part this time. Also, see the example above.
Inline task only consists of a single line. Nothing below is a part of
it.

> Let's look into how much work it will be and how disruptive it might
> be?  We are already changing to heading in the elisp so maybe now
> would be a good time to also change from section to segment?
> Alternatively we could start by updating the documentation and include
> a note that segments are currently called sections by org element?

Let's continue this in the new thread dedicated to renaming
section->segment.

> I've since come around on this. I think that we can make it consistent
> by thinking of the zeroth section as an invisible heading with zero
> asterisks at the start of a file. This is extremely useful for making
> org-transclusion work transparently with whole files. The only
> modification that I might suggest in the context of org-transclusion
> would be to disallow empty lines before the property drawer. This
> allows files to represent single sections (segments) which might be
> very useful for implementations that want to store sections in a
> database or something like that.

Again, let's move this to a separate thread.

>> I generally support this idea. Handling keywords in org-element is not
>> pretty. Having them in the parse tree would make things easier. However,
>> we again need to consider back-compatibility. I can imagine third-party
>> ox-* packages breaking if we make this change - we should double check
>> if we decide to change this.
>
> I'm happy to put in the time to submit code fixes for consumers of the
> API so we can make this change. I have usually limited my thinking
> about compatibility concerns to the document syntax and semantics but
> this made me realize that in terms of actual labor the API consumers
> are likely to be affected as well.

This is not as easy as just submitting patches... Anyway, let's move this
to a separate thread.

>> Yes, it is saner. However, our syntax document is supposed to be
>> human-readable description of what org-element does. We cannot introduce
>> differences between grammar document and de-facto parser implementation.
>> This will defeat the purpose of providing reference syntax - we will get
>> inconsistency between Emacs Org mode and external parsers.
>
> To achieve this can we have an implementation note for org element
> specifically? There shouldn't be any divergence between
> implementations if we get the abstract variant of this specified
> correctly, where correctly means "exactly matches org-element
> behavior."

If you refer to restructuring the syntax document without introducing
divergence with org-element, I am fine with such improvements. We
already tried something somewhat similar by referring to Elisp variables
in some cases.

> Another note: I think this difference is arising because I'm using
> a narrower definition for what counts as syntax while still wanting to
> specify that the resulting transformed ast should be the same.
>
> I think it could make the document more useful if we have examples of
> how to get to the same endpoint with slightly different decisions
> about surface syntax.

Sounds reasonable. The only thing I fear is making the document
too long. Of course, we can always put things in appendices if
necessary.

> One final note here is that part of my objective in this was to
> simplify the org-element implementation while opening the possibility
> for user defined keyword behavior.

I am not sure what you refer to.

>> Both :END: and :end: are supported by Org parser. What do you mean by
>> legacy?
>
> I seem to recall a statement that things like #+BEGIN_SRC and friends
> are being retained for legacy support. This is also related to a
> standardization conversation which we aren't quite ready to have,
> which is that for things like :end: and :END: the lowercase version is
> the "canonical" representation when normalizing a document (related to
> being able to specify levels of conformance for an org parser, namely
> that there is a level that would only accept fully normalized
> documents that i.e. use :end: and not :END:). The elisp implementation
> of course supports :END:, but I don't recall whether it falls into the
> same category as #+BEGIN_SRC being on legacy support and #+begin_src
> being the preferred version.

AFAIK, org-element is case-insensitive by default. The majority of
discussions related to this topic revolve around the case of
auto-inserted Org elements.

>> I disagree. inlinetasks are a part of syntax de facto and they can be
>> encountered in Org documents in the wild. If you treat inlinetasks as
>> ordinary headings, things may be broken unpredictably during parsing.
>
> This comment in particular was about whether we talk about things
> beyond the surface syntax in this document and/or whether we move them
> to a section on semantics and transformations that are deeper than the
> surface syntax. I'm fine to keep this section in the document, but we
> should make it clear that it is not part of the surface syntax (this
> is also related to my question about property drawers and planning
> following an inline task being parsed as a heading above).

I am afraid that I cannot understand clearly what you refer to when saying
surface syntax vs. semantics.

However, inlinetasks are different from headlines, despite being
sufficiently similar to create confusion. Probably Org is too good in
supporting inlinetasks and headings as if they are the same.

> I'm using the term syntax very narrowly here to refer specifically to
> the pure surface syntax. Inline tasks don't introduce any novel
> restrictions on syntax so they don't have to be implemented as part of
> the surface syntax, they are a reinterpretation of headings and
> otherwise follow all the usual rules such as not allowing new headings
> inside them etc.

As I mentioned earlier, inlinetasks do not always include everything
until next heading/inlinetask as their section.

> The reason I bring this up is because when implementing an org parser
> we would like to communicate to developers which parts of this
> document should be implemented directly in the parser and which ones
> should be deferred to a later step. Inlinetasks are a good example of
> this because they are entirely consistent with regular old org syntax
> for headings, and can be implemented as a transformation on the ast
> for headings that have a level that is deeper than the inlinetask min
> level.

I am not sure what later step you are referring to.

> Said another way, we want to communicate that trying to introduce a
> node in an eBNF grammar for inline tasks is not a good idea because it
> makes org syntax extremely non-regular and breaks countless use cases
> that need nesting of headings beyond the inlinetask min level.

Do you mean that you imagine the first parsing step to be eBNF grammar?
Why so?

>> Could you elaborate why grammars cannot track the indentation level?
>> AFAIU, If it were the case, python would not be parseable.
>
> Python maintains a separate stack for handling leading whitespace.
> https://docs.python.org/3/reference/lexical_analysis.html#indentation
> Thus it is effectively tracked as part of the tokenizer which goes on
> to emit the indent and dedent tokens. However Org cannot take this
> approach because it allows much more permissive use of leading
> whitespace and in plain lists deals with a minimum dedent relative
> to the bullet, which may itself be arbitrarily indented. I think I
> might be able to implement a stack that could track dedents like that
> in the tokenizer, but I'm not 100% sure.
>
> Regardless, my (perhaps overly technical) point is that it is not
> something that can be done in the grammar, it must be done in the
> tokenizer, and the tokenizer would have to emit a control token that
> maps to the space between two characters in order for the dedent to
> be usable by the grammar.

AFAIK, the tokenizer is just a part of the parser. It may or may not be
separate from the grammar. AFAIU, lookahead grammars can be imagined as
using a tokenizer under the hood.

>> Yet, it is exactly what happens in Org. malformed property drawers will
>> become ordinary drawers.
>
> Yes, but ideally a property drawer would only be defined by its
> location in a document and the use of :properties: to start the drawer
> rather than also be defined by the well-formedness of its
> contents. This would mean that we would have regular drawers, property
> drawers, and malformed property drawers that were recognizable by the
> parser. I have a sense that org-lint may already be doing this?

Org syntax is permissive. It can always be parsed without errors.
org-lint is merely catching common unwanted mistakes. I view org-lint as
an addition to grammar. Making linter a part of grammar will complicate
things even more than what we have now.

>> How would you define entities object then? First/second pass is an
>> implementation detail. Our current description follows how org-element
>> handles entities.
>
> At the level of the syntax there is no pure entity object. At the
> level of semantics (deeper pass) there is. My objective here is to
> create a syntax that is invariant to a long and changeable list of
> entities. Imagine that a user wants to add a new custom entity, they
> need to be able to do that without changing org syntax and in the
> laundry case having to recompile the whole parser.
>
> One way that I think about the distinction is that the syntax is the
> subset of things that you cannot change at runtime. Of course in emacs
> you can change almost everything at runtime so by convention we have
> to pick which things we declare to be part of an immutable concrete
> syntax.
>
> With that context, the way I would define entities is as
> entity-fragment objects where the name is contained in the entities
> list. Note that this could lead to a slight change of interpretation
> for something like \alpha[] which needs to be explored. I did some
> experiments with it but don't remember the results.

AFAIK, the current version of the syntax document is trying hard to
restrict itself to fixed grammar that does not change at runtime. That's
why we provide default values of runtime-customizable variables.
Generalising the entity syntax will require changes in the org-element
parser and should better be discussed in a separate thread.

>> I am not sure if it is needed. We can already use \vert
>
> This should be a side thread, likely started by a working
> implementation. Some immediate thoughts are recorded here.
>
> \vert breaks cases where you want the table to also be data, for
> example I wanted to create a table that had various syntactic elements
> such as =|= in cells and rows and I wanted to be able to ctrl-f for
> =|= in the table. \vert breaks this case and it is quite confusing if
> you need the exact character for clarity in developer
> documentation. Here is an example of the table and me trying with
> macros to work around the issue
> https://github.com/tgbugs/sxpyr/blob/master/docs/sexp.org#reading-behavior
>
> There is an additional point here which is that the restriction on =|=
> has nothing to do with surface syntax at all in the elisp
> implementation due to the order in which macros are resolved relative
> to table elements. Clarifying how macros interact (or hopefully do not
> interact) with other parts of syntax should probably be included at
> some point.

Sounds reasonable and it is also not covered by our escaping mechanisms
in Org. So, let's discuss it in a separate thread.

>> That's not accurate. you cannot nest, say, bold inside bold. You cannot
>> put code inside any other markup freely: consider *bold =asd*asd= not bold*
>
> I think it is accurate. I've tested this fairly extensively for my
> laundry implementation to match the org export behavior. Arbitrary
> nesting of those 4 is supported and the other 2 can be at the bottom
> of any level.
>
> I see *bold =asd*asd= bold* for ox-html/ox-latex and for font locking.

Sorry, my example was wrong. I was referring to
 *bold =asd* asd= bold*

> You can also have ******bold****** and it renders the same as *bold*.

Yes. The key word is "renders". The actual bold object has all the
inner * chars.

> Consider these monstrosities as well:
>  *b /i _u +s =v /*_+lol+_*/= ~c /*_+lol+_*/~ s+ u_ i/ b*
>  */_+bius+_ _+bius+_ bi/*

To clarify, Org does support emphasis nesting as long as the emphasis
spans do not intersect and as long as the same type of emphasis is not
nested inside itself.

Best,
Ihor



* Re: Org Syntax Specification
  2022-01-09 18:02 Org Syntax Specification Timothy
  2022-01-15 12:40 ` Sébastien Miquel
  2022-01-18  0:54 ` Org Syntax Specification Tom Gillespie
@ 2022-09-25  9:09 ` Bastien
  2022-09-25 21:28   ` Rohit Patnaik
                     ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Bastien @ 2022-09-25  9:09 UTC (permalink / raw)
  To: Timothy; +Cc: org-mode-email, mail

Hi Timothy,

I'm late to the party, but *thanks* for these important improvements
on the https://orgmode.org/worg/dev/org-syntax.html page!

A few suggestions:

- Make it a description of the syntax of the latest stable Org.  (For
  now let's consider 9.6 to be the latest stable as we are working on
  releasing it soon.)  Perhaps this is already the case and I missed
  it?

- Remove the "draft" status ("DRAFT v2β"). Don't describe it as a
  draft in the org-manual.org if it accurately reflects the current
  syntax (current = latest stable).

- Remove all the inline notes (some suggest changes in Org's grammar,
  that might scare the readers a bit.)

- Promote the page to orgmode.org/worg/org-syntax.html: the /dev/ path
  in the current URL makes it read like it is the syntax for the "dev"
  version.

What do you think?

-- 
 Bastien



* Re: Org Syntax Specification
  2022-09-25  9:09 ` Bastien
@ 2022-09-25 21:28   ` Rohit Patnaik
  2022-11-26  2:41   ` Ihor Radchenko
  2022-11-26  6:05   ` Ihor Radchenko
  2 siblings, 0 replies; 22+ messages in thread
From: Rohit Patnaik @ 2022-09-25 21:28 UTC (permalink / raw)
  To: Bastien; +Cc: emacs-orgmode

I also want to chip in with a thank-you for the org syntax specification page. As someone who's working on a custom org exporter, this is a very useful resource for finding out how elements are structured within org-mode.

Thanks,
Rohit


* Re: Org Syntax Specification
  2022-09-25  9:09 ` Bastien
  2022-09-25 21:28   ` Rohit Patnaik
@ 2022-11-26  2:41   ` Ihor Radchenko
  2022-11-26  6:24     ` Bastien
  2022-11-26  6:05   ` Ihor Radchenko
  2 siblings, 1 reply; 22+ messages in thread
From: Ihor Radchenko @ 2022-11-26  2:41 UTC (permalink / raw)
  To: Bastien; +Cc: Timothy, org-mode-email, mail

Bastien <bzg@gnu.org> writes:

> - Promote the page to orgmode.org/worg/org-syntax.html: the /dev/ path
>   in the current URL makes it read like it is the syntax for the "dev"
>   version.

I think we need to use a rewrite rule here in addition to moving the file.
If we simply move the file, old links will be broken.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Org Syntax Specification
  2022-09-25  9:09 ` Bastien
  2022-09-25 21:28   ` Rohit Patnaik
  2022-11-26  2:41   ` Ihor Radchenko
@ 2022-11-26  6:05   ` Ihor Radchenko
  2 siblings, 0 replies; 22+ messages in thread
From: Ihor Radchenko @ 2022-11-26  6:05 UTC (permalink / raw)
  To: Bastien; +Cc: Timothy, org-mode-email, mail

Bastien <bzg@gnu.org> writes:

> A few suggestions:
>
> - Make it a description of the syntax of the latest stable Org.  (For
>   now let's consider 9.6 to be the latest stable as we are working on
>   releasing it soon.)  Perhaps this is already the case and I missed
>   it?

Yes, it should be consistent with the latest Org. (We tried our best to
make it so). We also kept some things a bit more generic for forward
compatibility.

> - Remove the "draft" status ("DRAFT v2β"). Don't describe it as a
>   draft in the org-manual.org if it accurately reflects the current
>   syntax (current = latest stable).

> - Remove all the inline notes (some suggest changes in Org's grammar,
>   that might scare the readers a bit.)

See https://orgmode.org/list/87ilj2l0cy.fsf@localhost

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Org Syntax Specification
  2022-11-26  2:41   ` Ihor Radchenko
@ 2022-11-26  6:24     ` Bastien
  0 siblings, 0 replies; 22+ messages in thread
From: Bastien @ 2022-11-26  6:24 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Timothy, org-mode-email, mail

Ihor Radchenko <yantar92@posteo.net> writes:

> I think we need to use a rewrite rule here in addition to moving the file.
> If we simply move the file, old links will be broken.

Done now, thanks.

-- 
 Bastien



end of thread, other threads:[~2022-11-26  6:25 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-09 18:02 Org Syntax Specification Timothy
2022-01-15 12:40 ` Sébastien Miquel
2022-01-15 16:36   ` Depreciating TeX-style LaTeX fragments (was: Org Syntax Specification) Timothy
2022-01-16  8:08     ` Sébastien Miquel
2022-01-16  9:23       ` Depreciating TeX-style LaTeX fragments Martin Steffen
2022-01-16  9:46       ` Colin Baxter 😺
2022-01-16 11:11         ` Tim Cross
2022-01-16 13:26         ` Juan Manuel Macías
2022-01-16 14:43           ` Colin Baxter 😺
2022-01-16 15:16             ` Greg Minshall
2022-01-16 17:45         ` Rudolf Adamkovič
2022-01-16 12:10     ` Eric S Fraga
2022-01-16 14:30       ` Anthony Cowley
2022-01-18  0:54 ` Org Syntax Specification Tom Gillespie
2022-01-18 12:09   ` Ihor Radchenko
2022-01-19  1:22     ` Tom Gillespie
2022-01-19 11:58       ` Ihor Radchenko
2022-09-25  9:09 ` Bastien
2022-09-25 21:28   ` Rohit Patnaik
2022-11-26  2:41   ` Ihor Radchenko
2022-11-26  6:24     ` Bastien
2022-11-26  6:05   ` Ihor Radchenko

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git