From: "Przemysław Kamiński" <pk@intrepidus.pl>
To: emacs-orgmode@gnu.org
Subject: Re: official orgmode parser
Date: Tue, 27 Oct 2020 06:42:06 +0100
Message-ID: <bf73a8c6-0493-40ba-53b3-05d205928842@intrepidus.pl>
In-Reply-To: <CA+G3_POWNK6aypPhUrR7cyLsnVkZayo66nXfrmW1jD7Pa=oeSw@mail.gmail.com>
I'm no expert in parsing, but I would expect Org's parser to be quite
similar to the multitude of Markdown or CommonMark [1] parsers. The
syntax isn't all that different, except that Org is perhaps more
versatile and has more syntax elements, such as drawers.

Searching for "EBNF Markdown", I stumbled upon [2].
[1] https://commonmark.org/
[2] http://roopc.net/posts/2014/markdown-cfg/
On 10/26/20 10:00 PM, Tom Gillespie wrote:
> Here is an attempt to clarify my own confusion around the nested
> structures in Org. In short: each node in the headline tree and the
> plain-list tree can be parsed using the EBNF, but the nesting level
> cannot, which means that certain useful operations, such as folding,
> require additional rules beyond the grammar. More inline. Best!
> Tom
>
>> Do you need to? This is valid as an entire Org file, I think:
>>
>> *** foo
>> * bar
>> ***** baz
>>
>> And that can be represented in EBNF. I'm not aware of places where
>> behavior is indent-level specific, except for inline tasks, and that
>> edge case can be represented.
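To illustrate that each headline in a file like the one above can be recognized on its own, here is a minimal Python sketch. It assumes a simplified headline syntax (stars at the start of the line, whitespace, then a title) and leaves TODO keywords, priorities and tags inside the title; `HEADLINE` and `headline_levels` are hypothetical names, not part of Org.

```python
import re

# Simplified, hypothetical headline pattern: one or more stars at the
# start of the line, at least one space or tab, then the title. Real Org
# headlines can also carry TODO keywords, priorities and tags; this
# sketch leaves them in the title.
HEADLINE = re.compile(r"^(\*+)[ \t]+(.*)$")


def headline_levels(text: str) -> list[tuple[int, str]]:
    """Return (level, title) for every headline, in document order.

    Each line is classified on its own, so no nesting is computed:
    the level is just the number of leading stars.
    """
    levels = []
    for line in text.splitlines():
        match = HEADLINE.match(line)
        if match:
            levels.append((len(match.group(1)), match.group(2)))
    return levels


print(headline_levels("*** foo\n* bar\n***** baz\n"))
# → [(3, 'foo'), (1, 'bar'), (5, 'baz')]
```

Because the pattern requires whitespace after the stars, a line such as `*bold* text` is not taken as a headline.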
>
> You are correct, and as long as the heading depth doesn't change how
> something is interpreted, this is a non-issue. The reason I mentioned
> it, though, is that it means you cannot determine how to correctly
> fold an Org file from the grammar alone.
>
> To make sure I understand: it is possible to determine the number of
> leading stars (and thus the level), but I think it is not possible to
> identify the end of a section.
> For example:
>
> * a
> *** b
> ** c
> * d
>
> You can parse out the levels (a: 1, b: 3, c: 2, d: 1), but if you
> want to nest b and c inside a without nesting d inside a, then you
> need a stack in there somewhere. You can't have a rule such as
>
> section : headline content
> content : text | section
>
> because the resulting parse would incorrectly nest sections at the
> same level inside one another. Instead, you would have to write
>
> section-level-1 : headline-1 content-1
> content-1 : text | section-level-2-n
>
> but since there can be an arbitrary number of levels, the grammar
> would have to be infinite. This only matters if you want the grammar
> itself to encode that the content of a section can include other,
> more deeply nested sections, which in this context we almost
> certainly do not (as you point out).
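The stack Tom describes can live in a separate pass after parsing, so the grammar itself stays flat. A minimal Python sketch, assuming the headlines have already been reduced to (level, title) pairs and using the rule that a section ends at the next headline whose level is less than or equal to its own; `Section`, `nest` and `outline` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    children: list["Section"] = field(default_factory=list)


def nest(headlines: list[tuple[int, str]]) -> list[Section]:
    """Build the section tree from a flat (level, title) sequence.

    Levels are assumed to be >= 1. The stack holds the sections that are
    still open; a new headline closes every open section at its own
    level or deeper, then becomes a child of whatever remains on top.
    """
    root = Section(title="", level=0)  # sentinel, shallower than any headline
    stack = [root]
    for level, title in headlines:
        while stack[-1].level >= level:
            stack.pop()
        section = Section(title, level)
        stack[-1].children.append(section)
        stack.append(section)
    return root.children


def outline(sections: list[Section], depth: int = 0) -> list[str]:
    """Render the tree as indented titles, two spaces per nesting step."""
    lines = []
    for s in sections:
        lines.append("  " * depth + s.title)
        lines.extend(outline(s.children, depth + 1))
    return lines


tree = nest([(1, "a"), (3, "b"), (2, "c"), (1, "d")])
print("\n".join(outline(tree)))
```

For the example above this prints `a`, then `b` and `c` indented under it as siblings, then `d` back at the top level: b and c nest inside a, and d does not.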
>
>>> There is a similar issue with the indentation level in
>>> order to correctly interpret plain lists.
>>
>> list ::= ('+' string newline)+ sublist?
>> sublist ::= (indent list)+
>>
>> I think this captures lists?
>
> Ah yes, I see my mistake here. For this to work, the parser has to
> treat whitespace as significant, so whitespace cannot be collapsed
> into a single generic token. I think everything works out after
> that.
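One common way to get significant whitespace while keeping the grammar context-free is to let the lexer track indentation and emit INDENT/DEDENT tokens, as Python's own tokenizer does; the `sublist` rule above then becomes roughly `sublist ::= INDENT list DEDENT`. A minimal Python sketch under simplifying assumptions (only `+`/`-` bullets, one line per item, no tabs); `ITEM` and `list_tokens` are hypothetical names, and this is not a description of what Org's own parser does.

```python
import re

# Simplified, hypothetical plain-list item: optional spaces, a "+" or "-"
# bullet, at least one space, then the item text. Numbered items,
# checkboxes, continuation lines and tabs are not handled.
ITEM = re.compile(r"^( *)[+-] +(.*)$")


def list_tokens(text: str) -> list[tuple[str, str]]:
    """Lex list items into ITEM, INDENT and DEDENT tokens.

    A stack of open indentation widths turns significant whitespace into
    explicit tokens, so a context-free rule can do the nesting.
    """
    tokens = []
    widths = [0]  # indentation of each currently open list
    for line in text.splitlines():
        match = ITEM.match(line)
        if not match:
            continue  # this sketch skips every non-item line
        width = len(match.group(1))
        if width > widths[-1]:
            widths.append(width)
            tokens.append(("INDENT", ""))
        else:
            while width < widths[-1]:
                widths.pop()
                tokens.append(("DEDENT", ""))
            if width != widths[-1]:
                # Dedent to a width no open list uses: rejected here
                # rather than guessing at Org's own rules.
                raise ValueError(f"inconsistent indentation: {line!r}")
        tokens.append(("ITEM", match.group(2)))
    # Close any sublists still open at the end of the input.
    tokens.extend(("DEDENT", "") for _ in widths[1:])
    return tokens


print(list_tokens("+ a\n  + b\n  + c\n+ d\n"))
# → [('ITEM', 'a'), ('INDENT', ''), ('ITEM', 'b'), ('ITEM', 'c'), ('DEDENT', ''), ('ITEM', 'd')]
```

Once indentation is explicit in the token stream, the nesting is ordinary bracket matching, which EBNF handles without trouble.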
>
>> That is definitely not representable in EBNF, unless, as you say,
>> {name} is a limited vocabulary.
>
> Darn those pesky open sets!
>