emacs-orgmode@gnu.org archives
From: "Przemysław Kamiński" <pk@intrepidus.pl>
To: emacs-orgmode@gnu.org
Subject: Re: official orgmode parser
Date: Tue, 27 Oct 2020 06:42:06 +0100
Message-ID: <bf73a8c6-0493-40ba-53b3-05d205928842@intrepidus.pl>
In-Reply-To: <CA+G3_POWNK6aypPhUrR7cyLsnVkZayo66nXfrmW1jD7Pa=oeSw@mail.gmail.com>

I'm no expert in parsing, but I would expect Org's parser to be quite
similar to the multitude of Markdown or CommonMark [1] parsers. There
isn't that much difference in syntax, except that Org is perhaps more
versatile and has more syntax elements, such as drawers.

Searching for "EBNF Markdown" I stumbled upon [2].

[1] https://commonmark.org/
[2] http://roopc.net/posts/2014/markdown-cfg/

On 10/26/20 10:00 PM, Tom Gillespie wrote:
> Here is an attempt to clarify my own confusion around the nested
> structures in Org. In short: each node in the headline tree and the
> plain list tree can be parsed using the EBNF, but the nesting level
> cannot, which means that certain useful operations, such as folding,
> require additional rules beyond the grammar. More inline. Best!
> Tom
> 
>> Do you need to? This is valid as an entire Org file, I think:
>>
>> *** foo
>> * bar
>> ***** baz
>>
>> And that can be represented in EBNF. I'm not aware of places where behavior is indent-level specific, except inline tasks, and that edge case can be represented.
> 
> You are correct, and as long as the heading depth doesn't change how
> anything is interpreted, this is a non-issue. The reason I mentioned
> it, though, is that it means you cannot determine how to correctly
> fold an Org file from the grammar alone.
> 
> To make sure I understand: it is possible to determine the number of
> leading stars (and thus the level), but I think it is not possible
> to identify the end of a section. For example:
> 
> * a
> *** b
> ** c
> * d
> 
> You can parse out a 1, b 3, c 2, d 1, but if you want to be able to
> nest b and c inside a without nesting d inside a, then you need a
> stack in there somewhere. You can't have a rule such as
> 
> section : headline content
> content : text | section
> 
> because the parse would incorrectly nest sections of the same level;
> instead you would have to write
> 
> section-level-1 : headline-1 content-1
> content-1 : text | section-level-2-n
> 
> but since we have an arbitrary number of levels, the grammar would
> have to be infinite. This only matters if you want your grammar to
> encode that the content of a section can include other, more deeply
> nested sections, which in this context we almost certainly do not
> (as you point out).
> 
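A minimal Python sketch of the stack Tom describes, under stated assumptions: a headline is taken to be one or more leading stars followed by a space, body text, TODO keywords, tags and inline tasks are ignored, and the names `Section` and `nest_sections` are hypothetical. The per-line part (recognizing a headline and counting its stars) is what the grammar handles; the stack decides where each section ends.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: a headline is one or more '*' followed by at least one space.
# Inline tasks, TODO keywords and tags are deliberately not handled.
HEADLINE = re.compile(r"^(\*+) +(.*)$")


@dataclass
class Section:
    level: int
    title: str
    children: list[Section] = field(default_factory=list)


def nest_sections(lines: list[str]) -> list[Section]:
    """Turn a flat sequence of headlines into a tree using a stack.

    The grammar only tells us each headline's level (its star count);
    the stack supplies the context: a section ends when a headline of
    the same or a shallower level appears.
    """
    roots: list[Section] = []
    stack: list[Section] = []  # currently open sections, deepest last
    for line in lines:
        m = HEADLINE.match(line)
        if m is None:
            continue  # body text is skipped in this sketch
        node = Section(level=len(m.group(1)), title=m.group(2))
        # Close every open section that is not shallower than the new one.
        while stack and stack[-1].level >= node.level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


tree = nest_sections(["* a", "*** b", "** c", "* d"])
print([(t.title, [c.title for c in t.children]) for t in tree])
```

On the a/b/c/d example above this nests b and c under a and leaves d as a second top-level section, printing `[('a', ['b', 'c']), ('d', [])]`.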
>>> There is a similar issue with the indentation level in
>>> order to correctly interpret plain lists.
>>
>> list ::= ('+' string newline)+ sublist?
>> sublist ::= (indent list)+
>>
>> I think this captures lists?
> 
> Ah yes, I see my mistake here. For this to work, the parser has to
> treat whitespace as significant, so indentation cannot be collapsed
> into a single whitespace token. I think everything works out after
> that.
> 
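The same idea carries over to plain lists: if the lexer tracks indentation and emits INDENT/DEDENT tokens (the approach Python's own tokenizer takes), the list rules themselves can stay context-free. A rough sketch, assuming space-only indentation, `-` or `+` bullets and single-line items; `tokenize_list` is a hypothetical name, and this is not how Org itself parses lists.

```python
from __future__ import annotations

import re

# Assumption: items start with '-' or '+' after spaces only (no tabs,
# no numbered bullets, no multi-line item bodies).
ITEM = re.compile(r"^( *)[-+] +(.*)$")


def tokenize_list(lines: list[str]) -> list[tuple[str, str]]:
    """Emit ITEM/INDENT/DEDENT tokens so indentation stays significant.

    With these tokens a context-free grammar along the lines of
        list    ::= (ITEM sublist?)+
        sublist ::= INDENT list DEDENT
    is enough, because the lexer (not the grammar) keeps the
    indentation stack.
    """
    tokens: list[tuple[str, str]] = []
    indents = [0]  # indentation widths of the currently open lists
    for line in lines:
        m = ITEM.match(line)
        if m is None:
            continue
        width = len(m.group(1))
        if width > indents[-1]:
            indents.append(width)
            tokens.append(("INDENT", ""))
        else:
            while width < indents[-1]:
                indents.pop()
                tokens.append(("DEDENT", ""))
            if width != indents[-1]:
                # Dedented to a width that matches no open list.
                raise ValueError(f"inconsistent indentation: {line!r}")
        tokens.append(("ITEM", m.group(2)))
    # Close any lists still open at the end of input.
    tokens.extend(("DEDENT", "") for _ in indents[1:])
    return tokens
```

For example, `["- a", "  - a1", "- b"]` yields ITEM a, INDENT, ITEM a1, DEDENT, ITEM b.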
>> Definitely not able to be represented in EBNF, unless, as you say, {name} is a limited vocabulary.
> 
> Darn those pesky open sets!
> 




Thread overview: 45+ messages
2020-09-15  7:58 Przemysław Kamiński
2020-09-15  8:44 ` Gerry Agbobada
2020-09-16 16:36   ` Matt Huszagh
2020-09-23  8:09   ` Bastien
2020-09-15  9:03 ` Tim Cross
2020-09-15  9:17   ` Przemysław Kamiński
2020-09-15  9:55     ` Russell Adams
2020-09-15 11:15       ` Przemysław Kamiński
2020-09-15 12:37         ` tomas
2020-09-15 18:09           ` Diego Zamboni
2020-09-16 12:09           ` Przemysław Kamiński
2020-09-16 12:20             ` tomas
2020-09-16 12:27             ` Ihor Radchenko
2020-09-16  0:16     ` Tim Cross
2020-09-16  7:24     ` Marcin Borkowski
2020-09-16  7:56       ` Ihor Radchenko
2020-09-16 11:36         ` Przemysław Kamiński
2020-09-16 12:02           ` Ihor Radchenko
2020-09-16 12:15             ` Przemysław Kamiński
2020-09-17  1:18               ` Ihor Radchenko
2020-09-17 15:24                 ` Przemysław Kamiński
2020-09-23  8:09 ` Bastien
2020-09-23 17:46   ` Przemysław Kamiński
2020-09-23 19:50     ` rey-coyrehourcq
2020-11-11  8:58       ` Bastien
2020-10-24 21:12   ` Daniele Nicolodi
2020-10-24 21:35     ` Tom Gillespie
2020-11-11  9:13       ` Bastien
2020-11-12 17:14         ` Tom Gillespie
2020-11-11  9:15     ` Bastien
2020-11-11 13:05       ` Daniele Nicolodi
2020-11-28 19:19       ` Gerry Agbobada
2020-10-26 11:23   ` Ken Mankoff
2020-10-26 14:21     ` Nicolas Goaziou
2020-10-26 16:17       ` Ken Mankoff
2020-10-26 16:24         ` Nicolas Goaziou
2020-10-26 16:47           ` Ken Mankoff
2020-10-26 17:59             ` Tom Gillespie
2020-10-26 20:26               ` Ken Mankoff
2020-10-26 21:00                 ` Tom Gillespie
2020-10-26 21:37                   ` Ken Mankoff
2020-10-26 22:19                     ` Tom Gillespie
2020-10-27  5:42                   ` Przemysław Kamiński [this message]
2020-11-11  8:59             ` Bastien
2020-11-11  9:00         ` Bastien
