emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: Tom Gillespie <tgbugs@gmail.com>
To: Ken Mankoff <mankoff@gmail.com>
Cc: Bastien <bzg@gnu.org>, "Przemysław Kamiński" <pk@intrepidus.pl>,
	emacs-orgmode <emacs-orgmode@gnu.org>,
	"Nicolas Goaziou" <mail@nicolasgoaziou.fr>
Subject: Re: official orgmode parser
Date: Mon, 26 Oct 2020 17:00:17 -0400	[thread overview]
Message-ID: <CA+G3_POWNK6aypPhUrR7cyLsnVkZayo66nXfrmW1jD7Pa=oeSw@mail.gmail.com> (raw)
In-Reply-To: <874kmgka3l.fsf@gmail.com>

Here is an attempt to clarify my own confusion around the nested
structures in org. In short: each node in the headline tree and the
plain list tree can be parse using the EBNF, the nesting level cannot,
which means that certain useful operations such as folding, require
additional rules beyond the grammar. More in line. Best!
Tom

> Do you need to? This is valid as an entire Org file, I think:
>
> *** foo
> * bar
> ***** baz
>
> And that can be represented in EBNF. I'm not aware of places where behavior is indent-level specific, except inline tasks, and that edge case can be represented.

You are correct, and as long as the heading depth doesn't change some
interpretation then this is a non-issue. The reason I mentioned this
though is
because it means that you cannot determine how to correctly fold an
org file from the grammar alone.

To make sure I understand. It is possible to determine the number of
leading stars (and thus the level), but I think that it is not
possible to identify the end of a section.
For example

* a
*** b
** c
* d

You can parse out a 1, b 3, c 2, d 1, but if you want to be able to
nest b and c inside a but not nest d inside a, then you need a stack
in there somewhere. You
can't have a rule such as

section : headline content
content : text | section

because the parse would incorrectly nest sections at the same level,
you would have to write

section-level-1 : headline-1 content-1
content-1 : text | section-level-2-n

but since we have an arbitrary number of levels the grammar would have
to be infinite.
This is only if you want your grammar to be able to encode that the
content of sections
can include other more deeply nested sections, which in this context
we almost certainly
do not (as you point out).

> > There is a similar issue with the indentation level in
> > order to correctly interpret plain lists.
>
> list ::= ('+' string newline)+ sublist?
> sublist ::= (indent list)+
>
> I think this captures lists?

Ah yes, I see my mistake here. In order for this to work the parser
has to implement significant whitespace,
so whitespace cannot be parsed into a single token. I think everything
works out after that.

> Definitely not able to be represented in EBNF, unless as you say {name} is a limited vocabulary.

Darn those pesky open sets!


  reply	other threads:[~2020-10-26 21:01 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-15  7:58 Przemysław Kamiński
2020-09-15  8:44 ` Gerry Agbobada
2020-09-16 16:36   ` Matt Huszagh
2020-09-23  8:09   ` Bastien
2020-09-15  9:03 ` Tim Cross
2020-09-15  9:17   ` Przemysław Kamiński
2020-09-15  9:55     ` Russell Adams
2020-09-15 11:15       ` Przemysław Kamiński
2020-09-15 12:37         ` tomas
2020-09-15 18:09           ` Diego Zamboni
2020-09-16 12:09           ` Przemysław Kamiński
2020-09-16 12:20             ` tomas
2020-09-16 12:27             ` Ihor Radchenko
2020-09-16  0:16     ` Tim Cross
2020-09-16  7:24     ` Marcin Borkowski
2020-09-16  7:56       ` Ihor Radchenko
2020-09-16 11:36         ` Przemysław Kamiński
2020-09-16 12:02           ` Ihor Radchenko
2020-09-16 12:15             ` Przemysław Kamiński
2020-09-17  1:18               ` Ihor Radchenko
2020-09-17 15:24                 ` Przemysław Kamiński
2020-09-23  8:09 ` Bastien
2020-09-23 17:46   ` Przemysław Kamiński
2020-09-23 19:50     ` rey-coyrehourcq
2020-11-11  8:58       ` Bastien
2020-10-24 21:12   ` Daniele Nicolodi
2020-10-24 21:35     ` Tom Gillespie
2020-11-11  9:13       ` Bastien
2020-11-12 17:14         ` Tom Gillespie
2020-11-11  9:15     ` Bastien
2020-11-11 13:05       ` Daniele Nicolodi
2020-11-28 19:19       ` Gerry Agbobada
2020-10-26 11:23   ` Ken Mankoff
2020-10-26 14:21     ` Nicolas Goaziou
2020-10-26 16:17       ` Ken Mankoff
2020-10-26 16:24         ` Nicolas Goaziou
2020-10-26 16:47           ` Ken Mankoff
2020-10-26 17:59             ` Tom Gillespie
2020-10-26 20:26               ` Ken Mankoff
2020-10-26 21:00                 ` Tom Gillespie [this message]
2020-10-26 21:37                   ` Ken Mankoff
2020-10-26 22:19                     ` Tom Gillespie
2020-10-27  5:42                   ` Przemysław Kamiński
2020-11-11  8:59             ` Bastien
2020-11-11  9:00         ` Bastien

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+G3_POWNK6aypPhUrR7cyLsnVkZayo66nXfrmW1jD7Pa=oeSw@mail.gmail.com' \
    --to=tgbugs@gmail.com \
    --cc=bzg@gnu.org \
    --cc=emacs-orgmode@gnu.org \
    --cc=mail@nicolasgoaziou.fr \
    --cc=mankoff@gmail.com \
    --cc=pk@intrepidus.pl \
    --subject='Re: official orgmode parser' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).