emacs-orgmode@gnu.org archives
* A formal grammar for Org
@ 2021-04-04  6:11 Tom Gillespie
  0 siblings, 0 replies; 7+ messages in thread
From: Tom Gillespie @ 2021-04-04  6:11 UTC (permalink / raw)
  To: emacs-orgmode

Dear all,
   Here is a draft of a formal grammar for Org mode [1]. It is still
in a rough state, despite quite a bit of work. However, following some
changes to improve performance for parsing real (big) Org files, I
think it is time to share it with the community so that we can start
to gather feedback. There are a number of opportunities that I have
found for simplifying the org grammar (sometimes by extending it to
make it more regular, and in the process adding useful features) that
are much easier to understand with this grammar in hand as a
reference. The grammar itself is implemented using Racket's #lang brag
(see [2] for an overview of brag's syntax). I had considered trying to
break it up into literate sections in an Org file, but for now decided
to leave it as a single file to simplify the development workflow. As
a result the full implementation is fairly long [3]. Comments and
feedback would be greatly appreciated. Best!
Tom

1. https://github.com/tgbugs/laundry
2. https://docs.racket-lang.org/brag/#%28part._.The_language%29
3. https://github.com/tgbugs/laundry/blob/master/org-mode/parser.rkt



* Re: A formal grammar for Org
       [not found] <d354da93-c910-c7c8-8bb5-51c3877575f9@gmail.com>
@ 2021-06-01  9:53 ` Tom Gillespie
  2021-06-01 21:22   ` Jakob Schöttl
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Gillespie @ 2021-06-01  9:53 UTC (permalink / raw)
  To: Jakob Schöttl; +Cc: emacs-orgmode

Hi Jakob,
    Thank you for getting in touch. I had been meaning to after
someone pointed me to your repo in a reddit thread, but you beat me to
it. Replies in line. Best!
Tom

PS ccing this back to the list for the record.

On Tue, Jun 1, 2021 at 1:56 AM Jakob Schöttl <jschoett@gmail.com> wrote:
>
> Hi Tom,
>
> I came to your post at the mailing list from here:
> https://github.com/gagbo/LuaOrgParser/issues/1
> Sorry, I don't know how to reply on the mailing list when I haven't received the original mail.

No worries, I never managed to figure that out either so I just
subscribed. Maybe by matching the subject as you do here and ccing the
list (attempting it in this email to see what happens)?

> We have a pretty similar project, org-parser[1]. It's also written in a Lisp dialect, Clojure, but it uses instaparse instead of brag as parser library.

https://github.com/tgbugs/laundry/tree/next#similar-projects I managed
to get it into my README as a reminder to myself to have a thorough
look at it, but have been occupied with other work since then.

> My idea was to transform the formal grammar into a grammar.js for tree-sitter. It would be so cool if it could be generated from one formal specification.

Yes, that would be great. It would be a major step to have a couple of
grammars for org that can be used for stuff like this and compared to
each other, along with test cases that we can use to define correct
behavior.

One issue that I don't have a full understanding of at the
moment is how certain ambiguous forms will impact the ability to
transform directly into the tree-sitter grammar. The reason I mention
this is that I have had to move to a two-phase parser in order to
deal with ambiguous parses. Having not looked carefully at your
approach I don't know whether you have encountered similar issues. For
the tree-sitter use case in particular I'm not entirely sure that the
ambiguity matters, but I haven't had a chance to look at it yet.

> Do you plan, in your parser, to do a transformation step from the raw parser AST to a higher-level AST? E.g. the raw parser AST would contain a (:date  "2021-06-01") node and the transformation step would turn this into a higher-level timestamp object.

Yes. I already do that to a certain extent in the expander
https://github.com/tgbugs/laundry/blob/next/laundry/expander.rkt (the
raw AST is hard to work with directly), but there will be more. I also
expect that I will add an intermediate step where the AST is
rearranged to account for aspects of org semantics that cannot be
captured by the context free part of the grammar.

After that step there are a number of potential conversions, one of
which will transform the AST into Racket structs, but I haven't made it
quite that far yet. That said, in terms of defining a canonical parse,
I am aiming to do that in the transformed intermediate s-expression
representation because I think it will be easier to define the
correctness of certain user interactions on that form than on the
higher-level object representation, even if the higher-level
objects are ultimately used to actually implement that behavior.
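
To make the raw-to-higher-level step concrete, here is a minimal Python sketch. It is not laundry's or org-parser's actual code: it assumes the raw AST is a nested list whose first element is a tag string, mirroring the (:date "2021-06-01") example from the question, and the `lift` helper and `Timestamp` class are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical higher-level object standing in for a parsed date."""
    day: date


def lift(node):
    """Recursively replace raw ["date", "YYYY-MM-DD"] nodes with Timestamp objects.

    Assumption: the raw AST is made of nested lists whose first element is a tag.
    """
    if not isinstance(node, list) or not node:
        return node  # leaf (plain string) or empty list: nothing to transform
    tag, *children = node
    if tag == "date" and len(children) == 1 and isinstance(children[0], str):
        return Timestamp(date.fromisoformat(children[0]))
    # Any other node keeps its tag; only its children are transformed.
    return [tag, *(lift(child) for child in children)]


raw = ["paragraph", "Due on ", ["date", "2021-06-01"]]
print(lift(raw))
```

The raw nested-list form stays plain data that any language can represent, while the `Timestamp` object is specific to the host language, which is one reason to define a canonical parse on the former.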

> Do you have any automated tests for your parser?

Yes. See https://github.com/tgbugs/laundry/blob/next/laundry/test.rkt
(you can run them from the working directory via =raco test laundry=).
I haven't fully specified the expected AST (and transforms) in most
cases because I'm still hammering out details. In some cases I do
specify the parse that I expect, e.g. for headings I specify when
tags are expected in cases where there might be some ambiguity. If you
are looking for edge cases there are a number that are not yet in the
automated tests but that are in
https://github.com/tgbugs/laundry/blob/next/laundry/cursed.org because
they hit on some cases of extreme ambiguity and internal inconsistency
in the elisp implementation or on weird behavior under user
interaction (I also have some other test cases that haven't been
committed to the repo yet).
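
As an illustration of the kind of heading ambiguity meant here, the following Python sketch splits a headline into title and tags. It is a simplification, not the behavior of laundry or of the elisp implementation: it assumes tags are a trailing colon-delimited run of words made of alphanumerics, `_`, `@`, `#`, and `%`, preceded by whitespace, and it ignores TODO keywords and priorities. The `split_heading` helper is a hypothetical name.

```python
import re

# Assumption: a heading is one or more stars, whitespace, then the rest of the line.
HEADING_RE = re.compile(r"^(?P<stars>\*+)[ \t]+(?P<rest>.*)$")
# Assumption: a trailing run such as ":a:b:" preceded by whitespace is a tag list.
TAGS_RE = re.compile(
    r"^(?P<title>.*?)[ \t]+:(?P<tags>[\w@#%]+(?::[\w@#%]+)*):[ \t]*$"
)


def split_heading(line):
    """Return level, title, and tags for a heading line, or None otherwise."""
    heading = HEADING_RE.match(line)
    if heading is None:
        return None
    level = len(heading.group("stars"))
    rest = heading.group("rest")
    tagged = TAGS_RE.match(rest)
    if tagged is None:
        # Colons that do not form a well-shaped trailing tag list stay in the title.
        return {"level": level, "title": rest.strip(), "tags": []}
    return {
        "level": level,
        "title": tagged.group("title").strip(),
        "tags": tagged.group("tags").split(":"),
    }


for example in ("* Buy milk :home:errand:", "** Note: see this", "* Title :a b:"):
    print(example, "->", split_heading(example))
```

Under these assumptions only the first example yields tags; in the other two the colons remain part of the title, which is the sort of decision a shared test case would need to pin down.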

It would be great to align the grammars and the behavior using a set
of common test cases.



* Re: A formal grammar for Org
  2021-06-01  9:53 ` A formal grammar for Org Tom Gillespie
@ 2021-06-01 21:22   ` Jakob Schöttl
  2021-06-02  4:00     ` David Masterson
  0 siblings, 1 reply; 7+ messages in thread
From: Jakob Schöttl @ 2021-06-01 21:22 UTC (permalink / raw)
  To: Tom Gillespie; +Cc: emacs-orgmode



On 01.06.21 at 11:53, Tom Gillespie wrote:
>
>> We have a pretty similar project, org-parser[1]. It's also written in a Lisp dialect, Clojure, but it uses instaparse instead of brag as parser library.
> https://github.com/tgbugs/laundry/tree/next#similar-projects I managed
> to get it into my README as a reminder to myself to have a thorough
> look at it, but have been occupied with other work since then.
Thanks, I'll also set a link in our README to related work.
>> My idea was to transform the formal grammar into a grammar.js for tree-sitter. It would be so cool if it could be generated from one formal specification.
> Yes, that would be great. It would be a major step to have a couple of
> grammars for org that can be used for stuff like this and compared to
> each other, along with test cases that we can use to define correct
> behavior.
Right, that would be interesting. But it requires all parsers to yield
exactly the same structure (to be comparable). I think a design goal of
org-parser is to provide an easy-to-use AST but not necessarily a
100% match to the AST from org-element.el.

How is it with laundry? Do you try to stick exactly to Org mode's parse
result structure?

> One issue that I don't have a full understanding of at the
> moment is how certain ambiguous forms will impact the ability to
> transform directly into the tree sitter grammar.
>
> The reason I mention
> this is because I have had to move to a two phase parser in order to
> deal with ambiguous parses.
We also have two phases: "parse" and "transform" (the latter is 
basically a mapping function transforming nodes of the AST). I also see 
that as a problem for generating grammar.js.

a) For tree-sitter, depending on what we expect from it, it may not be
necessary to do the second phase. E.g. for syntax highlighting the
context-free grammar might be enough.

b) Since the transformations of org-parser can be compiled to JS, it might
even be possible to create the grammar.js as a two-phase parser.

>> Do you plan, in your parser, to do a transformation step from the raw parser AST to a higher-level AST? E.g. the raw parser AST would contain a (:date  "2021-06-01") node and the transformation step would turn this into a higher-level timestamp object.
> Yes. I already do that to a certain extent in the expander
> https://github.com/tgbugs/laundry/blob/next/laundry/expander.rkt (the
> raw AST is hard to work with directly), but there will be more. I also
> expect that I will add an intermediate step where the AST is
> rearranged to account for aspects of org semantics that cannot be
> captured by the context free part of the grammar.
>
> After that step there are a number of potential conversions, one of which will
> transform the AST into Racket structs, but I haven't made it quite
> that far yet. That said, I think that in terms of defining a canonical
> parse, I am aiming to do that in the transformed intermediate
> s-expression representation because I think it will be easier to
> define the correctness of certain user interactions on that form rather than
> on the higher level object representation, even if the higher level
> objects are ultimately used to actually implement that behavior.
Interesting. Yeah, because things like timestamps have language-specific
representations that may not be comparable across e.g. Emacs Lisp, Rust,
and Clojure/JS.
>> Do you have any automated tests for your parser?
> Yes. See https://github.com/tgbugs/laundry/blob/next/laundry/test.rkt
> you can run them from the working directory via =raco test laundry=.
Ah, alright, I didn't see them at first. Wow.

These parser projects are really a huge amount of work times 4 (grammar, 
transformation, tests, re-export) ^^
>
> It would be great to align the grammars and the behavior using a set
> of common test cases.
If it works out that our parsers have exactly the same resulting
structure, that would be great. But I'm not sure that it will, to be
honest. At least we can share each other's mean test.org files ^^
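
One way to picture sharing test cases without requiring identical ASTs is sketched below in Python: each parser gets a small adapter that normalizes its output to plain nested lists, and only the normalized forms are compared against the shared expectations. The two toy parsers and their output shapes are invented for illustration and do not reflect the real output of laundry or org-parser.

```python
def parser_a(text):
    """Toy parser A (hypothetical): emits (tag, payload) tuples."""
    nodes = []
    for line in text.splitlines():
        if line.startswith("* "):
            nodes.append(("headline", line[2:]))
        elif line:
            nodes.append(("paragraph", line))
    return nodes


def parser_b(text):
    """Toy parser B (hypothetical): emits dicts with different names."""
    nodes = []
    for line in text.splitlines():
        if line.startswith("* "):
            nodes.append({"type": "heading", "text": line[2:]})
        elif line:
            nodes.append({"type": "para", "text": line})
    return nodes


def normalize_a(nodes):
    # Tuples become lists so the result is plain, language-neutral data.
    return [[tag, payload] for tag, payload in nodes]


def normalize_b(nodes):
    # Map parser B's node names onto the shared vocabulary.
    rename = {"heading": "headline", "para": "paragraph"}
    return [[rename[node["type"]], node["text"]] for node in nodes]


# Shared cases: Org source text mapped to the expected normalized structure.
SHARED_CASES = {
    "* Title\nBody text": [["headline", "Title"], ["paragraph", "Body text"]],
}


def check_all():
    """Return a list of (parser name, source) pairs that disagree with the cases."""
    failures = []
    adapters = (("a", parser_a, normalize_a), ("b", parser_b, normalize_b))
    for source, expected in SHARED_CASES.items():
        for name, parse, normalize in adapters:
            if normalize(parse(source)) != expected:
                failures.append((name, source))
    return failures


print(check_all())
```

With this arrangement each project keeps its own AST shape, and only the adapters have to agree on the shared vocabulary.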

Best, Jakob



* Re: A formal grammar for Org
  2021-06-01 21:22   ` Jakob Schöttl
@ 2021-06-02  4:00     ` David Masterson
  2021-06-02  8:22       ` Jakob Schöttl
  0 siblings, 1 reply; 7+ messages in thread
From: David Masterson @ 2021-06-02  4:00 UTC (permalink / raw)
  To: Jakob Schöttl; +Cc: Tom Gillespie, emacs-orgmode

Jakob Schöttl <jschoett@gmail.com> writes:

> On 01.06.21 at 11:53, Tom Gillespie wrote:
>>
>>> We have a pretty similar project, org-parser[1]. It's also written
>>> in a Lisp dialect, Clojure, but it uses instaparse instead of brag
>>> as parser library.
>> https://github.com/tgbugs/laundry/tree/next#similar-projects I managed
>> to get it into my README as a reminder to myself to have a thorough
>> look at it, but have been occupied with other work since then.
> Thanks, I'll also set a link in our README to related work.

Have either (or both) of you looked at BeOrg (http://beorg.app)?  This
is an (iOS) app that implements task management from Org files by
reading and updating the Org file structure.  I would assume it uses a
parser to break down the Org file structure and rebuild it later.  That
is what I see your parsers becoming.

-- 
David Masterson



* Re: A formal grammar for Org
  2021-06-02  4:00     ` David Masterson
@ 2021-06-02  8:22       ` Jakob Schöttl
  2021-06-03  2:36         ` David Masterson
  2021-06-03 20:32         ` Jean Louis
  0 siblings, 2 replies; 7+ messages in thread
From: Jakob Schöttl @ 2021-06-02  8:22 UTC (permalink / raw)
  To: David Masterson; +Cc: Tom Gillespie, emacs-orgmode



On 02.06.21 at 06:00, David Masterson wrote:
> Jakob Schöttl <jschoett@gmail.com> writes:
>
>> On 01.06.21 at 11:53, Tom Gillespie wrote:
>>>> We have a pretty similar project, org-parser[1]. It's also written
>>>> in a Lisp dialect, Clojure, but it uses instaparse instead of brag
>>>> as parser library.
>>> https://github.com/tgbugs/laundry/tree/next#similar-projects I managed
>>> to get it into my README as a reminder to myself to have a thorough
>>> look at it, but have been occupied with other work since then.
>> Thanks, I'll also set a link in our README to related work.
> Have either (or both) of you looked at BeOrg (http://beorg.app)?  This
> is an (iOS) app that implements task management from Org files by
> reading and updating the Org file structure.  I would assume it uses a
> parser to breakdown the Org file structure and rebuild it later.  That
> is what I see your parsers becoming.
I haven't tried BeOrg myself, but it's proprietary and we have an open
source, platform-independent alternative with Organice. See also
https://github.com/200ok-ch/organice#beorg

org-parser is also open source and will eventually replace Organice's
somewhat hacky parser as a library.

Regards, Jakob




* Re: A formal grammar for Org
  2021-06-02  8:22       ` Jakob Schöttl
@ 2021-06-03  2:36         ` David Masterson
  2021-06-03 20:32         ` Jean Louis
  1 sibling, 0 replies; 7+ messages in thread
From: David Masterson @ 2021-06-03  2:36 UTC (permalink / raw)
  To: Jakob Schöttl; +Cc: Tom Gillespie, emacs-orgmode

Jakob Schöttl <jschoett@gmail.com> writes:

> On 02.06.21 at 06:00, David Masterson wrote:
>> Jakob Schöttl <jschoett@gmail.com> writes:
>>
>>> On 01.06.21 at 11:53, Tom Gillespie wrote:
>>>>> We have a pretty similar project, org-parser[1]. It's also written
>>>>> in a Lisp dialect, Clojure, but it uses instaparse instead of brag
>>>>> as parser library.
>>>> https://github.com/tgbugs/laundry/tree/next#similar-projects I managed
>>>> to get it into my README as a reminder to myself to have a thorough
>>>> look at it, but have been occupied with other work since then.
>>> Thanks, I'll also set a link in our README to related work.
>> Have either (or both) of you looked at BeOrg (http://beorg.app)?  This
>> is an (iOS) app that implements task management from Org files by
>> reading and updating the Org file structure.  I would assume it uses a
>> parser to breakdown the Org file structure and rebuild it later.  That
>> is what I see your parsers becoming.
> I haven't tried BeOrg myself, but it's proprietary and we have an open
> source, platform-independent alternative with Organice. See also 
> https://github.com/200ok-ch/organice#beorg
>
> org-parser is also open source and will finally replace Organice's
> somewhat hacky Parser as a library.

Nice (no pun intended)

-- 
David Masterson



* Re: A formal grammar for Org
  2021-06-02  8:22       ` Jakob Schöttl
  2021-06-03  2:36         ` David Masterson
@ 2021-06-03 20:32         ` Jean Louis
  1 sibling, 0 replies; 7+ messages in thread
From: Jean Louis @ 2021-06-03 20:32 UTC (permalink / raw)
  To: Jakob Schöttl; +Cc: Tom Gillespie, emacs-orgmode, David Masterson

* Jakob Schöttl <jschoett@gmail.com> [2021-06-02 11:23]:
> I haven't tried BeOrg myself, but it's proprietary and we have an open
> source, platform-independent alternative with Organice. See also
> https://github.com/200ok-ch/organice#beorg

Since "Org" came first, it would be better if its author asked the
company to stop using the trademark. A trademark need not be
registered, but it is even better when it is.

Being asked to stop using "Org", as it is a trademark in commerce
(software is commerce), might give them an incentive to make it free
software too.


Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

Sign an open letter in support of Richard M. Stallman
https://stallmansupport.org/

