emacs-orgmode@gnu.org archives
From: "Jakob Schöttl" <jschoett@gmail.com>
To: Tom Gillespie <tgbugs@gmail.com>
Cc: emacs-orgmode <emacs-orgmode@gnu.org>
Subject: Re: A formal grammar for Org
Date: Tue, 1 Jun 2021 23:22:38 +0200
Message-ID: <13899acc-0760-a772-67d0-50b91cc5d9ac@gmail.com>
In-Reply-To: <CA+G3_POBAB1QX1Zv8q9sjFh4KHUHVmaNXp3XO7__6eoSdXKoyA@mail.gmail.com>



On 01.06.21 at 11:53, Tom Gillespie wrote:
>
>> We have a pretty similar project, org-parser[1]. It's also written in a Lisp dialect, Clojure, but it uses instaparse instead of brag as parser library.
> https://github.com/tgbugs/laundry/tree/next#similar-projects I managed
> to get it into my README as a reminder to myself to have a thorough
> look at it, but have been occupied with other work since then.
Thanks, I'll also add a link to related work in our README.
>> My idea was, to transform the formal grammar to a grammar.js for tree-sitter. It would be so cool, if it could be generated from one formal specification.
> Yes, that would be great. It would be a major step to have a couple of
> grammars for org that can be used for stuff like this and compared to
> each other, along with test cases that we can use to define correct
> behavior.
Right, that would be interesting. But it requires all parsers to yield
exactly the same structure (to be comparable). I think a design goal of
org-parser is to provide an easy-to-use AST, but not necessarily a
100% match to the AST from org-element.el.

How about laundry? Do you try to stick exactly to Org mode's parse
result structure?

> One issue that I don't have a full understanding of at the
> moment is how certain ambiguous forms will impact the ability to
> transform directly into the tree sitter grammar.
>
> The reason I mention
> this is because I have had to move to a two phase parser in order to
> deal with ambiguous parses.
We also have two phases: "parse" and "transform" (the latter is 
basically a mapping function transforming nodes of the AST). I also see 
that as a problem for generating grammar.js.

a) For tree-sitter, depending on what we expect from it, it may not be
necessary to do the second phase. E.g. for syntax highlighting, the
context-free grammar might be enough.

b) Since the transformations of org-parser can be compiled to JS, it might
even be possible to create the grammar.js as a two-phase parser.
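
For illustration, the two phases could look roughly like the sketch below.
It is Python, not code from org-parser or laundry. The names `parse` and
`transform`, the regular expression (which only stands in for a real
grammar-driven parse), and the node shapes are all hypothetical.

```python
import re

# Phase 1: "parse". Recognizes surface syntax only and yields raw nodes.
# A regex stands in for the grammar here; node shapes are hypothetical.
HEADLINE = re.compile(r"^(\*+) (.*)$")

def parse(text):
    """Return a flat list of raw (kind, props) nodes, one per non-empty line."""
    nodes = []
    for line in text.splitlines():
        m = HEADLINE.match(line)
        if m:
            nodes.append(("headline", {"stars": m.group(1), "title": m.group(2)}))
        elif line.strip():
            nodes.append(("text", {"value": line}))
    return nodes

# Phase 2: "transform". A mapping over raw nodes that produces a
# higher-level AST, e.g. turning the run of stars into a numeric level.
def transform(nodes):
    out = []
    for kind, props in nodes:
        if kind == "headline":
            out.append({"type": "headline",
                        "level": len(props["stars"]),
                        "title": props["title"]})
        else:
            out.append({"type": "paragraph", "text": props["value"]})
    return out

ast = transform(parse("* Hello\nsome text\n** Sub"))
```

In this sketch a syntax highlighter would only need the output of `parse`,
which is the situation described in a).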

>> Do you plan, in your parser, to do a transformation step from the raw parser AST to a higher-level AST? E.g. the raw parser AST would parse a (:date  "2021-06-01") and the transformed AST would transform this to a higher-level timestamp object.
> Yes. I already do that to a certain extent in the expander
> https://github.com/tgbugs/laundry/blob/next/laundry/expander.rkt (the
> raw AST is hard to work with directly), but there will be more. I also
> expect that I will add an intermediate step where the AST is
> rearranged to account for aspects of org semantics that cannot be
> captured by the context free part of the grammar.
>
> After that step there are a number of potential conversions, one of which will
> transform the AST into Racket structs, but I haven't made it quite
> that far yet. That said, I think that in terms of defining a canonical
> parse, I am aiming to do that in the transformed intermediate
> s-expression representation because I think it will be easier to
> define the correctness of certain user interactions on that form rather than
> on the higher level object representation, even if the higher level
> objects are ultimately used to actually implement that behavior.
Interesting. Yeah, because things like timestamps have language-specific
representations, they may not be comparable across e.g. Emacs Lisp, Rust,
and Clojure/JS.
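
A small sketch of that difference, in Python and under stated assumptions:
the raw node mirrors the (:date "2021-06-01") example above, and
`to_native` and `to_canonical` are hypothetical helpers, not part of
org-parser or laundry.

```python
import json
from datetime import date

# Raw node as it might come out of the grammar, mirroring (:date "2021-06-01").
raw_node = ["date", "2021-06-01"]

def to_native(node):
    """Language-specific view: a Python date object."""
    kind, value = node
    if kind != "date":
        raise ValueError(f"not a date node: {kind}")
    return date.fromisoformat(value)

def to_canonical(node):
    """Language-neutral view: plain data serialized as JSON text."""
    kind, value = node
    return json.dumps({"type": kind, "value": value}, sort_keys=True)

native = to_native(raw_node)        # only meaningful inside Python
canonical = to_canonical(raw_node)  # comparable across implementations
```

The `native` value has no direct counterpart in another language, while the
`canonical` string can be compared byte for byte against what a different
implementation emits.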
>> Do you have any automated tests for your parser?
> Yes. See https://github.com/tgbugs/laundry/blob/next/laundry/test.rkt
> you can run them from the working directory via =raco test laundry=.
Ah, alright, I didn't see them at first. Wow.

These parser projects are really a huge amount of work times 4 (grammar, 
transformation, tests, re-export) ^^
>
> It would be great to align the grammars and the behavior using a set
> of common test cases.
If it works out that our parsers produce exactly the same resulting
structure, that would be great. But I'm not sure it will, to be
honest. At least we can share each other's mean test.org files ^^
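
One way shared test cases might be used even if the structures differ in
details is sketched below in Python. Everything here is an assumption made
for illustration: the `normalize` and `compare` helpers, the idea that
implementation-specific keys start with an underscore, and the toy parsers
standing in for real ones.

```python
def normalize(tree):
    """Drop keys starting with "_" (assumed to be implementation details)."""
    if isinstance(tree, dict):
        return {k: normalize(v) for k, v in tree.items() if not k.startswith("_")}
    if isinstance(tree, (list, tuple)):
        return [normalize(x) for x in tree]
    return tree

def compare(parsers, cases):
    """Return (source, parser name) pairs whose result differs from expected."""
    failures = []
    for source, expected in cases:
        for name, parse in parsers.items():
            if normalize(parse(source)) != normalize(expected):
                failures.append((source, name))
    return failures

# Toy stand-ins for two real parsers; they differ only in an extra "_pos" key.
def parser_a(src):
    return {"type": "headline", "title": src.lstrip("* "), "_pos": 0}

def parser_b(src):
    return {"type": "headline", "title": src.lstrip("* ")}

cases = [("* Hi", {"type": "headline", "title": "Hi"})]
failures = compare({"a": parser_a, "b": parser_b}, cases)
```

With such a normalization step, parsers would only have to agree on the
shared part of the structure, not on every field.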

Best, Jakob

