From: Tom Gillespie <tgbugs@gmail.com>
To: "Jakob Schöttl" <jschoett@gmail.com>
Cc: emacs-orgmode <emacs-orgmode@gnu.org>
Subject: Re: A formal grammar for Org
Date: Tue, 1 Jun 2021 02:53:15 -0700
Message-ID: <CA+G3_POBAB1QX1Zv8q9sjFh4KHUHVmaNXp3XO7__6eoSdXKoyA@mail.gmail.com>
In-Reply-To: <d354da93-c910-c7c8-8bb5-51c3877575f9@gmail.com>

Hi Jakob,
    Thank you for getting in touch. I had been meaning to after
someone pointed me to your repo in a reddit thread, but you beat me to
it. Replies inline. Best!

PS: CCing this back to the list for the record.

On Tue, Jun 1, 2021 at 1:56 AM Jakob Schöttl <jschoett@gmail.com> wrote:
> Hi Tom,
> I came to your post at the mailing list from here:
> https://github.com/gagbo/LuaOrgParser/issues/1
> Sorry, I don't know, how I can answer on the mailing list when I don't have received the original mail.

No worries, I never managed to figure that out either, so I just
subscribed. Maybe it works to match the subject, as you did here, and
CC the list (I am attempting that in this email to see what happens)?

> We have a pretty similar project, org-parser[1]. It's also written in a Lisp dialect, Clojure, but it uses instaparse instead of brag as parser library.

I managed to get it into my README
(https://github.com/tgbugs/laundry/tree/next#similar-projects) as a
reminder to myself to have a thorough look at it, but I have been
occupied with other work since then.

> My idea was, to transform the formal grammar to a grammar.js for tree-sitter. It would be so cool, if it could be generated from one formal specification.

Yes, that would be great. It would be a major step to have a couple of
grammars for org that can be used for stuff like this and compared to
each other, along with test cases that we can use to define correct
behavior.

One issue that I don't yet fully understand is how certain ambiguous
forms will affect the ability to transform the grammar directly into a
tree-sitter grammar. The reason I mention this is that I have had to
move to a two-phase parser in order to deal with ambiguous parses.
Having not looked carefully at your approach, I don't know whether you
have encountered similar issues. For the tree-sitter use case in
particular I'm not entirely sure that the ambiguity matters, but I
haven't had a chance to look at it yet.
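
To illustrate the general idea of a second phase resolving what a
permissive first phase leaves open, here is a minimal Python sketch. It
is not taken from laundry or org-parser. The names parse_heading,
HEADING, and TAGS are hypothetical, and the tag rule is a deliberate
simplification of what Org actually accepts.

```python
import re

# Phase 1: permissive split of a heading line into stars and "the rest".
HEADING = re.compile(r"^(?P<stars>\*+) (?P<rest>.*)$")

# Phase 2: decide whether the rest ends in a tag group such as :a:b:
# (simplified assumption: tags need leading whitespace and use \w, @, #, %).
TAGS = re.compile(r"^(?P<title>.*?)[ \t]+(?P<tags>:(?:[\w@#%]+:)+)[ \t]*$")


def parse_heading(line):
    """Return a dict for a heading line, or None if it is not a heading."""
    m = HEADING.match(line)
    if m is None:
        return None
    level = len(m.group("stars"))
    rest = m.group("rest")
    t = TAGS.match(rest)
    if t is None:
        # No trailing tag group: everything after the stars is the title.
        return {"level": level, "title": rest.strip(), "tags": []}
    tags = t.group("tags").strip(":").split(":")
    return {"level": level, "title": t.group("title").strip(), "tags": tags}
```

With this sketch, "* Hello world :a:b:" yields the tags a and b, while
"* Time is 12:30:" keeps the colons in the title because the trailing
token does not start with a colon after whitespace.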

> Do you plan, in your parser, to do a transformation step from the raw parser AST to a higher-level AST? E.g. the raw parser AST would parse a (:date  "2021-06-01") and the transformed AST would transform this to a higher-level timestamp object.

Yes. I already do that to a certain extent in the expander
https://github.com/tgbugs/laundry/blob/next/laundry/expander.rkt (the
raw AST is hard to work with directly), but there will be more. I also
expect that I will add an intermediate step where the AST is
rearranged to account for aspects of Org semantics that cannot be
captured by the context-free part of the grammar.

After that step there are a number of potential conversions, one of
which will transform the AST into Racket structs, but I haven't made it
quite that far yet. That said, in terms of defining a canonical parse,
I am aiming to do that in the transformed intermediate s-expression
representation, because I think it will be easier to define the
correctness of certain user interactions on that form than on the
higher-level object representation, even if the higher-level objects
are ultimately used to actually implement that behavior.
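
The kind of raw-to-higher-level transformation discussed above can be
sketched in a few lines of Python. This is only an illustration built
on the (:date "2021-06-01") example from the question; the tuple
encoding of the raw AST and the names Timestamp and lift are
assumptions, not the representation used by laundry or org-parser.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical higher-level object for a parsed date."""
    day: date


def lift(node):
    """Recursively replace raw ("date", "YYYY-MM-DD") nodes with Timestamp."""
    if not isinstance(node, tuple) or not node:
        return node  # strings and other leaves pass through unchanged
    tag, *children = node
    if tag == "date" and len(children) == 1 and isinstance(children[0], str):
        return Timestamp(date.fromisoformat(children[0]))
    return (tag, *[lift(child) for child in children])


# Raw AST as nested tuples (an assumed encoding of the s-expression form).
raw = ("paragraph", "Due ", ("date", "2021-06-01"))
lifted = lift(raw)
```

Keeping the untransformed tuple form around alongside the lifted one is
what makes it possible to define correctness on the intermediate
representation and still build objects from it later.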

> Do you have any automated tests for your parser?

Yes. See https://github.com/tgbugs/laundry/blob/next/laundry/test.rkt.
You can run them from the working directory via =raco test laundry=.
I haven't fully specified the expected AST (and transforms) in most
cases because I'm still hammering out details. In some cases I do
specify the parse that I expect, e.g. for headings I specify when
tags are expected in cases where there might be some ambiguity. If you
are looking for edge cases there are a number that are not yet in the
automated tests but that are in
https://github.com/tgbugs/laundry/blob/next/laundry/cursed.org because
they hit on some cases of extreme ambiguity and internal inconsistency
in the elisp implementation or on weird behavior under user
interaction (I also have some other test cases that haven't been
committed to the repo yet).

It would be great to align the grammars and the behavior using a set
of common test cases.
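
One possible shape for such a shared suite, sketched in Python under
the assumption that each implementation can be wrapped as a function
from Org text to a comparable tree. The function compare_parsers, the
case format, and the two toy parsers are all hypothetical and exist
only to show the mechanism.

```python
def compare_parsers(cases, parsers):
    """Run every parser on every case and collect disagreements.

    cases:   dict mapping a case name to (org_text, expected_tree)
    parsers: dict mapping a parser name to a callable text -> tree
    """
    failures = []
    for case_name, (text, expected) in cases.items():
        for parser_name, parse in parsers.items():
            actual = parse(text)
            if actual != expected:
                failures.append((case_name, parser_name, actual))
    return failures


def toy_a(text):
    # Toy stand-in for one implementation: counts the leading stars.
    stars, _, title = text.partition(" ")
    return ("heading", len(stars), title)


def toy_b(text):
    # Toy stand-in with a deliberate bug: it drops the heading level.
    return ("heading", 0, text.lstrip("* "))


cases = {"plain heading": ("* Hello", ("heading", 1, "Hello"))}
failures = compare_parsers(cases, {"toy_a": toy_a, "toy_b": toy_b})
```

Here failures contains a single entry naming toy_b, which is the kind
of report that would show where two grammars diverge on a shared case.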
