From: Tom Gillespie
Date: Tue, 1 Jun 2021 02:53:15 -0700
Subject: Re: A formal grammar for Org
To: Jakob Schöttl
Cc: emacs-orgmode
List-Id: "General discussions about Org-mode."

Hi Jakob,

Thank you for getting in touch. I had been meaning to after someone
pointed me to your repo in a reddit thread, but you beat me to it.
Replies in line.

Best!
Tom

PS ccing this back to the list for the record.

On Tue, Jun 1, 2021 at 1:56 AM Jakob Schöttl wrote:
>
> Hi Tom,
>
> I came to your post at the mailing list from here:
> https://github.com/gagbo/LuaOrgParser/issues/1
> Sorry, I don't know, how I can answer on the mailing list when I
> don't have received the original mail.

No worries, I never managed to figure that out either so I just
subscribed. Maybe by matching the subject as you do here and ccing
the list (attempting it in this email to see what happens)?

> We have a pretty similar project, org-parser[1]. It's also written
> in a Lisp dialect, Clojure, but it uses instaparse instead of brag
> as parser library.

https://github.com/tgbugs/laundry/tree/next#similar-projects
I managed to get it into my README as a reminder to myself to have
a thorough look at it, but have been occupied with other work since
then.

> My idea was, to transform the formal grammar to a grammar.js for
> tree-sitter. It would be so cool, if it could be generated from one
> formal specification.

Yes, that would be great. It would be a major step to have a couple
of grammars for org that can be used for stuff like this and compared
to each other, along with test cases that we can use to define
correct behavior.

One issue that I don't have a full understanding of at the moment is
how certain ambiguous forms will impact the ability to transform
directly into the tree sitter grammar. The reason I mention this is
because I have had to move to a two phase parser in order to deal
with ambiguous parses.
Having not looked carefully at your approach I don't know whether you
have encountered similar issues. For the tree sitter use case in
particular I'm not entirely sure that the ambiguity matters, but I
haven't had a chance to look at it yet.

> Do you plan, in your parser, to do a transformation step from the
> raw parser AST to a higher-level AST? E.g. the raw parser AST would
> parse a (:date "2021-06-01") and the transformed AST would transform
> this to a higher-level timestamp object.

Yes. I already do that to a certain extent in the expander
https://github.com/tgbugs/laundry/blob/next/laundry/expander.rkt
(the raw AST is hard to work with directly), but there will be more.
I also expect that I will add an intermediate step where the AST is
rearranged to account for aspects of org semantics that cannot be
captured by the context free part of the grammar.

After that step there are a number of potential conversions, one of
which will transform the AST into Racket structs, but I haven't made
it quite that far yet. That said, I think that in terms of defining a
canonical parse, I am aiming to do that in the transformed
intermediate s-expression representation because I think it will be
easier to define the correctness of certain user interactions on that
form rather than on the higher level object representation, even if
the higher level objects are ultimately used to actually implement
that behavior.

> Do you have any automated tests for your parser?

Yes. See
https://github.com/tgbugs/laundry/blob/next/laundry/test.rkt
You can run them from the working directory via =raco test laundry=.
I haven't fully specified the expected AST (and transforms) in most
cases because I'm still hammering out details. In some cases I do
specify the parse that I expect, e.g. for headings I specify when
tags are expected in cases where there might be some ambiguity.

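For readers following the thread, here is a rough sketch of the kind of
raw-to-higher-level transformation step discussed above. It is written in
Python purely for illustration and is not code from laundry (Racket) or
org-parser (Clojure). The tuple shape of the raw nodes, the "timestamp" tag,
and the function name transform are all assumptions made up for this example.

```python
from datetime import date


def transform(node):
    """Recursively rewrite a raw parse tree into a higher-level one.

    Raw nodes are assumed to be tuples of the form (tag, child, ...);
    anything that is not a tuple is treated as a leaf and returned as-is.
    """
    if not isinstance(node, tuple):
        return node
    tag, *children = node
    if tag == "date" and len(children) == 1 and isinstance(children[0], str):
        # Hypothetical rule: replace the raw date string with a date object.
        return ("timestamp", date.fromisoformat(children[0]))
    # Every other node keeps its tag and gets its children transformed.
    return (tag, *(transform(child) for child in children))


raw = ("headline", ("title", "Meeting"), ("date", "2021-06-01"))
print(transform(raw))
```

Keeping the transformed tree as plain nested tuples mirrors the idea of an
intermediate s-expression form. Converting it into dedicated objects would
be a separate, later step.
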
If you are looking for edge cases, there are a number that are not yet
in the automated tests but that are in
https://github.com/tgbugs/laundry/blob/next/laundry/cursed.org
They are there because they hit on some cases of extreme ambiguity and
internal inconsistency in the elisp implementation, or on weird
behavior under user interaction. (I also have some other test cases
that haven't been committed to the repo yet.) It would be great to
align the grammars and the behavior using a set of common test cases.
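
One possible shape for such a set of common test cases is sketched below in
Python: a plain table of inputs and expected results that any implementation
can be run against. Nothing here comes from either project. The case list,
the toy_tags stand-in parser, and the run_cases helper are hypothetical, and
the cases are limited to heading lines whose tag parse should not be
controversial.

```python
import re

# Hypothetical shared cases: (heading line, expected tags).
# Genuinely ambiguous inputs would need agreement before being listed.
CASES = [
    ("* Heading", []),
    ("* Heading :a:b:", ["a", "b"]),
    ("* Heading :a: trailing text", []),
]

# Tags are taken to be a colon-delimited run at the very end of the line,
# preceded by whitespace.
TAGS = re.compile(r"[ \t]+:([\w@#%:]+):[ \t]*$")


def toy_tags(line):
    """Stand-in parser: return the trailing tags of a heading line."""
    match = TAGS.search(line)
    return match.group(1).split(":") if match else []


def run_cases(parsers, cases):
    """Return (parser name, input, expected, actual) for every mismatch."""
    failures = []
    for name, parse in parsers.items():
        for text, expected in cases:
            actual = parse(text)
            if actual != expected:
                failures.append((name, text, expected, actual))
    return failures


print(run_cases({"toy": toy_tags}, CASES))
```

Each real parser would be wrapped in a small adapter with the same call
signature as toy_tags, so that disagreements between implementations show up
as entries in the returned list.
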