Subject: Re: official orgmode parser
To: emacs-orgmode@gnu.org
From: Przemysław Kamiński
Date: Tue, 27 Oct 2020 06:42:06 +0100

I'm no expert in parsing but I would expect org's parser to be quite
similar to the multitude of markdown or CommonMark [1] parsers. There
isn't that much difference in syntax, except maybe org is more versatile
and has more syntax elements, like drawers.

Searching for "EBNF Markdown" I stumbled upon [2].

[1] https://commonmark.org/
[2] http://roopc.net/posts/2014/markdown-cfg/

On 10/26/20 10:00 PM, Tom Gillespie wrote:
> Here is an attempt to clarify my own confusion around the nested
> structures in org. In short: each node in the headline tree and the
> plain list tree can be parsed using the EBNF, but the nesting level
> cannot, which means that certain useful operations, such as folding,
> require additional rules beyond the grammar. More inline. Best!
> Tom
>
>> Do you need to? This is valid as an entire Org file, I think:
>>
>> *** foo
>> * bar
>> ***** baz
>>
>> And that can be represented in EBNF. I'm not aware of places where
>> behavior is indent-level specific, except inline tasks, and that edge
>> case can be represented.
>
> You are correct, and as long as the heading depth doesn't change some
> interpretation then this is a non-issue. The reason I mentioned this
> though is because it means that you cannot determine how to correctly
> fold an org file from the grammar alone.
>
> To make sure I understand: it is possible to determine the number of
> leading stars (and thus the level), but I think that it is not
> possible to identify the end of a section.
> For example
>
> * a
> *** b
> ** c
> * d
>
> You can parse out a 1, b 3, c 2, d 1, but if you want to be able to
> nest b and c inside a but not nest d inside a, then you need a stack
> in there somewhere. You can't have a rule such as
>
> section : headline content
> content : text | section
>
> because the parse would incorrectly nest sections at the same level.
> You would have to write
>
> section-level-1 : headline-1 content-1
> content-1 : text | section-level-2-n
>
> but since we have an arbitrary number of levels the grammar would have
> to be infinite.
> This is only if you want your grammar to be able to encode that the
> content of sections can include other more deeply nested sections,
> which in this context we almost certainly do not (as you point out).
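
To make the "you need a stack in there somewhere" point concrete, here
is a minimal sketch in Python. It is not taken from Org or from any
existing parser; the names nest_headlines and outline are made up for
illustration, and body text is simply ignored. The grammar only has to
yield (level, title) pairs; the nesting is recovered afterwards by
keeping a stack of open sections.

```python
import re

# One or more stars, a space, then the title (simplified headline rule).
HEADLINE = re.compile(r"^(\*+) (.*)$")


def nest_headlines(text):
    """Build a section tree from Org-style headlines using a stack."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]  # currently open sections, outermost first
    for line in text.splitlines():
        m = HEADLINE.match(line)
        if m is None:
            continue  # body text is ignored in this sketch
        node = {"level": len(m.group(1)), "title": m.group(2), "children": []}
        # A headline closes every open section at the same or a deeper level.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def outline(node):
    """Reduce the tree to nested (title, children) pairs for inspection."""
    return [(child["title"], outline(child)) for child in node["children"]]
```

For the "* a / *** b / ** c / * d" example this nests b and c under a
and keeps d as a sibling of a, which is the folding structure the flat
grammar cannot express on its own.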
>
>>> There is a similar issue with the indentation level in
>>> order to correctly interpret plain lists.
>>
>> list ::= ('+' string newline)+ sublist?
>> sublist ::= (indent list)+
>>
>> I think this captures lists?
>
> Ah yes, I see my mistake here. In order for this to work the parser
> has to implement significant whitespace, so whitespace cannot be
> parsed into a single token. I think everything works out after that.
>
>> Definitely not able to be represented in EBNF, unless as you say
>> {name} is a limited vocabulary.
>
> Darn those pesky open sets!
>
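
One common way to implement significant whitespace is to have the lexer
keep a stack of indentation widths and emit explicit tokens when the
width changes, as Python's own tokenizer does. The sketch below is only
an illustration under several assumptions: the function name tokenize
and the token names are invented here, only '+' bullets indented with
spaces are handled, and a DEDENT token is added even though the quoted
grammar only mentions indent.

```python
def tokenize(text):
    """Turn '+' list lines into ITEM, INDENT and DEDENT tokens."""
    tokens = []
    widths = [0]  # stack of indentation widths that are currently open
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if not stripped.startswith("+ "):
            continue  # non-list lines are skipped in this sketch
        width = len(line) - len(stripped)
        if width > widths[-1]:
            # Deeper than the enclosing list: open a sublist.
            widths.append(width)
            tokens.append(("INDENT",))
        while width < widths[-1]:
            # Shallower: close sublists until the width fits again.
            # Inconsistent widths are not reported as errors here.
            widths.pop()
            tokens.append(("DEDENT",))
        tokens.append(("ITEM", stripped[2:]))
    # Close any sublists still open at end of input.
    while len(widths) > 1:
        widths.pop()
        tokens.append(("DEDENT",))
    return tokens
```

With tokens like these, the sublist rule can be written over INDENT and
DEDENT instead of raw whitespace, and the stack lives in the lexer
rather than in the grammar.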