From: Tom Gillespie
Date: Mon, 26 Oct 2020 17:00:17 -0400
Subject: Re: official orgmode parser
To: Ken Mankoff
In-Reply-To: <874kmgka3l.fsf@gmail.com>
Cc: Bastien, Przemysław Kamiński, emacs-orgmode, Nicolas Goaziou

Here is an attempt to clarify my own confusion around the nested
structures in Org. In short: each node in the headline tree and in the
plain list tree can be parsed using the EBNF, but the nesting level
cannot, which means that certain useful operations, such as folding,
require additional rules beyond the grammar.

More inline.

Best!
Tom

> Do you need to? This is valid as an entire Org file, I think:
>
> *** foo
> * bar
> ***** baz
>
> And that can be represented in EBNF. I'm not aware of places where
> behavior is indent-level specific, except inline tasks, and that edge
> case can be represented.

You are correct: as long as the heading depth doesn't change how
something is interpreted, this is a non-issue. The reason I mentioned
it is that it means you cannot determine how to correctly fold an Org
file from the grammar alone.

To make sure I understand:
It is possible to determine the number of leading stars (and thus the
level), but I don't think it is possible to identify the end of a
section from the grammar. For example:

    * a
    *** b
    ** c
    * d

You can parse out a 1, b 3, c 2, d 1, but if you want to nest b and c
inside a, but not nest d inside a, then you need a stack in there
somewhere. You can't have a rule such as

    section : headline content
    content : text | section

because the resulting parse could incorrectly nest sections at the same
level (d inside a). You would have to write

    section-level-1 : headline-1 content-1
    content-1 : text | section-level-2-n

but since we have an arbitrary number of levels, the grammar would have
to be infinite. This only matters if you want your grammar to encode
that the content of a section can include other, more deeply nested
sections, which in this context we almost certainly do not (as you
point out).

> > There is a similar issue with the indentation level in
> > order to correctly interpret plain lists.
>
> list ::= ('+' string newline)+ sublist?
> sublist ::= (indent list)+
>
> I think this captures lists?

Ah yes, I see my mistake here. For this to work, the parser has to
implement significant whitespace, so whitespace cannot be collapsed
into a single token. I think everything works out after that.

> Definitely not able to be represented in EBNF, unless as you say
> {name} is a limited vocabulary.

Darn those pesky open sets!
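The "stack in there somewhere" for headlines, and the matching problem for indented plain lists, can be illustrated with a small sketch. This is Python, not Org's own parser, and it assumes simplified syntax: a headline is one or more stars followed by whitespace, and a plain-list item is zero or more spaces followed by `-` or `+` (no tabs, numbered items, TODO keywords, or inline tasks). Each line is recognized on its own, which is the context-free part. The hypothetical `nest` function then uses an explicit stack to decide where each section or sublist ends, which is the part the grammar cannot express. All names here (`Node`, `nest`, `headline_items`, `list_items`, `outline`) are made up for illustration.

```python
import re
from dataclasses import dataclass, field

# Simplified, hypothetical patterns; real Org syntax has more cases
# (tabs, numbered items, TODO keywords, tags, inline tasks, ...).
HEADLINE_RE = re.compile(r"^(\*+)\s+(.*)$")      # stars, whitespace, title
LIST_ITEM_RE = re.compile(r"^( *)[-+]\s+(.*)$")  # spaces, bullet, text


@dataclass
class Node:
    depth: int
    text: str
    children: list = field(default_factory=list)


def nest(items):
    """Build a forest from (depth, text) pairs with an explicit stack.

    Each pair was recognized without any context. Only the stack knows
    where a section (or sublist) ends: a new node closes every open node
    whose depth is greater than or equal to its own.
    """
    root = Node(-1, "<root>")
    stack = [root]
    for depth, text in items:
        node = Node(depth, text)
        while stack[-1] is not root and stack[-1].depth >= depth:
            stack.pop()  # close sections/sublists at this depth or deeper
        stack[-1].children.append(node)
        stack.append(node)
    return root.children


def headline_items(text):
    """(depth, title) for each headline; depth = number of leading stars."""
    return [(len(m.group(1)), m.group(2))
            for m in map(HEADLINE_RE.match, text.splitlines()) if m]


def list_items(text):
    """(depth, text) for each plain-list item; depth = indentation width."""
    return [(len(m.group(1)), m.group(2))
            for m in map(LIST_ITEM_RE.match, text.splitlines()) if m]


def outline(nodes, level=0):
    """Render a forest as indented lines, two spaces per nesting level."""
    lines = []
    for node in nodes:
        lines.append("  " * level + node.text)
        lines.extend(outline(node.children, level + 1))
    return lines


if __name__ == "__main__":
    doc = "* a\n*** b\n** c\n* d\n"
    print(headline_items(doc))  # [(1, 'a'), (3, 'b'), (2, 'c'), (1, 'd')]
    print("\n".join(outline(nest(headline_items(doc)))))
    # a
    #   b
    #   c
    # d
    print("\n".join(outline(nest(list_items("+ one\n  + two\n+ three\n")))))
    # one
    #   two
    # three
```

Headlines and list items go through the same `nest` function; only the depth measure differs (star count versus indentation width). That mirrors the point above: both trees need the same extra rule beyond the grammar, and for lists it only works if the leading whitespace is kept as significant rather than collapsed into one token.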