From: Ihor Radchenko
To: Tom Gillespie
Cc: org-mode-email, Nicolas Goaziou, Timothy
Subject: Re: Org Syntax Specification
Date: Tue, 18 Jan 2022 20:09:59 +0800
Message-ID: <87r195nt2g.fsf@localhost>
References: <871r1g936z.fsf@gmail.com>
List-Id: "General discussions about Org-mode."

Tom Gillespie writes:

> Extremely in favor of removing switches. There are so many better ways
> to do this now that aren't like some eldritch unix horror crawling up
> out of the abyss and into the eBNF :)

I also agree that switches and $$-style equations may be deprecated.
We can:

1. Not mention them in the document
2. Add org-lint warnings about their obsolescence

As for your other comments, you seem to be suggesting a number of
changes to the existing Org syntax. Some of them look fine, some do
not. However, please keep in mind that we have to deal with backward
compatibility, third-party compatibility, and not breaking existing
Org documents unless we have a very strong justification.

I suggest branching a new thread from here for each concrete
suggestion where you want to change Org syntax, as opposed to just the
document wording. Otherwise, this discussion will become a total mess.

More details below.

> +Elements are further divided into "[[#Headings][headings]]", "[[#Sections][sections]]"[fn::sections are not elements], "[[#Greater_Elements][greater

Nope.
Sections are actually elements. See =org-element-all-elements=.

> +other headings. [fn:tom2:I would not discuss strata here because it is
> +not related to the syntax of the document. It is related to how that
> +syntax is interpreted by org mode. The strata are nesting rules that
> +are independent of the syntax, and discussing that here in the syntax
> +document is confusing, because the nesting is not something that can be
> +parsed directly because it depends on the number of asterisks.]

I disagree. Nesting rules are an important part of the syntax. We have
restrictions on which elements can be inside other elements. The same
pattern may or may not be recognised in Org depending on its nesting.
For example, links that you put into property drawers are not
considered link objects.

> +citation references and [[#Table_Cells][table cells]].[fn:tom3:Table cells should
> +be treated in a way that is entirely separate from objects. This document has included
> +them as such as has org-element (iirc) however since they can never appear in a paragraph
> +and because tables are completely separate syntactically, we should probably drop the
> +idea that table cells are objects. I realize that this might mean the creation of a
> +distinction between paragraph-objects, title-objects, table-objects etc.]

Again, I disagree. While your idea about table cells is reasonable
(similarly for citation-references inside citations), I am against
decoupling Org syntax from the org-element implementation. In
org-element.el, table-cells are just another object. If we let
org-element and the syntax document get out of sync, confusion and
errors will follow during future maintenance.

> A line containing only spaces, tabs, newlines, and line feeds (=\t\n\r=)
> -is considered a /blank line/. Blank lines can be used to separate
> +is considered a /blank line/. Blank lines separate
> paragraphs and other elements.

This actually reads as slightly confusing. "Blank lines separate
paragraphs and other elements" sounds like blank lines are only
relevant before/after paragraphs. However, there are also footnote
references and lists. Maybe we can try something like:

  Blank lines can be used to indicate the end of some elements.

"can" because a single blank line usually does not separate anything.

> +considered part of the paragraph.[fn:tom4:I don't think we need to discuss
> +nesting scope here, it is confusing, it is always the immediately prior
> +(lesser?) element.]

Then where can we put it? This is one of the tricky conventions we use
in the parser.

> ++ STARS :: A string consisting of one or more asterisks[fn::removed
> + note about inline tasks because it is still a heading, any mention
> + of a concrete number should not appear in the specification of
> syntax.]

I am not sure here. Inline tasks are special because a one-line inline
task must not contain any text below it and cannot have planning or
properties.

> + contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local
> + variable and can be set by users in an org file using =#+todo:=.].

If we mention this, we also need to elaborate on what kind of element
#+todo: is, where it can be located, and how to parse multiple
instances of #+todo in the document.

> -A heading contains directly one section (optionally), followed by
> -any number of deeper level headings.
> +The level of a heading can be used to construct a nested structure.
> +All content following a heading that appears before the next heading
> +(regardless of the level of that next heading) is a section. In addition,
> +text before the first heading in an org document is also a section.

Note that this is not true for one-line inline tasks.
> +considered a section), sections only occur within headings.[fn:: The
> +choice to call this syntactic component a section is confusing because
> +it is at odds with the usual notion of a section, namely that the
> +usual concept of a section implies that it includes nested content. I
> +personally didn't realize that it ended at the next heading until
> +writing this comment (as can be seen from reading my comments in the
> +laundry implementation). Therefore I suggest that we look for an
> +alternate name for this syntactic component. Maybe "segment" or
> +something similar that indicates that it is truncated?]

Sounds reasonable. However, we may also need to make this change at
the Elisp level, which is tricky when you think about backward
compatibility.

> +however, contain [[Planning][planning]].[fn::This is wrong? If it is not
> +wrong, then it should be. Property drawers are already annoying to implement
> +because they share syntax with regular drawers, and allowing a property drawer
> +at the top of a file without a heading means that it should be a regular drawer
> +not a property drawer, otherwise you have to special case the handling of drawers
> +in the zeroth section. What is the use case for a property drawer as opposed to
> +a #+property: line in the zeroth section? I may come around on this at some point,
> +but right now it seems more complex, however it might actually be more consistent
> +if we imagine the zeroth section as being nested inside a single heading that has
> +level zero implicitly at the top of a document. Unfortunately that means that such
> +property drawers cannot be determined from a homogeneous syntax but instead require
> +some operations on the internal representation. Note also that if this were allowed
> +then the property drawer should only be allowed as the very first line of a file
> +because newlines at the start of a file need to be preserved. More though required.]

The statement about property drawers in the first section (that is how
we refer to it in org-element) is correct. The first section and the
location of its property drawer are special. I agree that it's
inconsistent with normal property drawers. However, we cannot change
it without breaking existing Org files. If we decide to change the
syntax in this area, we should think carefully about possible
consequences.

> + [fn::Without going into to much detail, affiliated keywords should
> +not be distinguished from other keywords at the level of the syntax.
> +The fact that they are is an artifact of the elisp implementation.
> +The determination of the behavior of a keyword with regard to
> +affiliating behavior should be determined in a later pass, even if in
> +some cases some implementations may want to materialize them into the
> +parser for performance reasons. Allowing users to promote a keyword to
> +be an affiliated keyword would be incredibly powerful for attaching
> +metadata to parts of org-files in a way that is user extensible. It
> +may still be desirable to describe the behavior of affiliated keywords
> +here, but they are not in any way distinct from other keywords at the
> +level of org syntax and trying to implement them as such is usually a
> +mistake (that I have made).]

I generally support this idea. Handling keywords in org-element is not
pretty. Having them in the parse tree would make things easier.
However, we again need to consider backward compatibility. I can
imagine third-party ox-* packages breaking if we make this change - we
should double-check if we decide to change this.

> +property of the element they apply to. [fn::While it is tempting to try
> +to do this at the level of the grammar it induces a number of nasty
> +ambiguities in practice. It is saner to have a single unified keyword
> +syntax and then to determine affiliation behavior in a later pass.]

Yes, it is saner.
However, our syntax document is supposed to be a human-readable
description of what org-element does. We cannot introduce differences
between the grammar document and the de facto parser implementation.
That would defeat the purpose of providing a reference syntax - we
would get inconsistencies between Emacs Org mode and external parsers.

> + ~org-element-dual-keywords~ contains =CAPTION= and =RESULTS=.].[fn::
> + All keywords should allow OPTVAL, it regularizes and simplifies the syntax.]

I support this idea.

> + [fn:: ~:end:~ may be capitalized (legacy support)]

Both :END: and :end: are supported by the Org parser. What do you mean
by legacy?

> + [fn::I suggest that we remove inlinetasks from this document.
> +They are a hack that cannot be implemented as part of a grammar
> +because they require a concrete value to be specified which breaks
> +the arbitrary nesting depth of headings. I think I wrote this somewhere
> +else as well, but inline tasks can only be a layer on top of headings,
> +they cannot displace them.]

I disagree. Inlinetasks are de facto part of the syntax and can be
encountered in Org documents in the wild. If you treat inlinetasks as
ordinary headings, things may break unpredictably during parsing.
Instead, we may consider making the inlinetask level constant.

> +indicate that it should, which is misleading. Further, it is actually
> +not possible to implement contents as specified because grammars
> +cannot track the indentation level that is required to reconstruct
> +list items correctly. Therefore CONTENTS should not be defined as such
> +but should only specify that they can be anything except a newline. I
> +think that the intent of this document is somewhat a conflation of the
> +syntax for org and of the semantics as determined by export backends
> +and/or org-element, however it makes it extremely confusing because it
> +is not actually possible to parse CONTENTS, they must be reconstructed
> +from the parse tree.]

Could you elaborate on why grammars cannot track the indentation
level? AFAIU, if that were the case, Python would not be parseable.

> + [fn::The failure mode for malformed contents needs to be
> +determined more clearly here. We don't want property draws to suddenly
> +become plain drawers just because a user has a malformed line, that
> +could be disastrous if certain settings in the property drawer mask
> +settings from further up the tree. In short, malformed contents
> +should not poison the whole property drawer.]

Yet that is exactly what happens in Org: malformed property drawers
become ordinary drawers.

> + SWITCHES :: Any number of SWITCH patterns, separated by a single
> - space character
> + space character [fn::For the love of all that is sane can we
> + please just remove this from the spec or mark it as legacy.]

I support this idea.

> +PLANNING must directly follow HEADING without any blank lines in between.
> +
> + [fn::Need a spec for how to handle multiple instances of the same keyword with different values.]

The last one wins (as in org-element-planning-parser).

> + [fn::As I think I mention elsewhere, the concrete names here
> +should NOT be part of the syntax, it makes the parser brittle
> +and hard to maintain. Differentiation between entities and fragments
> +should be handled at the syntax level for cases where the fragment
> +has brackets, and then in a second pass for values that are
> +syntactically entity-or-fragment and must be determined after
> +the fact.]

How would you define entity objects then? First/second pass is an
implementation detail. Our current description follows how org-element
handles entities.

> + [fn::We probably want to node that BACKEND can be the empty string
> +per that thread on how to deal with intra-word markup. Again this
> +also touches on the general principle of wanting to close over the
> +empty string so that users aren't surprised when ~@@:lol@@~ suddenly
> +appears in plain text just because no backend was specified.]

While I am not opposed to the idea, your principle is not followed by
the org-element parser. We may consider changing it, but that is again
a whole separate discussion where we need to consider the pros and
cons.

> Note that the first pattern may not occur on an /unindented/ line, as it
> -is then a [[#Footnote_Definitions][footnote definition]].
> +is then a [[#Footnote_Definitions][footnote definition]]. [fn::I'm not sure this is quite right?
> +the font locking code is not consistent with actual behavior, need to
> +review the laundry test cases and example files.]

Do not look at font-locking. You can safely assume that fontification
is wrong in all non-trivial cases. Always check org-element-at-point
and org-element-context.

> - [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]].
> + [[#Targets_and_Radio_Targets][radio targets]], [[#Targets_and_Radio_Targets][targets]], and [[#Timestamps][timestamps]]. [fn::Like for the comma in
> + macros, I think it would be safe to add ~\|~ as an escape character.
> + The issue in the elisp implementation is not actually at the level
> + of the syntax, but is actually in the export backends or somewhere
> + deeper, because even using a macro that expands to be a pipe ~|~
> + breaks the table (which is really bad).]

I am not sure it is needed. We can already use \vert.

> + [fn::I have some suggestions for extensions to timestamp syntax to
> +support historical and far future dates, as well timezone offsets (NOT
> +the 3 letter ambiguous disaster) and seconds and sub-second times.]

That would be welcome, but someone™ should implement timezone support
at the Elisp level. We have had several discussions about this in the
past.

> +The four =*/_+= may be arbitrarily nested to any depth. Verbatim and
> +code ==~= may be nested inside any other markup, but no other markup
> +will be interpreted inside of them since they are interpreted exactly.

That's not accurate. You cannot nest, say, bold inside bold. Nor can
you freely put code inside other markup. Consider:

  *bold =asd*asd= not bold*

Best,
Ihor