From: Tom Gillespie
Date: Fri, 3 Dec 2021 21:26:23 -0800
Subject: Re: Some commentary on the Org Syntax document
To: Timothy
Cc: emacs-orgmode
References: <87mtljpd1w.fsf@gmail.com> <871r2upy8s.fsf@gmail.com>
In-Reply-To: <871r2upy8s.fsf@gmail.com>

Hi Timothy,
   Replies in line. Some things might seem a bit out of order
because I responded from bottom to top. Best,
Tom

> from heading to bed, so to quote Pascal "I have only made this letter
> longer because I have not had the time to make it shorter".

Likewise, and I've heard it as Mark Twain :D

> I think a big problem is the mix of implicit and explicit information.
> Some components are rigorously specified in terms of the characters they
> may contain, elements and objects that are recognised inside them, and
> even the order in which different parts of the pattern are parsed.

I agree completely.

> As mentioned originally, the current Dynamic Blocks description doesn't
> even mention the CONTENTS part of the pattern, and relies on the reader
> inferring that it operates similarly to the CONTENTS part of Drawers.

Indeed this should be fixed.

> Forcing the reader to start making inferences like this is a treacherous
> path, and I think I can blame it for some of the other issues I've
> experienced. Take for instance the "surely X can't contain a newline?"
> comments I've made. In the Node Properties and Entities descriptions you
> have statements along the lines of "X can contain any character [...]
> except a newline". In my mind this then sets up the reader to interpret
> a similar statement without the "except a newline" clause to mean that
> newlines are permitted.

I agree completely and had almost the exact same experience
as you when I was working on it.
As I mention below, my responses were meant to illustrate why the
explicit information is missing, not to suggest that it should be
left out. We should definitely work to make everything more explicit
so that future readers don't have to go through the same issues we
have.

> I'm also thinking that the term "element" is overworked in the document.
> It's basically pulling triple duty: you have Elements, Greater
> Elements, and elements which are Elements and/or Greater Elements 😓.

In extreme agreement.

> 3. Section

Technically this isn't part of the syntax; rather, it is part of
elisp Org mode's internal representation. I'm not sure I would even
mention sections at all, because they have to do with the
interpretation of the syntax. In a section on the internal
representation for Org, sections definitely belong, but they are
incidental. That said, I suspect we will find that they are useful
for talking about the behavior of the file under transformation,
e.g. "headings are not reordered when pressing M-up or M-down,
sections are reordered". This makes it possible to talk about an Org
implementation that has commands that allow one to switch the
headings without moving their associated sections.

> 5. (Greater Element / Element)

There are issues here with forms that are part of the syntax vs forms
that are part of the intermediate representation. A line-based parser
for Org syntax that assembles greater blocks after the fact and a
parser that uses arbitrary lookahead to truncate on headings won't
have the exact same surface syntax; however, they will both have an
equivalent in their intermediate representation that corresponds to a
greater block. Again, this is very deep in implementation details,
but trying to force things like sections into the syntax hierarchy
seems confusing to me.

> 7. Object

Paragraph element maybe? Might seem odd for heading titles to have
paragraph scope, but on the other hand it certainly simplifies the
explanation of the grammar.
And you can put an inline footnote in a heading title.

> 8. Pattern / Form

Don't know what to make of this one. Like "Term" these are
incredibly generic.

> 9. Term

Use of "Term" is super confusing to me.

> We could say call (1) Components, (7) Units, (6) Objects, (5) Element or
> Object (why not spell it out to avoid telling people to remember
> something).

I'm not sure we are ready to specify this. One way that we might try
to manage this would be to create a taxonomy of element types, e.g.
top-level elements, paragraph elements, etc. This would be consistent
with the fact that the elisp implementation of org-element has all of
these as an instance of element.

> I could have put more thought into this, but it should do for
> illustrating my line of thinking. Let me know if you have any good
> ideas.

Let's leave the terminology as is right now. I'm expecting that there
will be quite a few new terms that we will want to introduce, and we
will want to separate syntax and intermediate representation. With
progress on using org-element for fontification and on laundry we
should be able to come up with language that can be used to
distinguish between concepts that are needed for syntax (tokens,
parser) and concepts that are needed for intermediate
representations. Things like basic syntax highlighting need only the
language for syntax to be specified, but more complex syntax such as
babel font-locking requires either a more advanced tokenizer or that
we talk about it at the level of the intermediate representation.
Other things, such as behavior in response to commands (e.g. M-up and
M-down mentioned above), require the language of the intermediate
representation.

> A separate improvement could be using more formatting to distinguish
> when terms are used in a particular way.

I think it will be clearer to come up with distinct terms. There are
times where this stuff has to be talked about in spoken language and
it is hard to speak /*_markup_*/.
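
To make the M-up / M-down example concrete, here is a minimal sketch
in Python. It is not how elisp Org mode or laundry represents
anything: the flat `Entry` record and both function names are
hypothetical, nesting is ignored, and it assumes the reading where
"section" means the text that travels with a heading. It only shows
that the two commands are easy to tell apart once you can talk about
an intermediate representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical flat IR node: one heading line plus the text under it."""
    heading: str
    section: str


def move_up(entries, i):
    """Roughly what M-up does here: heading and section move together."""
    out = list(entries)
    if 0 < i < len(out):
        out[i - 1], out[i] = out[i], out[i - 1]
    return out


def swap_headings_only(entries, i):
    """Hypothetical command: exchange two headings, leave sections in place."""
    out = list(entries)
    if 0 < i < len(out):
        above, below = out[i - 1], out[i]
        out[i - 1] = Entry(below.heading, above.section)
        out[i] = Entry(above.heading, below.section)
    return out
```

Both functions return a new list, so the difference between the two
commands can be checked on the same input.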
> I've sort of covered this before, but I think the document would benefit
> from being more explicit in general.

Yes. The reason I brought this up was to indicate why an explicit
account was not present, not to suggest that we shouldn't add one.
Overall the more explicit we can be the better the document. I have
some stashed changes in worg from the time I was reading this syntax
document deeply. I'll see if any of them are relevant for the pass
you are doing now.

> Specifically regarding newlines, perhaps we could add something like
> this to the start of the Objects section?
>
> "Furthermore, while many objects may contain newlines, an empty line
> (i.e. a double newline) often terminates the element that the object is
> a part of, such as a paragraph."

Good idea.

> On this, I'm cautiously optimistic about the discussion about using
> org-element for fontification.

Likewise. Though I expect there will be some growing pains based on
the divergent behaviors I have seen while developing the laundry test
cases.

> I must thank you and Ihor for pointing me to
> org-element-object-restrictions! I wasn't aware of that till now, and
> it's most helpful. Should all the information given by it be included in
> the Syntax document? I lean towards saying yes.

I'm not entirely sure. I think this may be one area where we don't
want to over-specify. I consider it an implementation detail. For
example, when we were discussing valid scopes for org-cite syntax a
few months ago
https://lists.gnu.org/archive/html/emacs-orgmode/2021-09/msg00128.html
I suggested that the [cite:] syntax could appear in property drawers.
Nicolas corrected me on that. However, there is no reason why a
parser should be prevented from recognizing [cite:] syntax wherever
it wants, so long as it does not immediately expand that syntax and
execute it to add/include such a citation in the exported file.
For example, in laundry I would parse it and have it expand to a
no-op when exporting, but still have it expand for user interaction
so that they could jump to the citation reference by clicking in the
buffer. Similar thing for syntax in comment blocks, where I
frequently abuse the fact that it is possible to jump to org links
that are in comment blocks to make it easier to navigate files.

In short, elisp Org mode doesn't have a single intermediate
representation at the moment, so the syntactic restrictions listed by
org-element-object-restrictions are overly narrow and should not be
included in the spec for the syntax, because they can be controlled
at other levels of the implementation in cases where there is a
unified intermediate representation.

> I'm not sure this element = Element / Greater Element "shorthand" is
> doing us any favours, but I've discussed that already...

Agree. (See response above, I responded from bottom to top.) The
object/element/greater-element/org-element/org-object naming is
supremely confusing. We got the name for heading updated (or are in
the process of doing so?), but at some point I think we should see if
we can make this a bit less confusing. Too many collisions when
dropping a single qualifier.

> Is it? Perhaps I'm not doing it right but it didn't seem bad to me when
> implementing my parser (though I need to add the element support).

For a ... fun? time see the test case I cooked up for plain lists
(linked below) and then consider how to deal with cases where someone
has put a source block at some indent level. IIRC the suggested
behavior is to truncate leading whitespace to the #+end_src level.

Tracking the indentation level is required to correctly reassemble
the nesting of the lists, and that reassembly cannot be done during
tokenization or during parsing. As a result, the indentation level
must be retained for _all_ paragraphs because they might be preceded
by a plain list line. Not hard to implement, just a lot of things to
keep track of, thus complex.

https://github.com/tgbugs/laundry/blame/c90700bd1c15d7b04e5ead44ac10005d8d2ada50/laundry/test.org#L70-L91
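
As a rough illustration of the indentation bookkeeping described
above, here is a small Python sketch. It is not the laundry
implementation and not the Org spec: the `Item` record, the `ITEM_RE`
pattern (only "-" and "+" bullets), and `nest_items` are all
assumptions made for the example. It ignores counters, checkboxes,
blocks inside items, and the rule that blank lines can end a list.
The point is only that nesting is recovered from the indentation of
every line, including non-item lines that follow an item.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified item pattern: indentation, "-" or "+", a space.
ITEM_RE = re.compile(r"^[ \t]*[-+] (?P<text>.*)$")


@dataclass
class Item:
    indent: int
    text: str
    children: list = field(default_factory=list)


def indent_of(line):
    """Width of the leading whitespace, with tabs expanded."""
    expanded = line.expandtabs()
    return len(expanded) - len(expanded.lstrip())


def nest_items(lines):
    """Assemble flat lines into a tree of items using indentation only."""
    roots, stack = [], []  # stack holds the open items, outermost first
    for line in lines:
        if not line.strip():
            continue  # blank lines are skipped in this sketch
        indent = indent_of(line)
        # A line that is not indented deeper than an open item closes it.
        while stack and stack[-1].indent >= indent:
            stack.pop()
        match = ITEM_RE.match(line)
        if match:
            item = Item(indent, match.group("text"))
            (stack[-1].children if stack else roots).append(item)
            stack.append(item)
        elif stack:
            # Indented non-item line: continuation of the innermost open item.
            stack[-1].text += "\n" + line.strip()
        # Otherwise the line is outside any list and is ignored here.
    return roots
```

Even in this toy version the indentation of the plain paragraph lines
has to be kept around, because it decides which item (if any) they
belong to.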