From: Tom Gillespie
Date: Sun, 1 Nov 2020 01:20:19 -0400
Subject: Re: Thoughts on the standardization of Org
To: Asa Zeren
Cc: emacs-orgmode

Hi Asa,

My general take is that any active work toward standardization would be premature. At the very least, a full implementation outside of Emacs would need to exist; in the absence of that, there is little point to standardization. There is ample existing documentation to build a compliant parser (pandoc exists as well ...), and any effort toward standardization right now would be better spent improving the existing implementation or fixing broken ones (e.g. org-ruby).

From your comments, I would suggest reading through https://orgmode.org/worg/dev/org-syntax.html if you have not done so already. Much of what you mention is already there.

If something like standardization is still desired, I would suggest that the proper framing for any such activities would be as improvement and clarification of the documentation, and potentially as formalization of some of the existing behaviors of the system.

Org is a fairly stable system, and as others have said, explicitly leaving things open and unspecified would be vital. There are also parts of Org (e.g. babel) where the behavior needs to be regularized and made consistent. At the moment those areas need contributors, not standardization.

A few more thoughts inline.

Best!
Tom

On Sat, Oct 31, 2020 at 8:22 PM Asa Zeren wrote:
> this is impossible.
> If org catches on before it is standardized, we
> end up in the situation of Markdown, with many competing standards and
> non-standards. Hence, standardization is essential.

The situation for Org is not comparable to Markdown. There is a single reference implementation for Org at the moment. The codebase is massive. There are many existing parsers for Org files. Many are obviously broken, since they do not match the reference implementation's behavior. That the breakage is so obvious is itself a sign that there is no need for standardization at this time.

Further, there is little risk that another implementation will be created without interoperating with the elisp implementation. For example, consider Mauro's use case: being able to get colleagues who do not use Emacs to use Org. I suspect most of the people who would be working on other implementations would be starting from Emacs and would be unlikely to leave.

Also unlike Markdown, HTML export is just one tiny part of Org, whereas Markdown was implemented repeatedly to allow text input on web pages where people needed to implement parts of HTML that had not already been specified in Markdown.

> Standardizing org is much harder than standardizing something like
> Markdown, but I think by breaking it down as follows will maximize the
> portability of org while not compromising on development of org.

See some of my other recent emails. In the short term this is impossible due to the deep dependence on Emacs Lisp. Any outside implementation that is created today would have to implement elisp. Few have been able to do this in over 30 years. Moving beyond elisp requires additional machinery to be added to Org to be able to specify other top-level languages. This is not something that is remotely ready for standardization, because no one even has a single working implementation yet!

> I see three areas of standardization, which I think should be
> standardized separately:
> - Org DOM

No. This is an implementation detail (see below for more).
> - Org Syntax

This is pretty much done. There are some outstanding points for discussion, but they are about implementation details, not about the contents of the syntax. Also, extension of the syntax needs to be open and defined entirely by the elisp implementation, as mentioned by others.

> - Org Standard Environments

Read https://orgmode.org/worg/dev/org-syntax.html. It will get you up to speed with the existing terminology that is used in the community.

>
> Org DOM:
> The first thing to specify is the org DOM. (Maybe a different name
> should be used to avoid confusion with the HTML DOM) This is the
> structure of an org-mode document, without the textual
> representation. Many org-related tools operate on org documents
> without needing to use the textual representation. Specifying the DOM
> separately would (a) create a separation of concerns and (b) allow for
> better libraries built around org mode.

Depending on exactly what you mean by DOM, this does not need to be standardized. There are a couple of points that need to be clarified regarding how to treeify the flat list of elements that come out of a parse, in order to tie things like associated keywords to the correct elements, but these are quite minimal.

Trying to standardize a DOM when it is an implementation detail is a potential rat's nest, so I would strongly discourage even thinking about Org in that way. I would even discourage putting too much emphasis on the org-element API which, while extremely useful inside Emacs, is not something that should be standardized, because it is a detail peculiar to the elisp implementation.

There are cases where certain behaviors, such as how to parse and format footnotes, could be specified, but such behaviors don't require a DOM in order to be specified, and adding a DOM to the picture does nothing but complicate the format. Org is a text format.
The semantics for interaction with the text format are defined entirely by the text representation (in Emacs there.is.only.buffer). Other semantics, such as export to HTML and LaTeX, are not something that you would want to try to standardize; you would likely lose friends, enemies, and whatever sanity you had left at the end. (See the discussion on Mauro's thread about the fact that it is probably just easier to use Emacs directly if you need to export to a certain format in a specific way. It is free software after all.)

To the extent that an element tree could be useful, I think it would be as a concept in an implementation guide, not as something formally specified.

> Org Syntax:
> This would be specifying the mapping between the DOM and the textual
> representation, specified in terms of an environment.

There is no DOM. Modifications to an Org document must be made on the text representation; otherwise they are meaningless. This isn't HTML, where there is no canonical representation outside the DOM. The text representation of an Org document IS the canonical representation (modulo a normalization pass).

> Org Standard Environments:
> This is how I would specify elements such as #+begin_src..#+end_src
> would be specified, as standardized elements of the environment. This
> would be structured as a number of individual standard environments,
> such as "Source Blocks" or "Standard Header Properties" (specifying
> #+title, #+author, etc.)

These are well specified already in the worg syntax draft. There are a couple of special cases, such as src and example blocks, that could be included explicitly in the syntax to facilitate interoperability with parsers for org babel languages. Beyond that, the community already has vocabulary that covers what you describe here, as mentioned above.
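
To make the src block point concrete, here is a rough sketch in Python of what a tool outside Emacs might do when it works directly on the text. This is only an illustration under simplifying assumptions, not how org-element does it: the function name find_src_blocks and both regexes are made up for this example, it only looks at the begin/end lines and the first word after #+begin_src (taken as the language), and it ignores switches, header arguments, comma escaping inside blocks, and everything else in the syntax draft.

```python
import re

# Hypothetical, simplified patterns (assumptions for this sketch, not the
# full Org syntax): block delimiters are matched case-insensitively and may
# be indented; the first word after "#+begin_src" is taken as the language.
BEGIN_RE = re.compile(r"^[ \t]*#\+begin_src(?:[ \t]+(\S+).*|[ \t]*)$", re.IGNORECASE)
END_RE = re.compile(r"^[ \t]*#\+end_src[ \t]*$", re.IGNORECASE)


def find_src_blocks(text):
    """Return a list of (language, body) pairs for each src block in text."""
    blocks = []
    language = None
    body = None  # None means we are not currently inside a block
    for line in text.splitlines():
        if body is None:
            match = BEGIN_RE.match(line)
            if match:
                language = match.group(1)  # None if no language was given
                body = []
        elif END_RE.match(line):
            blocks.append((language, "\n".join(body)))
            body = None
        else:
            body.append(line)
    # An unterminated block is silently dropped in this sketch.
    return blocks


SAMPLE = """\
#+title: Example
* Heading
#+begin_src python :results output
print("hello")
#+end_src
Some text.
#+BEGIN_SRC emacs-lisp
(message "hi")
#+END_SRC
"""

for lang, code in find_src_blocks(SAMPLE):
    print(lang, "->", code)
```

Note that the sketch never builds a tree; it reads the text and hands back pieces of the text, which is the sense in which the text representation is the thing being operated on.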