From: Nicolas
Subject: Re: unnumbered subsections in latex export
Date: Thu, 31 Mar 2011 23:58:11 +0200
Message-ID: <87tyejymto.fsf@gmail.com>
In-Reply-To: <87bp11dk4h.fsf@gnu.org> (Bastien's message of "Thu, 24 Mar 2011 08:59:26 +0100")
List-Id: "General discussions about Org-mode."
To: Bastien
Cc: nicholas.dokos@hp.com, emacs-orgmode@gnu.org, Jambunathan K

Hello,

Bastien writes:

> 2. exporters use various methods to export the file (e.g.
> the HTML exporter goes line by line, the LaTeX exporter parses the
> file and render each section);
>
> *Example*: users often ask why the LaTeX exporter cannot export a
> headline of level 3 right after a headline of level 1: they ask that
> because the HTML exporter can do this, while the LaTeX one cannot.
> And the LaTeX one cannot because parsing an ill-structured Org buffer
> is tricky for it.
>
> 3. exporters are maintained by various people: I know the HTML exporter
> and the LaTeX one, others know the other exporters, etc.
>
> I need your help do deal with these issues.
>
> The first thing to do is to have a list of annoying inconsistencies that
> need to be addressed in priority.

I have been thinking about exporters for a while now, and I'd like to
share my point of view. Be warned, I will be a bit verbose.

Honestly, I wouldn't talk about just "annoying inconsistencies". I think
we may be running into a serious problem with exporters if some work
isn't done on them. Indeed, it seems to me that it is far too difficult
to create new exporters, and managing them could soon become unwieldy.
I have an opinion on how we could anticipate and solve that.

At the moment, the export process is done in two parts. First, the
buffer is parsed and changed into a quite complex, and insufficiently
documented, format: this is the job of org-exp.el.

It is complex because the new format mixes new string markers
("ORG-CENTER-END\n") and text properties (original-indentation). It is
insufficiently documented because some of those properties are not
precisely defined, and their meanings, or the differences between them,
aren't always explicit (org-protected, org-example, and
org-verbatim-emph come to mind).

This isn't a problem per se: after all, Org is also rich and complex,
and a simpler way to handle this may not be sufficient.
But any person planning to create a new exporter these days has to know
all of those subtleties, and pay attention to both visible and invisible
markers when parsing the new format.

The second part of the export process is backend-specific. I'm talking
about org-latex.el, org-html.el, etc. As Bastien pointed out, they often
parse the buffer their own way (line-wise or section-wise), adding one
layer of complexity for anyone trying to understand them, and creating
inconsistencies at the same time.

This is why I think exporting should take a slightly different approach.
In essence, org-exp.el should itself parse the format it creates and
call functions from backend-specific exporters for each environment or
object it encounters during the parsing.

In other words, specific exporters should consist only of a set of
independent functions, named uniformly (org-html-export-list,
org-latex-export-center), and acting recursively on parts of the buffer,
in a precisely documented format.

Thus, the Org documentation should provide an exhaustive list of the
environments and objects it offers, with their associated format during
export. Then, creating an exporter should be as simple as providing
functions to change every one of them into meaningful strings, which
would then be collected by org-exp.el.

The immediate benefit is that only those among us patching org-exp.el
will have to know the intermediate format it creates, and those creating
or patching backends will work on a well-defined format.

I'll show two examples to illustrate my point: lists and tables. Taken
from a docstring,

  1. first item
     + sub-item one
     + [X] sub-item two
     more text in first item
  2. [@3] last item

will be parsed as:

  (ordered (nil "first item"
                (unordered (nil "sub-item one")
                           (nil "[CBON] sub-item two"))
                "more text in first item")
           (3 "last item"))

This makes it easy (see org-list-to-latex, org-list-to-html,
org-list-to-texinfo, and so on) to transform an Org list into many
different formats.
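
To give an idea of how little is needed once the list is in that form,
here is a minimal sketch of such a transformation. The name
my-list-to-latex is hypothetical, and this is not how org-list-to-latex
is actually implemented. It assumes every item is (COUNTER PART...),
ignores COUNTER, treats every non-ordered list as an itemize
environment, and picks "$\boxtimes$" as an arbitrary rendering of the
"[CBON]" marker.

```elisp
;; Hypothetical helper, not part of Org: a sketch of walking the parsed
;; list structure (TYPE ITEM...) and producing a LaTeX string.
(defun my-list-to-latex (list)
  "Return LaTeX code for LIST, a structure like (TYPE ITEM...).
Each ITEM is (COUNTER PART...), where PART is either a string or a
nested list structure.  COUNTER is ignored in this sketch."
  (let ((env (if (eq (car list) 'ordered) "enumerate" "itemize")))
    (concat
     (format "\\begin{%s}\n" env)
     (mapconcat
      (lambda (item)
        (concat
         "\\item "
         (mapconcat
          (lambda (part)
            (if (stringp part)
                ;; Assumption: render a checked box as $\boxtimes$ so
                ;; that "[CBON]" is not read as an \item label.
                (replace-regexp-in-string
                 "\\[CBON\\]" "$\\boxtimes$" part t t)
              ;; A nested list: recurse on it.
              (my-list-to-latex part)))
          (cdr item)
          "\n")))
      (cdr list)
      "\n")
     (format "\n\\end{%s}" env))))

;; Usage, on the structure shown above:
(my-list-to-latex
 '(ordered (nil "first item"
                (unordered (nil "sub-item one")
                           (nil "[CBON] sub-item two"))
                "more text in first item")
           (3 "last item")))
```

The whole job is a recursive walk over a plain Lisp structure, with no
need to look at the buffer or at text properties.
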

Alas, this list structure cannot be used in org-html.el and
org-docbook.el, as those, again, parse the buffer line-wise. The same
could be said about tables:

  | Row 1 | 1 | 2 |
  |-------+---+---|
  | Row 2 | 3 | 4 |

can be parsed as:

  (("Row 1" "1" "2") hline ("Row 2" "3" "4"))

and from that, functions such as orgtbl-to-html or orgtbl-to-latex were
easy to create.

So, basically, what I suggest here is:

1. list all possible environments and objects offered by the Org format
   (table, lists, inlinetasks, center, verbatim, paragraph, headlines,
   time-stamps, LaTeX snippets, footnotes, links, source);

2. define an explicit export format for each of them;

3. determine which options should be known by org-exp, and which by the
   backend;

4. create a parser, in org-exp, that will output the Org buffer in the
   chosen format;

5. create (many are readily available) functions for each backend to
   interpret them.

Now about that explicit format. Taking this buffer,

--8<---------------cut here---------------start------------->8---
#+title: Example buffer

Some text before first headline.

* First section

First paragraph $\alpha = 1$.

Second paragraph.

- item 1
- item 2

#+begin_center
Text
#+end_center

| Row 1 | 1 | 2 |
| Row 2 | 3 | 4 |

* Second section

Text with footnote[fn:1].

*************** Inline task
Some text and
a [[http://www.gnu.org/software/emacs/][link]]
:DRAWER:
- I like
- lists.
:END:
*************** END

* Footnotes

[fn:1] Footnote definition.
--8<---------------cut here---------------end--------------->8---

It could be parsed as the following:

  '((:title "Example buffer")
    (paragraph "Some text before first headline.")
    (headline "First section"
     (paragraph "First paragraph " (latex "$\\alpha = 1$") ".")
     (paragraph "Second paragraph.")
     (list unordered
      (nil "item 1")
      (nil "item 2")
      (center (paragraph "Text")))
     (table ("Row 1" "1" "2") ("Row 2" "3" "4")))
    (headline "Second section"
     (paragraph "Text with footnote"
      (footnote "Footnote definition.") ".")
     (inlinetask "Inline task"
      (paragraph "Some text and\na "
       (link "link" "http://www.gnu.org/software/emacs/"))
      (drawer (list unordered (nil "I like") (nil "lists."))))))

Note that such a parsing will need a decent forward-paragraph function.
It's also a very simplified example: headlines would need more than the
title string (todo keyword, priority, tags) before starting the body.

I have no code to offer at the moment and, as we all know, the devil is
in the details. But if the output from org-exp.el is clear, exporters
will be more coherent. It could even provide tools to help exporters do
their task (a function to extract footnotes from the output, for
example).

Again, it may be a big task to undertake, but I think it will be
necessary at some point.

Regards,

-- 
Nicolas