emacs-orgmode@gnu.org archives
From: Nicolas <n.goaziou@gmail.com>
To: Bastien <bzg@altern.org>
Cc: nicholas.dokos@hp.com, emacs-orgmode@gnu.org,
	Jambunathan K <kjambunathan@gmail.com>
Subject: Re: unnumbered subsections in latex export
Date: Thu, 31 Mar 2011 23:58:11 +0200
Message-ID: <87tyejymto.fsf@gmail.com>
In-Reply-To: <87bp11dk4h.fsf@gnu.org> (Bastien's message of "Thu, 24 Mar 2011 08:59:26 +0100")

Hello,

Bastien <bzg@altern.org> writes:

> 2. exporters use various methods to export the file (e.g. the HTML
>    exporter goes line by line, the LaTeX exporter parses the file and
>    renders each section);
>
>    *Example*: users often ask why the LaTeX exporter cannot export a
>    headline of level 3 right after a headline of level 1: they ask that
>    because the HTML exporter can do this, while the LaTeX one cannot.
>    And the LaTeX one cannot because parsing an ill-structured Org buffer
>    is tricky for it.
>
> 3. exporters are maintained by various people: I know the HTML exporter
>    and the LaTeX one, others know the other exporters, etc.
>
> I need your help to deal with these issues.
>
> The first thing to do is to have a list of annoying inconsistencies that
> need to be addressed in priority.

I have been thinking about exporters for a while now, and I'd like to
share my point of view. Be warned, I will be a bit verbose.

Honestly, I wouldn't talk about just "annoying inconsistencies". I think
we may be running into a serious problem with exporters if some work
isn't done on them. Indeed, it seems to me that it is far too difficult
to create new exporters, and managing them could soon become unwieldy.
I have an opinion on how we could anticipate and solve that.

At the moment, the export process is done in two parts. First, the
buffer is parsed and converted into a rather complex and insufficiently
documented format: this is the job of org-exp.el. It is complex because
the new format mixes new string markers ("ORG-CENTER-END\n") and text
properties (original-indentation). It is insufficiently documented
because some of those properties are not precisely defined, and their
meaning, or the differences between them, aren't always explicit
(org-protected, org-example, and org-verbatim-emph come to mind).

It isn't a problem per se: after all, Org is also rich and complex, and
a simpler way to handle this may not be sufficient. But any person
planning to create a new exporter these days has to know all of those
subtleties, and pay attention to both visible and invisible markers when
parsing the new format.

The second part of the export process is backend specific. I'm talking
about org-latex.el, org-html.el, etc. As Bastien pointed out, they often
parse the buffer their own way (line-wise or section-wise), adding one
layer of complexity for anyone trying to understand them, and creating
inconsistencies at the same time.

This is why I think exporting should take a slightly different approach.
In essence, org-exp.el should itself parse the format it creates and
call functions from backend-specific exporters for each environment or
object it encounters during parsing. In other words, specific
exporters should consist only of a set of independent functions, named
uniformly (org-html-export-list, org-latex-export-center), and acting
recursively on parts of the buffer, in a precisely documented format.
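
To make the idea concrete, here is a minimal sketch of such a dispatcher.
Everything in it is hypothetical: the sketch- prefix, the function names
and the (TYPE . CONTENTS) element shape are assumptions made for
illustration, and none of this exists in Org. It only shows how a
generic walker could find a backend function by name.

```emacs-lisp
;; Hypothetical sketch: none of these names exist in Org.
(defun sketch-export-element (backend element)
  "Export ELEMENT with BACKEND.
ELEMENT is either a string or a list (TYPE . CONTENTS).  The
handler is looked up by name, as sketch-BACKEND-export-TYPE, and
called with the already exported contents."
  (if (stringp element)
      element
    (let* ((type (car element))
           ;; Export children first, so handlers only see strings.
           (contents (mapconcat
                      (lambda (child)
                        (sketch-export-element backend child))
                      (cdr element) ""))
           (fun (intern-soft
                 (format "sketch-%s-export-%s" backend type))))
      (if (fboundp fun)
          (funcall fun contents)
        ;; No handler for this type: pass the contents through.
        contents))))

;; Two handlers for an imaginary HTML backend.
(defun sketch-html-export-paragraph (contents)
  (format "<p>%s</p>\n" contents))

(defun sketch-html-export-center (contents)
  (format "<div style=\"text-align: center\">\n%s</div>\n" contents))

(sketch-export-element 'html '(center (paragraph "Text")))
```

With this shape, adding a backend amounts to defining one function per
element type; the walker itself never changes.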

Thus, Org documentation should provide an exhaustive list of
environments and objects it offers with their associated format during
export. Then, creating an exporter should be as simple as providing
functions to change every one of them into meaningful strings, which
would then be collected by org-exp.el. The immediate benefit is that
only those among us patching org-exp.el will have to know the
intermediate format it creates, and those creating or patching backends
will work on a well-defined format.

I'll show two examples to illustrate my point: lists and tables. Taken
from a docstring, 

1. first item
   + sub-item one
   + [X] sub-item two
   more text in first item
2. [@3] last item

will be parsed as:

(ordered (nil "first item"
              (unordered (nil "sub-item one")
                         (nil "[CBON] sub-item two"))
              "more text in first item")
         (3 "last item"))

This makes it easy to transform an Org list into many different formats
(see org-list-to-latex, org-list-to-html, org-list-to-texinfo, and so
on). Alas, it cannot be used in org-html.el and org-docbook.el, as
those, again, parse the buffer line-wise.
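
As an illustration only, a converter working on the parsed form shown
above can be a few lines of recursion. This is a sketch, not the code of
org-list-to-latex: the function name is made up, the item counter is
ignored, and the "[CBON]" marker is simply replaced so that LaTeX does
not read it as an optional \item label.

```emacs-lisp
;; Hypothetical sketch, not the implementation of `org-list-to-latex'.
(defun sketch-list-to-latex (list)
  "Turn LIST, in the form (TYPE ITEM...), into a LaTeX string.
Each ITEM is (COUNTER PART...), where PART is a string or a
nested list.  COUNTER is ignored in this sketch."
  (let ((env (if (eq (car list) 'ordered) "enumerate" "itemize")))
    (concat
     (format "\\begin{%s}\n" env)
     (mapconcat
      (lambda (item)
        (concat
         "\\item "
         (mapconcat
          (lambda (part)
            (if (stringp part)
                ;; Replace the checkbox marker literally.
                (concat (replace-regexp-in-string
                         "\\[CBON\\]" "$\\boxtimes$" part t t)
                        "\n")
              ;; Nested list: recurse.
              (sketch-list-to-latex part)))
          (cdr item) "")))
      (cdr list) "")
     (format "\\end{%s}\n" env))))

(sketch-list-to-latex
 '(ordered (nil "first item"
                (unordered (nil "sub-item one")
                           (nil "[CBON] sub-item two"))
                "more text in first item")
           (3 "last item")))
```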

The same could be said about tables:

| Row 1 | 1 | 2 |
|-------+---+---|
| Row 2 | 3 | 4 |

can be parsed as:

(("Row 1" "1" "2")
 hline
 ("Row 2" "3" "4"))

and from that, functions such as orgtbl-to-html or orgtbl-to-latex were
easy to create.
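
To show how little is needed once the structure is available, here is a
rough sketch of a table converter. It is not how orgtbl-to-latex is
written; the function name is hypothetical, every column is assumed to
be left-aligned, and cells are assumed to need no escaping.

```emacs-lisp
;; Hypothetical sketch, not the implementation of `orgtbl-to-latex'.
(defun sketch-table-to-latex (table)
  "Turn TABLE, a list of rows and `hline' symbols, into a tabular.
Each row is a list of strings.  All columns are left-aligned."
  (let ((ncols (apply #'max 0
                      (mapcar (lambda (row)
                                (if (consp row) (length row) 0))
                              table))))
    (concat
     (format "\\begin{tabular}{%s}\n" (make-string ncols ?l))
     (mapconcat
      (lambda (row)
        (if (eq row 'hline)
            "\\hline\n"
          ;; Join cells with " & " and end the row with "\\".
          (concat (mapconcat #'identity row " & ") " \\\\\n")))
      table "")
     "\\end{tabular}\n")))

(sketch-table-to-latex
 '(("Row 1" "1" "2")
   hline
   ("Row 2" "3" "4")))
```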

So, basically, what I suggest here is:

1. list all possible environments and objects offered by the Org format
   (table, lists, inlinetasks, center, verbatim, paragraph, headlines,
   time-stamps, LaTeX snippets, footnotes, links, source);
2. define an explicit export format for each of them;
3. determine which options should be known by org-exp, and which by
   the backend;
4. create a parser, in org-exp, that will output the Org buffer in the
   chosen format;
5. create (many are readily available) functions for each backend to
   interpret them.


Now about that explicit format. Taking this buffer,

--8<---------------cut here---------------start------------->8---
#+title: Example buffer

Some text before first headline.

* First section

  First paragraph $\alpha = 1$.

  Second paragraph.

  - item 1
  - item 2
    #+begin_center
    Text
    #+end_center

  | Row 1 | 1 | 2 |
  | Row 2 | 3 | 4 |

* Second section

  Text with footnote[fn:1].
*************** Inline task
                Some text and
                a [[http://www.gnu.org/software/emacs/][link]]
                :DRAWER:
                - I like
                - lists.
                :END:
*************** END

* Footnotes
[fn:1] Footnote definition.
--8<---------------cut here---------------end--------------->8---

It could be parsed as the following:

'((:title "Example buffer")
  (paragraph "Some text before first headline.")
  (headline "First section"
            (paragraph "First paragraph "
                       (latex "$\\alpha = 1$")
                       ".")
            (paragraph "Second paragraph.")
            (list unordered (nil "item 1")
                            (nil "item 2"
                                 (center (paragraph "Text"))))
            (table ("Row 1" "1" "2")
                   ("Row 2" "3" "4")))
  (headline "Second section"
            (paragraph "Text with footnote"
                       (footnote "Footnote definition.")
                       ".")
            (inlinetask "Inline task"
                        (paragraph "Some text and\na "
                                   (link "link" "http://www.gnu.org/software/emacs/"))
                        (drawer (list unordered (nil "I like")
                                                (nil "lists."))))))

Note that such parsing will need a decent forward-paragraph function.
It's also a very simplified example: headlines would need more than the
title string (todo keyword, priority, tags) before starting the body.

I have no code to offer at the moment, and, as we all know, the devil is
in the details. But if the output from org-exp.el is clear, exporters
will be more coherent. It could even provide tools to help exporters do
their task (a function to extract footnotes from the output, for
example).
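
As a sketch of what such a helper might look like, assuming the output
is a tree like the one above in which a footnote appears as
(footnote DEFINITION), a hypothetical collector could be:

```emacs-lisp
;; Hypothetical sketch: assumes footnotes appear in the parsed tree
;; as (footnote DEFINITION), as in the example above.
(defun sketch-collect-footnotes (tree)
  "Return the definitions of all footnote nodes in TREE, in order."
  (cond
   ;; Strings, symbols and nil carry no footnotes.
   ((not (consp tree)) nil)
   ((eq (car tree) 'footnote) (list (cadr tree)))
   ;; Any other node: gather footnotes from every child.
   (t (apply #'append (mapcar #'sketch-collect-footnotes tree)))))

(sketch-collect-footnotes
 '((headline "Second section"
             (paragraph "Text with footnote"
                        (footnote "Footnote definition.")
                        "."))))
```

A backend could call it once on the whole tree and emit the definitions
wherever its format expects them.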

Again, it may be a big task to undertake, but I think it will be
necessary at some point.

Regards,

-- 
Nicolas
