emacs-orgmode@gnu.org archives
From: Karl Voit <devnull@Karl-Voit.at>
To: emacs-orgmode@gnu.org
Subject: Parsing Org-mode in Python (was: Implementing Org-mode tools in languages other than ELISP)
Date: Mon, 6 Jan 2014 11:44:40 +0100
Message-ID: <2014-01-06T11-23-40@devnull.Karl-Voit.at>
In-Reply-To: 87zjnaidlz.wl%n142857@gmail.com

Hi!

* Daniel Clemente <n142857@gmail.com> wrote:
>> 
>> I dream of having a general Python parser for Org mode files, knowing
>> every bit about the current syntax for Org files, surrounded by enough
>> Python machinery to make it useful.

Oh, this would be great, since there are far more Python coders out
there than ELISP coders.

> Try PyOrgMode (https://github.com/bjonnh/PyOrgMode), it works for
> some files (but still needs corrections: it crashes with date
> formats, with bold markers, etc.).

While implementing my blogging system [4], I did some research on
the existing Org parsers written in Python.

My notes on PyOrgMode (from 2013-05) say that there is not much
documentation on how to use it properly, and that its list of open
todos contains rather basic items, so I did not consider it mature
enough.

At the moment, I consider my own Python parser [1] the most advanced
one (unfortunately). However, I am fully aware of its downsides:

- it is a very primitive line-by-line parser that does not use any
  classical parsing tool at all (works for me so far!)
- it is currently limited to a few Org-mode elements so that I can
  get on with developing my blogging system
  - more Org-mode elements (not all!) will be added once my blogging
    system is stable enough to support Org-mode syntax features such
    as tables.
- it was not written with the intention of being a stand-alone
  Org-mode parser, since I only need it for my blogging system
  - feel free to use it and turn it into a stand-alone parser
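To illustrate what "line-by-line" means here, a minimal sketch of that
style follows. It is NOT the code of orgparser.py [1]; the function
name parse_lines, the dict layout and the fixed TODO/DONE keyword set
are assumptions made up for this example. Every line is looked at
exactly once: a line starting with one or more "*" followed by
whitespace opens a new heading, every other line is appended to the
content of the current heading.

```python
import re

# Hypothetical, fixed keyword set; real Org files can configure their own.
TODO_KEYWORDS = ("TODO", "DONE")

# Heading: stars, whitespace, optional keyword, title, optional :tag1:tag2: suffix.
HEADING_RE = re.compile(
    r"^(?P<stars>\*+)\s+"
    r"(?:(?P<keyword>" + "|".join(TODO_KEYWORDS) + r")\s+)?"
    r"(?P<title>.*?)"
    r"(?:\s+:(?P<tags>[\w@#%:]+):)?\s*$"
)


def parse_lines(text):
    """Split Org text into a flat list of heading dicts in a single pass.

    Sketch only: lines before the first heading are ignored.
    """
    headings = []
    current = None
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            tags = match.group("tags")
            current = {
                "level": len(match.group("stars")),
                "keyword": match.group("keyword"),
                "title": match.group("title"),
                "tags": tags.split(":") if tags else [],
                "content": [],
            }
            headings.append(current)
        elif current is not None:
            # Anything that is not a heading is kept verbatim as content.
            current["content"].append(line)
    return headings
```

Such a parser is easy to write and to debug, but each new element
type means another special case inside the same loop.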

I do think that a more general approach needs an Org-mode Python
parser built on a classical parsing engine, and somebody should
develop one. I have some experience with ply [2]. Unfortunately, I
have to say that using ply feels a bit awkward in Python: I did not
get the impression that it is a parsing engine designed the Python
way. A lot of things work by convention (naming rules with prefixes
such as t_ and p_, and so on), which has certain limitations in the
details. AFAIR there were other things that puzzled me as well.
Still, it got my (simple) job [3] done back then.
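The main difference from the line-by-line style is the split into a
lexer (text to tokens) and a parser (tokens to tree). ply [2]
generates both from t_-prefixed token rules and p_-prefixed grammar
functions. To keep this sketch free of dependencies, it uses a
hand-written regex lexer and a small recursive-descent parser from the
standard library instead of ply. The token kinds, the Node class and
the grammar are assumptions for illustration, and only headings and
text lines are covered.

```python
import re
from typing import Iterator, List, NamedTuple, Optional


class Token(NamedTuple):
    kind: str   # "HEADING" or "TEXT"
    level: int  # number of stars for HEADING, 0 for TEXT
    value: str


# Hypothetical token rules, tried in order on every line.
TOKEN_SPEC = [
    ("HEADING", r"(\*+)[ \t]+(.*)"),
    ("TEXT", r"(.*)"),
]


def tokenize(text: str) -> Iterator[Token]:
    """Lexer: turn Org text into a stream of tokens."""
    for line in text.splitlines():
        for kind, pattern in TOKEN_SPEC:
            match = re.fullmatch(pattern, line)
            if match:
                if kind == "HEADING":
                    yield Token(kind, len(match.group(1)), match.group(2))
                else:
                    yield Token(kind, 0, match.group(1))
                break


class Node:
    def __init__(self, title: str, level: int):
        self.title = title
        self.level = level
        self.text: List[str] = []
        self.children: List["Node"] = []


class Parser:
    """Recursive-descent parser (sketch, not ply). Informal grammar:

        document := TEXT* section*
        section  := HEADING TEXT* section*   (children are deeper)
    """

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self) -> Optional[Token]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_document(self) -> Node:
        root = Node("", 0)
        self._parse_body(root)
        return root

    def _parse_body(self, node: Node) -> None:
        # Text lines directly below this node.
        while (tok := self.peek()) is not None and tok.kind == "TEXT":
            node.text.append(tok.value)
            self.pos += 1
        # Child sections, as long as they are deeper than this node.
        while (tok := self.peek()) is not None and tok.kind == "HEADING" and tok.level > node.level:
            self.pos += 1
            child = Node(tok.value, tok.level)
            self._parse_body(child)
            node.children.append(child)
```

With this split, a line like "*bold* text" is lexed as TEXT because
the heading rule requires whitespace after the stars, and the parser
never has to look at raw characters.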

> You don't need a Lisp interpreter written in Python, only Python
> code that understands org syntax without getting confused.

I am no expert on this. My feeling is that if you use an ELISP
interpreter to parse Org-mode syntax for Python, it should reuse the
original Org parser completely and nothing else. I have no idea
whether this is possible.

If you have to implement a parser yourself, you should probably
stick to pure Python.

To avoid confusion, your own Python parser should implement only a
very well-defined and documented subset of Org-mode syntax and
accept/parse everything else as ordinary text (content).
IMHO.
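A minimal sketch of that rule, assuming a hypothetical subset of just
two constructs: headings and #+BEGIN_SRC/#+END_SRC blocks. The
supported subset is documented in one place (the docstring), and
anything outside it, including a source block that is never closed,
comes back as plain text instead of raising an error. The function
name and the output format are made up for illustration and are not
taken from any existing parser.

```python
import re
from typing import List, Tuple

HEADING_RE = re.compile(r"^(\*+)\s+(.*)$")
BEGIN_SRC_RE = re.compile(r"^\s*#\+BEGIN_SRC\b\s*(\S*)", re.IGNORECASE)
END_SRC_RE = re.compile(r"^\s*#\+END_SRC\b", re.IGNORECASE)


def parse_subset(text: str) -> List[Tuple[str, object]]:
    """Parse a documented (hypothetical) subset of Org syntax.

    Supported: headings ("* Title") and #+BEGIN_SRC ... #+END_SRC blocks.
    Everything else is returned as ("text", line), never as an error.
    """
    lines = text.splitlines()
    result: List[Tuple[str, object]] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        heading = HEADING_RE.match(line)
        begin = BEGIN_SRC_RE.match(line)
        if heading:
            result.append(("heading", (len(heading.group(1)), heading.group(2))))
            i += 1
        elif begin:
            # Look ahead for the matching #+END_SRC line.
            end = next((j for j in range(i + 1, len(lines))
                        if END_SRC_RE.match(lines[j])), None)
            if end is None:
                # Unterminated block: do not guess, keep the line as text.
                result.append(("text", line))
                i += 1
            else:
                result.append(("src_block", (begin.group(1), lines[i + 1:end])))
                i = end + 1
        else:
            # Tables, drawers, inline markup, ... are outside the subset.
            result.append(("text", line))
            i += 1
    return result
```

Extending the subset then means adding one documented branch, while
unsupported syntax keeps passing through unchanged as content.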

HTH.

  1. https://github.com/novoid/lazyblorg/blob/master/lib/orgparser.py
  2. http://www.dabeaz.com/ply/
  3. https://github.com/novoid/2011-04-tagstore-formal-experiment/tree/master/analysis_and_derived_data/scripts
  4. https://github.com/novoid/lazyblorg
-- 
mail|git|SVN|photos|postings|SMS|phonecalls|RSS|CSV|XML to Org-mode:
       > get Memacs from https://github.com/novoid/Memacs <

https://github.com/novoid/extract_pdf_annotations_to_orgmode + more on github
