From: Karl Voit
Subject: Parsing Org-mode in Python (was: Implementing Org-mode tools in languages other than ELISP)
Date: Mon, 6 Jan 2014 11:44:40 +0100
Message-ID: <2014-01-06T11-23-40@devnull.Karl-Voit.at>
To: emacs-orgmode@gnu.org

Hi!

* Daniel Clemente wrote:
>>
>> I dream of having a general Python parser for Org mode files, knowing
>> every bit about the current syntax for Org files, surrounded by enough
>> Python machinery to make it useful.

Oh, this would be great, since there are way more Python coders out
there than ELISP coders.

> Try PyOrgMode (https://github.com/bjonnh/PyOrgMode), it works for
> some files (but still needs corrections: it crashes with date
> formats, with bold markers, etc.).

For the blogging system I am implementing [4], I did some research on
current Org-mode parsers in Python. My notes about PyOrgMode (2013-05)
were that there is not much documentation on how to use it properly,
and that its list of open todos contains rather basic items, so I
would not consider it mature enough yet.

I consider my own Python parser [1] the most advanced Python parser
so far (unfortunately). However, I am completely aware of its
downsides:

- it's a very primitive line-by-line parser that does not use any
  classical parsing tool at all (works for me so far!)
- it's currently limited to a few Org-mode elements so that I can
  continue to develop my blogging system
- more Org-mode elements (not all!) will be added when my blogging
  system gets stable enough to add Org-mode syntax features such as
  tables
- it's not written with the goal of being a stand-alone Org-mode
  parser, since I only need it for my blogging system
- feel free to use it and modify it to be a stand-alone parser

I do think that for a more general approach, somebody should develop
an Org-mode Python parser with a classical parsing engine. I do have
some experience with ply [2]. Unfortunately, I have to say that using
ply feels a bit awkward in Python. I did not get the impression that
it is a parsing engine designed the Python way. A lot of things are
done by convention (naming stuff, and so on), which has certain
limitations in the details. And AFAIR there were more things that
puzzled me.
However, it got my (simple) job [3] done back then.

> You don't need a Lisp interpreter written in Python, only Python
> code that understands org syntax without getting confused.

I am no expert in this. I do feel that if you are going to use an
ELISP interpreter to parse Org-mode syntax for Python, it should
completely re-use the original Org parser and nothing else. I have no
idea whether this is possible or not.

If you have to implement a parser on your own, you should probably
stick to Python only. In order to avoid confusion, your own Python
parser should implement only a very well defined and documented
subset of Org-mode syntax and accept/parse everything else as
ordinary text (content). IMHO.

HTH.

  1. https://github.com/novoid/lazyblorg/blob/master/lib/orgparser.py
  2. http://www.dabeaz.com/ply/
  3. https://github.com/novoid/2011-04-tagstore-formal-experiment/tree/master/analysis_and_derived_data/scripts
  4. https://github.com/novoid/lazyblorg

-- 
mail|git|SVN|photos|postings|SMS|phonecalls|RSS|CSV|XML to Org-mode:
       > get Memacs from https://github.com/novoid/Memacs <

https://github.com/novoid/extract_pdf_annotations_to_orgmode + more on github
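
PS: To illustrate the "well defined subset, everything else is
ordinary text" idea from above, here is a minimal line-by-line sketch.
It assumes a subset of just two elements (headings and simple "-"/"+"
list items). The names parse_org_subset, HEADING_RE and LIST_ITEM_RE
are made up for this illustration; this is not the code from
lazyblorg's orgparser.py, nor anything from PyOrgMode.

```python
import re

# Assumed subset for this sketch only: headings ("* Title") and
# simple list items ("- item" / "+ item"). These patterns are
# illustrative and far from the complete Org-mode syntax.
HEADING_RE = re.compile(r"^(\*+)\s+(.*\S)\s*$")
LIST_ITEM_RE = re.compile(r"^\s*[-+]\s+(.*)$")


def parse_org_subset(text):
    """Return a list of (kind, payload) tuples, one per input line."""
    elements = []
    for line in text.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            # Heading level is the number of leading stars.
            level = len(heading.group(1))
            elements.append(("heading", (level, heading.group(2))))
            continue
        item = LIST_ITEM_RE.match(line)
        if item:
            elements.append(("list-item", item.group(1)))
            continue
        # Fallback: anything outside the documented subset (tables,
        # bold markers, timestamps, ...) is kept as ordinary text.
        elements.append(("text", line))
    return elements
```

Because unknown syntax falls through to the "text" case instead of
raising an error, a table row or a line with bold markers ends up as
plain content. That is the behaviour I would want from a parser that
only claims to support a subset.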