From: "Przemysław Kamiński" <pk@intrepidus.pl>
To: emacs-orgmode@gnu.org
Subject: Re: official orgmode parser
Date: Wed, 16 Sep 2020 14:09:42 +0200
Message-ID: <fb792b49-7387-7a43-640c-5e76b91b50b1@intrepidus.pl>
In-Reply-To: <20200915123722.GA20532@tuxteam.de>

On 9/15/20 2:37 PM, tomas@tuxteam.de wrote:
> On Tue, Sep 15, 2020 at 01:15:56PM +0200, Przemysław Kamiński wrote:
> 
> [...]
> 
>> There's the org-json (or ox-json) package but for some reason I
>> wasn't able to run it successfully. I guess export to S-exps would
>> be best here. But yes I'll check that out.
> 
> If that's your route, perhaps the "Org element API" [1] might be
> helpful. Especially `org-element-parse-buffer' gives you a Lisp
> data structure which is supposed to be a parse of your Org buffer.
> 
> From there to S-expression can be trivial (e.g. `print' or `pp'),
> depending on what you want to do.
> 
> Walking the structure should be nice in Lisp, too.
> 
> The topic of (non-Emacs) parsing of Org comes up regularly, and
> there is a good (but AFAIK not-quite-complete) Org syntax spec
> in Worg [2], but there are a couple of difficulties to be mastered
> before such a thing can become really enjoyable and useful.
> 
> The loose specification of Org's format (arguably its second
> or third strongest asset, the first two being its incredible
> community and Emacs itself) is something which makes this
> problem "interesting". People have invented lots of usages
> which might be broken should Org change to a strict formal
> spec. You don't want to break those people.
> 
> But yes, perhaps some day someone nails it. Perhaps it's you :)
> 
> Cheers
> 
> [1] https://orgmode.org/worg/dev/org-element-api.html
> [2] https://orgmode.org/worg/dev/org-syntax.html
> 
>   - t
> 

So I looked at (pp (org-element-parse-buffer)); however, it prints
recursive (circular) structures, which other Schemes have trouble parsing.

My code looks more or less like this:

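(require 'org-element)

(defun org-parse (f)
  "Parse Org file F and pretty-print its elements and objects."
  (with-temp-buffer
    ;; Insert the file into the temp buffer; `find-file' would visit it
    ;; in a separate buffer and leave that buffer open.
    (insert-file-contents f)
    (org-mode)
    (let* ((parsed (org-element-parse-buffer))
           (all (append org-element-all-elements org-element-all-objects))
           ;; strip-parent is my own helper, described below.
           (mapped (org-element-map parsed all
                     (lambda (item)
                       (strip-parent item)))))
      (pp mapped))))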


strip-parent is basically (plist-put props :parent nil) applied to each
element's properties. However, it turns out there are more recursive
objects, like

:title
           #("Headline 1" 0 10
             (:parent
              (headline #2
                            (section

So I'm wondering: do I have to do this by hand for all cases, or is there
some way to output only a simple AST without those nested objects?
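
A minimal sketch of one way the by-hand route might look, under stated
assumptions: the name my/org-strip-parents is hypothetical, and the
sketch relies on `org-element-map' descending into secondary strings
such as :title when `plain-text' and the object types are in the type
list, and on `org-element-put-property' accepting strings (where it
sets a text property). It only clears :parent; other back-references,
if a given Org version adds any, would need the same treatment.

```elisp
(require 'org)
(require 'org-element)

(defun my/org-strip-parents (tree)
  "Clear the :parent back-reference on every node of TREE.
TREE is a parse tree as returned by `org-element-parse-buffer'.
The tree is modified in place and returned."
  (org-element-map tree
      ;; Include `plain-text' so strings inside :title and friends,
      ;; which carry :parent as a text property, are visited too.
      (append '(plain-text)
              org-element-all-elements
              org-element-all-objects)
    (lambda (node)
      (org-element-put-property node :parent nil)
      ;; Return nil so `org-element-map' accumulates nothing.
      nil)
    ;; INFO, FIRST-MATCH, NO-RECURSION are nil; WITH-AFFILIATED is t
    ;; so objects in affiliated keywords (e.g. captions) are visited.
    nil nil nil t)
  tree)
```

Usage would be along the lines of
(pp (my/org-strip-parents (org-element-parse-buffer))). Strings may
still print in the #("..." 0 10 (:parent nil)) form, so a reader in
another Scheme might also need them passed through
`substring-no-properties' first.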

Best,
Przemek

