emacs-orgmode@gnu.org archives
* [ANN] An Org parser for Julia
@ 2021-12-02  9:06 Timothy
  2021-12-02 10:11 ` Ihor Radchenko
  2021-12-02 12:18 ` Eric S Fraga
  0 siblings, 2 replies; 10+ messages in thread
From: Timothy @ 2021-12-02  9:06 UTC (permalink / raw)
  To: emacs-orgmode


Hi Everyone,

I’ve gone heavily into Julia over the past year, and I recently
thought it would be nice if an Org parser existed for it — so I made one!

<file:~/.julia/dev/OrgMode/org-mode-wordart-small.png>
<https://github.com/tecosaur/OrgMode.jl>

It’s just over a week since I started, so it’s fairly young, but I’m pretty
happy with the way it’s shaping up 🙂. To give you an idea, here’s some example
usage from the readme:

┌────
│ text1 = org"Some *Org* markup, written with ease using the ~org\"\"~ macro."
│ parsetree(text1)  # show the generated parse tree
│ 
│ text2 = parse(Org, "Some *Org* markup, written with ease using the ~parse~ function.")
│ diff(text1, text2)  # show the components of the parse trees that differ
│ 
│ dochead = @doc OrgMode.Heading  # the documentation for the Heading component (::Org)
│ org(dochead)  # generate Org text that produces the OrgMode.Heading object
│ string(dochead)  # as above, but produces a String
│ 
│ parse(Org, string(dochead)) == dochead  # round-trip equality
│ 
│ filtermap(dochead, [OrgMode.SourceBlock], s -> s.lang)  # get the lang of each source block
└────

There’s also a bit of an ulterior motive here: I’ve been rather interested in
the Org syntax and how easy it is to write tools for it outside of Emacs, and
I’ve been thinking that writing a parser would be a great way to find out and
allow me to make some more informed comments on
<https://orgmode.org/worg/dev/org-syntax.html>, hopefully pushing it just a bit
closer to having “(draft)” lopped off the title 😛.

You can expect to see another email from me with some comments on the Org
Syntax document shortly.

All the best,
Timothy

[-- Attachment #1.2.2: org-mode-wordart-small.png --]
[-- Type: image/png, Size: 11644 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [ANN] An Org parser for Julia
  2021-12-02  9:06 [ANN] An Org parser for Julia Timothy
@ 2021-12-02 10:11 ` Ihor Radchenko
  2021-12-02 10:30   ` Timothy
  2021-12-02 12:18 ` Eric S Fraga
  1 sibling, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2021-12-02 10:11 UTC (permalink / raw)
  To: Timothy; +Cc: emacs-orgmode

Timothy <tecosaur@gmail.com> writes:

> I’ve gone heavily into Julia over the past year, and I recently
> thought it would be nice if an Org parser existed for it — so I made one!
>
> <file:~/.julia/dev/OrgMode/org-mode-wordart-small.png>
> <https://github.com/tecosaur/OrgMode.jl>

I am wondering how the third-party parsers are going to scale for larger
Org files. I did some simple testing in the past, and it seems that only
tree-sitter can potentially get sufficiently close to org-element in
terms of performance. Maybe we should implement an Elisp LSP server
instead of many individual parsers in different languages?

---

tree-sitter vs. org-element on 15M Org file
org-element-parse-buffer
(16.090262757 1 0.7365683609999962)

org-element-parse-buffer 'element granularity
(7.688000744 0 0.0)
8sec

tree-sitter via https://github.com/milisims/tree-sitter-org
parsed down to 58% of the buffer in 5.3sec and exited with error
extrapolates to ~9sec

Racket's brag via https://github.com/tgbugs/laundry
failed to finish parsing in reasonable time. Cancelled at 10m11.436s

Clojure parser via https://github.com/200ok-ch/org-parser
failed to finish parsing with java.lang.OutOfMemoryError: GC overhead limit exceeded
Running time 8m28.078s
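
The "~9sec" figure for tree-sitter above is a linear extrapolation. A
minimal sketch of that arithmetic, assuming parse time grows in proportion
to the fraction of the buffer parsed (the variable names are made up for
illustration):

```julia
# Assumption: parse time scales linearly with the share of the buffer parsed.
elapsed_s = 5.3         # seconds until tree-sitter exited with an error
fraction_parsed = 0.58  # share of the buffer parsed by then

# Estimated time for the whole buffer under the linear assumption.
estimated_total_s = elapsed_s / fraction_parsed
println(round(estimated_total_s, digits=1), " s")
```

This lands at roughly 9 seconds, in the same range as the 8 seconds
measured for org-element at 'element granularity.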

Best,
Ihor



* Re: [ANN] An Org parser for Julia
  2021-12-02 10:11 ` Ihor Radchenko
@ 2021-12-02 10:30   ` Timothy
  2021-12-02 11:13     ` Ihor Radchenko
  0 siblings, 1 reply; 10+ messages in thread
From: Timothy @ 2021-12-02 10:30 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

Hi Ihor,

> I am wondering how the third-party parsers are going to scale for larger
> Org files. I did some simple testing in the past, and it seems that only
> tree-sitter can potentially get sufficiently close to org-element in
> terms of performance.

I’ve actually had a brief look at performance using my Emacs config file
(which is ~10k lines). On this, my parser is ~5x faster than org-element.
On a smaller file like the project’s readme it’s closer to ~10x faster. I’ve
also noticed that I can multithread the parsing, which produces a ~9x speedup on
my computer. So, that would be ~40-90x faster than org-element. I have yet to do
much profiling/benchmarking/optimisation though; I’m still in the “feature
adding” phase. This means that it could well slow down as I add more for it to
recognise, but there are probably also unrealised potential performance
improvements.
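
As a rough illustration of what multithreading a parse could look like, here
is a minimal sketch. It is a guess at the general approach, not
OrgMode.jl's actual code: it assumes the document can be split at top-level
headings and that each piece can be handled independently. The names
`split_sections`, `count_headings` and `parallel_map` are hypothetical, and
`count_headings` is only a stand-in for a real per-section parser.

```julia
# Hypothetical sketch: NOT how OrgMode.jl is known to do it.

# Split `text` into chunks, each starting at a top-level heading ("* ...").
# Anything before the first heading becomes its own chunk.
function split_sections(text::AbstractString)
    sections = String[]
    current = String[]
    for line in split(text, '\n')
        if startswith(line, "* ") && !isempty(current)
            push!(sections, join(current, '\n'))
            empty!(current)
        end
        push!(current, line)
    end
    isempty(current) || push!(sections, join(current, '\n'))
    return sections
end

# Stand-in for a real per-section parser: counts heading lines of any level,
# so the example runs without any packages installed.
count_headings(section) = count(l -> occursin(r"^\*+ ", l), split(section, '\n'))

# Run `f` on every section in its own task and collect the results in order.
function parallel_map(f, sections)
    tasks = [Threads.@spawn f(s) for s in sections]
    return fetch.(tasks)
end

doc = "preamble\n* One\ntext\n** Sub\n* Two\nmore text"
sections = split_sections(doc)
heading_counts = parallel_map(count_headings, sections)
println(heading_counts)
```

The tasks only run concurrently when Julia is started with more than one
thread (e.g. `julia --threads 4`); with a single thread the code still
works, just without any speedup.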

> Maybe we should implement an Elisp LSP server instead of many individual
> parsers in different languages?

For the sake of tools that operate on Org files, not just the Org editing
experience, I think it’s quite good if we have a selection of /good/ parsers
available for different languages. However, I also think an LSP server would be
good. That’s why I have <https://github.com/tecosaur/org-lsp>, even if I haven’t
spent anywhere near as much time on it as I would like (it’s barely a skeleton
at the moment).

> tree-sitter vs. org-element on 15M Org file

Might you have a link to this file? I’d be interested to try it.

All the best,
Timothy


* Re: [ANN] An Org parser for Julia
  2021-12-02 10:30   ` Timothy
@ 2021-12-02 11:13     ` Ihor Radchenko
  2021-12-02 11:20       ` Timothy
  0 siblings, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2021-12-02 11:13 UTC (permalink / raw)
  To: Timothy; +Cc: emacs-orgmode

Timothy <tecosaur@gmail.com> writes:

> I’ve actually had a brief look at performance using my Emacs config file
> (which is ~10k lines). On this, my parser is ~5x faster than org-element.
> On a smaller file like the project’s readme it’s closer to ~10x faster. I’ve
> also noticed that I can multithread the parsing, which produces a ~9x speedup on
> my computer. So, that would be ~40-90x faster than org-element. I have yet to do
> much profiling/benchmarking/optimisation though; I’m still in the “feature
> adding” phase. This means that it could well slow down as I add more for it to
> recognise, but there are probably also unrealised potential performance
> improvements.

I am wondering how you did the benchmark.
I just tried the following on my config.org
(https://github.com/yantar92/emacs-config):

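cd path/to/OrgMode.jl    # shell
julia1.6                 # shell: start the Julia 1.6 REPL
# the remaining lines are entered at the Julia prompt
push!(LOAD_PATH, pwd())  # add the local OrgMode.jl checkout to the load path
using OrgMode
textorgfile = read("/home/yantar92/Git/emacs-config/config.org", String)
parse(Org, textorgfile)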

The config.org is about 18k lines, but I did not manage to wait long enough
for the parser to return.

Multithreading looks promising, though.

Also, the tests I mentioned are with my latest commit for
org-element-parse-buffer and on native-compiled Emacs.

>> Maybe we should implement an Elisp LSP server instead of many individual
>> parsers in different languages?
>
> For the sake of tools that operate on Org files, not just the Org editing
> experience, I think it’s quite good if we have a selection of /good/ parsers
> available for different languages. However, I also think an LSP server would be
> good. That’s why I have <https://github.com/tecosaur/org-lsp>, even if I haven’t
> spent anywhere near as much time on it as I would like (it’s barely a skeleton
> at the moment).

Thanks for reminding me about this. I had seen it, forgotten it, and am now
reinventing the idea :D.

Also, it would be great to have a unified test set to verify third-party
parsers and org-element parser.

>> tree-sitter vs. org-element on 15M Org file
>
> Might you have a link to this file? I’d be interested to try it.

That's my personal notes file. I can test it for you if you give me the
instructions.

Best,
Ihor



* Re: [ANN] An Org parser for Julia
  2021-12-02 11:13     ` Ihor Radchenko
@ 2021-12-02 11:20       ` Timothy
  2021-12-02 12:07         ` Ihor Radchenko
  0 siblings, 1 reply; 10+ messages in thread
From: Timothy @ 2021-12-02 11:20 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

Hi Ihor,

> I am wondering how you did the benchmark.
> I just tried the following on my config.org
> (<https://github.com/yantar92/emacs-config>):
>
> The config.org is about 18k lines, but I did not manage to wait long enough
> for the parser to return.

Hmm, I just tried yours and I think something in your file is causing it to trip
up. Not sure what though, further investigation is required. For this alone I’m
glad you’ve shared this with me :)

For reference, this is what I’ve been doing:
┌────
│ julia> using OrgMode
│ 
│ julia> config = read("/home/tec/.config/doom/config.org", String);
│ 
│ julia> @benchmark parse(Org, config)
│ BenchmarkTools.Trial: 139 samples with 1 evaluation.
│  Range (min … max):  34.042 ms … 43.269 ms  ┊ GC (min … max): 0.00% … 16.10%
│  Time  (median):     34.857 ms              ┊ GC (median):    0.00%
│  Time  (mean ± σ):   35.999 ms ±  2.490 ms  ┊ GC (mean ± σ):  2.63% ±  5.30%
│ 
│     █▁
│   ▃▇██▄▆▄▃▂▃▅▅▅▄▄▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▁▃▂▂▃▂▃▁▁▂▃▁▂▂▁▂ ▂
│   34 ms           Histogram: frequency by time          43 ms <
│ 
│  Memory estimate: 7.17 MiB, allocs estimate: 142185.
└────

It’s worth noting that the first time `parse(Org, config)' is called, it will
trigger JIT compilation (which for me takes ~15s).
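
The first-call cost is easy to see in isolation. A minimal sketch, using a
made-up stand-in function (`sum_of_squares`) rather than the parser, so it
runs without any packages:

```julia
# Stand-in workload; any function shows the same effect on its first call.
sum_of_squares(xs) = sum(x -> x^2, xs)

data = collect(1:1_000)

# The first call includes compiling sum_of_squares for Vector{Int}.
first_call = @elapsed sum_of_squares(data)
# The second call reuses the compiled code and only measures the run itself.
second_call = @elapsed sum_of_squares(data)

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

The exact numbers depend on the machine and Julia version, but the first
call is normally much slower than the second. `@benchmark` runs the
expression many times, so its statistics are dominated by the compiled runs.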

`org-element-parse-buffer' tends to take ~200ms.

> Also, the tests I mentioned are with my latest commit for
> org-element-parse-buffer and on native-compiled Emacs.

I’m on native-compiled Emacs, but ~12 commits behind.

>> Org LSP
> Thanks for reminding me about this. I had seen it, forgotten it, and am now
> reinventing the idea :D.

I’m just hoping I’ll get to it / get help eventually 😂.

> Also, it would be great to have a unified test set to verify third-party
> parsers and org-element parser.

You know, I’ve had the same thought 🙂.

>>> tree-sitter vs. org-element on 15M Org file
>>
>> Might you have a link to this file? I’d be interested to try it.
>
> That’s my personal notes file. I can test it for you if you give me the
> instructions.

Cool, since your config seems to have revealed some issues, it would probably be
worth waiting till I’ve sorted that out.

All the best,
Timothy


* Re: [ANN] An Org parser for Julia
  2021-12-02 11:20       ` Timothy
@ 2021-12-02 12:07         ` Ihor Radchenko
  2021-12-02 12:11           ` Timothy
  0 siblings, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2021-12-02 12:07 UTC (permalink / raw)
  To: Timothy; +Cc: emacs-orgmode

Timothy <tecosaur@gmail.com> writes:

> │ julia> @benchmark parse(Org, config)
> │ BenchmarkTools.Trial: 139 samples with 1 evaluation.
> │  Range (min … max):  34.042 ms … 43.269 ms  ┊ GC (min … max): 0.00% … 16.10%
> │  Time  (median):     34.857 ms              ┊ GC (median):    0.00%
> │  Time  (mean ± σ):   35.999 ms ±  2.490 ms  ┊ GC (mean ± σ):  2.63% ±  5.30%
> │ 
> │     █▁
> │   ▃▇██▄▆▄▃▂▃▅▅▅▄▄▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▁▃▂▂▃▂▃▁▁▂▃▁▂▂▁▂ ▂
> │   34 ms           Histogram: frequency by time          43 ms <
> │ 
> │  Memory estimate: 7.17 MiB, allocs estimate: 142185.
> └────
>
> It’s worth noting that the first time `parse(Org, config)' is called, it will
> trigger JIT compilation (which for me takes ~15s).
>
> `org-element-parse-buffer' tends to take ~200ms.

Just FYI that I am getting similar results on your config.org:

M-: (let ((gc-cons-threshold #x40000000)) (benchmark-run (org-element-parse-buffer)))
(0.133567423 0 0.0), which is 133ms

and

@benchmark parse(Org, textorgfile)
BenchmarkTools.Trial: 196 samples with 1 evaluation.
 Range (min … max):  22.235 ms … 81.101 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     23.652 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.535 ms ±  5.921 ms  ┊ GC (mean ± σ):  2.56% ± 5.36%

    █▃      ▂▂                                                 
  ▆▅███▇▅▅▁▅██▆▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▅▁▅▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▄ ▄
  22.2 ms      Histogram: log(frequency) by time      51.3 ms <

 Memory estimate: 7.07 MiB, allocs estimate: 139566.
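
To make "similar results" concrete, here is a minimal sketch of the speedup
arithmetic using only the timings quoted in this thread (medians for the
Julia runs; the variable names are made up for illustration):

```julia
# Timings quoted in this thread, all in milliseconds.
org_element_ms_ihor = 133.567423  # benchmark-run of org-element-parse-buffer
orgmode_jl_ms_ihor  = 23.652      # @benchmark median, same file
org_element_ms_tim  = 200.0       # Timothy's approximate org-element figure
orgmode_jl_ms_tim   = 34.857      # Timothy's @benchmark median

# Speedup = time taken by org-element / time taken by OrgMode.jl.
speedup_ihor = org_element_ms_ihor / orgmode_jl_ms_ihor
speedup_tim  = org_element_ms_tim / orgmode_jl_ms_tim

println("speedup (Ihor):    ", round(speedup_ihor, digits=1), "x")
println("speedup (Timothy): ", round(speedup_tim, digits=1), "x")
```

Both ratios come out between 5x and 6x, consistent with the "~5x faster"
estimate earlier in the thread.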

Best,
Ihor


* Re: [ANN] An Org parser for Julia
  2021-12-02 12:07         ` Ihor Radchenko
@ 2021-12-02 12:11           ` Timothy
  0 siblings, 0 replies; 10+ messages in thread
From: Timothy @ 2021-12-02 12:11 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

Hi Ihor,

> Just FYI that I am getting similar results on your config.org:
> [snip]

Thanks. It’s always nice to see a confirmation of a result. Hopefully in the
near future we’ll be able to run your files through without issue 🙂.

All the best,
Timothy


* Re: [ANN] An Org parser for Julia
  2021-12-02  9:06 [ANN] An Org parser for Julia Timothy
  2021-12-02 10:11 ` Ihor Radchenko
@ 2021-12-02 12:18 ` Eric S Fraga
  2021-12-02 14:04   ` Timothy
  1 sibling, 1 reply; 10+ messages in thread
From: Eric S Fraga @ 2021-12-02 12:18 UTC (permalink / raw)
  To: Timothy; +Cc: emacs-orgmode

Timothy,

this is really good to see!  I have been using Julia as my main
programming language for some years now and all of my codes generate
output with org markup.  Now I could in principle use it for the input
as well which could be quite helpful.

As an aside, Julia 1.7 was released two or three days ago.  Works very
well.  None of my codes has broken, which is always a good sign.

-- 
: Eric S Fraga, with org release_9.5.1-231-g6766c4 in Emacs 29.0.50
: Latest paper written in org: https://arxiv.org/abs/2106.05096



* Re: [ANN] An Org parser for Julia
  2021-12-02 12:18 ` Eric S Fraga
@ 2021-12-02 14:04   ` Timothy
  2021-12-02 15:56     ` Eric S Fraga
  0 siblings, 1 reply; 10+ messages in thread
From: Timothy @ 2021-12-02 14:04 UTC (permalink / raw)
  To: Eric S Fraga; +Cc: emacs-orgmode

Hi Eric,

> As an aside, Julia 1.7 was released two or three days ago.  Works very
> well.  None of my codes has broken, which is always a good sign.

Funny you should mention 1.7, I tried multithreading the parser and achieved a
~10x speedup. It worked all the time, except when I tried to `@benchmark' it,
where strange errors that shouldn’t happen cropped up.

I noticed the 1.7 release blog post mentioned fixing some multithreaded race
conditions, so I’m cautiously optimistic that this might work now :)

All the best,
Timothy


* Re: [ANN] An Org parser for Julia
  2021-12-02 14:04   ` Timothy
@ 2021-12-02 15:56     ` Eric S Fraga
  0 siblings, 0 replies; 10+ messages in thread
From: Eric S Fraga @ 2021-12-02 15:56 UTC (permalink / raw)
  To: Timothy; +Cc: emacs-orgmode

On Thursday,  2 Dec 2021 at 22:04, Timothy wrote:
> It worked all the time, except when I tried to `@benchmark' it, where
> strange errors that shouldn’t happen cropped up.

Automated benchmarking of multi-threaded applications is a form of dark
magic...

-- 
: Eric S Fraga, with org release_9.5.1-231-g6766c4 in Emacs 29.0.50
: Latest paper written in org: https://arxiv.org/abs/2106.05096


