Hi Everyone,

I’ve gone heavily into Julia over the past year, and I recently thought it would
be nice if an Org parser existed for it, so I made one!

<file:~/.julia/dev/OrgMode/org-mode-wordart-small.png>

<https://github.com/tecosaur/OrgMode.jl>

It’s just over a week since I started, so it’s fairly young, but I’m pretty happy
with the way it’s shaping up 🙂.

To give you an idea, here’s some example usage from the readme:
┌────
│ text1 = org"Some *Org* markup, written with ease using the ~org\"\"~ macro."
│ parsetree(text1)  # show the generated parse tree
│
│ text2 = parse(Org, "Some *Org* markup, written with ease using the ~parse~ function.")
│ diff(text1, text2)  # show the components of the parse trees that differ
│
│ dochead = @doc OrgMode.Heading  # the documentation for the Heading component (::Org)
│ org(dochead)  # generate Org text that produces the OrgMode.Heading object
│ string(dochead)  # as above, but produces a String
│
│ parse(Org, string(dochead)) == dochead  # round-trip equality
│
│ filtermap(dochead, [OrgMode.SourceBlock], s -> s.lang)  # get the lang of each source block
└────

There’s also a bit of an ulterior motive here: I’ve been rather interested in the
Org syntax and how easy it is to write tools for it outside of Emacs. Writing a
parser seemed like a great way to find out, and to let me make some more informed
comments on <https://orgmode.org/worg/dev/org-syntax.html>, hopefully pushing it
just a bit closer to having “(draft)” lopped off the title 😛.

You can expect to see another email from me with some comments on the Org Syntax
document shortly.

All the best,
Timothy
Timothy <tecosaur@gmail.com> writes:

> I’ve gone heavily into Julia over the past year, and I recently
> thought it would be nice if an Org parser existed for it, so I made one!
>
> <file:~/.julia/dev/OrgMode/org-mode-wordart-small.png>
> <https://github.com/tecosaur/OrgMode.jl>

I am wondering how the third-party parsers are going to scale for larger
Org files. I did some simple testing in the past, and it seems that only
tree-sitter can potentially get sufficiently close to org-element in
terms of performance.

Maybe we should implement an Elisp LSP server instead of many individual
parsers in different languages?

---
tree-sitter vs. org-element on 15M Org file

org-element-parse-buffer
(16.090262757 1 0.7365683609999962)

org-element-parse-buffer 'element granularity
(7.688000744 0 0.0)
8sec

tree-sitter via https://github.com/milisims/tree-sitter-org
parsed down to 58% of the buffer in 5.3sec and exited with error
extrapolates to ~9sec

Racket's brack via https://github.com/tgbugs/laundry
failed to finish parsing in reasonable time.
Cancelled at 10m11.436s

Clojure parser via https://github.com/200ok-ch/org-parser
failed to finish parsing with java.lang.OutOfMemoryError: GC overhead limit exceeded
Running time 8m28.078s

Best,
Ihor
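The ~9sec tree-sitter figure in the message above is a linear extrapolation from
a partial parse. A minimal Julia sketch of that arithmetic follows. It assumes
parse time grows linearly with the fraction of the buffer consumed, which the
measurement itself does not establish, and the helper name
`extrapolate_full_time` is hypothetical.

```julia
# Hypothetical helper: estimate the full-parse time from a partial run,
# assuming time grows linearly with the fraction of the buffer parsed.
function extrapolate_full_time(elapsed_s::Real, fraction_done::Real)
    0 < fraction_done <= 1 || throw(ArgumentError("fraction_done must be in (0, 1]"))
    return elapsed_s / fraction_done
end

# Figures reported above: 58% of the buffer parsed in 5.3 seconds.
estimate = extrapolate_full_time(5.3, 0.58)
println(round(estimate; digits=1), " s estimated for the whole buffer")
```

If parsing cost is not uniform across the file (for example, if the part that
triggered the error is unusually expensive), the real figure could differ.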
Hi Ihor,

> I am wondering how the third-party parsers are going to scale for larger
> Org files. I did some simple testing in the past, and it seems that only
> tree-sitter can potentially get sufficiently close to org-element in
> terms of performance.

I’ve actually had a brief look at my performance using my Emacs config file
(which is ~10k lines). On this, my parser is about 5x faster than org-element.
On a smaller file like the project’s readme it’s closer to ~10x faster. I’ve
also noticed that I can multithread the parsing, which produces a ~9x speedup on
my computer. So, that would be ~40-90x faster than org-element. I have yet to do
much profiling/benchmarking/optimisation though, I’m still in the “feature
adding” phase. This means that it could well slow down as I add more for it to
recognise, but there are probably also unrealised potential performance
improvements.

> Maybe we should implement a Elisp LSP server instead of many individual
> parsers in different languages?

For the sake of tools that operate on Org files, not just the Org editing
experience, I think it’s quite good if we have a selection of /good/ parsers
available for different languages. However, I also think an LSP server would be
good. That’s why I have <https://github.com/tecosaur/org-lsp>, even if I haven’t
spent anywhere near as much time on it as I would like (it’s barely a skeleton
at the moment).

> tree-sitter vs. org-element on 15M Org file

Might you have a link to this file? I’d be interested to try it.

All the best,
Timothy
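The combined estimate in the message above is the product of the
single-threaded speedup over org-element (about 5x to 10x) and the ~9x gain
from multithreading. A small sketch of that arithmetic, assuming the two
factors multiply independently (the email does not verify this, and the
variable names are illustrative only):

```julia
# Approximate speedups quoted above.
single_thread_speedup = (5, 10)  # vs. org-element: large config file, small readme
multithread_speedup = 9          # additional gain from multithreading

# Assumption: the two factors compose multiplicatively.
combined = single_thread_speedup .* multithread_speedup
println("combined speedup: ", combined[1], "x to ", combined[2], "x")
```

The product comes out at 45x to 90x, in line with the rough ~40-90x quoted.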
Timothy <tecosaur@gmail.com> writes:

> I’ve actually had a brief look at my performance using my Emacs config file
> (which is ~10k lines). On this, my parser is about 5x faster than org-element.
> On a smaller file like the project’s readme it’s closer to ~10x faster. I’ve
> also noticed that I can multithread the parsing, which produces a ~9x speedup on
> my computer. So, that would be ~40-90x faster than org-element. I have yet to do
> much profiling/benchmarking/optimisation though, I’m still in the “feature
> adding” phase. This means that it could well slow down as I add more for it to
> recognise, but there are probably also unrealised potential performance
> improvements.

I am wondering how you did the benchmark.
I just tried the following on my config.org
(https://github.com/yantar92/emacs-config):

cd path/to/OrgMode.jl
julia1.6
push!(LOAD_PATH, pwd())
using OrgMode
orgfile = open("/home/yantar92/Git/emacs-config/config.org")
textorgfile = read(orgfile, String)
parse(Org, textorgfile)

The config.org is about 18k lines, but I did not manage to wait long enough
for the parser to return.

Multithreading looks promising though.

Also, the tests I mentioned are with my latest commit for
org-element-parse-buffer and on native-compiled Emacs.

>> Maybe we should implement a Elisp LSP server instead of many individual
>> parsers in different languages?
>
> For the sake of tools that operate on Org files, not just the Org editing
> experience, I think it’s quite good if we have a selection of /good/ parsers
> available for different languages. However, I also think an LSP server would be
> good. That’s why I have <https://github.com/tecosaur/org-lsp>, even if I haven’t
> spent anywhere near as much time on it as I would like (it’s barely a skeleton
> at the moment).

Thanks for reminding me about this. I had seen it, forgotten it, and have
now reinvented the idea :D.

Also, it would be great to have a unified test set to verify third-party
parsers and the org-element parser.

>> tree-sitter vs. org-element on 15M Org file
>
> Might you have a link to this file? I’d be interested to try it.

That's my personal notes file. I can test it for you if you give me the
instructions.

Best,
Ihor
Hi Ihor,

> I am wondering how you did the benchmark.
> I just tried the following on my config.org
> (<https://github.com/yantar92/emacs-config>):
>
> The config.org is about 18k lines, but I did not manage to wait long enough
> for the parser to return.

Hmm, I just tried yours and I think something in your file is causing it to trip
up. Not sure what though, further investigation is required. For this alone I’m
glad you’ve shared this with me :)

For reference, this is what I’ve been doing:
┌────
│ julia> using OrgMode
│
│ julia> config = read("/home/tec/.config/doom/config.org", String);
│
│ julia> @benchmark parse(Org, config)
│ BenchmarkTools.Trial: 139 samples with 1 evaluation.
│  Range (min … max):  34.042 ms … 43.269 ms  ┊ GC (min … max): 0.00% … 16.10%
│  Time  (median):     34.857 ms              ┊ GC (median):    0.00%
│  Time  (mean ± σ):   35.999 ms ±  2.490 ms  ┊ GC (mean ± σ):  2.63% ±  5.30%
│
│   █▁
│   ▃▇██▄▆▄▃▂▃▅▅▅▄▄▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▁▃▂▂▃▂▃▁▁▂▃▁▂▂▁▂ ▂
│   34 ms           Histogram: frequency by time          43 ms <
│
│  Memory estimate: 7.17 MiB, allocs estimate: 142185.
└────

It’s worth noting that the first time `parse(Org, config)' is called, it will
trigger JIT compilation (which for me takes ~15s).

`org-element-parse-buffer' seems to tend to take ~200ms.

> Also, the tests I mentioned are with my latest commit for
> org-element-parse-buffer and on native-compiled Emacs.

I’m on native-compiled Emacs, but ~12 commits behind.

>> Org LSP
> Thanks for reminding me about this. I had seen it, forgotten it, and have
> now reinvented the idea :D.

I’m just hoping I’ll get to it / get help eventually 😂.

> Also, it would be great to have a unified test set to verify third-party
> parsers and the org-element parser.

You know, I’ve had the same thought 🙂.

>>> tree-sitter vs. org-element on 15M Org file
>>
>> Might you have a link to this file? I’d be interested to try it.
>
> That’s my personal notes file. I can test it for you if you give me the
> instructions.

Cool. Since your config seems to have revealed some issues, it would probably be
worth waiting till I’ve sorted those out.

All the best,
Timothy
Timothy <tecosaur@gmail.com> writes:
> │ julia> @benchmark parse(Org, config)
> │ BenchmarkTools.Trial: 139 samples with 1 evaluation.
> │ Range (min … max): 34.042 ms … 43.269 ms ┊ GC (min … max): 0.00% … 16.10%
> │ Time (median): 34.857 ms ┊ GC (median): 0.00%
> │ Time (mean ± σ): 35.999 ms ± 2.490 ms ┊ GC (mean ± σ): 2.63% ± 5.30%
> │
> │ █▁
> │ ▃▇██▄▆▄▃▂▃▅▅▅▄▄▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▁▃▂▂▃▂▃▁▁▂▃▁▂▂▁▂ ▂
> │ 34 ms Histogram: frequency by time 43 ms <
> │
> │ Memory estimate: 7.17 MiB, allocs estimate: 142185.
> └────
>
> It’s worth noting that the first time `parse(Org, config)' is called, it will
> trigger JIT compilation (which for me takes ~15s).
>
> `org-element-parse-buffer' seems to tend to take ~200ms.
Just FYI that I am getting similar results on your config.org:
M-: (let ((gc-cons-threshold #x40000000)) (benchmark-run (org-element-parse-buffer)))
(0.133567423 0 0.0), which is 133ms
and
@benchmark parse(Org, textorgfile)
BenchmarkTools.Trial: 196 samples with 1 evaluation.
Range (min … max): 22.235 ms … 81.101 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 23.652 ms ┊ GC (median): 0.00%
Time (mean ± σ): 25.535 ms ± 5.921 ms ┊ GC (mean ± σ): 2.56% ± 5.36%
█▃ ▂▂
▆▅███▇▅▅▁▅██▆▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▅▁▅▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▄ ▄
22.2 ms Histogram: log(frequency) by time 51.3 ms <
Memory estimate: 7.07 MiB, allocs estimate: 139566.
Best,
Ihor
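The two measurements above can be turned into a rough speedup ratio. A minimal
sketch, using only the figures reported in this message; the variable names are
illustrative, and the comparison is loose because it sets a single
`benchmark-run' result against a BenchmarkTools median and leaves out the
one-off JIT compilation time mentioned earlier:

```julia
# Timings reported above for the same config.org, in milliseconds.
org_element_ms = 133.567423     # single benchmark-run of org-element-parse-buffer
orgmode_jl_median_ms = 23.652   # BenchmarkTools median for parse(Org, textorgfile)

speedup = org_element_ms / orgmode_jl_median_ms
println("OrgMode.jl is roughly ", round(speedup; digits=1), "x faster on this file")
```

This lands close to the "about 5x" figure quoted for a config file of this size.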
Hi Ihor,

> Just FYI that I am getting similar results on your config.org:
> [snip]

Thanks. It’s always nice to see a confirmation of a result. Hopefully in the near
future we’ll be able to run your files through without issue 🙂.

All the best,
Timothy
Timothy,

this is really good to see! I have been using Julia as my main
programming language for some years now and all of my codes generate
output with org markup. Now I could in principle use it for the input
as well, which could be quite helpful.

As an aside, Julia 1.7 was released two or three days ago. Works very
well. None of my codes has broken, which is always a good sign.

--
: Eric S Fraga, with org release_9.5.1-231-g6766c4 in Emacs 29.0.50
: Latest paper written in org: https://arxiv.org/abs/2106.05096
Hi Eric,

> As an aside, Julia 1.7 was released two or three days ago. Works very
> well. None of my codes has broken, which is always a good sign.

Funny you should mention 1.7. I tried multithreading the parser and achieved a
~10x speedup. It worked all the time, except when I tried to `@benchmark' it,
where strange errors that shouldn’t happen cropped up. I noticed the 1.7 release
blog post mentioned fixing some multithreaded race conditions, so I’m cautiously
optimistic that this might work now :)

All the best,
Timothy
On Thursday, 2 Dec 2021 at 22:04, Timothy wrote:
> It worked all the time, except when I tried to `@benchmark' it, where
> strange errors that shouldn’t happen cropped up.

Automated benchmarking of multi-threaded applications is a form of dark
magic...

--
: Eric S Fraga, with org release_9.5.1-231-g6766c4 in Emacs 29.0.50
: Latest paper written in org: https://arxiv.org/abs/2106.05096
Hi Timothy,

Timothy <tecosaur@gmail.com> writes:

> https://github.com/tecosaur/OrgMode.jl

Great! I advertised this new parser on Worg:
https://orgmode.org/worg/org-tools/index.html

Thanks!

--
Bastien