Re: Poor org-babel-tangle-file performance with more than 100 trivial noweb-references

emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed

From: Ihor Radchenko <yantar92@gmail.com>
To: pareto optimal <pareto.optimal@mailfence.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: Poor org-babel-tangle-file performance with more than 100 trivial noweb-references
Date: Tue, 25 Jan 2022 22:11:25 +0800	[thread overview]
Message-ID: <87k0enx5v6.fsf@localhost> (raw)
In-Reply-To: <147178804.982080.1643087944384@ichabod.co-bxl>

[-- Attachment #1: Type: text/plain, Size: 2138 bytes --]

pareto optimal <pareto.optimal@mailfence.com> writes:

> Using =emacs -Q= to tangle org files with more than over 100 noweb-refs gets slow fast.
>
> Given this org code for N=2:
> Using Gcc Emacs 28.0.91 (which I typicall use) I get these results:
>
> | N blocks | runtime | # of GCs |
> |----------+---------+----------|
> |       10 |   0.027 |        0 |
> |       20 |   0.049 |        0 |
> |       40 |   0.121 |        0 |
> |       80 |   0.391 |        0 |
> |      160 |   1.426 |        0 |
> |      320 |   6.765 |        0 |
> |      640 |  23.972 |        0 |
>
> so roughly it scales like:
>
> #+begin_example
> 0.8x (10-20)
> 2.5x  (20-40)
> 3x  (40-80)
> 3.5x  (160-320)
> 4x  (320-640)
> #+end_example
>
> Though I'm not sure how much that can tell us... my guess is noweb-ref's don't use a dictionary with =O(1)= lookups?

Thanks for the nice benchmark and for providing a test case!

You are using stable version of Org. We did some important performance
improvements on main. The same benchmark using latest main (26decec):

| N blocks | runtime | # of GCs |
|----------+---------+----------|
|       10 |   0.030 |        0 |
|       20 |   0.067 |        0 |
|       40 |   0.065 |        0 |
|       80 |   0.081 |        0 |
|      160 |   0.214 |        0 |
|      320 |   0.597 |        0 |
|      640 |   2.225 |        0 |
|     1280 |   9.221 |        0 |
|     2560 |  36.966 |        0 |

And with disabled self-verification code:
(setq org-element--cache-self-verify nil)
| N blocks | runtime | # of GCs |
|----------+---------+----------|
|       10 |   0.078 |        0 |
|       20 |   0.075 |        0 |
|       40 |   0.063 |        0 |
|       80 |   0.085 |        0 |
|      160 |   0.095 |        0 |
|      320 |   0.178 |        0 |
|      640 |   0.311 |        0 |
|     1280 |   0.819 |        0 |
|     2560 |   2.302 |        0 |
|     5120 |   8.878 |        0 |
|    10240 |  32.706 |        0 |

Graphical comparison:

[-- Attachment #2: benchmark1.png --]
[-- Type: image/png, Size: 8259 bytes --]

[-- Attachment #3: Type: text/plain, Size: 77 bytes --]


There is ~80x improvement.

However, the scaling is still not quite linear:

[-- Attachment #4: benchmark2.png --]
[-- Type: image/png, Size: 9706 bytes --]

[-- Attachment #5: Type: text/plain, Size: 5111 bytes --]


So, there is still some power-law nonlinearity in the tangle code.

Benchmarking revealed the following:
        8259  95%               - org-babel-tangle-file
        8259  95%                - org-babel-tangle
        8170  94%                 - org-babel-tangle-collect-blocks
        3620  41%                  - org-in-archived-heading-p
        3600  41%                     org-before-first-heading-p

Showing that nonlinear scaling comes from regexp search.

Caching org-before-first-heading-p improves the performance further (see
Org 9.6-dev no self-verify + patch curves):

| N blocks | runtime | # of GCs |
|----------+---------+----------|
|       10 |   0.118 |        0 |
|       20 |   0.106 |        0 |
|       40 |   0.136 |        0 |
|       80 |   0.157 |        0 |
|      160 |   0.139 |        0 |
|      320 |   0.212 |        0 |
|      640 |   0.542 |        0 |
|     1280 |   0.797 |        0 |
|     2560 |   1.533 |        0 |
|     5120 |   4.651 |        0 |
|    10240 |  16.745 |        0 |

The profiling gives:

       16817  63%                    - org-babel-tangle-file
       16280  61%                     - org-babel-tangle
       16200  61%                      - org-babel-tangle-collect-blocks
        1360   5%                       + org-babel-tangle-single-block
        1210   4%                       + org-babel-get-src-block-info

Most of the CPU time is spent in org-babel-tangle-collect-blocks, which
is basically another regexp search for all the source blocks in buffer.
The scaling is still slightly non-linear - maybe your source block
regexp is too complex:

(defvar org-babel-src-block-regexp
  (concat
   ;; (1) indentation                 (2) lang
   "^\\([ \t]*\\)#\\+begin_src[ \t]+\\([^ \f\t\n\r\v]+\\)[ \t]*"
   ;; (3) switches
   "\\([^\":\n]*\"[^\"\n*]*\"[^\":\n]*\\|[^\":\n]*\\)"
   ;; (4) header arguments
   "\\([^\n]*\\)\n"
   ;; (5) body
   "\\([^\000]*?\n\\)??[ \t]*#\\+end_src")
  "Regexp used to identify code blocks.")

On the other hand, the test data is already within quite non-realistic
over 5000 number of blocks. Probably, further performance improvement is
not very useful (we already have some overheads at smaller N).

Best,
Ihor

-----
The Org file used to generate plots:

#+name: nocache
| N blocks | runtime | # of GCs |
|----------+---------+----------|
|       10 |   0.027 |        0 |
|       20 |   0.049 |        0 |
|       40 |   0.121 |        0 |
|       80 |   0.391 |        0 |
|      160 |   1.426 |        0 |
|      320 |   6.765 |        0 |
|      640 |  23.972 |        0 |

#+name: cache
| N blocks | runtime | # of GCs |
|----------+---------+----------|
|       10 |   0.030 |        0 |
|       20 |   0.067 |        0 |
|       40 |   0.065 |        0 |
|       80 |   0.081 |        0 |
|      160 |   0.214 |        0 |
|      320 |   0.597 |        0 |
|      640 |   2.225 |        0 |
|     1280 |   9.221 |        0 |
|     2560 |  36.966 |        0 |

#+name: cache-no-self
| N blocks | runtime | # of GCs |
|----------+---------+----------|
|       10 |   0.078 |        0 |
|       20 |   0.075 |        0 |
|       40 |   0.063 |        0 |
|       80 |   0.085 |        0 |
|      160 |   0.095 |        0 |
|      320 |   0.178 |        0 |
|      640 |   0.311 |        0 |
|     1280 |   0.819 |        0 |
|     2560 |   2.302 |        0 |
|     5120 |   8.878 |        0 |
|    10240 |  32.706 |        0 |

#+name: cache-no-self+fix
| N blocks | runtime | # of GCs |
|----------+---------+----------|
|       10 |   0.118 |        0 |
|       20 |   0.106 |        0 |
|       40 |   0.136 |        0 |
|       80 |   0.157 |        0 |
|      160 |   0.139 |        0 |
|      320 |   0.212 |        0 |
|      640 |   0.542 |        0 |
|     1280 |   0.797 |        0 |
|     2560 |   1.533 |        0 |
|     5120 |   4.651 |        0 |
|    10240 |  16.745 |        0 |


#+begin_src gnuplot :var d1=nocache :var d2=cache :var d3=cache-no-self :var d4=cache-no-self+fix :file benchmark1.png
set title 'Tangle code performance timing'
US='u 1:2 w lp ps 2'
set xlabel "N blocks"
set ylabel "Time, sec"
set key top right
plot d1 @US t'Org 9.5.2 stable', d2 @US t'Org 9.6-dev', d3 @US t'Org 9.6-dev no self-verify', d4 @US t'Org 9.6-dev no self-verify + patch'
#+end_src

#+RESULTS[edd2a2d5d80b31876917faee349ed71ba0fe239a]:
[[file:/home/yantar92/.data/52/0930af-75ae-4d88-ae6a-d8dde39ecc72/benchmark1.png]]

#+begin_src gnuplot :var d1=nocache :var d2=cache :var d3=cache-no-self :var d4=cache-no-self+fix :file benchmark2.png
set title 'Tangle code performance scaling (normalized)'
US='w lp ps 2'
set xlabel "N blocks"
set ylabel "Time, normalized by time at N=640"
set key top left
set xrange [0:6000]
plot d2 u 1:($2/2.225) @US t'Org 9.6-dev', d3 u 1:($2/0.311) @US t'Org 9.6-dev no self-verify', d1 u 1:($2/23.972) @US t'Org 9.5.2 stable', d4 u 1:($2/0.542) @US t'Org 9.6-dev no self-verify + patch', x*x/870000 t'x^2', 
#+end_src

#+RESULTS[b170ef78ac377f6d2ad1c2eabb20cd62dc0ea33f]:
[[file:/home/yantar92/.data/52/0930af-75ae-4d88-ae6a-d8dde39ecc72/benchmark2.png]]

next prev parent reply	other threads:[~2022-01-25 15:34 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-25  5:19 Poor org-babel-tangle-file performance with more than 100 trivial noweb-references pareto optimal
2022-01-25 14:03 ` Sébastien Miquel
2022-01-25 14:25   ` Ihor Radchenko
2022-01-25 14:11 ` Ihor Radchenko [this message]
2022-01-25 16:25   ` Tim Cross
2022-01-26 11:43   ` Ihor Radchenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k0enx5v6.fsf@localhost \
    --to=yantar92@gmail.com \
    --cc=emacs-orgmode@gnu.org \
    --cc=pareto.optimal@mailfence.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).