Ihor Radchenko <yantar92@gmail.com> writes:

> Most of the CPU time is spent in org-babel-tangle-collect-blocks, which
> is basically another regexp search for all the source blocks in the buffer.
> The scaling is still slightly non-linear; maybe your source block
> regexp is too complex:

After further investigation, I found that the problem was not the
source block regexp. The code was doing an extra backward regexp search
for every source block to find the current headline. When there are no
headlines in the Org file, each such search scans all the way back to
the beginning of the buffer, so the total cost grows quadratically with
the number of blocks.
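
To illustrate why caching removes the quadratic term, here is a minimal Python sketch. It is not Org's Emacs Lisp code, and the helper names (collect_naive, collect_cached) and the simplified headline/block patterns are hypothetical. The naive version scans backward from each block to the start of the buffer. The cached version stops at the previous block and reuses that block's headline when nothing was found in between, which is correct because the previous block's line is not itself a headline.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical, simplified patterns: a headline is stars + space,
# a block starts with "#+begin_src".
HEADLINE_RE = re.compile(r"\*+ ")
BLOCK_RE = re.compile(r"#\+begin_src\b", re.IGNORECASE)

Result = List[Tuple[int, Optional[str]]]


def collect_naive(lines: List[str]) -> Tuple[Result, int]:
    """For every block, scan backward to line 0 for the nearest headline."""
    result, steps = [], 0
    for i, line in enumerate(lines):
        if not BLOCK_RE.match(line):
            continue
        heading = None
        for j in range(i - 1, -1, -1):  # O(i) per block when no headline exists
            steps += 1
            if HEADLINE_RE.match(lines[j]):
                heading = lines[j]
                break
        result.append((i, heading))
    return result, steps


def collect_cached(lines: List[str]) -> Tuple[Result, int]:
    """Scan backward only down to the previous block; otherwise reuse its headline."""
    result, steps = [], 0
    prev_block, prev_heading = -1, None
    for i, line in enumerate(lines):
        if not BLOCK_RE.match(line):
            continue
        heading = prev_heading  # default: same headline as the previous block
        for j in range(i - 1, prev_block, -1):  # only the lines since the last block
            steps += 1
            if HEADLINE_RE.match(lines[j]):
                heading = lines[j]
                break
        result.append((i, heading))
        prev_block, prev_heading = i, heading
    return result, steps


if __name__ == "__main__":
    # 1000 blocks and no headlines: the worst case described above.
    doc = ["#+begin_src emacs-lisp", '(message "hi")', "#+end_src"] * 1000
    _, naive_steps = collect_naive(doc)
    _, cached_steps = collect_cached(doc)
    print(naive_steps, cached_steps)  # prints "1498500 1998"
```

With no headlines, the naive step count grows with the square of the number of blocks, while the cached one grows linearly, matching the qualitative change in the measurements.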

After caching the result of the backward search, performance improved further:
| N blocks | runtime (s) | # of GCs |
|----------+-------------+----------|
|       10 |       0.204 |        0 |
|       20 |       0.047 |        0 |
|       40 |       0.171 |        0 |
|       80 |       0.063 |        0 |
|      160 |       0.096 |        0 |
|      320 |       0.155 |        0 |
|      640 |       0.651 |        0 |
|     1280 |       0.419 |        0 |
|     2560 |       0.799 |        0 |
|     5120 |       1.628 |        0 |
|    10240 |       3.306 |        0 |
|    20480 |       5.633 |        0 |
|    40960 |      11.415 |        0 |

That is about 41k blocks in under 12 seconds.

Graphical comparison: