emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* Exporting large documents
@ 2013-04-27 19:28 Achim Gratz
  2013-04-27 19:35 ` Carsten Dominik
  0 siblings, 1 reply; 18+ messages in thread
From: Achim Gratz @ 2013-04-27 19:28 UTC (permalink / raw)
  To: emacs-orgmode

I've been looking at export runtimes for large documents with the new
exporter.  The example I've used is the orgmanual.org from Tom.  I first
exported each subtree standalone, then the document as a whole to
texinfo.  The startup of Emacs takes about 1 s of user time and 1.5 s of
wall time, these have not been subtracted in the table below.  The table
shows the individual runtimes for each subtree export, their total and
the last line is for the export of the full document.

|    user |   sys |   wall |  util |
|---------+-------+--------+-------|
|   4.856 | 0.048 |   5.52 | 88.5% |
|  13.748 | 0.160 |  15.04 | 92.4% |
|  15.004 | 0.036 |  16.06 | 93.5% |
|   8.464 | 0.068 |  10.37 | 82.1% |
|   8.420 | 0.088 |  13.29 | 63.9% |
|   5.568 | 0.052 |   8.03 | 69.8% |
|   7.648 | 0.064 |   9.26 | 83.1% |
|  12.020 | 0.056 |  14.16 | 85.2% |
|   7.796 | 0.044 |  11.00 | 71.1% |
|  27.352 | 0.068 |  33.71 | 81.3% |
|   6.564 | 0.044 |   7.00 | 94.2% |
|  17.124 | 0.108 |  19.17 | 89.8% |
|   6.124 | 0.068 |   6.79 | 91.0% |
|  10.632 | 0.068 |  11.73 | 91.1% |
|  15.932 | 0.052 |  17.33 | 92.2% |
|   6.836 | 0.080 |   7.61 | 90.8% |
|   3.964 | 0.040 |   4.54 | 88.1% |
|   5.076 | 0.160 |   6.01 | 87.0% |
|   3.488 | 0.060 |   4.06 | 87.1% |
|   3.532 | 0.056 |   4.14 | 86.4% |
|   3.516 | 0.044 |   4.20 | 84.5% |
|   3.576 | 0.064 |   4.17 | 87.0% |
|   3.552 | 0.064 |   4.12 | 87.6% |
|   6.528 | 0.176 |  10.73 | 62.3% |
|---------+-------+--------+-------|
| 207.320 | 1.768 | 248.04 | 84.3% |
| 386.384 | 0.392 | 415.94 | 92.9% |

As you can see, the export gets slower (a lot) the larger the scope of
the export gets.  I would hope that something can be done about it, I've
earlier tried to profile the export (posted over in the Orgmanual
thread), but I don't think the result was conclusive.

So as an additional experiment, I just used the preamble and
Introduction of orgmanual.org and then doubled the copies of the
Introduction subtress with each iteration.  I runtime was linear in
size, you'd expect to see the runtimes about double on each iteration,
too.

|    user |   sys |   wall |  util | size |
|---------+-------+--------+-------+------|
|   2.500 | 0.064 |   3.14 | 81.5% |  18K |
|   3.740 | 0.056 |   4.37 | 86.7% |  33K |
|   6.224 | 0.112 |   6.98 | 90.6% |  63K |
|  11.524 | 0.060 |  12.53 | 92.4% | 122K |
|  22.860 | 0.084 |  24.35 | 94.2% | 241K |
|  48.760 | 0.100 |  51.87 | 94.1% | 479K |
| 110.804 | 0.124 | 120.64 | 91.9% | 955K |
| 280.084 | 0.360 | 304.48 | 92.1% | 1.9M |
| 868.712 | 0.768 | 930.24 | 93.4% | 3.8M |

Octave thinks that y = 1.725 x^2 + 1.025 x + 0.009 is a good fit to that
data, so O(N^2) behaviour overall as suspected.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-04-27 19:28 Exporting large documents Achim Gratz
@ 2013-04-27 19:35 ` Carsten Dominik
  2013-04-29 16:04   ` Lawrence Mitchell
  0 siblings, 1 reply; 18+ messages in thread
From: Carsten Dominik @ 2013-04-27 19:35 UTC (permalink / raw)
  To: Achim Gratz; +Cc: emacs-orgmode

Hi Achim,

this is an interesting experiment, thank you!

I think it would also be interesting to use elp to see which
function are taking up the non-linear time.

- Carsten

On 27.4.2013, at 21:28, Achim Gratz <Stromeko@nexgo.de> wrote:

> I've been looking at export runtimes for large documents with the new
> exporter.  The example I've used is the orgmanual.org from Tom.  I first
> exported each subtree standalone, then the document as a whole to
> texinfo.  The startup of Emacs takes about 1 s of user time and 1.5 s of
> wall time, these have not been subtracted in the table below.  The table
> shows the individual runtimes for each subtree export, their total and
> the last line is for the export of the full document.
> 
> |    user |   sys |   wall |  util |
> |---------+-------+--------+-------|
> |   4.856 | 0.048 |   5.52 | 88.5% |
> |  13.748 | 0.160 |  15.04 | 92.4% |
> |  15.004 | 0.036 |  16.06 | 93.5% |
> |   8.464 | 0.068 |  10.37 | 82.1% |
> |   8.420 | 0.088 |  13.29 | 63.9% |
> |   5.568 | 0.052 |   8.03 | 69.8% |
> |   7.648 | 0.064 |   9.26 | 83.1% |
> |  12.020 | 0.056 |  14.16 | 85.2% |
> |   7.796 | 0.044 |  11.00 | 71.1% |
> |  27.352 | 0.068 |  33.71 | 81.3% |
> |   6.564 | 0.044 |   7.00 | 94.2% |
> |  17.124 | 0.108 |  19.17 | 89.8% |
> |   6.124 | 0.068 |   6.79 | 91.0% |
> |  10.632 | 0.068 |  11.73 | 91.1% |
> |  15.932 | 0.052 |  17.33 | 92.2% |
> |   6.836 | 0.080 |   7.61 | 90.8% |
> |   3.964 | 0.040 |   4.54 | 88.1% |
> |   5.076 | 0.160 |   6.01 | 87.0% |
> |   3.488 | 0.060 |   4.06 | 87.1% |
> |   3.532 | 0.056 |   4.14 | 86.4% |
> |   3.516 | 0.044 |   4.20 | 84.5% |
> |   3.576 | 0.064 |   4.17 | 87.0% |
> |   3.552 | 0.064 |   4.12 | 87.6% |
> |   6.528 | 0.176 |  10.73 | 62.3% |
> |---------+-------+--------+-------|
> | 207.320 | 1.768 | 248.04 | 84.3% |
> | 386.384 | 0.392 | 415.94 | 92.9% |
> 
> As you can see, the export gets slower (a lot) the larger the scope of
> the export gets.  I would hope that something can be done about it, I've
> earlier tried to profile the export (posted over in the Orgmanual
> thread), but I don't think the result was conclusive.
> 
> So as an additional experiment, I just used the preamble and
> Introduction of orgmanual.org and then doubled the copies of the
> Introduction subtress with each iteration.  I runtime was linear in
> size, you'd expect to see the runtimes about double on each iteration,
> too.
> 
> |    user |   sys |   wall |  util | size |
> |---------+-------+--------+-------+------|
> |   2.500 | 0.064 |   3.14 | 81.5% |  18K |
> |   3.740 | 0.056 |   4.37 | 86.7% |  33K |
> |   6.224 | 0.112 |   6.98 | 90.6% |  63K |
> |  11.524 | 0.060 |  12.53 | 92.4% | 122K |
> |  22.860 | 0.084 |  24.35 | 94.2% | 241K |
> |  48.760 | 0.100 |  51.87 | 94.1% | 479K |
> | 110.804 | 0.124 | 120.64 | 91.9% | 955K |
> | 280.084 | 0.360 | 304.48 | 92.1% | 1.9M |
> | 868.712 | 0.768 | 930.24 | 93.4% | 3.8M |
> 
> Octave thinks that y = 1.725 x^2 + 1.025 x + 0.009 is a good fit to that
> data, so O(N^2) behaviour overall as suspected.
> 
> 
> Regards,
> Achim.
> -- 
> +<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+
> 
> Factory and User Sound Singles for Waldorf rackAttack:
> http://Synth.Stromeko.net/Downloads.html#WaldorfSounds
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-04-27 19:35 ` Carsten Dominik
@ 2013-04-29 16:04   ` Lawrence Mitchell
  2013-04-29 18:44     ` Achim Gratz
  2013-05-03  8:43     ` Exporting large documents Carsten Dominik
  0 siblings, 2 replies; 18+ messages in thread
From: Lawrence Mitchell @ 2013-04-29 16:04 UTC (permalink / raw)
  To: emacs-orgmode

Carsten Dominik wrote:
> Hi Achim,

> this is an interesting experiment, thank you!

> I think it would also be interesting to use elp to see which
> function are taking up the non-linear time.

I did a bit of digging and here are the results.  No potential
fixes though.

Taking the "Introduction" section of orgmanual.org and doubling
it up so the buffer is 16x, 32x, 64x and 128x copies of the
introduction and then running latex export having
elp-instrumented the org package shows the following.

There are a few instances of quadratic behaviour that contribute
to the slowdown.

Main culprit:

Name times-called cumulative-time time-per-call
org-export-data 10132 29.364160173 0.0028981603
org-export-data 20180 90.198301053 0.0044696878
org-export-data 40276 316.37200089 0.0078550998
org-export-data 80468 1155.4851323 0.0143595607

Less important but still a noticeable total runtime:

org-element-map 1133 2.6814707420 0.0023666996
org-element-map 2285 10.799367732 0.0047262003
org-element-map 4589 43.787327887 0.0095418016
org-element-map 9197 173.27839595 0.0188407519

org-export-resolve-fuzzy-link 48 2.6659073480 0.0555397364
org-export-resolve-fuzzy-link 96 10.766515020 0.1121511981
org-export-resolve-fuzzy-link 192 43.725658059 0.2277378023
org-export-resolve-fuzzy-link 384 173.15348462 0.4509205328

org-latex-link 144 2.6730487589 0.0185628386
org-latex-link 288 10.783675007 0.0374433159
org-latex-link 576 43.768676906 0.0759872862
org-latex-link 1152 173.27176368 0.1504095170


Unimportant but still quadratic:

org-export-get-headline-number 176 0.0036720380 2.086...e-05
org-export-get-headline-number 352 0.0154215390 4.381...e-05
org-export-get-headline-number 704 0.0636496679 9.041...e-05
org-export-get-headline-number 1408 0.2382477599 0.0001692100

org-babel-get-inline-src-block-matches 112 0.0174396369 0.0001557110
org-babel-get-inline-src-block-matches 224 0.0521645539 0.0002328774
org-babel-get-inline-src-block-matches 448 0.182069907 0.0004064060
org-babel-get-inline-src-block-matches 896 0.66889546 0.0007465351

org-babel-remove-result 112 0.0332858050 0.0002971946
org-babel-remove-result 224 0.0837776260 0.0003740072
org-babel-remove-result 448 0.2475016210 0.0005524589
org-babel-remove-result 896 0.8013491290 0.0008943628

org-babel-where-is-src-block-result 112 0.0320815769 0.0002864426
org-babel-where-is-src-block-result 224 0.081381881 0.0003633119
org-babel-where-is-src-block-result 448 0.2425831529 0.0005414802
org-babel-where-is-src-block-result 896 0.7915090309 0.0008833806



Cheers,

Lawrence

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-04-29 16:04   ` Lawrence Mitchell
@ 2013-04-29 18:44     ` Achim Gratz
  2013-05-01 12:18       ` [PATCH] ox: Cache locations of fuzzy links Lawrence Mitchell
  2013-05-03  8:43     ` Exporting large documents Carsten Dominik
  1 sibling, 1 reply; 18+ messages in thread
From: Achim Gratz @ 2013-04-29 18:44 UTC (permalink / raw)
  To: emacs-orgmode

Lawrence Mitchell writes:
> I did a bit of digging and here are the results.  No potential
> fixes though.

Thanks for beating me to it, that frees up quite some processor cycles
on my end!  :-)


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Waldorf MIDI Implementation & additional documentation:
http://Synth.Stromeko.net/Downloads.html#WaldorfDocs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] ox: Cache locations of fuzzy links
  2013-04-29 18:44     ` Achim Gratz
@ 2013-05-01 12:18       ` Lawrence Mitchell
  2013-05-01 21:46         ` Nicolas Goaziou
  0 siblings, 1 reply; 18+ messages in thread
From: Lawrence Mitchell @ 2013-05-01 12:18 UTC (permalink / raw)
  To: emacs-orgmode

* ox.el (org-export-resolve-fuzzy-link): Look for fuzzy link in a
  cache before trying to resolve it in the parse tree.

When a document contains a large number of identical fuzzy links, it
doesn't make sense to continually search for them.  Instead, as
long as we're looking for position independent links, cache the
locations and look there first.
---
 lisp/ox.el | 48 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 35 insertions(+), 13 deletions(-)

Achim Gratz wrote:
> Lawrence Mitchell writes:
>> I did a bit of digging and here are the results.  No potential
>> fixes though.

I couldn't see how to fix up org-export-data, but here's some
band-aid to speed up resolving fuzzy links.  It works much better
for the fake test case (where there are many identical links)
than the real org manual, but I do get a slight improvement
(about 6%).  As per elp:

Before:

org-latex-export-to-latex      1           373.02289908  373.02289908
org-export-resolve-fuzzy-link  281         42.108304211  0.1498516164

After:

org-latex-export-to-latex      1           349.7238257   349.7238257
org-export-resolve-fuzzy-link  281         19.329938028  0.0687898150

Cheers,
Lawrence

diff --git a/lisp/ox.el b/lisp/ox.el
index 88b4122..bb49512 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -3976,27 +3976,49 @@ significant."
 	 ;; Split PATH at white spaces so matches are space
 	 ;; insensitive.
 	 (path (org-split-string
-		(if match-title-p (substring raw-path 1) raw-path))))
+		(if match-title-p (substring raw-path 1) raw-path)))
+	 (link-cache (plist-get info :fuzzy-link-cache)))
+    ;; Cache for locations of fuzzy links that are not position dependent
+    (unless link-cache
+      (setq info (plist-put info :fuzzy-link-cache
+			    (make-hash-table :test 'equal)))
+      (setq link-cache (plist-get info :fuzzy-link-cache)))
     (cond
      ;; First try to find a matching "<<path>>" unless user specified
      ;; he was looking for a headline (path starts with a "*"
      ;; character).
      ((and (not match-title-p)
-	   (org-element-map (plist-get info :parse-tree) 'target
-	     (lambda (blob)
-	       (and (equal (org-split-string (org-element-property :value blob))
-			   path)
-		    blob))
-	     info t)))
+	   (let ((found (gethash (cons 'path path)
+				 link-cache
+				 'fuzzy-link-not-found)))
+	     (or (not (eq found 'fuzzy-link-not-found))
+		 (puthash (cons 'path path)
+			  (org-element-map (plist-get info :parse-tree) 'target
+			    (lambda (blob)
+			      (and (equal (org-split-string
+					   (org-element-property :value blob))
+					  path)
+				   blob))
+			    info t)
+			  link-cache)))))
      ;; Then try to find an element with a matching "#+NAME: path"
      ;; affiliated keyword.
      ((and (not match-title-p)
-	   (org-element-map (plist-get info :parse-tree)
-	       org-element-all-elements
-	     (lambda (el)
-	       (let ((name (org-element-property :name el)))
-		 (when (and name (equal (org-split-string name) path)) el)))
-	     info 'first-match)))
+	   (let ((found (gethash (cons 'name path)
+				 link-cache
+				 'fuzzy-link-not-found)))
+	     (or (not (eq found 'fuzzy-link-not-found))
+		 (puthash (cons 'name path)
+			  (org-element-map (plist-get info :parse-tree)
+			      org-element-all-elements
+			    (lambda (el)
+			      (let ((name (org-element-property :name el)))
+				(when (and name (equal
+						 (org-split-string name)
+						 path))
+				  el)))
+			    info 'first-match)
+			  link-cache)))))
      ;; Last case: link either points to a headline or to nothingness.
      ;; Try to find the source, with priority given to headlines with
      ;; the closest common ancestor.  If such candidate is found,
-- 
1.8.2-rc3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] ox: Cache locations of fuzzy links
  2013-05-01 12:18       ` [PATCH] ox: Cache locations of fuzzy links Lawrence Mitchell
@ 2013-05-01 21:46         ` Nicolas Goaziou
  2013-05-02  9:03           ` [PATCH v2] " Lawrence Mitchell
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Goaziou @ 2013-05-01 21:46 UTC (permalink / raw)
  To: Lawrence Mitchell; +Cc: emacs-orgmode

Hello,

Lawrence Mitchell <wence@gmx.li> writes:

> * ox.el (org-export-resolve-fuzzy-link): Look for fuzzy link in a
>   cache before trying to resolve it in the parse tree.

Thanks for your patch. A few comments follow.

> -		(if match-title-p (substring raw-path 1) raw-path))))
> +		(if match-title-p (substring raw-path 1) raw-path)))
> +	 (link-cache (plist-get info :fuzzy-link-cache)))
> +    ;; Cache for locations of fuzzy links that are not position dependent
> +    (unless link-cache
> +      (setq info (plist-put info :fuzzy-link-cache
> +			    (make-hash-table :test 'equal)))
> +      (setq link-cache (plist-get info :fuzzy-link-cache)))

Minor nitpick: I'd rather have this included in the (let ...), like:

  (let (...
        (link-cache
         (or (plist-get info :fuzzy-link-cache)
             (plist-get (setq info (plist-put info :fuzzy-link-cache
                                              (make-hash-table :test 'eq)))
                        :fuzzy-link-cache)))))
> -	   (org-element-map (plist-get info :parse-tree) 'target
> -	     (lambda (blob)
> -	       (and (equal (org-split-string (org-element-property :value blob))
> -			   path)
> -		    blob))
> -	     info t)))
> +	   (let ((found (gethash (cons 'path path)
> +				 link-cache
> +				 'fuzzy-link-not-found)))
> +	     (or (not (eq found 'fuzzy-link-not-found))
> +		 (puthash (cons 'path path)
> +			  (org-element-map (plist-get info :parse-tree) 'target
> +			    (lambda (blob)
> +			      (and (equal (org-split-string
> +					   (org-element-property :value blob))
> +					  path)
> +				   blob))
> +			    info t)
> +			  link-cache)))))

I don't get why you need to use such a key. Simply use the link as key
and `eq' as the test.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2] ox: Cache locations of fuzzy links
  2013-05-01 21:46         ` Nicolas Goaziou
@ 2013-05-02  9:03           ` Lawrence Mitchell
  2013-05-02 12:35             ` Nicolas Goaziou
  0 siblings, 1 reply; 18+ messages in thread
From: Lawrence Mitchell @ 2013-05-02  9:03 UTC (permalink / raw)
  To: emacs-orgmode

* ox.el (org-export-resolve-fuzzy-link): Look for fuzzy link in a
  cache before trying to resolve it in the parse tree.

When a document contains a large number of identical fuzzy links, it
doesn't make sense to continually search for them.  Instead, cache the
locations in the position independent case.
---
 lisp/ox.el | 42 +++++++++++++++++++++++++++++-------------
 1 file changed, 29 insertions(+), 13 deletions(-)

Changes since v1:

- Pull initialisation of link-cache into let
- Don't use cons cells for keys, just use the path
  - lift found check to top-level let since it's now common

Nicolas Goaziou wrote:
> Hello,

> Lawrence Mitchell <wence@gmx.li> writes:

>> * ox.el (org-export-resolve-fuzzy-link): Look for fuzzy link in a
>>   cache before trying to resolve it in the parse tree.

> Thanks for your patch. A few comments follow.

[...]

> Minor nitpick: I'd rather have this included in the (let ...), like:

>   (let (...
>         (link-cache
>          (or (plist-get info :fuzzy-link-cache)
>              (plist-get (setq info (plist-put info :fuzzy-link-cache
>                                               (make-hash-table :test 'eq)))
>                         :fuzzy-link-cache)))))

I've made this change.  Barring the eq test.

Remember, paths are strings and two strings are only eq or eql if
they are actually the same string (in memory).  In particular:

(let ((p "foo")) (eq (substring p 1) (substring p 1))) => nil
(let ((p "foo")) (eql (substring p 1) (substring p 1))) => nil
(let ((p "foo")) (equal (substring p 1) (substring p 1))) => t

Hence, we must use equal or string-equal as a test in the hash
table.  But string-equal isn't a predefined test, hence equal.

[...]

> I don't get why you need to use such a key. Simply use the link as key
> and `eq' as the test.

I was worried that I'd have the same key pointing to two
different places, but I see now that those worries were
unfounded, since if we didn't see it in the <<path>> case the
first time we never will.  I've simplified the key to just look
for the path.

diff --git a/lisp/ox.el b/lisp/ox.el
index 88b4122..6aa8617 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -3976,27 +3976,43 @@ significant."
 	 ;; Split PATH at white spaces so matches are space
 	 ;; insensitive.
 	 (path (org-split-string
-		(if match-title-p (substring raw-path 1) raw-path))))
+		(if match-title-p (substring raw-path 1) raw-path)))
+	 ;; Cache for locations of fuzzy links that are not position dependent
+	 (link-cache
+	  (or (plist-get info :fuzzy-link-cache)
+	      (plist-get (setq info (plist-put info :fuzzy-link-cache
+	  				       (make-hash-table :test 'equal)))
+			 :fuzzy-link-cache)))
+	 (found-in-cache (gethash path link-cache 'fuzzy-link-not-found)))
     (cond
      ;; First try to find a matching "<<path>>" unless user specified
      ;; he was looking for a headline (path starts with a "*"
      ;; character).
      ((and (not match-title-p)
-	   (org-element-map (plist-get info :parse-tree) 'target
-	     (lambda (blob)
-	       (and (equal (org-split-string (org-element-property :value blob))
-			   path)
-		    blob))
-	     info t)))
+	   (or (not (eq found-in-cache 'fuzzy-link-not-found))
+	       (puthash path
+			(org-element-map (plist-get info :parse-tree) 'target
+			  (lambda (blob)
+			    (and (equal (org-split-string
+					 (org-element-property :value blob))
+					path)
+				 blob))
+			  info t)
+			link-cache))))
      ;; Then try to find an element with a matching "#+NAME: path"
      ;; affiliated keyword.
      ((and (not match-title-p)
-	   (org-element-map (plist-get info :parse-tree)
-	       org-element-all-elements
-	     (lambda (el)
-	       (let ((name (org-element-property :name el)))
-		 (when (and name (equal (org-split-string name) path)) el)))
-	     info 'first-match)))
+	   (or (not (eq found-in-cache 'fuzzy-link-not-found))
+	       (puthash path
+			(org-element-map (plist-get info :parse-tree)
+			    org-element-all-elements
+			  (lambda (el)
+			    (let ((name (org-element-property :name el)))
+			      (when (and name
+					 (equal (org-split-string name) path))
+				el)))
+			  info 'first-match)
+			link-cache))))
      ;; Last case: link either points to a headline or to nothingness.
      ;; Try to find the source, with priority given to headlines with
      ;; the closest common ancestor.  If such candidate is found,

-- 
1.8.2-rc3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] ox: Cache locations of fuzzy links
  2013-05-02  9:03           ` [PATCH v2] " Lawrence Mitchell
@ 2013-05-02 12:35             ` Nicolas Goaziou
  2013-05-02 12:53               ` Nicolas Goaziou
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Goaziou @ 2013-05-02 12:35 UTC (permalink / raw)
  To: Lawrence Mitchell; +Cc: emacs-orgmode

Hello,

Lawrence Mitchell <wence@gmx.li> writes:

> * ox.el (org-export-resolve-fuzzy-link): Look for fuzzy link in a
>   cache before trying to resolve it in the parse tree.
>
> When a document contains a large number of identical fuzzy links, it
> doesn't make sense to continually search for them.  Instead, cache the
> locations in the position independent case.
> ---
>  lisp/ox.el | 42 +++++++++++++++++++++++++++++-------------
>  1 file changed, 29 insertions(+), 13 deletions(-)
>
> Changes since v1:
>
> - Pull initialisation of link-cache into let
> - Don't use cons cells for keys, just use the path
>   - lift found check to top-level let since it's now common

Thanks for the changes. 

> I've made this change.  Barring the eq test.
>
> Remember, paths are strings and two strings are only eq or eql if
> they are actually the same string (in memory).  In particular:
>
> (let ((p "foo")) (eq (substring p 1) (substring p 1))) => nil
> (let ((p "foo")) (eql (substring p 1) (substring p 1))) => nil
> (let ((p "foo")) (equal (substring p 1) (substring p 1))) => t
>
> Hence, we must use equal or string-equal as a test in the hash
> table.  But string-equal isn't a predefined test, hence equal.

Sorry for being dense, but why do you use _path_, which is a string and,
as you say, requires `equal' for equality, instead of the first argument
of the function, i.e. _link_, which only needs `eq'?


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] ox: Cache locations of fuzzy links
  2013-05-02 12:35             ` Nicolas Goaziou
@ 2013-05-02 12:53               ` Nicolas Goaziou
  0 siblings, 0 replies; 18+ messages in thread
From: Nicolas Goaziou @ 2013-05-02 12:53 UTC (permalink / raw)
  To: Lawrence Mitchell; +Cc: emacs-orgmode

Correcting myself:

> Sorry for being dense, but why do you use _path_, which is a string and,
> as you say, requires `equal' for equality, instead of the first argument
> of the function, i.e. _link_, which only needs `eq'?

Forget it. Caching LINK will only be useful when resolving the very same
link several times, which is less useful.

I have applied your patch. Thank you.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-04-29 16:04   ` Lawrence Mitchell
  2013-04-29 18:44     ` Achim Gratz
@ 2013-05-03  8:43     ` Carsten Dominik
  2013-05-03 11:12       ` Lawrence Mitchell
  1 sibling, 1 reply; 18+ messages in thread
From: Carsten Dominik @ 2013-05-03  8:43 UTC (permalink / raw)
  To: Lawrence Mitchell; +Cc: Nicolas Goaziou, emacs-orgmode@gnu.org orgmode

Hi Lawrence,

thanks for doing this.  Stuff to think about - but no good
ideas for improvements here either - I am just not familiar enough
with the export engine.  Nicolas, it would be interesting to
hear from you if you have comments and ideas about quadratic
behavior of the exporter, and if you think these are
inevitable.

My guess is that quadratic behavior would mostly result
from searches of the data structure.  From the data you
show it seems that most of the damage is done during
export, not during parsing.

- Carsten

On 29.4.2013, at 18:04, Lawrence Mitchell <wence@gmx.li> wrote:

> Carsten Dominik wrote:
>> Hi Achim,
> 
>> this is an interesting experiment, thank you!
> 
>> I think it would also be interesting to use elp to see which
>> function are taking up the non-linear time.
> 
> I did a bit of digging and here are the results.  No potential
> fixes though.
> 
> Taking the "Introduction" section of orgmanual.org and doubling
> it up so the buffer is 16x, 32x, 64x and 128x copies of the
> introduction and then running latex export having
> elp-instrumented the org package shows the following.
> 
> There are a few instances of quadratic behaviour that contribute
> to the slowdown.
> 
> Main culprit:
> 
> Name times-called cumulative-time time-per-call
> org-export-data 10132 29.364160173 0.0028981603
> org-export-data 20180 90.198301053 0.0044696878
> org-export-data 40276 316.37200089 0.0078550998
> org-export-data 80468 1155.4851323 0.0143595607
> 
> Less important but still a noticeable total runtime:
> 
> org-element-map 1133 2.6814707420 0.0023666996
> org-element-map 2285 10.799367732 0.0047262003
> org-element-map 4589 43.787327887 0.0095418016
> org-element-map 9197 173.27839595 0.0188407519
> 
> org-export-resolve-fuzzy-link 48 2.6659073480 0.0555397364
> org-export-resolve-fuzzy-link 96 10.766515020 0.1121511981
> org-export-resolve-fuzzy-link 192 43.725658059 0.2277378023
> org-export-resolve-fuzzy-link 384 173.15348462 0.4509205328
> 
> org-latex-link 144 2.6730487589 0.0185628386
> org-latex-link 288 10.783675007 0.0374433159
> org-latex-link 576 43.768676906 0.0759872862
> org-latex-link 1152 173.27176368 0.1504095170
> 
> 
> Unimportant but still quadratic:
> 
> org-export-get-headline-number 176 0.0036720380 2.086...e-05
> org-export-get-headline-number 352 0.0154215390 4.381...e-05
> org-export-get-headline-number 704 0.0636496679 9.041...e-05
> org-export-get-headline-number 1408 0.2382477599 0.0001692100
> 
> org-babel-get-inline-src-block-matches 112 0.0174396369 0.0001557110
> org-babel-get-inline-src-block-matches 224 0.0521645539 0.0002328774
> org-babel-get-inline-src-block-matches 448 0.182069907 0.0004064060
> org-babel-get-inline-src-block-matches 896 0.66889546 0.0007465351
> 
> org-babel-remove-result 112 0.0332858050 0.0002971946
> org-babel-remove-result 224 0.0837776260 0.0003740072
> org-babel-remove-result 448 0.2475016210 0.0005524589
> org-babel-remove-result 896 0.8013491290 0.0008943628
> 
> org-babel-where-is-src-block-result 112 0.0320815769 0.0002864426
> org-babel-where-is-src-block-result 224 0.081381881 0.0003633119
> org-babel-where-is-src-block-result 448 0.2425831529 0.0005414802
> org-babel-where-is-src-block-result 896 0.7915090309 0.0008833806
> 
> 
> 
> Cheers,
> 
> Lawrence
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-05-03  8:43     ` Exporting large documents Carsten Dominik
@ 2013-05-03 11:12       ` Lawrence Mitchell
       [not found]         ` <877gjfgnl9.fsf@gmail.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Lawrence Mitchell @ 2013-05-03 11:12 UTC (permalink / raw)
  To: emacs-orgmode; +Cc: Nicolas Goaziou, Carsten Dominik

Carsten Dominik wrote:
> Hi Lawrence,

> thanks for doing this.  Stuff to think about - but no good
> ideas for improvements here either - I am just not familiar enough
> with the export engine.  Nicolas, it would be interesting to
> hear from you if you have comments and ideas about quadratic
> behavior of the exporter, and if you think these are
> inevitable.

> My guess is that quadratic behavior would mostly result
> from searches of the data structure.  From the data you
> show it seems that most of the damage is done during
> export, not during parsing.

Yes, I think that's right.  The parse tree is built in one pass I
think, but some of the export requires walking over this tree a
lot.  It's unclear to me if this is necessary or not, or if the
export could be built with a single linear pass over the parse
tree.

Lawrence

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
       [not found]             ` <87ip2xfd0x.fsf@gmail.com>
@ 2013-05-06 11:07               ` Lawrence Mitchell
  2013-05-06 16:15                 ` Lawrence Mitchell
  2013-05-06 18:41                 ` Achim Gratz
  0 siblings, 2 replies; 18+ messages in thread
From: Lawrence Mitchell @ 2013-05-06 11:07 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: Lawrence Mitchell, emacs-orgmode, Carsten Dominik

[Reintroducing org mailing list CC]

On 05/05/2013 20:21, Nicolas Goaziou wrote:
> Carsten Dominik <carsten.dominik@gmail.com> writes:
>
>>> I don't think there's much to do about that. Though, some tools could
>>> benefit from caching, like Lawrence did for
>>> `org-export-resolve-fuzzy-link'.
>>
>> Could you point out specific ones where it would make sense?  Maybe
>> someone would like to take this up as a task.
>
> It requires some careful benchmarking. Though, good candidates are tools
> searching for an object or element within the full parse tree. This
> includes:
>
>    - org-export-footnote-first-reference-p
>    - org-export-get-footnote-number
>    - org-export-get-category
>    - org-export-resolve-coderef
>    - org-export-resolve-radio-link
>    - org-export-get-loc
>    - org-export-get-ordinal

I wonder if it would be possible to store a copy of the parse tree in a 
form that is more amenable to log or constant time searches.

However, I note that my caching of the fuzzy link stuff brought the 
quadratic time export of copies of the org manual introduction down to 
linear (or close to) time.

Most of the problem now seems to be that for big documents, many 
functions are called a /lot/.  For example:

org-element--current-element takes (on my machine) 0.0003 seconds per 
call.  However, when exporting 128x the orgmanual introduction, it's 
called around 250000 times giving ~ 80 seconds total time (out of ~200 
total).

So it sort of feels like actually what is needed is microoptimisations 
of the bits of the export engine that are called the most.

Lawrence


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-05-06 11:07               ` Lawrence Mitchell
@ 2013-05-06 16:15                 ` Lawrence Mitchell
  2013-05-07 10:26                   ` Bastien
  2013-05-06 18:41                 ` Achim Gratz
  1 sibling, 1 reply; 18+ messages in thread
From: Lawrence Mitchell @ 2013-05-06 16:15 UTC (permalink / raw)
  To: emacs-orgmode

Lawrence Mitchell wrote:

[...]

And here's the profile for exporting the whole of orgmanual.org
to latex.  You can see that a lot of the time comes from quite
"cheap" functions that are called lots.

org-latex-export-as-latex          1           415.52740777  415.52740777
org-export-to-buffer               1           414.89446185  414.89446185
org-export-as                      1           414.89314727  414.89314727
org-entry-get                      362909      377.15499961  0.0010392550
org-export-execute-babel-code      1           248.09360062  248.09360062
org-babel-exp-process-buffer       1           248.08328800  248.08328800
org-babel-exp-src-block            137         243.10403972  1.7744820417
org-export-data                    33704       241.13159257  0.0071543909
org-babel-get-src-block-info       3811        241.06197704  0.0632542579
org-babel-parse-src-block-match    3811        239.00537768  0.0627146097
org-babel-exp-do-export            137         238.11539604  1.7380685842
org-babel-exp-code                 137         237.68002300  1.7348906788
org-babel-expand-noweb-references  24          237.62374387  9.9009893283
org-babel-params-from-properties   3811        237.29706205  0.0622663505
org-entry-get-with-inheritance     99086       234.87738424  0.0023704396
org-get-property-block             322856      121.21322218  0.0003754405
org-macro-replace-all              2           112.94112931  56.470564655
org-element-context                8588        107.43778316  0.0125102216
org-element-at-point               8976        101.79823538  0.0113411581
org-element--current-element       236519      91.414278451  0.0003864986
org-up-heading-safe                263822      84.507423221  0.0003203198
org-element--parse-elements        3113        40.774293673  0.0130980705
org-back-to-heading                1209279     40.394825581  3.340...e-05
org-element-headline-parser        59033       32.808386543  0.0005557634
org-before-first-heading-p         272799      32.791086826  0.0001202023
org-list-struct                    8035        20.801415973  0.0025888507
org-element-map                    19598       20.466600874  0.0010443208
org-latex-link                     341         19.91215084   0.0583934042
org-export-resolve-fuzzy-link      281         19.879402747  0.0707452055

...

Functions taking less than 15 seconds cumulative time elided.


-- 
Lawrence Mitchell <wence@gmx.li>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-05-06 11:07               ` Lawrence Mitchell
  2013-05-06 16:15                 ` Lawrence Mitchell
@ 2013-05-06 18:41                 ` Achim Gratz
  2013-05-06 19:17                   ` Nicolas Goaziou
  1 sibling, 1 reply; 18+ messages in thread
From: Achim Gratz @ 2013-05-06 18:41 UTC (permalink / raw)
  To: emacs-orgmode

Lawrence Mitchell writes:
> org-element--current-element takes (on my machine) 0.0003 seconds per
> call.  However, when exporting 128x the orgmanual introduction, it's
> called around 250000 times giving ~ 80 seconds total time (out of ~200
> total).

I've traced this a bit and the question does warrant further
investigation.  Exporting the introduction without any duplications
already shows some interesting things: the property drawer for the
introduction is scanned a whopping 137 times, followed by 134 times the
cindex entry following it, followed by 125 times the "Summary" headline.
The header options feature prominently with around 100 scans each as
well.

The rest of the calls have mostly just a single invocation, but there
are some instances where parts of the tree are traversed multiple times
in succession to apparently adjust the :end property to the leaf element
in small increments or decrements.  If elements are mutable during
parsing then caching is more difficult as well, obviously.

> So it sort of feels like actually what is needed is microoptimisations
> of the bits of the export engine that are called the most.

Looking at the traces I'd think if we could eliminate the repeated
backtracking to adjust the leafs or at least skip over those elements in
a backtrack that are already fully parsed instead of parsing them again,
that would be a good start.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Terratec KOMPLEXER:
http://Synth.Stromeko.net/Downloads.html#KomplexerWaves

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-05-06 18:41                 ` Achim Gratz
@ 2013-05-06 19:17                   ` Nicolas Goaziou
  2013-05-06 19:32                     ` Achim Gratz
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Goaziou @ 2013-05-06 19:17 UTC (permalink / raw)
  To: Achim Gratz; +Cc: emacs-orgmode

Hello,

Achim Gratz <Stromeko@nexgo.de> writes:

> Lawrence Mitchell writes:
>> org-element--current-element takes (on my machine) 0.0003 seconds per
>> call.  However, when exporting 128x the orgmanual introduction, it's
>> called around 250000 times giving ~ 80 seconds total time (out of ~200
>> total).
>
> I've traced this a bit and the question does warrant further
> investigation.  Exporting the introduction without any duplications
> already shows some interesting things: the property drawer for the
> introduction is scanned a whopping 137 times, followed by 134 times the
> cindex entry following it, followed by 125 times the "Summary" headline.
> The header options feature prominently with around 100 scans each as
> well.
>
> The rest of the calls have mostly just a single invocation, but there
> are some instances where parts of the tree are traversed multiple times
> in succession to apparently adjust the :end property to the leaf element
> in small increments or decrements.  If elements are mutable during
> parsing then caching is more difficult as well, obviously.
>
>> So it sort of feels like actually what is needed is microoptimisations
>> of the bits of the export engine that are called the most.
>
> Looking at the traces I'd think if we could eliminate the repeated
> backtracking to adjust the leafs or at least skip over those elements in
> a backtrack that are already fully parsed instead of parsing them again,
> that would be a good start.

Actually this is a bit different. Parsing doesn't backtrack. Look at
`org-element-parse-buffer' through elp to see that elements are parsed
only once.

The problem comes from `org-element-at-point'. To be effective, it needs
to move back to the current headline, and start parsing buffer again
from there. That means the first element after the headline (often
a property drawer) will be parsed each time we need information within
the section.

A very good improvement for the exporter and, more importantly, for the
parser, would be to cache results from `org-element--current-element'.
Though, this cache would also need to be refreshed after each buffer
modification. This is the tricky part.

One solution would be to use `after-change-functions' and
`before-change-functions' to store intervals of modified areas in the
buffer. Then, during idle time, a `maphash' could update boundaries of
cached values or remove them completely, according to the intervals.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-05-06 19:17                   ` Nicolas Goaziou
@ 2013-05-06 19:32                     ` Achim Gratz
  2013-05-07 14:29                       ` Nicolas Goaziou
  0 siblings, 1 reply; 18+ messages in thread
From: Achim Gratz @ 2013-05-06 19:32 UTC (permalink / raw)
  To: emacs-orgmode

Nicolas Goaziou writes:
> Actually this is a bit different. Parsing doesn't backtrack. Look at
> `org-element-parse-buffer' through elp to see that elements are parsed
> only once.

Sorry for the loose terminology.

> The problem comes from `org-element-at-point'. To be effective, it needs
> to move back to the current headline, and start parsing buffer again
> from there. That means the first element after the headline (often
> a property drawer) will be parsed each time we need information within
> the section.

What I was suggesting is that we might mark those elements that were
already successfully parsed so that org-element-at-point can just pick
up the markings and only parse again when there are none.  Best have
them in the buffer as properties and remove them when an edit is done.

> A very good improvement for the exporter and, more importantly, for the
> parser, would be to cache results from `org-element--current-element'.
> Though, this cache would also need to be refreshed after each buffer
> modification. This is the tricky part.

For the exporter we know that the buffer isn't changing, or is it?

> One solution would be to use `after-change-functions' and
> `before-change-functions' to store intervals of modified areas in the
> buffer. Then, during idle time, a `maphash' could update boundaries of
> cached values or remove them completely, according to the intervals.

I'd prefer to store this in the buffer, based on no data or experience…
:-)


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf microQ V2.22R2:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-05-06 16:15                 ` Lawrence Mitchell
@ 2013-05-07 10:26                   ` Bastien
  0 siblings, 0 replies; 18+ messages in thread
From: Bastien @ 2013-05-07 10:26 UTC (permalink / raw)
  To: Lawrence Mitchell; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 664 bytes --]

Hi Lawrence and all,

Lawrence Mitchell <wence@gmx.li> writes:

> org-up-heading-safe                263822      84.507423221  0.0003203198
> org-element--parse-elements        3113        40.774293673  0.0130980705
> org-back-to-heading                1209279     40.394825581  3.340...e-05
> org-element-headline-parser        59033       32.808386543  0.0005557634

Given the above, I wonder how the attached patch would help microoptimizing
`org-up-heading-safe'.  The idea is that maybe we don't need the better error
that `org-up-heading-safe' provides, and the wrapping into (condition-case...)
may slow down things.  I've not tested -- feel free to test!


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: org.el4.patch --]
[-- Type: text/x-patch, Size: 511 bytes --]

diff --git a/lisp/org.el b/lisp/org.el
index 8ec6781..619fec4 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -22901,7 +22901,7 @@ Also, this function will be a lot faster than `outline-up-heading',
 because it relies on stars being the outline starters.  This can really
 make a significant difference in outlines with very many siblings."
   (let (start-level re)
-    (org-back-to-heading t)
+    (outline-back-to-heading t)
     (setq start-level (funcall outline-level))
     (if (equal start-level 1)
 	nil

[-- Attachment #3: Type: text/plain, Size: 14 bytes --]


-- 
 Bastien

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Exporting large documents
  2013-05-06 19:32                     ` Achim Gratz
@ 2013-05-07 14:29                       ` Nicolas Goaziou
  0 siblings, 0 replies; 18+ messages in thread
From: Nicolas Goaziou @ 2013-05-07 14:29 UTC (permalink / raw)
  To: Achim Gratz; +Cc: emacs-orgmode

Hello,

Achim Gratz <Stromeko@nexgo.de> writes:

> Nicolas Goaziou writes:

>> The problem comes from `org-element-at-point'. To be effective, it needs
>> to move back to the current headline, and start parsing buffer again
>> from there. That means the first element after the headline (often
>> a property drawer) will be parsed each time we need information within
>> the section.
>
> What I was suggesting is that we might mark those elements that were
> already successfully parsed so that org-element-at-point can just pick
> up the markings and only parse again when there are none.  Best have
> them in the buffer as properties and remove them when an edit is done.

Are you talking about text properties? If so, I fail to see how they
could be used here. What would you store as a text property?

>> A very good improvement for the exporter and, more importantly, for the
>> parser, would be to cache results from `org-element--current-element'.
>> Though, this cache would also need to be refreshed after each buffer
>> modification. This is the tricky part.
>
> For the exporter we know that the buffer isn't changing, or is it?

Of course it is. Macros are expanded, Babel blocks executed, files
included. `org-element-at-point' is used between each of these steps.

Moreover, a speed-up for `org-element--current-element' has a broader
usefulness than just export.

>> One solution would be to use `after-change-functions' and
>> `before-change-functions' to store intervals of modified areas in the
>> buffer. Then, during idle time, a `maphash' could update boundaries of
>> cached values or remove them completely, according to the intervals.
>
> I'd prefer to store this in the buffer, based on no data or experience…
> :-)

I was thinking about a buffer-local variable storing the cache, and
another one storing the intervals. Again, I don't see how text
properties could fit in.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-05-07 14:30 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-27 19:28 Exporting large documents Achim Gratz
2013-04-27 19:35 ` Carsten Dominik
2013-04-29 16:04   ` Lawrence Mitchell
2013-04-29 18:44     ` Achim Gratz
2013-05-01 12:18       ` [PATCH] ox: Cache locations of fuzzy links Lawrence Mitchell
2013-05-01 21:46         ` Nicolas Goaziou
2013-05-02  9:03           ` [PATCH v2] " Lawrence Mitchell
2013-05-02 12:35             ` Nicolas Goaziou
2013-05-02 12:53               ` Nicolas Goaziou
2013-05-03  8:43     ` Exporting large documents Carsten Dominik
2013-05-03 11:12       ` Lawrence Mitchell
     [not found]         ` <877gjfgnl9.fsf@gmail.com>
     [not found]           ` <0F877AB5-D488-4223-B0E7-F11B4B973614@gmail.com>
     [not found]             ` <87ip2xfd0x.fsf@gmail.com>
2013-05-06 11:07               ` Lawrence Mitchell
2013-05-06 16:15                 ` Lawrence Mitchell
2013-05-07 10:26                   ` Bastien
2013-05-06 18:41                 ` Achim Gratz
2013-05-06 19:17                   ` Nicolas Goaziou
2013-05-06 19:32                     ` Achim Gratz
2013-05-07 14:29                       ` Nicolas Goaziou

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).