emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* [PATCH] Use cache in org-up-heading-safe
@ 2021-05-04 15:08 Ihor Radchenko
  2021-05-05 16:40 ` Maxim Nikulin
  2021-05-16  6:15 ` Bastien
  0 siblings, 2 replies; 11+ messages in thread
From: Ihor Radchenko @ 2021-05-04 15:08 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 795 bytes --]

Hello,

While testing another patch for agenda fontification, I noticed that
agenda can spend up to half!! time doing org-up-heading-safe. Mostly
inside queries for inherited tags and properties.

I managed to make org-up-heading-safe up to 50x faster using position
cache.

If this patch also works for others, org-agenda-use-tag-inheritance
might become much less of an issue.

Benchmark:

1. (elp-instrument-function #'org-up-heading-safe)
2. Run agenda
3. (elp-results) ;; function, # calls, total time, avg time

Without cache:
org-up-heading-safe  27555       8.4234025759  0.0003056941

With cache, first run:
org-up-heading-safe  23227       0.5300747539  2.282...e-05

With cache, second run on unchanged buffer:
org-up-heading-safe  23227       0.1447754880  6.233...e-06

Best,
Ihor


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Use-cache-in-org-up-heading-safe.patch --]
[-- Type: text/x-diff, Size: 4449 bytes --]

From fe1e7094a7b76ba6bf282c5c4409a32b36827d56 Mon Sep 17 00:00:00 2001
Message-Id: <fe1e7094a7b76ba6bf282c5c4409a32b36827d56.1620140628.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Tue, 4 May 2021 22:55:14 +0800
Subject: [PATCH] Use cache in org-up-heading-safe

* lisp/org.el (org-up-heading-safe): Use buffer-local cache to store
positions of parent headings.  The cache is invalidated when buffer
text is changed, according to `buffer-chars-modified-tick'.
(org--up-heading-cache):  Buffer-local hash-table storing the cache.
(org--up-heading-cache-tick):  The buffer modification state for
currently active `org--up-heading-cache'.

The optimisation is critical when running agenda or org-entry-get
queries using property/tag inheritance.  In such scenarios
`org-up-heading-safe' can be called thousands of times.  For example,
building todo agenda on large number of headings lead to the following
benchmark results:

Benchmark:

1. (elp-instrument-function #'org-up-heading-safe)
2. Run agenda
3. (elp-results) ;; function, # calls, total time, avg time

Without cache:
org-up-heading-safe  27555       8.4234025759  0.0003056941

With cache, first run:
org-up-heading-safe  23227       0.5300747539  2.282...e-05

With cache, second run on unchanged buffer:
org-up-heading-safe  23227       0.1447754880  6.233...e-06

The final speedup is 1-2 orders of magnitude (~15-56 times).
---
 lisp/org.el | 43 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index b8678359f..4385903ee 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -19434,6 +19434,10 @@ (defun org-up-heading-all (arg)
 With argument, move up ARG levels."
   (outline-up-heading arg t))
 
+(defvar-local org--up-heading-cache (make-hash-table)
+  "Buffer-local `org-up-heading-safe' cache.")
+(defvar-local org--up-heading-cache-tick nil
+  "Buffer `buffer-chars-modified-tick' in `org--up-heading-cache'.")
 (defun org-up-heading-safe ()
   "Move to the heading line of which the present line is a subheading.
 This version will not throw an error.  It will return the level of the
@@ -19443,10 +19447,41 @@ (defun org-up-heading-safe ()
 because it relies on stars being the outline starters.  This can really
 make a significant difference in outlines with very many siblings."
   (when (ignore-errors (org-back-to-heading t))
-    (let ((level-up (1- (funcall outline-level))))
-      (and (> level-up 0)
-	   (re-search-backward (format "^\\*\\{1,%d\\} " level-up) nil t)
-	   (funcall outline-level)))))
+    (let ((level-cache (gethash (point) org--up-heading-cache)))
+      (if (and level-cache
+               (eq (buffer-chars-modified-tick) org--up-heading-cache-tick))
+          (when (<= (point-min) level-cache (point-max))
+            ;; Parent is inside accessible part of the buffer.
+            (progn (goto-char level-cache)
+                   (funcall outline-level)))
+        ;; Buffer modified.  Invalidate cache.
+        (when level-cache
+          (clrhash org--up-heading-cache)
+          (setq-local org--up-heading-cache-tick
+                      (buffer-chars-modified-tick)))
+        (let ((level-up (1- (funcall outline-level)))
+              (pos (point)))
+          (let (result)
+            (setq result
+                  (and (> level-up 0)
+	               (re-search-backward
+                        (format "^\\*\\{1,%d\\} " level-up) nil t)
+	               (funcall outline-level)))
+            (when result
+              (puthash pos (point) org--up-heading-cache))))))))
+;; (defun org-up-heading-safe ()
+;;   "Move to the heading line of which the present line is a subheading.
+;; This version will not throw an error.  It will return the level of the
+;; headline found, or nil if no higher level is found.
+
+;; Also, this function will be a lot faster than `outline-up-heading',
+;; because it relies on stars being the outline starters.  This can really
+;; make a significant difference in outlines with very many siblings."
+;;   (when (ignore-errors (org-back-to-heading t))
+;;     (let ((level-up (1- (funcall outline-level))))
+;;       (and (> level-up 0)
+;; 	   (re-search-backward (format "^\\*\\{1,%d\\} " level-up) nil t)
+;; 	   (funcall outline-level)))))
 
 (defun org-up-heading-or-point-min ()
   "Move to the heading line of which the present is a subheading, or point-min.
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-04 15:08 [PATCH] Use cache in org-up-heading-safe Ihor Radchenko
@ 2021-05-05 16:40 ` Maxim Nikulin
  2021-05-06 14:34   ` Ihor Radchenko
  2021-05-16  6:15 ` Bastien
  1 sibling, 1 reply; 11+ messages in thread
From: Maxim Nikulin @ 2021-05-05 16:40 UTC (permalink / raw)
  To: emacs-orgmode

On 04/05/2021 22:08, Ihor Radchenko wrote:
> 
> While testing another patch for agenda fontification, I noticed that
> agenda can spend up to half!! time doing org-up-heading-safe. Mostly
> inside queries for inherited tags and properties.
> 
> I managed to make org-up-heading-safe up to 50x faster using position
> cache.

I have not tested the patch due to I do not use agenda. My interest was 
stimulated solely by my attempts to make org-refile-get-targets faster.

I consider it as an improvement. I have noticed the only thing that must 
be fixed: comments with old variant of the function. Reaction to my 
other comments is optional.

In org-agenda.el org-up-heading-safe function is called only from 
org-find-top-headline. So the purpose of the dance with backward 
searches is to get top level headings. My expectation is that scan 
through the whole buffer and storing result in a structure that allows 
binary search of position (array, red-black tree, etc) may be even faster.

Alternatively lazily obtained position of top heading could be stored in 
cache to reduce number of iterations on org-find-top-line.

> +    (let ((level-cache (gethash (point) org--up-heading-cache)))
> +      (if (and level-cache
> +               (eq (buffer-chars-modified-tick) org--up-heading-cache-tick))

If buffer-chars-modified-tick is faster than gethash then reordering 
might result in very slight improvement of performance.

> +            ;; Parent is inside accessible part of the buffer.
> +            (progn (goto-char level-cache)
> +                   (funcall outline-level)))

I do not see any reason why outline level can not be cached in pair with 
position.

> +          (let (result)
> +            (setq result

I am not a lisp guru, so from my point of view this can be reduced to
(let ((result ...

> +                        (format "^\\*\\{1,%d\\} " level-up) nil t)

\t as an alternative to " " is used in org-refile.el,
e.g. "^\\*+[ \t]+". Unsure which variant is canonical one and I see that 
regexp is taken from original function, so no regression is expected.

> +;; (defun org-up-heading-safe ()
> +;;   "Move to the heading line of which the present line is a subheading.

Clean-up here is required, other notes may be ignored.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-05 16:40 ` Maxim Nikulin
@ 2021-05-06 14:34   ` Ihor Radchenko
  2021-05-06 17:02     ` Maxim Nikulin
  0 siblings, 1 reply; 11+ messages in thread
From: Ihor Radchenko @ 2021-05-06 14:34 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 4271 bytes --]

Maxim Nikulin <manikulin@gmail.com> writes:
> I have not tested the patch due to I do not use agenda. My interest was 
> stimulated solely by my attempts to make org-refile-get-targets faster.

Thanks for the feedback!

> I consider it as an improvement. I have noticed the only thing that must 
> be fixed: comments with old variant of the function. Reaction to my 
> other comments is optional.

Oops. Fixed.

> In org-agenda.el org-up-heading-safe function is called only from 
> org-find-top-headline.

That's probably not the slowest part. org-agenda also calls
org-up-heading-safe indirectly. In particular, org-get-tags is calling
org-up-heading-safe to get inherited tags. It is org-get-tags that is
taking >50% time of agenda generation in my agendas. And
org-up-heading-safe was the largest contributor to that >50% time.

> ...  So the purpose of the dance with backward 
> searches is to get top level headings. My expectation is that scan 
> through the whole buffer and storing result in a structure that allows 
> binary search of position (array, red-black tree, etc) may be even
> faster.

Scan through the whole buffer could be faster, but it is not always
desired. Most of Org code only need information for current headline.
Re-scanning the whole buffer would be costly.

Also, I tried to compare avl-tree with hash table. However, avl-tree
does not give much benefit for my test case of an agenda with ~12000
todo items. Since avl-tree requires much more custom code, hash table
should be better.

;; No cache
;; org-up-heading-safe  180493      285.37713697  0.0015810980
;; org-up-heading-safe  134876      199.44584854  0.0014787349
;; org-up-heading-safe  134872      198.23494205  0.0014698005

;; Hash cache
;; org-up-heading-safe  180493      5.0327451709  2.788...e-05
;; org-up-heading-safe  134872      1.3568663460  1.006...e-05
;; org-up-heading-safe  134872      0.7527555679  5.581...e-06

;; AVL cache
;; org-up-heading-safe  180500      5.1044455589  2.827...e-05
;; org-up-heading-safe  134872      1.2781428280  9.476...e-06
;; org-up-heading-safe  134872      1.2866258809  9.539...e-06

> Alternatively lazily obtained position of top heading could be stored in 
> cache to reduce number of iterations on org-find-top-line.

That one is used for org-agenda-filter-by-top-headline. I do not really
use it, so I have no clue if there is much need to optimise it
specifically. Yet, caching org-up-heading-safe will indeed speed it up
as well.

>> +    (let ((level-cache (gethash (point) org--up-heading-cache)))
>> +      (if (and level-cache
>> +               (eq (buffer-chars-modified-tick) org--up-heading-cache-tick))
>
> If buffer-chars-modified-tick is faster than gethash then reordering 
> might result in very slight improvement of performance.

Sure. Done.

>> +            ;; Parent is inside accessible part of the buffer.
>> +            (progn (goto-char level-cache)
>> +                   (funcall outline-level)))
>
> I do not see any reason why outline level can not be cached in pair with 
> position.

Hmm. You are right. Benchmarking with and without additional caching
shows that it can give extra ~2x improvement:

;; Hash cache
;; org-up-heading-safe  180493      5.0327451709  2.788...e-05
;; org-up-heading-safe  134872      1.3568663460  1.006...e-05
;; org-up-heading-safe  134872      0.7527555679  5.581...e-06

;; Hash cache + outline-level cache
;; org-up-heading-safe  180507      4.3326449169  2.400...e-05
;; org-up-heading-safe  134872      0.4952472380  3.671...e-06
;; org-up-heading-safe  134879      0.5298789350  3.928...e-06

So, outline-level is also cached in the new version of the patch.

>> +          (let (result)
>> +            (setq result
>
> I am not a lisp guru, so from my point of view this can be reduced to
> (let ((result ...

Sure.

>> +                        (format "^\\*\\{1,%d\\} " level-up) nil t)
>
> \t as an alternative to " " is used in org-refile.el,
> e.g. "^\\*+[ \t]+". Unsure which variant is canonical one and I see that 
> regexp is taken from original function, so no regression is expected.

I do not know either. Actually, I wish if Org mode code base used less
literal strings and more regexp variables from org-element.el and
org.el.

Best,
Ihor


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Use-cache-in-org-up-heading-safe.patch --]
[-- Type: text/x-diff, Size: 3775 bytes --]

From d40b2bbb1dc5113d3085be2fae52ebe5ac1b023d Mon Sep 17 00:00:00 2001
Message-Id: <d40b2bbb1dc5113d3085be2fae52ebe5ac1b023d.1620299446.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Thu, 6 May 2021 14:13:20 +0800
Subject: [PATCH] Use cache in org-up-heading-safe

* lisp/org.el (org-up-heading-safe): Use buffer-local cache to store
positions of parent headings.  The cache is invalidated when buffer
text is changed, according to `buffer-chars-modified-tick'.
(org--up-heading-cache):  Buffer-local hash-table storing the cache.
(org--up-heading-cache-tick):  The buffer modification state for
currently active `org--up-heading-cache'.

The optimisation is critical when running agenda or org-entry-get
queries using property/tag inheritance.  In such scenarios
`org-up-heading-safe' can be called thousands of times.  For example,
building todo agenda on large number of headings lead to the following
benchmark results:

Benchmark:

1. (elp-instrument-function #'org-up-heading-safe)
2. Run agenda
3. (elp-results) ;; function, # calls, total time, avg time

Without cache:
org-up-heading-safe  27555       8.4234025759  0.0003056941

With cache, first run:
org-up-heading-safe  23227       0.5300747539  2.282...e-05

With cache, second run on unchanged buffer:
org-up-heading-safe  23227       0.1447754880  6.233...e-06

The final speedup is 1-2 orders of magnitude (~15-56 times).
---
 lisp/org.el | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index 0ff13c79c..c7e87a804 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -20440,6 +20440,10 @@ (defun org-up-heading-all (arg)
 With argument, move up ARG levels."
   (outline-up-heading arg t))
 
+(defvar-local org--up-heading-cache (make-hash-table)
+  "Buffer-local `org-up-heading-safe' cache.")
+(defvar-local org--up-heading-cache-tick nil
+  "Buffer `buffer-chars-modified-tick' in `org--up-heading-cache'.")
 (defun org-up-heading-safe ()
   "Move to the heading line of which the present line is a subheading.
 This version will not throw an error.  It will return the level of the
@@ -20449,10 +20453,26 @@ (defun org-up-heading-safe ()
 because it relies on stars being the outline starters.  This can really
 make a significant difference in outlines with very many siblings."
   (when (ignore-errors (org-back-to-heading t))
-    (let ((level-up (1- (funcall outline-level))))
-      (and (> level-up 0)
-	   (re-search-backward (format "^\\*\\{1,%d\\} " level-up) nil t)
-	   (funcall outline-level)))))
+    (let (level-cache)
+      (if (and (eq (buffer-chars-modified-tick) org--up-heading-cache-tick)
+               (setq level-cache (gethash (point) org--up-heading-cache)))
+          (when (<= (point-min) (car level-cache) (point-max))
+            ;; Parent is inside accessible part of the buffer.
+            (progn (goto-char (car level-cache))
+                   (cdr level-cache)))
+        ;; Buffer modified.  Invalidate cache.
+        (unless (eq (buffer-chars-modified-tick) org--up-heading-cache-tick)
+          (setq-local org--up-heading-cache-tick
+                      (buffer-chars-modified-tick))
+          (clrhash org--up-heading-cache))
+        (let* ((level-up (1- (funcall outline-level)))
+               (pos (point))
+               (result (and (> level-up 0)
+	                    (re-search-backward
+                             (format "^\\*\\{1,%d\\} " level-up) nil t)
+	                    (funcall outline-level))))
+          (when result (puthash pos (cons (point) result) org--up-heading-cache))
+          result)))))
 
 (defun org-up-heading-or-point-min ()
   "Move to the heading line of which the present is a subheading, or point-min.
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-06 14:34   ` Ihor Radchenko
@ 2021-05-06 17:02     ` Maxim Nikulin
  2021-05-07  2:08       ` Ihor Radchenko
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Nikulin @ 2021-05-06 17:02 UTC (permalink / raw)
  To: emacs-orgmode

Though I still have not tested the patch, I think, it is an improvement 
and it is helpful in its current form. I am unable to tell if it follows 
code style.

Despite continuing discussion, I am unsure if it could be significantly 
better.

On 06/05/2021 21:34, Ihor Radchenko wrote:
> Maxim Nikulin writes:
>> In org-agenda.el org-up-heading-safe function is called only from
>> org-find-top-headline.
> 
> That's probably not the slowest part. org-agenda also calls
> org-up-heading-safe indirectly. In particular, org-get-tags is calling
> org-up-heading-safe to get inherited tags. It is org-get-tags that is
> taking >50% time of agenda generation in my agendas. And
> org-up-heading-safe was the largest contributor to that >50% time.

My bad, you mentioned tags earlier, but I grepped org-agenda.el only.
My new idea is that org-get-tags may have its own cache as well. Unsure 
if it should be tried.

> Scan through the whole buffer could be faster, but it is not always
> desired. Most of Org code only need information for current headline.
> Re-scanning the whole buffer would be costly.
> 
> Also, I tried to compare avl-tree with hash table. However, avl-tree
> does not give much benefit for my test case of an agenda with ~12000
> todo items. Since avl-tree requires much more custom code, hash table
> should be better.

Did you just replace gethash by avl-tree? Likely my idea is based on a 
wrong assumption. I hoped that having positions of headers it is 
possible to avoid jumps (goto-char ...) preceded or followed by regexp 
matching almost completely. Previous header for arbitrary initial 
position can be found using binary search through structure obtained 
during scan.

> +	                    (re-search-backward
> +                             (format "^\\*\\{1,%d\\} " level-up) nil t)
> +	                    (funcall outline-level))))

Unsure concerning the following optimization from the point of 
readability and reliability in respect to future modifications. Outline 
level can be derived from the length of matched string without the 
funcall requiring extra regexp.

If originally this code path had 50% contribution and performance 
already becomes several times better, further optimization of this part 
does not matter.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-06 17:02     ` Maxim Nikulin
@ 2021-05-07  2:08       ` Ihor Radchenko
  2021-05-08 11:28         ` Maxim Nikulin
  2021-05-15 11:58         ` Bastien
  0 siblings, 2 replies; 11+ messages in thread
From: Ihor Radchenko @ 2021-05-07  2:08 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 2193 bytes --]

Maxim Nikulin <manikulin@gmail.com> writes:

> Though I still have not tested the patch, I think, it is an improvement 
> and it is helpful in its current form. I am unable to tell if it follows 
> code style.

FYI, this patch may also slightly improve the performance of
org-get-outline-path.

> My bad, you mentioned tags earlier, but I grepped org-agenda.el only.
> My new idea is that org-get-tags may have its own cache as well. Unsure 
> if it should be tried.

I am trying it now ;) No benchmarks yet, but it also provides a
subjective improvement. However, I am trying to reuse org-element-cache
for tags. org-element-cache is smart enough to update itself partially
on buffer changes. However, there are (known) bugs in org-element-cache.
I managed to track down some, but still need to test thoughtfully before
submitting upstream.

> Did you just replace gethash by avl-tree?

Yes

> Likely my idea is based on a 
> wrong assumption. I hoped that having positions of headers it is 
> possible to avoid jumps (goto-char ...) preceded or followed by regexp 
> matching almost completely. Previous header for arbitrary initial 
> position can be found using binary search through structure obtained 
> during scan.

Sorry, I cannot follow what you mean. The call to goto-char in
org-up-heading-safe is required by function docstring - we need to move
point to the parent heading.

>> +	                    (re-search-backward
>> +                             (format "^\\*\\{1,%d\\} " level-up) nil t)
>> +	                    (funcall outline-level))))
>
> Unsure concerning the following optimization from the point of 
> readability and reliability in respect to future modifications. Outline 
> level can be derived from the length of matched string without the 
> funcall requiring extra regexp.

I am not sure here. outline-level also consults outline-heading-alist,
though I found no references to that variable in Org mode code.
Otherwise, outline-level already reuses match-data. There is no regexp
matching there. 

P.S. I just found that hash-tables are created by reference, so we need
to call make-hash-table separately in each buffer with cache. Updated
patch attached.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Use-cache-in-org-up-heading-safe.patch --]
[-- Type: text/x-diff, Size: 3857 bytes --]

From 08a175752b14f84767a71665773aa64807606af4 Mon Sep 17 00:00:00 2001
Message-Id: <08a175752b14f84767a71665773aa64807606af4.1620316036.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Thu, 6 May 2021 14:13:20 +0800
Subject: [PATCH] Use cache in org-up-heading-safe

* lisp/org.el (org-up-heading-safe): Use buffer-local cache to store
positions of parent headings.  The cache is invalidated when buffer
text is changed, according to `buffer-chars-modified-tick'.
(org--up-heading-cache):  Buffer-local hash-table storing the cache.
(org--up-heading-cache-tick):  The buffer modification state for
currently active `org--up-heading-cache'.

The optimisation is critical when running agenda or org-entry-get
queries using property/tag inheritance.  In such scenarios
`org-up-heading-safe' can be called thousands of times.  For example,
building todo agenda on large number of headings lead to the following
benchmark results:

Benchmark:

1. (elp-instrument-function #'org-up-heading-safe)
2. Run agenda
3. (elp-results) ;; function, # calls, total time, avg time

Without cache:
org-up-heading-safe  27555       8.4234025759  0.0003056941

With cache, first run:
org-up-heading-safe  23227       0.5300747539  2.282...e-05

With cache, second run on unchanged buffer:
org-up-heading-safe  23227       0.1447754880  6.233...e-06

The final speedup is 1-2 orders of magnitude (~15-56 times).
---
 lisp/org.el | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index 0ff13c79c..7c58f0e2e 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -20440,6 +20440,10 @@ (defun org-up-heading-all (arg)
 With argument, move up ARG levels."
   (outline-up-heading arg t))
 
+(defvar-local org--up-heading-cache nil
+  "Buffer-local `org-up-heading-safe' cache.")
+(defvar-local org--up-heading-cache-tick nil
+  "Buffer `buffer-chars-modified-tick' in `org--up-heading-cache'.")
 (defun org-up-heading-safe ()
   "Move to the heading line of which the present line is a subheading.
 This version will not throw an error.  It will return the level of the
@@ -20449,10 +20453,28 @@ (defun org-up-heading-safe ()
 because it relies on stars being the outline starters.  This can really
 make a significant difference in outlines with very many siblings."
   (when (ignore-errors (org-back-to-heading t))
-    (let ((level-up (1- (funcall outline-level))))
-      (and (> level-up 0)
-	   (re-search-backward (format "^\\*\\{1,%d\\} " level-up) nil t)
-	   (funcall outline-level)))))
+    (let (level-cache)
+      (unless org--up-heading-cache
+        (setq org--up-heading-cache (make-hash-table)))
+      (if (and (eq (buffer-chars-modified-tick) org--up-heading-cache-tick)
+               (setq level-cache (gethash (point) org--up-heading-cache)))
+          (when (<= (point-min) (car level-cache) (point-max))
+            ;; Parent is inside accessible part of the buffer.
+            (progn (goto-char (car level-cache))
+                   (cdr level-cache)))
+        ;; Buffer modified.  Invalidate cache.
+        (unless (eq (buffer-chars-modified-tick) org--up-heading-cache-tick)
+          (setq-local org--up-heading-cache-tick
+                      (buffer-chars-modified-tick))
+          (clrhash org--up-heading-cache))
+        (let* ((level-up (1- (funcall outline-level)))
+               (pos (point))
+               (result (and (> level-up 0)
+	                    (re-search-backward
+                             (format "^\\*\\{1,%d\\} " level-up) nil t)
+	                    (funcall outline-level))))
+          (when result (puthash pos (cons (point) result) org--up-heading-cache))
+          result)))))
 
 (defun org-up-heading-or-point-min ()
   "Move to the heading line of which the present is a subheading, or point-min.
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-07  2:08       ` Ihor Radchenko
@ 2021-05-08 11:28         ` Maxim Nikulin
  2021-05-10 15:14           ` Ihor Radchenko
  2021-05-15 11:58         ` Bastien
  1 sibling, 1 reply; 11+ messages in thread
From: Maxim Nikulin @ 2021-05-08 11:28 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 3054 bytes --]

On 07/05/2021 09:08, Ihor Radchenko wrote:
> Maxim Nikulin writes:
>> Did you just replace gethash by avl-tree?
> 
> Yes
> 
>> Likely my idea is based on a
>> wrong assumption. I hoped that having positions of headers it is
>> possible to avoid jumps (goto-char ...) preceded or followed by regexp
>> matching almost completely. Previous header for arbitrary initial
>> position can be found using binary search through structure obtained
>> during scan.
> 
> Sorry, I cannot follow what you mean. The call to goto-char in
> org-up-heading-safe is required by function docstring - we need to move
> point to the parent heading.

I am trying to minimize number of regexp searches. Mostly it is applied 
when information concerning multiple headings is required (agenda, 
refile targets). It unlikely will get some benefits during interactive 
calls related to single heading.

For a file having 3000 headings, scanning though it takes ~0.1sec to get 
the following properties: position, level, list of parents (with same 
properties). Note that no expensive operations are performed like 
cleaning up of heading title.

Having list of headings (and its length), it is possible to build a tree 
for binary search in linear time. It takes ~0.01sec.

Having the tree, it is possible to instantly find heading for 
*arbitrary* position in the buffer. Likely the next operation is goto to 
the heading or to it parent and parsing the line for more detailed 
properties. The latter is cacheable, structure for heading properties 
can be expanded.

Hash works only for fixed set of positions, to use hash it is necessary 
to find position of the heading at first. On the other hand, to take 
advantage of binary tree, more substantial modification of code is required.

Since there is no operations as insertion or deletion of nodes from 
tree, no complex code is required to implement balancing rotations. That 
is why I think that avl-tree is an overkill.

See the attachment for experimental (thus almost untested) code. Likely 
you will find code style quite ugly. I am not happy with 0.1 sec for a 
moderately large file. It is close to the limit for comfortable 
interactive operations.

>>> +	                    (re-search-backward
>>> +                             (format "^\\*\\{1,%d\\} " level-up) nil t)
>>> +	                    (funcall outline-level))))
>>
>> Unsure concerning the following optimization from the point of
>> readability and reliability in respect to future modifications. Outline
>> level can be derived from the length of matched string without the
>> funcall requiring extra regexp.
> 
> I am not sure here. outline-level also consults outline-heading-alist,
> though I found no references to that variable in Org mode code.
> Otherwise, outline-level already reuses match-data. There is no regexp
> matching there.

Sorry. You are right. The function does what I proposed to write 
explicitly. For some reason I believed that outline-level calls 
something like looking-at. Maybe I checked it earlier and completely forgot.


[-- Attachment #2: nm-btree.org --]
[-- Type: text/plain, Size: 4764 bytes --]


#+begin_src elisp :results none

  ;; Heading properties
  (defun nm-heading-properties-new (position level parents)
    "Heading properties: (position . (level . parent))"
    (cons position (cons level parents)))

  (defun nm-heading-properties-level (props)
    (cadr props))

  (defun nm-heading-properties-pos (props)
    (car props))

  (defun nm-heading-properties-parents (props)
    (cddr props))

  (defun nm-heading-pos-lessp (value props)
    (< value (nm-heading-properties-pos props)))

  (defun nm-buffer-headings-reversed (buffer)
    (with-current-buffer buffer
      (save-restriction
	(save-excursion
	  (widen)
	  (goto-char (point-min))
	  (let ((count 0)
		(headings ())
		(parents ()))
	    (while (re-search-forward org-outline-regexp-bol nil t)
	      (let* ((pos (match-beginning 0))
		     (level (- (match-end 0) pos 1)))
		(while (and parents (>= (nm-heading-properties-level (car parents)) level))
		  (pop parents))
		(setq count (1+ count))
		(let ((props (nm-heading-properties-new pos level parents)))
		  (push props headings)
		  (push props parents))))
	    (and headings (cons headings count)))))))


  ;; binary search tree
  (defun nm-btree-new-node ()
    "((left right) . properties"
    (cons (cons nil nil) nil))

  (defun nm-btree-node-left (node)
    (caar node))

  (defun nm-btree-node-set-left (node child)
    (setcar (car node) child))

  (defun nm-btree-node-set-right (node child)
    (setcdr (car node) child))

  (defun nm-btree-node-right (node)
    (cdar node))

  (defun nm-btree-node-properties (node)
    (cdr node))

  (defun nm-btree-node-set-properties (node properties)
    (setcdr node properties))

  (defun nm-btree-from-reversed (scan-result)
    (and
     scan-result
     (let* ((key-properties-list (car scan-result))
	    (length (cdr scan-result))
	    (head (nm-btree-new-node))
	    (queue (list (cons length head)))) ; list of (count . node)
       (while queue
	 (let* ((item (pop queue))
		(count (car item))
		(node (cdr item)))
	   (cond
	    ((eq count 1) ; leaf or only single child
	     (nm-btree-node-set-properties node (pop key-properties-list)))
	    ((nm-btree-node-right node) ; right children completed
	     (nm-btree-node-set-properties node (pop key-properties-list))
	     (let ((left-node (nm-btree-new-node)))
	       (nm-btree-node-set-left node left-node)
	       (push (cons (1- count) left-node) queue)))
	    (t (let* ((right-count (/ (car item) 2))
		      (right-node (nm-btree-new-node)))
		 (nm-btree-node-set-right node right-node)
		 (setcar item (- count right-count))
		 (push item queue)
		 (push (cons right-count right-node) queue))))))
       head)))

  (defun nm-btree-find-left (tree value &optional cmp)
    "Find last element not less than value"
    (let ((cmp (or cmp #'nm-heading-pos-lessp))
	  (result nil))
      (while tree
	(setq tree (if (funcall cmp value (nm-btree-node-properties tree))
		       (nm-btree-node-left tree)
		     (setq result tree)
		     (nm-btree-node-right tree))))
      (nm-btree-node-properties result)))
#+end_src

#+begin_src elisp
  (byte-compile #'nm-buffer-headings-reversed)
  (byte-compile #'nm-btree-from-reversed)
  (byte-compile #'nm-btree-find-left)

  (let* ((buffer "notes.org")
	 (scan-result (nm-buffer-headings-reversed buffer))
	 (tree (nm-btree-from-reversed scan-result))
	 (lim (with-current-buffer buffer
		(save-restriction
		  (widen)
		  (point-max)))))
    (list
     (append '("scan x10")
	     (benchmark-run 10
	       (nm-buffer-headings-reversed buffer)))
     (append '("btree x10")
	     (benchmark-run 10
	       (nm-btree-from-reversed scan-result)))
     (append '("scan+btree x10")
	     (benchmark-run 10
	       (let* ((scan-result1 (nm-buffer-headings-reversed buffer))
		      (tree1 (nm-btree-from-reversed scan-result1)))
		 tree1)))
     (append '("find random x10 000")
	     (benchmark-run 10000
	       (nm-btree-find-left tree (random lim))))
     (list "nodes" (cdr scan-result) "" "")))
#+end_src

#+RESULTS:
| scan x10            |   0.8611382689999999 | 0 |                 0.0 |
| btree x10           |  0.07705962400000001 | 1 | 0.05648322199999978 |
| scan+btree x10      |          0.940467238 | 1 | 0.05685373699999996 |
| find random x10 000 | 0.047712096999999995 | 0 |                 0.0 |
| nodes               |                 3413 |   |                     |

Without ~byte-compile~
| scan x10            |  1.2031535999999998 |  4 | 0.22845906700000018 |
| btree x10           |         0.498214241 |  6 | 0.34067464299999894 |
| scan+btree x10      |  1.7026304230000002 | 10 |  0.5686926149999998 |
| find random x10 000 | 0.08789912700000001 |  0 |                 0.0 |
| nodes               |                3413 |    |                     |

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-08 11:28         ` Maxim Nikulin
@ 2021-05-10 15:14           ` Ihor Radchenko
  0 siblings, 0 replies; 11+ messages in thread
From: Ihor Radchenko @ 2021-05-10 15:14 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 3913 bytes --]

Maxim Nikulin <manikulin@gmail.com> writes:

> I am trying to minimize number of regexp searches. Mostly it is applied 
> when information concerning multiple headings is required (agenda, 
> refile targets). It unlikely will get some benefits during interactive 
> calls related to single heading.

> Having the tree, it is possible to instantly find heading for 
> *arbitrary* position in the buffer. Likely the next operation is goto to 
> the heading or to it parent and parsing the line for more detailed 
> properties. The latter is cacheable, structure for heading properties 
> can be expanded.

Thanks for the explanation! I very much like this idea, though I do not
think that it is going to be practical on actual large files. Your code
runs for 0.25sec on my largest file:

| scan x10                         | 2.482179163 | 1 | 0.38996822200002157 |
| btree x10                        | 0.103329151 | 0 |                 0.0 |
| scan+btree x10                   | 2.666915096 | 1 |  0.3657946569999808 |
| find random x10 000              | 0.048668756 | 0 |                 0.0 |
| nodes                            |       16641 |   |                     |

> Since there is no operations as insertion or deletion of nodes from 
> tree, no complex code is required to implement balancing rotations. That 
> is why I think that avl-tree is an overkill.

Yet, if we could keep the tree in sync with buffer modifications... And
we can, using org-element-cache. It already uses AVL-tree and already
syncs it automatically on buffer edits. The only problem is that it does
not work with headlines. However, I wrote some code adding headline
support to org-element-cache in https://github.com/yantar92/org.

Some benchmarks:

| scan x10                                 |        2.179845106 | 0 |               0.0 |
| btree x10                                |        0.474887597 | 1 | 0.364201286999986 |
| scan+btree x10                           |        2.318029902 | 0 |               0.0 |
| scan using org-element-cache x10         | 2.8941826990000004 | 0 |               0.0 |
| populating the cache x1                  |       11.332366941 | 5 | 2.040159146999997 |
| find random x10 000                      |        0.053074866 | 0 |               0.0 |
| find random in org-element-cache x10 000 |        0.007963375 | 0 |               0.0 |
| nodes                                    |              16641 |   |                   |

Scan through the file takes pretty much the same time with the btree
approach. Sometimes more, sometimes less. And most of the time is really
just re-search-forward. Find random showcases that avl-tree vs. btree
really makes a difference on such large number of headings.

On the worse side, initial population of the org-element-cache takes
quite a bit of time. However, it is normally done asynchronously during
idle time and does not really affect the user except the first loading
(which might be further optimised if we store cache between Emacs
sessions). Later, cache is updated automatically with buffer
modifications.

Moreover, the org-element-cache will already provide the parsed
headlines, including titles (and tags and other useful staff). No need
to match headline text with regexp.

For me, the idea of storing headlines in cache looks promising. I can
imagine many other Org functions using the cache after trivial
modifications. At least org-get-tags, org-get-category, org-entry-get
can be easily adapted to use the cache. What do you think?

> See the attachment for experimental (thus almost untested) code. Likely 
> you will find code style quite ugly. I am not happy with 0.1 sec for a 
> moderately large file. It is close to the limit for comfortable 
> interactive operations.

No worries about the style. It is testing anyway.

I modified your code slightly for testing the org-element-cache. See the
attached diff.

Best,
Ihor


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: diff-btree.diff --]
[-- Type: text/x-diff, Size: 3548 bytes --]

diff -u /tmp/nm-btree-1.org /tmp/nm-btree.org
--- /tmp/nm-btree-1.org	2021-05-10 23:04:21.286395360 +0800
+++ /tmp/nm-btree.org	2021-05-10 23:02:17.586396326 +0800
@@ -1,3 +1,6 @@
+:PROPERTIES:
+:ID:       0aba93db-491d-46f7-b952-e138ba179a12
+:END:
 
 #+begin_src elisp :results none
 
@@ -38,6 +41,26 @@
 		(push props parents))))
 	  (and headings (cons headings count)))))))
 
+(defun nm-buffer-headings-reversed-cache (buffer)
+  (with-current-buffer buffer
+    (save-restriction
+      (save-excursion
+	(widen)
+	(goto-char (point-min))
+	(let ((count 0)
+	      (headings ())
+	      (parents ()))
+	  (while (re-search-forward org-outline-regexp-bol nil t)
+            (let ((cached (org-element--cache-find (line-beginning-position))))
+              (if (eq (org-element-property :begin cached) (line-beginning-position))
+                  (push cached headings)
+                (push (org-element-at-point) headings)))
+            
+            ;; (save-excursion
+            ;;   (beginning-of-line)
+            ;;   (org-element-at-point 'cached))
+            ))))))
+
 
 ;; binary search tree
 (defun nm-btree-new-node ()
@@ -103,6 +126,7 @@
 
 #+begin_src elisp
 (byte-compile #'nm-buffer-headings-reversed)
+(byte-compile #'nm-buffer-headings-reversed-cache)
 (byte-compile #'nm-btree-from-reversed)
 (byte-compile #'nm-btree-find-left)
 
@@ -125,13 +149,44 @@
 	     (let* ((scan-result1 (nm-buffer-headings-reversed buffer))
 		    (tree1 (nm-btree-from-reversed scan-result1)))
 	       tree1)))
+   (append '("scan using org-element-cache x10")
+           (progn
+             (let ((org-element-cache-sync-duration 1000.0))
+               (org-element-cache-reset)
+	       (org-element--cache-buffer-stealthly))
+	     (benchmark-run 10
+	       (nm-buffer-headings-reversed-cache buffer))))
+   (append '("populating the cache x1")
+	   (benchmark-run 1
+             (with-current-buffer buffer
+               ;; Disable async processing.
+               (let ((org-element-cache-sync-duration 1000.0))
+                 (org-element-cache-reset)
+	         (org-element--cache-buffer-stealthly)))))
    (append '("find random x10 000")
 	   (benchmark-run 10000
 	     (nm-btree-find-left tree (random lim))))
+   (append '("find random in org-element-cache x10 000")
+	   (benchmark-run 10000
+             (org-element-lineage
+	      (org-element--cache-find (random lim))
+              '(headlines) t)))
    (list "nodes" (cdr scan-result) "" "")))
 #+end_src
 
 #+RESULTS:
+| scan x10                                 |          1.693757321 | 0 |                 0.0 |
+| btree x10                                |          0.552682783 | 1 | 0.43690057000000593 |
+| scan+btree x10                           |   2.3174109880000002 | 0 |                 0.0 |
+| scan using org-element-cache x10         |   2.9140550789999997 | 0 |                 0.0 |
+| populating the cache x1                  |          10.83575134 | 9 |  3.9872376689999953 |
+| find random x10 000                      | 0.061988949999999994 | 0 |                 0.0 |
+| find random in org-element-cache x10 000 |          0.003262685 | 0 |                 0.0 |
+| nodes                                    |                16641 |   |                     |
+
+
+
+
 | scan x10            |   0.8611382689999999 | 0 |                 0.0 |
 | btree x10           |  0.07705962400000001 | 1 | 0.05648322199999978 |
 | scan+btree x10      |          0.940467238 | 1 | 0.05685373699999996 |

Diff finished.  Mon May 10 23:04:27 2021

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-07  2:08       ` Ihor Radchenko
  2021-05-08 11:28         ` Maxim Nikulin
@ 2021-05-15 11:58         ` Bastien
  1 sibling, 0 replies; 11+ messages in thread
From: Bastien @ 2021-05-15 11:58 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Maxim Nikulin, emacs-orgmode

Hi Ihor,

Ihor Radchenko <yantar92@gmail.com> writes:

> From 08a175752b14f84767a71665773aa64807606af4 Mon Sep 17 00:00:00 2001
> Message-Id: <08a175752b14f84767a71665773aa64807606af4.1620316036.git.yantar92@gmail.com>
> From: Ihor Radchenko <yantar92@gmail.com>
> Date: Thu, 6 May 2021 14:13:20 +0800
> Subject: [PATCH] Use cache in org-up-heading-safe

I have now installed this patch as ba273278a to let more people test
it from master.

Thanks!

-- 
 Bastien


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-04 15:08 [PATCH] Use cache in org-up-heading-safe Ihor Radchenko
  2021-05-05 16:40 ` Maxim Nikulin
@ 2021-05-16  6:15 ` Bastien
  2021-05-16  6:36   ` Ihor Radchenko
  1 sibling, 1 reply; 11+ messages in thread
From: Bastien @ 2021-05-16  6:15 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 500 bytes --]

Hi,

Ihor Radchenko <yantar92@gmail.com> writes:

> While testing another patch for agenda fontification, I noticed that
> agenda can spend up to half!! time doing org-up-heading-safe. Mostly
> inside queries for inherited tags and properties.

I encounter a bug with this cache, it seems the buffer-local variable
`org--up-heading-cache' is not initialized in every buffer from which
org-agenda-list might need it.

I'm attaching the bugtrace.

Ihor, do you see what's happening here?

-- 
 Bastien

[-- Attachment #2: bugtrace.txt --]
[-- Type: text/plain, Size: 1034 bytes --]

Debugger entered--Lisp error: (void-variable org--up-heading-cache)
  org-up-heading-safe()
  org-block-todo-from-children-or-siblings-or-parent((:type todo-state-change :position 20793 :from todo :to done))
  org-entry-blocked-p()
  org-agenda--mark-blocked-entry()))
  org-agenda-finalize-entries((... ... ... ... ... ... ... ... ... ... ... ... ... ... ...) tags)
  org-tags-view((4) #("+Code+TODO={NEXT\\|STRT}" 11 23 (regexp t)))
  #f(compiled-function () #<bytecode -0xe5c4a74ae77f6cc>)()
  funcall(#f(compiled-function () #<bytecode -0xe5c4a74ae77f6cc>))
  (let ((org-agenda-category-filter-preset '("-ETL"))) (funcall '#f(compiled-function () #<bytecode -0xe5c4a74ae77f6cc>)))
  eval((let ((org-agenda-category-filter-preset '("-ETL"))) (funcall '#f(compiled-function () #<bytecode -0xe5c4a74ae77f6cc>))))
  org-agenda(nil "cc")
  (lambda nil (interactive) (org-agenda nil "cc"))()
  funcall-interactively((lambda nil (interactive) (org-agenda nil "cc")))
  command-execute((lambda nil (interactive) (org-agenda nil "cc")))
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-16  6:15 ` Bastien
@ 2021-05-16  6:36   ` Ihor Radchenko
  2021-05-16  8:53     ` Bastien
  0 siblings, 1 reply; 11+ messages in thread
From: Ihor Radchenko @ 2021-05-16  6:36 UTC (permalink / raw)
  To: Bastien; +Cc: emacs-orgmode

Bastien <bzg@gnu.org> writes:

> Debugger entered--Lisp error: (void-variable org--up-heading-cache)
>   org-up-heading-safe()

> Ihor, do you see what's happening here?

This is very odd. `org--up-heading-cache' is supposed to be buffer-local
variable available in all buffers with default value of nil. Yet, the
backtrace suggests that variable does not exist. Can you see the
variable description when running describe-variable? Will the error
disappear if you eval-last-sexp on the defvar-local?

Can it be some strange interaction between .el, .elc, or even .eln
files compiled for different Org versions?

Best,
Ihor


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Use cache in org-up-heading-safe
  2021-05-16  6:36   ` Ihor Radchenko
@ 2021-05-16  8:53     ` Bastien
  0 siblings, 0 replies; 11+ messages in thread
From: Bastien @ 2021-05-16  8:53 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

Hi Ihor,

Ihor Radchenko <yantar92@gmail.com> writes:

> Can it be some strange interaction between .el, .elc, or even .eln
> files compiled for different Org versions?

Mhh... probably -- I'll monitor this closely.

-- 
 Bastien


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-05-16  8:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-04 15:08 [PATCH] Use cache in org-up-heading-safe Ihor Radchenko
2021-05-05 16:40 ` Maxim Nikulin
2021-05-06 14:34   ` Ihor Radchenko
2021-05-06 17:02     ` Maxim Nikulin
2021-05-07  2:08       ` Ihor Radchenko
2021-05-08 11:28         ` Maxim Nikulin
2021-05-10 15:14           ` Ihor Radchenko
2021-05-15 11:58         ` Bastien
2021-05-16  6:15 ` Bastien
2021-05-16  6:36   ` Ihor Radchenko
2021-05-16  8:53     ` Bastien

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).