[-- Attachment #1: Type: text/plain, Size: 1543 bytes --] Reviewing my attempt to speedup collecting of refile targets https://orgmode.org/list/s209r8$16en$1@ciao.gmane.io/ I have realized that refile cache is unreliable. With specific customization, cache content and thus refile targets depend on the function called first: `org-refile' ([C-u] C-c C-w) or `org-goto' (C-u C-c C-w). I decided to try to provide minimal example and steps to reproduce in the form of a test marked as expected failure. Certainly cache should be enabled (org-refile-use-cache t) `org-refile-targets' should have (nil . (:maxlevel . 5)) for the affected buffer. The value of the following setting is intentionally set to alternative value in comparison to `org-goto' code (org-refile-use-outline-path nil) To see the issue interactively, you likely should set (org-outline-path-complete-in-steps nil) Clean cache (C-u C-u C-u C-c C-w) and try jumping using C-u C-c C-w and C-u C-c C-j in various order. Use TAB completion to see targets. To make difference more apparent, define `org-refile-target-verify-function' to filter-out some headings. My expectation that each command has list of targets formatted accordingly to user setting or `org-goto' internal overrides. Actually lists of target are the same since they share cache entry. I suppose, cache keys should include values of all parameters affecting filtering and formatting, not only regexp for heading selection. However I have no idea how to derive some value suitable for cache key from `org-refile-target-verify-function'. [-- Attachment #2: 0001-testing-lisp-test-org.el-Refile-cache-failure.patch --] [-- Type: text/x-patch, Size: 2431 bytes --] From d23b10d658539a4646ef015ac2660e1f8c8e7e1b Mon Sep 17 00:00:00 2001 From: Max Nikulin <manikulin@gmail.com> Date: Wed, 28 Apr 2021 22:30:04 +0700 Subject: [PATCH] testing/lisp/test-org.el: Refile cache failure testing/lisp/test-org.el (test-org/org-refile-cache): Add expected failure for cache content discrepancy when it is populated by `org-refile' or by `org-goto' (C-u C-c C-j). Usage of just regexp for heading filtering as cache key is unreliable since other settings affects selected entries (`org-refile-target-verify-function') or content of entries (`org-refile-use-outline-path'). However most of users have no chance to notice collision since they do not call both functions interchangeably or have `org-refile-targets' different from the value in `org-goto' code. --- testing/lisp/test-org.el | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el index 9f0ede1b9..f44cd76a6 100644 --- a/testing/lisp/test-org.el +++ b/testing/lisp/test-org.el @@ -6394,6 +6394,30 @@ Paragraph<point>" (org-refile-targets `((nil :level . 1)))) (member (buffer-name) (mapcar #'car (org-refile-get-targets))))))) +(ert-deftest test-org/org-refile-cache () + "Demostrate a cache conflict of `org-goto' and `org-refile'. + +Notice that `org-refile-target-verify-function' can cause more severe +divergence of cache content generated by these functions." + :expected-result :failed + (org-test-with-temp-text "* H1\n** H2\n" + (let ((org-refile-use-cache t) + ;; Regexp serves as cache key, so targets should be the same + ;; as in `org-goto' code. + (org-refile-targets '((nil . (:maxlevel . 5)))) + ;; This value is opposite to the one in `org-goto' code. + (org-refile-use-outline-path nil) + (targets-refile)) + ;; If jumping using `org-refile' (C-u C-c C-w)... + (setq targets-refile (mapcar #'car (org-refile-get-targets))) + (org-refile-cache-clear) + ;; Tune settings to simulate `org-goto' (C-u C-c C-j). + (let ((org-refile-use-outline-path t)) + ;; Value is discarded, call just to populate the cache. + (org-refile-get-targets)) + ;; Targets got by `org-refile'. + (let ((targets-goto (mapcar #'car (org-refile-get-targets)))) + (should (equal targets-refile targets-goto)))))) \f ;;; Sparse trees -- 2.25.1
long ago i used to use the refile cache. i think it is probably not widely used, or maybe even not at all. the reason i stopped was behavior that sounds similar to your description. such as, it would produce a set of results that did not match the calling conditions. i cannot reach the thread you point to, however. what i did was use ido-hacks to speed up ido and restrict to :goto: and :refile: tagged entries for common targets. this worked for me; it is fast despite my huge set of olpaths. it would be great if the cache were unnecessary. that might nt work for you, but perhaps this provides confirmation or ideas for you. On 4/28/21, Maxim Nikulin <manikulin@gmail.com> wrote: > Reviewing my attempt to speedup collecting of refile targets > https://orgmode.org/list/s209r8$16en$1@ciao.gmane.io/ > I have realized that refile cache is unreliable. > > With specific customization, cache content and thus refile targets > depend on the function called first: `org-refile' ([C-u] C-c C-w) or > `org-goto' (C-u C-c C-w). > > I decided to try to provide minimal example and steps to reproduce in > the form of a test marked as expected failure. > > Certainly cache should be enabled > (org-refile-use-cache t) > `org-refile-targets' should have (nil . (:maxlevel . 5)) for the > affected buffer. The value of the following setting is intentionally set > to alternative value in comparison to `org-goto' code > (org-refile-use-outline-path nil) > To see the issue interactively, you likely should set > (org-outline-path-complete-in-steps nil) > > Clean cache (C-u C-u C-u C-c C-w) and try jumping using C-u C-c C-w and > C-u C-c C-j in various order. Use TAB completion to see targets. > > To make difference more apparent, define > `org-refile-target-verify-function' to filter-out some headings. > > My expectation that each command has list of targets formatted > accordingly to user setting or `org-goto' internal overrides. Actually > lists of target are the same since they share cache entry. > > I suppose, cache keys should include values of all parameters affecting > filtering and formatting, not only regexp for heading selection. > > However I have no idea how to derive some value suitable for cache key > from `org-refile-target-verify-function'. > -- The Kafka Pandemic Please learn what misopathy is. https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html
Samuel Wales <samologist@gmail.com> writes: > long ago i used to use the refile cache. i think it is probably not > widely used, or maybe even not at all. At least, I do use it. A lot. I rely on it. I do not observe the breakage as described in the first message, mostly because I use refile cache exclusively for org-refile. Yet, I often see "Please regenerate the refile cache with C-0 C-c C-w" when I do a lot of batch refiling. A faster, more reliable, caching would be certainly welcome. In general, various parts of Org mode code base implement different types of caches in parallel. I am aware at least about org-element, org-scan-tags, org-agenda, org-refile, and org-goto. Probably Org mode could benefit from unified caching mechanism? A good implementation coming to my mind is org-ql [1]. It implements tag caches, outline path caches, and can even be used to cache results of an arbitrary function with point at heading. Basically, all (except org-element) types of caches Org mode uses now are already implemented in org-ql in unified way. WDYT? [1] https://github.com/alphapapa/org-ql Best, Ihor
thanks for reporting that you use it. would it be more useful if it automaticaly generated the cache instead of telling you to runt he command to do so? if a solid, perhaps unified, cache existed, would org-id use it too? On 4/28/21, Ihor Radchenko <yantar92@gmail.com> wrote: > Samuel Wales <samologist@gmail.com> writes: > >> long ago i used to use the refile cache. i think it is probably not >> widely used, or maybe even not at all. > > At least, I do use it. A lot. I rely on it. > > I do not observe the breakage as described in the first message, mostly > because I use refile cache exclusively for org-refile. Yet, I often see > "Please regenerate the refile cache with C-0 C-c C-w" when I do a lot of > batch refiling. A faster, more reliable, caching would be certainly > welcome. > > In general, various parts of Org mode code base implement different > types of caches in parallel. I am aware at least about org-element, > org-scan-tags, org-agenda, org-refile, and org-goto. Probably Org mode > could benefit from unified caching mechanism? A good implementation > coming to my mind is org-ql [1]. It implements tag caches, outline path > caches, and can even be used to cache results of an arbitrary function > with point at heading. Basically, all (except org-element) types of > caches Org mode uses now are already implemented in org-ql in unified > way. > > WDYT? > > [1] https://github.com/alphapapa/org-ql > > Best, > Ihor > -- The Kafka Pandemic Please learn what misopathy is. https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html
On 29/04/2021 08:29, Ihor Radchenko wrote: > I do not observe the breakage as described in the first message, > mostly because I use refile cache exclusively for org-refile. Maybe I could avoid org-goto as well. Actual reason to use it was that it does not ask for file name as the first step in the case of (org-refile-use-outline-path 'file). It took enough time to me to realize how to jump/refile to non-leaf heading without such settings. > A faster, more reliable, caching would > be certainly welcome. Just an idea. Is it possible to implement some specific text property for heading lines, namely cleaned out heading text (no cookies, tags, hidden parts of links), that is updated after each editing (likely something like font locks)? It could significantly speed up scanning buffer for goto/refile targets. Unfortunately it would not help for files that have not opened yet. > I am aware at least about org-element, > org-scan-tags, org-agenda, org-refile, and org-goto. To be clear, org-refile and org-goto share the same cache and it is the source of the problem. > A good > implementation coming to my mind is org-ql [1]. I have not looked into the code yet, so I am not ready to discuss it. At least Adam should agree to submit patches to org and Bastien and Nicolas should agree to support it. On 29/04/2021 08:34, Samuel Wales wrote: > > would it be more useful if it automaticaly generated the cache instead > of telling you to runt he command to do so? I am surprised as well that cache is not just regenerated when org-refile detects that it is stale. Can it be that under certain circumstances it just causes delay and, even with updated cache, user action fails anyway? On 29/04/2021 07:50, Samuel Wales wrote: > i cannot reach the thread you point to, however. Actually namely your response inspired me to look closer at the implementation of collection of refile targets. Is there a particular form of Message-Id that is convenient for you? The date of initial message is 2021-03-02 (in UTC).
Samuel Wales <samologist@gmail.com> writes: > would it be more useful if it automaticaly generated the cache instead > of telling you to runt he command to do so? I think so. To be frank, I do not understand the reason why it is not done by default. > if a solid, perhaps unified, cache existed, would org-id use it too? Sure. Why not? I imagine such cache can store the following info: org files in system -> per-file cache -> per-heading cache -> ... Best, Ihor
Maxim Nikulin <manikulin@gmail.com> writes: > Maybe I could avoid org-goto as well. Actual reason to use it was that > it does not ask for file name as the first step in the case of > (org-refile-use-outline-path 'file). It took enough time to me to > realize how to jump/refile to non-leaf heading without such settings. Note that Org mode has integration with imenu. It is an alternative you can use to jump around current buffer. > Just an idea. Is it possible to implement some specific text property > for heading lines, namely cleaned out heading text (no cookies, tags, > hidden parts of links), that is updated after each editing (likely > something like font locks)? It could significantly speed up scanning > buffer for goto/refile targets. Unfortunately it would not help for > files that have not opened yet. There is org-outline-path-cache used by org-get-outline-cache. It avoids computing parent's outline path multiple times, which is already a great improvement. For the cleaned heading text, I do not think that re-calculating the heading text on each change is a good idea. It may degrade typing latency. Yet, an acceptable approach could be simply invalidating cache for the changed headings. Then, outline paths can be re-calculated on changed headings when needed. This should be an improvement compared to blanket (setq org-outline-path-cache nil) in org-refile-get-targets. > To be clear, org-refile and org-goto share the same cache and it is the > source of the problem. I would say that the problem is rather org-goto (ab)using org-refile incorrectly. I have the following piece of elisp using org-refile mechanism for headline selection: (let ((org-refile-history (append saved-id parent-project)) (org-refile-cache nil) (org-refile-target-verify-function #'org-att-id-skip-function)) (let ((prompt-ans (org-refile-get-location "Link to attachment from"))) (prog1 (org-id-get (seq-find #'markerp prompt-ans) 'create)))) org-refile-cache can be simply let-bound to nil (in my case) or alternative cache variable, if the alternative cache should persist. Best, Ihor
On 29/04/2021 21:12, Ihor Radchenko wrote: > There is org-outline-path-cache used by org-get-outline-cache. It avoids > computing parent's outline path multiple times, which is already a great > improvement. Curiously my experience is that avoiding this lazy cache with backtracking and maintaining custom structure during sequential scan of the buffer works several times faster. However it is appropriate time to populate the cache you mentioned. Unfortunately it is still necessary to cleanup heading text, and it consumes significant time. > org-refile-cache can be simply let-bound to nil (in my case) or > alternative cache variable, if the alternative cache should persist. Thank you for the idea. But I still hope that both function could use the same cache.
Maxim Nikulin <manikulin@gmail.com> writes: > Curiously my experience is that avoiding this lazy cache with > backtracking and maintaining custom structure during sequential scan of > the buffer works several times faster. My experience is exactly opposite. Or maybe I miss something. Can you elaborate? > ... However it is appropriate time to > populate the cache you mentioned. Unfortunately it is still necessary to > cleanup heading text, and it consumes significant time. Did you do any benchmarks? I just tried Outline path without cache: (benchmark-run 1 (goto-char (point-min)) (while (re-search-forward "^\\*+" nil t) (org-get-outline-path t nil))) => (6.051079914 1 0.2864724879999869) Outline path with cache: (benchmark-run 1 (goto-char (point-min)) (while (re-search-forward "^\\*+" nil t) (org-get-outline-path t nil))) => (1.658461165 0 0.0) Just cleanup heading text: (benchmark-run 1 (goto-char (point-min)) (while (re-search-forward "^\\*+" nil t) (let ((case-fold-search nil)) (looking-at org-complex-heading-regexp) (if (not (match-end 4)) "" ;; Remove statistics cookies. (org-trim (org-link-display-format (replace-regexp-in-string "\\[[0-9]+%\\]\\|\\[[0-9]+/[0-9]+\\]" "" (match-string-no-properties 4)))))))) => (0.013364877 0 0.0) Best, Ihor
On 29/04/2021 23:08, Ihor Radchenko wrote: > Maxim Nikulin writes: > >> Curiously my experience is that avoiding this lazy cache with >> backtracking and maintaining custom structure during sequential scan of >> the buffer works several times faster. > > My experience is exactly opposite. Or maybe I miss something. Can you > elaborate? My benchmarks may be incorrect due to for development version I did not compile files. I did not purged outline path cache as well. https://orgmode.org/list/s1qola$158l$1@ciao.gmane.io/ > Outline path without cache: > > (benchmark-run 1 > (goto-char (point-min)) > (while (re-search-forward "^\\*+" nil t) > (org-get-outline-path t nil))) => (6.051079914 1 0.2864724879999869) > > Outline path with cache: > > (benchmark-run 1 > (goto-char (point-min)) > (while (re-search-forward "^\\*+" nil t) > (org-get-outline-path t nil))) => (1.658461165 0 0.0) ^ t I suppose. I agree with such test. Notice however the following patch (warning: :level and :max-level a cached with the same key) https://orgmode.org/list/s209r8$16en$1@ciao.gmane.io/ Avoiding call to org-get-outline-path and using custom structure during single pass scan through the buffer allowed to significantly improve performance. > Just cleanup heading text: > > (benchmark-run 1 > (goto-char (point-min)) > (while (re-search-forward "^\\*+" nil t) > (let ((case-fold-search nil)) > (looking-at org-complex-heading-regexp) > (if (not (match-end 4)) "" > ;; Remove statistics cookies. > (org-trim > (org-link-display-format > (replace-regexp-in-string > "\\[[0-9]+%\\]\\|\\[[0-9]+/[0-9]+\\]" "" > (match-string-no-properties 4)))))))) => (0.013364877 0 0.0) I may be wrong with the following statement. Attempt to profile org-refile-get-targets could give quite different results. I have seen a note that Emacs use internally a cache for only 5 compiled regular expressions. Just one extra regexp and every matching function require compiling of its regexp just wiped from the cache. It is a time consuming procedure. I am unsure whether you added all regexps used (directly or through function calls) by inner loop of org-refile-get-targets.
Ihor Radchenko <yantar92@gmail.com> writes:
> Samuel Wales <samologist@gmail.com> writes:
>
>> would it be more useful if it automaticaly generated the cache instead
>> of telling you to runt he command to do so?
>
> I think so. To be frank, I do not understand the reason why it is not
> done by default.
>
>> if a solid, perhaps unified, cache existed, would org-id use it too?
>
> Sure. Why not? I imagine such cache can store the following info:
>
> org files in system -> per-file cache -> per-heading cache -> ...
>
I suspect the reason it is not done automatically is that getting that
right for all use cases is very hard to do without adding adverse impact
to performance. A cache which is marked as 'dirty' too often results in
too frequent cache refresh operations, but at the same time, determining
what changes in an org file actually invalidate the cache can be a
process intensive operation. Allowing the user to force cache refresh
when needed is likely a reasonable compromise.
I recall having a lot of trouble getting org-refile to work well for me.
I use it a lot, but it was so long ago, I don't recall how I got to my
final configuration (I think I may have modified my workflow to work
better with what I was able to get working reasonably reliably and
efficiently). I now tend to refile to a fairly static set of paths, so
all works OK.
Sorry I cannot provide anything more substantial. I do understand your
frustration, but not sure what the right fix is. I can see having a
cache which is automatically refreshed when necessary will be
problematic to get working for all use cases. Having the ability to turn
automatic refresh on/off and having the ability to manually force a
refresh will likely always be required.
--
Tim Cross
some users of org might not know that org-refile can go to the location without refiling anything if you pass a c-u arg. thus, org-goto is not the only header-selecting goto mechanism; there is also refile goto. [and imenu, helm, etc.] i use refile-goto and find it satisfying for my purposes. the selection mechanism is exactly the same as for org-refile.
On 29/04/2021 23:08, Ihor Radchenko wrote:
> Did you do any benchmarks? I just tried
>
> Outline path without cache:
I have expanded your tests to make them more close to org-get-outline-path
org-get-outline-path without cache
| 5.459114059 | 12 | 1.2053580009999987 |
org-get-outline-path with cache
| 2.166378543 | 5 | 0.5011187770000003 |
Custom code to track outline path during scan
| 0.718288205 | 4 | 0.4005034349999992 |
The same without (org-trim ...) stuff
| 0.208798040 | 0 | 0.0 |
From my point of view avoiding org-get-outline-path allows to
significantly improve performance. Removing cookies, etc. is far from
being negligible.
Test file is org-manual.org with 404 headings. Since it is not so much,
I set minimal CPU frequency.
Notice that outline cache is cleared before each scan to simulate the
case when org-goto is invoked just after emacs start or after
adding/removing some lines in the beginning of the file. Actually it is
a problem of outline cache that it easily became stale due to e.g. extra
line close to buffer start.
Details:
#+begin_example bash
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ;
do
echo powersave >"$i" ;
done
#+end_example
#+begin_src elisp
(byte-compile-file (expand-file-name "~/src/emacs/org-mode/lisp/org.el"))
(defun nm-count-headings (headings)
(let ((n 0))
(dolist (h headings)
(setq n (+ n (length h))))
(list (length headings) n)))
#+end_src
#+RESULTS:
: nm-count-headings
#+begin_src elisp
(defun ir-tst-scan (buffer use-cache)
(with-current-buffer buffer
(goto-char 1)
(let ((result nil))
(while (re-search-forward "^\\*+[ \t]" nil t)
(beginning-of-line)
(let ((case-fold-search nil))
(looking-at org-complex-heading-regexp))
(push (org-get-outline-path t use-cache) result)
(end-of-line))
result)))
(byte-compile 'ir-tst-scan)
(mapcar
(lambda (use-cache)
(let ((headings))
(append
(list use-cache)
(benchmark-run 10
(progn
(setq org-outline-path-cache nil)
(setq headings
(ir-tst-scan "org-manual.org" use-cache))))
(nm-count-headings headings))))
'(nil t))
#+end_src
#+RESULTS:
| nil | 5.459114059 | 12 | 1.2053580009999987 | 404 | 1069 |
| t | 2.166378543 | 5 | 0.5011187770000003 | 404 | 1069 |
| use-cache | benchmark | | | headings | outline
components |
#+begin_src elisp
(defun nm-tst-scan (buffer clean-headers)
(with-current-buffer buffer
(goto-char 1)
(let ((result nil))
(let ((outline-path))
(while (re-search-forward "^\\*+[ \t]" nil t)
(beginning-of-line)
(let ((case-fold-search nil))
(looking-at org-complex-heading-regexp))
(let ((heading (match-string-no-properties 4))
(heading-level (length (match-string-no-properties 1))))
(while (and outline-path (<= heading-level (caar outline-path)))
(pop outline-path))
(push (cons heading-level
(if (not heading)
""
(if (not clean-headers)
heading
(org-trim
(org-link-display-format
(replace-regexp-in-string
"\\[[0-9]+%\\]\\|\\[[0-9]+/[0-9]+\\]" ""
heading))))))
outline-path))
(end-of-line)
(push (mapcar #'cdr outline-path) result)))
result)))
(byte-compile 'nm-tst-scan)
(mapcar
(lambda (clean-headers)
(let ((headings))
(append
(list clean-headers)
(benchmark-run 10
(progn
(setq org-outline-path-cache nil)
(setq headings
(nm-tst-scan "org-manual.org" clean-headers))))
(nm-count-headings headings))))
'(nil t))
#+end_src
#+RESULTS:
| nil | 0.20879804000000002 | 0 | 0.0 | 404 | 1069 |
| t | 0.718288205 | 4 | 0.4005034349999992 | 404 | 1069 |
| clean-headers | benchmark | | | headings |
outline components |
On 29/04/2021 23:08, Ihor Radchenko wrote: > > My experience is exactly opposite. Or maybe I miss something. Can you > elaborate? Some additions. org-outline-path-cache is used solely by org-refile-get-targets (maybe there are some calls in other packages) but it efficiency is questionable. It was not clear for me earlier that the cache is reset before each scan through a buffer. So if org-refile-use is disabled, org-outline-path-cache from previous run of org-refile or org-goto is not used as well. A query to org-outline-path-cache requires at least one backward search and hash lookup. During sequential scan in org-refile-get-targets it is enough to have previous heading path to update it when new heading is found. I think, org-outline-path-cache should be deprecated. > Just cleanup heading text: I have realized what is wrong with this benchmark. It runs so fast because it matches no headings, so it never spent time for cleaning them up. > (benchmark-run 1 > (goto-char (point-min)) > (while (re-search-forward "^\\*+" nil t) > (let ((case-fold-search nil)) (beginning-of-line) should be added before next call and (end-of-line) somewhere later in while body. > (looking-at org-complex-heading-regexp) > (if (not (match-end 4)) "" > ;; Remove statistics cookies. > (org-trim > (org-link-display-format > (replace-regexp-in-string > "\\[[0-9]+%\\]\\|\\[[0-9]+/[0-9]+\\]" "" > (match-string-no-properties 4)))))))) => (0.013364877 0 0.0) On 29/04/2021 21:12, Ihor Radchenko wrote: > For the cleaned heading text, I do not think that re-calculating the > heading text on each change is a good idea. It may degrade typing > latency. Yet, an acceptable approach could be simply invalidating cache > for the changed headings. Then, outline paths can be re-calculated on > changed headings when needed. I agree that it is enough to invalidate cleaned heading on edit to refresh it in org-refile-get-targets. On the other hand, I still prefer text properties since they could be fetched even if some lines have been added or removed before. Position-based cache is useless in such cases. Concerning typing latency, it should be postponed and resumed when no new edits is performed for certain period of time (~1s). However I am unsure if it is possible to accurately track all affected lines since later changes can add/remove lines before the line scheduled for invalidation. On 30/04/2021 02:17, Tim Cross wrote: > I suspect the reason it is not done automatically is that getting that > right for all use cases is very hard to do without adding adverse impact > to performance. A cache which is marked as 'dirty' too often results in > too frequent cache refresh operations, but at the same time, determining > what changes in an org file actually invalidate the cache can be a > process intensive operation. I believed that idea risen in this thread was to regenerate cache instead of spitting "Please regenerate the refile cache with `C-0 C-c C-w'" leaving more tricky cases for the user.
Maxim Nikulin <manikulin@gmail.com> writes: > Some additions. org-outline-path-cache is used solely by > org-refile-get-targets (maybe there are some calls in other packages) > but it efficiency is questionable. It was not clear for me earlier that > the cache is reset before each scan through a buffer. So if > org-refile-use is disabled, org-outline-path-cache from previous run of > org-refile or org-goto is not used as well. A query to > org-outline-path-cache requires at least one backward search and hash > lookup. During sequential scan in org-refile-get-targets it is enough to > have previous heading path to update it when new heading is found. I > think, org-outline-path-cache should be deprecated. At least helm-org-ql and helm-org are using `org-get-outline-path' with cache. I do think it is useful. Though, indeed, not as fast as single-pass scan. Probably, we can rewrite the code of `org-get-outline-path' instead of deprecating it? >> Just cleanup heading text: > > I have realized what is wrong with this benchmark. It runs so fast > because it matches no headings, so it never spent time for cleaning them up. Oh. You are right. I tested with fixed benchmark and it is indeed much slower. > On 29/04/2021 21:12, Ihor Radchenko wrote: >> For the cleaned heading text, I do not think that re-calculating the >> heading text on each change is a good idea. It may degrade typing >> latency. Yet, an acceptable approach could be simply invalidating cache >> for the changed headings. Then, outline paths can be re-calculated on >> changed headings when needed. > > I agree that it is enough to invalidate cleaned heading on edit to > refresh it in org-refile-get-targets. On the other hand, I still prefer > text properties since they could be fetched even if some lines have been > added or removed before. Position-based cache is useless in such cases. Note that `org-refresh-category-properties' is using pretty much same idea. However, there is no automatic invalidation and one needs to run org-refresh-category-properties to ensure that cache is up to date. Alternatively, markers can be used here. They will not be invalidated by changes in buffer. > Concerning typing latency, it should be postponed and resumed when no > new edits is performed for certain period of time (~1s). However I am > unsure if it is possible to accurately track all affected lines since > later changes can add/remove lines before the line scheduled for > invalidation. As one option, we can accumulate the changed regions in some variable and process them on timer. As I remember, org-element-cache is already doing something similar. In fact, org-element-cache might be modified to handle outline-path as well. The problem is that org-element-cache has some difficult bugs. I tried using it multiple times and ran into issues I was not able to debug. I may work on this in future. Best, Ihor
Tim Cross <theophilusx@gmail.com> writes:
> I suspect the reason it is not done automatically is that getting that
> right for all use cases is very hard to do without adding adverse impact
> to performance. A cache which is marked as 'dirty' too often results in
> too frequent cache refresh operations, but at the same time, determining
> what changes in an org file actually invalidate the cache can be a
> process intensive operation. Allowing the user to force cache refresh
> when needed is likely a reasonable compromise.
Not really. `helm-org-ql' uses cache that is invalidated by _any_ change
in buffer + updating cache only when results are queried. Performance is
much better compared with built-in Org solutions.
Also, there is `org-element-use-cache' doing more granular cache
invalidation. You can enable it and test the performance. It is not
really degraded (except some hard-to-catch bugs existing in the
org-element-cache code).
Best,
Ihor
On 02/05/2021 13:59, Ihor Radchenko wrote: > Maxim Nikulin writes: > >> Some additions. org-outline-path-cache is used solely by >> org-refile-get-targets (maybe there are some calls in other packages) >> but it efficiency is questionable. It was not clear for me earlier that >> the cache is reset before each scan through a buffer. So if >> org-refile-use is disabled, org-outline-path-cache from previous run of >> org-refile or org-goto is not used as well. A query to >> org-outline-path-cache requires at least one backward search and hash >> lookup. During sequential scan in org-refile-get-targets it is enough to >> have previous heading path to update it when new heading is found. I >> think, org-outline-path-cache should be deprecated. > > At least helm-org-ql and helm-org are using `org-get-outline-path' with > cache. I do think it is useful. Though, indeed, not as fast as > single-pass scan. Probably, we can rewrite the code of > `org-get-outline-path' instead of deprecating it? helm-org-ql.el that is a part of org-ql does not use cache when calling org-get-outline-path. helm-org performs sequential scan similar to org-refile-get-targets. Maybe a new function should be introduced to address such case. On 29/04/2021 21:12, Ihor Radchenko wrote: > Note that Org mode has integration with imenu. It is an alternative you > can use to jump around current buffer. I do not see any advantages of imenu over jumping using org-goto or org-refile with enabled cache.
Maxim Nikulin <manikulin@gmail.com> writes:
> helm-org-ql.el that is a part of org-ql does not use cache when calling
> org-get-outline-path. helm-org performs sequential scan similar to
> org-refile-get-targets.
Hmm. You are right. But they could. I myself ran into an issue with
helm-org-ql exactly because formatting displayed headings was slow. If
org-refile-use-cache is deprecated, there would be no option to use
cache at all.
Best,
Ihor