From: Ihor Radchenko <firstname.lastname@example.org> To: Maxim Nikulin <email@example.com> Cc: firstname.lastname@example.org Subject: Re: [PATCH] Use cache in org-up-heading-safe Date: Mon, 10 May 2021 23:14:14 +0800 [thread overview] Message-ID: <87eeeey62h.fsf@localhost> (raw) In-Reply-To: <email@example.com> [-- Attachment #1: Type: text/plain, Size: 3913 bytes --] Maxim Nikulin <firstname.lastname@example.org> writes: > I am trying to minimize number of regexp searches. Mostly it is applied > when information concerning multiple headings is required (agenda, > refile targets). It unlikely will get some benefits during interactive > calls related to single heading. > Having the tree, it is possible to instantly find heading for > *arbitrary* position in the buffer. Likely the next operation is goto to > the heading or to it parent and parsing the line for more detailed > properties. The latter is cacheable, structure for heading properties > can be expanded. Thanks for the explanation! I very much like this idea, though I do not think that it is going to be practical on actual large files. Your code runs for 0.25sec on my largest file: | scan x10 | 2.482179163 | 1 | 0.38996822200002157 | | btree x10 | 0.103329151 | 0 | 0.0 | | scan+btree x10 | 2.666915096 | 1 | 0.3657946569999808 | | find random x10 000 | 0.048668756 | 0 | 0.0 | | nodes | 16641 | | | > Since there is no operations as insertion or deletion of nodes from > tree, no complex code is required to implement balancing rotations. That > is why I think that avl-tree is an overkill. Yet, if we could keep the tree in sync with buffer modifications... And we can, using org-element-cache. It already uses AVL-tree and already syncs it automatically on buffer edits. The only problem is that it does not work with headlines. However, I wrote some code adding headline support to org-element-cache in https://github.com/yantar92/org. Some benchmarks: | scan x10 | 2.179845106 | 0 | 0.0 | | btree x10 | 0.474887597 | 1 | 0.364201286999986 | | scan+btree x10 | 2.318029902 | 0 | 0.0 | | scan using org-element-cache x10 | 2.8941826990000004 | 0 | 0.0 | | populating the cache x1 | 11.332366941 | 5 | 2.040159146999997 | | find random x10 000 | 0.053074866 | 0 | 0.0 | | find random in org-element-cache x10 000 | 0.007963375 | 0 | 0.0 | | nodes | 16641 | | | Scan through the file takes pretty much the same time with the btree approach. Sometimes more, sometimes less. And most of the time is really just re-search-forward. Find random showcases that avl-tree vs. btree really makes a difference on such large number of headings. On the worse side, initial population of the org-element-cache takes quite a bit of time. However, it is normally done asynchronously during idle time and does not really affect the user except the first loading (which might be further optimised if we store cache between Emacs sessions). Later, cache is updated automatically with buffer modifications. Moreover, the org-element-cache will already provide the parsed headlines, including titles (and tags and other useful staff). No need to match headline text with regexp. For me, the idea of storing headlines in cache looks promising. I can imagine many other Org functions using the cache after trivial modifications. At least org-get-tags, org-get-category, org-entry-get can be easily adapted to use the cache. What do you think? > See the attachment for experimental (thus almost untested) code. Likely > you will find code style quite ugly. I am not happy with 0.1 sec for a > moderately large file. It is close to the limit for comfortable > interactive operations. No worries about the style. It is testing anyway. I modified your code slightly for testing the org-element-cache. See the attached diff. Best, Ihor [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: diff-btree.diff --] [-- Type: text/x-diff, Size: 3548 bytes --] diff -u /tmp/nm-btree-1.org /tmp/nm-btree.org --- /tmp/nm-btree-1.org 2021-05-10 23:04:21.286395360 +0800 +++ /tmp/nm-btree.org 2021-05-10 23:02:17.586396326 +0800 @@ -1,3 +1,6 @@ +:PROPERTIES: +:ID: 0aba93db-491d-46f7-b952-e138ba179a12 +:END: #+begin_src elisp :results none @@ -38,6 +41,26 @@ (push props parents)))) (and headings (cons headings count))))))) +(defun nm-buffer-headings-reversed-cache (buffer) + (with-current-buffer buffer + (save-restriction + (save-excursion + (widen) + (goto-char (point-min)) + (let ((count 0) + (headings ()) + (parents ())) + (while (re-search-forward org-outline-regexp-bol nil t) + (let ((cached (org-element--cache-find (line-beginning-position)))) + (if (eq (org-element-property :begin cached) (line-beginning-position)) + (push cached headings) + (push (org-element-at-point) headings))) + + ;; (save-excursion + ;; (beginning-of-line) + ;; (org-element-at-point 'cached)) + )))))) + ;; binary search tree (defun nm-btree-new-node () @@ -103,6 +126,7 @@ #+begin_src elisp (byte-compile #'nm-buffer-headings-reversed) +(byte-compile #'nm-buffer-headings-reversed-cache) (byte-compile #'nm-btree-from-reversed) (byte-compile #'nm-btree-find-left) @@ -125,13 +149,44 @@ (let* ((scan-result1 (nm-buffer-headings-reversed buffer)) (tree1 (nm-btree-from-reversed scan-result1))) tree1))) + (append '("scan using org-element-cache x10") + (progn + (let ((org-element-cache-sync-duration 1000.0)) + (org-element-cache-reset) + (org-element--cache-buffer-stealthly)) + (benchmark-run 10 + (nm-buffer-headings-reversed-cache buffer)))) + (append '("populating the cache x1") + (benchmark-run 1 + (with-current-buffer buffer + ;; Disable async processing. + (let ((org-element-cache-sync-duration 1000.0)) + (org-element-cache-reset) + (org-element--cache-buffer-stealthly))))) (append '("find random x10 000") (benchmark-run 10000 (nm-btree-find-left tree (random lim)))) + (append '("find random in org-element-cache x10 000") + (benchmark-run 10000 + (org-element-lineage + (org-element--cache-find (random lim)) + '(headlines) t))) (list "nodes" (cdr scan-result) "" ""))) #+end_src #+RESULTS: +| scan x10 | 1.693757321 | 0 | 0.0 | +| btree x10 | 0.552682783 | 1 | 0.43690057000000593 | +| scan+btree x10 | 2.3174109880000002 | 0 | 0.0 | +| scan using org-element-cache x10 | 2.9140550789999997 | 0 | 0.0 | +| populating the cache x1 | 10.83575134 | 9 | 3.9872376689999953 | +| find random x10 000 | 0.061988949999999994 | 0 | 0.0 | +| find random in org-element-cache x10 000 | 0.003262685 | 0 | 0.0 | +| nodes | 16641 | | | + + + + | scan x10 | 0.8611382689999999 | 0 | 0.0 | | btree x10 | 0.07705962400000001 | 1 | 0.05648322199999978 | | scan+btree x10 | 0.940467238 | 1 | 0.05685373699999996 | Diff finished. Mon May 10 23:04:27 2021
next prev parent reply other threads:[~2021-05-10 15:11 UTC|newest] Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-05-04 15:08 Ihor Radchenko 2021-05-05 16:40 ` Maxim Nikulin 2021-05-06 14:34 ` Ihor Radchenko 2021-05-06 17:02 ` Maxim Nikulin 2021-05-07 2:08 ` Ihor Radchenko 2021-05-08 11:28 ` Maxim Nikulin 2021-05-10 15:14 ` Ihor Radchenko [this message] 2021-05-15 11:58 ` Bastien 2021-05-16 6:15 ` Bastien 2021-05-16 6:36 ` Ihor Radchenko 2021-05-16 8:53 ` Bastien
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: https://www.orgmode.org/ * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=87eeeey62h.fsf@localhost \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: [PATCH] Use cache in org-up-heading-safe' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://git.savannah.gnu.org/cgit/emacs/org-mode.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).