From: Maxim Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: [PATCH] Use cache in org-up-heading-safe
Date: Sat, 8 May 2021 18:28:41 +0700 [thread overview]
Message-ID: <s75sla$1fm$1@ciao.gmane.io> (raw)
In-Reply-To: <87o8dn2t3a.fsf@localhost>
[-- Attachment #1: Type: text/plain, Size: 3054 bytes --]
On 07/05/2021 09:08, Ihor Radchenko wrote:
> Maxim Nikulin writes:
>> Did you just replace gethash by avl-tree?
>
> Yes
>
>> Likely my idea is based on a
>> wrong assumption. I hoped that having positions of headers it is
>> possible to avoid jumps (goto-char ...) preceded or followed by regexp
>> matching almost completely. Previous header for arbitrary initial
>> position can be found using binary search through structure obtained
>> during scan.
>
> Sorry, I cannot follow what you mean. The call to goto-char in
> org-up-heading-safe is required by function docstring - we need to move
> point to the parent heading.
I am trying to minimize number of regexp searches. Mostly it is applied
when information concerning multiple headings is required (agenda,
refile targets). It unlikely will get some benefits during interactive
calls related to single heading.
For a file having 3000 headings, scanning though it takes ~0.1sec to get
the following properties: position, level, list of parents (with same
properties). Note that no expensive operations are performed like
cleaning up of heading title.
Having list of headings (and its length), it is possible to build a tree
for binary search in linear time. It takes ~0.01sec.
Having the tree, it is possible to instantly find heading for
*arbitrary* position in the buffer. Likely the next operation is goto to
the heading or to it parent and parsing the line for more detailed
properties. The latter is cacheable, structure for heading properties
can be expanded.
Hash works only for fixed set of positions, to use hash it is necessary
to find position of the heading at first. On the other hand, to take
advantage of binary tree, more substantial modification of code is required.
Since there is no operations as insertion or deletion of nodes from
tree, no complex code is required to implement balancing rotations. That
is why I think that avl-tree is an overkill.
See the attachment for experimental (thus almost untested) code. Likely
you will find code style quite ugly. I am not happy with 0.1 sec for a
moderately large file. It is close to the limit for comfortable
interactive operations.
>>> + (re-search-backward
>>> + (format "^\\*\\{1,%d\\} " level-up) nil t)
>>> + (funcall outline-level))))
>>
>> Unsure concerning the following optimization from the point of
>> readability and reliability in respect to future modifications. Outline
>> level can be derived from the length of matched string without the
>> funcall requiring extra regexp.
>
> I am not sure here. outline-level also consults outline-heading-alist,
> though I found no references to that variable in Org mode code.
> Otherwise, outline-level already reuses match-data. There is no regexp
> matching there.
Sorry. You are right. The function does what I proposed to write
explicitly. For some reason I believed that outline-level calls
something like looking-at. Maybe I checked it earlier and completely forgot.
[-- Attachment #2: nm-btree.org --]
[-- Type: text/plain, Size: 4764 bytes --]
#+begin_src elisp :results none
;; Heading properties
(defun nm-heading-properties-new (position level parents)
"Heading properties: (position . (level . parent))"
(cons position (cons level parents)))
(defun nm-heading-properties-level (props)
(cadr props))
(defun nm-heading-properties-pos (props)
(car props))
(defun nm-heading-properties-parents (props)
(cddr props))
(defun nm-heading-pos-lessp (value props)
(< value (nm-heading-properties-pos props)))
(defun nm-buffer-headings-reversed (buffer)
(with-current-buffer buffer
(save-restriction
(save-excursion
(widen)
(goto-char (point-min))
(let ((count 0)
(headings ())
(parents ()))
(while (re-search-forward org-outline-regexp-bol nil t)
(let* ((pos (match-beginning 0))
(level (- (match-end 0) pos 1)))
(while (and parents (>= (nm-heading-properties-level (car parents)) level))
(pop parents))
(setq count (1+ count))
(let ((props (nm-heading-properties-new pos level parents)))
(push props headings)
(push props parents))))
(and headings (cons headings count)))))))
;; binary search tree
(defun nm-btree-new-node ()
"((left right) . properties"
(cons (cons nil nil) nil))
(defun nm-btree-node-left (node)
(caar node))
(defun nm-btree-node-set-left (node child)
(setcar (car node) child))
(defun nm-btree-node-set-right (node child)
(setcdr (car node) child))
(defun nm-btree-node-right (node)
(cdar node))
(defun nm-btree-node-properties (node)
(cdr node))
(defun nm-btree-node-set-properties (node properties)
(setcdr node properties))
(defun nm-btree-from-reversed (scan-result)
(and
scan-result
(let* ((key-properties-list (car scan-result))
(length (cdr scan-result))
(head (nm-btree-new-node))
(queue (list (cons length head)))) ; list of (count . node)
(while queue
(let* ((item (pop queue))
(count (car item))
(node (cdr item)))
(cond
((eq count 1) ; leaf or only single child
(nm-btree-node-set-properties node (pop key-properties-list)))
((nm-btree-node-right node) ; right children completed
(nm-btree-node-set-properties node (pop key-properties-list))
(let ((left-node (nm-btree-new-node)))
(nm-btree-node-set-left node left-node)
(push (cons (1- count) left-node) queue)))
(t (let* ((right-count (/ (car item) 2))
(right-node (nm-btree-new-node)))
(nm-btree-node-set-right node right-node)
(setcar item (- count right-count))
(push item queue)
(push (cons right-count right-node) queue))))))
head)))
(defun nm-btree-find-left (tree value &optional cmp)
"Find last element not less than value"
(let ((cmp (or cmp #'nm-heading-pos-lessp))
(result nil))
(while tree
(setq tree (if (funcall cmp value (nm-btree-node-properties tree))
(nm-btree-node-left tree)
(setq result tree)
(nm-btree-node-right tree))))
(nm-btree-node-properties result)))
#+end_src
#+begin_src elisp
(byte-compile #'nm-buffer-headings-reversed)
(byte-compile #'nm-btree-from-reversed)
(byte-compile #'nm-btree-find-left)
(let* ((buffer "notes.org")
(scan-result (nm-buffer-headings-reversed buffer))
(tree (nm-btree-from-reversed scan-result))
(lim (with-current-buffer buffer
(save-restriction
(widen)
(point-max)))))
(list
(append '("scan x10")
(benchmark-run 10
(nm-buffer-headings-reversed buffer)))
(append '("btree x10")
(benchmark-run 10
(nm-btree-from-reversed scan-result)))
(append '("scan+btree x10")
(benchmark-run 10
(let* ((scan-result1 (nm-buffer-headings-reversed buffer))
(tree1 (nm-btree-from-reversed scan-result1)))
tree1)))
(append '("find random x10 000")
(benchmark-run 10000
(nm-btree-find-left tree (random lim))))
(list "nodes" (cdr scan-result) "" "")))
#+end_src
#+RESULTS:
| scan x10 | 0.8611382689999999 | 0 | 0.0 |
| btree x10 | 0.07705962400000001 | 1 | 0.05648322199999978 |
| scan+btree x10 | 0.940467238 | 1 | 0.05685373699999996 |
| find random x10 000 | 0.047712096999999995 | 0 | 0.0 |
| nodes | 3413 | | |
Without ~byte-compile~
| scan x10 | 1.2031535999999998 | 4 | 0.22845906700000018 |
| btree x10 | 0.498214241 | 6 | 0.34067464299999894 |
| scan+btree x10 | 1.7026304230000002 | 10 | 0.5686926149999998 |
| find random x10 000 | 0.08789912700000001 | 0 | 0.0 |
| nodes | 3413 | | |
next prev parent reply other threads:[~2021-05-08 11:29 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-04 15:08 [PATCH] Use cache in org-up-heading-safe Ihor Radchenko
2021-05-05 16:40 ` Maxim Nikulin
2021-05-06 14:34 ` Ihor Radchenko
2021-05-06 17:02 ` Maxim Nikulin
2021-05-07 2:08 ` Ihor Radchenko
2021-05-08 11:28 ` Maxim Nikulin [this message]
2021-05-10 15:14 ` Ihor Radchenko
2021-05-15 11:58 ` Bastien
2021-05-16 6:15 ` Bastien
2021-05-16 6:36 ` Ihor Radchenko
2021-05-16 8:53 ` Bastien
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.orgmode.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='s75sla$1fm$1@ciao.gmane.io' \
--to=manikulin@gmail.com \
--cc=emacs-orgmode@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).