emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: Maxim Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: [PATCH] Use cache in org-up-heading-safe
Date: Sat, 8 May 2021 18:28:41 +0700	[thread overview]
Message-ID: <s75sla$1fm$1@ciao.gmane.io> (raw)
In-Reply-To: <87o8dn2t3a.fsf@localhost>

[-- Attachment #1: Type: text/plain, Size: 3054 bytes --]

On 07/05/2021 09:08, Ihor Radchenko wrote:
> Maxim Nikulin writes:
>> Did you just replace gethash by avl-tree?
> 
> Yes
> 
>> Likely my idea is based on a
>> wrong assumption. I hoped that having positions of headers it is
>> possible to avoid jumps (goto-char ...) preceded or followed by regexp
>> matching almost completely. Previous header for arbitrary initial
>> position can be found using binary search through structure obtained
>> during scan.
> 
> Sorry, I cannot follow what you mean. The call to goto-char in
> org-up-heading-safe is required by function docstring - we need to move
> point to the parent heading.

I am trying to minimize number of regexp searches. Mostly it is applied 
when information concerning multiple headings is required (agenda, 
refile targets). It unlikely will get some benefits during interactive 
calls related to single heading.

For a file having 3000 headings, scanning though it takes ~0.1sec to get 
the following properties: position, level, list of parents (with same 
properties). Note that no expensive operations are performed like 
cleaning up of heading title.

Having list of headings (and its length), it is possible to build a tree 
for binary search in linear time. It takes ~0.01sec.

Having the tree, it is possible to instantly find heading for 
*arbitrary* position in the buffer. Likely the next operation is goto to 
the heading or to it parent and parsing the line for more detailed 
properties. The latter is cacheable, structure for heading properties 
can be expanded.

Hash works only for fixed set of positions, to use hash it is necessary 
to find position of the heading at first. On the other hand, to take 
advantage of binary tree, more substantial modification of code is required.

Since there is no operations as insertion or deletion of nodes from 
tree, no complex code is required to implement balancing rotations. That 
is why I think that avl-tree is an overkill.

See the attachment for experimental (thus almost untested) code. Likely 
you will find code style quite ugly. I am not happy with 0.1 sec for a 
moderately large file. It is close to the limit for comfortable 
interactive operations.

>>> +	                    (re-search-backward
>>> +                             (format "^\\*\\{1,%d\\} " level-up) nil t)
>>> +	                    (funcall outline-level))))
>>
>> Unsure concerning the following optimization from the point of
>> readability and reliability in respect to future modifications. Outline
>> level can be derived from the length of matched string without the
>> funcall requiring extra regexp.
> 
> I am not sure here. outline-level also consults outline-heading-alist,
> though I found no references to that variable in Org mode code.
> Otherwise, outline-level already reuses match-data. There is no regexp
> matching there.

Sorry. You are right. The function does what I proposed to write 
explicitly. For some reason I believed that outline-level calls 
something like looking-at. Maybe I checked it earlier and completely forgot.


[-- Attachment #2: nm-btree.org --]
[-- Type: text/plain, Size: 4764 bytes --]


#+begin_src elisp :results none

  ;; Heading properties
  (defun nm-heading-properties-new (position level parents)
    "Heading properties: (position . (level . parent))"
    (cons position (cons level parents)))

  (defun nm-heading-properties-level (props)
    (cadr props))

  (defun nm-heading-properties-pos (props)
    (car props))

  (defun nm-heading-properties-parents (props)
    (cddr props))

  (defun nm-heading-pos-lessp (value props)
    (< value (nm-heading-properties-pos props)))

  (defun nm-buffer-headings-reversed (buffer)
    (with-current-buffer buffer
      (save-restriction
	(save-excursion
	  (widen)
	  (goto-char (point-min))
	  (let ((count 0)
		(headings ())
		(parents ()))
	    (while (re-search-forward org-outline-regexp-bol nil t)
	      (let* ((pos (match-beginning 0))
		     (level (- (match-end 0) pos 1)))
		(while (and parents (>= (nm-heading-properties-level (car parents)) level))
		  (pop parents))
		(setq count (1+ count))
		(let ((props (nm-heading-properties-new pos level parents)))
		  (push props headings)
		  (push props parents))))
	    (and headings (cons headings count)))))))


  ;; binary search tree
  (defun nm-btree-new-node ()
    "((left right) . properties"
    (cons (cons nil nil) nil))

  (defun nm-btree-node-left (node)
    (caar node))

  (defun nm-btree-node-set-left (node child)
    (setcar (car node) child))

  (defun nm-btree-node-set-right (node child)
    (setcdr (car node) child))

  (defun nm-btree-node-right (node)
    (cdar node))

  (defun nm-btree-node-properties (node)
    (cdr node))

  (defun nm-btree-node-set-properties (node properties)
    (setcdr node properties))

  (defun nm-btree-from-reversed (scan-result)
    (and
     scan-result
     (let* ((key-properties-list (car scan-result))
	    (length (cdr scan-result))
	    (head (nm-btree-new-node))
	    (queue (list (cons length head)))) ; list of (count . node)
       (while queue
	 (let* ((item (pop queue))
		(count (car item))
		(node (cdr item)))
	   (cond
	    ((eq count 1) ; leaf or only single child
	     (nm-btree-node-set-properties node (pop key-properties-list)))
	    ((nm-btree-node-right node) ; right children completed
	     (nm-btree-node-set-properties node (pop key-properties-list))
	     (let ((left-node (nm-btree-new-node)))
	       (nm-btree-node-set-left node left-node)
	       (push (cons (1- count) left-node) queue)))
	    (t (let* ((right-count (/ (car item) 2))
		      (right-node (nm-btree-new-node)))
		 (nm-btree-node-set-right node right-node)
		 (setcar item (- count right-count))
		 (push item queue)
		 (push (cons right-count right-node) queue))))))
       head)))

  (defun nm-btree-find-left (tree value &optional cmp)
    "Find last element not less than value"
    (let ((cmp (or cmp #'nm-heading-pos-lessp))
	  (result nil))
      (while tree
	(setq tree (if (funcall cmp value (nm-btree-node-properties tree))
		       (nm-btree-node-left tree)
		     (setq result tree)
		     (nm-btree-node-right tree))))
      (nm-btree-node-properties result)))
#+end_src

#+begin_src elisp
  (byte-compile #'nm-buffer-headings-reversed)
  (byte-compile #'nm-btree-from-reversed)
  (byte-compile #'nm-btree-find-left)

  (let* ((buffer "notes.org")
	 (scan-result (nm-buffer-headings-reversed buffer))
	 (tree (nm-btree-from-reversed scan-result))
	 (lim (with-current-buffer buffer
		(save-restriction
		  (widen)
		  (point-max)))))
    (list
     (append '("scan x10")
	     (benchmark-run 10
	       (nm-buffer-headings-reversed buffer)))
     (append '("btree x10")
	     (benchmark-run 10
	       (nm-btree-from-reversed scan-result)))
     (append '("scan+btree x10")
	     (benchmark-run 10
	       (let* ((scan-result1 (nm-buffer-headings-reversed buffer))
		      (tree1 (nm-btree-from-reversed scan-result1)))
		 tree1)))
     (append '("find random x10 000")
	     (benchmark-run 10000
	       (nm-btree-find-left tree (random lim))))
     (list "nodes" (cdr scan-result) "" "")))
#+end_src

#+RESULTS:
| scan x10            |   0.8611382689999999 | 0 |                 0.0 |
| btree x10           |  0.07705962400000001 | 1 | 0.05648322199999978 |
| scan+btree x10      |          0.940467238 | 1 | 0.05685373699999996 |
| find random x10 000 | 0.047712096999999995 | 0 |                 0.0 |
| nodes               |                 3413 |   |                     |

Without ~byte-compile~
| scan x10            |  1.2031535999999998 |  4 | 0.22845906700000018 |
| btree x10           |         0.498214241 |  6 | 0.34067464299999894 |
| scan+btree x10      |  1.7026304230000002 | 10 |  0.5686926149999998 |
| find random x10 000 | 0.08789912700000001 |  0 |                 0.0 |
| nodes               |                3413 |    |                     |

  reply	other threads:[~2021-05-08 11:29 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-04 15:08 Ihor Radchenko
2021-05-05 16:40 ` Maxim Nikulin
2021-05-06 14:34   ` Ihor Radchenko
2021-05-06 17:02     ` Maxim Nikulin
2021-05-07  2:08       ` Ihor Radchenko
2021-05-08 11:28         ` Maxim Nikulin [this message]
2021-05-10 15:14           ` Ihor Radchenko
2021-05-15 11:58         ` Bastien
2021-05-16  6:15 ` Bastien
2021-05-16  6:36   ` Ihor Radchenko
2021-05-16  8:53     ` Bastien

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='s75sla$1fm$1@ciao.gmane.io' \
    --to=manikulin@gmail.com \
    --cc=emacs-orgmode@gnu.org \
    --subject='Re: [PATCH] Use cache in org-up-heading-safe' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).