From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp0 ([2001:41d0:2:bcc0::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms0.migadu.com with LMTPS id EJX5CyNNmWDEBAEAgWs5BA (envelope-from ) for ; Mon, 10 May 2021 17:11:31 +0200 Received: from aspmx1.migadu.com ([2001:41d0:2:bcc0::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp0 with LMTPS id ILq+ByNNmWA/XAAA1q6Kng (envelope-from ) for ; Mon, 10 May 2021 15:11:31 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 53E1F204A9 for ; Mon, 10 May 2021 17:11:30 +0200 (CEST) Received: from localhost ([::1]:48726 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lg7ZA-000848-G7 for larch@yhetil.org; Mon, 10 May 2021 11:11:28 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46412) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lg7Xb-0007Kq-A4 for emacs-orgmode@gnu.org; Mon, 10 May 2021 11:09:51 -0400 Received: from mail-lf1-x136.google.com ([2a00:1450:4864:20::136]:38611) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1lg7XZ-0001id-5M for emacs-orgmode@gnu.org; Mon, 10 May 2021 11:09:51 -0400 Received: by mail-lf1-x136.google.com with SMTP id r5so6675538lfr.5 for ; Mon, 10 May 2021 08:09:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:in-reply-to:references:cc:date:message-id :mime-version; bh=pj00JGKL384wgESXcSICKPZBvax6Vi3bVM+rKO9w+LE=; b=Qag2bgHNGzV/44j9pqf67K9FiGv46jl9jg5wxxEGVe9PDB3ODKyrBFLCbAzJO+Wf3b lIGuLfoT+73xAtt7BRWuBI9ZbnOlaRNdOKIckMgs8FgLsS4aXqBCvOCllBiBGKaTGkvq soEIIn8LF7YX5GcSbG2SeX5nXTsrcKwxxfdwxs7cRyhCL9cLrq5S6PWy7F21uKQQy36E kBop+qtxGh9EeORSccep3isgtgCvR6dmVcbP4A9RT7lLQFGzif3VTPu26hZ+NN+LulSP Rlyb5+lrlS0uKHJsCF27TM5f5CQgiI8E04Ds0z5htG2mE/oeo6Yp8yJGEbteDfvd0lJs t6Ug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:in-reply-to:references:cc:date :message-id:mime-version; bh=pj00JGKL384wgESXcSICKPZBvax6Vi3bVM+rKO9w+LE=; b=sfTmXOX6cWap598MRucXbYUTES3fcjKlxH4goKPfWuuJZYfyVhR999RHVoQ3TVA6oZ uihfhHk+RTlzOtjathhosEc7c6g0QBkqL/JGvKJjDNJiCIPiSQW/t8lTBOhdjQ8Qii52 X6A9vNr0ru2kqhGVKVnrJskTB8/0Zofwb3nW4F3vqZQT7AKIWLXbgiAhbQ/oKP0dzYad U5v8w4CqoXqiCSFA97EBf/9Sn8Rd4slcuHL3ANNGOtNlWLrDAgvwG9dP1AEZCOAXC54R ULZ5h18wGr96deExM7PPX3C+f04H11RGCW2aGWCkHn/oei2ln0496ZtDKMOJ3+paWjf4 o4Mg== X-Gm-Message-State: AOAM530e4fOzIPGHJJeXq/XgfPPSUw9XSDLTqQtCFjeQMB16r/1nv4Ed AVv2054C5BZG4QGaSPvrBok= X-Google-Smtp-Source: ABdhPJyJlgfcwwVQzjr1JGI/J0ExszLJ88kvlCV060NqSblxHjEMvN8vsDvy5O5bnfmFnYfc1eaSLQ== X-Received: by 2002:ac2:428e:: with SMTP id m14mr17039879lfh.478.1620659386723; Mon, 10 May 2021 08:09:46 -0700 (PDT) Received: from localhost ([91.210.107.197]) by smtp.gmail.com with ESMTPSA id i6sm2301349lfv.96.2021.05.10.08.09.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 May 2021 08:09:45 -0700 (PDT) From: Ihor Radchenko To: Maxim Nikulin Subject: Re: [PATCH] Use cache in org-up-heading-safe In-Reply-To: References: <874kfieduh.fsf@localhost> <87tung2aoq.fsf@localhost> <87o8dn2t3a.fsf@localhost> Date: Mon, 10 May 2021 23:14:14 +0800 Message-ID: <87eeeey62h.fsf@localhost> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Received-SPF: pass client-ip=2a00:1450:4864:20::136; envelope-from=yantar92@gmail.com; helo=mail-lf1-x136.google.com X-Spam_score_int: -17 X-Spam_score: -1.8 X-Spam_bar: - X-Spam_report: (-1.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-orgmode@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: emacs-orgmode@gnu.org Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org Sender: "Emacs-orgmode" X-Migadu-Flow: FLOW_IN ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1620659490; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references:list-id:list-help:list-unsubscribe: list-subscribe:list-post:dkim-signature; bh=pj00JGKL384wgESXcSICKPZBvax6Vi3bVM+rKO9w+LE=; b=Tn/vCSzAAJUecEOcEOkeMFfjklhTe9NkUlLjbfhrP+7RtrDxo95/3k5PWGavV75emsYAKc D0Md5YOMfzgi860GtpPLhGY+X8KUcV8uuSJRghloo1NjkMH1VH8qLVpwjcDsuldvyF1wBA f/2hhtY8af9cqA7TyeW7DDIBlMFyqfHHaPFWabjUqmcYamrzC9FEsxH7tAwr/6BgX8aSuG PU2OkoUT9tKeA3czlUSwlxWgtVufqGxtThrC45JKXrjacniIFEk5BkVE94uCkMrH7MCf2j QNW9DEIceDqAHbnRrTWaK9B+wqvyrKE1kyLCdmMy3s1twufBAPwvfMJ56Yk6PQ== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1620659490; a=rsa-sha256; cv=none; b=pRYIpFNyKvklRHXvZPQQesKzDPTCGBHUz7jEp2iIEkUYD1z4STKxBSErze09gOSsEDcYS0 ByL9UVna2bOJ9BI0t6oRT+JA8cFP4+Qq4GRTX4pklGkvoBs9tyZsIi/9PpyiewJe3CNmhz AoXFPz9/P5pWzclPBY7pTixhEEyd+TJx3BRQsb7LGcM0HedGqyj/QgzP4ULvxc4ANvCdRP Uk4h6aJOE3tm/U4k1xtM1LpqANX3dzbQNpGAt84NYIXdb+Xsl7hrvnVQgZ38O8LouNfz/Z rhCydXqlxEmxJTdS8DA7CmP+V1qSVXmKltdzr9k8S9EYjXohsmS/+BuvBMiTZQ== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=pass header.d=gmail.com header.s=20161025 header.b=Qag2bgHN; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Spam-Score: -2.65 Authentication-Results: aspmx1.migadu.com; dkim=pass header.d=gmail.com header.s=20161025 header.b=Qag2bgHN; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Queue-Id: 53E1F204A9 X-Spam-Score: -2.65 X-Migadu-Scanner: scn0.migadu.com X-TUID: flJJFQmRl01W --=-=-= Content-Type: text/plain Maxim Nikulin writes: > I am trying to minimize number of regexp searches. Mostly it is applied > when information concerning multiple headings is required (agenda, > refile targets). It unlikely will get some benefits during interactive > calls related to single heading. > Having the tree, it is possible to instantly find heading for > *arbitrary* position in the buffer. Likely the next operation is goto to > the heading or to it parent and parsing the line for more detailed > properties. The latter is cacheable, structure for heading properties > can be expanded. Thanks for the explanation! I very much like this idea, though I do not think that it is going to be practical on actual large files. Your code runs for 0.25sec on my largest file: | scan x10 | 2.482179163 | 1 | 0.38996822200002157 | | btree x10 | 0.103329151 | 0 | 0.0 | | scan+btree x10 | 2.666915096 | 1 | 0.3657946569999808 | | find random x10 000 | 0.048668756 | 0 | 0.0 | | nodes | 16641 | | | > Since there is no operations as insertion or deletion of nodes from > tree, no complex code is required to implement balancing rotations. That > is why I think that avl-tree is an overkill. Yet, if we could keep the tree in sync with buffer modifications... And we can, using org-element-cache. It already uses AVL-tree and already syncs it automatically on buffer edits. The only problem is that it does not work with headlines. However, I wrote some code adding headline support to org-element-cache in https://github.com/yantar92/org. Some benchmarks: | scan x10 | 2.179845106 | 0 | 0.0 | | btree x10 | 0.474887597 | 1 | 0.364201286999986 | | scan+btree x10 | 2.318029902 | 0 | 0.0 | | scan using org-element-cache x10 | 2.8941826990000004 | 0 | 0.0 | | populating the cache x1 | 11.332366941 | 5 | 2.040159146999997 | | find random x10 000 | 0.053074866 | 0 | 0.0 | | find random in org-element-cache x10 000 | 0.007963375 | 0 | 0.0 | | nodes | 16641 | | | Scan through the file takes pretty much the same time with the btree approach. Sometimes more, sometimes less. And most of the time is really just re-search-forward. Find random showcases that avl-tree vs. btree really makes a difference on such large number of headings. On the worse side, initial population of the org-element-cache takes quite a bit of time. However, it is normally done asynchronously during idle time and does not really affect the user except the first loading (which might be further optimised if we store cache between Emacs sessions). Later, cache is updated automatically with buffer modifications. Moreover, the org-element-cache will already provide the parsed headlines, including titles (and tags and other useful staff). No need to match headline text with regexp. For me, the idea of storing headlines in cache looks promising. I can imagine many other Org functions using the cache after trivial modifications. At least org-get-tags, org-get-category, org-entry-get can be easily adapted to use the cache. What do you think? > See the attachment for experimental (thus almost untested) code. Likely > you will find code style quite ugly. I am not happy with 0.1 sec for a > moderately large file. It is close to the limit for comfortable > interactive operations. No worries about the style. It is testing anyway. I modified your code slightly for testing the org-element-cache. See the attached diff. Best, Ihor --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=diff-btree.diff diff -u /tmp/nm-btree-1.org /tmp/nm-btree.org --- /tmp/nm-btree-1.org 2021-05-10 23:04:21.286395360 +0800 +++ /tmp/nm-btree.org 2021-05-10 23:02:17.586396326 +0800 @@ -1,3 +1,6 @@ +:PROPERTIES: +:ID: 0aba93db-491d-46f7-b952-e138ba179a12 +:END: #+begin_src elisp :results none @@ -38,6 +41,26 @@ (push props parents)))) (and headings (cons headings count))))))) +(defun nm-buffer-headings-reversed-cache (buffer) + (with-current-buffer buffer + (save-restriction + (save-excursion + (widen) + (goto-char (point-min)) + (let ((count 0) + (headings ()) + (parents ())) + (while (re-search-forward org-outline-regexp-bol nil t) + (let ((cached (org-element--cache-find (line-beginning-position)))) + (if (eq (org-element-property :begin cached) (line-beginning-position)) + (push cached headings) + (push (org-element-at-point) headings))) + + ;; (save-excursion + ;; (beginning-of-line) + ;; (org-element-at-point 'cached)) + )))))) + ;; binary search tree (defun nm-btree-new-node () @@ -103,6 +126,7 @@ #+begin_src elisp (byte-compile #'nm-buffer-headings-reversed) +(byte-compile #'nm-buffer-headings-reversed-cache) (byte-compile #'nm-btree-from-reversed) (byte-compile #'nm-btree-find-left) @@ -125,13 +149,44 @@ (let* ((scan-result1 (nm-buffer-headings-reversed buffer)) (tree1 (nm-btree-from-reversed scan-result1))) tree1))) + (append '("scan using org-element-cache x10") + (progn + (let ((org-element-cache-sync-duration 1000.0)) + (org-element-cache-reset) + (org-element--cache-buffer-stealthly)) + (benchmark-run 10 + (nm-buffer-headings-reversed-cache buffer)))) + (append '("populating the cache x1") + (benchmark-run 1 + (with-current-buffer buffer + ;; Disable async processing. + (let ((org-element-cache-sync-duration 1000.0)) + (org-element-cache-reset) + (org-element--cache-buffer-stealthly))))) (append '("find random x10 000") (benchmark-run 10000 (nm-btree-find-left tree (random lim)))) + (append '("find random in org-element-cache x10 000") + (benchmark-run 10000 + (org-element-lineage + (org-element--cache-find (random lim)) + '(headlines) t))) (list "nodes" (cdr scan-result) "" ""))) #+end_src #+RESULTS: +| scan x10 | 1.693757321 | 0 | 0.0 | +| btree x10 | 0.552682783 | 1 | 0.43690057000000593 | +| scan+btree x10 | 2.3174109880000002 | 0 | 0.0 | +| scan using org-element-cache x10 | 2.9140550789999997 | 0 | 0.0 | +| populating the cache x1 | 10.83575134 | 9 | 3.9872376689999953 | +| find random x10 000 | 0.061988949999999994 | 0 | 0.0 | +| find random in org-element-cache x10 000 | 0.003262685 | 0 | 0.0 | +| nodes | 16641 | | | + + + + | scan x10 | 0.8611382689999999 | 0 | 0.0 | | btree x10 | 0.07705962400000001 | 1 | 0.05648322199999978 | | scan+btree x10 | 0.940467238 | 1 | 0.05685373699999996 | Diff finished. Mon May 10 23:04:27 2021 --=-=-=--