emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: Ihor Radchenko <yantar92@gmail.com>
To: Nicolas Goaziou <mail@nicolasgoaziou.fr>
Cc: emacs-orgmode@gnu.org
Subject: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Date: Tue, 02 Jun 2020 17:25:18 +0800	[thread overview]
Message-ID: <87v9k93jlt.fsf@localhost> (raw)
In-Reply-To: <87367d4ydc.fsf@localhost>

Github link to the patch:
https://gist.github.com/yantar92/6447754415457927293acda43a7fcaef 


Ihor Radchenko <yantar92@gmail.com> writes:

> Hello,
>
> [The patch itself will be provided in the following email]
>
> I have three updates from the previous version of the patch:
>
> 1. I managed to implement buffer-local text properties.
>    Now, outline folding also uses text properties without a need to give
>    up independent folding in indirect buffers.
>
> 2. The code handling modifications in folded drawers/blocks was
>    rewritten. The new code uses after-change-functions to re-hide text
>    inserted in the middle of folded regions; and text properties to
>    unfold folded drawers/blocks if one changes BEGIN/END line.
>
> 3. [experimental] Started working on improving memory and cpu footprint
>    of the old code related to folding/unfolding. org-hide-drawer-all now
>    works significantly faster because I can utilise simplified drawer
>    parser, which require a lot less memory. Overall, I managed to reduce
>    Emacs memory footprint after loading all my agenda_files twice. The
>    loading is also noticeably faster.
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> More details on the buffer-local text properties:
>
> I have found char-property-alias-alist variable that controls how Emacs
> calculates text property value if the property is not set. This variable
> can be buffer-local, which allows independent 'invisible states in
> different buffers.
>
> All the implementation stays in
> org--get-buffer-local-text-property-symbol, which takes care about
> generating unique property name and mapping it to 'invisible (or any
> other) text property.
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> More details on the new implementation for tracking changes:
>
> I simplified the code as suggested, without using pairs of before- and
> after-change-functions.
>
> Handling text inserted into folded/invisible region is handled by a
> simple after-change function. After testing, it turned out that simple
> re-hiding text based on 'invisible property of the text before/after the
> inserted region works pretty well.
>
> Modifications to BEGIN/END line of the drawers and blocks is handled via
> 'modification-hooks + 'insert-behind-hooks text properties (there is no
> after-change-functions analogue for text properties in Emacs). The
> property is applied during folding and the modification-hook function is
> made aware about the drawer/block boundaries (via apply-partially
> passing element containing :begin :end markers for the current
> drawer/block). Passing the element boundary is important because the
> 'modification-hook will not directly know where it belongs to. Only the
> modified region (which can be larger than the drawer) is passed to the
> function. In the worst case, the region can be the whole buffer (if one
> runs revert-buffer).
>
> It turned out that adding 'modification-hook text property takes a
> significant cpu time (partially, because we need to take care about
> possible existing 'modification-hook value, see
> org--add-to-list-text-property). For now, I decided to not clear the
> modification hooks during unfolding because of poor performance.
> However, this approach would lead to partial unfolding in the following
> case:
>
> :asd:
> :drawer:
> lksjdfksdfjl
> sdfsdfsdf
> :end:
>
> If :asd: was inserted in front of folded :drawer:, changes in :drawer:
> line of the new folded :asd: drawer would reveal the text between
> :drawer: and :end:.
>
> Let me know what you think on this.
>
>> You shouldn't be bothered by the case you're describing here, for
>> multiple reasons.
>> 
>> First, this issue already arises in the current implementation. No one
>> bothered so far: this change is very unlikely to happen. If it becomes
>> an issue, we could make sure that `org-reveal' handles this.
>> 
>> But, more importantly, we actually /want it/ as a feature. Indeed, if
>> DRAWER is expanded every time ":BLAH:" is inserted above, then inserting
>> a drawer manually would unfold /all/ drawers in the section. The user is
>> more likely to write first ":BLAH:" (everything is unfolded) then
>> ":END:" than ":END:", then ":BLAH:".
>
> Agree. This allowed me to simplify the code significantly.
>
>> It seems you're getting it backwards. `before-change-functions' are the
>> functions being called with a possibly wide, imprecise, region to
>> handle:
>> 
>>     When that happens, the arguments to ‘before-change-functions’ will
>>     enclose a region in which the individual changes are made, but won’t
>>     necessarily be the minimal such region
>> 
>> however, after-change-functions calls are always minimal:
>> 
>>     and the arguments to each successive call of
>>     ‘after-change-functions’ will then delimit the part of text being
>>     changed exactly.
>> 
>> If you stick to `after-change-functions', there will be no such thing as
>> you describe.
>
> You are right here, I missed that before-change-functions are likely to
> be called on large regions. I thought that the regions are same for
> before/after-change-functions, but after-change-functions could be
> called more than 1 time. After second thought, your vision that it is
> mostly 0 or 1 times should be the majority of cases in practice.
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> More details on reducing cpu and memory footprint of org buffers:
>
> My simplified implementation of element boundary parser
> (org--get-element-region-at-point) appears to be much faster and also
> uses much less memory in comparison with org-element-at-point.
> Moreover, not all the places where org-element-at-point is called
> actually need the full parsed element. For example, org-hide-drawer-all,
> org-hide-drawer-toggle, org-hide-block-toggle, and
> org--hide-wrapper-toggle only need element type and some information
> about the element boundaries - the information we can get from
> org--get-element-region-at-point.
>
> The following version of org-hide-drawer-all seems to work much faster
> in comparison with original:
>
> (defun org-hide-drawer-all ()
>   "Fold all drawers in the current buffer."
>   (save-excursion
>     (goto-char (point-min))
>     (while (re-search-forward org-drawer-regexp nil t)
>       (when-let* ((drawer (org--get-element-region-at-point '(property-drawer drawer)))
> 		  (type (org-element-type drawer)))
> 	(org-hide-drawer-toggle t nil drawer)
> 	;; Make sure to skip drawer entirely or we might flag it
> 	;; another time when matching its ending line with
> 	;; `org-drawer-regexp'.
> 	(goto-char (org-element-property :end drawer))))))
>
> What do you think about the idea of making use of
> org--get-element-region-at-point in org code base?
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> Further work:
>
> 1. Look into other code using overlays. Specifically,
> org-toggle-custom-properties, Babel hashes, and narrowed table columns.
>
> Best,
> Ihor
>
> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>
>> Hello,
>>
>> Ihor Radchenko <yantar92@gmail.com> writes:
>>
>>> I have five updates from the previous version of the patch:
>>
>> Thank you.
>>
>>> 1. I implemented a simplified version of element parsing to detect
>>> changes in folded drawers or blocks. No computationally expensive calls
>>> of org-element-at-point or org-element-parse-buffer are needed now.
>>>
>>> 2. The patch is now compatible with master (commit 2e96dc639). I
>>> reverted the earlier change in folding drawers and blocks. Now, they are
>>> back to using 'org-hide-block and 'org-hide-drawer. Using 'outline would
>>> achieve nothing when we use text properties.
>>>
>>> 3. 'invisible text property can now be nested. This is important, for
>>> example, when text inside drawers contains fontified links (which also
>>> use 'invisible text property to hide parts of the link). Now, the old
>>> 'invisible spec is recovered after unfolding.
>>
>> Interesting. I'm running out of time, so I cannot properly inspect the
>> code right now. I'll try to do that before the end of the week.
>>
>>> 4. Some outline-* function calls in org referred to outline-flag-region
>>> implementation, which is not in sync with org-flag-region in this patch.
>>> I have implemented their org-* versions and replaced the calls
>>> throughout .el files. Actually, some org-* versions were already
>>> implemented in org, but not used for some reason (or not mentioned in
>>> the manual). I have updated the relevant sections of manual. These
>>> changes might be relevant to org independently of this feature branch.
>>
>> Yes, we certainly want to move to org-specific versions in all cases.
>>
>>> 5. I have managed to get a working version of outline folding via text
>>> properties. However, that approach has a big downside - folding state
>>> cannot be different in indirect buffer when we use text properties. I
>>> have seen packages relying on this feature of org and I do not see any
>>> obvious way to achieve different folding state in indirect buffer while
>>> using text properties for outline folding.
>>
>> Hmm. Good point. This is a serious issue to consider. Even if we don't
>> use text properties for outline, this also affects drawers and blocks.
>>
>>> For now, I still used before/after-change-functions combination.
>>
>> You shouldn't.
>>
>>> I see the following problems with using only after-change-functions: 
>>>
>>> 1. They are not guaranteed to be called after every single change:
>>
>> Of course they are! See below.
>>
>>> From (elisp) Change Hooks:
>>> "... some complex primitives call ‘before-change-functions’ once before
>>> making changes, and then call ‘after-change-functions’ zero or more
>>> times"
>>
>> "zero" means there are no changes at all, so, `after-change-functions'
>> are not called, which is expected.
>>
>>> The consequence of it is a possibility that region passed to the
>>> after-change-functions is quite big (including all the singular changes,
>>> even if they are distant). This region may contain changed drawers as
>>> well and unchanged drawers and needs to be parsed to determine which
>>> drawers need to be re-folded.
>>
>> It seems you're getting it backwards. `before-change-functions' are the
>> functions being called with a possibly wide, imprecise, region to
>> handle:
>>
>>     When that happens, the arguments to ‘before-change-functions’ will
>>     enclose a region in which the individual changes are made, but won’t
>>     necessarily be the minimal such region
>>
>> however, after-change-functions calls are always minimal:
>>
>>     and the arguments to each successive call of
>>     ‘after-change-functions’ will then delimit the part of text being
>>     changed exactly.
>>
>> If you stick to `after-change-functions', there will be no such thing as
>> you describe.
>>
>>>> And, more importantly, they are not meant to be used together, i.e., you
>>>> cannot assume that a single call to `before-change-functions' always
>>>> happens before calling `after-change-functions'. This can be tricky if
>>>> you want to use the former to pass information to the latter.
>>>
>>> The fact that before-change-functions can be called multiple times
>>> before after-change-functions, is trivially solved by using buffer-local
>>> changes register (see org--modified-elements).
>>
>> Famous last words. Been there, done that, and it failed.
>>
>> Let me quote the manual:
>>
>>     In general, we advise to use either before- or the after-change
>>     hooks, but not both.
>>
>> So, let me insist: don't do that. If you don't agree with me, let's at
>> least agree with Emacs developers.
>>
>>> The register is populated by before-change-functions and cleared by
>>> after-change-functions.
>>
>> You cannot expect `after-change-functions' to clear what
>> `before-change-functions' did. This is likely to introduce pernicious
>> bugs. Sorry if it sounds like FUD, but bugs in those areas are just
>> horrible to squash.
>>
>>>> Well, `before-change-fuctions' and `after-change-functions' are not
>>>> clean at all: you modify an unrelated part of the buffer, but still call
>>>> those to check if a drawer needs to be unfolded somewhere.
>>>
>>> 2. As you pointed, instead of global before-change-functions, we can use
>>> modification-hooks text property on sensitive parts of the
>>> drawers/blocks. This would work, but I am concerned about one annoying
>>> special case:
>>>
>>> -------------------------------------------------------------------------
>>> :BLAH: <inserted outside any of the existing drawers>
>>>
>>> <some text>
>>>
>>> :DRAWER: <folded>
>>> Donec at pede.
>>> :END:
>>> -------------------------------------------------------------------------
>>> In this example, the user would not be able to unfold the folder DRAWER
>>> because it will technically become a part of a new giant BLAH drawer.
>>> This may be especially annoying if <some text> is more than one screen
>>> long and there is no easy way to identify why unfolding does not work
>>> (with point at :DRAWER:).
>>
>> You shouldn't be bothered by the case you're describing here, for
>> multiple reasons.
>>
>> First, this issue already arises in the current implementation. No one
>> bothered so far: this change is very unlikely to happen. If it becomes
>> an issue, we could make sure that `org-reveal' handles this.
>>
>> But, more importantly, we actually /want it/ as a feature. Indeed, if
>> DRAWER is expanded every time ":BLAH:" is inserted above, then inserting
>> a drawer manually would unfold /all/ drawers in the section. The user is
>> more likely to write first ":BLAH:" (everything is unfolded) then
>> ":END:" than ":END:", then ":BLAH:".
>>
>>> Because of this scenario, limiting before-change-functions to folded
>>> drawers is not sufficient. Any change in text may need to trigger
>>> unfolding.
>>
>> after-change-functions is more appropriate than before-change-functions,
>> and local parsing, as explained in this thread, is more efficient than
>> re-inventing the parser.
>>
>>> In the patch, I always register possible modifications in the
>>> blocks/drawers intersecting with the modified region + a drawer/block
>>> right next to the region.
>>>
>>> -----------------------------------------------------------------------
>>> -----------------------------------------------------------------------
>>>
>>> More details on the nested 'invisible text property implementation.
>>>
>>> The idea is to keep 'invisible property stack push and popping from it
>>> as we add/remove 'invisible text property. All the work is done in
>>> org-flag-region.
>>
>> This sounds like a good idea.
>>
>>> This was originally intended for folding outlines via text properties.
>>> Since using text properties for folding outlines is not a good idea,
>>> nested text properties have much less use.
>>
>> AFAIU, they have. You mention link fontification, but there are other
>> pieces that we could switch to text properties instead of overlays,
>> e.g., Babel hashes, narrowed table columns…
>>
>>> 3. Multiple calls to before/after-change-functions is still a problem. I
>>> am looking into following ways to reduce this number:
>>>  - reduce the number of elements registered as potentially modified
>>>    + do not add duplicates to org--modified-elements
>>>    + do not add unfolded elements to org--modified-elements
>>>    + register after-change-function as post-command hook and remove it
>>>      from global after-change-functions. This way, it will be called
>>>      twice per command only.
>>>  - determine common region containing org--modified-elements. if change
>>>    is happening within that region, there is no need to parse
>>>    drawers/blocks there again.
>>
>> This is over-engineering. Again, please focus on local changes, as
>> discussed before.
>>
>>> Recipe to have different (org-element-at-point) and
>>> (org-element-parse-buffer 'element)
>>> -------------------------------------------------------------------------
>>> <point-min>
>>> :PROPERTIES:
>>> :CREATED:  [2020-05-23 Sat 02:32]
>>> :END:
>>>
>>>
>>> <point-max>
>>> -------------------------------------------------------------------------
>>
>> I didn't look at this situation in particular, but there are cases where
>> different :post-blank values are inevitable, for example at the end of
>> a section.
>>
>> Regards,
>>
>> -- 
>> Nicolas Goaziou
>
> -- 
> Ihor Radchenko,
> PhD,
> Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
> State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
> Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg


  parent reply	other threads:[~2020-06-02  9:30 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-24  6:55 Ihor Radchenko
2020-04-24  8:02 ` Nicolas Goaziou
2020-04-25  0:29   ` stardiviner
2020-04-26 16:04   ` Ihor Radchenko
2020-05-04 16:56     ` Karl Voit
2020-05-07  7:18       ` Karl Voit
2020-05-09 15:43       ` Ihor Radchenko
2020-05-07 11:04     ` Christian Heinrich
2020-05-09 15:46       ` Ihor Radchenko
2020-05-08 16:38     ` Nicolas Goaziou
2020-05-09 13:58       ` Nicolas Goaziou
2020-05-09 16:22         ` Ihor Radchenko
2020-05-09 17:21           ` Nicolas Goaziou
2020-05-10  5:25             ` Ihor Radchenko
2020-05-10  9:47               ` Nicolas Goaziou
2020-05-10 13:29                 ` Ihor Radchenko
2020-05-10 14:46                   ` Nicolas Goaziou
2020-05-10 16:21                     ` Ihor Radchenko
2020-05-10 16:38                       ` Nicolas Goaziou
2020-05-10 17:08                         ` Ihor Radchenko
2020-05-10 19:38                           ` Nicolas Goaziou
2020-05-09 15:40       ` Ihor Radchenko
2020-05-09 16:30         ` Ihor Radchenko
2020-05-09 17:32           ` Nicolas Goaziou
2020-05-09 18:06             ` Ihor Radchenko
2020-05-10 14:59               ` Nicolas Goaziou
2020-05-10 15:15                 ` Kyle Meyer
2020-05-10 16:30                 ` Ihor Radchenko
2020-05-10 19:32                   ` Nicolas Goaziou
2020-05-12 10:03                     ` Nicolas Goaziou
2020-05-17 15:00                     ` Ihor Radchenko
2020-05-17 15:40                       ` Ihor Radchenko
2020-05-18 14:35                         ` Nicolas Goaziou
2020-05-18 16:52                           ` Ihor Radchenko
2020-05-19 13:07                             ` Nicolas Goaziou
2020-05-23 13:52                               ` Ihor Radchenko
2020-05-23 13:53                                 ` Ihor Radchenko
2020-05-23 15:26                                   ` Ihor Radchenko
2020-05-26  8:33                                 ` Nicolas Goaziou
2020-06-02  9:21                                   ` Ihor Radchenko
2020-06-02  9:23                                     ` Ihor Radchenko
2020-06-02 12:10                                       ` Bastien
2020-06-02 13:12                                         ` Ihor Radchenko
2020-06-02 13:23                                           ` Bastien
2020-06-02 13:30                                             ` Ihor Radchenko
2020-06-02  9:25                                     ` Ihor Radchenko [this message]
2020-06-05  7:26                                     ` Nicolas Goaziou
2020-06-05  8:18                                       ` Ihor Radchenko
2020-06-05 13:50                                         ` Nicolas Goaziou
2020-06-08  5:05                                           ` Ihor Radchenko
2020-06-08  5:06                                             ` Ihor Radchenko
2020-06-08  5:08                                             ` Ihor Radchenko
2020-06-10 17:14                                             ` Nicolas Goaziou
2020-06-21  9:52                                               ` Ihor Radchenko
2020-06-21 15:01                                                 ` Nicolas Goaziou
2020-08-11  6:45                                               ` Ihor Radchenko
2020-08-11 23:07                                                 ` Kyle Meyer
2020-08-12  6:29                                                   ` Ihor Radchenko
2020-09-20  5:53                                                     ` Ihor Radchenko
2020-09-20 11:45                                                       ` Kévin Le Gouguec
2020-09-22  9:05                                                         ` Ihor Radchenko
2020-09-22 10:00                                                           ` Ihor Radchenko
2020-09-23  6:16                                                             ` Kévin Le Gouguec
2020-09-23  6:48                                                               ` Ihor Radchenko
2020-09-23  7:09                                                                 ` Bastien
2020-09-23  7:30                                                                   ` Ihor Radchenko
2020-09-24 18:07                                                                 ` Kévin Le Gouguec
2020-09-25  2:16                                                                   ` Ihor Radchenko
2020-12-15 17:38                                                                     ` [9.4] Fixing logbook visibility during isearch Kévin Le Gouguec
2020-12-16  3:15                                                                       ` Ihor Radchenko
2020-12-16 18:05                                                                         ` Kévin Le Gouguec
2020-12-17  3:18                                                                           ` Ihor Radchenko
2020-12-17 14:50                                                                             ` Kévin Le Gouguec
2020-12-18  2:23                                                                               ` Ihor Radchenko
2020-12-24 23:37                                                                                 ` Kévin Le Gouguec
2020-12-25  2:51                                                                                   ` Ihor Radchenko
2020-12-25 10:59                                                                                     ` Kévin Le Gouguec
2020-12-25 12:32                                                                                       ` Ihor Radchenko
2020-12-25 21:35                                                                                     ` Kévin Le Gouguec
2020-12-26  4:14                                                                                       ` Ihor Radchenko
2020-12-26 11:44                                                                                         ` Kévin Le Gouguec
2020-12-26 12:22                                                                                           ` Ihor Radchenko
2020-12-04  5:58                                                       ` [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers Ihor Radchenko
2021-03-21  9:09                                                         ` Ihor Radchenko
2021-05-03 17:28                                                           ` Bastien
2021-09-21 13:32                                                             ` Timothy
2021-10-26 17:25                                                               ` Matt Price
2021-10-27  6:27                                                                 ` Ihor Radchenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v9k93jlt.fsf@localhost \
    --to=yantar92@gmail.com \
    --cc=emacs-orgmode@gnu.org \
    --cc=mail@nicolasgoaziou.fr \
    --subject='Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).