emacs-orgmode@gnu.org archives
From: Max Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Date: Wed, 2 Mar 2022 19:23:02 +0700	[thread overview]
Message-ID: <svnnj8$pl3$1@ciao.gmane.io> (raw)
In-Reply-To: <87y21wkdwu.fsf@localhost>

On 27/02/2022 13:43, Ihor Radchenko wrote:
> 
> Now, I did an extended profiling of what is happening using perf:
> 
>       6.20%   [.] buf_bytepos_to_charpos

Maybe I am interpreting these results wrongly, but that does not look 
like a bottleneck. Anyway, thank you very much for such efforts; 
however, it is unlikely that I will join the profiling effort in the 
near future.

> buf_bytepos_to_charpos contains the following loop:
> 
>    for (tail = BUF_MARKERS (b); tail; tail = tail->next)
>      {
>        CONSIDER (tail->bytepos, tail->charpos);
> 
>        /* If we are down to a range of 50 chars,
> 	 don't bother checking any other markers;
> 	 scan the intervening chars directly now.  */
>        if (best_above - bytepos < distance
>            || bytepos - best_below < distance)
> 	break;
>        else
>          distance += BYTECHAR_DISTANCE_INCREMENT;
>      }
> 
> I am not sure if I understand the code correctly, but that loop is
> clearly scaling performance with the number of markers

I may be terribly wrong, but it looks like an optimization attempt 
that may actually ruin performance. My guess is the following. Because 
of multibyte characters, a position in the buffer counted in 
characters may differ significantly from the index into the byte 
sequence. Since markers store both values, bytepos and charpos, they 
are used (when available) to narrow the initial estimate interval 
[0, buffer size) down to the nearest existing markers. The code below 
even creates temporary markers to make the next call of the function 
faster.
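
To illustrate what I mean, here is a rough sketch (not the actual 
Emacs code; the names are mine) of how one known bytepos+charpos pair 
can tighten the bracketing interval around the requested byte 
position, similar to what the CONSIDER macro appears to do:

#include <stddef.h>

/* Sketch only: a known (charpos, bytepos) pair.  */
struct pos_pair { ptrdiff_t charpos, bytepos; };

static void
consider (ptrdiff_t bytepos, struct pos_pair known,
          struct pos_pair *best_below, struct pos_pair *best_above)
{
  /* A pair at or below the target raises the lower bound,
     a pair at or above it lowers the upper bound.  */
  if (known.bytepos <= bytepos && known.bytepos > best_below->bytepos)
    *best_below = known;
  else if (known.bytepos >= bytepos && known.bytepos < best_above->bytepos)
    *best_above = known;
}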

It seems that buffers do not have any additional structure that 
tracks the size of spans both in bytes and in characters (I would not 
expect the in-memory representation of a whole buffer to be a single 
contiguous byte array). When there are no markers at all, the function 
has to iterate over each character and count its length.
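
Something like the following toy function is what I imagine by 
"iterate over each character and count its length", assuming a 
UTF-8-like multibyte encoding (the real code of course uses the 
internal Emacs representation and accessors):

#include <stddef.h>

/* Count characters between a known lower bound and the requested
   byte position by skipping continuation bytes.  Illustration only.  */
static ptrdiff_t
charpos_by_scanning (const unsigned char *text,
                     ptrdiff_t below_bytepos, ptrdiff_t below_charpos,
                     ptrdiff_t bytepos)
{
  ptrdiff_t charpos = below_charpos;
  for (ptrdiff_t i = below_bytepos; i < bytepos; i++)
    if ((text[i] & 0xC0) != 0x80)  /* not a continuation byte => new character */
      charpos++;
  return charpos;
}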

The problem is that when the buffer has a lot of markers far away 
from the position passed as an argument, iterating over the markers 
just consumes CPU time without significantly improving the original 
estimate of the boundaries.

If markers were organized in a tree, then the search would be much 
faster (at least for buffers with a lot of markers); see the sketch 
below.
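
Purely as an illustration of the idea (a sorted array standing in for 
a balanced tree, names invented by me), the nearest known pair below a 
position could then be found in O(log n) steps instead of visiting 
every marker:

#include <stddef.h>

struct pos_pair { ptrdiff_t charpos, bytepos; };

/* Binary search over pairs sorted by bytepos: return the rightmost
   pair not above BYTEPOS, or FALLBACK (e.g. the start of the buffer)
   if there is none.  */
static struct pos_pair
nearest_below (const struct pos_pair *sorted, size_t n,
               ptrdiff_t bytepos, struct pos_pair fallback)
{
  struct pos_pair best = fallback;
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (sorted[mid].bytepos <= bytepos)
        {
          best = sorted[mid];  /* candidate lower bound */
          lo = mid + 1;
        }
      else
        hi = mid;
    }
  return best;
}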

In some cases such a function could take a hint: the previously known 
bytepos+charpos pair.
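
Again just a sketch of what I mean by a hint (the caching scheme and 
the names are invented for illustration): remember the pair from the 
previous call and start scanning from it when it lies below the 
requested position, which is the common case during sequential access:

#include <stddef.h>

struct pos_pair { ptrdiff_t charpos, bytepos; };

static struct pos_pair last_result;  /* starts at {0, 0}, the buffer beginning */

static ptrdiff_t
bytepos_to_charpos_hinted (const unsigned char *text, ptrdiff_t bytepos)
{
  ptrdiff_t i = 0, charpos = 0;
  if (last_result.bytepos <= bytepos)  /* hint usable as a lower bound */
    {
      i = last_result.bytepos;
      charpos = last_result.charpos;
    }
  for (; i < bytepos; i++)
    if ((text[i] & 0xC0) != 0x80)  /* skip UTF-8 continuation bytes */
      charpos++;
  last_result.bytepos = bytepos;
  last_result.charpos = charpos;
  return charpos;
}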

I hope I have missed something, but from the code of 
buf_bytepos_to_charpos I would expect that it is necessary to iterate 
over all markers to update their positions after each typed character.

> Finally, FYI. I plan to work on an alternative mechanism to access Org
> headings - generic Org query library. It will not use markers and
> implement ideas from org-ql. org-refile will eventually use that generic
> library instead of current mechanism.

I suppose that markers might be implemented in an efficient way, and 
much better performance could be achieved when the low-level data 
structures are accessible. I have doubts about attempts to create 
something that resembles markers but is based purely on a high-level 
API.




Thread overview: 40+ messages
2022-02-21 21:06 profiling latency in large org-mode buffers (under both main & org-fold feature) Matt Price
2022-02-21 22:22 ` Samuel Wales
2022-02-22  5:33   ` Ihor Radchenko
2022-02-22  5:44     ` Kaushal Modi
     [not found]       ` <CAN_Dec8kW5hQoa0xr7sszafYJJNmGipX0DA94DKNh11DWjce8g@mail.gmail.com>
2022-02-23  2:41         ` Matt Price
2022-02-23  5:22           ` Ihor Radchenko
2022-02-23 14:47             ` Matt Price
2022-02-23 15:10               ` Ihor Radchenko
2022-02-22 21:11     ` Rudolf Adamkovič
2022-02-23 12:37       ` Org mode profiling meetup on Sat, Feb 26 (was: profiling latency in large org-mode buffers (under both main & org-fold feature)) Ihor Radchenko
2022-02-23 16:43         ` Kaushal Modi
2022-02-25 14:30         ` Ihor Radchenko
2022-02-26 12:04           ` Ihor Radchenko
2022-02-26 12:51             ` Ihor Radchenko
2022-02-26 15:51               ` Quiliro Ordóñez
2022-03-23 10:57                 ` #2 Org mode profiling meetup on Sat, Mar 26 (was: Org mode profiling meetup on Sat, Feb 26 (was: profiling latency in large org-mode buffers (under both main & org-fold feature))) Ihor Radchenko
2022-03-24 11:17                   ` Ihor Radchenko
2022-03-24 11:27                   ` Bruce D'Arcus
2022-03-24 13:43                     ` Matt Price
2022-03-24 13:49                     ` Ihor Radchenko
2022-03-26 11:59                   ` Ihor Radchenko
2022-03-27  8:14                     ` Ihor Radchenko
2022-04-21  8:05                   ` #3 Org mode profiling meetup on Sat, Apr 23 (was: #2 Org mode profiling meetup on Sat, Mar 26) Ihor Radchenko
2022-04-23 12:08                     ` Ihor Radchenko
2022-04-24  4:27                       ` Ihor Radchenko
2022-02-27  7:41               ` Org mode profiling meetup on Sat, Feb 26 (was: profiling latency in large org-mode buffers (under both main & org-fold feature)) Ihor Radchenko
2022-02-23 16:03     ` profiling latency in large org-mode buffers (under both main & org-fold feature) Max Nikulin
2022-02-23 16:35       ` Ihor Radchenko
2022-02-25 12:38         ` Max Nikulin
2022-02-26  7:45           ` Ihor Radchenko
2022-02-26 12:45             ` Max Nikulin
2022-02-27  6:43               ` Ihor Radchenko
2022-03-02 12:23                 ` Max Nikulin [this message]
2022-03-02 15:12                   ` Ihor Radchenko
2022-03-03 14:56                     ` Max Nikulin
2022-03-19  8:49                       ` Ihor Radchenko
2022-02-26 15:07     ` Jean Louis
2022-02-23  2:39   ` Matt Price
2022-02-23  5:25     ` Ihor Radchenko
2022-02-22  5:30 ` Ihor Radchenko
