From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adam Porter
Subject: Re: How to make agenda generation faster
Date: Fri, 19 Oct 2018 21:12:01 -0500
Message-ID:
References: <87h8hy1ho5.fsf@mbork.pl> <87o9c34ocl.fsf@alphapapa.net>
 <877eirdrqt.fsf@nicolasgoaziou.fr> <87ftx5fx3n.fsf@alphapapa.net>
 <87pnw8engh.fsf@nicolasgoaziou.fr> <87h8hkqtfb.fsf@alphapapa.net>
 <87r2gm6fdp.fsf@nicolasgoaziou.fr>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000b05a8c05789f8b2a"
Return-path:
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46067) by lists.gnu.org
 with esmtp (Exim 4.71) (envelope-from ) id 1gDgkH-0008HV-Fp
 for emacs-orgmode@gnu.org; Fri, 19 Oct 2018 22:12:06 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1gDgkG-0005ey-4I
 for emacs-orgmode@gnu.org; Fri, 19 Oct 2018 22:12:05 -0400
Received: from mail-ed1-x534.google.com ([2a00:1450:4864:20::534]:40897)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71)
 (envelope-from ) id 1gDgkF-0005dz-Mx
 for emacs-orgmode@gnu.org; Fri, 19 Oct 2018 22:12:03 -0400
Received: by mail-ed1-x534.google.com with SMTP id r1-v6so33052352edd.7
 for ; Fri, 19 Oct 2018 19:12:03 -0700 (PDT)
In-Reply-To: <87r2gm6fdp.fsf@nicolasgoaziou.fr>
List-Id: "General discussions about Org-mode."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Sender: "Emacs-orgmode"
Cc: emacs-orgmode@gnu.org

--000000000000b05a8c05789f8b2a
Content-Type: text/plain; charset="UTF-8"

On Oct 18, 2018 5:48 PM, "Nicolas Goaziou" wrote:

> Are you saying that queries are turned into regexp searches within Org files? If so, I don't think they should.

Yes, because this is the fastest way to search for matching entries in a buffer, when it's possible to use a regexp search.

> Queries should only operate on the output of the data extraction, possibly a list of defstructs. I.e., you first extract all meaningful data from the document (during idle time, with cache, or whatever optimization would be chosen), store it in an appropriate format, then query it.
>
> WDYT?

That would be ideal. The problem I foresee is that, when a buffer's cache is not up-to-date, and the user runs an agenda query, the user will have to wait for the buffer to be parsed and cached, which is much slower than a regexp search through the buffer.

That was what I first tried with org-agenda-ng: I parsed the whole buffer with org-element and ran predicates against the element tree. It was much too slow to be practical, so I switched to the current approach, which runs predicates against each node, only checking the necessary metadata. It's fast enough to be useful, but can still be slow in some cases, and I don't think it would be fast enough as a replacement for the current agenda code. But with further optimization, like using whole-buffer regexp searches when possible, it might be.

Another idea I've had, similar to yours, would be to pre-process buffers, adding metadata as text-properties on heading lines. However, I haven't tested it, and I don't know what the performance would be like. And it would still suffer from the caching problem I mentioned.

I think the fundamental problems are 1) keeping the cache in sync with the raw buffer, and 2) the slow speed of parsing an entire buffer's metadata at once (depending on the size of the files, of course, but mine are big enough to be slow, and I'm sure many users have larger ones).

Of course, maybe someone cleverer than me can figure out a clever solution to these problems. :)

--000000000000b05a8c05789f8b2a
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


--000000000000b05a8c05789f8b2a--
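
Editorial illustration (not part of the original message): the two strategies weighed in this message can be contrasted with a small sketch. It is written in Python purely as a language-neutral stand-in; it is not Emacs Lisp, and it is not the code of org-agenda-ng or of Org's agenda. Every name in it (`regexp_query`, `parse_all`, `CachedBuffer`, `SAMPLE`) is hypothetical, and the heading grammar is deliberately simplified to stars, an optional TODO/DONE keyword, and a title. The tick counter stands in for whatever change-tracking a real implementation would use; in Emacs one candidate might be `buffer-chars-modified-tick`, but that pairing is an assumption made here, not something the thread proposes.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Simplified Org-style heading: stars, optional keyword, title.
# This grammar is an assumption for illustration, far smaller than Org's own.
HEADING_RE = re.compile(r"^(\*+) +(?:(TODO|DONE) +)?(.*)$", re.MULTILINE)

SAMPLE = """\
* TODO Write report
Some body text.
** Notes
*** TODO Buy milk
* DONE File taxes
"""


@dataclass
class Heading:
    level: int
    keyword: Optional[str]
    title: str


def regexp_query(text: str, keyword: str) -> List[str]:
    """Strategy 1: one regexp scan that lands only on matching headings."""
    pattern = re.compile(
        r"^\*+ +" + re.escape(keyword) + r" +(.*)$", re.MULTILINE
    )
    return [m.group(1) for m in pattern.finditer(text)]


def parse_all(text: str) -> List[Heading]:
    """Strategy 2, step 1: extract every heading into a struct up front."""
    return [
        Heading(len(m.group(1)), m.group(2), m.group(3))
        for m in HEADING_RE.finditer(text)
    ]


class CachedBuffer:
    """Hypothetical buffer wrapper whose parse cache is keyed on an edit tick."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.tick = 0      # bumped on every edit
        self.parses = 0    # counts full re-parses, to make the cost visible
        self._cache: Optional[Tuple[int, List[Heading]]] = None

    def edit(self, new_text: str) -> None:
        """Replace the contents; the cached parse is now stale."""
        self.text = new_text
        self.tick += 1

    def headings(self) -> List[Heading]:
        """Return cached structs, re-parsing the whole text if it changed."""
        if self._cache is None or self._cache[0] != self.tick:
            self.parses += 1
            self._cache = (self.tick, parse_all(self.text))
        return self._cache[1]

    def query(self, keyword: str) -> List[str]:
        """Strategy 2, step 2: query the extracted structs, not the text."""
        return [h.title for h in self.headings() if h.keyword == keyword]
```

With this sample, `regexp_query(SAMPLE, "TODO")` and `CachedBuffer(SAMPLE).query("TODO")` both return `["Write report", "Buy milk"]`. The difference is where the cost falls: the regexp scan never builds structs for headings it does not need, while the cached variant answers repeated queries without rescanning but pays for a full re-parse on the first query after any edit, which is the stale-cache wait described in the message.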