From: Nicolas Goaziou
Subject: Re: Exporting large documents
Date: Mon, 06 May 2013 21:17:50 +0200
Message-ID: <87haifex41.fsf@gmail.com>
In-Reply-To: <87vc6wuf0t.fsf@Rainer.invalid> (Achim Gratz's message of "Mon, 06 May 2013 20:41:54 +0200")
To: Achim Gratz
Cc: emacs-orgmode@gnu.org

Hello,

Achim Gratz writes:

> Lawrence Mitchell writes:
>> org-element--current-element takes (on my machine) 0.0003 seconds per
>> call. However, when exporting 128x the orgmanual introduction, it's
>> called around 250000 times giving ~ 80 seconds total time (out of ~200
>> total).
>
> I've traced this a bit and the question does warrant further
> investigation. Exporting the introduction without any duplications
> already shows some interesting things: the property drawer for the
> introduction is scanned a whopping 137 times, followed by 134 times the
> cindex entry following it, followed by 125 times the "Summary" headline.
> The header options feature prominently with around 100 scans each as
> well.
>
> The rest of the calls have mostly just a single invocation, but there
> are some instances where parts of the tree are traversed multiple times
> in succession to apparently adjust the :end property to the leaf element
> in small increments or decrements. If elements are mutable during
> parsing then caching is more difficult as well, obviously.
>
>> So it sort of feels like actually what is needed is microoptimisations
>> of the bits of the export engine that are called the most.
>
> Looking at the traces I'd think if we could eliminate the repeated
> backtracking to adjust the leafs or at least skip over those elements in
> a backtrack that are already fully parsed instead of parsing them again,
> that would be a good start.

Actually this is a bit different. Parsing doesn't backtrack. Look at
`org-element-parse-buffer' through elp to see that elements are parsed
only once.

The problem comes from `org-element-at-point'. To be effective, it
needs to move back to the current headline and start parsing the buffer
again from there. That means the first element after the headline
(often a property drawer) will be parsed each time we need information
within the section.

A very good improvement for the exporter and, more importantly, for the
parser, would be to cache results from `org-element--current-element'.
However, this cache would also need to be refreshed after each buffer
modification. This is the tricky part.
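
To make the caching idea concrete, here is a minimal sketch, not actual
Org code. It memoizes parse results by buffer position and takes the
coarsest possible approach to refreshing: the whole table is discarded
whenever `buffer-chars-modified-tick' reports a text change. Every
`my-' name is hypothetical, and PARSE-FN merely stands in for a call to
`org-element--current-element' with whatever arguments the caller uses.

```elisp
(defvar-local my-element-cache nil
  "Hypothetical cache: hash table mapping buffer positions to parse results.")

(defvar-local my-element-cache-tick nil
  "Value of `buffer-chars-modified-tick' when `my-element-cache' was built.")

(defun my-element-cache-get (parse-fn)
  "Return the result of calling PARSE-FN at point, memoized by position.
PARSE-FN is called with no arguments.  The whole cache is thrown away
as soon as the buffer text has changed since it was built."
  ;; Coarse refresh: any text change invalidates every cached entry.
  (let ((tick (buffer-chars-modified-tick)))
    (unless (and my-element-cache (eql tick my-element-cache-tick))
      (setq my-element-cache (make-hash-table :test #'eql)
            my-element-cache-tick tick)))
  ;; Record the position first, since PARSE-FN may move point.
  (let* ((pos (point))
         (hit (gethash pos my-element-cache 'my-miss)))
    (if (eq hit 'my-miss)
        (puthash pos (funcall parse-fn) my-element-cache)
      hit)))
```

Keying on position alone assumes PARSE-FN always returns the same
result at a given position, which only holds if it is called with the
same arguments each time. Dropping everything on each edit is cheap to
get right but wasteful in a buffer that is being typed in.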

One solution would be to use `after-change-functions' and
`before-change-functions' to store intervals of modified areas in the
buffer. Then, during idle time, a `maphash' could update boundaries of
cached values or remove them completely, according to the intervals.

Regards,

--
Nicolas Goaziou
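
A rough sketch of the interval idea described above, under several
assumptions that are not in the original message: the cache maps an
element's start position to a cons (END . DATA), only
`after-change-functions' is used (it already reports the length of the
replaced text), and any entry that touches a modified interval is
dropped rather than repaired. All `my-' names are hypothetical.

```elisp
(defvar-local my-cache-table nil
  "Hypothetical cache: hash table mapping START to (END . DATA).")

(defvar-local my-cache-changes nil
  "Pending changes, newest first, each a list (BEG OLD-END DELTA).")

(defun my-cache-record-change (beg end old-len)
  "Remember a modified interval; intended for `after-change-functions'.
BEG and END delimit the new text, OLD-LEN is the length it replaced."
  (push (list beg (+ beg old-len) (- (- end beg) old-len))
        my-cache-changes))

(defun my-cache-sync ()
  "Replay pending changes, oldest first, against `my-cache-table'.
Entries ending before a change are kept, entries starting after it
are shifted by its size delta, and entries touching it are dropped."
  (dolist (change (nreverse my-cache-changes))
    (let ((beg (nth 0 change))
          (old-end (nth 1 change))
          (delta (nth 2 change))
          ;; Build a fresh table: keys change, so do not edit in place.
          (new (make-hash-table :test #'eql)))
      (when my-cache-table
        (maphash
         (lambda (start entry)
           (cond ((< (car entry) beg)   ; ends before the change: keep
                  (puthash start entry new))
                 ((> start old-end)     ; starts after the change: shift
                  (puthash (+ start delta)
                           (cons (+ (car entry) delta) (cdr entry))
                           new))))      ; otherwise it was touched: drop
         my-cache-table))
      (setq my-cache-table new)))
  (setq my-cache-changes nil))

;; Start recording changes in the current buffer only.
(add-hook 'after-change-functions #'my-cache-record-change nil t)
```

`my-cache-sync' could then be run from an idle timer or just before a
cache lookup. Dropping touched entries is deliberately conservative: in
Org an edit can also change how neighbouring elements parse, so a real
implementation would likely need to invalidate more than this does.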