From: Matt Lundin <mdl@imapmail.org>
To: Org Mode <emacs-orgmode@gnu.org>
Subject: [PATCH] Re: Problems with org publish cache checking
Date: Wed, 25 Nov 2015 20:30:54 -0600
Message-ID: <87lh9llcf5.fsf@fastmail.fm>
In-Reply-To: 87r3jfh1js.fsf@fastmail.fm
[-- Attachment #1: Type: text/plain, Size: 1859 bytes --]
Matt Lundin <mdl@imapmail.org> writes:
> I've been doing some testing of org-publish functions and have found a
> few problems with org-publish-cache-file-needs-publishing. They arise
> from the fact that it attempts to take included files into account.
OK, I've worked up a patch that solves several of these issues. The
basic idea is to check, when publishing an Org file, whether it includes
other files, and then to store that data (the included file names and
their modification times) in the cache. That way,
org-publish-cache-file-needs-publishing does not need to visit each
file; instead, it can compare the cached timestamps against the current
modification times of the included files.
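A minimal Emacs Lisp sketch of this scheme, with a plain hash table standing in for the publishing cache. All `my-` names are hypothetical (they are not ox-publish functions), and timestamps are plain numbers returned by a caller-supplied function, so the logic can be tried without touching the file system.

```elisp
(require 'cl-lib)

;; Hypothetical stand-in for the publishing cache: maps a source file
;; name to a plist (:stamp PUBLISH-TIME :includes ALIST).
(defvar my-publish-cache (make-hash-table :test #'equal))

(defun my-publish-record (file stamp includes)
  "Record that FILE was published at time STAMP.
INCLUDES is an alist of (INCLUDED-FILE . MTIME-WHEN-PUBLISHED),
gathered while FILE's buffer was already open for publishing."
  (puthash file (list :stamp stamp :includes includes) my-publish-cache))

(defun my-publish-needs-publishing-p (file mtime-fn)
  "Return non-nil if FILE must be republished.
MTIME-FN takes a file name and returns its current modification
time as a number; a real implementation would ask the file system."
  (let ((entry (gethash file my-publish-cache)))
    (or
     ;; Never published: publish it.
     (null entry)
     ;; The source itself changed after the last publication.
     (< (plist-get entry :stamp) (funcall mtime-fn file))
     ;; Some included file changed after the time cached for it.
     (cl-some (lambda (incl)
                (< (cdr incl) (funcall mtime-fn (car incl))))
              (plist-get entry :includes)))))
```

For example, after `(my-publish-record "/p/index.org" 100 '(("/p/header.org" . 50)))`, the check returns nil as long as header.org still reports time 50, and non-nil once it reports a later time. Neither case opens a file.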
> Org-publish does not check the cache of included files at all. It
> simply compares the last modified time of an included file with the
> last modified time of the master/including file. The result is that a
> master file will perpetually be republished if an included file
> happened to be changed afterwards (even if both files were changed
> years ago and the project has been published 100s of times since
> then).
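This patch fixes this by caching the modification time of each included
file when the master file is published. Org-publish then compares each
included file against its own cached timestamp rather than against the
master file's modification time, so a master file is republished only
when an included file has actually changed since the last publication.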
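The difference between the two rules can be shown with a toy Emacs Lisp comparison that uses plain numbers as timestamps. Both functions are hypothetical; they only mirror the comparisons made by the old and new versions of org-publish-cache-file-needs-publishing for a master file with a single included file.

```elisp
(defun my-old-needs-publishing-p (pstamp master-mtime include-mtime)
  "Old rule: compare the included file with the master file."
  (or (< pstamp master-mtime)
      (< master-mtime include-mtime)))

(defun my-new-needs-publishing-p (pstamp master-mtime include-mtime
                                         cached-include-mtime)
  "New rule: compare the included file with its own cached mtime."
  (or (< pstamp master-mtime)
      (< cached-include-mtime include-mtime)))

;; Master edited at time 10, include edited at time 20, project last
;; published at time 1000 (when the include's mtime, 20, was cached).
(my-old-needs-publishing-p 1000 10 20)     ; => t, on every run
(my-new-needs-publishing-p 1000 10 20 20)  ; => nil
```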
> 3. It is slow!!! The function visits every file in a project to check
> for #+INCLUDE declarations, thus offsetting much of the benefit of
> caching timestamps. To test this, I created a dummy project with over
> 1000 pages (not typical usage, of course, but possible for someone
> writing a blog over several years or creating a large interlinked
> wiki).
This patch should make things much faster, since we only need to scan
for included files while publishing (when the buffer is already open).
Org-publish no longer has to visit every file in the project just to
decide whether it needs republishing (which takes a lot of time);
instead, it can simply consult the cache.
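The scan that now runs during publishing can be approximated outside Org with a regexp. This simplified Emacs Lisp sketch (the `my-scan-includes` name is hypothetical) reuses the patch's first-token regexp, but unlike the patch it does not call org-element-at-point, so it cannot tell a real keyword from, say, a line inside a source block. It also assumes a search suffix is written inside the file-name token, as in `#+INCLUDE: "paper.org::*Conclusion"`, and drops it by splitting on `::`.

```elisp
(defun my-scan-includes (text)
  "Return the file names named by #+INCLUDE keywords in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t)   ; also match "#+include:"
          files)
      (while (re-search-forward "^[ \t]*#\\+INCLUDE:[ \t]*\\(.*\\)$" nil t)
        (let ((value (match-string 1)))
          ;; First token: a double-quoted string or a run of
          ;; non-whitespace, using the same regexp as the patch.
          (when (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
            (let ((name (match-string 1 value)))
              ;; Strip surrounding double quotes, if any.
              (when (string-match "\\`\"\\(.*\\)\"\\'" name)
                (setq name (match-string 1 name)))
              ;; Keep only the file name, dropping any "::SEARCH" part.
              (push (car (split-string name "::")) files)))))
      (nreverse files))))
```

Because this runs on the buffer that is already open for export, the cost is one regexp pass per published file rather than an extra find-file per project file on every publishing run.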
Matt
[-- Attachment #2: 0001-Speed-up-publishing-by-caching-included-file-data.patch --]
[-- Type: text/x-diff, Size: 4167 bytes --]
From 7a69052334416309802c861a7b6b72865c331a37 Mon Sep 17 00:00:00 2001
From: Matt Lundin <mdl@imapmail.org>
Date: Wed, 25 Nov 2015 20:23:39 -0600
Subject: [PATCH] Speed up publishing by caching included file data
* lisp/ox-publish.el (org-publish-cache-get-included-files): New function.
(org-publish-org-to): Use new function.
(org-publish-cache-file-needs-publishing): Use cache instead of
visiting every file in a project.
Org-publish can now quickly determine a) whether an org source includes
other files and b) whether those files have changed. This speeds up the
publishing process and makes tracking of changes in included files more
reliable.
---
lisp/ox-publish.el | 62 ++++++++++++++++++++++++++++++------------------------
1 file changed, 34 insertions(+), 28 deletions(-)
diff --git a/lisp/ox-publish.el b/lisp/ox-publish.el
index 90f307c..ba85c7e 100644
--- a/lisp/ox-publish.el
+++ b/lisp/ox-publish.el
@@ -574,6 +574,7 @@ Return output file name."
(let ((output-file
(org-export-output-file-name extension nil pub-dir))
(body-p (plist-get plist :body-only)))
+ (when org-publish-cache (org-publish-cache-get-included-files))
(org-export-to-file backend output-file
nil nil nil body-p
;; Add `org-publish--collect-references' and
@@ -1227,36 +1228,41 @@ the file including them will be republished as well."
(unless org-publish-cache
(error
"`org-publish-cache-file-needs-publishing' called, but no cache present"))
- (let* ((case-fold-search t)
- (key (org-publish-timestamp-filename filename pub-dir pub-func))
+ (let* ((key (org-publish-timestamp-filename filename pub-dir pub-func))
(pstamp (org-publish-cache-get key))
- (org-inhibit-startup t)
- (visiting (find-buffer-visiting filename))
- included-files-ctime buf)
- (when (equal (file-name-extension filename) "org")
- (setq buf (find-file (expand-file-name filename)))
- (with-current-buffer buf
- (goto-char (point-min))
- (while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
- (let* ((element (org-element-at-point))
- (included-file
- (and (eq (org-element-type element) 'keyword)
- (let ((value (org-element-property :value element)))
- (and value
- (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
- ;; Ignore search suffix.
- (car (split-string
- (org-remove-double-quotes
- (match-string 1 value)))))))))
- (when included-file
- (push (org-publish-cache-ctime-of-src
- (expand-file-name included-file))
- included-files-ctime)))))
- (unless visiting (kill-buffer buf)))
+ (ctime (when pstamp (org-publish-cache-ctime-of-src filename))))
(or (null pstamp)
- (let ((ctime (org-publish-cache-ctime-of-src filename)))
- (or (< pstamp ctime)
- (cl-some (lambda (ct) (< ctime ct)) included-files-ctime))))))
+ (< pstamp ctime)
+ (cl-some (lambda (incl)
+ ;; See if cached time is before modification time.
+ (< (cdr incl)
+ (org-publish-cache-ctime-of-src (car incl))))
+ (org-publish-cache-get-file-property filename :includes)))))
+
+(defun org-publish-cache-get-included-files ()
+ "Get names and last modified times of included files in current buffer."
+ (let ((case-fold-search t)
+ included)
+ (save-excursion
+ (goto-char (point-min))
+ (while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
+ (let* ((element (org-element-at-point))
+ (included-file
+ (and (eq (org-element-type element) 'keyword)
+ (let ((value (org-element-property :value element)))
+ (and value
+ (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
+ ;; Ignore search suffix.
+ (car (split-string
+ (org-remove-double-quotes
+ (match-string 1 value)))))))))
+ (when included-file
+ (let ((iname (expand-file-name included-file)))
+ (push (cons iname (org-publish-cache-ctime-of-src
+ (expand-file-name iname)))
+ included))))))
+ (org-publish-cache-set-file-property (buffer-file-name)
+ :includes included)))
(defun org-publish-cache-set-file-property
(filename property value &optional project-name)
--
2.6.2
Thread overview: 6+ messages
2015-11-24 15:14 Problems with org publish cache checking Matt Lundin
2015-11-25 16:56 ` Matt Lundin
2015-11-26 2:30 ` Matt Lundin [this message]
2015-11-26 8:25 ` [PATCH] " Nicolas Goaziou
2015-11-27 1:30 ` Matt Lundin
2015-11-29 16:18 ` Nicolas Goaziou
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git