From: Matt Lundin
Subject: [PATCH] Re: Problems with org publish cache checking
Date: Wed, 25 Nov 2015 20:30:54 -0600
Message-ID: <87lh9llcf5.fsf@fastmail.fm>
References: <87r3jfh1js.fsf@fastmail.fm>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
To: Org Mode

--=-=-=
Content-Type: text/plain

Matt Lundin writes:

> I've been doing some testing of org-publish functions and have found a
> few problems with org-publish-cache-file-needs-publishing. They arise
> from the fact that it attempts to take included files into account.

OK, I've worked up a patch that solves several of these issues. The
basic idea is to check, when publishing an Org file, whether it
includes other files, and then to store that data in the cache. That
way, org-publish-cache-file-needs-publishing does not need to visit
every file. Instead, it can compare the stored timestamps against the
actual modification times of the included files.
> Org-publish does not check the cache of included files at all. It
> simply compares the last modified time of an included file with the
> last modified time of the master/including file. The result is that a
> master file will perpetually be republished if an included file
> happened to be changed afterwards (even if both files were changed
> years ago and the project has been published 100s of times since
> then).

This patch fixes this by caching the modification time of each
included file when the including file is published. As a result,
org-publish republishes the including file only if an included file
has actually changed since then.

> 3. It is slow!!! The function visits every file in a project to check
> for #+INCLUDE declarations, thus offsetting much of the benefit of
> caching timestamps. To test this, I created a dummy project with over
> 1000 pages (not typical usage, of course, but possible for someone
> writing a blog over several years or creating a large interlinked
> wiki).

This patch should make things much faster, since we only need to scan
for included files while a file is actually being exported (when its
buffer is already open). Org-publish no longer has to visit every file
in the project just to decide whether it needs republishing (which
takes a lot of time). Instead, it can simply consult the cache.

Matt

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=0001-Speed-up-publishing-by-caching-included-file-data.patch

From 7a69052334416309802c861a7b6b72865c331a37 Mon Sep 17 00:00:00 2001
From: Matt Lundin
Date: Wed, 25 Nov 2015 20:23:39 -0600
Subject: [PATCH] Speed up publishing by caching included file data

* lisp/ox-publish.el (org-publish-cache-get-included-files): New
function.
(org-publish-org-to): Use new function.
(org-publish-cache-file-needs-publishing): Use cache instead of
visiting every file in a project.

Org-publish can now quickly determine a) whether an Org source
includes other files and b) whether those files have changed. This
speeds up the publishing process and makes tracking of changes in
included files more reliable.
---
 lisp/ox-publish.el | 62 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/lisp/ox-publish.el b/lisp/ox-publish.el
index 90f307c..ba85c7e 100644
--- a/lisp/ox-publish.el
+++ b/lisp/ox-publish.el
@@ -574,6 +574,7 @@ Return output file name."
              (let ((output-file
                     (org-export-output-file-name extension nil pub-dir))
                    (body-p (plist-get plist :body-only)))
+               (when org-publish-cache (org-publish-cache-get-included-files))
                (org-export-to-file backend output-file
                  nil nil nil body-p
                  ;; Add `org-publish--collect-references' and
@@ -1227,36 +1228,41 @@ the file including them will be republished as well."
   (unless org-publish-cache
     (error
      "`org-publish-cache-file-needs-publishing' called, but no cache present"))
-  (let* ((case-fold-search t)
-         (key (org-publish-timestamp-filename filename pub-dir pub-func))
+  (let* ((key (org-publish-timestamp-filename filename pub-dir pub-func))
          (pstamp (org-publish-cache-get key))
-         (org-inhibit-startup t)
-         (visiting (find-buffer-visiting filename))
-         included-files-ctime buf)
-    (when (equal (file-name-extension filename) "org")
-      (setq buf (find-file (expand-file-name filename)))
-      (with-current-buffer buf
-        (goto-char (point-min))
-        (while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
-          (let* ((element (org-element-at-point))
-                 (included-file
-                  (and (eq (org-element-type element) 'keyword)
-                       (let ((value (org-element-property :value element)))
-                         (and value
-                              (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
-                              ;; Ignore search suffix.
-                              (car (split-string
-                                    (org-remove-double-quotes
-                                     (match-string 1 value)))))))))
-            (when included-file
-              (push (org-publish-cache-ctime-of-src
-                     (expand-file-name included-file))
-                    included-files-ctime)))))
-      (unless visiting (kill-buffer buf)))
+         (ctime (when pstamp (org-publish-cache-ctime-of-src filename))))
     (or (null pstamp)
-        (let ((ctime (org-publish-cache-ctime-of-src filename)))
-          (or (< pstamp ctime)
-              (cl-some (lambda (ct) (< ctime ct)) included-files-ctime))))))
+        (< pstamp ctime)
+        (cl-some (lambda (incl)
+                   ;; See if cached time is before modification time.
+                   (< (cdr incl)
+                      (org-publish-cache-ctime-of-src (car incl))))
+                 (org-publish-cache-get-file-property filename :includes)))))
+
+(defun org-publish-cache-get-included-files ()
+  "Get names and last modified times of included files in current buffer."
+  (let ((case-fold-search t)
+        included)
+    (save-excursion
+      (goto-char (point-min))
+      (while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
+        (let* ((element (org-element-at-point))
+               (included-file
+                (and (eq (org-element-type element) 'keyword)
+                     (let ((value (org-element-property :value element)))
+                       (and value
+                            (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
+                            ;; Ignore search suffix.
+                            (car (split-string
+                                  (org-remove-double-quotes
+                                   (match-string 1 value)))))))))
+          (when included-file
+            (let ((iname (expand-file-name included-file)))
+              (push (cons iname (org-publish-cache-ctime-of-src
+                                 (expand-file-name iname)))
+                    included))))))
+    (org-publish-cache-set-file-property (buffer-file-name)
+                                         :includes included)))
 
 (defun org-publish-cache-set-file-property
   (filename property value &optional project-name)
-- 
2.6.2

--=-=-=--