emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* [PATCH] Fix and optimize publishing cache check
@ 2014-08-12 13:09 Matt Lundin
  2014-08-13  4:32 ` Matt Lundin
  0 siblings, 1 reply; 3+ messages in thread
From: Matt Lundin @ 2014-08-12 13:09 UTC (permalink / raw)
  To: Org Mode

[-- Attachment #1: Type: text/plain, Size: 1460 bytes --]

This patch does two things:

1. It prevents org-publish from perpetually republishing files
containing includes. Currently, if an included file changes, it is
published, as is the file that includes it. However, until changes are
made in the file that includes it (which may be never) its modification
time remain older than that of the included file, so it will be
republished every time org-publish is called, even if neither file has
changed. 

   - Note: This patch fixes this behavior by updating the modification
     time of the file that contains the includes. If this is deemed too
     destructive/meddlesome, we could consider alternate behaviors, such
     as setting the cached time with a simple float-time. Let me know
     what would be best.

2. It optimizes checking for included files. Currently,
org-publish-cache-file-needs-publishing visits every file in a project
to check for includes. On my underpowered box, this takes a long time
(over a minute) on a project with 1000+ files, thus defeating the
purpose of the cache. This patch causes org-publish to store information
about included files in the cache itself:

  a. If a file has changed it updates information about the files it
  includes and stores it in the cache.

  b. If a file has not changed, it checks the cache for included files
  and sees if any of those files have been updated (thus preventing the
  need to check every file in the project for includes)

Best,
Matt


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-and-optimize-publish-cache-check.patch --]
[-- Type: text/x-diff, Size: 4097 bytes --]

From 94a8061bb30e1992213fb8e71ee949d336d37435 Mon Sep 17 00:00:00 2001
From: Matt Lundin <mdl@imapmail.org>
Date: Tue, 12 Aug 2014 07:51:44 -0500
Subject: [PATCH] Fix and optimize publish cache check

* lisp/ox-publish.el: (org-publish-cache-file-needs-publishing) Fix
  org-publish-cache-file-needs-publishing to change timestamp of files
  containing includes. Speed up check for includes by storing
  information about included files in cache itself.

This patch ensures that org-publish-cache-file-needs-publishing does
not keep publishing a file containing includes by updating the
modification time of that file. It also speeds up publishing by
caching information about included files, thus keeping
org-publish-cache-file-needs-publishing from having to visit every
file just to check includes (which can take a long time on a project
containing hundreds or thousands of files).
---
 lisp/ox-publish.el | 60 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 25 deletions(-)

diff --git a/lisp/ox-publish.el b/lisp/ox-publish.el
index df40572..228411f 100644
--- a/lisp/ox-publish.el
+++ b/lisp/ox-publish.el
@@ -1168,32 +1168,42 @@ the file including them will be republished as well."
 	 (pstamp (org-publish-cache-get key))
 	 (org-inhibit-startup t)
 	 (visiting (find-buffer-visiting filename))
-	 included-files-ctime buf)
-    (when (equal (file-name-extension filename) "org")
-      (setq buf (find-file (expand-file-name filename)))
+	 (ctime (org-publish-cache-ctime-of-src filename))
+	 (needsp (or (null pstamp) (< pstamp ctime)))
+	 includes buf)
+    ;; if the file needs publishing, refresh the included-files cache property
+    (when (and needsp
+	       (equal (file-name-extension filename) "org"))
+      (setq buf (find-file-noselect (expand-file-name filename)))
       (with-current-buffer buf
-	(goto-char (point-min))
-	(while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
-	  (let* ((element (org-element-at-point))
-		 (included-file
-		  (and (eq (org-element-type element) 'keyword)
-		       (let ((value (org-element-property :value element)))
-			 (and value
-			      (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
-			      (org-remove-double-quotes
-			       (match-string 1 value)))))))
-	    (when included-file
-	      (add-to-list 'included-files-ctime
-			   (org-publish-cache-ctime-of-src
-			    (expand-file-name included-file))
-			   t)))))
-      (unless visiting (kill-buffer buf)))
-    (if (null pstamp) t
-      (let ((ctime (org-publish-cache-ctime-of-src filename)))
-	(or (< pstamp ctime)
-	    (when included-files-ctime
-	      (not (null (delq nil (mapcar (lambda (ct) (< ctime ct))
-					   included-files-ctime))))))))))
+	(save-excursion
+	  (goto-char (point-min))
+	  (while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
+	    (let* ((element (org-element-at-point))
+		   (included-file
+		    (and (eq (org-element-type element) 'keyword)
+			 (let ((value (org-element-property :value element)))
+			   (and value
+				(string-match "^\\(\".+?\"\\|\\S-+\\)" value)
+				(org-remove-double-quotes
+				 (match-string 1 value)))))))
+	      (when included-file (add-to-list 'includes (expand-file-name included-file)))))))
+      (unless visiting (kill-buffer buf))
+      (when includes (org-publish-cache-set-file-property filename :includes includes)))
+    ;; return t if needsp or if included files have changed
+    (or needsp
+	(when (delq nil
+		    (mapcar (lambda (file)
+			      (let ((ct (org-publish-cache-ctime-of-src file)))
+				(and (file-exists-p file)
+				     (< ctime ct))))
+			    (org-publish-cache-get-file-property filename :includes)))
+	  ;; update the timestamp of the published file, but only if buffer is not modified
+	  (setq buf (find-file-noselect (expand-file-name filename)))
+	  (with-current-buffer buf
+	    (when (not (buffer-modified-p)) (set-buffer-modified-p t) (save-buffer)))
+	  (unless visiting (kill-buffer buf))
+	  t))))
 
 (defun org-publish-cache-set-file-property
   (filename property value &optional project-name)
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix and optimize publishing cache check
  2014-08-12 13:09 [PATCH] Fix and optimize publishing cache check Matt Lundin
@ 2014-08-13  4:32 ` Matt Lundin
  2014-08-14  9:13   ` Nicolas Goaziou
  0 siblings, 1 reply; 3+ messages in thread
From: Matt Lundin @ 2014-08-13  4:32 UTC (permalink / raw)
  To: Org Mode

[-- Attachment #1: Type: text/plain, Size: 375 bytes --]

Here is an improved version of the previous patch (please apply it
rather than the previous). This version further optimizes cache checking
by only calling find-buffer-visiting if necessary. (It also fixes some
long lines.) On a test project containing 5000 files, running
org-publish on the project with the cache enabled now takes seconds
rather than minutes.

Best,
Matt


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-and-optimize-publish-cache-check.patch --]
[-- Type: text/x-diff, Size: 4304 bytes --]

From ea7203b4d988967f0a70bd45ad7502a961a28aee Mon Sep 17 00:00:00 2001
From: Matt Lundin <mdl@imapmail.org>
Date: Tue, 12 Aug 2014 23:25:23 -0500
Subject: [PATCH] Fix and optimize publish cache check

* lisp/ox-publish.el: (org-publish-cache-file-needs-publishing) Fix
  org-publish-cache-file-needs-publishing to change timestamp of files
  containing includes. Speed up check for includes by storing
  information about included files in cache itself.

This patch ensures that org-publish-cache-file-needs-publishing does
not keep publishing a file containing includes by updating the
modification time of that file. It also speeds up publishing by
caching information about included files, thus keeping
org-publish-cache-file-needs-publishing from having to visit every
file just to check includes (which can take a long time on a project
containing hundreds or thousands of files).
---
 lisp/ox-publish.el | 68 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/lisp/ox-publish.el b/lisp/ox-publish.el
index df40572..e848a0c 100644
--- a/lisp/ox-publish.el
+++ b/lisp/ox-publish.el
@@ -1167,33 +1167,47 @@ the file including them will be republished as well."
 	 (key (org-publish-timestamp-filename filename pub-dir pub-func))
 	 (pstamp (org-publish-cache-get key))
 	 (org-inhibit-startup t)
-	 (visiting (find-buffer-visiting filename))
-	 included-files-ctime buf)
-    (when (equal (file-name-extension filename) "org")
-      (setq buf (find-file (expand-file-name filename)))
-      (with-current-buffer buf
-	(goto-char (point-min))
-	(while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
-	  (let* ((element (org-element-at-point))
-		 (included-file
-		  (and (eq (org-element-type element) 'keyword)
-		       (let ((value (org-element-property :value element)))
-			 (and value
-			      (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
-			      (org-remove-double-quotes
-			       (match-string 1 value)))))))
-	    (when included-file
-	      (add-to-list 'included-files-ctime
-			   (org-publish-cache-ctime-of-src
-			    (expand-file-name included-file))
-			   t)))))
-      (unless visiting (kill-buffer buf)))
-    (if (null pstamp) t
-      (let ((ctime (org-publish-cache-ctime-of-src filename)))
-	(or (< pstamp ctime)
-	    (when included-files-ctime
-	      (not (null (delq nil (mapcar (lambda (ct) (< ctime ct))
-					   included-files-ctime))))))))))
+	 (ctime (org-publish-cache-ctime-of-src filename))
+	 (needsp (or (null pstamp) (< pstamp ctime)))
+	 includes)
+    ;; if the file needs publishing, refresh the included-files cache property
+    (when (and needsp
+	       (equal (file-name-extension filename) "org"))
+      (let ((visiting (find-buffer-visiting filename))
+	    (buf (find-file-noselect (expand-file-name filename))))
+	(with-current-buffer buf
+	  (save-excursion
+	    (goto-char (point-min))
+	    (while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
+	      (let* ((element (org-element-at-point))
+		     (included-file
+		      (and (eq (org-element-type element) 'keyword)
+			   (let ((value (org-element-property :value element)))
+			     (and value
+				  (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
+				  (org-remove-double-quotes
+				   (match-string 1 value)))))))
+		(when included-file
+		  (add-to-list 'includes (expand-file-name included-file)))))))
+	(unless visiting (kill-buffer buf))
+	(when includes
+	  (org-publish-cache-set-file-property filename :includes includes))))
+    ;; return t if needsp or if included files have changed
+    (or needsp
+	(when (delq nil
+		    (mapcar (lambda (file)
+			      (let ((ct (org-publish-cache-ctime-of-src file)))
+				(and (file-exists-p file)
+				     (< ctime ct))))
+			    (org-publish-cache-get-file-property filename :includes)))
+	  ;; update the timestamp of the published file if buffer is not modified
+	  (let ((visiting (find-buffer-visiting filename))
+		(buf (find-file-noselect (expand-file-name filename))))
+	    (with-current-buffer buf
+	      (when (not (buffer-modified-p))
+		(set-buffer-modified-p t) (save-buffer)))
+	    (unless visiting (kill-buffer buf)))
+	  t))))
 
 (defun org-publish-cache-set-file-property
   (filename property value &optional project-name)
-- 
2.0.4


[-- Attachment #3: Type: text/plain, Size: 5830 bytes --]



Matt Lundin <mdl@imapmail.org> writes:

> This patch does two things:
>
> 1. It prevents org-publish from perpetually republishing files
> containing includes. Currently, if an included file changes, it is
> published, as is the file that includes it. However, until changes are
> made in the file that includes it (which may be never) its modification
> time remain older than that of the included file, so it will be
> republished every time org-publish is called, even if neither file has
> changed. 
>
>    - Note: This patch fixes this behavior by updating the modification
>      time of the file that contains the includes. If this is deemed too
>      destructive/meddlesome, we could consider alternate behaviors, such
>      as setting the cached time with a simple float-time. Let me know
>      what would be best.
>
> 2. It optimizes checking for included files. Currently,
> org-publish-cache-file-needs-publishing visits every file in a project
> to check for includes. On my underpowered box, this takes a long time
> (over a minute) on a project with 1000+ files, thus defeating the
> purpose of the cache. This patch causes org-publish to store information
> about included files in the cache itself:
>
>   a. If a file has changed it updates information about the files it
>   includes and stores it in the cache.
>
>   b. If a file has not changed, it checks the cache for included files
>   and sees if any of those files have been updated (thus preventing the
>   need to check every file in the project for includes)
>
> Best,
> Matt
>
> From 94a8061bb30e1992213fb8e71ee949d336d37435 Mon Sep 17 00:00:00 2001
> From: Matt Lundin <mdl@imapmail.org>
> Date: Tue, 12 Aug 2014 07:51:44 -0500
> Subject: [PATCH] Fix and optimize publish cache check
>
> * lisp/ox-publish.el: (org-publish-cache-file-needs-publishing) Fix
>   org-publish-cache-file-needs-publishing to change timestamp of files
>   containing includes. Speed up check for includes by storing
>   information about included files in cache itself.
>
> This patch ensures that org-publish-cache-file-needs-publishing does
> not keep publishing a file containing includes by updating the
> modification time of that file. It also speeds up publishing by
> caching information about included files, thus keeping
> org-publish-cache-file-needs-publishing from having to visit every
> file just to check includes (which can take a long time on a project
> containing hundreds or thousands of files).
> ---
>  lisp/ox-publish.el | 60 +++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 35 insertions(+), 25 deletions(-)
>
> diff --git a/lisp/ox-publish.el b/lisp/ox-publish.el
> index df40572..228411f 100644
> --- a/lisp/ox-publish.el
> +++ b/lisp/ox-publish.el
> @@ -1168,32 +1168,42 @@ the file including them will be republished as well."
>  	 (pstamp (org-publish-cache-get key))
>  	 (org-inhibit-startup t)
>  	 (visiting (find-buffer-visiting filename))
> -	 included-files-ctime buf)
> -    (when (equal (file-name-extension filename) "org")
> -      (setq buf (find-file (expand-file-name filename)))
> +	 (ctime (org-publish-cache-ctime-of-src filename))
> +	 (needsp (or (null pstamp) (< pstamp ctime)))
> +	 includes buf)
> +    ;; if the file needs publishing, refresh the included-files cache property
> +    (when (and needsp
> +	       (equal (file-name-extension filename) "org"))
> +      (setq buf (find-file-noselect (expand-file-name filename)))
>        (with-current-buffer buf
> -	(goto-char (point-min))
> -	(while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
> -	  (let* ((element (org-element-at-point))
> -		 (included-file
> -		  (and (eq (org-element-type element) 'keyword)
> -		       (let ((value (org-element-property :value element)))
> -			 (and value
> -			      (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
> -			      (org-remove-double-quotes
> -			       (match-string 1 value)))))))
> -	    (when included-file
> -	      (add-to-list 'included-files-ctime
> -			   (org-publish-cache-ctime-of-src
> -			    (expand-file-name included-file))
> -			   t)))))
> -      (unless visiting (kill-buffer buf)))
> -    (if (null pstamp) t
> -      (let ((ctime (org-publish-cache-ctime-of-src filename)))
> -	(or (< pstamp ctime)
> -	    (when included-files-ctime
> -	      (not (null (delq nil (mapcar (lambda (ct) (< ctime ct))
> -					   included-files-ctime))))))))))
> +	(save-excursion
> +	  (goto-char (point-min))
> +	  (while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
> +	    (let* ((element (org-element-at-point))
> +		   (included-file
> +		    (and (eq (org-element-type element) 'keyword)
> +			 (let ((value (org-element-property :value element)))
> +			   (and value
> +				(string-match "^\\(\".+?\"\\|\\S-+\\)" value)
> +				(org-remove-double-quotes
> +				 (match-string 1 value)))))))
> +	      (when included-file (add-to-list 'includes (expand-file-name included-file)))))))
> +      (unless visiting (kill-buffer buf))
> +      (when includes (org-publish-cache-set-file-property filename :includes includes)))
> +    ;; return t if needsp or if included files have changed
> +    (or needsp
> +	(when (delq nil
> +		    (mapcar (lambda (file)
> +			      (let ((ct (org-publish-cache-ctime-of-src file)))
> +				(and (file-exists-p file)
> +				     (< ctime ct))))
> +			    (org-publish-cache-get-file-property filename :includes)))
> +	  ;; update the timestamp of the published file, but only if buffer is not modified
> +	  (setq buf (find-file-noselect (expand-file-name filename)))
> +	  (with-current-buffer buf
> +	    (when (not (buffer-modified-p)) (set-buffer-modified-p t) (save-buffer)))
> +	  (unless visiting (kill-buffer buf))
> +	  t))))
>  
>  (defun org-publish-cache-set-file-property
>    (filename property value &optional project-name)

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix and optimize publishing cache check
  2014-08-13  4:32 ` Matt Lundin
@ 2014-08-14  9:13   ` Nicolas Goaziou
  0 siblings, 0 replies; 3+ messages in thread
From: Nicolas Goaziou @ 2014-08-14  9:13 UTC (permalink / raw)
  To: Matt Lundin; +Cc: Org Mode

Hello,

Matt Lundin <mdl@imapmail.org> writes:

> Here is an improved version of the previous patch (please apply it
> rather than the previous).

Thank you for the patch.

> This version further optimizes cache checking by only calling
> find-buffer-visiting if necessary. (It also fixes some long lines.) On
> a test project containing 5000 files, running org-publish on the
> project with the cache enabled now takes seconds rather than minutes.

Great.

> +	 (ctime (org-publish-cache-ctime-of-src filename))
> +	 (needsp (or (null pstamp) (< pstamp ctime)))
> +	 includes)
> +    ;; if the file needs publishing, refresh the included-files cache property

Comments need to start with a capital and end with a period.

> +    (when (and needsp
> +	       (equal (file-name-extension filename) "org"))
> +      (let ((visiting (find-buffer-visiting filename))
> +	    (buf (find-file-noselect (expand-file-name filename))))
> +	(with-current-buffer buf
> +	  (save-excursion
> +	    (goto-char (point-min))
> +	    (while (re-search-forward "^[ \t]*#\\+INCLUDE:" nil t)
> +	      (let* ((element (org-element-at-point))
> +		     (included-file
> +		      (and (eq (org-element-type element) 'keyword)
> +			   (let ((value (org-element-property :value element)))
> +			     (and value
> +				  (string-match "^\\(\".+?\"\\|\\S-+\\)" value)
> +				  (org-remove-double-quotes
> +				   (match-string 1 value)))))))
> +		(when included-file
> +		  (add-to-list 'includes (expand-file-name included-file)))))))
> +	(unless visiting (kill-buffer buf))

Do not kill buffer yet, since you may re-open it later.

> +	(when includes
> +	  (org-publish-cache-set-file-property filename :includes includes))))
> +    ;; return t if needsp or if included files have changed

Ditto.

> +    (or needsp
> +	(when (delq nil
> +		    (mapcar (lambda (file)
> +			      (let ((ct (org-publish-cache-ctime-of-src file)))
> +				(and (file-exists-p file)
> +				     (< ctime ct))))
> +			    (org-publish-cache-get-file-property filename :includes)))

  (when (delq nil (mapcar PREDICATE LIST)))

=> 

  (org-some PREDICATE LIST)

> +	  ;; update the timestamp of the published file if buffer is not modified

Ditto.

> +	  (let ((visiting (find-buffer-visiting filename))
> +		(buf (find-file-noselect (expand-file-name filename))))
> +	    (with-current-buffer buf

Not needed if you didn't kill buffer yet.

> +	      (when (not (buffer-modified-p))

  (unless ...)

> +		(set-buffer-modified-p t) (save-buffer)))

Hmm.  Would `set-file-times' do the job?

> +	    (unless visiting (kill-buffer buf)))
> +	  t))))

You can now kill-buffer, but for all cases.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-08-14  9:13 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-12 13:09 [PATCH] Fix and optimize publishing cache check Matt Lundin
2014-08-13  4:32 ` Matt Lundin
2014-08-14  9:13   ` Nicolas Goaziou

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).