emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* [PATCH] Merge loaded org-persist index with index file contents
@ 2022-12-11  7:59 Timothy
  2022-12-11 10:07 ` Max Nikulin
  2022-12-14 15:07 ` Timothy
  0 siblings, 2 replies; 4+ messages in thread
From: Timothy @ 2022-12-11  7:59 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]

Hi All,

I’ve recently pushed a batch of improvements/fixes to org-persist, particularly
with the handling of url containers. Expect to actually see caching of remote
images now 🙂.

I’ve also noticed that the org-persist index is written when Emacs closes, and
so this sequence of event can happen:

1. Initial empty index file
2. Start Emacs session #1, add 20 entries to the index
3. Start Emacs session #2, add 1 entry to the index
4. Close Emacs session #1
   ⁃ The index file now has 20 entries in it
5. Close Emacs session #2
   ⁃ The index file now has 1 entry in it

I think ideally at the end we’d have 21 entries in the index. So, I’ve added a
simple merge function that checks to see if the index file has been updated
since loading, and if so just grabs new index entries from the file. We don’t
bother trying to merge modified entries.

How does this sound?

All the best,
Timothy

-- 
Timothy (‘tecosaur’/‘TEC’), Org mode contributor.
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/tec>.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-org-persist-Merge-index-with-index-file-content.patch --]
[-- Type: text/x-patch, Size: 4898 bytes --]

From 6a12fca24f1b89129424b8fc2902719f5e053832 Mon Sep 17 00:00:00 2001
From: TEC <git@tecosaur.net>
Date: Sat, 10 Dec 2022 23:53:44 +0800
Subject: [PATCH] org-persist: Merge index with index file content

* lisp/org-persist.el (org-persist-write, org-persist-load,
org-persist--index-age): Check if the index file has been externally
updated since loading, and if so try to perform basic merging of the
current index file contents and the loaded index before performing GC or
overwriting the index file.
---
 lisp/org-persist.el | 53 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 7 deletions(-)

diff --git a/lisp/org-persist.el b/lisp/org-persist.el
index f215310a2..e9310d172 100644
--- a/lisp/org-persist.el
+++ b/lisp/org-persist.el
@@ -259,6 +259,9 @@ (defvar org-persist--index-hash nil
   "Hash table storing `org-persist--index'.  Used for quick access.
 They keys are conses of (container . associated).")
 
+(defvar org-persist--index-age nil
+  "The time at which the index was loaded, as given by `current-time'.")
+
 (defvar org-persist--report-time 0.5
   "Whether to report read/write time.
 
@@ -589,8 +592,9 @@ (defalias 'org-persist-load:file #'org-persist-read:file)
 (defun org-persist-load:index (container index-file _)
   "Load `org-persist--index' from INDEX-FILE according to CONTAINER."
   (unless org-persist--index
-    (setq org-persist--index (org-persist-read:index container index-file nil))
-    (setq org-persist--index-hash nil)
+    (setq org-persist--index (org-persist-read:index container index-file nil)
+          org-persist--index-hash nil
+          org-persist--index-age (current-time))
     (if org-persist--index
         (mapc (lambda (collection) (org-persist--add-to-index collection 'hash)) org-persist--index)
       (setq org-persist--index nil)
@@ -690,17 +694,49 @@ (defun org-persist-write:index (container _)
         (message "Missing write access rights to org-persist-directory: %S"
                  org-persist-directory))))
   (when (file-exists-p org-persist-directory)
-    (org-persist--write-elisp-file
-     (org-file-name-concat org-persist-directory org-persist-index-file)
-     org-persist--index
-     t t)
-    (org-file-name-concat org-persist-directory org-persist-index-file)))
+    (let ((index-file
+           (org-file-name-concat org-persist-directory org-persist-index-file)))
+      (org-persist--merge-index-with-disk)
+      (org-persist--write-elisp-file index-file org-persist--index t t)
+      (setq org-persist--index-age (current-time))
+      index-file)))
 
 (defun org-persist--save-index ()
   "Save `org-persist--index'."
   (org-persist-write:index
    `(index ,org-persist--storage-version) nil))
 
+(defun org-persist--merge-index-with-disk ()
+  "Merge `org-persist--index' with the current index file on disk."
+  (let* ((index-file
+          (org-file-name-concat org-persist-directory org-persist-index-file))
+         (disk-index
+          (and (file-exists-p index-file)
+               (org-file-newer-than-p index-file org-persist--index-age)
+               (org-persist-read:index `(index ,org-persist--storage-version) index-file nil)))
+         (combined-index
+          (org-persist--merge-index org-persist--index disk-index)))
+    (when disk-index
+      (setq org-persist--index combined-index
+            org-persist--index-age (current-time)))))
+
+(defun org-persist--merge-index (base other)
+  "Attempt to merge new index items in OTHER into BASE.
+Items with different details are considered too difficult, and skipped."
+  (if other
+      (let ((new (cl-set-difference other base :test #'equal))
+            (base-files (mapcar (lambda (s) (plist-get s :persist-file)) base))
+            (combined (reverse base)))
+        (dolist (item (nreverse new))
+          (unless (or (memq 'index (mapcar #'car (plist-get collection :container)))
+                      (not (file-exists-p
+                            (org-file-name-concat org-persist-directory
+                                                  (plist-get item :persist-file))))
+                      (member (plist-get item :persist-file) base-files))
+            (push item combined)))
+        (nreverse combined))
+    base))
+
 ;;;; Public API
 
 (cl-defun org-persist-register (container &optional associated &rest misc
@@ -951,6 +987,9 @@ (defun org-persist-associated-files:url (container collection)
 (defun org-persist-gc ()
   "Remove expired or unregistered containers and orphaned files.
 Also, remove containers associated with non-existing files."
+  (if org-persist--index
+      (org-persist--merge-index-with-disk)
+    (org-persist--load-index))
   (unless (and org-persist-disable-when-emacs-Q
                ;; FIXME: This is relying on undocumented fact that
                ;; Emacs sets `user-init-file' to nil when loaded with
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] Merge loaded org-persist index with index file contents
  2022-12-11  7:59 [PATCH] Merge loaded org-persist index with index file contents Timothy
@ 2022-12-11 10:07 ` Max Nikulin
  2022-12-11 11:26   ` Timothy
  2022-12-14 15:07 ` Timothy
  1 sibling, 1 reply; 4+ messages in thread
From: Max Nikulin @ 2022-12-11 10:07 UTC (permalink / raw)
  To: emacs-orgmode

On 11/12/2022 14:59, Timothy wrote:
> +      (setq org-persist--index combined-index
> +            org-persist--index-age (current-time)))))

Please, avoid mixing of system clock and filesystem timestamps.

     (file-attribute-modification-time (file-attributes file))

should be more reliable. See the `org-file-newer-than-p' docstring. In 
general, I would prefer to avoid relying on timestamps at all, but I am 
not sure if it is possible to implement in elisp with reasonable 
efforts. The idea is to save into file header a hash of content (or a 
random number). To check if file has not been modified, just header is 
read at first. If hash does not match the value stored in memory then it 
is necessary to read the whole file.

Another point that I am unsure is if Emacs ensures file locks. If one 
emacs process writes disk cache file then attempts to read the same file 
by other emacs instances must be postponed.

Cooperation in respect to disk cache would be an improvement, but it may 
be tricky to implement it reliably.



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Merge loaded org-persist index with index file contents
  2022-12-11 10:07 ` Max Nikulin
@ 2022-12-11 11:26   ` Timothy
  0 siblings, 0 replies; 4+ messages in thread
From: Timothy @ 2022-12-11 11:26 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1831 bytes --]

Hi Max,

> Please, avoid mixing of system clock and filesystem timestamps.
>
>     (file-attribute-modification-time (file-attributes file))
>
> should be more reliable.

Thanks for the tip, I’ve updated the patch to use this.

> In general, I would prefer to avoid relying on timestamps at all, but I am not
> sure if it is possible to implement in elisp with reasonable efforts. The idea
> is to save into file header a hash of content (or a random number). To check
> if file has not been modified, just header is read at first. If hash does not
> match the value stored in memory then it is necessary to read the whole file.

The current approach is certainly sub-optimal, but it’s also simple and unlikely
to cause any issues in practice.

> Another point that I am unsure is if Emacs ensures file locks. If one emacs
> process writes disk cache file then attempts to read the same file by other
> emacs instances must be postponed.

I’m not sure, but you’d need two Emacs instances to try reading/writing the
index file at near-/exactly/ the same time, which seems vanishingly unlikely.

> Cooperation in respect to disk cache would be an improvement, but it may be
> tricky to implement it reliably.

We could just write more often and run `org-persist--merge-index-with-disk' with
some heuristic during operation (if the file is unmodified since it was last
checked nothing will happen). However, I think that’s probably best left to
another conversation — let’s not allow perfect to be the enemy of good here 🙂.

All the best,
Timothy

-- 
Timothy (‘tecosaur’/‘TEC’), Org mode contributor.
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/tec>.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Merge loaded org-persist index with index file contents
  2022-12-11  7:59 [PATCH] Merge loaded org-persist index with index file contents Timothy
  2022-12-11 10:07 ` Max Nikulin
@ 2022-12-14 15:07 ` Timothy
  1 sibling, 0 replies; 4+ messages in thread
From: Timothy @ 2022-12-14 15:07 UTC (permalink / raw)
  To: Timothy; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 449 bytes --]

Hi All,

Since there have been no further comments, I’ve just pushed a tweaked version of
this patch (with Max’s comments addressed and with a minor oversight fixed) as
555dacfa8.

All the best,
Timothy

-- 
Timothy (‘tecosaur’/‘TEC’), Org mode contributor.
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/tec>.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-12-14 15:21 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-11  7:59 [PATCH] Merge loaded org-persist index with index file contents Timothy
2022-12-11 10:07 ` Max Nikulin
2022-12-11 11:26   ` Timothy
2022-12-14 15:07 ` Timothy

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).