emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* tangle option to not write a file with same contents?
@ 2021-10-28  4:04 Greg Minshall
  2021-10-29 16:21 ` Max Nikulin
  2022-05-07  8:05 ` Max Nikulin
  0 siblings, 2 replies; 13+ messages in thread
From: Greg Minshall @ 2021-10-28  4:04 UTC (permalink / raw)
  To: Org Mode

hi.

i wonder if it would be reasonable to add an option such that, when
tangling, `org-babel-tangle` would not write a file with the
already-existing contents of the target file?

this would be helpful, e.g., for those of us who use make(1)-based work
flows.

then, if this might generally be thought useful, i wonder if this should
be implemented as specifically this, or whether we might implement a
callback at the appropriate point in `org-babel-tangle` asking whether
or not to proceed.  (then, the user's callback routine could do the
comparison.)

if we do "specifically this", i would suggest that this comparison be
dead simple: read in the existing file's contents into some hidden
buffer, and use `compare-buffer-substrings` to compare point-{min,max}
of both.

cheers, Greg


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tangle option to not write a file with same contents?
  2021-10-28  4:04 tangle option to not write a file with same contents? Greg Minshall
@ 2021-10-29 16:21 ` Max Nikulin
  2021-10-29 17:58   ` Greg Minshall
  2022-05-07  8:05 ` Max Nikulin
  1 sibling, 1 reply; 13+ messages in thread
From: Max Nikulin @ 2021-10-29 16:21 UTC (permalink / raw)
  To: emacs-orgmode

On 28/10/2021 11:04, Greg Minshall wrote:
> 
> i wonder if it would be reasonable to add an option such that, when
> tangling, `org-babel-tangle` would not write a file with the
> already-existing contents of the target file?

Are you going to celebrate a decade of the feature request?

https://list.orgmode.org/orgmode/CAFDswJtMzFZgRQqsFtiQNT7c1uFE7HV4kek5j9akx1yAdWZZEg@mail.gmail.com/T/#u
Holger Hoefling @ 2011-11-18 13:17 UTC
Not overwriting unchanged source code files when tangling

> this would be helpful, e.g., for those of us who use make(1)-based work
> flows.

I agree with you, make is wide spread and fast tool that is really 
convenient in simple cases.

Some hash-based build systems are mentioned in that thread. Since that 
time more more similar tools have appeared, e.g. buck, reimplementations 
of DJB's redo https://cr.yp.to/redo.html

> then, if this might generally be thought useful, i wonder if this should
> be implemented as specifically this, or whether we might implement a
> callback at the appropriate point in `org-babel-tangle` asking whether
> or not to proceed.  (then, the user's callback routine could do the
> comparison.)
> 
> if we do "specifically this", i would suggest that this comparison be
> dead simple: read in the existing file's contents into some hidden
> buffer, and use `compare-buffer-substrings` to compare point-{min,max}
> of both.

Tom Gillespie mentioned that it is easy to lost modifications of tangled 
files created during debugging.
https://list.orgmode.org/orgmode/CA+G3_PNo8i8U9rOrC8BBydjEv9=LKgtJBopwX9J6vyNXdnjtXg@mail.gmail.com/T/#u

I think, some header may be added to tangled file containing hash of 
rest part of file. So the file may be checked for user modifications 
before overwriting, that hash may be compared with the new buffer to 
keep existing file in place.

It seems `compare-buffer-substrings` has more logic than just byte to 
byte comparison. Is it to handle alternatives for unicode character 
alternatives? For tangled buffer it should be size that is checked first...



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tangle option to not write a file with same contents?
  2021-10-29 16:21 ` Max Nikulin
@ 2021-10-29 17:58   ` Greg Minshall
  2021-10-30 15:13     ` Max Nikulin
  0 siblings, 1 reply; 13+ messages in thread
From: Greg Minshall @ 2021-10-29 17:58 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max,

> > i wonder if it would be reasonable to add an option such that, when
> > tangling, `org-babel-tangle` would not write a file with the
> > already-existing contents of the target file?
> 
> Are you going to celebrate a decade of the feature request?
> I agree with you, make is wide spread and fast tool that is really
> convenient in simple cases.

hah -- i should have waited a few weeks!

> Some hash-based build systems are mentioned in that thread. Since that
> time more more similar tools have appeared, e.g. buck,
> reimplementations of DJB's redo https://cr.yp.to/redo.html

i think different people will settle on different build tools.

> It seems `compare-buffer-substrings` has more logic than just byte to
> byte comparison. Is it to handle alternatives for unicode character
> alternatives? For tangled buffer it should be size that is checked
> first...

you are right, it definitely makes sense to look first at size.  (which
is what, e.g., rsync(1) does.)  also, probably i needn't have mentioned
`compare-buffer-substrings` -- i was really just trying to suggest
"simple" (which maybe i anti-did?).

cheers, Greg


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tangle option to not write a file with same contents?
  2021-10-29 17:58   ` Greg Minshall
@ 2021-10-30 15:13     ` Max Nikulin
  2021-10-30 16:13       ` Greg Minshall
  0 siblings, 1 reply; 13+ messages in thread
From: Max Nikulin @ 2021-10-30 15:13 UTC (permalink / raw)
  To: emacs-orgmode

On 30/10/2021 00:58, Greg Minshall wrote:
> 
>> Some hash-based build systems are mentioned in that thread. Since that
>> time more more similar tools have appeared, e.g. buck,
>> reimplementations of DJB's redo https://cr.yp.to/redo.html
> 
> i think different people will settle on different build tools.

Greg, I see your point and often I am not happy to change established 
workflow as well. Partially a reason is that it requires some efforts. 
This particular issue should be handled in Org code. (Unfortunately it 
requires some efforts as well.) On the other hand, it may be treated in 
a more general way by external hash&cache build tool.

Actually I have no suggestion concerning particular build system. E.g. 
buck is too heavy (python+java), and my experience is not purely positive.

>> It seems `compare-buffer-substrings` has more logic than just byte to
>> byte comparison. Is it to handle alternatives for unicode character
>> alternatives? For tangled buffer it should be size that is checked
>> first...
> 
> you are right, it definitely makes sense to look first at size.  (which
> is what, e.g., rsync(1) does.)  also, probably i needn't have mentioned
> `compare-buffer-substrings` -- i was really just trying to suggest
> "simple" (which maybe i anti-did?).

I think, `compare-buffer-substrings' is a good starting point. It is 
ready to use and I am not aware of obvious problems with it. (Can it 
happen that change of file encoding would be discarded since buffers are 
equal?) I just was curious whether the function performs size check. It 
does, but comparison is not identity test, so it is at the end of the 
function.

In the meanwhile I realized that check for modification by user should 
be performed *before* tangle, and hash to detect changes is appropriate 
for such purpose. I think, a copy of tangled file just to detect 
modification will cause some tension from users.

Comparison of earlier and current tangle results should be done at the 
end, so implementation should be independent. There is no point to use 
hash, size + byte to byte comparison is fast and reliable.

A subtle point partially discussed earlier is overwriting content of 
existing file vs. tangling to temporary file and atomic replacement. In 
most cases the latter is preferred. However if target file is open for 
debugging in an editor, content should be written to the existing file 
(preserving inode). It allows to preserve unsaved changes if the editor 
notifies user that file is modified.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tangle option to not write a file with same contents?
  2021-10-30 15:13     ` Max Nikulin
@ 2021-10-30 16:13       ` Greg Minshall
  0 siblings, 0 replies; 13+ messages in thread
From: Greg Minshall @ 2021-10-30 16:13 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max,

thanks for the reply.

> In the meanwhile I realized that check for modification by user should
> be performed *before* tangle, and hash to detect changes is
> appropriate for such purpose. I think, a copy of tangled file just to
> detect modification will cause some tension from users.

i guess you mean org's current "cache" stuff?

> Comparison of earlier and current tangle results should be done at the
> end, so implementation should be independent. There is no point to use 
> hash, size + byte to byte comparison is fast and reliable.

that makes sense.

cheers, Greg


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tangle option to not write a file with same contents?
  2021-10-28  4:04 tangle option to not write a file with same contents? Greg Minshall
  2021-10-29 16:21 ` Max Nikulin
@ 2022-05-07  8:05 ` Max Nikulin
  2022-05-08  4:42   ` [PATCH] " Ihor Radchenko
  1 sibling, 1 reply; 13+ messages in thread
From: Max Nikulin @ 2022-05-07  8:05 UTC (permalink / raw)
  To: Greg Minshall, Org Mode

A couple additional notes for the case that someone will find this 
thread in future.

On 28/10/2021 11:04, Greg Minshall wrote:
> 
> i wonder if it would be reasonable to add an option such that, when
> tangling, `org-babel-tangle` would not write a file with the
> already-existing contents of the target file?
> 
> this would be helpful, e.g., for those of us who use make(1)-based work
> flows.

It was not obvious for me earlier that it should be namely an *option*, 
not just change of behavior, since e.g. `org-babel-load-file' relies on 
timestamp comparison of the source .org file and the derived .el file. I 
am unsure concerning default value of such setting.

It was an issue with `org-file-newer-than-p' that reminded me about this 
thread.

P.S. Timestamp comparison is not always reliable to determine whether 
prerequisite has not been updated. Earlier python had timestamp of .py 
file saved in the header of compiled .pyc files, but at some moment they 
switched to hash.

https://peps.python.org/pep-0552/
PEP 552 – Deterministic pycs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] Re: tangle option to not write a file with same contents?
  2022-05-07  8:05 ` Max Nikulin
@ 2022-05-08  4:42   ` Ihor Radchenko
  2022-05-17 15:39     ` Max Nikulin
  2022-06-01 13:18     ` Greg Minshall
  0 siblings, 2 replies; 13+ messages in thread
From: Ihor Radchenko @ 2022-05-08  4:42 UTC (permalink / raw)
  To: Max Nikulin; +Cc: Greg Minshall, Org Mode

[-- Attachment #1: Type: text/plain, Size: 846 bytes --]

Max Nikulin <manikulin@gmail.com> writes:

> On 28/10/2021 11:04, Greg Minshall wrote:
>> 
>> i wonder if it would be reasonable to add an option such that, when
>> tangling, `org-babel-tangle` would not write a file with the
>> already-existing contents of the target file?
>> 
>> this would be helpful, e.g., for those of us who use make(1)-based work
>> flows.
>
> It was not obvious for me earlier that it should be namely an *option*, 
> not just change of behavior, since e.g. `org-babel-load-file' relies on 
> timestamp comparison of the source .org file and the derived .el file. I 
> am unsure concerning default value of such setting.

I agree that it should be the default behaviour.
The patch is attached.

On SSD, when tangling into ~200 files, the patch speeds up tangling
by almost 2x: before 7.6 sec; after 4.4 sec.

Best,
Ihor


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-org-tangle-Do-not-overwrite-when-contents-does-not-c.patch --]
[-- Type: text/x-patch, Size: 2212 bytes --]

From 68d90e73da17e423220211897ad1e86a4eb2e5a1 Mon Sep 17 00:00:00 2001
Message-Id: <68d90e73da17e423220211897ad1e86a4eb2e5a1.1651984776.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Sun, 8 May 2022 12:32:40 +0800
Subject: [PATCH] org-tangle: Do not overwrite when contents does not change

* lisp/ob-tangle.el (org-babel-tangle): Do not overwrite existing
tangled files if their contents is exactly the same as we are going to
write during tangle process.  This avoids unneeded disk writes and can
speed up tangling significantly when many small files are tangles from
a single .org source.

An example of performance improvement when tangling an .org file into
~200 files:
(benchmark-run 10 (org-babel-tangle))
Before the commit (on SSD): (76.33826743 8 11.551725374)
After the commit:           (43.628606052 4 5.751274237)
---
 lisp/ob-tangle.el | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/lisp/ob-tangle.el b/lisp/ob-tangle.el
index 16d938afb..c011bf662 100644
--- a/lisp/ob-tangle.el
+++ b/lisp/ob-tangle.el
@@ -282,11 +282,17 @@ (defun org-babel-tangle (&optional arg target-file lang-re)
 		    lspecs)
 		   (when make-dir
 		     (make-directory fnd 'parents))
-                   ;; erase previous file
-                   (when (file-exists-p file-name)
-                     (delete-file file-name))
-		   (write-region nil nil file-name)
-		   (mapc (lambda (mode) (set-file-modes file-name mode)) modes)
+                   (unless
+                       (let ((new-contents-hash (buffer-hash)))
+                         (with-temp-buffer
+                           (when (file-exists-p file-name)
+                             (insert-file-contents file-name))
+                           (equal (buffer-hash) new-contents-hash)))
+                     ;; erase previous file
+                     (when (file-exists-p file-name)
+                       (delete-file file-name))
+		     (write-region nil nil file-name)
+		     (mapc (lambda (mode) (set-file-modes file-name mode)) modes))
                    (push file-name path-collector))))))
 	 (if (equal arg '(4))
 	     (org-babel-tangle-single-block 1 t)
-- 
2.35.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Re: tangle option to not write a file with same contents?
  2022-05-08  4:42   ` [PATCH] " Ihor Radchenko
@ 2022-05-17 15:39     ` Max Nikulin
  2022-05-30  3:14       ` Ihor Radchenko
  2022-06-01 13:18     ` Greg Minshall
  1 sibling, 1 reply; 13+ messages in thread
From: Max Nikulin @ 2022-05-17 15:39 UTC (permalink / raw)
  To: emacs-orgmode

On 08/05/2022 11:42, Ihor Radchenko wrote:
>> On 28/10/2021 11:04, Greg Minshall wrote:
>>>
>>> i wonder if it would be reasonable to add an option such that, when
>>> tangling, `org-babel-tangle` would not write a file with the
>>> already-existing contents of the target file?
[...]
> The patch is attached.
> 
> On SSD, when tangling into ~200 files, the patch speeds up tangling
> by almost 2x: before 7.6 sec; after 4.4 sec.

By mistake I sent my reply to Ihor off-list, so a part of discussion is 
missed in the list archives. The only excuse is that a copy of message 
received as Cc does not have List-Post header, so reply action works as 
for private messages.

Ihor Radchenko. [BUG] org-babel-load-file can not compile file. Fri, 13 
May 2022 18:38:14 +0800. https://list.orgmode.org/878rr5ra3t.fsf@localhost

This is a patch from another thread that should be a part of this change.

> diff --git a/lisp/org.el b/lisp/org.el
> index 47a16e94b..09a001414 100644
> --- a/lisp/org.el
> +++ b/lisp/org.el
> @@ -256,6 +256,11 @@ (defun org-babel-load-file (file &optional compile)
>  	     tangled-file
>  	     (file-attribute-modification-time
>  	      (file-attributes (file-truename file))))
> +      ;; Make sure that tangled file modification time is
> +      ;; updated even when `org-babel-tangle-file' does not make changes.
> +      ;; This avoids re-tangling changed FILE where the changes did
> +      ;; not affect the tangled code.
> +      (set-file-times tangled-file)
>        (org-babel-tangle-file file
>                               tangled-file
>                               (rx string-start

`set-file-times' signals if the file does not exist, so I expect that 
the call should be after `org-babel-tangle-file' otherwise first 
invocation for a new org file will fail. I would prefer to avoid 
touching the tangled file at all, but it makes impossible to check if 
the file is up to date (at least without saving hashes somewhere, that 
is unnecessary complication here). With optimizing of writhing of the 
tangled file overall behavior is rather close to original approach, so 
`set-file-times' should be OK.

Ihor Radchenko, off-list [PATCH v2] Re: tangle option to not write a 
file with same contents? Mon, 09 May 2022 21:22:55 +0800.

> diff --git a/lisp/ob-tangle.el b/lisp/ob-tangle.el
> index 16d938afb..76243f83f 100644
> --- a/lisp/ob-tangle.el
> +++ b/lisp/ob-tangle.el
> @@ -282,11 +282,24 @@ (defun org-babel-tangle (&optional arg target-file lang-re)
>  		    lspecs)
>  		   (when make-dir
>  		     (make-directory fnd 'parents))
> -                   ;; erase previous file
> -                   (when (file-exists-p file-name)
> -                     (delete-file file-name))
> -		   (write-region nil nil file-name)
> -		   (mapc (lambda (mode) (set-file-modes file-name mode)) modes)
> +                   (unless
> +                       (when (file-exists-p file-name)
> +                         (let ((tangle-buf (current-buffer)))
> +                           (with-temp-buffer
> +                             (insert-file-contents file-name)
> +                             (and
> +                              (equal (buffer-size)
> +                                     (buffer-size tangle-buf))
> +                              (= 0
> +                                 (let (case-fold-search)
> +                                   (compare-buffer-substrings
> +                                    nil nil nil
> +                                    tangle-buf nil nil)))))))
> +                     ;; erase previous file
> +                     (when (file-exists-p file-name)
> +                       (delete-file file-name))
> +		     (write-region nil nil file-name)
> +		     (mapc (lambda (mode) (set-file-modes file-name mode)) modes))
>                     (push file-name path-collector))))))
>  	 (if (equal arg '(4))
>  	     (org-babel-tangle-single-block 1 t)

I do not like (unless (when ...)) composition. If I remember correctly, 
`when' should be used for side effects, so `and' may be more suitable 
here. Otherwise it looks like what Greg suggested and should work faster 
than first variant of this patch.

My fault caused significant delay, so feel free to ignore comments below.

I still had a hope that `org-babel-load-file' might be improved a bit by 
using `byte-recompile-file' with 0 passed for ARG (previously I 
incorrectly wrote FORCE). The goal is to avoid recompiling the tangled 
.el file if it is not changed.

I am still curious if it is reliable to compare file size from 
`file-attributes' with (+ 1 (bufferpos-to-filepos (buffer-size))) for 
tangle result prior to loading existing file. I am unsure due to 
variations in encodings and newline formats, however it might further 
improve performance then tangle result changes.

I have noticed that `org-babel-tangle-file' may create empty org file if 
it does not exist. From my point of view it is questionable behavior.

Finally some comments on performance numbers. Ihor, your test simulates 
iterative debugging. Tangle results were likely in disk caches. Another 
case may give different numbers. Consider single pass after small 
modification of the source .org file. For comparison existing files are 
mostly should be loaded from disk. I did not mean disabling disk caches 
completely. After tangling it may take some time till files are actually 
written to disk. I am unsure if during repetitive benchmarking some 
files may be replaced in caches without writing to disk at all, likely 
timeout for dirty cache pages is small enough.

Outline of more fair performance test (however I do not think that such 
accuracy is really required):
- purge disk caches, so earlier tangled files have to be loaded from disk
- tangle
- flush caches (sync) to complete cycle.

And of course, tangling to single large file is not the same as multiple 
small ones.

Leaving aside further changes and details of benchmarking, I hope these 
2 patches will improve experience for make users and will not break 
anything in Org.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Re: tangle option to not write a file with same contents?
  2022-05-17 15:39     ` Max Nikulin
@ 2022-05-30  3:14       ` Ihor Radchenko
  2022-05-31 16:07         ` Max Nikulin
  0 siblings, 1 reply; 13+ messages in thread
From: Ihor Radchenko @ 2022-05-30  3:14 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

I applied the discussed two patches onto main via 1525a5a64 and
f6f26d4ce. The suggested amendments were incorporated.

> I do not like (unless (when ...)) composition. If I remember correctly, 
> `when' should be used for side effects, so `and' may be more suitable 
> here. Otherwise it looks like what Greg suggested and should work faster 
> than first variant of this patch.

I did not see any documentation saying that when is for side effects.
However, or would do equivalent job there and also easier to read. So, I
changed when to or as suggested. 

> I still had a hope that `org-babel-load-file' might be improved a bit by 
> using `byte-recompile-file' with 0 passed for ARG (previously I 
> incorrectly wrote FORCE). The goal is to avoid recompiling the tangled 
> .el file if it is not changed.

Can you please open a new thread on this? People who did not keep track
of this thread might not notice this suggestion.

> I am still curious if it is reliable to compare file size from 
> `file-attributes' with (+ 1 (bufferpos-to-filepos (buffer-size))) for 
> tangle result prior to loading existing file. I am unsure due to 
> variations in encodings and newline formats, however it might further 
> improve performance then tangle result changes.

In my testing, I did not notice any significant difference. Given than
bufferpos-to-filepos can be tricky and sometimes return nil (see the
docstring on QUALITY arg), I do not think that it is worth bothering.

> I have noticed that `org-babel-tangle-file' may create empty org file if 
> it does not exist. From my point of view it is questionable behavior.

Fixed. Now, set-file-times call is guarded by file-exists-p check.

> Finally some comments on performance numbers. Ihor, your test simulates 
> iterative debugging. Tangle results were likely in disk caches. Another 
> case may give different numbers. Consider single pass after small 
> modification of the source .org file. For comparison existing files are 
> mostly should be loaded from disk. I did not mean disabling disk caches 
> completely. After tangling it may take some time till files are actually 
> written to disk. I am unsure if during repetitive benchmarking some 
> files may be replaced in caches without writing to disk at all, likely 
> timeout for dirty cache pages is small enough.

My benchmark was for the best-case scenario for the proposed patch.

Everything is tangled to files prior to running the benchmark. This way,
none of the tangled files should change when calling org-babel-tangle
and no disk caches should be involved.

The benchmark without the patch, would write to disks. If your concern
is correct and no actual disk writing happens due to disk caching, the
benchmark time provided in the commit message should be a lower bound of
the actual time. Then, we don't care. The existing benchmark already
illustrates that the patch can reduce tangle time significantly.

Best,
Ihor



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Re: tangle option to not write a file with same contents?
  2022-05-30  3:14       ` Ihor Radchenko
@ 2022-05-31 16:07         ` Max Nikulin
  2022-06-03  7:04           ` Ihor Radchenko
  0 siblings, 1 reply; 13+ messages in thread
From: Max Nikulin @ 2022-05-31 16:07 UTC (permalink / raw)
  To: emacs-orgmode

On 30/05/2022 10:14, Ihor Radchenko wrote:
> I applied the discussed two patches onto main via 1525a5a64 and
> f6f26d4ce. The suggested amendments were incorporated.

So Greg's feature request is implemented and it is great.

I am less confident with `org-babel-load-file' though. Due to poor error 
handling in `org-babel-tangle' & Co., I am unsure that I can provide an 
example when new code incorrectly updates file modification time despite 
tangle failure, but I do not like that modification time is changed 
before attempt to tangle. Generally I expected that it is performed 
after tangling, perhaps with check that `org-babel-tangle-file' returned 
expected file name (as some hope for success).

>> I do not like (unless (when ...)) composition. If I remember correctly,
>> `when' should be used for side effects, so `and' may be more suitable
>> here. Otherwise it looks like what Greg suggested and should work faster
>> than first variant of this patch.
> 
> I did not see any documentation saying that when is for side effects.
> However, or would do equivalent job there and also easier to read. So, I
> changed when to or as suggested.

I have realized that the case was not exactly the same when Nicolas 
asked for a patch correction:

Nicolas Goaziou to emacs-orgmode. Re: [PATCH] Fix moving cursor in 
org-set-tags-command. Fri, 08 May 2020 09:53:55 +0200.
https://list.orgmode.org/87ftca6ewc.fsf@nicolasgoaziou.fr

"Please replace `and' with `when' if side-effects are involved."

>> I am still curious if it is reliable to compare file size from
>> `file-attributes' with (+ 1 (bufferpos-to-filepos (buffer-size))) for
>> tangle result prior to loading existing file. I am unsure due to
>> variations in encodings and newline formats, however it might further
>> improve performance then tangle result changes.
> 
> In my testing, I did not notice any significant difference.

It is because you tangle a lot of files most of them are not changed. 
Otherwise it is possible to avoid loading of large enough file just 
because file size is different in comparison to the tangle result.

> Given than
> bufferpos-to-filepos can be tricky and sometimes return nil (see the
> docstring on QUALITY arg), I do not think that it is worth bothering.

It seems I was wrong that buf_charpos_to_bytepos has a quick way to get 
byte when end of buffer is passed as the argument. So improvement of 
performance may be significantly less than I initially expected. Let's 
leave it as is.

> Everything is tangled to files prior to running the benchmark. This way,
> none of the tangled files should change when calling org-babel-tangle
> and no disk caches should be involved.

I am trying to say that in your benchmark file are not read from disk, 
almost certainly they are cached in memory. Besides write cache, while 
reading, content may be taken from cache as well.

Actually I am just speculating, while you have benchmark result. It is 
unlikely that I will decide to spend some time arranging another test to 
convince you with numbers, priority of such activity is not high enough.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Re: tangle option to not write a file with same contents?
  2022-05-08  4:42   ` [PATCH] " Ihor Radchenko
  2022-05-17 15:39     ` Max Nikulin
@ 2022-06-01 13:18     ` Greg Minshall
  1 sibling, 0 replies; 13+ messages in thread
From: Greg Minshall @ 2022-06-01 13:18 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Max Nikulin, Org Mode

Ihor,

> The patch is attached.

great -- thanks very much!  this will be very helpful.

cheers, Greg


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Re: tangle option to not write a file with same contents?
  2022-05-31 16:07         ` Max Nikulin
@ 2022-06-03  7:04           ` Ihor Radchenko
  2022-06-07  3:47             ` Tom Gillespie
  0 siblings, 1 reply; 13+ messages in thread
From: Ihor Radchenko @ 2022-06-03  7:04 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> On 30/05/2022 10:14, Ihor Radchenko wrote:
>> I applied the discussed two patches onto main via 1525a5a64 and
>> f6f26d4ce. The suggested amendments were incorporated.
>
> So Greg's feature request is implemented and it is great.
>
> I am less confident with `org-babel-load-file' though. Due to poor error 
> handling in `org-babel-tangle' & Co., I am unsure that I can provide an 
> example when new code incorrectly updates file modification time despite 
> tangle failure, but I do not like that modification time is changed 
> before attempt to tangle. Generally I expected that it is performed 
> after tangling, perhaps with check that `org-babel-tangle-file' returned 
> expected file name (as some hope for success).

I agree that updating the modification time after tangling will be
slightly more robust in common cases. Though not necessarily strictly
better.

Handling errors is challenging. org-babel-tangle potentially writes to
multiple files in unspecified order. Tangle error may happen in the
middle of tangling when some files are written and some are not. Those
tangled files could also depend on each other during loading (think of
config.org tangling into init.el and config.el one requiring another).
Then, an error during tangling can result in some files being tangled to
newer versions and some still having old versions.

Now, consider that config.org is tangled to config.el and then to
init.el with the latter tangling process failing. Next time we call
org-babel-load-file, the updated config.el will be loaded despite error
and incorrectly require the old init.el.

Another case is when init.el is tangled first and config.el tangling
fails. This time, it matters whether we update the modification time of
config.el (before of after tangling). In our current code (with my
patch), the older version of config.el will be touched despite tangling
error and next time old config.el + new init.el will be loaded.

The existing code will create problems in both the described cases.
If we update file modification time only after tangling succeeds without
error, only the first case will be problematic.

So, your proposal will indeed be more robust in a sense that it will
prevent _some_ problems when tangling fails.

However, I see the "robustness" as a bad thing here. org-babel-load will
sometimes fail and sometimes not depending on the where in config.org
the error appears. I see it as a potential nightmare for debugging.

Ideally, we should solve the problem with tangle errors in some other
way. I am not sure how (or rather haven't spent much time thinking).

Ideas are welcome!

> I have realized that the case was not exactly the same when Nicolas 
> asked for a patch correction:
>
> Nicolas Goaziou to emacs-orgmode. Re: [PATCH] Fix moving cursor in 
> org-set-tags-command. Fri, 08 May 2020 09:53:55 +0200.
> https://list.orgmode.org/87ftca6ewc.fsf@nicolasgoaziou.fr
>
> "Please replace `and' with `when' if side-effects are involved."

I see. `and' is indeed considered a logical statement and should not
normally be abused for side-effects. But it does not imply the opposite
for `when'.

>> Everything is tangled to files prior to running the benchmark. This way,
>> none of the tangled files should change when calling org-babel-tangle
>> and no disk caches should be involved.
>
> I am trying to say that in your benchmark file are not read from disk, 
> almost certainly they are cached in memory. Besides write cache, while 
> reading, content may be taken from cache as well.
>
> Actually I am just speculating, while you have benchmark result. It is 
> unlikely that I will decide to spend some time arranging another test to 
> convince you with numbers, priority of such activity is not high enough.

Let me clarify. I am looking at this benchmark from a broader
perspective:
- We want the patch to improve performance.
- We don't want the patch to degrade performance.

1. My benchmark certainly shows that there is an improvement in some
   cases. It is something we want.

2. My benchmark may not be accurate depending on the system state. It is
   also fine. Just shows that improvement might not happen in 100% of
   cases.

3. Is my benchmark degrading performance? I do not see how and I do not
   see how your observations prove otherwise. So, unless you can see how
   my patch degrades performance on some systems, I do not see any
   reason to perform more accurate benchmarking other than for science.

Best,
Ihor
   


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Re: tangle option to not write a file with same contents?
  2022-06-03  7:04           ` Ihor Radchenko
@ 2022-06-07  3:47             ` Tom Gillespie
  0 siblings, 0 replies; 13+ messages in thread
From: Tom Gillespie @ 2022-06-07  3:47 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode

I can report that with the current changes in the tree
I see some nice performance improvements in files
where I have large numbers of blocks where I modify
a subset of them (beyond a single case where C-u
C-c C-v C-t works) and then retangle the whole file.
Best,
Tom


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-06-07  3:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-28  4:04 tangle option to not write a file with same contents? Greg Minshall
2021-10-29 16:21 ` Max Nikulin
2021-10-29 17:58   ` Greg Minshall
2021-10-30 15:13     ` Max Nikulin
2021-10-30 16:13       ` Greg Minshall
2022-05-07  8:05 ` Max Nikulin
2022-05-08  4:42   ` [PATCH] " Ihor Radchenko
2022-05-17 15:39     ` Max Nikulin
2022-05-30  3:14       ` Ihor Radchenko
2022-05-31 16:07         ` Max Nikulin
2022-06-03  7:04           ` Ihor Radchenko
2022-06-07  3:47             ` Tom Gillespie
2022-06-01 13:18     ` Greg Minshall

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).