emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: Max Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: [PATCH] Re: tangle option to not write a file with same contents?
Date: Tue, 17 May 2022 22:39:30 +0700	[thread overview]
Message-ID: <t60fjj$s6v$1@ciao.gmane.io> (raw)
In-Reply-To: <878rrcws6q.fsf@localhost>

On 08/05/2022 11:42, Ihor Radchenko wrote:
>> On 28/10/2021 11:04, Greg Minshall wrote:
>>> i wonder if it would be reasonable to add an option such that, when
>>> tangling, `org-babel-tangle` would not write a file with the
>>> already-existing contents of the target file?
> The patch is attached.
> On SSD, when tangling into ~200 files, the patch speeds up tangling
> by almost 2x: before 7.6 sec; after 4.4 sec.

By mistake I sent my reply to Ihor off-list, so a part of discussion is 
missed in the list archives. The only excuse is that a copy of message 
received as Cc does not have List-Post header, so reply action works as 
for private messages.

Ihor Radchenko. [BUG] org-babel-load-file can not compile file. Fri, 13 
May 2022 18:38:14 +0800. https://list.orgmode.org/878rr5ra3t.fsf@localhost

This is a patch from another thread that should be a part of this change.

> diff --git a/lisp/org.el b/lisp/org.el
> index 47a16e94b..09a001414 100644
> --- a/lisp/org.el
> +++ b/lisp/org.el
> @@ -256,6 +256,11 @@ (defun org-babel-load-file (file &optional compile)
>  	     tangled-file
>  	     (file-attribute-modification-time
>  	      (file-attributes (file-truename file))))
> +      ;; Make sure that tangled file modification time is
> +      ;; updated even when `org-babel-tangle-file' does not make changes.
> +      ;; This avoids re-tangling changed FILE where the changes did
> +      ;; not affect the tangled code.
> +      (set-file-times tangled-file)
>        (org-babel-tangle-file file
>                               tangled-file
>                               (rx string-start

`set-file-times' signals if the file does not exist, so I expect that 
the call should be after `org-babel-tangle-file' otherwise first 
invocation for a new org file will fail. I would prefer to avoid 
touching the tangled file at all, but it makes impossible to check if 
the file is up to date (at least without saving hashes somewhere, that 
is unnecessary complication here). With optimizing of writhing of the 
tangled file overall behavior is rather close to original approach, so 
`set-file-times' should be OK.

Ihor Radchenko, off-list [PATCH v2] Re: tangle option to not write a 
file with same contents? Mon, 09 May 2022 21:22:55 +0800.

> diff --git a/lisp/ob-tangle.el b/lisp/ob-tangle.el
> index 16d938afb..76243f83f 100644
> --- a/lisp/ob-tangle.el
> +++ b/lisp/ob-tangle.el
> @@ -282,11 +282,24 @@ (defun org-babel-tangle (&optional arg target-file lang-re)
>  		    lspecs)
>  		   (when make-dir
>  		     (make-directory fnd 'parents))
> -                   ;; erase previous file
> -                   (when (file-exists-p file-name)
> -                     (delete-file file-name))
> -		   (write-region nil nil file-name)
> -		   (mapc (lambda (mode) (set-file-modes file-name mode)) modes)
> +                   (unless
> +                       (when (file-exists-p file-name)
> +                         (let ((tangle-buf (current-buffer)))
> +                           (with-temp-buffer
> +                             (insert-file-contents file-name)
> +                             (and
> +                              (equal (buffer-size)
> +                                     (buffer-size tangle-buf))
> +                              (= 0
> +                                 (let (case-fold-search)
> +                                   (compare-buffer-substrings
> +                                    nil nil nil
> +                                    tangle-buf nil nil)))))))
> +                     ;; erase previous file
> +                     (when (file-exists-p file-name)
> +                       (delete-file file-name))
> +		     (write-region nil nil file-name)
> +		     (mapc (lambda (mode) (set-file-modes file-name mode)) modes))
>                     (push file-name path-collector))))))
>  	 (if (equal arg '(4))
>  	     (org-babel-tangle-single-block 1 t)

I do not like (unless (when ...)) composition. If I remember correctly, 
`when' should be used for side effects, so `and' may be more suitable 
here. Otherwise it looks like what Greg suggested and should work faster 
than first variant of this patch.

My fault caused significant delay, so feel free to ignore comments below.

I still had a hope that `org-babel-load-file' might be improved a bit by 
using `byte-recompile-file' with 0 passed for ARG (previously I 
incorrectly wrote FORCE). The goal is to avoid recompiling the tangled 
.el file if it is not changed.

I am still curious if it is reliable to compare file size from 
`file-attributes' with (+ 1 (bufferpos-to-filepos (buffer-size))) for 
tangle result prior to loading existing file. I am unsure due to 
variations in encodings and newline formats, however it might further 
improve performance then tangle result changes.

I have noticed that `org-babel-tangle-file' may create empty org file if 
it does not exist. From my point of view it is questionable behavior.

Finally some comments on performance numbers. Ihor, your test simulates 
iterative debugging. Tangle results were likely in disk caches. Another 
case may give different numbers. Consider single pass after small 
modification of the source .org file. For comparison existing files are 
mostly should be loaded from disk. I did not mean disabling disk caches 
completely. After tangling it may take some time till files are actually 
written to disk. I am unsure if during repetitive benchmarking some 
files may be replaced in caches without writing to disk at all, likely 
timeout for dirty cache pages is small enough.

Outline of more fair performance test (however I do not think that such 
accuracy is really required):
- purge disk caches, so earlier tangled files have to be loaded from disk
- tangle
- flush caches (sync) to complete cycle.

And of course, tangling to single large file is not the same as multiple 
small ones.

Leaving aside further changes and details of benchmarking, I hope these 
2 patches will improve experience for make users and will not break 
anything in Org.

  reply	other threads:[~2022-05-17 15:40 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-28  4:04 tangle option to not write a file with same contents? Greg Minshall
2021-10-29 16:21 ` Max Nikulin
2021-10-29 17:58   ` Greg Minshall
2021-10-30 15:13     ` Max Nikulin
2021-10-30 16:13       ` Greg Minshall
2022-05-07  8:05 ` Max Nikulin
2022-05-08  4:42   ` [PATCH] " Ihor Radchenko
2022-05-17 15:39     ` Max Nikulin [this message]
2022-05-30  3:14       ` Ihor Radchenko
2022-05-31 16:07         ` Max Nikulin
2022-06-03  7:04           ` Ihor Radchenko
2022-06-07  3:47             ` Tom Gillespie
2022-06-01 13:18     ` Greg Minshall
2022-09-12 17:36       ` Org version mismatch -- hooray! Greg Minshall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='t60fjj$s6v$1@ciao.gmane.io' \
    --to=manikulin@gmail.com \
    --cc=emacs-orgmode@gnu.org \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).