emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* Protect Org export from auto-formatting hooks
@ 2022-02-25 16:29 David Lukeš
  2022-02-26  9:32 ` Nicolas Goaziou
  0 siblings, 1 reply; 7+ messages in thread
From: David Lukeš @ 2022-02-25 16:29 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 4498 bytes --]

Hi all,

I recently started using non-Pandoc Org exports to HTML and ODT and
noticed some problems on my macOS 12.2.1 box. Non-ASCII characters were
coming out garbled, the whitespace around footnotes was getting
rewritten in weird ways. On my Linux boxes, the same export commands
went fine, so I thought this must be related to some encoding
assumptions Org makes that are slightly different on the two platforms.

Armed with my very limited knowledge of Emacs, Org Mode and Elisp, I
decided to dig into the Org codebase and try to figure out where the
corruption was happening. However, this turned out to be a wild goose
chase – up until the last moment where Org hands off to `write-file’ to
write the export to disk, the contents was fine.

Wanting to confirm this, I tried exporting to a buffer, where indeed,
the contents looked perfectly alright. *But as soon as I saved the
buffer, it got mangled.* This finally made me realize that the culprit
was somewhere else: in my auto-formatting setup. Before that, I hadn’t
realized that Org exports are done via a buffer, so I had no idea that
on-save hooks are run. But knowing this, everything started falling into
place.

It turned out to be a perfect storm of trickiness: I’m using Doom Emacs,
so I’d just enabled autoformatting via `(format +onsave)’. I don’t care
about HTML or XML formatting and never manually installed an
auto-formatter for those, so I was surprised to see the export
reformatted on save. The reason: macOS ships `tidy’ in its base
distribution, so it gets picked up as an auto-formatting provider.
Unfortunately, it’s a really old version (2006) which wreaks havoc on
non-ASCII UTF-8 bytes (details in [this Doom Emacs issue], if you’re
interested).

However, even a recent version of `tidy’ can have undesirable effects on
the export. For instance, I’ve confirmed that those aforementioned
footnote-related whitespace issues in ODT exports persist, even after
installing a new version of `tidy’ which handles UTF-8 correctly. This
is presumably due to `tidy’ re-arranging the XML in ways that affect
whitespace.

So I think Org should try to protect the export buffer from these
shenanigans as much as possible. The best way I can think of to achieve
that is to keep the export buffer in fundamental mode. This should
prevent all the mode-related code from running, potentially even making
the export speedier.

After some experimenting, the way I finally got this to work was by
locally overriding the `set-auto-mode’ function:

┌────
│ diff --git a/lisp/ox.el b/lisp/ox.el
│ index 2a3edaa50..d5a77c26e 100644
│ --- a/lisp/ox.el
│ +++ b/lisp/ox.el
│ @@ -6462,14 +6462,16 @@ or FILE."
│       ',ext-plist)))
│         (with-temp-buffer
│   (insert output)
│ - (let ((coding-system-for-write ',encoding))
│ + (cl-letf ((coding-system-for-write ',encoding)
│ +   ((symbol-function 'set-auto-mode) #'ignore))
│     (write-file ,file)))
│         (or (ignore-errors (funcall ',post-process ,file)) ,file)))
│          (let ((output (org-export-as
│                         backend subtreep visible-only body-only
ext-plist)))
│            (with-temp-buffer
│              (insert output)
│ -            (let ((coding-system-for-write encoding))
│ +            (cl-letf ((coding-system-for-write encoding)
│ +                      ((symbol-function 'set-auto-mode) #'ignore))
│        (write-file file)))
│            (when (and (org-export--copy-to-kill-ring-p) (org-string-nw-p
output))
│              (org-kill-new output))
└────

What are your thoughts? Is this desirable? In my mind, yes: even if
someone has configured auto-formatting manually, they might still not
realize it’s getting run in a hidden buffer they never get to see. After
all, this is an implementation detail – if Emacs had a built-in function
to write a string to a file, I presume that would get used instead?

(In a way, I was lucky that I encountered quite noticeable issues with
mangled characters. With the recent version of tidy, which only messes
up whitespace, I might not have noticed at all.)

And if such protection is desirable, is this the best way to do it? Or
can you come up with a better approach?

Anyway, thanks for reading this far :)

Best,

David


[this Doom Emacs issue]
<https://github.com/hlissner/doom-emacs/issues/6149>

[-- Attachment #2: Type: text/html, Size: 4922 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Protect Org export from auto-formatting hooks
  2022-02-25 16:29 Protect Org export from auto-formatting hooks David Lukeš
@ 2022-02-26  9:32 ` Nicolas Goaziou
  2022-02-28 13:14   ` David Lukeš
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Goaziou @ 2022-02-26  9:32 UTC (permalink / raw)
  To: David Lukeš; +Cc: emacs-orgmode

Hello,

David Lukeš <dafydd.lukes@gmail.com> writes:

> So I think Org should try to protect the export buffer from these
> shenanigans as much as possible. The best way I can think of to achieve
> that is to keep the export buffer in fundamental mode. This should
> prevent all the mode-related code from running, potentially even making
> the export speedier.
>
> After some experimenting, the way I finally got this to work was by
> locally overriding the `set-auto-mode’ function:
>
> ┌────
> │ diff --git a/lisp/ox.el b/lisp/ox.el
> │ index 2a3edaa50..d5a77c26e 100644
> │ --- a/lisp/ox.el
> │ +++ b/lisp/ox.el
> │ @@ -6462,14 +6462,16 @@ or FILE."
> │       ',ext-plist)))
> │         (with-temp-buffer
> │   (insert output)
> │ - (let ((coding-system-for-write ',encoding))
> │ + (cl-letf ((coding-system-for-write ',encoding)
> │ +   ((symbol-function 'set-auto-mode) #'ignore))
> │     (write-file ,file)))
> │         (or (ignore-errors (funcall ',post-process ,file)) ,file)))
> │          (let ((output (org-export-as
> │                         backend subtreep visible-only body-only
> ext-plist)))
> │            (with-temp-buffer
> │              (insert output)
> │ -            (let ((coding-system-for-write encoding))
> │ +            (cl-letf ((coding-system-for-write encoding)
> │ +                      ((symbol-function 'set-auto-mode) #'ignore))
> │        (write-file file)))
> │            (when (and (org-export--copy-to-kill-ring-p) (org-string-nw-p
> output))
> │              (org-kill-new output))
> └────
>
> What are your thoughts? Is this desirable? In my mind, yes: even if
> someone has configured auto-formatting manually, they might still not
> realize it’s getting run in a hidden buffer they never get to see. After
> all, this is an implementation detail – if Emacs had a built-in function
> to write a string to a file, I presume that would get used instead?
>
> (In a way, I was lucky that I encountered quite noticeable issues with
> mangled characters. With the recent version of tidy, which only messes
> up whitespace, I might not have noticed at all.)
>
> And if such protection is desirable, is this the best way to do it? Or
> can you come up with a better approach?

What about using `write-region' instead of `write-file' and not touching
`set-auto-mode' function? Could you test it? Note the arguments are
different, the equivalent to (write-file file) would be (write-region
nil nil file).

Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Protect Org export from auto-formatting hooks
  2022-02-26  9:32 ` Nicolas Goaziou
@ 2022-02-28 13:14   ` David Lukeš
  2022-02-28 13:23     ` Nicolas Goaziou
  0 siblings, 1 reply; 7+ messages in thread
From: David Lukeš @ 2022-02-28 13:14 UTC (permalink / raw)
  To: emacs-orgmode

> What about using `write-region' instead of `write-file' and not touching
> `set-auto-mode' function?

Thanks, that's indeed a much better way of doing it :) One can even
avoid the temp buffer altogether and write the `output' string
directly with `(write-region output nil file)`.

Shall I prepare a patch? Or would you rather do it yourself, since it
was your idea?

Best,

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Protect Org export from auto-formatting hooks
  2022-02-28 13:14   ` David Lukeš
@ 2022-02-28 13:23     ` Nicolas Goaziou
  2022-02-28 13:42       ` David Lukeš
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Goaziou @ 2022-02-28 13:23 UTC (permalink / raw)
  To: David Lukeš; +Cc: emacs-orgmode

Hello,

David Lukeš <dafydd.lukes@gmail.com> writes:

> Thanks, that's indeed a much better way of doing it :) One can even
> avoid the temp buffer altogether and write the `output' string
> directly with `(write-region output nil file)`.

Yup, even better.

> Shall I prepare a patch? Or would you rather do it yourself, since it
> was your idea?

Please be my guest. :)

Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Protect Org export from auto-formatting hooks
  2022-02-28 13:23     ` Nicolas Goaziou
@ 2022-02-28 13:42       ` David Lukeš
  2022-06-21 12:36         ` [PATCH] ox.el: Protect " David Lukes
  0 siblings, 1 reply; 7+ messages in thread
From: David Lukeš @ 2022-02-28 13:42 UTC (permalink / raw)
  To: emacs-orgmode

Great, will do :) I realized at the last minute that some
`save-buffer' calls in `ox-odt.el' also need to be amended separately
(I was mostly testing these tweaks with HTML export, as it's faster /
simpler). If it's problematic, we can have that discussion over the
patch once I've submitted it.

Best,

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] ox.el: Protect export from auto-formatting hooks
  2022-02-28 13:42       ` David Lukeš
@ 2022-06-21 12:36         ` David Lukes
  2022-06-21 12:46           ` David Lukeš
  0 siblings, 1 reply; 7+ messages in thread
From: David Lukes @ 2022-06-21 12:36 UTC (permalink / raw)
  To: dafydd.lukes; +Cc: emacs-orgmode

* ox.el, ox-odt.el: Use `write-region' instead of `write-file' or
`save-buffer' to protect generated export buffers from auto-formatting
hooks.

In particular, `tidy' is often configured as the auto-formatter for HTML
and XML, and it can corrupt the exports.

TINYCHANGE
---
 lisp/ox-odt.el | 4 ++--
 lisp/ox.el     | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lisp/ox-odt.el b/lisp/ox-odt.el
index 9d47067..ec1c360 100644
--- a/lisp/ox-odt.el
+++ b/lisp/ox-odt.el
@@ -1411,7 +1411,7 @@ original parsed data.  INFO is a plist holding export options."
 			(level (string-to-number (match-string 2))))
 		    (if (wholenump sec-num) (<= level sec-num) sec-num))
 	    (replace-match replacement t nil))))
-      (save-buffer 0)))
+      (write-region nil nil buffer-file-name)))
   ;; Update content.xml.
 
   (let* ( ;; `org-display-custom-times' should be accessed right
@@ -4004,7 +4004,7 @@ contextual information."
 		   ;; Prettify output if needed.
 		   (when org-odt-prettify-xml
 		     (indent-region (point-min) (point-max)))
-		   (save-buffer 0)))))
+		   (write-region nil nil buffer-file-name)))))
 	   ;; Run zip.
 	   (let* ((target --out-file)
 		  (target-name (file-name-nondirectory target))
diff --git a/lisp/ox.el b/lisp/ox.el
index c75357b..66d18c4 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -6553,14 +6553,14 @@ or FILE."
 	       (with-temp-buffer
 		 (insert output)
 		 (let ((coding-system-for-write ',encoding))
-		   (write-file ,file)))
+		   (write-region nil nil ,file)))
 	       (or (ignore-errors (funcall ',post-process ,file)) ,file)))
         (let ((output (org-export-as
                        backend subtreep visible-only body-only ext-plist)))
           (with-temp-buffer
             (insert output)
             (let ((coding-system-for-write encoding))
-	      (write-file file)))
+	      (write-region nil nil file)))
           (when (and (org-export--copy-to-kill-ring-p) (org-string-nw-p output))
             (org-kill-new output))
           ;; Get proper return value.
-- 
2.36.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ox.el: Protect export from auto-formatting hooks
  2022-06-21 12:36         ` [PATCH] ox.el: Protect " David Lukes
@ 2022-06-21 12:46           ` David Lukeš
  0 siblings, 0 replies; 7+ messages in thread
From: David Lukeš @ 2022-06-21 12:46 UTC (permalink / raw)
  To: David Lukeš; +Cc: emacs-orgmode

Please ignore this patch!

Apologies, as I was sending another patch today, I thought I hadn't
followed up on my earlier promise to send a patch for this issue. It
turns out I have, it just hasn't been reviewed/merged yet:
https://list.orgmode.org/20220228140750.75761-1-dafydd.lukes@gmail.com/

(I didn't find it at first because I'd searched the mailing list
archive for my surname with diacritics, but the patch is signed with
my surname *without* diacritics.)

The earlier patch is better, please ignore this one. Sorry for wasting
everyone's time.

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-06-21 12:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-25 16:29 Protect Org export from auto-formatting hooks David Lukeš
2022-02-26  9:32 ` Nicolas Goaziou
2022-02-28 13:14   ` David Lukeš
2022-02-28 13:23     ` Nicolas Goaziou
2022-02-28 13:42       ` David Lukeš
2022-06-21 12:36         ` [PATCH] ox.el: Protect " David Lukes
2022-06-21 12:46           ` David Lukeš

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).