Hi all,
I recently started using non-Pandoc Org exports to HTML and ODT and
noticed some problems on my macOS 12.2.1 box. Non-ASCII characters were
coming out garbled, the whitespace around footnotes was getting
rewritten in weird ways. On my Linux boxes, the same export commands
went fine, so I thought this must be related to some encoding
assumptions Org makes that are slightly different on the two platforms.
Armed with my very limited knowledge of Emacs, Org Mode and Elisp, I
decided to dig into the Org codebase and try to figure out where the
corruption was happening. However, this turned out to be a wild goose
chase – up until the last moment where Org hands off to `write-file’ to
write the export to disk, the contents was fine.
Wanting to confirm this, I tried exporting to a buffer, where indeed,
the contents looked perfectly alright. *But as soon as I saved the
buffer, it got mangled.* This finally made me realize that the culprit
was somewhere else: in my auto-formatting setup. Before that, I hadn’t
realized that Org exports are done via a buffer, so I had no idea that
on-save hooks are run. But knowing this, everything started falling into
place.
It turned out to be a perfect storm of trickiness: I’m using Doom Emacs,
so I’d just enabled autoformatting via `(format +onsave)’. I don’t care
about HTML or XML formatting and never manually installed an
auto-formatter for those, so I was surprised to see the export
reformatted on save. The reason: macOS ships `tidy’ in its base
distribution, so it gets picked up as an auto-formatting provider.
Unfortunately, it’s a really old version (2006) which wreaks havoc on
non-ASCII UTF-8 bytes (details in [this Doom Emacs issue], if you’re
interested).
However, even a recent version of `tidy’ can have undesirable effects on
the export. For instance, I’ve confirmed that those aforementioned
footnote-related whitespace issues in ODT exports persist, even after
installing a new version of `tidy’ which handles UTF-8 correctly. This
is presumably due to `tidy’ re-arranging the XML in ways that affect
whitespace.
So I think Org should try to protect the export buffer from these
shenanigans as much as possible. The best way I can think of to achieve
that is to keep the export buffer in fundamental mode. This should
prevent all the mode-related code from running, potentially even making
the export speedier.
After some experimenting, the way I finally got this to work was by
locally overriding the `set-auto-mode’ function:
┌────
│ diff --git a/lisp/ox.el b/lisp/ox.el
│ index 2a3edaa50..d5a77c26e 100644
│ --- a/lisp/ox.el
│ +++ b/lisp/ox.el
│ @@ -6462,14 +6462,16 @@ or FILE."
│ ',ext-plist)))
│ (with-temp-buffer
│ (insert output)
│ - (let ((coding-system-for-write ',encoding))
│ + (cl-letf ((coding-system-for-write ',encoding)
│ + ((symbol-function 'set-auto-mode) #'ignore))
│ (write-file ,file)))
│ (or (ignore-errors (funcall ',post-process ,file)) ,file)))
│ (let ((output (org-export-as
│ backend subtreep visible-only body-only ext-plist)))
│ (with-temp-buffer
│ (insert output)
│ - (let ((coding-system-for-write encoding))
│ + (cl-letf ((coding-system-for-write encoding)
│ + ((symbol-function 'set-auto-mode) #'ignore))
│ (write-file file)))
│ (when (and (org-export--copy-to-kill-ring-p) (org-string-nw-p output))
│ (org-kill-new output))
└────
What are your thoughts? Is this desirable? In my mind, yes: even if
someone has configured auto-formatting manually, they might still not
realize it’s getting run in a hidden buffer they never get to see. After
all, this is an implementation detail – if Emacs had a built-in function
to write a string to a file, I presume that would get used instead?
(In a way, I was lucky that I encountered quite noticeable issues with
mangled characters. With the recent version of tidy, which only messes
up whitespace, I might not have noticed at all.)
And if such protection is desirable, is this the best way to do it? Or
can you come up with a better approach?
Anyway, thanks for reading this far :)
Best,
David
[this Doom Emacs issue]
<https://github.com/hlissner/doom-emacs/issues/6149>