emacs-orgmode@gnu.org archives
From: "David Lukeš" <dafydd.lukes@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Protect Org export from auto-formatting hooks
Date: Fri, 25 Feb 2022 17:29:06 +0100
Message-ID: <CAEPTPEwOZjJLX1jyFmVR6jarsocyuD=Zf7HJ1i9G4SfWk4mUpQ@mail.gmail.com>


Hi all,

I recently started using non-Pandoc Org exports to HTML and ODT and
noticed some problems on my macOS 12.2.1 box: non-ASCII characters were
coming out garbled, and the whitespace around footnotes was getting
rewritten in weird ways. On my Linux boxes, the same export commands
went fine, so I thought this must be related to some encoding
assumptions Org makes that are slightly different on the two platforms.

Armed with my very limited knowledge of Emacs, Org Mode and Elisp, I
decided to dig into the Org codebase and try to figure out where the
corruption was happening. However, this turned out to be a wild goose
chase – right up to the moment when Org hands off to `write-file’ to
write the export to disk, the contents were fine.

Wanting to confirm this, I tried exporting to a buffer, where indeed,
the contents looked perfectly alright. *But as soon as I saved the
buffer, it got mangled.* This finally made me realize that the culprit
was somewhere else: in my auto-formatting setup. Before that, I hadn’t
realized that Org exports are done via a buffer, so I had no idea that
on-save hooks are run. But knowing this, everything started falling into
place.
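
To make the mechanism concrete, here is a minimal sketch, independent of
Org, of what `write-file’ does to a temporary buffer besides writing its
contents. The helper name `my/write-file-side-effects’ and the probe
lambda are made up for illustration; the sketch assumes a stock Emacs
where the `.html’ extension maps to some HTML major mode.

```elisp
;; Hypothetical helper (not part of Org): observe the side effects of
;; `write-file' on a temporary buffer.
(defun my/write-file-side-effects (file)
  "Write a small HTML snippet to FILE via `write-file'.
Return a list (MAJOR-MODE HOOK-RAN) describing what happened."
  (let ((hook-ran nil))
    (with-temp-buffer
      (insert "<p>Příliš žluťoučký kůň</p>\n")
      ;; Temporarily replace the global save hook with a single probe.
      (let ((before-save-hook (list (lambda () (setq hook-ran t))))
            (coding-system-for-write 'utf-8))
        (write-file file))
      ;; `write-file' made the buffer visit FILE, which re-selected the
      ;; major mode from the file name.
      (list major-mode hook-ran))))

;; Usage: the first element is the mode picked from the file name
;; (no longer `fundamental-mode'), the second is non-nil because the
;; save hook ran.
(let ((file (expand-file-name (concat (make-temp-name "ox-demo-") ".html")
                              temporary-file-directory)))
  (unwind-protect
      (my/write-file-side-effects file)
    (when (file-exists-p file) (delete-file file))))
```

Any formatter that hooks into saving, or that is enabled by the mode
chosen for the file name, gets a chance to run at this point.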

It turned out to be a perfect storm of trickiness: I’m using Doom Emacs,
so I’d just enabled autoformatting via `(format +onsave)’. I don’t care
about HTML or XML formatting and never manually installed an
auto-formatter for those, so I was surprised to see the export
reformatted on save. The reason: macOS ships `tidy’ in its base
distribution, so it gets picked up as an auto-formatting provider.
Unfortunately, it’s a really old version (2006) which wreaks havoc on
non-ASCII characters encoded as UTF-8 (details in [this Doom Emacs issue], if you’re
interested).

However, even a recent version of `tidy’ can have undesirable effects on
the export. For instance, I’ve confirmed that those aforementioned
footnote-related whitespace issues in ODT exports persist, even after
installing a new version of `tidy’ which handles UTF-8 correctly. This
is presumably due to `tidy’ re-arranging the XML in ways that affect
whitespace.

So I think Org should try to protect the export buffer from these
shenanigans as much as possible. The best way I can think of to achieve
that is to keep the export buffer in fundamental mode. This should
prevent all the mode-related code from running, potentially even making
the export speedier.

After some experimenting, the way I finally got this to work was by
locally overriding the `set-auto-mode’ function:

┌────
│ diff --git a/lisp/ox.el b/lisp/ox.el
│ index 2a3edaa50..d5a77c26e 100644
│ --- a/lisp/ox.el
│ +++ b/lisp/ox.el
│ @@ -6462,14 +6462,16 @@ or FILE."
│       ',ext-plist)))
│         (with-temp-buffer
│   (insert output)
│ - (let ((coding-system-for-write ',encoding))
│ + (cl-letf ((coding-system-for-write ',encoding)
│ +   ((symbol-function 'set-auto-mode) #'ignore))
│     (write-file ,file)))
│         (or (ignore-errors (funcall ',post-process ,file)) ,file)))
│          (let ((output (org-export-as
│                         backend subtreep visible-only body-only ext-plist)))
│            (with-temp-buffer
│              (insert output)
│ -            (let ((coding-system-for-write encoding))
│ +            (cl-letf ((coding-system-for-write encoding)
│ +                      ((symbol-function 'set-auto-mode) #'ignore))
│        (write-file file)))
│            (when (and (org-export--copy-to-kill-ring-p) (org-string-nw-p output))
│              (org-kill-new output))
└────
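
The same override can be tried outside of Org. The following is only a
sketch of the technique used in the diff, not code from ox.el; the
function name `my/write-keeping-fundamental-mode’ is hypothetical. Note
that it only stops the major mode from being selected: `before-save-hook’
itself still runs, so this helps with formatters that are enabled per
mode, which is the assumption here.

```elisp
(require 'cl-lib)

;; Hypothetical helper (not part of Org): write STRING to FILE through a
;; temporary buffer while `set-auto-mode' is temporarily a no-op.
(defun my/write-keeping-fundamental-mode (string file)
  "Write STRING to FILE via `write-file' without selecting a major mode.
Return the major mode of the temporary buffer after saving."
  (with-temp-buffer
    (insert string)
    (cl-letf ((coding-system-for-write 'utf-8)
              ;; Locally replace the function that picks a mode from the
              ;; file name; the original definition is restored on exit.
              ((symbol-function 'set-auto-mode) #'ignore))
      (write-file file))
    major-mode))

;; Usage: the buffer should still be in `fundamental-mode' afterwards.
(let ((file (expand-file-name (concat (make-temp-name "ox-demo-") ".html")
                              temporary-file-directory)))
  (unwind-protect
      (my/write-keeping-fundamental-mode "<p>kůň</p>\n" file)
    (when (file-exists-p file) (delete-file file))))
```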

What are your thoughts? Is this desirable? In my mind, yes: even if
someone has configured auto-formatting manually, they might still not
realize it’s getting run in a hidden buffer they never get to see. After
all, this is an implementation detail – if Emacs had a built-in function
to write a string to a file, I presume that would get used instead?
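
On that last point, `write-region’ may be worth a look: when its START
argument is a string, it writes that string to the file and ignores END,
and since no buffer ends up visiting the file, no major mode is selected
and `before-save-hook’ is not run. A minimal sketch, with a made-up
helper name and a UTF-8 default that is an assumption, not Org’s
behaviour:

```elisp
;; Hypothetical helper (not part of Org): write a string straight to a
;; file, bypassing the save machinery.
(defun my/write-string-to-file (string file &optional coding)
  "Write STRING to FILE using CODING (default `utf-8').
FILE is not visited, so no major mode or save hooks are involved."
  (let ((coding-system-for-write (or coding 'utf-8)))
    ;; A string as START means END and the current buffer are ignored.
    ;; The non-nil, non-t VISIT argument only suppresses the
    ;; "Wrote file" message.
    (write-region string nil file nil 'silent)))

;; Usage:
(let ((file (make-temp-file "ox-demo-" nil ".html")))
  (unwind-protect
      (my/write-string-to-file "<p>Příliš žluťoučký kůň</p>\n" file)
    (delete-file file)))
```

Whether this is a drop-in replacement for the `write-file’ call depends
on details this sketch does not cover, such as how a missing final
newline should be handled.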

(In a way, I was lucky that I encountered quite noticeable issues with
mangled characters. With the recent version of tidy, which only messes
up whitespace, I might not have noticed at all.)

And if such protection is desirable, is this the best way to do it? Or
can you come up with a better approach?

Anyway, thanks for reading this far :)

Best,

David


[this Doom Emacs issue]
<https://github.com/hlissner/doom-emacs/issues/6149>
