From: David Lukeš
Date: Fri, 25 Feb 2022 17:29:06 +0100
Subject: Protect Org export from auto-formatting hooks
To: emacs-orgmode@gnu.org

Hi all,

I recently started using non-Pandoc Org exports to HTML and ODT and
noticed some problems on my macOS 12.2.1 box: non-ASCII characters were
coming out garbled, and the whitespace around footnotes was getting
rewritten in weird ways. On my Linux boxes, the same export commands
went fine, so I thought this must be related to some encoding
assumptions Org makes that are slightly different on the two platforms.

Armed with my very limited knowledge of Emacs, Org Mode and Elisp, I
decided to dig into the Org codebase and try to figure out where the
corruption was happening. However, this turned out to be a wild goose
chase: right up until the last moment, where Org hands off to
`write-file’ to write the export to disk, the contents were fine.

Wanting to confirm this, I tried exporting to a buffer, where indeed,
the contents looked perfectly alright. *But as soon as I saved the
buffer, it got mangled.* This finally made me realize that the culprit
was somewhere else: in my auto-formatting setup. Before that, I hadn’t
realized that Org exports are written out via a buffer, so I had no
idea that on-save hooks get run. But knowing this, everything started
falling into place.
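
The effect described above can be reproduced outside Org. The following is a minimal sketch, not Org's actual export code: the function name, the scratch file name and the hook function are made up for illustration, and the lambda merely stands in for an auto-formatter registered on `before-save-hook'. It saves a temporary buffer with `write-file' and reports whether the hook fired and which major mode the buffer ended up in.

```elisp
(defun my/demo-write-file-side-effects ()
  "Save a temp buffer with `write-file' and report what ran.
Return a plist (:hook-ran BOOL :mode SYMBOL)."
  (let* ((dir (make-temp-file "export-demo-" t)) ; scratch directory
         (file (expand-file-name "out.html" dir))
         (hook-ran nil)
         ;; Stand-in for an auto-formatter registered on `before-save-hook'.
         (before-save-hook (list (lambda () (setq hook-ran t)))))
    (unwind-protect
        (with-temp-buffer
          (insert "<p>non-ASCII: š</p>\n")
          ;; `write-file' visits FILE in this buffer, which lets Emacs pick
          ;; a major mode from the file name, and then saves the buffer,
          ;; which runs the save hooks.
          (write-file file)
          (list :hook-ran hook-ran :mode major-mode))
      (delete-directory dir t))))
```

Evaluating `(my/demo-write-file-side-effects)' should show that the hook ran even though the buffer was never displayed. The reported mode depends on the local `auto-mode-alist'.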

It turned out to be a perfect storm of trickiness: I’m using Doom Emacs,
so I’d just enabled autoformatting via `(format +onsave)’. I don’t care
about HTML or XML formatting and never manually installed an
auto-formatter for those, so I was surprised to see the export
reformatted on save. The reason: macOS ships `tidy’ in its base
distribution, so it gets picked up as an auto-formatting provider.
Unfortunately, it’s a really old version (2006) which wreaks havoc on
non-ASCII UTF-8 bytes (details in [this Doom Emacs issue], if you’re
interested).

However, even a recent version of `tidy’ can have undesirable effects on
the export. For instance, I’ve confirmed that the footnote-related
whitespace issues in ODT exports mentioned above persist even after
installing a new version of `tidy’ which handles UTF-8 correctly. This
is presumably due to `tidy’ re-arranging the XML in ways that affect
whitespace.

So I think Org should try to protect the export buffer from these
shenanigans as much as possible. The best way I can think of to achieve
that is to keep the export buffer in fundamental mode. This should
prevent mode-specific hooks and setup code from running, and might even
make the export a bit speedier.

After some experimenting, the way I finally got this to work was by
locally overriding the `set-auto-mode’ function:

┌────
│ diff --git a/lisp/ox.el b/lisp/ox.el
│ index 2a3edaa50..d5a77c26e 100644
│ --- a/lisp/ox.el
│ +++ b/lisp/ox.el
│ @@ -6462,14 +6462,16 @@ or FILE."
│                      ',ext-plist)))
│                (with-temp-buffer
│                  (insert output)
│ -                (let ((coding-system-for-write ',encoding))
│ +                (cl-letf ((coding-system-for-write ',encoding)
│ +                          ((symbol-function 'set-auto-mode) #'ignore))
│                    (write-file ,file)))
│                (or (ignore-errors (funcall ',post-process ,file)) ,file)))
│          (let ((output (org-export-as
│                         backend subtreep visible-only body-only ext-plist)))
│            (with-temp-buffer
│              (insert output)
│ -            (let ((coding-system-for-write encoding))
│ +            (cl-letf ((coding-system-for-write encoding)
│ +                      ((symbol-function 'set-auto-mode) #'ignore))
│                (write-file file)))
│            (when (and (org-export--copy-to-kill-ring-p) (org-string-nw-p output))
│              (org-kill-new output))
└────
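
To try the override from the patch in isolation, the same `cl-letf' binding can be wrapped around a plain `write-file' call. This is only a sketch under stated assumptions: the function name is hypothetical, and it only helps if the formatter chooses its tool from the buffer's major mode. Anything registered globally on `before-save-hook' still runs.

```elisp
(require 'cl-lib)

(defun my/write-file-in-fundamental-mode (string file)
  "Write STRING to FILE via `write-file' without selecting a major mode.
Return the major mode of the temporary buffer after saving."
  (with-temp-buffer
    (insert string)
    ;; Same trick as in the patch: make `set-auto-mode' a no-op for the
    ;; duration of the save, so the file name cannot switch the mode.
    (cl-letf (((symbol-function 'set-auto-mode) #'ignore))
      (write-file file))
    major-mode))

;; Usage (the file name is hypothetical):
;; (my/write-file-in-fundamental-mode "<p>hi</p>\n" "/tmp/out.html")
```

Emacs also has a user option, `change-major-mode-with-file-name', which `write-file' consults before changing the mode. Let-binding it to nil might achieve the same effect without redefining a function, but that variant has not been tried against the export code.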

What are your thoughts? Is this desirable? In my mind, yes: even if
someone has configured auto-formatting manually, they might still not
realize it’s getting run in a hidden buffer they never get to see. After
all, this is an implementation detail: if Emacs had a built-in function
to write a string to a file, I presume that would get used instead?
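
On that last point, `write-region' accepts a string as its first argument and writes it to a file without visiting the file, so no major mode is selected and the save hooks are not involved. The sketch below shows the idea. The function name and the `utf-8' default are assumptions made for illustration, and this is not what the patch above does.

```elisp
(defun my/write-string-to-file (string file &optional coding)
  "Write STRING to FILE directly, encoded with CODING (default `utf-8')."
  (let ((coding-system-for-write (or coding 'utf-8)))
    ;; A string as the first argument is written instead of buffer text;
    ;; a VISIT argument that is neither nil, t nor a string suppresses
    ;; the "Wrote ..." message.
    (write-region string nil file nil 'silent)))
```

The string is written exactly as given, so a caller that wants a final newline has to add one itself.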

(In a way, I was lucky that I encountered quite noticeable issues with
mangled characters. With the recent version of tidy, which only messes
up whitespace, I might not have noticed at all.)

And if such protection is desirable, is this the best way to do it? Or
can you come up with a better approach?

Anyway, thanks for reading this far :)

Best,

David

[this Doom Emacs issue]
<https://github.com/hlissner/doom-emacs/issues/6149>