From: Ihor Radchenko
To: K K
Cc: Max Nikulin, "emacs-orgmode@gnu.org"
Subject: [PATCH] org-export: Remove zero-width space escapes during export
In-Reply-To: <80f0990042a564556cc6b047a94f7e9dddf5a280.camel@outlook.com>
References: <87r128d5pp.fsf@localhost> <80f0990042a564556cc6b047a94f7e9dddf5a280.camel@outlook.com>
Date: Tue, 26 Jul 2022 20:59:18 +0800
Message-ID: <87v8rkav2x.fsf@localhost>

K K writes:

> My use case is to emphasize Chinese characters without spaces being
> inserted, even those zero-width spaces. For example "中文*测*试" should
> be enough to emphasize "测".
>
> I am using zero-width spaces right now, and it works fine in org-mode
> buffers, but if exported to latex-pdf files, the U+200B ZERO WIDTH SPACE
> character will not be zero-width for certain fonts. So I hope not to use
> that character.

This is a bug. While escape symbols do not affect export in most common
scenarios, your report adds yet another case where a zero-width space
actually alters the export result.

I am attaching a tentative patch that will make Org export remove
zero-width spaces when those spaces separate object boundaries.

Any objections?

> On Tue, 26 Jul 2022 09:26:42 +0800, Ihor Radchenko wrote:
>> Another idea we have discussed is using something similar to Markdown
>> format: **bold**, //italics//, __underline__, etc. It is less verbose
>> compared to the special blocks, which should be valuable for
>> Japanese/Chinese/other languages with no spaces between words.
>
> By the way, it seems that my use case has already been implemented by
> markdown-mode. In a markdown-mode buffer "中文**测**试" will certainly
> make "测" bold.

The idea was indeed inspired by Markdown. However, Markdown is
different: **bold** is the official syntax to indicate bold markup.

Though things are more complex in reality:
https://www.markdownguide.org/basic-syntax/
Markdown has its own edge cases.

Best,
Ihor

From 5764b41b858bff3d56dcb24741cf550a7e245d36 Mon Sep 17 00:00:00 2001
Message-Id: <5764b41b858bff3d56dcb24741cf550a7e245d36.1658840330.git.yantar92@gmail.com>
From: Ihor Radchenko
Date: Tue, 26 Jul 2022 20:50:47 +0800
Subject: [PATCH] org-export: Remove zero-width space escapes during export

* lisp/ox.el (org-export--remove-escaped): New function removing
zero-width spaces when they separate object boundaries.
(org-export-as): Call `org-export--remove-escaped'.
* testing/lisp/test-ox.el (test-org-export/remove-escaped): New test.
---
 lisp/ox.el              | 22 ++++++++++++++++++++++
 testing/lisp/test-ox.el | 13 +++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/lisp/ox.el b/lisp/ox.el
index 40ad7ae4e..de034fd22 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -2916,6 +2916,25 @@ (defun org-export--remove-uninterpreted-data (data info)
   ;; Return modified parse tree.
   data)
 
+(defun org-export--remove-escaped (data info)
+  "Remove escape symbols from plain-text in DATA.
+DATA is a parse tree or a secondary string.  INFO is a plist
+containing export options.  It is modified by side effect and
+returned by the function."
+  (org-element-map data '(plain-text)
+    (lambda (string)
+      (let (processed-string)
+        (setq processed-string
+              (replace-regexp-in-string "\\`​" "" string))
+        (setq processed-string
+              (replace-regexp-in-string "​\\'" "" processed-string))
+        (unless (equal string processed-string)
+          (org-element-insert-before processed-string string)
+          (org-element-extract-element string))))
+    info nil nil t)
+  ;; Return modified parse tree.
+  data)
+
 ;;;###autoload
 (defun org-export-as
     (backend &optional subtreep visible-only body-only ext-plist)
@@ -3046,6 +3065,9 @@ (defun org-export-as
          ;; communication channel.
          (org-export--prune-tree tree info)
          (org-export--remove-uninterpreted-data tree info)
+         ;; Remove zero-width spaces that escape Org syntax
+         ;; elements.
+         (org-export--remove-escaped tree info)
          ;; Call parse tree filters.
          (setq tree
                (org-export-filter-apply-functions
diff --git a/testing/lisp/test-ox.el b/testing/lisp/test-ox.el
index 7c71b6e24..ea4fce363 100644
--- a/testing/lisp/test-ox.el
+++ b/testing/lisp/test-ox.el
@@ -982,6 +982,19 @@ (ert-deftest test-org-export/uninterpreted ()
                        (section . (lambda (s c i) c))))
        nil nil nil '(:with-sub-superscript {}))))))
 
+(ert-deftest test-org-export/remove-escaped ()
+  "Test removing escape symbols."
+  ;; Remove zero-width space around markup.
+  (should
+   (equal "This*is*test.\n"
+          (org-test-with-temp-text "This​*is*​test.\n"
+            (org-export-as (org-test-default-backend)))))
+  ;; Do not remove zero-width space in other places.
+  (should
+   (equal "This​is​test.\n"
+          (org-test-with-temp-text "This​is​test.\n"
+            (org-export-as (org-test-default-backend))))))
+
 (ert-deftest test-org-export/export-scope ()
   "Test all export scopes."
   ;; Subtree.
-- 
2.35.1
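
P.S. The zero-width spaces in the patch are invisible in most mail
readers, so here is a minimal sketch of the same boundary rule on plain
strings, evaluable in *scratch* without loading Org. The helper name
`my/strip-boundary-zwsp' is hypothetical and not part of the patch, and
"\u200b" is just a visible spelling of U+200B ZERO WIDTH SPACE. It
assumes the parser hands over the text before and after an emphasis
object as separate plain-text strings, which is what the patch relies
on.

```emacs-lisp
;; Hypothetical helper, not part of the patch: apply the same two
;; anchored regexps to a single string so the effect can be checked
;; in isolation.  "\u200b" is U+200B ZERO WIDTH SPACE.
(defun my/strip-boundary-zwsp (string)
  "Return STRING without a leading or trailing zero-width space."
  (replace-regexp-in-string
   "\u200b\\'" ""                       ; trailing ZWSP, anchored at end
   (replace-regexp-in-string
    "\\`\u200b" "" string)))            ; leading ZWSP, anchored at start

;; Plain-text pieces around the bold object in "This<ZWSP>*is*<ZWSP>test.":
(my/strip-boundary-zwsp "This\u200b")   ; => "This"
(my/strip-boundary-zwsp "\u200btest.")  ; => "test."

;; A zero-width space inside a string is not at a boundary and is kept.
(equal (my/strip-boundary-zwsp "a\u200bb") "a\u200bb") ; => t
```

Because both regexps are anchored with \\` and \\', only a zero-width
space touching the edge of a plain-text string is dropped, which matches
the second case in the test above.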