emacs-orgmode@gnu.org archives
From: Ihor Radchenko <yantar92@gmail.com>
To: K K <k_foreign@outlook.com>
Cc: Max Nikulin <manikulin@gmail.com>,
	 "emacs-orgmode@gnu.org" <emacs-orgmode@gnu.org>
Subject: [PATCH] org-export: Remove zero-width space escapes during export
Date: Tue, 26 Jul 2022 20:59:18 +0800
Message-ID: <87v8rkav2x.fsf@localhost>
In-Reply-To: <80f0990042a564556cc6b047a94f7e9dddf5a280.camel@outlook.com>

[-- Attachment #1: Type: text/plain, Size: 1597 bytes --]

K K <k_foreign@outlook.com> writes:

> My use case is to emphasize Chinese characters without inserting any spaces, not even zero-width spaces. For example "中文*测*试" should be enough to emphasize "测".
>
> I am using zero-width spaces right now, and it works fine in org-mode buffers, but if exported to latex-pdf files, the U+200B ZERO WIDTH SPACE character will not be zero-width for certain fonts. So I hope not to use that character.

This is a bug. Although escape characters do not affect export in most
common scenarios, your report adds yet another case where a zero-width
space actually alters the export result.

I am attaching a tentative patch that makes Org export remove
zero-width spaces when they sit at object boundaries, that is, at the
very beginning or end of a plain-text string in the parse tree.
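
Roughly, the per-string effect can be sketched as follows. This is a
simplified illustration, not the patch itself: the helper name
`my/strip-boundary-zws' is hypothetical, and the sketch works on bare
strings rather than on the parse tree.

```emacs-lisp
;; Hypothetical helper for illustration only; it is not part of the
;; attached patch and does not touch the Org parse tree.
(defun my/strip-boundary-zws (string)
  "Return STRING without a leading or trailing U+200B ZERO WIDTH SPACE."
  (replace-regexp-in-string "\\`\u200B\\|\u200B\\'" "" string))

;; In "This<ZWS>*is*<ZWS>test.\n" the zero-width spaces end up at the
;; edges of the plain-text strings around the bold object, so they go.
(my/strip-boundary-zws "This\u200B")
(my/strip-boundary-zws "\u200Btest.\n")

;; A zero-width space in the middle of a plain-text string is kept.
(my/strip-boundary-zws "This\u200Bis")
```

The regexp is anchored with \` and \' so that only the first and last
character of each string are candidates for removal.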

Any objections?

> On Tue, 26 Jul 2022 09:26:42 +0800, Ihor Radchenko wrote:
>> Another idea we have discussed is using something similar to Markdown
>> format: **bold**, //italics//, __underline__, etc. It is less verbose
>> compared to the special blocks, which should be valuable for
>> Japanese/Chinese/other languages with no spaces between words.
>
> By the way, it seems that my use case has already been implemented by markdown-mode. In a markdown-mode buffer "中文**测**试" will certainly make "测" bold.

The idea was indeed inspired by Markdown.
However, Markdown is different: **bold** is the official syntax for
bold markup. Things are more complex in reality, though, and Markdown
has its own edge cases:
https://www.markdownguide.org/basic-syntax/

Best,
Ihor


[-- Attachment #2: 0001-org-export-Remove-zero-width-space-escapes-during-ex.patch --]
[-- Type: text/x-patch, Size: 3213 bytes --]

From 5764b41b858bff3d56dcb24741cf550a7e245d36 Mon Sep 17 00:00:00 2001
Message-Id: <5764b41b858bff3d56dcb24741cf550a7e245d36.1658840330.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Tue, 26 Jul 2022 20:50:47 +0800
Subject: [PATCH] org-export: Remove zero-width space escapes during export

* lisp/ox.el (org-export--remove-escaped): New function removing
zero-width spaces when they separate object boundaries.
(org-export-as): Call `org-export--remove-escaped'.
* testing/lisp/test-ox.el (test-org-export/remove-escaped): New test.
---
 lisp/ox.el              | 22 ++++++++++++++++++++++
 testing/lisp/test-ox.el | 13 +++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/lisp/ox.el b/lisp/ox.el
index 40ad7ae4e..de034fd22 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -2916,6 +2916,25 @@ (defun org-export--remove-uninterpreted-data (data info)
   ;; Return modified parse tree.
   data)
 
+(defun org-export--remove-escaped (data info)
+  "Remove escape symbols from plain-text in DATA.
+DATA is a parse tree or a secondary string.  INFO is a plist
+containing export options.  DATA is modified by side effect and
+returned by the function."
+  (org-element-map data '(plain-text)
+    (lambda (string)
+      (let (processed-string)
+        (setq processed-string
+              (replace-regexp-in-string "\\`​" "" string))
+        (setq processed-string
+              (replace-regexp-in-string "​\\'" "" processed-string))
+        (unless (equal string processed-string)
+          (org-element-insert-before processed-string string)
+          (org-element-extract-element string))))
+    info nil nil t)
+  ;; Return modified parse tree.
+  data)
+
 ;;;###autoload
 (defun org-export-as
     (backend &optional subtreep visible-only body-only ext-plist)
@@ -3046,6 +3065,9 @@ (defun org-export-as
 	   ;; communication channel.
 	   (org-export--prune-tree tree info)
 	   (org-export--remove-uninterpreted-data tree info)
+           ;; Remove zero-width spaces that escape Org syntax
+           ;; elements.
+           (org-export--remove-escaped tree info)
 	   ;; Call parse tree filters.
 	   (setq tree
 	         (org-export-filter-apply-functions
diff --git a/testing/lisp/test-ox.el b/testing/lisp/test-ox.el
index 7c71b6e24..ea4fce363 100644
--- a/testing/lisp/test-ox.el
+++ b/testing/lisp/test-ox.el
@@ -982,6 +982,19 @@ (ert-deftest test-org-export/uninterpreted ()
 			     (section . (lambda (s c i) c))))
 	     nil nil nil '(:with-sub-superscript {}))))))
 
+(ert-deftest test-org-export/remove-escaped ()
+  "Test removing escape symbols."
+  ;; Remove zero-width space around markup.
+  (should
+   (equal "This*is*test.\n"
+          (org-test-with-temp-text "This​*is*​test.\n"
+            (org-export-as (org-test-default-backend)))))
+  ;; Do not remove zero-width space in other places.
+  (should
+   (equal "This​is​test.\n"
+          (org-test-with-temp-text "This​is​test.\n"
+            (org-export-as (org-test-default-backend))))))
+
 (ert-deftest test-org-export/export-scope ()
   "Test all export scopes."
   ;; Subtree.
-- 
2.35.1


Thread overview: 29+ messages
2012-11-19  5:32 How to force markup without spaces cinsky
2012-11-19  7:11 ` Vladimir Lomov
2012-11-19 10:06   ` Seong-Kook Shin
2012-11-19 14:40     ` Suvayu Ali
2012-12-13 21:26       ` Bastien
2022-07-25 17:50         ` K
2022-07-25 18:27         ` K
2022-07-25 19:02           ` K
2022-07-26  1:26             ` Ihor Radchenko
2022-07-26  2:23               ` Max Nikulin
2022-07-26  4:26                 ` K K
2022-07-26  6:30                   ` Max Nikulin
2022-07-26 12:59                   ` Ihor Radchenko [this message]
2022-07-26 14:25                     ` [PATCH] org-export: Remove zero-width space escapes during export Timothy
2022-07-26 15:27                       ` András Simonyi
2022-07-26 16:38                     ` Max Nikulin
2022-07-27  3:30                     ` Max Nikulin
2022-07-28 13:17                     ` [PATCH] Add new entity \-- serving as markup separator/escape symbol Ihor Radchenko
2022-07-28 15:34                       ` Max Nikulin
2022-07-29  1:43                         ` Ihor Radchenko
2022-07-29  2:50                           ` Max Nikulin
2022-07-29  9:06                             ` [PATCH v2] " Ihor Radchenko
2022-07-30  0:22                               ` Samuel Wales
2022-07-30  4:12                                 ` Samuel Wales
2022-07-30  6:49                                 ` Ihor Radchenko
2022-07-30 15:44                                   ` Max Nikulin
2022-07-28 22:20                       ` [PATCH] " Tim Cross
2022-07-29  0:32                       ` Juan Manuel Macías
2022-07-29  5:49                       ` tomas
