From: Ihor Radchenko <yantar92@gmail.com>
To: K K <k_foreign@outlook.com>
Cc: Max Nikulin <manikulin@gmail.com>,
"emacs-orgmode@gnu.org" <emacs-orgmode@gnu.org>
Subject: [PATCH] org-export: Remove zero-width space escapes during export
Date: Tue, 26 Jul 2022 20:59:18 +0800
Message-ID: <87v8rkav2x.fsf@localhost>
In-Reply-To: <80f0990042a564556cc6b047a94f7e9dddf5a280.camel@outlook.com>
[-- Attachment #1: Type: text/plain, Size: 1597 bytes --]
K K <k_foreign@outlook.com> writes:
> My use case is to emphasize Chinese characters without inserting any spaces, not even zero-width spaces. For example, "中文*测*试" should be enough to emphasize "测".
>
> I am using zero-width spaces right now, and they work fine in Org mode buffers, but when exporting to PDF via LaTeX, the U+200B ZERO WIDTH SPACE character is not zero-width in certain fonts. So I would prefer not to use that character.
This is a bug. While escape symbols do not affect export in most common
scenarios, your report adds yet another case where a zero-width space
actually alters the export result.
I am attaching a tentative patch that makes Org export remove
zero-width spaces when those spaces sit at object boundaries, that is,
at the very beginning or end of a plain-text object.
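To illustrate the string-level step in isolation, here is a minimal
sketch. The name `my/strip-boundary-zwsp' is hypothetical and not part
of the attached patch; the sketch only mirrors what the patch does to
each plain-text string, assuming the parser splits
"This<U+200B>*is*<U+200B>test." into a plain-text object ending in
U+200B, a bold object, and a plain-text object starting with U+200B.

```elisp
;; Hypothetical helper for illustration only; it is not in the patch.
;; It strips a single U+200B ZERO WIDTH SPACE from the very beginning
;; and from the very end of STRING, leaving inner occurrences alone.
(defun my/strip-boundary-zwsp (string)
  "Return STRING without a leading or trailing zero-width space."
  (replace-regexp-in-string
   "\u200b\\'" ""                       ; trailing U+200B
   (replace-regexp-in-string
    "\\`\u200b" "" string)))            ; leading U+200B

;; Text before the markup: the trailing escape is dropped.
(my/strip-boundary-zwsp "This\u200b")
;; Text after the markup: the leading escape is dropped.
(my/strip-boundary-zwsp "\u200btest.\n")
;; No markup, so the zero-width spaces are not at a boundary.
(my/strip-boundary-zwsp "This\u200bis\u200btest.\n")
```

A zero-width space in the middle of a plain-text string is kept, so
text that uses U+200B for other purposes should not be affected.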
Any objections?
> On Tue, 26 Jul 2022 09:26:42 +0800, Ihor Radchenko wrote:
>> Another idea we have discussed is using something similar to Markdown
>> format: **bold**, //italics//, __underline__, etc. It is less verbose
>> compared to the special blocks, which should be valuable for
>> Japanese/Chinese/other languages with no spaces between words.
>
> By the way, it seems that my use case has already been implemented by markdown-mode. In a markdown-mode buffer "中文**测**试" will certainly make "测" bold.
The idea was indeed inspired by Markdown.
However, Markdown is different: **bold** is the official syntax to
indicate bold markup. Things are more complex in reality, though, and
Markdown has its own edge cases. See
https://www.markdownguide.org/basic-syntax/
Best,
Ihor
[-- Attachment #2: 0001-org-export-Remove-zero-width-space-escapes-during-ex.patch --]
[-- Type: text/x-patch, Size: 3213 bytes --]
From 5764b41b858bff3d56dcb24741cf550a7e245d36 Mon Sep 17 00:00:00 2001
Message-Id: <5764b41b858bff3d56dcb24741cf550a7e245d36.1658840330.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Tue, 26 Jul 2022 20:50:47 +0800
Subject: [PATCH] org-export: Remove zero-width space escapes during export
* lisp/ox.el (org-export--remove-escaped): New function removing
zero-width spaces when they separate object boundaries.
(org-export-as): Call `org-export--remove-escaped'.
* testing/lisp/test-ox.el (test-org-export/remove-escaped): New test.
---
lisp/ox.el | 22 ++++++++++++++++++++++
testing/lisp/test-ox.el | 13 +++++++++++++
2 files changed, 35 insertions(+)
diff --git a/lisp/ox.el b/lisp/ox.el
index 40ad7ae4e..de034fd22 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -2916,6 +2916,25 @@ (defun org-export--remove-uninterpreted-data (data info)
;; Return modified parse tree.
data)
+(defun org-export--remove-escaped (data info)
+ "Remove escape symbols from plain-text in DATA.
+DATA is a parse tree or a secondary string.  INFO is a plist
+containing export options.  DATA is modified by side effect and
+returned by the function."
+ (org-element-map data '(plain-text)
+ (lambda (string)
+ (let (processed-string)
+ (setq processed-string
+ (replace-regexp-in-string "\\`\u200b" "" string))
+ (setq processed-string
+ (replace-regexp-in-string "\u200b\\'" "" processed-string))
+ (unless (equal string processed-string)
+ (org-element-insert-before processed-string string)
+ (org-element-extract-element string))))
+ info nil nil t)
+ ;; Return modified parse tree.
+ data)
+
;;;###autoload
(defun org-export-as
(backend &optional subtreep visible-only body-only ext-plist)
@@ -3046,6 +3065,9 @@ (defun org-export-as
;; communication channel.
(org-export--prune-tree tree info)
(org-export--remove-uninterpreted-data tree info)
+ ;; Remove zero-width spaces that escape Org syntax
+ ;; elements.
+ (org-export--remove-escaped tree info)
;; Call parse tree filters.
(setq tree
(org-export-filter-apply-functions
diff --git a/testing/lisp/test-ox.el b/testing/lisp/test-ox.el
index 7c71b6e24..ea4fce363 100644
--- a/testing/lisp/test-ox.el
+++ b/testing/lisp/test-ox.el
@@ -982,6 +982,19 @@ (ert-deftest test-org-export/uninterpreted ()
(section . (lambda (s c i) c))))
nil nil nil '(:with-sub-superscript {}))))))
+(ert-deftest test-org-export/remove-escaped ()
+ "Test removing escape symbols."
+ ;; Remove zero-width space around markup.
+ (should
+ (equal "This*is*test.\n"
+ (org-test-with-temp-text "This\u200b*is*\u200btest.\n"
+ (org-export-as (org-test-default-backend)))))
+ ;; Do not remove zero-width space in other places.
+ (should
+ (equal "This\u200bis\u200btest.\n"
+ (org-test-with-temp-text "This\u200bis\u200btest.\n"
+ (org-export-as (org-test-default-backend))))))
+
(ert-deftest test-org-export/export-scope ()
"Test all export scopes."
;; Subtree.
--
2.35.1