From: Joseph Turner <joseph@breatheoutbreathe.in>
To: Ihor Radchenko <yantar92@posteo.net>
Cc: Christian Moe <mail@christianmoe.com>,
Org Mode Mailing List <emacs-orgmode@gnu.org>,
Bohong Huang <bohonghuang@qq.com>
Subject: Re: Form feed characters break odt export
Date: Fri, 27 Dec 2024 12:42:48 -0800
Message-ID: <87jzbk503b.fsf@breatheoutbreathe.in>
In-Reply-To: <87plld5svl.fsf@localhost> (Ihor Radchenko's message of "Fri, 27 Dec 2024 10:21:02 +0000")
[-- Attachment #1: Type: text/plain, Size: 5065 bytes --]
Ihor Radchenko <yantar92@posteo.net> writes:
[...]
> +(defconst org-odt-forbidden-char-re
> + (rx (not (in ?\N{U+9} ?\N{U+A} ?\N{U+D}
> + (?\N{U+20} . ?\N{U+D7FF})
> + (?\N{U+E000} . ?\N{U+FFFD})
> + (?\N{U+10000} . ?\N{U+10FFFF}))))
Indentation mismatch ^
> + "Regexp matching forbidden XML1.0 characters.
> +https://www.w3.org/TR/REC-xml/#charsets")
> +
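As a quick sanity check of the regexp above, here is a minimal sketch that copies it under a hypothetical name (`my/xml-forbidden-char-re', chosen only to avoid clashing with the constant in the patch) and tries it on a string containing a form feed. It assumes an Emacs recent enough to read the ?\N{U+...} character syntax.

```emacs-lisp
;; Sketch only: a local copy of the regexp from the quoted patch.
;; `my/xml-forbidden-char-re' is a hypothetical name, not part of ox-odt.
(defconst my/xml-forbidden-char-re
  (rx (not (in ?\N{U+9} ?\N{U+A} ?\N{U+D}
               (?\N{U+20} . ?\N{U+D7FF})
               (?\N{U+E000} . ?\N{U+FFFD})
               (?\N{U+10000} . ?\N{U+10FFFF}))))
  "Regexp matching characters outside the XML 1.0 Char production.")

;; Form feed (U+C) is not an allowed XML 1.0 character, so it matches.
(string-match-p my/xml-forbidden-char-re "foo\fbar")   ; => 3

;; Tab and newline are explicitly allowed, so nothing matches here.
(string-match-p my/xml-forbidden-char-re "foo\tbar\n") ; => nil
```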
> (defconst org-odt-schema-dir-list
> (list (expand-file-name "./schema/" org-odt-data-dir))
> "List of directories to search for OpenDocument schema files.
> @@ -364,6 +374,19 @@ (defgroup org-export-odt nil
> :tag "Org Export ODT"
> :group 'org-export)
>
> +(defcustom org-odt-with-forbidden-chars ""
> + "String to replace forbidden XML characters.
> +When set to t, forbidden characters are retained.
> +When set to nil, an error is thrown.
> +See `org-odt-forbidden-char-re' for the list of forbidden characters
> +that cannot occur inside ODT documents.
> +
> +You may also consider export filters to perform more fine-grained
> +replacements. See info node `(org)Advanced Export Configuration'."
> + :package-version '(Org . "9.8")
> + :type '(choice (const :tag "Strip forbidden characters" t)
According to the docstring, the above tag should say "Leave forbidden
characters as-is". See the attached patch, which also slightly rewords
the docstring.
> + (const :tag "Err when forbidden characters encountered" nil)
> + (string :tag "Replacement string")))
>
> ;;;; Debugging
>
> @@ -2892,6 +2915,24 @@ (defun org-odt--encode-tabs-and-spaces (line)
> (format " <text:s text:c=\"%d\"/>" (1- (length s)))))
> line))
>
> +(defun org-odt--remove-forbidden (text _backend info)
> + "Remove forbidden and discouraged characters from TEXT.
> +INFO is the communication plist"
> + (pcase (plist-get info :odt-with-forbidden-chars)
Should we use pcase-exhaustive?
> + ((and (pred stringp) rep)
> + (prog1 (replace-regexp-in-string org-odt-forbidden-char-re rep text)
> + (when (match-string 0 text)
The replacement appears to work well on my machine, but there are
unnecessary warnings. Run org-odt-export-to-odt on a buffer containing:
--8<---------------cut here---------------start------------->8---
* foo
bar
--8<---------------cut here---------------end--------------->8---
the (match-string 0 text) form inside org-odt--remove-forbidden evals to
"<?xml version=\"1.0\" "
which causes the incorrect warning message "Warning (ox-odt): Replacing forbidden character '' with ''"
Confusingly, `text' and the replacement text are string-equal, so it
appears that no replacement has been made.
I suspect that match-string and replace-regexp-in-string do not play
well together. Try this out:
(let* ((text "bar")
       (new (replace-regexp-in-string "r" "z" text)))
  new                     ; "baz", as expected
  (match-string 0 new)    ; signals error
  (match-string 0 text))  ; signals error
I get the following stack trace (for the first error):
Debugger entered--Lisp error: (args-out-of-range "baz" 402 403)
substring("baz" 402 403)
(if string (substring string (match-beginning num) (match-end num)) (buffer-substring (match-beginning num) (match-end num)))
(if (match-beginning num) (if string (substring string (match-beginning num) (match-end num)) (buffer-substring (match-beginning num) (match-end num))))
match-string(0 "baz")
(let* ((text "bar") (new (replace-regexp-in-string "r" "z" text))) new (match-string 0 new) (match-string 0 text))
(progn (let* ((text "bar") (new (replace-regexp-in-string "r" "z" text))) new (match-string 0 new) (match-string 0 text)))
(let ((print-level nil) (print-length nil)) (progn (let* ((text "bar") (new (replace-regexp-in-string "r" "z" text))) new (match-string 0 new) (match-string 0 text))))
(setq elisp--eval-defun-result (let ((print-level nil) (print-length nil)) (progn (let* ((text "bar") (new (replace-regexp-in-string "r" "z" text))) new (match-string 0 new) (match-string 0 text)))))
elisp--eval-defun()
#<subr eval-defun>(nil)
edebug--eval-defun(#<subr eval-defun> nil)
apply(edebug--eval-defun #<subr eval-defun> nil)
eval-defun(nil)
funcall-interactively(eval-defun nil)
command-execute(eval-defun)
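One way to sidestep the stale match data, as a minimal sketch: call string-match on the original string first, read the match right away, and only then do the replacement. The helper name `my/replace-and-report' is hypothetical and not part of ox-odt. The underlying assumption is that replace-regexp-in-string makes no promise about what the match data describes once it returns.

```emacs-lisp
;; Sketch only: `my/replace-and-report' is a hypothetical helper.
(defun my/replace-and-report (regexp rep text)
  "Replace matches of REGEXP in TEXT with the literal string REP.
Return (NEW-TEXT . FIRST-MATCH); FIRST-MATCH is nil when nothing matched."
  (let ((first-match
         ;; Read the match data immediately after `string-match',
         ;; against the very string that was searched.
         (when (string-match regexp text)
           (match-string 0 text))))
    (cons (if first-match
              ;; FIXEDCASE and LITERAL are both t: insert REP verbatim.
              (replace-regexp-in-string regexp rep text t t)
            text)
          first-match)))

(my/replace-and-report "r" "z" "bar") ; => ("baz" . "r")
(my/replace-and-report "x" "z" "bar") ; => ("bar")
```

This still reports only the first match, so it does not address the one-warning-per-call limitation.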
Also, with the replace-regexp-in-string design, only one warning is
emitted even when there are multiple forbidden characters. See the
patch below.
> + (display-warning
> + '(ox-odt ox-odt-with-forbidden-chars)
> + (format "Replacing forbidden character '%s' with '%s'"
> + (match-string 0 text) rep)))))
> + (`nil
> + (if (string-match org-odt-forbidden-char-re text)
> + (error "Forbidden character '%s' found. See `org-odt-with-forbidden-chars'"
> + (match-string 0 text))
> + text))
> + (_ text)))
> +
> (defun org-odt--encode-plain-text (text &optional no-whitespace-filling)
> (dolist (pair '(("&" . "&amp;") ("<" . "&lt;") (">" . "&gt;")))
> (setq text (replace-regexp-in-string (car pair) (cdr pair) text t t)))
> --
> 2.47.1
[-- Attachment #2: 0001-ox-odt-Avoid-putting-forbidden-characters-into-ODT-x.patch --]
[-- Type: text/x-diff, Size: 4623 bytes --]
From ce506caa0bffbd243a2aba384f75f7aaac7fdc4b Mon Sep 17 00:00:00 2001
From: Ihor Radchenko <yantar92@posteo.net>
Date: Fri, 27 Dec 2024 10:21:02 +0000
Subject: [PATCH] ox-odt: Avoid putting forbidden characters into ODT xml
* lisp/ox-odt.el (org-odt-with-forbidden-chars): New export option to
control how to handle forbidden XML characters.
(org-odt--remove-forbidden): New filter removing/replacing forbidden
characters.
Co-authored-by: Joseph Turner <joseph@breatheoutbreathe.in>
Link: https://orgmode.org/list/87o711l4u4.fsf@christianmoe.com
---
lisp/ox-odt.el | 51 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 50 insertions(+), 1 deletion(-)
diff --git a/lisp/ox-odt.el b/lisp/ox-odt.el
index ec81637ef..960bab286 100644
--- a/lisp/ox-odt.el
+++ b/lisp/ox-odt.el
@@ -94,7 +94,8 @@ (org-export-define-backend 'odt
. (org-odt--translate-latex-fragments
org-odt--translate-description-lists
org-odt--translate-list-tables
- org-odt--translate-image-links)))
+ org-odt--translate-image-links))
+ (:filter-final-output . org-odt--remove-forbidden))
:menu-entry
'(?o "Export to ODT"
((?o "As ODT file" org-odt-export-to-odt)
@@ -108,6 +109,7 @@ (org-export-define-backend 'odt
(:keywords "KEYWORDS" nil nil space)
(:subtitle "SUBTITLE" nil nil parse)
;; Other variables.
+ (:odt-with-forbidden-chars nil nil org-odt-with-forbidden-chars)
(:odt-content-template-file nil nil org-odt-content-template-file)
(:odt-display-outline-level nil nil org-odt-display-outline-level)
(:odt-fontify-srcblocks nil nil org-odt-fontify-srcblocks)
@@ -170,6 +172,14 @@ (defconst org-odt-special-string-regexps
("\\.\\.\\." . "&#x2026;")) ; hellip
"Regular expressions for special string conversion.")
+(defconst org-odt-forbidden-char-re
+ (rx (not (in ?\N{U+9} ?\N{U+A} ?\N{U+D}
+ (?\N{U+20} . ?\N{U+D7FF})
+ (?\N{U+E000} . ?\N{U+FFFD})
+ (?\N{U+10000} . ?\N{U+10FFFF}))))
+ "Regexp matching forbidden XML1.0 characters.
+https://www.w3.org/TR/REC-xml/#charsets")
+
(defconst org-odt-schema-dir-list
(list (expand-file-name "./schema/" org-odt-data-dir))
"List of directories to search for OpenDocument schema files.
@@ -364,6 +374,19 @@ (defgroup org-export-odt nil
:tag "Org Export ODT"
:group 'org-export)
+(defcustom org-odt-with-forbidden-chars ""
+ "String to replace forbidden XML characters.
+When set to t, forbidden characters are left as-is.
+When set to nil, an error is thrown.
+See `org-odt-forbidden-char-re' for the list of forbidden characters
+that cannot occur inside ODT documents.
+
+You may also consider export filters to perform more fine-grained
+replacements. See info node `(org)Advanced Export Configuration'."
+ :package-version '(Org . "9.8")
+ :type '(choice (const :tag "Leave forbidden characters as-is" t)
+ (const :tag "Err when forbidden characters encountered" nil)
+ (string :tag "Replacement string")))
;;;; Debugging
@@ -2892,6 +2915,32 @@ (defun org-odt--encode-tabs-and-spaces (line)
(format " <text:s text:c=\"%d\"/>" (1- (length s)))))
line))
+(defun org-odt--remove-forbidden (text _backend info)
+  "Remove forbidden and discouraged characters from TEXT.
+INFO is the communication plist"
+  (pcase-exhaustive (plist-get info :odt-with-forbidden-chars)
+    ((and (pred stringp) rep)
+     (let ((replacements (make-hash-table :test 'equal)))
+       (with-temp-buffer
+         (insert text)
+         (goto-char (point-min))
+         (while (re-search-forward org-odt-forbidden-char-re nil t)
+           (cl-incf (gethash (match-string 0) replacements 0))
+           (replace-match rep))
+         (cl-loop for forbidden being the hash-keys of replacements
+                  using (hash-values count)
+                  do (display-warning
+                      '(ox-odt ox-odt-with-forbidden-chars)
+                      (format "Replaced forbidden character '%s' with '%s' %d times"
+                              forbidden rep count)))
+         (buffer-string))))
+    (`nil
+     (if (string-match org-odt-forbidden-char-re text)
+         (error "Forbidden character '%s' found. See `org-odt-with-forbidden-chars'"
+                (match-string 0 text))
+       text))
+    ('t text)))
+
(defun org-odt--encode-plain-text (text &optional no-whitespace-filling)
(dolist (pair '(("&" . "&amp;") ("<" . "&lt;") (">" . "&gt;")))
(setq text (replace-regexp-in-string (car pair) (cdr pair) text t t)))
--
2.46.0
[-- Attachment #3: Type: text/plain, Size: 21 bytes --]
Thank you!!
Joseph