From: Ihor Radchenko <yantar92@posteo.net>
To: Max Nikulin <manikulin@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: Trailing whitespace after export snippets without a transcoder
Date: Mon, 22 Apr 2024 19:01:07 +0000
Message-ID: <87wmoprzm4.fsf@localhost>
In-Reply-To: <v05f4q$a6m$1@ciao.gmane.io>

[-- Attachment #1: Type: text/plain, Size: 2366 bytes --]

Max Nikulin <manikulin@gmail.com> writes:

>> I do not think that we need to handle this Org mode-wide (it will be
>> difficult and will likely cause breaking changes).
>
> I have not figured out why it may become a breaking change and which
> backends need blank lines inside a paragraph. I would make stripping empty
> lines the default behavior, with an option to disable this feature.

For example, consider an HTML exporter that aligns tags nicely and keeps
blank lines between markup blocks for readability.  If we remove such
blank lines unconditionally, it will be problematic.
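
As a rough illustration of the concern, here is a minimal sketch of
unconditional blank-line stripping applied to HTML-like output.  The
regexp is the one `org-latex-paragraph' already uses; the names
`my/strip-blank-lines' and `my/html-sample', and the sample string
itself, are made up for this example.

```elisp
;; Hypothetical helper, not part of Org: collapse runs of blank lines.
(defun my/strip-blank-lines (s)
  "Return S with every run of blank lines removed."
  (replace-regexp-in-string (rx "\n" (1+ (0+ space) "\n")) "\n" s))

;; Made-up output of an exporter that separates markup blocks with
;; blank lines for readability.
(defvar my/html-sample "<b>bold</b>\n\n<i>italic</i>\n  \n<u>last</u>")

;; The blank separators are gone after stripping:
;; "<b>bold</b>\n<i>italic</i>\n<u>last</u>"
(my/strip-blank-lines my/html-sample)
```

A backend that relies on such separators would lose them, which is
why the stripping is better done per backend.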

>> See the attached tentative fix.
>
> Since zero width spaces are part of Org syntax, they need special treatment.

They are not a part of Org syntax, and we currently do not handle them
specially. They still work as an escape character simply because Org
syntax defines markup boundaries using a closed set of whitespace
characters: (rx (any " \t")). So, any whitespace character other than
space or tab is equivalent to a zero-width space for all practical
purposes.
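
A small sketch of the closed-set point, using only `rx' and
`string-match-p'.  The helper name `my/ends-with-blank-p' is made up
for illustration; it is not an Org function and not the actual parser
code.

```elisp
;; Hypothetical helper: does STR end with one of the two characters
;; that the closed set (rx (any " \t")) recognizes?
(defun my/ends-with-blank-p (str)
  "Return non-nil when STR ends with a space or a tab."
  (string-match-p (rx (any " \t") eos) str))

(my/ends-with-blank-p "*bold* ")    ; non-nil: ends with a space
(my/ends-with-blank-p "*bold*\t")   ; non-nil: ends with a tab
;; U+200B ZERO WIDTH SPACE is neither a space nor a tab, so the
;; closed set does not see it as trailing blank.
(my/ends-with-blank-p (concat "*bold*" (string #x200b))) ; nil
```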

> ---- 8< ----
> #+macro: empty (eval "")
>
> Some *bold*​@@comment: *@@ text.
> @@comment: line@@
> More /italic/​{{{empty}}} text.
> {{{empty}}}
> Last line.
> ---- >8 ----
>
> LaTeX export:
> ---- 8< ----
> Some \textbf{bold}​text.
> More \emph{italic}​ text.
>
> Last line.
> ---- >8 ----
>
> Notice that the visible space character after "bold" disappeared.

I guess that I can change the condition for dropping the trailing space
from (rx whitespace eos) to (rx (any " \t") eos).

See the attached updated version of the patch set.

> ... I am leaving it up
> to you to decide whether an empty line that appears due to a macro is a
> bug or a feature. If I remember correctly, your opinion is that a macro
> expanding to multiple paragraphs is a valid one.

Yes. I do believe that we should keep macros as dumb as possible, so
that people can use them in the most flexible ways, including breaking
paragraphs, if so desired.

A more annoying one is

First line
@@comment:foo@@
last line.

vs.

First line
@@comment:foo
@@last line.

where we run into a peculiarity of Org syntax: trailing tabs and spaces
are included as part of the object, but newlines are not.

But I do not see any good way to address this problem without rewriting
half of Org mode.
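
The difference can be inspected with the parser directly.  The sketch
below assumes Org is loadable via (require 'org-element);
`my/snippet-post-blank' is a made-up helper that returns the
:post-blank value of the first export snippet found in a string.

```elisp
(require 'org)
(require 'org-element)

;; Hypothetical helper, for illustration only.
(defun my/snippet-post-blank (text)
  "Return the :post-blank value of the first export snippet in TEXT."
  (with-temp-buffer
    (insert text)
    (org-mode)
    (org-element-map (org-element-parse-buffer) 'export-snippet
      (lambda (snippet) (org-element-property :post-blank snippet))
      nil t)))

;; A trailing space is counted as part of the snippet object: 1.
(my/snippet-post-blank "First line @@comment:foo@@ last line.")
;; A trailing newline is not: 0.
(my/snippet-post-blank "First line\n@@comment:foo@@\nlast line.")
```

In the second case the newlines stay in the surrounding plain text, so
dropping the snippet leaves an empty line behind.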


[-- Attachment #2: v2-0001-org-export-data-Handle-trailing-spaces-when-trans.patch --]
[-- Type: text/x-patch, Size: 5708 bytes --]

From 229a563dc38e1fdfd63be2dfebb1a9e9023e44b2 Mon Sep 17 00:00:00 2001
Message-ID: <229a563dc38e1fdfd63be2dfebb1a9e9023e44b2.1713812419.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Sun, 21 Apr 2024 15:37:18 +0300
Subject: [PATCH v2 1/2] org-export-data: Handle trailing spaces when
 transcoder returns nil

* lisp/ox.el (org-export--keep-spaces): New helper function containing
logic about keeping spaces in place of removed object from
`org-export--prune-tree'.  The logic is modified to keep spaces in the
case when previous plain-string object ends with a whitespace, but not
" " or "\t".  This can happen, for example, when there is a trailing
zero-width space.  We do want to keep spaces in such scenario.
(org-export-data): When transcoder returns nil, handle
trailing spaces after an object the same way `org-export--prune-tree'
does.  Remove special handling of export snippets that unconditionally
keep their trailing spaces.
(org-export--prune-tree): Use the helper function.

Link: https://orgmode.org/list/87h6fwmgkm.fsf@localhost
---
 lisp/ox.el | 67 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 25 deletions(-)

diff --git a/lisp/ox.el b/lisp/ox.el
index fc746950d..6f6689188 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -1880,6 +1880,38 @@ (defun org-export-transcoder (blob info)
       (let ((transcoder (cdr (assq type (plist-get info :translate-alist)))))
 	(and (functionp transcoder) transcoder)))))
 
+(defun org-export--keep-spaces (data info)
+  "Non-nil, when post-blank spaces after removing DATA should be preserved.
+INFO is the info channel.
+
+This function returns nil, when previous exported element already has
+trailing spaces or when DATA does not have non-zero non-nil
+`:post-blank' property.
+
+When the return value is non-nil, it is a string containing the trailing
+spaces."
+  ;; When DATA is an object, interpret this as if DATA should be
+  ;; ignored (see `org-export--prune-tree').  Keep spaces in place of
+  ;; removed element, if necessary.  Example: "Foo.[10%] Bar" would
+  ;; become "Foo.Bar" if we do not keep spaces.  Another example: "A
+  ;; space@@ascii:*@@ character."  should become "A space character"
+  ;; in non-ASCII export.
+  (let ((post-blank (org-element-post-blank data)))
+    (unless (or (not post-blank)
+                (zerop post-blank)
+                (eq 'element (org-element-class data)))
+      (let ((previous (org-export-get-previous-element data info)))
+	(unless (or (not previous)
+		    (pcase (org-element-type previous)
+		      (`plain-text
+		       (string-match-p
+			(rx (any " \t") eos) previous))
+		      (_ (org-element-post-blank previous))))
+          ;; When previous element does not have
+          ;; trailing spaces, keep the trailing
+          ;; spaces from DATA.
+	  (make-string post-blank ?\s))))))
+
 ;;;###autoload
 (defun org-export-data (data info)
   "Convert DATA into current backend format.
@@ -1930,15 +1962,11 @@ (defun org-export-data (data info)
 			   (eq (plist-get info :with-archived-trees) 'headline)
 			   (org-element-property :archivedp data)))
 		  (let ((transcoder (org-export-transcoder data info)))
-		    (or (and (functionp transcoder)
-                             (if (eq type 'link)
-			         (broken-link-handler
-			          (funcall transcoder data nil info))
-                               (funcall transcoder data nil info)))
-			;; Export snippets never return a nil value so
-			;; that white spaces following them are never
-			;; ignored.
-			(and (eq type 'export-snippet) ""))))
+		    (and (functionp transcoder)
+                         (if (eq type 'link)
+			     (broken-link-handler
+			      (funcall transcoder data nil info))
+                           (funcall transcoder data nil info)))))
 		 ;; Element/Object with contents.
 		 (t
 		  (let ((transcoder (org-export-transcoder data info)))
@@ -1979,8 +2007,8 @@ (defun org-export-data (data info)
 	  (puthash
 	   data
 	   (cond
-	    ((not results) "")
-	    ((memq type '(nil org-data plain-text raw)) results)
+	    ((not results) (or (org-export--keep-spaces data info) ""))
+            ((memq type '(nil org-data plain-text raw)) results)
 	    ;; Append the same white space between elements or objects
 	    ;; as in the original buffer, and call appropriate filters.
 	    (t
@@ -2641,24 +2669,13 @@ (defun org-export--prune-tree (data info)
 		(let ((type (org-element-type data)))
 		  (if (org-export--skip-p data info selected excluded)
 		      (if (memq type '(table-cell table-row)) (push data ignore)
-			(let ((post-blank (org-element-post-blank data)))
-			  (if (or (not post-blank) (zerop post-blank)
-				  (eq 'element (org-element-class data)))
-			      (org-element-extract data)
+                        (if-let ((keep-spaces (org-export--keep-spaces data info)))
 			    ;; Keep spaces in place of removed
 			    ;; element, if necessary.
 			    ;; Example: "Foo.[10%] Bar" would become
 			    ;; "Foo.Bar" if we do not keep spaces.
-			    (let ((previous (org-export-get-previous-element data info)))
-			      (if (or (not previous)
-				      (pcase (org-element-type previous)
-					(`plain-text
-					 (string-match-p
-					  (rx  whitespace eos) previous))
-					(_ (org-element-post-blank previous))))
-				  ;; Previous object ends with whitespace already.
-				  (org-element-extract data)
-				(org-element-set data (make-string post-blank ?\s)))))))
+                            (org-element-set data keep-spaces)
+			  (org-element-extract data)))
 		    (if (and (eq type 'headline)
 			     (eq (plist-get info :with-archived-trees)
 				 'headline)
-- 
2.44.0


[-- Attachment #3: v2-0002-ox-md-ox-ascii-ox-texinfo-Strip-blank-lines-from-.patch --]
[-- Type: text/x-patch, Size: 4662 bytes --]

From 3fa3ed068fcfc58470430a5c4bae3a5ffd1ca3ed Mon Sep 17 00:00:00 2001
Message-ID: <3fa3ed068fcfc58470430a5c4bae3a5ffd1ca3ed.1713812419.git.yantar92@posteo.net>
In-Reply-To: <229a563dc38e1fdfd63be2dfebb1a9e9023e44b2.1713812419.git.yantar92@posteo.net>
References: <229a563dc38e1fdfd63be2dfebb1a9e9023e44b2.1713812419.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Sun, 21 Apr 2024 15:54:48 +0300
Subject: [PATCH v2 2/2] ox-md, ox-ascii, ox-texinfo: Strip blank lines from
 paragraphs

* lisp/org-macs.el (org-remove-blank-lines): New helper function to
strip blank lines from string.
* lisp/ox-ascii.el (org-ascii-paragraph):
* lisp/ox-latex.el (org-latex-paragraph):
* lisp/ox-md.el (org-md-paragraph):
* lisp/ox-texinfo.el (org-texinfo-paragraph): Strip blank lines from
paragraphs - these exporters are using blank lines as paragraph
separators.

Reported-by: Max Nikulin <manikulin@gmail.com>
Link: https://orgmode.org/list/v00le7$frp$1@ciao.gmane.io
---
 lisp/org-macs.el   | 4 ++++
 lisp/ox-ascii.el   | 6 ++++++
 lisp/ox-latex.el   | 4 +---
 lisp/ox-md.el      | 6 ++++++
 lisp/ox-texinfo.el | 7 ++++++-
 5 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/lisp/org-macs.el b/lisp/org-macs.el
index 1254ddb54..93803bfe9 100644
--- a/lisp/org-macs.el
+++ b/lisp/org-macs.el
@@ -1244,6 +1244,10 @@ (defun org-remove-tabs (s &optional width)
 	     t t s)))
   s)
 
+(defun org-remove-blank-lines (s)
+  "Remove blank lines in S."
+  (replace-regexp-in-string (rx "\n" (1+ (0+ space) "\n")) "\n" s))
+
 (defun org-wrap (string &optional width lines)
   "Wrap string to either a number of lines, or a width in characters.
 If WIDTH is non-nil, the string is wrapped to that width, however many lines
diff --git a/lisp/ox-ascii.el b/lisp/ox-ascii.el
index db4356ec6..e767f66cf 100644
--- a/lisp/ox-ascii.el
+++ b/lisp/ox-ascii.el
@@ -1651,6 +1651,12 @@ (defun org-ascii-paragraph (paragraph contents info)
   "Transcode a PARAGRAPH element from Org to ASCII.
 CONTENTS is the contents of the paragraph, as a string.  INFO is
 the plist used as a communication channel."
+  ;; Ensure that we do not create multiple paragraphs, when a single
+  ;; paragraph is expected.
+  ;; Multiple newlines may appear in CONTENTS, for example, when
+  ;; certain objects are stripped from export, leaving single newlines
+  ;; before and after.
+  (setq contents (org-remove-blank-lines contents))
   (org-ascii--justify-element
    (let ((indented-line-width (plist-get info :ascii-indented-line-width)))
      (if (not (wholenump indented-line-width)) contents
diff --git a/lisp/ox-latex.el b/lisp/ox-latex.el
index 5c19e1fe7..2267a604e 100644
--- a/lisp/ox-latex.el
+++ b/lisp/ox-latex.el
@@ -3039,9 +3039,7 @@ (defun org-latex-paragraph (_paragraph contents _info)
   ;; Multiple newlines may appear in CONTENTS, for example, when
   ;; certain objects are stripped from export, leaving single newlines
   ;; before and after.
-  (replace-regexp-in-string
-   (rx "\n" (1+ (0+ space) "\n")) "\n"
-   contents))
+  (org-remove-blank-lines contents))
 
 
 ;;;; Plain List
diff --git a/lisp/ox-md.el b/lisp/ox-md.el
index fa2beeb95..28f0a4cf6 100644
--- a/lisp/ox-md.el
+++ b/lisp/ox-md.el
@@ -628,6 +628,12 @@ (defun org-md-paragraph (paragraph contents _info)
   "Transcode PARAGRAPH element into Markdown format.
 CONTENTS is the paragraph contents.  INFO is a plist used as
 a communication channel."
+  ;; Ensure that we do not create multiple paragraphs, when a single
+  ;; paragraph is expected.
+  ;; Multiple newlines may appear in CONTENTS, for example, when
+  ;; certain objects are stripped from export, leaving single newlines
+  ;; before and after.
+  (setq contents (org-remove-blank-lines contents))
   (let ((first-object (car (org-element-contents paragraph))))
     ;; If paragraph starts with a #, protect it.
     (if (and (stringp first-object) (string-prefix-p "#" first-object))
diff --git a/lisp/ox-texinfo.el b/lisp/ox-texinfo.el
index 4aef9c41c..fc9ec9209 100644
--- a/lisp/ox-texinfo.el
+++ b/lisp/ox-texinfo.el
@@ -1517,7 +1517,12 @@ (defun org-texinfo-paragraph (_paragraph contents _info)
   "Transcode a PARAGRAPH element from Org to Texinfo.
 CONTENTS is the contents of the paragraph, as a string.  INFO is
 the plist used as a communication channel."
-  contents)
+  ;; Ensure that we do not create multiple paragraphs, when a single
+  ;; paragraph is expected.
+  ;; Multiple newlines may appear in CONTENTS, for example, when
+  ;; certain objects are stripped from export, leaving single newlines
+  ;; before and after.
+  (org-remove-blank-lines contents))
 
 ;;;; Plain List
 
-- 
2.44.0


[-- Attachment #4: Type: text/plain, Size: 224 bytes --]


-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

Thread overview: 23+ messages
2023-06-22  0:04 Inline comments ypuntot
2023-06-22  1:50 ` Max Nikulin
2023-06-28 15:51   ` [PATCH] org-faq.org: " Max Nikulin
2023-06-29 10:47     ` Ihor Radchenko
2023-06-30 10:34       ` Max Nikulin
2023-07-01 10:48         ` Ihor Radchenko
2023-08-06 18:34           ` Bastien Guerry
2023-07-01 15:24         ` Ihor Radchenko
2023-07-02  5:46           ` Org FAQ design (Re: [PATCH] org-faq.org: Inline comments) Max Nikulin
2023-07-08  5:48       ` [PATCH v2] org-faq.org: Inline comments Max Nikulin
2023-07-08  9:31         ` Ihor Radchenko
2023-07-10 16:09           ` Max Nikulin
2023-09-01 11:28             ` Ihor Radchenko
2024-04-15 12:17               ` Ihor Radchenko
2024-04-17 14:44                 ` Max Nikulin
2024-04-20 11:14                   ` Trailing whitespace after export snippets without a transcoder (was: [PATCH v2] org-faq.org: Inline comments) Ihor Radchenko
2024-04-20 15:02                     ` Trailing whitespace after export snippets without a transcoder Max Nikulin
2024-04-21 13:00                       ` Ihor Radchenko
2024-04-22 10:45                         ` Max Nikulin
2024-04-22 19:01                           ` Ihor Radchenko [this message]
2024-04-25 10:54                             ` Max Nikulin
2024-04-28 11:19                               ` Ihor Radchenko
2024-05-19 10:04                                 ` Ihor Radchenko