emacs-orgmode@gnu.org archives
* [bug] Smart quotes: confusion of apostrophe with second level quotes
@ 2024-03-22  1:04 Juan Manuel Macías
  2024-03-23 11:38 ` Ihor Radchenko
  0 siblings, 1 reply; 6+ messages in thread
From: Juan Manuel Macías @ 2024-03-22  1:04 UTC (permalink / raw)
  To: orgmode

Hi,

I don't know if this is a known issue, but I haven't been able to find
any mention of it. I think this is partly because in English it can go
perfectly unnoticed, since for English the values of secondary-closing
and apostrophe are identical:

  (secondary-closing :utf-8 "’" :html "&rsquo;" :latex "'" :texinfo "'")
  (apostrophe :utf-8 "’" :html "&rsquo;")

However, consider the following example:

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  #+OPTIONS: ':t
  #+language:es

  "my friends' party and the students' papers"
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The above produces the following LaTeX:

  \guillemotleft{}my friends'' party and the students'' papers\guillemotright{}

In Spanish, as in other languages with similar settings, the issue is
easier to reproduce because the two values differ:

  (secondary-closing :utf-8 "”" :html "&rdquo;" :latex "''" :texinfo "''")
  (apostrophe :utf-8 "’" :html "&rsquo;")
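
For reference, these entries can be inspected from Lisp. The following
is only a minimal sketch: the helper name my/smart-quote-value is
hypothetical and not part of Org; it merely reads the existing variable
org-export-smart-quotes-alist.

```elisp
(require 'ox)

;; Hypothetical helper (not part of Org): look up the replacement
;; string for quote TYPE in LANGUAGE for a given ENCODING keyword.
(defun my/smart-quote-value (language type encoding)
  "Return the ENCODING string for quote TYPE in LANGUAGE, or nil."
  (plist-get (cdr (assq type
                        (cdr (assoc language org-export-smart-quotes-alist))))
             encoding))

;; Compare the two entries that coincide in English but not in Spanish.
(dolist (lang '("en" "es"))
  (message "%s: secondary-closing=%S apostrophe=%S"
           lang
           (my/smart-quote-value lang 'secondary-closing :utf-8)
           (my/smart-quote-value lang 'apostrophe :utf-8)))
```

When the two values printed for a language are equal, a misclassified
apostrophe is invisible in the output; when they differ, it shows up.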

I don't know whether to consider this a bug or a limitation of the
current implementation, originating from how Org interprets an
apostrophe. I suspect it is hard to solve: how can an apostrophe be
told apart from a second-level quote in certain scenarios, when the
approach is essentially heuristic? Consider also cases in which the
apostrophe is placed at the beginning of a word, as in Greek:

  "να 'ρθώ το βράδυ"

(Org would confuse the apostrophe in the word 'ρθώ with a second-level
opening quote.)

A possible solution would be to allow the use of a specific,
customizable character, other than an apostrophe, for second-level
quotes. Or at least add a brief warning to the manual: in certain
contexts it is safer to use an explicit Unicode character for the
apostrophe.

Best regards,

Juan Manuel

--
Juan Manuel Macías -- Composición tipográfica, tratamiento de datos, diseño editorial y ortotipografía



* Re: [bug] Smart quotes: confusion of apostrophe with second level quotes
  2024-03-22  1:04 [bug] Smart quotes: confusion of apostrophe with second level quotes Juan Manuel Macías
@ 2024-03-23 11:38 ` Ihor Radchenko
  2024-03-23 13:41   ` Juan Manuel Macías
  0 siblings, 1 reply; 6+ messages in thread
From: Ihor Radchenko @ 2024-03-23 11:38 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: orgmode


Juan Manuel Macías <maciaschain@posteo.net> writes:

>   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>   #+OPTIONS: ':t
>   #+language:es
>
>   "my friends' party and the students' papers"
>   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>
> the above produces in LaTeX:
>
>   \guillemotleft{}my friends'' party and the students'' papers\guillemotright{}
> ...
> Perhaps a possible solution would be to allow the use of a specific,
> customizable character, other than an apostrophe, for second-level
> quotes. Or at least add some brief warning in the manual: in certain
> contexts it is safer to use an explicit Unicode character for the
> apostrophe.

I think that we can address examples like this simply by not replacing
unbalanced quotes. There is already some effort in the code towards such
treatment, but it is not complete.

Can you try the attached patch?
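
To illustrate the idea independently of the patch, here is a simplified
sketch that works on a flat list of quote statuses rather than on the
parse tree. The function name my/demote-unbalanced-quotes is
hypothetical, and the handling of a trailing unclosed " is left out;
the attached patch is the actual proposal.

```elisp
;; Hypothetical, simplified illustration (not the code in ox.el):
;; turn unmatched secondary quotes into apostrophes.
(defun my/demote-unbalanced-quotes (statuses)
  "Return a copy of STATUSES with unmatched secondary quotes demoted.
STATUSES is a list of symbols such as `primary-opening',
`secondary-opening', `secondary-closing', `primary-closing' and
`apostrophe'."
  (let ((result (copy-sequence statuses))
        (index 0)
        open-secondary)      ; stack of indices of open secondary quotes
    (dolist (status statuses)
      (pcase status
        ('secondary-opening
         (push index open-secondary))
        ('secondary-closing
         (if open-secondary
             ;; Matched: drop the corresponding opening from the stack.
             (pop open-secondary)
           ;; No opening to match: this must be an apostrophe.
           (setf (nth index result) 'apostrophe)))
        ('primary-closing
         ;; Secondary openings never closed inside "..." are apostrophes.
         (dolist (i open-secondary)
           (setf (nth i result) 'apostrophe))
         (setq open-secondary nil)))
      (setq index (1+ index)))
    result))

;; "my friends' party and the students' papers": both ' are unmatched
;; closings, so they become apostrophes.
(my/demote-unbalanced-quotes
 '(primary-opening secondary-closing secondary-closing primary-closing))
;; => (primary-opening apostrophe apostrophe primary-closing)
```

A properly nested pair such as "'mothers'" is left untouched, because
the secondary closing finds its opening on the stack.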


[-- Attachment #2: 0001-org-export-Do-not-treat-unpaired-and-as-smart-quotes.patch --]
[-- Type: text/x-patch, Size: 5726 bytes --]

From 4a034fbb0029ca7e635f629810a6179df4ca24d9 Mon Sep 17 00:00:00 2001
Message-ID: <4a034fbb0029ca7e635f629810a6179df4ca24d9.1711193777.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Sat, 23 Mar 2024 14:34:06 +0300
Subject: [PATCH] org-export: Do not treat unpaired ' and " as smart quotes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* lisp/ox.el (org-export--smart-quote-status): When quotes are not
balanced, treat " literally and ' as apostrophes.
* testing/lisp/test-ox.el (test-org-export/activate-smart-quotes): Fix
test with unbalanced " and add new tests for unbalanced quotes.

Reported-by: Juan Manuel Macías <maciaschain@posteo.net>
Link: https://list.orgmode.org/orgmode/875xxfqdpt.fsf@posteo.net/
---
 lisp/ox.el              | 45 +++++++++++++++++++++++++++++++++++++++++
 testing/lisp/test-ox.el | 29 ++++++++++++++++++++++++--
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/lisp/ox.el b/lisp/ox.el
index 929b306dc..539d31d9d 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -5942,6 +5942,51 @@ (defun org-export--smart-quote-status (s info)
 	      (when current-status
 		(push (cons text (nreverse current-status)) full-status))))
 	  info nil org-element-recursive-objects)
+        ;; When quotes are not balanced, treat them as apostrophes.
+        (setq full-status (nreverse full-status))
+        (let (primary-openings secondary-openings)
+          (dolist (substatus full-status)
+            (let ((status (cdr substatus)))
+              (while status
+                (pcase (car status)
+                  (`apostrophe nil)
+                  (`primary-opening
+                   (push status primary-openings))
+                  (`secondary-opening
+                   (push status secondary-openings))
+                  (`secondary-closing
+                   (if secondary-openings
+                       ;; Remove matched opening.
+                       (pop secondary-openings)
+                     ;; No matching openings for a given closing.  Replace
+                     ;; it with apostrophe.
+                     (setcar status 'apostrophe)))
+                  (`primary-closing
+                   (when secondary-openings
+                     ;; Some secondary opening quotes are not closed
+                     ;; within "...".  Replace them all with apostrophes.
+                     (dolist (opening secondary-openings)
+                       (setcar opening 'apostrophe))
+                     (setq secondary-openings nil))
+                   (if primary-openings
+                       ;; Remove matched opening.
+                       (pop primary-openings)
+                     ;; No matching openings for a given closing.
+                     (error "This should not happen"))))
+                (setq status (cdr status)))))
+          (when primary-openings
+            ;; Trailing unclosed "
+            (unless (= 1 (length primary-openings))
+              (error "This should not happen"))
+            ;; Mark for not replacing.
+            (setcar (car primary-openings) nil)
+            ;; Mark all the secondary openings and closings after
+            ;; trailing unclosed " as apostrophes.
+            (let ((tail (car primary-openings)))
+              (while tail
+                (when (memq (car tail) '(secondary-opening secondary-closing))
+                  (setcar tail 'apostrophe))
+                (setq tail (cdr tail))))))
 	(puthash (cons parent (org-element-secondary-p s)) full-status cache)
 	(cdr (assq s full-status))))))
 
diff --git a/testing/lisp/test-ox.el b/testing/lisp/test-ox.el
index 01e082c9b..16e81c64b 100644
--- a/testing/lisp/test-ox.el
+++ b/testing/lisp/test-ox.el
@@ -4134,9 +4134,9 @@ (ert-deftest test-org-export/activate-smart-quotes ()
   ;; Opening quotes: at the beginning of a paragraph.
   (should
    (equal
-    '("&ldquo;begin")
+    '("&ldquo;begin&rdquo;")
     (let ((org-export-default-language "en"))
-      (org-test-with-parsed-data "\"begin"
+      (org-test-with-parsed-data "\"begin\""
 	(org-element-map tree 'plain-text
 	  (lambda (s) (org-export-activate-smart-quotes s :html info))
 	  info)))))
@@ -4267,6 +4267,31 @@ (ert-deftest test-org-export/activate-smart-quotes ()
 	    (org-test-with-parsed-data "*\"foo\"*"
 	      (org-element-map tree 'plain-text
 		(lambda (s) (org-export-activate-smart-quotes s :html info))
+		info nil nil t)))))
+  ;; Unmatched quotes.
+  (should
+   (equal '("\\guillemotleft{}my friends' party and the students' papers\\guillemotright{} \\guillemotleft{}``mothers''\\guillemotright{}")
+	  (let ((org-export-default-language "es"))
+	    (org-test-with-parsed-data
+                "\"my friends' party and the students' papers\" \"'mothers'\""
+	      (org-element-map tree 'plain-text
+		(lambda (s) (org-export-activate-smart-quotes s :latex info))
+		info nil nil t)))))
+  (should
+   (equal '("\"'mothers'")
+	  (let ((org-export-default-language "es"))
+	    (org-test-with-parsed-data
+                "\"'mothers'"
+	      (org-element-map tree 'plain-text
+		(lambda (s) (org-export-activate-smart-quotes s :latex info))
+		info nil nil t)))))
+  (should
+   (equal '("\\guillemotleft{}να 'ρθώ το βράδυ\\guillemotright{}")
+	  (let ((org-export-default-language "el"))
+	    (org-test-with-parsed-data
+                "\"να 'ρθώ το βράδυ\""
+	      (org-element-map tree 'plain-text
+		(lambda (s) (org-export-activate-smart-quotes s :latex info))
 		info nil nil t))))))
 
 
-- 
2.44.0




-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


* Re: [bug] Smart quotes: confusion of apostrophe with second level quotes
  2024-03-23 11:38 ` Ihor Radchenko
@ 2024-03-23 13:41   ` Juan Manuel Macías
  2024-03-23 13:49     ` Ihor Radchenko
  0 siblings, 1 reply; 6+ messages in thread
From: Juan Manuel Macías @ 2024-03-23 13:41 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: orgmode

Ihor Radchenko writes:

> Juan Manuel Macías <maciaschain@posteo.net> writes:
>
>>   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>>   #+OPTIONS: ':t
>>   #+language:es
>>
>>   "my friends' party and the students' papers"
>>   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>>
>> the above produces in LaTeX:
>>
>>   \guillemotleft{}my friends'' party and the students'' papers\guillemotright{}
>> ...
>> Perhaps a possible solution would be to allow the use of a specific,
>> customizable character, other than an apostrophe, for second-level
>> quotes. Or at least add some brief warning in the manual: in certain
>> contexts it is safer to use an explicit Unicode character for the
>> apostrophe.
>
> I think that we can address examples like this simply by not replacing
> unbalanced quotes. There is already some effort in the code towards such
> treatment, but it is not complete.
>
> Can you try the attached patch?

Hi, Ihor,

The patch works fine, and I think it prevents many of the problem
cases. But false positives can still appear. Consider (second-level
quotes open after the colon):

"two articles: 'my friends' party' and 'the students' papers'"

"A Greek folk song says: 'να 'ρθώ το βράδυ'"

==>

\guillemotleft{}two articles: ``my friends'' party' and ``the students'' papers'\guillemotright{}

\guillemotleft{}A Greek folk song says: 'να ``ρθώ το βράδυ''\guillemotright{}

I think the only solution here would be to introduce a Unicode
apostrophe (’). Or allow an optional, alternative character for
second-level quotes:

"... `my friends' party` ..."


* Re: [bug] Smart quotes: confusion of apostrophe with second level quotes
  2024-03-23 13:41   ` Juan Manuel Macías
@ 2024-03-23 13:49     ` Ihor Radchenko
  2024-03-23 15:42       ` Juan Manuel Macías
  0 siblings, 1 reply; 6+ messages in thread
From: Ihor Radchenko @ 2024-03-23 13:49 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: orgmode

Juan Manuel Macías <maciaschain@posteo.net> writes:

> The patch works fine, and I think it prevents many of the problem
> cases. But false positives can still appear. Consider (second-level
> quotes open after the colon):
>
> "two articles: 'my friends' party' and 'the students' papers'"
>
> "A Greek folk song says: 'να 'ρθώ το βράδυ'"
>
> ==>
>
> \guillemotleft{}two articles: ``my friends'' party' and ``the students'' papers'\guillemotright{}
>
> \guillemotleft{}A Greek folk song says: 'να ``ρθώ το βράδυ''\guillemotright{}

These are not false positives but genuine ambiguity. In this scenario
there is no deterministic way to distinguish between an apostrophe and
a smart quote.

> I think the only solution here would be to introduce a Unicode
> apostrophe (’). Or allow an optional, alternative character for
> second-level quotes:
>
> "... `my friends' party` ..."

We may introduce an \apostrophe entity:

"two articles: 'my friends\apostrophe party' and 'the students\apostrophe papers'"

"A Greek folk song says: 'να \apostrophe{}ρθώ το βράδυ'"




* Re: [bug] Smart quotes: confusion of apostrophe with second level quotes
  2024-03-23 13:49     ` Ihor Radchenko
@ 2024-03-23 15:42       ` Juan Manuel Macías
  2024-03-24  9:55         ` Ihor Radchenko
  0 siblings, 1 reply; 6+ messages in thread
From: Juan Manuel Macías @ 2024-03-23 15:42 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: orgmode

Ihor Radchenko writes:

> We may introduce an \apostrophe entity:
>
> "two articles: 'my friends\apostrophe party' and 'the students\apostrophe papers'"
>
> "A Greek folk song says: 'να \apostrophe{}ρθώ το βράδυ'"

It's not a bad idea to use entities. I just discovered that an \rsquo
entity already exists. The right single quotation mark (U+2019) is the
character Unicode recommends for the apostrophe, and it is also the
character used in org-export-smart-quotes-alist[1].
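
A quick way to check this from Lisp, as a sketch only: org-entity-get
and org-export-string-as are existing Org functions, but the sample
sentence and the export settings passed here are just my assumptions
for illustration.

```elisp
(require 'org-entities)
(require 'ox-latex)

;; Show the entity definition, a list of the form
;; (name latex mathp html ascii latin1 utf-8).
(message "%S" (org-entity-get "rsquo"))

;; Export a sample sentence with smart quotes enabled and Spanish
;; conventions; \rsquo{} marks the apostrophes explicitly, so only the
;; remaining ' characters are left for the smart-quote heuristic.
(message "%s"
         (org-export-string-as
          "\"two articles: 'my friends\\rsquo{} party' and 'the students\\rsquo{} papers'\""
          'latex t
          '(:with-smart-quotes t :language "es")))
```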

Anyway, I think a) your patch could be a major improvement; b) perhaps a
brief note in the manual (I can send a tiny patch) should be added to
warn of possible ambiguities, and possible solutions.

[1] Although there are arguments against this Unicode recommendation,
see: https://en.wikipedia.org/wiki/Right_single_quotation_mark



* Re: [bug] Smart quotes: confusion of apostrophe with second level quotes
  2024-03-23 15:42       ` Juan Manuel Macías
@ 2024-03-24  9:55         ` Ihor Radchenko
  0 siblings, 0 replies; 6+ messages in thread
From: Ihor Radchenko @ 2024-03-24  9:55 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: orgmode

Juan Manuel Macías <maciaschain@posteo.net> writes:

> Anyway, I think a) your patch could be a major improvement;

Applied onto main, after fixing another edge case with quotes spanning
multiple markup objects.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=33503445e

> ... b) perhaps a
> brief note in the manual (I can send a tiny patch) should be added to
> warn of possible ambiguities, and possible solutions.

Yes, a patch clarifying how to force an apostrophe would be welcome.



