emacs-orgmode@gnu.org archives
* Bug: unconverted dashes in HTML export
@ 2014-02-01 22:29 Thomas Morgan
  2014-02-08  9:29 ` Nicolas Goaziou
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Morgan @ 2014-02-01 22:29 UTC
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 905 bytes --]

Dear Org mode hackers,

I started Emacs with `emacs -Q -l setup.el test-case.org', then typed
`C-c C-e h o' to export to HTML and open the result.  The setup file
(`setup.el'), test case (`test-case.org'), HTML output (`lose.html'),
and a PDF printed by the web browser (`lose.pdf'), are attached.

The test case contains a one-cell table with three hyphens (`---').
I expected this to be converted to an em-dash in the HTML output,
but it remained three hyphens.
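A minimal sketch of the symptom, assuming the export step behaves roughly like `replace-regexp-in-string' applied to each entry in turn. The names `my/convert-dashes' and `my/unpatched-dash-regexps' are hypothetical and exist only for this illustration; the real code in `ox-html.el' may apply the regexps differently. The dash entries are copied from the unpatched side of the diff below.

```elisp
;; Sketch only: a stand-in for the export step, not Org's own code.
;; `my/convert-dashes' is a hypothetical helper.
(defun my/convert-dashes (regexps string)
  "Apply each (REGEXP . REPLACEMENT) pair in REGEXPS to STRING, in order."
  (dolist (pair regexps string)
    (setq string
          (replace-regexp-in-string (car pair) (cdr pair) string t nil))))

(defconst my/unpatched-dash-regexps
  '(("---\\([^-]\\)" . "&#x2014;\\1")   ; mdash
    ("--\\([^-]\\)" . "&#x2013;\\1"))   ; ndash
  "The two dash entries as they stand before the patch.")

;; Dashes followed by another character are converted.
(my/convert-dashes my/unpatched-dash-regexps "a---b") ; => "a&#x2014;b"
(my/convert-dashes my/unpatched-dash-regexps "a--b")  ; => "a&#x2013;b"

;; Dashes at the end of the string are not, because `[^-]' must
;; consume one more character and there is none left.
(my/convert-dashes my/unpatched-dash-regexps "---")   ; => "---"
```

The last call corresponds to the one-cell table, where the string handed to the conversion is presumably just the three hyphens.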

A patch fixing the problem is attached, along with the HTML and PDF
produced after the patch was applied (`win.html', `win.pdf').

I started preparing this report last May (sorry for the delay)
but just confirmed the bug again with Org-mode version 8.2.5g
(`release_8.2.5g-663-g24a213' @ `/src/org-mode/lisp/') and GNU Emacs
24.3.1 (`x86_64-unknown-linux-gnu', X toolkit, Xaw3d scroll bars)
of 2013-09-24.

Thanks and best regards,
Thomas


[-- Attachment #2: setup.el --]
[-- Type: application/emacs-lisp, Size: 47 bytes --]

[-- Attachment #3: test-case.org --]
[-- Type: text/x-org, Size: 61 bytes --]

#+TITLE: Bug with unconverted dashes in HTML export

| --- |

[-- Attachment #4: lose.html --]
[-- Type: text/html, Size: 5341 bytes --]

[-- Attachment #5: lose.pdf --]
[-- Type: application/pdf, Size: 15268 bytes --]

[-- Attachment #6: 0001-Convert-dashes-in-HTML-export-even-when-at-end-of-st.patch --]
[-- Type: text/x-diff, Size: 926 bytes --]

From bd14cdce80a610a5eadbf563ac12472fbed542a5 Mon Sep 17 00:00:00 2001
From: Thomas Morgan <tlm@ziiuu.com>
Date: Mon, 13 May 2013 11:06:52 +0200
Subject: [PATCH] Convert dashes in HTML export even when at end of string.

* lisp/ox-html.el (org-html-special-string-regexps): Convert dashes
  even when at end of string.
---
 lisp/ox-html.el |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lisp/ox-html.el b/lisp/ox-html.el
index 54c6a45..f47e760 100644
--- a/lisp/ox-html.el
+++ b/lisp/ox-html.el
@@ -145,8 +145,8 @@
 
 (defconst org-html-special-string-regexps
   '(("\\\\-" . "&#x00ad;")		; shy
-    ("---\\([^-]\\)" . "&#x2014;\\1")	; mdash
-    ("--\\([^-]\\)" . "&#x2013;\\1")	; ndash
+    ("---\\([^-]?\\)" . "&#x2014;\\1")	; mdash
+    ("--\\([^-]?\\)" . "&#x2013;\\1")	; ndash
     ("\\.\\.\\." . "&#x2026;"))		; hellip
   "Regular expressions for special string conversion.")
 
-- 
1.7.9.5


[-- Attachment #7: win.html --]
[-- Type: text/html, Size: 5346 bytes --]

[-- Attachment #8: win.pdf --]
[-- Type: application/pdf, Size: 15316 bytes --]


* Re: Bug: unconverted dashes in HTML export
  2014-02-01 22:29 Bug: unconverted dashes in HTML export Thomas Morgan
@ 2014-02-08  9:29 ` Nicolas Goaziou
  2014-02-10  9:33   ` Thomas Morgan
  0 siblings, 1 reply; 3+ messages in thread
From: Nicolas Goaziou @ 2014-02-08  9:29 UTC
  To: Thomas Morgan; +Cc: emacs-orgmode

Hello,

Thomas Morgan <tlm@ziiuu.com> writes:

> I started Emacs with `emacs -Q -l setup.el test-case.org', then typed
> `C-c C-e h o' to export to HTML and open the result.  The setup file
> (`setup.el'), test case (`test-case.org'), HTML output (`lose.html'),
> and a PDF printed by the web browser (`lose.pdf'), are attached.
>
> The test case contains a one-cell table with three hyphens (`---').
> I expected this to be converted to an em-dash in the HTML output,
> but it remained three hyphens.

Indeed.

> A patch fixing the problem is attached, along with the HTML and PDF
> produced after the patch was applied (`win.html', `win.pdf').
>
> I started preparing this report last May (sorry for the delay)
> but just confirmed the bug again with Org-mode version 8.2.5g
> (`release_8.2.5g-663-g24a213' @ `/src/org-mode/lisp/') and GNU Emacs
> 24.3.1 (`x86_64-unknown-linux-gnu', X toolkit, Xaw3d scroll bars)
> of 2013-09-24.

Thank you for the patch. A few remarks below.

> From bd14cdce80a610a5eadbf563ac12472fbed542a5 Mon Sep 17 00:00:00 2001
> From: Thomas Morgan <tlm@ziiuu.com>
> Date: Mon, 13 May 2013 11:06:52 +0200
> Subject: [PATCH] Convert dashes in HTML export even when at end of string.
>
> * lisp/ox-html.el (org-html-special-string-regexps): Convert dashes
>   even when at end of string.

You need to add TINYCHANGE at the end of the commit message.

> -    ("---\\([^-]\\)" . "&#x2014;\\1")	; mdash
> -    ("--\\([^-]\\)" . "&#x2013;\\1")	; ndash
> +    ("---\\([^-]?\\)" . "&#x2014;\\1")	; mdash
> +    ("--\\([^-]?\\)" . "&#x2013;\\1")	; ndash

The new regexps still don't look right, as they can match an additional
dash:

  (string-match "---\\([^-]?\\)" "----") => 0

I'm not sure about the intent of this regexp, that is, whether
consecutive mdashes or ndashes are allowed.

A correct version could be either:

  ("---" . "&#x2014;")

or

  ("\\([^-]\\|^\\)---\\([^-]\\|$\\)" . "\\1&#x2014;\\2")

I think the former is on par with LaTeX behaviour.
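The three mdash variants discussed above can be compared side by side. This is a sketch under the assumption that each entry is applied with `replace-regexp-in-string'; `my/mdash-candidates' and `my/try-mdash' are hypothetical names used only here, and the actual export code may behave differently.

```elisp
;; Sketch only: hypothetical helper names, not part of Org.
(defconst my/mdash-candidates
  '((patched  "---\\([^-]?\\)" . "&#x2014;\\1")
    (simple   "---" . "&#x2014;")
    (anchored "\\([^-]\\|^\\)---\\([^-]\\|$\\)" . "\\1&#x2014;\\2"))
  "Alist of (NAME REGEXP . REPLACEMENT) for the three mdash variants.")

(defun my/try-mdash (name string)
  "Return STRING after applying the mdash variant called NAME."
  (let ((entry (cdr (assq name my/mdash-candidates))))
    (replace-regexp-in-string (car entry) (cdr entry) string t nil)))

;; All three convert a lone run of three hyphens.
(my/try-mdash 'patched  "---")  ; => "&#x2014;"
(my/try-mdash 'simple   "---")  ; => "&#x2014;"
(my/try-mdash 'anchored "---")  ; => "&#x2014;"

;; They differ on a run of four hyphens.
(my/try-mdash 'patched  "----") ; => "&#x2014;-"
(my/try-mdash 'simple   "----") ; => "&#x2014;-"
(my/try-mdash 'anchored "----") ; => "----"
```

Under this assumption the patched regexp and the plain "---" give the same results on these inputs, so the optional group adds nothing; only the anchored variant leaves a longer run of hyphens alone.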


What do you think?


Regards,

-- 
Nicolas Goaziou


* Re: Bug: unconverted dashes in HTML export
  2014-02-08  9:29 ` Nicolas Goaziou
@ 2014-02-10  9:33   ` Thomas Morgan
  0 siblings, 0 replies; 3+ messages in thread
From: Thomas Morgan @ 2014-02-10  9:33 UTC
  To: Nicolas Goaziou; +Cc: emacs-orgmode

Hi, Nicolas,

> The new regexps still don't look right, as they can match an additional
> dash:
>
>   (string-match "---\\([^-]?\\)" "----") => 0
>
> I'm not sure about the intent of this regexp, that is, whether
> consecutive mdashes or ndashes are allowed.
>
> A correct version could be either:
>
>   ("---" . "&#x2014;")
>
> or
>
>   ("\\([^-]\\|^\\)---\\([^-]\\|$\\)" . "\\1&#x2014;\\2")
>
> I think the former is on par with LaTeX behaviour.

Good point, and I don't see any problem with your first version,
though I don't understand the intent behind the original regexp,
either.  I had assumed the point was to avoid matching horizontal
separator lines in tables, but that doesn't seem to be an issue.

Thanks,
Thomas

