emacs-orgmode@gnu.org archives
* org-html-use-unicode-chars breaks source code blocks
@ 2015-08-04 13:40 Vladimir Alexiev
  2015-08-04 17:35 ` Rasmus
  0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Alexiev @ 2015-08-04 13:40 UTC
  To: emacs-orgmode

Hi! 
I've set org-html-use-unicode-chars since I want ox-html to leave IRIs as IRIs.
But this has another undesired effect: it breaks <URL> references in code,
since it doesn't escape the brackets.

E.g., this:

#+BEGIN_SRC Turtle
@prefix aat:   <http://vocab.getty.edu/aat/>.
#+END_SRC

results in the URL being invisible in the exported HTML.
The fault is here:

(defun org-html-final-function (contents backend info)
  ...
    (when org-html-use-unicode-chars
      (require 'mm-url)
      (mm-url-decode-entities))

Previous code carefully escaped the entities in org-html-protect-char-alist,
only for mm-url-decode-entities to unescape them.
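
A minimal sketch of the effect, assuming only that `mm-url-decode-entities'
(from mm-url.el) decodes every entity in the current buffer. The helper name
`my/decode-all-entities' is hypothetical and not part of ox-html; it just
mimics what the final function does to the whole export:

```emacs-lisp
(require 'mm-url)

(defun my/decode-all-entities (html)
  "Return HTML with every entity decoded (hypothetical helper)."
  (with-temp-buffer
    (insert html)
    ;; Operates on the whole buffer, so it cannot tell a deliberately
    ;; escaped bracket from an entity that should become a character.
    (mm-url-decode-entities)
    (buffer-string)))

;; The Turtle line after ox-html has escaped the angle brackets.
;; The brackets should come back as literal < and >, which a browser
;; then treats as an unknown tag and hides.
(my/decode-all-entities "@prefix aat: &lt;http://vocab.getty.edu/aat/&gt;.")
```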

http://article.gmane.org/gmane.emacs.orgmode/94742 is somewhat related.


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-04 13:40 org-html-use-unicode-chars breaks source code blocks Vladimir Alexiev
@ 2015-08-04 17:35 ` Rasmus
  2015-08-04 18:37   ` Nicolas Goaziou
  0 siblings, 1 reply; 14+ messages in thread
From: Rasmus @ 2015-08-04 17:35 UTC
  To: emacs-orgmode


Hi,

Vladimir Alexiev <vladimir.alexiev@ontotext.com> writes:

> I've set org-html-use-unicode-chars since I want ox-html to leave IRIs as IRIs.
> But this has another undesired effect: it breaks <URL> references in code,
> since it doesn't escape the brackets.

I think this should only apply to entities.  Any reason to do it on the
whole output?  Nicolas?

This patch makes that change.

Rasmus

-- 
This message is brought to you by the department of redundant departments

[-- Attachment #2: 0001-ox-html-Only-translate-entities-to-UTF-8.patch --]

From 535366ec1e1819c73bb038712a19f5e1be0a51b7 Mon Sep 17 00:00:00 2001
From: Rasmus <rasmus@gmx.us>
Date: Tue, 4 Aug 2015 19:12:00 +0200
Subject: [PATCH 1/4] ox-html: Only translate entities to UTF-8

* ox-html.el (org-html-final-function): Do not check
 :html-use-unicode-chars.
 (org-html-entity): Check :html-use-unicode-chars
 (org-html-use-unicode-chars): Update docstring.

Reported-by: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
<http://permalink.gmane.org/gmane.emacs.orgmode/99451>
---
 lisp/ox-html.el | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/lisp/ox-html.el b/lisp/ox-html.el
index 2c13bf6..c329b72 100644
--- a/lisp/ox-html.el
+++ b/lisp/ox-html.el
@@ -609,10 +609,10 @@ Warning: non-nil may break indentation of source code blocks."
   :type 'boolean)
 
 (defcustom org-html-use-unicode-chars nil
-  "Non-nil means to use unicode characters instead of HTML entities."
+  "Non-nil means to use unicode characters for org-entities instead of HTML codes."
   :group 'org-export-html
-  :version "24.4"
-  :package-version '(Org . "8.0")
+  :version "25.1"
+  :package-version '(Org . "8.3")
   :type 'boolean)
 
 ;;;; Drawers
@@ -2359,7 +2359,9 @@ holding contextual information.  See `org-export-data'."
   "Transcode an ENTITY object from Org to HTML.
 CONTENTS are the definition itself.  INFO is a plist holding
 contextual information."
-  (org-element-property :html entity))
+  (if (plist-get info :html-use-unicode-chars)
+      (org-element-property :utf-8 entity)
+    (org-element-property :html entity)))
 
 ;;;; Example Block
 
@@ -3500,9 +3502,6 @@ contextual information."
     (set-auto-mode t)
     (if (plist-get info :html-indent)
 	(indent-region (point-min) (point-max)))
-    (when (plist-get info :html-use-unicode-chars)
-      (require 'mm-url)
-      (mm-url-decode-entities))
     (buffer-substring-no-properties (point-min) (point-max))))
 
 \f
-- 
2.5.0



* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-04 17:35 ` Rasmus
@ 2015-08-04 18:37   ` Nicolas Goaziou
  2015-08-07  9:56     ` Rasmus
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Goaziou @ 2015-08-04 18:37 UTC
  To: Rasmus; +Cc: emacs-orgmode

Hello,

Rasmus <rasmus@gmx.us> writes:

> Vladimir Alexiev <vladimir.alexiev@ontotext.com> writes:
>
>> I've set org-html-use-unicode-chars since I want ox-html to leave IRIs as IRIs.
>> But this has another undesired effect: it breaks <URL> references in code,
>> since it doesn't escape the brackets.
>
> I think this should only apply to entities.  Any reason to do it on the
> whole output?  Nicolas?

It was introduced in e8742b78e0a982a7fca0bf25b4f3551be58660ef. I'm not
sure about the intent of this variable but I tend to think it is about
beautification of the output. As a consequence, it isn't meant to apply
to Org entities specifically.

However, as you noticed, it is not subtle enough to apply
`mm-url-decode-entities' on the full output. It needs to be applied
piece-wise wherever that makes sense. `org-html-entity' is one case.
Maybe `org-html-plain-text' for another one.
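
A rough sketch of what "piece-wise" could look like, under stated
assumptions: `my/html-maybe-decode' is a hypothetical helper, not an existing
ox-html function, and it relies only on `mm-url-decode-entities' and on the
:html-use-unicode-chars property already used in the export plist. A
transcoder would call it on its own output, so source blocks never pass
through the decoder:

```emacs-lisp
(require 'mm-url)

(defun my/html-maybe-decode (text info)
  "Decode entities in TEXT if INFO sets :html-use-unicode-chars.
Hypothetical helper; TEXT is one already-transcoded piece of output."
  (if (plist-get info :html-use-unicode-chars)
      (with-temp-buffer
        (insert text)
        (mm-url-decode-entities)
        (buffer-string))
    text))

;; With the option on, the entity becomes a character; with it off,
;; the string is returned untouched.
(my/html-maybe-decode "caf&eacute;" '(:html-use-unicode-chars t))
(my/html-maybe-decode "caf&eacute;" '(:html-use-unicode-chars nil))
```

Note that even applied per element, this would still turn `&lt;', `&gt;' and
`&amp;' back into markup characters, so a real implementation would have to
leave those three escaped.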


Regards,

-- 
Nicolas Goaziou


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-04 18:37   ` Nicolas Goaziou
@ 2015-08-07  9:56     ` Rasmus
  2015-08-07 10:37       ` Nicolas Goaziou
  0 siblings, 1 reply; 14+ messages in thread
From: Rasmus @ 2015-08-07  9:56 UTC
  To: emacs-orgmode

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Hello,
>
> Rasmus <rasmus@gmx.us> writes:
>
>> Vladimir Alexiev <vladimir.alexiev@ontotext.com> writes:
>>
>>> I've set org-html-use-unicode-chars since I want ox-html to leave IRIs as IRIs.
>>> But this has another undesired effect: it breaks <URL> references in code,
>>> since it doesn't escape the brackets.
>>
>> I think this should only apply to entities.  Any reason to do it on the
>> whole output?  Nicolas?
>
> It was introduced in e8742b78e0a982a7fca0bf25b4f3551be58660ef. I'm not
> sure about the intent of this variable but I tend to think it is about
> beautification of the output. As a consequence, it isn't meant to apply
> to Org entities specifically.
>
> However, as you noticed, it is not subtle enough to apply
> `mm-url-decode-entities' on the full output. It needs to be applied
> piece-wise wherever that makes sense. `org-html-entity' is one case.
> Maybe `org-html-plain-text' for another one.

OK.  I added it to plain-text as well.  What is an example of a plain-text
that would need beautification?

Should we apply it to snippets as well?  In the spirit of "beautification"
it would make sense, but it could also seem like a bad choice.

Rasmus

-- 
Slowly unravels in a ball of yarn and the devil collects it


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-07  9:56     ` Rasmus
@ 2015-08-07 10:37       ` Nicolas Goaziou
  2015-08-07 10:57         ` Rasmus
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Goaziou @ 2015-08-07 10:37 UTC
  To: Rasmus; +Cc: emacs-orgmode

> OK.  I added it to plain-text as well.  What is an example of a plain-text
> that would need beautification?

To tell the truth, I don't know why we need beautification in the first
place. Bastien introduced it, so he may be able to answer.

> Should we apply it to snippets as well?  In the spirit of "beautification"
> it would make sense, but it could also seem like a bad choice.

I think it makes sense to apply it to snippets, indeed, but see above.

Regards,


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-07 10:37       ` Nicolas Goaziou
@ 2015-08-07 10:57         ` Rasmus
  2015-08-08 21:09           ` Andreas Leha
  2015-08-16 14:03           ` Bastien Guerry
  0 siblings, 2 replies; 14+ messages in thread
From: Rasmus @ 2015-08-07 10:57 UTC
  To: emacs-orgmode

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

>> OK.  I added it to plain-text as well.  What is an example of a plain-text
>> that would need beautification?
>
> To tell the truth, I don't know why we need beautification in the first
> place. Bastien introduced it, so he may be able to answer.

To this extent, me neither.  But Vladimir uses it to "leave IRIs as IRIs"
(I don't know what this means).

>> Should we apply it to snippets as well?  In the spirit of "beautification"
>> it would make sense, but it could also seem like a bad choice.
>
> I think it makes sense to apply it to snippets, indeed, but see above.

My initial reaction was to kill it as well.  But I might feel like this a
bit too often (I feel the same way about headline keywords like COMMENT).

Rasmus

-- 
9000!


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-07 10:57         ` Rasmus
@ 2015-08-08 21:09           ` Andreas Leha
  2015-08-09 19:32             ` Sebastien Vauban
  2015-08-16 13:48             ` Bastien Guerry
  2015-08-16 14:03           ` Bastien Guerry
  1 sibling, 2 replies; 14+ messages in thread
From: Andreas Leha @ 2015-08-08 21:09 UTC
  To: emacs-orgmode

Hi,

[ deleted: discussion on beautification ]

>
> My initial reaction was to kill it as well.  But I might feel like this a
> bit too often (I feel the same way about headline keywords like COMMENT).

There has been repeated 'bashing' of the COMMENT keyword lately on this
list.  Let me just raise a voice in defence.  I do not mind the syntax
too much, but the functionality of commenting a whole subtree without
losing the outline functionality is really handy.  Especially also in
distinction to the equally handy :noexport: tag.

So, even if there is probably not a high risk for the COMMENT keyword to
be dropped I just wanted to express my support for it.

Regards,
Andreas


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-08 21:09           ` Andreas Leha
@ 2015-08-09 19:32             ` Sebastien Vauban
  2015-08-16 13:48             ` Bastien Guerry
  1 sibling, 0 replies; 14+ messages in thread
From: Sebastien Vauban @ 2015-08-09 19:32 UTC
  To: emacs-orgmode

Andreas Leha <andreas.leha@med.uni-goettingen.de> writes:
> [ deleted: discussion on beautification ]
>
>> My initial reaction was to kill it as well.  But I might feel like this a
>> bit too often (I feel the same way about headline keywords like COMMENT).
>
> There has been repeated 'bashing' of the COMMENT keyword lately on this
> list.  Let me just raise a voice in defence.  I do not mind the syntax
> too much, but the functionality of commenting a whole subtree without
> losing the outline functionality is really handy.  Especially also in
> distinction to the equally handy :noexport: tag.
>
> So, even if there is probably not a high risk for the COMMENT keyword to
> be dropped I just wanted to express my support for it.

+1

Both COMMENT and :noexport: are necessary, for achieving different tasks.

Best regards,
  Seb

-- 
Sebastien Vauban


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-08 21:09           ` Andreas Leha
  2015-08-09 19:32             ` Sebastien Vauban
@ 2015-08-16 13:48             ` Bastien Guerry
  2015-08-16 18:47               ` Brady Trainor
  1 sibling, 1 reply; 14+ messages in thread
From: Bastien Guerry @ 2015-08-16 13:48 UTC
  To: Andreas Leha; +Cc: emacs-orgmode

Andreas Leha <andreas.leha@med.uni-goettingen.de> writes:

> So, even if there is probably not a high risk for the COMMENT keyword to
> be dropped I just wanted to express my support for it.

COMMENT will stay, for sure.

-- 
 Bastien


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-07 10:57         ` Rasmus
  2015-08-08 21:09           ` Andreas Leha
@ 2015-08-16 14:03           ` Bastien Guerry
  1 sibling, 0 replies; 14+ messages in thread
From: Bastien Guerry @ 2015-08-16 14:03 UTC
  To: Rasmus; +Cc: emacs-orgmode

Hi,

I removed `org-html-use-unicode-chars'.

Thanks,

-- 
 Bastien


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-16 13:48             ` Bastien Guerry
@ 2015-08-16 18:47               ` Brady Trainor
  2015-08-17  8:01                 ` Nicolas Goaziou
  0 siblings, 1 reply; 14+ messages in thread
From: Brady Trainor @ 2015-08-16 18:47 UTC
  To: emacs-orgmode

Speaking of COMMENT, I had noticed some strange behavior if I have a state like COMMENTED_OUT. 

If I have a header like =#+TODO: TODO COMMENTED_OUT | DONE=, and cycle through states with S-<right arrow>, it gets pretty wonky. For now, I simply use COMMENT when I have a section I want to consider as commented out, but ideally this behavior could be different?

Always true, but I haven't been on the mailing list for a spell, so, big thanks to all who contribute to this software. 

Bastien Guerry <bzg@gnu.org> writes:

> Andreas Leha <andreas.leha@med.uni-goettingen.de> writes:
>
>> So, even if there is probably not a high risk for the COMMENT keyword to
>> be dropped I just wanted to express my support for it.
>
> COMMENT will stay, for sure.

-- 
Brady


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-16 18:47               ` Brady Trainor
@ 2015-08-17  8:01                 ` Nicolas Goaziou
  2015-08-17  8:41                   ` Brady Trainor
  2015-08-17 16:44                   ` Rasmus
  0 siblings, 2 replies; 14+ messages in thread
From: Nicolas Goaziou @ 2015-08-17  8:01 UTC
  To: Brady Trainor; +Cc: emacs-orgmode

Hello,

Brady Trainor <algebrat@uw.edu> writes:

> Speaking of COMMENT, I had noticed some strange behavior if I have a state like COMMENTED_OUT. 
>
> If I have a header like =#+TODO: TODO COMMENTED_OUT | DONE=, and cycle
> through state with S-<right arrow>, it gets pretty wonky.

Could you elaborate a bit? I cannot reproduce anything suspicious except
a minor fontification glitch.


Regards,

-- 
Nicolas Goaziou


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-17  8:01                 ` Nicolas Goaziou
@ 2015-08-17  8:41                   ` Brady Trainor
  2015-08-17 16:44                   ` Rasmus
  1 sibling, 0 replies; 14+ messages in thread
From: Brady Trainor @ 2015-08-17  8:41 UTC
  To: emacs-orgmode

Hello,

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Hello,
>
> Brady Trainor <algebrat@uw.edu> writes:
>
>> Speaking of COMMENT, I had noticed some strange behavior if I have a
>> state like COMMENTED_OUT.
>>
>> If I have a header like =#+TODO: TODO COMMENTED_OUT | DONE=, and cycle
>> through state with S-<right arrow>, it gets pretty wonky.
>
> Could you elaborate a bit? I cannot reproduce anything suspicious except
> a minor fontification glitch.
>
>
> Regards,

Ah, I should have checked 8.3... It was on 8.2.10 that this happened. In 8.2.10, as I cycled through the states, I could see several different all-caps states in one headline at once, though only one would be highlighted.

I guess on 8.3, it is just that COMMENT of COMMENTED_OUT would be highlighted, leaving ED_OUT unhighlighted. I am guessing that is what you are seeing. 
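
The partial highlighting is consistent with a pattern that accepts COMMENT as
a mere prefix of the keyword. The sketch below uses two simplified, assumed
regexps (they are not the ones Org uses in its font-lock setup), and
`my/match-comment' is a hypothetical helper:

```emacs-lisp
(defconst my/prefix-re "^\\*+ +\\(COMMENT\\)"
  "Assumed pattern: matches COMMENT even as a prefix of a longer keyword.")

(defconst my/bounded-re "^\\*+ +\\(COMMENT\\)\\(?:[ \t]\\|$\\)"
  "Assumed pattern: COMMENT must be followed by whitespace or end of line.")

(defun my/match-comment (regexp headline)
  "Return the text matched by group 1 of REGEXP in HEADLINE, or nil."
  (let ((case-fold-search nil))
    (when (string-match regexp headline)
      (match-string 1 headline))))

(my/match-comment my/prefix-re "* COMMENTED_OUT Fix export")   ; => "COMMENT"
(my/match-comment my/bounded-re "* COMMENTED_OUT Fix export")  ; => nil
(my/match-comment my/bounded-re "* COMMENT Fix export")        ; => "COMMENT"
```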

--
Brady


* Re: org-html-use-unicode-chars breaks source code blocks
  2015-08-17  8:01                 ` Nicolas Goaziou
  2015-08-17  8:41                   ` Brady Trainor
@ 2015-08-17 16:44                   ` Rasmus
  1 sibling, 0 replies; 14+ messages in thread
From: Rasmus @ 2015-08-17 16:44 UTC
  To: emacs-orgmode

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Hello,
>
> Brady Trainor <algebrat@uw.edu> writes:
>
>> Speaking of COMMENT, I had noticed some strange behavior if I have a state like COMMENTED_OUT. 
>>
>> If I have a header like =#+TODO: TODO COMMENTED_OUT | DONE=, and cycle
>> through state with S-<right arrow>, it gets pretty wonky.
>
> Could you elaborate a bit? I cannot reproduce anything suspicious except
> a minor fontification glitch.

I only saw the fontification error as well, which can be fixed in
org-set-font-lock-defaults.  AFAICT, you did not fix this, right?

Rasmus

-- 
The right to be left alone is a human right


Thread overview: 14+ messages
2015-08-04 13:40 org-html-use-unicode-chars breaks source code blocks Vladimir Alexiev
2015-08-04 17:35 ` Rasmus
2015-08-04 18:37   ` Nicolas Goaziou
2015-08-07  9:56     ` Rasmus
2015-08-07 10:37       ` Nicolas Goaziou
2015-08-07 10:57         ` Rasmus
2015-08-08 21:09           ` Andreas Leha
2015-08-09 19:32             ` Sebastien Vauban
2015-08-16 13:48             ` Bastien Guerry
2015-08-16 18:47               ` Brady Trainor
2015-08-17  8:01                 ` Nicolas Goaziou
2015-08-17  8:41                   ` Brady Trainor
2015-08-17 16:44                   ` Rasmus
2015-08-16 14:03           ` Bastien Guerry
