emacs-orgmode@gnu.org archives
* [patch][ox-html] Stylistic changes
@ 2014-03-16  0:33 Rasmus
  2014-03-16  9:59 ` Nicolas Goaziou
  0 siblings, 1 reply; 10+ messages in thread
From: Rasmus @ 2014-03-16  0:33 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 563 bytes --]

Hi,

Here's a couple of minor changes for ox-html.

First, I dropped the references to HTML5 hgroup since apparently W3
did the same¹.

Second, for some reason ox-html replaces a couple of entities by
itself—rather than letting org-entities do it—but uses hex references
(or whatever), rather than a "nice" HTML character entity.  The second
patch fixes this.  (I naively assume that there is not a reason for
not using the pretty references).

—Rasmus

Footnotes: 
¹   e.g. http://html5doctor.com/the-hgroup-element/

-- 
May the Force be with you

[-- Attachment #2: 0001-Remove-reference-to-hgroup-in-ox-html.patch --]
[-- Type: text/x-diff, Size: 963 bytes --]

From 8325901e959e16d34546ca7bf74d7efbc8e16825 Mon Sep 17 00:00:00 2001
From: Rasmus <w530@pank.eu>
Date: Sun, 16 Mar 2014 00:36:21 +0100
Subject: [PATCH 1/2] Remove reference to hgroup in ox-html

* ox-html.el (org-html-html5-elements): Drop reference to hgroup.
---
 lisp/ox-html.el | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lisp/ox-html.el b/lisp/ox-html.el
index a8c924f..cb95161 100644
--- a/lisp/ox-html.el
+++ b/lisp/ox-html.el
@@ -169,10 +169,8 @@
     "progress" "section" "video")
   "New elements in html5.
 
-<hgroup> is not included because it's currently impossible to
-wrap special blocks around multiple headlines. For other blocks
-that should contain headlines, use the HTML_CONTAINER property on
-the headline itself.")
+For blocks that should contain headlines, use the HTML_CONTAINER
+property on the headline itself.")
 
 (defconst org-html-special-string-regexps
   '(("\\\\-" . "&#x00ad;")		; shy
-- 
1.9.0


[-- Attachment #3: 0002-Proper-HTML-entities-for-dashes-dots-in-ox-html.patch --]
[-- Type: text/x-diff, Size: 1083 bytes --]

From bd096d2040d4ffaa517466ac85c4e0da08863bec Mon Sep 17 00:00:00 2001
From: Rasmus <w530@pank.eu>
Date: Sun, 16 Mar 2014 00:54:11 +0100
Subject: [PATCH 2/2] Proper HTML entities for dashes, dots in ox-html

* ox-html.el (org-html-special-string-regexps): Use HTML entities.
---
 lisp/ox-html.el | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lisp/ox-html.el b/lisp/ox-html.el
index cb95161..8e22df6 100644
--- a/lisp/ox-html.el
+++ b/lisp/ox-html.el
@@ -173,10 +173,10 @@ For blocks that should contain headlines, use the HTML_CONTAINER
 property on the headline itself.")
 
 (defconst org-html-special-string-regexps
-  '(("\\\\-" . "&#x00ad;")		; shy
-    ("---\\([^-]\\)" . "&#x2014;\\1")	; mdash
-    ("--\\([^-]\\)" . "&#x2013;\\1")	; ndash
-    ("\\.\\.\\." . "&#x2026;"))		; hellip
+  '(("\\\\-" . "&shy;")		; shy
+    ("---\\([^-]\\)" . "&mdash;\\1")	; mdash
+    ("--\\([^-]\\)" . "&ndash;\\1")	; ndash
+    ("\\.\\.\\." . "&hellip;"))		; hellip
   "Regular expressions for special string conversion.")
 
 (defconst org-html-scripts
-- 
1.9.0


* Re: [patch][ox-html] Stylistic changes
  2014-03-16  0:33 [patch][ox-html] Stylistic changes Rasmus
@ 2014-03-16  9:59 ` Nicolas Goaziou
  2014-03-16 13:06   ` Rasmus
  2014-03-17  2:17   ` Bastien
  0 siblings, 2 replies; 10+ messages in thread
From: Nicolas Goaziou @ 2014-03-16  9:59 UTC (permalink / raw)
  To: Rasmus; +Cc: emacs-orgmode

Hello,

Rasmus <rasmus@gmx.us> writes:

> Here's a couple of minor changes for ox-html.

Thank you.

> First, I dropped the references to HTML5 hgroup since apparently W3
> did the same¹.

Applied.

> Second, for some reason ox-html replaces a couple of entities by
> itself—rather than letting org-entities do it—but uses hex references
> (or whatever), rather than a "nice" HTML character entity.  The second
> patch fixes this.  (I naively assume that there is not a reason for
> not using the pretty references).

According to the log of the commit introducing the changes, there is
a reason:


  commit f2b2c8318fa8c2ce82208d717c649377c856802c
  Author: Jambunathan K <kjambunathan@gmail.com>
  Date:   Sat Mar 2 11:00:46 2013 +0530

      Add Freemind Mindmap Back-End for Org Export Engine
      
      * contrib/lisp/ox-freemind.el: New file.
      
      * lisp/ox-html.el (org-html--tags, org-html-format-headline)
      (org-html--format-toc-headline, org-html-checkbox)
      (org-html-table-cell, org-html-timestamp)
      (org-html-verse-block, org-html-special-string-regexps):
      Replace named HTML entities with their numeric counterparts.
      This keeps Freemind backend happy.

So if the change is only stylistic, I see no reason to break
compatibility with "ox-freemind.el".


Regards,

-- 
Nicolas Goaziou

* Re: [patch][ox-html] Stylistic changes
  2014-03-16  9:59 ` Nicolas Goaziou
@ 2014-03-16 13:06   ` Rasmus
  2014-03-17  2:17   ` Bastien
  1 sibling, 0 replies; 10+ messages in thread
From: Rasmus @ 2014-03-16 13:06 UTC (permalink / raw)
  To: n.goaziou; +Cc: emacs-orgmode

Nicolas Goaziou <n.goaziou@gmail.com> writes:

>> Second, for some reason ox-html replaces a couple of entities by
>> itself—rather than letting org-entities do it—but uses hex references
>> (or whatever), rather than a "nice" HTML character entity.  The second
>> patch fixes this.  (I naively assume that there is not a reason for
>> not using the pretty references).
>
> According to the log of the commit introducing the changes, there is
> a reason:
>
>
>   commit f2b2c8318fa8c2ce82208d717c649377c856802c
>   Author: Jambunathan K <kjambunathan@gmail.com>
>   Date:   Sat Mar 2 11:00:46 2013 +0530
>
>       Add Freemind Mindmap Back-End for Org Export Engine
>       
>       * contrib/lisp/ox-freemind.el: New file.
>       
>       * lisp/ox-html.el (org-html--tags, org-html-format-headline)
>       (org-html--format-toc-headline, org-html-checkbox)
>       (org-html-table-cell, org-html-timestamp)
>       (org-html-verse-block, org-html-special-string-regexps):
>       Replace named HTML entities with their numeric counterparts.
>       This keeps Freemind backend happy.
>
> So if the change is only stylistic, I see no reason to break
> compatibility with "ox-freemind.el".

Obviously not.  I should have checked with git-blame first, but I
honestly didn't remember that this tool existed.  Thanks!

—Rasmus

-- 
May the Force be with you

* Re: [patch][ox-html] Stylistic changes
  2014-03-16  9:59 ` Nicolas Goaziou
  2014-03-16 13:06   ` Rasmus
@ 2014-03-17  2:17   ` Bastien
  2014-03-17 17:01     ` Rick Frankel
  1 sibling, 1 reply; 10+ messages in thread
From: Bastien @ 2014-03-17  2:17 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode, Rasmus

Hello,

Nicolas Goaziou <n.goaziou@gmail.com> writes:

> So if the change is only stylistic, I see no reason to break
> compatibility with "ox-freemind.el".

I would favor a solution where the HTML backend uses what's
readable (&mdash; and friends) and where the Freemind backend
deals with this.

Maybe `org-html-special-string-regexps' could be a variable
and Freemind could temporarily set it up to what it needs?
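
A minimal sketch of that idea, assuming stand-in names (everything
prefixed `my/' is hypothetical and not part of ox-html or ox-freemind):
the regexps live in a variable declared with defvar, and a
Freemind-like caller rebinds it with `let' for one conversion.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical stand-ins; these are NOT the actual ox-html definitions.
(defvar my/special-string-regexps
  '(("\\\\-" . "&shy;")                  ; shy
    ("---\\([^-]\\)" . "&mdash;\\1")     ; mdash
    ("--\\([^-]\\)" . "&ndash;\\1")      ; ndash
    ("\\.\\.\\." . "&hellip;"))          ; hellip
  "Alist of (REGEXP . REPLACEMENT) pairs using named entities.")

(defconst my/numeric-special-string-regexps
  '(("\\\\-" . "&#x00ad;")
    ("---\\([^-]\\)" . "&#x2014;\\1")
    ("--\\([^-]\\)" . "&#x2013;\\1")
    ("\\.\\.\\." . "&#x2026;"))
  "The same pairs, using numeric character references.")

(defun my/convert-special-strings (string)
  "Apply every pair in `my/special-string-regexps' to STRING, in order."
  (dolist (pair my/special-string-regexps string)
    (setq string
          (replace-regexp-in-string (car pair) (cdr pair) string t))))

(defun my/freemind-convert (string)
  "Convert STRING with the numeric table bound only during this call."
  ;; The variable is special (declared with `defvar'), so `let' rebinds
  ;; it dynamically and the previous value is restored on exit.
  (let ((my/special-string-regexps my/numeric-special-string-regexps))
    (my/convert-special-strings string)))
```

With these definitions, (my/convert-special-strings "a---b") gives
"a&mdash;b", while (my/freemind-convert "a---b") gives "a&#x2014;b".
Whether the real constant can simply become a variable depends on how
the back-ends use it, which this sketch does not cover.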

-- 
 Bastien

* Re: [patch][ox-html] Stylistic changes
  2014-03-17  2:17   ` Bastien
@ 2014-03-17 17:01     ` Rick Frankel
  2014-03-17 22:19       ` Rasmus
  0 siblings, 1 reply; 10+ messages in thread
From: Rick Frankel @ 2014-03-17 17:01 UTC (permalink / raw)
  To: Bastien; +Cc: emacs-orgmode, Nicolas Goaziou, Rasmus

On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
> Hello,
>
> Nicolas Goaziou <n.goaziou@gmail.com> writes:
>
> > So if the change is only stylistic, I see no reason to break
> > compatibility with "ox-freemind.el".

> I would favor a solution where the HTML backend uses what's
> readable (&mdash; and friends) and where the Freemind backend
> deals with this.
>
> Maybe `org-html-special-string-regexps' could be a variable
> and Freemind could temporarily set it up to what it needs?

The use of numeric vs. named entities is not just stylistic.
XHTML[45] only supports the 5 basic named entities internally:

          - &amp; - the ampersand &
          - &quot; - the double quote "
          - &apos; - the single quote '
          - &lt; - less-than <
          - &gt; - greater-than >

So including any others will generate non-conforming output.
Since the change is cosmetic, I don't see the purpose in adding a lot
of conditional code to the exporter to output different entities for
html[45] vs xhtml[45].
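
To make the distinction concrete, here is a small sketch (the `my/'
names are hypothetical helpers, not Org functions) that lists the named
references in a string that XML does not predefine.  It only scans for
the `&name;' pattern and validates nothing else.

```elisp
;;; -*- lexical-binding: t; -*-
(defconst my/xml-predefined-entities '("amp" "quot" "apos" "lt" "gt")
  "Entity names that XML itself predefines.")

(defun my/non-xml-entities (html)
  "Return the named entities in HTML that XML does not predefine.
Numeric references such as &#x2014; are ignored."
  (let ((start 0)
        found)
    (while (string-match "&\\([A-Za-z][A-Za-z0-9]*\\);" html start)
      (let ((name (match-string 1 html)))
        (unless (member name my/xml-predefined-entities)
          (push name found)))
      ;; Continue scanning after the reference just examined.
      (setq start (match-end 0)))
    (nreverse found)))
```

For example, (my/non-xml-entities "a &amp; b &mdash; c &#x2014; d")
returns ("mdash"): the ampersand reference and the numeric one pass,
the named dash does not.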

rick

* Re: [patch][ox-html] Stylistic changes
  2014-03-17 17:01     ` Rick Frankel
@ 2014-03-17 22:19       ` Rasmus
  2014-03-18  0:35         ` Rick Frankel
  0 siblings, 1 reply; 10+ messages in thread
From: Rasmus @ 2014-03-17 22:19 UTC (permalink / raw)
  To: bzg; +Cc: emacs-orgmode, n.goaziou

Hi Rick,

Rick Frankel <rick@rickster.com> writes:

> On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
>> Hello,
>>
>> Nicolas Goaziou <n.goaziou@gmail.com> writes:
>>
>> > So if the change is only stylistic, I see no reason to break
>> > compatibility with "ox-freemind.el".
>
>> I would favor a solution where the HTML backend uses what's
>> readable (&mdash; and friends) and where the Freemind backend
>> deals with this.
>>
>> Maybe `org-html-special-string-regexps' could be a variable
>> and Freemind could temporarily set it up to what it needs?
>
> The use of numeric vs. named entities is not just stylistic.
> XHTML[45] only supports the 5 basic named entities internally:
>
>           - &amp; - the ampersand &
>           - &quot; - the double quote "
>           - &apos; - the single quote '
>           - &lt; - less-than <
>           - &gt; - greater-than >
>
> So including any others will generate non-conforming output.
> Since the change is cosmetic, I don't see the purpose in adding a lot
> of conditional code to the exporter to output different entities for
> html[45] vs xhtml[45].

AFAIK, we have a lot more entities in org-entities with &PRETTY-NAME;.
When I've added entities I've used a pretty name over a numeric value
when I found one.  What's your opinion on that?  Should we go for
readable or aim towards replacing them with these numeric values?

—Rasmus

-- 
With monopolies the cake is a lie!

* Re: [patch][ox-html] Stylistic changes
  2014-03-17 22:19       ` Rasmus
@ 2014-03-18  0:35         ` Rick Frankel
       [not found]           ` <874n2w2n62.fsf@gmx.us>
  0 siblings, 1 reply; 10+ messages in thread
From: Rick Frankel @ 2014-03-18  0:35 UTC (permalink / raw)
  To: Rasmus; +Cc: bzg, n.goaziou, emacs-orgmode

On Mon, Mar 17, 2014 at 11:19:27PM +0100, Rasmus wrote:
> Hi Rick,
> 
> Rick Frankel <rick@rickster.com> writes:
> 
> > On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
> >> Hello,
> >>
> >> Nicolas Goaziou <n.goaziou@gmail.com> writes:
> >>
> >> > So if the change is only stylistic, I see no reason to break
> >> > compatibility with "ox-freemind.el".
> >
> >> I would favor a solution where the HTML backend uses what's
> >> readable (&mdash; and friends) and where the Freemind backend
> >> deals with this.
> >>
> >> Maybe `org-html-special-string-regexps' could be a variable
> >> and Freemind could temporarily set it up to what it needs?
> >
> > The use of numeric vs. named entities is not just stylistic.
> > XHTML[45] only supports the 5 basic named entities internally:
> >
> >           - &amp; - the ampersand &
> >           - &quot; - the double quote "
> >           - &apos; - the single quote '
> >           - &lt; - less-than <
> >           - &gt; - greater-than >
> >
> > So including any others will generate non-conforming output.
> > Since the change is cosmetic, I don't see the purpose in adding a lot
> > of conditional code to the exporter to output different entities for
> > html[45] vs xhtml[45].
> 
> AFAIK, we have a lot more entities in org-entities with &PRETTY-NAME;.
> When I've added entities I've used a pretty name over a numeric value
> when I found one.  What's your opinion on that?  Should we go for
> readable or aim towards replacing them with these numeric values?

We should use only those named entities (above) which are valid in
xhtml(5). So, yes, we should change to using numeric entities for any
other than the above.

rick

* Re: [patch][ox-html] Stylistic changes
       [not found]           ` <874n2w2n62.fsf@gmx.us>
@ 2014-03-18 13:49             ` Rick Frankel
  2014-03-18 19:46               ` Rasmus
  0 siblings, 1 reply; 10+ messages in thread
From: Rick Frankel @ 2014-03-18 13:49 UTC (permalink / raw)
  To: Rasmus; +Cc: emacs-orgmode

On 2014-03-17 23:36, Rasmus wrote:
> Rick Frankel <rick@rickster.com> writes:
> 
> On Mon, Mar 17, 2014 at 11:19:27PM +0100, Rasmus wrote:
> Hi Rick,
> 
> 
> Rick Frankel <rick@rickster.com> writes:
> 
> > On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
> >> Hello,
> >>
> >> Nicolas Goaziou <n.goaziou@gmail.com> writes:
> >>
> >> > So if the change is only stylistic, I see no reason to break
> >> > compatibility with "ox-freemind.el".
> >
> >> I would favor a solution where the HTML backend uses what's
> >> readable (&mdash; and friends) and where the Freemind backend
> >> deals with this.
> >>
> >> Maybe `org-html-special-string-regexps' could be a variable
> >> and Freemind could temporarily set it up to what it needs?
> >
> > The use of numeric vs. named entities is not just stylistic.
> > XHTML[45] only supports the 5 basic named entities internally:
> >
> >           - &amp; - the ampersand &
> >           - &quot; - the double quote "
> >           - &apos; - the single quote '
> >           - &lt; - less-than <
> >           - &gt; - greater-than >
> >
> > So including any others will generate non-conforming output.
> > Since the change is cosmetic, I don't see the purpose in adding a lot
> > of conditional code to the exporter to output different entities for
> > html[45] vs xhtml[45].
> 
> AFAIK, we have a lot more entities in org-entities with &PRETTY-NAME;.
> When I've added entities I've used a pretty name over a numeric value
> when I found one.  What's your opinion on that?  Should we go for
> readable or aim towards replacing them with these numeric values?
> 
> We should use only those named entities (above) which are valid in
> xhtml(5). So, yes, we should change to using numeric entities for any
> other than the above.
> 
> Since Emacs knows both the codepoints and the hex for utf8 entities it
> may be fairly simple to change the HTML representations, though I
> don't like it. . .

When you refer above to "utf-8 entities", do you mean the named html
entities (e.g., &lt;) or the actual utf-8 encoded characters?

I believe the named entities are encoding independent, while including
encoded characters in html output is fine -- although making sure the
page is served with the correct character encoding is another issue
entirely.

As to using a more extensive set of named entities, as I said above,
the problem is that the xhtml flavors don't support them, and I don't
see any advantage in making the exporter handle character encoding
differently based on output doctype.

As Nicolas would point out, you can always use a filter to map all the
entities in the output.
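
A minimal sketch of such a filter, under some assumptions: the `my/'
names and the four-entry table are hypothetical, and only
`org-export-filter-final-output-functions' and
`org-export-derived-backend-p' come from the export framework.  The
filter rewrites known named references to numeric ones in the final
output of HTML-derived back-ends and leaves everything else alone.

```elisp
;;; -*- lexical-binding: t; -*-
(defconst my/entity-codepoints
  '(("shy" . #x00ad) ("ndash" . #x2013)
    ("mdash" . #x2014) ("hellip" . #x2026))
  "Hypothetical, deliberately tiny table of entity names and code points.")

(defun my/named-to-numeric-entities (html)
  "Replace named entities listed in `my/entity-codepoints' within HTML.
Unknown names, including the XML predefined ones, are left untouched."
  (replace-regexp-in-string
   "&\\([A-Za-z][A-Za-z0-9]*\\);"
   (lambda (ref)
     ;; REF is the whole match, e.g. \"&mdash;\"; strip \"&\" and \";\".
     (let ((code (cdr (assoc (substring ref 1 -1) my/entity-codepoints))))
       (if code (format "&#x%04x;" code) ref)))
   html t t))

(defun my/numeric-entities-filter (output backend _info)
  "Final-output filter: numeric references for HTML-derived BACKEND."
  (if (org-export-derived-backend-p backend 'html)
      (my/named-to-numeric-entities output)
    output))

;; Register the filter only once the export framework is loaded.
(with-eval-after-load 'ox
  (add-to-list 'org-export-filter-final-output-functions
               #'my/numeric-entities-filter))
```

The conversion is kept in its own function so it can be tried without
exporting anything: (my/named-to-numeric-entities "x &mdash; y &amp; z")
gives "x &#x2014; y &amp; z".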

rick

* Re: [patch][ox-html] Stylistic changes
  2014-03-18 13:49             ` Rick Frankel
@ 2014-03-18 19:46               ` Rasmus
  2014-03-19 14:00                 ` Rick Frankel
  0 siblings, 1 reply; 10+ messages in thread
From: Rasmus @ 2014-03-18 19:46 UTC (permalink / raw)
  To: emacs-orgmode

Rick Frankel <rick@rickster.com> writes:

> On 2014-03-17 23:36, Rasmus wrote:
> When you refer above to "utf-8 entities", do you mean the named html
> entities (e.g., &lt;) or the actual utf-8 encoded characters?

The latter.  Do M-x describe-char on such a character.  Emacs will
tell you the code points.  My conjecture is therefore that one could
write a script that would translate html values to these weird hex
strings or codepoints.  It would create uglier source output, but
perhaps better for XHTML.  Personally, I don't care about XHTML as I
have little intuition as to when to use it. . .
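
A rough sketch of that conjecture, assuming the entity has already been
resolved to its UTF-8 character (the `my/' functions are hypothetical):
the hex value that describe-char reports is just the character's
integer value, so `format' can produce the numeric reference directly.

```elisp
;;; -*- lexical-binding: t; -*-
(defun my/char-to-numeric-reference (char)
  "Return a hexadecimal numeric character reference for CHAR."
  (format "&#x%04x;" char))

(defun my/escape-non-ascii (string)
  "Replace every non-ASCII character in STRING by a numeric reference."
  (mapconcat (lambda (c)
               (if (< c 128)
                   (char-to-string c)
                 (my/char-to-numeric-reference c)))
             string ""))
```

For instance, (my/escape-non-ascii "a\u2014b") gives "a&#x2014;b".
Mapping entity names to characters in the first place would still need
a table such as the one in org-entities, which is not shown here.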

> I believe the named entities are encoding independent, while including
> encoded characters in html output is fine -- although making sure the
> page is served with the correct character encoding is another issue
> entirely.

Not what I meant.  I'm only addressing your concern about
&HUMAN-READABLE-NAME; vs &#xHEX-VALUE;.

> As to using a more extensive set of named entities, as I said above,
> the problem is that the xhtml flavors don't support them, and I don't
> see any advantage in making the exporter handle character encoding
> differently based on output doctype.

Definitely not.  That is why I ask whether there's a point in changing
nice entities to ugly entities for the sake of not getting them in
XHTML-encoded documents.

> As Nicolas would point out, you can always use a filter to map all the
> entities in the output.

With ox-latex.el we for instance don't include entities that are not
supported by the default package alist.  A similar concern could be at
play here.

–Rasmus

-- 
El Rey ha muerto. ¡Larga vida al Rey!

* Re: [patch][ox-html] Stylistic changes
  2014-03-18 19:46               ` Rasmus
@ 2014-03-19 14:00                 ` Rick Frankel
  0 siblings, 0 replies; 10+ messages in thread
From: Rick Frankel @ 2014-03-19 14:00 UTC (permalink / raw)
  To: Rasmus; +Cc: emacs-orgmode

On 2014-03-18 15:46, Rasmus wrote:
> Rick Frankel <rick@rickster.com> writes:
>
> > On 2014-03-17 23:36, Rasmus wrote:
> > When you refer above to "utf-8 entities", do you mean the named html
> > entities (e.g., &lt;) or the actual utf-8 encoded characters?
>
> The latter.  Do M-x describe-char on such a character.  Emacs will
> tell you the code points.  My conjecture is therefore that one could
> write a script that would translate html values to these weird hex
> strings or codepoints.  It would create uglier source output, but
> perhaps better for XHTML.  Personally, I don't care about XHTML as I
> have little intuition as to when to use it. . .

Do you close the empty tags in your html (e.g., <br />, <hr />)? Then
you're using xhtml.

> > I believe the named entities are encoding independent, while including
> > encoded characters in html output is fine -- although making sure the
> > page is served with the correct character encoding is another issue
> > entirely.
>
> Not what I meant.  I'm only addressing your concern about
> &HUMAN-READABLE-NAME; vs &#xHEX-VALUE;.
>
> > As to using a more extensive set of named entities, as I said above,
> > the problem is that the xhtml flavors don't support them, and I don't
> > see any advantage in making the exporter handle character encoding
> > differently based on output doctype.
>
> Definitely not.  That is why I ask whether there's a point in changing
> nice entities to ugly entities for the sake of not getting them in
> XHTML-encoded documents.

Yes, we should. You can't properly post-process the html if it's
invalid xml. And the definitions of "pretty" and "ugly" are subjective.

The question is, do we want to generate valid (x)html or not? My vote
is yes. In our case, html is an output format and not a source format.
In fact, we should probably compress out unnecessary whitespace, etc.
the way other web generators do to make the smallest/most efficient
output for webserving.

> > As Nicolas would point out, you can always use a filter to map all the
> > entities in the output.
>
> With ox-latex.el we for instance don't include entities that are not
> supported by the default package alist.  A similar concern could be at
> play here.

Agreed.

end of thread

Thread overview: 10+ messages
2014-03-16  0:33 [patch][ox-html] Stylistic changes Rasmus
2014-03-16  9:59 ` Nicolas Goaziou
2014-03-16 13:06   ` Rasmus
2014-03-17  2:17   ` Bastien
2014-03-17 17:01     ` Rick Frankel
2014-03-17 22:19       ` Rasmus
2014-03-18  0:35         ` Rick Frankel
     [not found]           ` <874n2w2n62.fsf@gmx.us>
2014-03-18 13:49             ` Rick Frankel
2014-03-18 19:46               ` Rasmus
2014-03-19 14:00                 ` Rick Frankel
