From: Bastien <bzg@altern.org>
To: emacs-orgmode@gnu.org
Subject: Re: XHTML export - &nbsp; etc.
Date: Sat, 03 Nov 2007 13:51:44 +0000	[thread overview]
Message-ID: <87ve8j8mj3.fsf@bzg.ath.cx> (raw)
In-Reply-To: <41c818190711022310x147f5376r1f6c8184fe7f4b36@mail.gmail.com> (William Henney's message of "Sat, 3 Nov 2007 00:10:57 -0600")

[-- Attachment #1: Type: text/plain, Size: 619 bytes --]

"William Henney" <whenney@gmail.com> writes:

> Personally, I tend to just use the unicode characters directly in my
> org file. This works quite well on the whole (see attached example).

This is *very* nice.

I attach the corresponding .tex source and a .pdf output.

I used org-export-latex.el for this.  I had to work around some backslash
issues (hopefully these will be fixed in org-export-latex.el), and some
Unicode characters could not be printed:

Unicode characters: ✧ ♥ ⊼ ⋓ ∡ □ ϑ

ERROR: Package ucs Error: Unknown Unicode character 9829 = U+2665,

I don't have any clue how this could be handled...
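
That said, one direction that might be worth trying is to declare the
missing character by hand.  Here is a minimal, untested sketch.  It
assumes the LaTeX kernel's utf8 driver rather than ucs/utf8x, and it
assumes the hollow \heartsuit symbol is an acceptable stand-in for the
filled heart:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}   % kernel UTF-8 driver (assumption: not ucs/utf8x)
\usepackage[T1]{fontenc}
% Map U+2665 (decimal 9829) to a math-mode heart; the argument is hexadecimal here.
\DeclareUnicodeCharacter{2665}{\ensuremath{\heartsuit}}
\begin{document}
A heart: ♥
\end{document}
```

With ucs/utf8x the first argument is, as far as I know, read as a plain
number instead: decimal (9829) or hexadecimal with a leading double
quote ("2665).  Treat that as an assumption to verify against the ucs
documentation.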


[-- Attachment #2: unicode.tex --]
[-- Type: text/x-tex, Size: 7742 bytes --]

% Created 2007-11-03 sam 11:58
\documentclass[11pt,a4paper]{article}
\usepackage[mathletters]{ucs}
%\usepackage{ucs}             % Unicode support
\usepackage[utf8x]{inputenc}   % UCS' UTF-8 driver is better than the LaTeX kernel's
\usepackage[T1]{fontenc}      % The default font encoding only contains Latin characters
\usepackage{ae,aecompl}       % Almost European fonts/hyphenation do a better job than Computer Modern
\usepackage{graphicx}
\usepackage{hyperref}
\hypersetup{
  colorlinks=true,
  urlcolor=blue,
  linkcolor=blue,
  }

\title{Unicode and org mode}
\author{William Henney}
\date{03 novembre 2007}

\begin{document}

\maketitle

\section*{Notes on using unicode characters in org mode}

\subsection*{How to enter the unicode characters}

Use either the SGML or TeX input method.

\subsubsection*{Using the TeX input method}

\begin{itemize}
\item Type \texttt{C-u C-\textbackslash{} tex} to activate
\item Type things like \texttt{\textbackslash{}alpha} or \texttt{x\textasciicircum{}2} and they will be translated
      into the corresponding unicode glyph. Use tab for completion help.
\item Pro: ``Intuitive'' to use.
\item Con: Gets in the way of typing a ``real'' backslash
\end{itemize}

\subsubsection*{Using the SGML input method}

\begin{itemize}
\item Type \texttt{C-u C-\textbackslash{} sgml} to activate
\item Type things like \texttt{\&alpha;} or \texttt{\&deg;} to get α and °.
\item Pro: Seems to give access to more glyphs than the TeX method
\item Con: No access to sub/superscripts
\end{itemize}

\subsection*{Punctuation}

We can use the em and en dashes—this clause is bounded by em
dashes—directly in the org file. However, they aren't very easily
distinguishable in some fonts, especially fixed width ones at small
sizes. Here is a range of numbers separated by an en dash: 223–999.  In
this sentence – following British typographic convention – the en dash
is used like the em dash is used in American typography. Here are some
minus signs:— binary (223 − 999) and unary (−0.2). Finally, here is a
hyphen for comparison: a-b. They look good in proportional fonts, such
as Times, Futura and Optima. Baskerville is the font where they look
most like their Computer Modern versions. In fact, Baskerville looks
quite a lot like CMR in other ways too… Oh, and that was an ellipsis.
 
\begin{verbatim}
   Test in fixed-width font:— range 666–999 
\end{verbatim}


\begin{tabular}{ll}
 symbol   &  examples  \\
\hline
 hyphen   &  1-2  a-b  \\
 en dash  &  1–2  a–b  \\
 em dash  &  1—2  a—b  \\
 minus    &  1−2  a−b  \\
\hline
\end{tabular}


It seems that the glyphs for the non-ASCII characters are always taken
from those of the font family of the \texttt{default} face, even where
the font-lock face is specifically set to another font family.

\subsection*{Dealing with pre-formatted text}

\begin{verbatim}
This uses the org-code face, so we can easily make 
it fixed-width
\end{verbatim}

Even if we are using a proportional font family for the \texttt{default}
face, by customizing the \texttt{org-code} face, we can use a
fixed-width font (such as Monaco) for pre-formatted material (lines
starting with ``:'' and words delimited with ``=''). We can do the same
with the \texttt{org-table} face, so that the alignment of table lines
still works. In the case of the pairing of Monaco and Times, it is also
necessary to set the height of the fixed-width faces to 0.85, so that
the character sizes match up.

\subsubsection*{Bugs}

\begin{enumerate}
\item Table alignment still won't be quite right if there are unicode
       characters in the table cells, since the glyphs for these have
       variable widths, even in a \emph{supposedly} fixed-width font like
       Monaco.
\item It doesn't work for sections with the QUOTE keyword, since
       these do not use any special face.
\end{enumerate}

\subsection*{Other typographical symbols (e.g., §)}

% FIXME Cannot be printed:
% It would be nice if we could use ∗, • and ⋆ as list markers. Maybe even
% ♥ and ♠, although they look a bit heavy.

Diamond character: ♢ 

% FIXME Cannot be printed:
%✧ ♥ ⊼ ⋓ ∡ □ ϑ

\subsection*{Greek letters and math symbols: \emph{α = x² − y²}}

% FIXME Cannot be printed:
%Examples: ½∫ Ξ₀ dz = ℏc/λ ⇒ ϑ ⊂ \{⊼, ⋓, ∡\} □
Examples: ½∫ Ξ₀ dz = ℏc/λ ⇒ 

\subsubsection*{Variations between fonts (Mac OS X 10.4/Aquamacs 1.2)}

As far as I can see, only a few fonts have their own set of glyphs for
the Greek letters. Times has a nice set of glyphs, although it does have
the problem that italic nu and italic v look \emph{very} similar. Spot
the difference: \emph{νv} ! Most font families use a common set of
glyphs that have a Sans Serif feel to them, as though they were designed
to go with Helvetica (although Helvetica actually uses a slightly
different set). These glyphs have the problem that the ``gamma'' looks
too much like a ``y'' and the ``tau'' looks like a ``t''. When used with
Monaco, they look too small.
     
\subsubsection*{Super- and sub-scripts}

These don't exist for all letters. 

\subsubsection*{Example alphabets}

αβγδεζηθικλμνξοπρστυφχψω\\
/αβγδεζηθικλμνξοπρστυφχψω/\\
abcdefghijklmnopqrstuvwxyz\\
\emph{abcdefghijklmnopqrstuvwxyz} \\
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ\\
\emph{ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ} \\
ABCDEFGHIJKLMNOPQRSTUVWXYZ\\
\emph{ABCDEFGHIJKLMNOPQRSTUVWXYZ} \\
    
\begin{verbatim}
αβγδεζηθικλμνξοπρστυφχψω
/αβγδεζηθικλμνξοπρστυφχψω/
abcdefghijklmnopqrstuvwxyz
/abcdefghijklmnopqrstuvwxyz/
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
/ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ/
ABCDEFGHIJKLMNOPQRSTUVWXYZ
/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
\end{verbatim}

\subsection*{Export to HTML}

This should work since the charset is declared as utf-8. However,
support in browsers is variable.

\begin{itemize}
\item Safari and Opera work the best—everything looks pretty nice in
  both.
\item Firefox does OK, but the minus signs come out as hyphens. The bold
  math looks funny too, with Greek letters being \textbf{very} bold.
\end{itemize}

\subsection*{Export to \LaTeX{}}

   Presumably, this won't work out of the box. I haven't tried it
   yet. However, see this \href{http://iamleeg.blogspot.com/2007/10/nice-looking-latex-unicode.html}{blog post by Graham Lee} for a possible
   solution:

\begin{verbatim}
\usepackage{ucs} % Unicode support
\usepackage[utf8x]{inputenc} % UCS' UTF-8 driver is better than the LaTeX kernel's
\usepackage[T1]{fontenc} % The default font encoding only contains Latin characters
\usepackage{ae,aecompl} % Almost European fonts/hyphenation do a better job than Computer Modern
\end{verbatim}

\subsubsection*{Update [2007-11-02 Fri]}

Best to use the option \texttt{[mathletters]}, since otherwise it tries
to use commands like \texttt{\textbackslash{}textalpha} and I have no idea where these
are defined (and Google wasn't much help). With \texttt{mathletters} it
uses the standard math symbol Greek alphabet, whether you are in math
mode or not. I guess a better solution would be to use \texttt{\textbackslash{}ifmmode}
to test if we are in math mode and use \texttt{\textbackslash{}upalpha} if we are not.
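
As a rough, untested sketch of that idea, something like the following
might go in the preamble. Two things are assumptions here: that the
\texttt{upgreek} package is the source of the upright alpha, and that
\texttt{ucs} accepts a hexadecimal character number written with a
leading double quote.

\begin{verbatim}
% Assumption: upgreek is installed and provides \upalpha (a math-mode command).
\usepackage{upgreek}
% U+03B1: italic alpha inside math, upright alpha in running text.
\DeclareUnicodeCharacter{"03B1}{\ifmmode\alpha\else\ensuremath{\upalpha}\fi}
\end{verbatim}

How this interacts with the \texttt{mathletters} option has not been
checked.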

\begin{description}

\item[Problems encountered with \texttt{org-export-latex}]

\begin{itemize}
\item Backslashes in quoted text are not properly escaped.
\end{itemize}

\end{description}

\subsection*{Integration with calc}

Calc does not understand unicode as far as I can see (e.g., it
doesn't recognise 2.3 ± 0.4 as an error form). Presumably, this
could be fixed rather easily, since calc already has the concept of
display styles.

\end{document}

[-- Attachment #3: unicode.pdf --]
[-- Type: application/pdf, Size: 117008 bytes --]

[-- Attachment #4: Type: text/plain, Size: 13 bytes --]


-- 
Bastien

