From: Jambunathan K <kjambunathan@gmail.com>
To: Renzo Been <swangdoodles@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: ODT Charset/Encoding issues (was question about ODT export behavior)
Date: Sun, 17 Jul 2011 01:43:28 +0530
Message-ID: <81r55qx9uf.fsf_-_@gmail.com>
In-Reply-To: <loom.20110715T221004-235@post.gmane.org> (Renzo Been's message of "Fri, 15 Jul 2011 20:34:57 +0000 (UTC)")
Renzo
> I just want to add one point that I did not find in the org-manual. I tested
> some of my org-files and exported them to the OpenOffice format. When I tried to
> open these documents in OpenOffice, they were corrupt and could not be opened.
>
> I soon found out why. If you want to export an org-mode file to .odt, you need
> to explicitly set the file encoding to UTF-8 (I usually use iso-8859-1 encoding
> for my files), like:
> #-*- mode: org; coding: utf-8; -*-
> After that OpenOffice could open the files without any problems.
I use English for communication and I have to admit that I have zero
understanding of things like character sets, encodings etc.
Thanks for the above note. I can see that this is surely a bug, but my
poor understanding prevents me from characterizing it further.
Could you please send me a minimal iso-8859-1 test.org file and the
associated corrupted test.odt file? I will look into this issue.
1. Do you have any specific requirement on how the component XML files
should be encoded? A cursory look at the ODT exporter suggests that it
could actually be emitting XML files in iso-8859-1 while wrongly
claiming UTF-8 encoding, as below:
--8<---------------cut here---------------start------------->8---
<?xml version="1.0" encoding="UTF-8"?>
--8<---------------cut here---------------end--------------->8---
2. Should the XML files always be emitted in UTF-8, irrespective of how
the original Org file is encoded?
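
The suspected mismatch in point 1 can be reproduced outside Emacs. What
follows is only a minimal sketch in Python, not the exporter's code: the
sample text "café" and the helper names build and parses are made up for
illustration. It writes the same document twice under a UTF-8
declaration, once with iso-8859-1 bytes and once with UTF-8 bytes, and
asks Python's standard XML parser whether each is well-formed.

```python
import xml.etree.ElementTree as ET

# The declaration quoted above; the exporter is suspected of emitting it
# regardless of how the bytes are actually encoded.
DECL = '<?xml version="1.0" encoding="UTF-8"?>'
TEXT = "café"  # hypothetical sample; any non-ASCII character will do


def build(body_encoding):
    """Return XML bytes that declare UTF-8 but are encoded as body_encoding."""
    doc = DECL + "<p>" + TEXT + "</p>"
    return doc.encode(body_encoding)


def parses(data):
    """Return True if the standard-library parser accepts the bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


mislabelled = build("iso-8859-1")  # bytes disagree with the declaration
correct = build("utf-8")           # bytes agree with the declaration

print("iso-8859-1 bytes, UTF-8 declaration:", parses(mislabelled))
print("UTF-8 bytes, UTF-8 declaration:", parses(correct))
```

If the exporter behaves as suspected, this would explain why a strict
consumer such as OpenOffice rejects the file while a pure-ASCII Org file
exports without trouble.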
[Notes to Self]
[Notes from odbook]
Para 3 of http://books.evc-cit.info/odbook/apa.html#appc-11-fm2xml
says
--8<---------------cut here---------------start------------->8---
OpenDocument files are always encoded in UTF-8.
--8<---------------cut here---------------end--------------->8---
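
Given that statement, one way to examine a suspect test.odt is to check
whether its XML members are valid UTF-8 at all. The sketch below is an
assumption-laden illustration in Python: the function names
non_utf8_members and make_archive are hypothetical, the member list is
just the usual content.xml, styles.xml and meta.xml, and the archive is
built in memory rather than taken from a real export.

```python
import io
import zipfile


def non_utf8_members(odt_bytes, names=("content.xml", "styles.xml", "meta.xml")):
    """Return the names of the listed XML members that are not valid UTF-8."""
    bad = []
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        present = set(zf.namelist())
        for name in names:
            if name not in present:
                continue  # a missing member is not an encoding problem
            try:
                zf.read(name).decode("utf-8")
            except UnicodeDecodeError:
                bad.append(name)
    return bad


def make_archive(members):
    """Build a zip archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Stand-in for a corrupted export: content.xml holds iso-8859-1 bytes,
# styles.xml holds UTF-8 bytes.
sample = make_archive({
    "content.xml": "<p>café</p>".encode("iso-8859-1"),
    "styles.xml": "<s>café</s>".encode("utf-8"),
})
print(non_utf8_members(sample))
```

For a file on disk, the bytes could be read with open(path, "rb").read()
and passed to the same function.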
Para 2 of
http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
says
--8<---------------cut here---------------start------------->8---
XML 1.0 allows a document to be encoded in any character set registered
with the Internet Assigned Numbers Authority (IANA). European documents
are commonly encoded in one of the ISO Latin character sets, such as
ISO-8859-1. Japanese documents commonly use Shift-JIS, and Chinese
documents use GB2312 and Big 5.
--8<---------------cut here---------------end--------------->8---
Para 4 of
http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
says
--8<---------------cut here---------------start------------->8---
XML processors are not required by the XML 1.0 specification to support
any more than UTF-8 and UTF-16, but most commonly support other
encodings, such as US-ASCII and ISO-8859-1.
--8<---------------cut here---------------end--------------->8---
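
As an illustration of that paragraph, the sketch below (Python, with a
made-up sample string) feeds the same content to the standard-library
parser twice: once declared and encoded as ISO-8859-1, once as UTF-8.
It only shows that this one parser accepts both when the declaration
matches the bytes; it says nothing about what OpenOffice accepts.

```python
import xml.etree.ElementTree as ET

BODY = "<p>café</p>"  # hypothetical sample content

# Declaration and byte encoding agree in both documents.
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY).encode("iso-8859-1")
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + BODY).encode("utf-8")

latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

# Both parses recover the same text even though the bytes on disk differ.
print(latin1_text == utf8_text, latin1_doc == utf8_doc)
```

So an honest ISO-8859-1 declaration is a valid option as far as XML 1.0
goes; whether it is acceptable inside an OpenDocument file is the
separate question raised in point 2.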
[Notes from XMLmind XSL-FO Converter]
XFC supports outputting of content.xml and styles.xml in UTF-8 as well
as ISO-8859-1.
http://xml.web.cern.ch/XML/www.xmlmind.com/xfc_perso_java-4_4_0/doc/user/command_line_java.html
says
,---- [see outputEncoding section]
| For OpenDocument output (.odt), this option specifies the encoding of
| XML content (files styles.xml and content.xml) in the output
| document. All encodings available in the current JVM are supported. The
| option value may be either the encoding name (e.g. ISO8859_1) or the
| charset name (e.g. ISO-8859-1). The default value is UTF8.
`----