emacs-orgmode@gnu.org archives
From: Jambunathan K <kjambunathan@gmail.com>
To: Renzo Been <swangdoodles@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: ODT Charset/Encoding issues (was question about ODT export behavior)
Date: Sun, 17 Jul 2011 01:43:28 +0530
Message-ID: <81r55qx9uf.fsf_-_@gmail.com>
In-Reply-To: <loom.20110715T221004-235@post.gmane.org> (Renzo Been's message of "Fri, 15 Jul 2011 20:34:57 +0000 (UTC)")


Renzo

> I just want to add one point that I did not find in the org-manual.  I tested
> some of my org-files and exported them to the OpenOffice format. When I tried to
> open these documents in OpenOffice, they were corrupt and could not be opened.
>
> I soon found out why. If you want to export an org-mode file to .odt, you need
> to explicitly set the file encoding to UTF-8 (I usually use iso-8859-1 encoding
> for my files), like:
> #-*- mode: org; coding: utf-8; -*-
> After that OpenOffice could open the files without any problems.

I use English for communication, and I have to admit that I have zero
understanding of things like character sets, encodings, etc.

Thanks for the above note. I can certainly see that this is a bug, but
my limited understanding prevents me from characterizing it further.

Could you please send me a minimal ISO-8859-1 test.org file and the
associated corrupted test.odt file? I will look into this issue.

1. Do you have any specific requirements on how the component XML files
   should be encoded? A cursory look at the ODT exporter suggests that it
   may actually be emitting XML files in ISO-8859-1 while wrongly
   declaring UTF-8 encoding, as below:

--8<---------------cut here---------------start------------->8---
<?xml version="1.0" encoding="UTF-8"?>
--8<---------------cut here---------------end--------------->8---

2. Should the XML files always be emitted in UTF-8, irrespective of how
   the original Org file is encoded?
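A minimal Python sketch of the failure mode suspected in point 1 (this is
not Org code; the sample text is made up for illustration): an XML parser
rejects ISO-8859-1 bytes that carry a UTF-8 declaration, while the same
text encoded as UTF-8, as point 2 proposes, parses fine.

```python
import xml.etree.ElementTree as ET

# Hypothetical content.xml body containing one non-ASCII character (é).
TEXT = '<?xml version="1.0" encoding="UTF-8"?>\n<p>café</p>'

def parses(data: bytes) -> bool:
    """Return True if an XML parser accepts the raw bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Suspected bug: bytes written as ISO-8859-1 but declared as UTF-8.
mislabelled = TEXT.encode("iso-8859-1")
# Candidate fix: always encode as UTF-8, whatever the coding of the
# source Org buffer, so the bytes match the declaration.
utf8 = TEXT.encode("utf-8")

print(parses(mislabelled))  # False
print(parses(utf8))         # True
```

The lone 0xE9 byte for "é" is not valid UTF-8, which is the kind of error
that would make a consumer treat the whole document as corrupt.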


[Notes to Self]
[Notes from odbook]

Para 3 of http://books.evc-cit.info/odbook/apa.html#appc-11-fm2xml
says

--8<---------------cut here---------------start------------->8---
OpenDocument files are always encoded in UTF-8. 
--8<---------------cut here---------------end--------------->8---
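To check a suspect file such as the corrupted test.odt requested above, a
rough Python sketch could open the .odt as the ZIP archive it is and flag
XML members that are not valid UTF-8 or that declare some other encoding.
The helper name `bad_members` and the list of members checked are
illustrative assumptions, not part of Org or of the OpenDocument tooling.

```python
import io
import re
import zipfile

# Illustrative selection of XML members in an ODF package; a real
# package may contain others (e.g. META-INF/manifest.xml).
XML_MEMBERS = ("content.xml", "styles.xml", "meta.xml", "settings.xml")

# Matches the encoding pseudo-attribute of an XML declaration.
DECL_ENCODING = re.compile(rb'<\?xml[^>]*encoding=["\']([^"\']+)["\']')

def bad_members(odt_bytes: bytes) -> list:
    """List XML members that are not valid UTF-8 or declare another encoding."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as z:
        names = z.namelist()
        for name in XML_MEMBERS:
            if name not in names:
                continue
            data = z.read(name)
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                problems.append(name)  # bytes are not UTF-8 at all
                continue
            m = DECL_ENCODING.match(data)
            if m and m.group(1).lower() != b"utf-8":
                problems.append(name)  # valid bytes, but wrong declaration
    return problems

# Build a tiny in-memory package that mimics the suspected bug.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    decl = '<?xml version="1.0" encoding="UTF-8"?>'
    z.writestr("content.xml", (decl + "<p>café</p>").encode("iso-8859-1"))
    z.writestr("styles.xml", (decl + "<s/>").encode("utf-8"))
print(bad_members(buf.getvalue()))  # ['content.xml']
```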

Para 2 of
http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
says

--8<---------------cut here---------------start------------->8---
XML 1.0 allows a document to be encoded in any character set registered
with the Internet Assigned Numbers Authority (IANA). European documents
are commonly encoded in one of the ISO Latin character sets, such as
ISO-8859-1. Japanese documents commonly use Shift-JIS, and Chinese
documents use GB2312 and Big 5.
--8<---------------cut here---------------end--------------->8---

Para 4 of
http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
says

--8<---------------cut here---------------start------------->8---
XML processors are not required by the XML 1.0 specification to support
any more than UTF-8 and UTF-16, but most commonly support other
encodings, such as US-ASCII and ISO-8859-1.
--8<---------------cut here---------------end--------------->8---


[Notes from XMLmind XSL-FO Converter]


XFC can write content.xml and styles.xml in UTF-8 (its default) as
well as in other encodings such as ISO-8859-1.

http://xml.web.cern.ch/XML/www.xmlmind.com/xfc_perso_java-4_4_0/doc/user/command_line_java.html

says

,---- [see outputEncoding section]
| For OpenDocument output (.odt), this option specifies the encoding of
| XML content (files styles.xml and content.xml) in the output
| document. All encodings available in the current JVM are supported. The
| option value may be either the encoding name (e.g. ISO8859_1) or the
| charset name (e.g. ISO-8859-1). The default value is UTF8.
`----
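As a rough analogue of XFC's outputEncoding option, the Python sketch
below serializes XML with a caller-chosen encoding and lets ElementTree
write a declaration that matches the bytes. The helper `write_xml` is
hypothetical and belongs to neither XFC nor Org. Because each declaration
agrees with its bytes, both the UTF-8 and the ISO-8859-1 outputs parse
back, consistent with the odbook note that processors commonly accept
ISO-8859-1.

```python
import io
import xml.etree.ElementTree as ET

def write_xml(root: ET.Element, encoding: str = "UTF-8") -> bytes:
    """Serialize root so that the bytes and the XML declaration agree.

    UTF-8 is the default, matching the odbook note that OpenDocument
    files are always encoded in UTF-8.
    """
    buf = io.BytesIO()
    ET.ElementTree(root).write(buf, encoding=encoding, xml_declaration=True)
    return buf.getvalue()

root = ET.Element("p")
root.text = "café"

for enc in ("UTF-8", "ISO-8859-1"):
    data = write_xml(root, enc)
    # Round-trips in both cases because the declaration matches the bytes.
    print(enc, ET.fromstring(data).text)
```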

-- 
