From: Jambunathan K
Subject: Re: ODT Charset/Encoding issues (was question about ODT export behavior)
Date: Sun, 17 Jul 2011 01:43:28 +0530
Message-ID: <81r55qx9uf.fsf_-_@gmail.com>
References: <817h7mce7q.fsf@gmail.com> <81oc0yaqes.fsf@gmail.com> <4E1E91D3.70800@diplan.de> <87wrfkx3wo.fsf@gnu.org> <81oc0w9jie.fsf@gmail.com>
In-Reply-To: (Renzo Been's message of "Fri, 15 Jul 2011 20:34:57 +0000 (UTC)")
To: Renzo Been
Cc: emacs-orgmode@gnu.org

Renzo

> I just want to add one point that I did not find in the org-manual. I tested
> some of my org-files and exported them to the OpenOffice format. When I tried to
> open these documents in OpenOffice, they were corrupt and could not be opened.
>
> I soon found out why. If you want to export an org-mode file to .odt, you need
> to explicitly set the file encoding to UTF-8 (I usually use iso-8859-1 encoding
> for my files), like:
> #-*- mode: org; coding: utf-8; -*-
> After that OpenOffice could open the files without any problems.

I use English for communication and I have to admit that I have zero
understanding of things like character sets, encodings etc.

Thanks for the above note. I can see that this is a bug, but my poor
understanding prevents me from characterizing it further. Could you
please send me a minimal iso-8859-1 test.org file and the associated
corrupted test.odt file? I will look into this issue.

1. Do you have any specific requirement on how the component xml files
   should be encoded?

   A cursory look at the odt exporter suggests that it could actually
   be emitting xml files in iso-8859-1 format while wrongly claiming
   UTF-8 encoding, as below

--8<---------------cut here---------------start------------->8---
--8<---------------cut here---------------end--------------->8---

2. Should the xml files always be emitted in UTF-8, irrespective of how
   the original Org file is encoded?

[Notes to Self]

[Notes from odbook]

Para 3 of http://books.evc-cit.info/odbook/apa.html#appc-11-fm2xml says

--8<---------------cut here---------------start------------->8---
OpenDocument files are always encoded in UTF-8.
--8<---------------cut here---------------end--------------->8---

Para 2 of http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section says

--8<---------------cut here---------------start------------->8---
XML 1.0 allows a document to be encoded in any character set
registered with the Internet Assigned Numbers Authority
(IANA). European documents are commonly encoded in one of the ISO
Latin character sets, such as ISO-8859-1. Japanese documents commonly
use Shift-JIS, and Chinese documents use GB2312 and Big 5.
--8<---------------cut here---------------end--------------->8---

Para 4 of http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section says

--8<---------------cut here---------------start------------->8---
XML processors are not required by the XML 1.0 specification to
support any more than UTF-8 and UTF-16, but most commonly support
other encodings, such as US-ASCII and ISO-8859-1.
--8<---------------cut here---------------end--------------->8---

[Notes from XMLmind XSL-FO Converter]

XFC supports outputting of content.xml and styles.xml in UTF-8 as well
as ISO-8859-1.

http://xml.web.cern.ch/XML/www.xmlmind.com/xfc_perso_java-4_4_0/doc/user/command_line_java.html says

,---- [see outputEncoding section]
| For OpenDocument output (.odt), this option specifies the encoding of
| XML content (files styles.xml and content.xml) in the output
| document. All encodings available in the current JVM are supported. The
| option value may be either the encoding name (e.g. ISO8859_1) or the
| charset name (e.g. ISO-8859-1). The default value is UTF8.
`----
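
[Illustration of the suspected mismatch]

The failure suspected above (bytes written in iso-8859-1 under a
declaration that claims UTF-8) can be reproduced outside the exporter.
The following is only a minimal Python sketch, not the exporter's code:
the sample text, the XML declaration string, and the helper names
build_xml, is_valid_utf8 and transcode_to_utf8 are all assumptions made
for illustration.

```python
# Assumed declaration, for illustration only; not copied from the exporter.
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_xml(body: str) -> str:
    """Return a tiny XML document whose declaration claims UTF-8."""
    return XML_DECL + "<p>" + body + "</p>\n"


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from their real encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Hypothetical sample text with non-ASCII Latin-1 characters.
doc = build_xml("Grüße")

# What the suspected bug would produce: Latin-1 bytes under a UTF-8 label.
mislabeled = doc.encode("iso-8859-1")

# What a fix could produce: the same text, really encoded as UTF-8.
fixed = transcode_to_utf8(mislabeled, "iso-8859-1")

print("mislabeled bytes decode as UTF-8:", is_valid_utf8(mislabeled))
print("transcoded bytes decode as UTF-8:", is_valid_utf8(fixed))
```

If the exporter behaves as suspected, the remedy is either to make the
bytes match the declaration (transcode to UTF-8, as sketched) or to make
the declaration match the bytes. The odbook note above says OpenDocument
files are always encoded in UTF-8, which would favor the first option.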