From mboxrd@z Thu Jan 1 00:00:00 1970
From: Renzo Been
Subject: Re: ODT Charset/Encoding issues (was question about ODT export behavior)
Date: Sun, 17 Jul 2011 16:12:09 +0200
Message-ID:
References: <817h7mce7q.fsf@gmail.com> <81oc0yaqes.fsf@gmail.com> <4E1E91D3.70800@diplan.de> <87wrfkx3wo.fsf@gnu.org> <81oc0w9jie.fsf@gmail.com> <81r55qx9uf.fsf_-_@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=000e0cd2e6dee2ecfa04a84477cd
Return-path:
Received: from eggs.gnu.org ([140.186.70.92]:45286) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QiS4r-00046Z-To for emacs-orgmode@gnu.org; Sun, 17 Jul 2011 10:12:15 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QiS4p-0002FJ-Sv for emacs-orgmode@gnu.org; Sun, 17 Jul 2011 10:12:13 -0400
Received: from mail-gy0-f169.google.com ([209.85.160.169]:49442) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QiS4p-0002FF-IH for emacs-orgmode@gnu.org; Sun, 17 Jul 2011 10:12:11 -0400
Received: by gyg13 with SMTP id 13so1153906gyg.0 for ; Sun, 17 Jul 2011 07:12:10 -0700 (PDT)
In-Reply-To: <81r55qx9uf.fsf_-_@gmail.com>
List-Id: "General discussions about Org-mode."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
To: Jambunathan K
Cc: emacs-orgmode@gnu.org

--000e0cd2e6dee2ecfa04a84477cd
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi Jambunathan,

See comments below.

Ciao,

Renzo

P.S. I'm on a camping-site right now, so I do not have good Internet access...

On 16 July 2011 22:13, Jambunathan K wrote:
>
> Renzo
>
>> I just want to add one point that I did not find in the org-manual.  I tested
>> some of my org-files and exported them to the OpenOffice format.
>> When I tried to open these documents in OpenOffice, they were corrupt and
>> could not be opened.
>>
>> I soon found out why. If you want to export an org-mode file to .odt, you need
>> to explicitly set the file encoding to UTF-8 (I usually use iso-8859-1 encoding
>> for my files), like:
>> #-*- mode: org; coding: utf-8; -*-
>> After that OpenOffice could open the files without any problems.
>
> I use English for communication and I have to admit that I have zero
> understanding of things like character sets, encodings etc.

As for communicating: I'm from the border regions of The Netherlands, Belgium
and Germany, so I'm multilingual and often need to type words with accents.

> Thanks for the above note. I surely see is a bug but my poor
> understanding prevents me from quantifying it further.

Well... I would not really see it as a bug, as long as the documentation
mentions that org-file encodings other than utf-8 can result in corrupt
output files.

> Could you please send me a minimal iso-8859-1 test.org file and the
> associated corrupted test.odt file? I will look in to this issue.

See attachment. I can only send you the org file, because I do not have access
to a working Emacs at the moment...

> 1. Do you have any specific requirement on how the component xml files
>    be encoded? A cursory look at the odt exporter suggests that it could
>    actually be emitting xml files in iso-8859-1 format while wrongly
>    claiming UTF-8 encoding as below
>
> --8<---------------cut here---------------start------------->8---
>
> --8<---------------cut here---------------end--------------->8---
>
> 2. Should the xml file be always ejected in UTF-8 irrespective of how
>    the original Org file is encoded.

Yes, that would seem a good solution to me... If the odt-exporter checks the
file's encoding, and then changes the encoding to utf-8 (maybe using a temporary
buffer?)
before the actual exporting, then there would be no further problems...

As for the idea that the OpenOffice xml can actually be in another encoding
than utf-8: I do not know how much work that would be for you to implement in
the odt-exporter. It might be too much effort...

Also, I don't know if such an OpenOffice document will open without problems in
all OpenOffice applications.

> [Notes to Self]
> [Notes from odbook]
>
> Para 3 of http://books.evc-cit.info/odbook/apa.html#appc-11-fm2xml
> says
>
> --8<---------------cut here---------------start------------->8---
> OpenDocument files are always encoded in UTF-8.
> --8<---------------cut here---------------end--------------->8---
>
> Para 2 of
> http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
> says
>
> --8<---------------cut here---------------start------------->8---
> XML 1.0 allows a document to be encoded in any character set registered
> with the Internet Assigned Numbers Authority (IANA). European documents
> are commonly encoded in one of the ISO Latin character sets, such as
> ISO-8859-1. Japanese documents commonly use Shift-JIS, and Chinese
> documents use GB2312 and Big 5.
> --8<---------------cut here---------------end--------------->8---
>
> Para 4 of
> http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
> says
>
> --8<---------------cut here---------------start------------->8---
> XML processors are not required by the XML 1.0 specification to support
> any more than UTF-8 and UTF-16, but most commonly support other
> encodings, such as US-ASCII and ISO-8859-1.
> --8<---------------cut here---------------end--------------->8---
>
>
> [Notes from XMLmind XSL-FO Converter]
>
>
> XFC supports outputting of content.xml and styles.xml in UTF-8 as well
> as ISO-8859-1.
>
> http://xml.web.cern.ch/XML/www.xmlmind.com/xfc_perso_java-4_4_0/doc/user/command_line_java.html
>
> says
>
> ,---- [see outputEncoding section]
> | For OpenDocument output (.odt), this option specifies the encoding of
> | XML content (files styles.xml and content.xml) in the output
> | document. All encodings available in the current JVM are supported. The
> | option value may be either the encoding name (e.g. ISO8859_1) or the
> | charset name (e.g. ISO-8859-1). The default value is UTF8.
> `----
>
> --

--000e0cd2e6dee2ecfa04a84477cd
Content-Type: application/zip; name="test-encoding.zip"
Content-Disposition: attachment; filename="test-encoding.zip"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gq82qv7r0

UEsDBBQAAAAIAPN58T6MjMXlkwAAALgAAAARAAAAdGVzdC1lbmNvZGluZy50eHRNzkEKwjAUBNB9
TjHgrhLBhVDbrV0IQje5QE2+bTDml9+E9srewkA3boc3wxx0pfFhRw1YxhaWnY9jA7+wruvLVZ91
jn5rUZw6HM3dPLoG/c2g22aWBBJhwerT9NcBxX1IVTATgSNhZXG7GyIGaykmPCnwWsIQYIe8EFLB
s7DLlhxO7BJePpSYCy3fRPKcVP/+qh9QSwECFAAUAAAACADzefE+jIzF5ZMAAAC4AAAAEQAAAAAA
AAAAACAAAAAAAAAAdGVzdC1lbmNvZGluZy50eHRQSwUGAAAAAAEAAQA/AAAAwgAAAAAA
--000e0cd2e6dee2ecfa04a84477cd--