Hi Jambunathan,

See comments below.

Ciao,
Renzo

P.S. I'm at a campsite right now, so I do not have good Internet access...

On 16 July 2011 22:13, Jambunathan K wrote:
>
> Renzo
>
>> I just want to add one point that I did not find in the org-manual.  I tested
>> some of my org-files and exported them to the OpenOffice format. When I tried to
>> open these documents in OpenOffice, they were corrupt and could not be opened.
>>
>> I soon found out why. If you want to export an org-mode file to .odt, you need
>> to explicitly set the file encoding to UTF-8 (I usually use iso-8859-1 encoding
>> for my files), like:
>> #-*- mode: org; coding: utf-8; -*-
>> After that OpenOffice could open the files without any problems.
>
> I use English for communication and I have to admit that I have zero
> understanding of things like character sets, encodings etc.

As for communicating: I'm from the border region of the Netherlands,
Belgium and Germany, so I'm multilingual and often need to type words
with accents.

> Thanks for the above note. I surely see it is a bug but my poor
> understanding prevents me from quantifying it further.

Well... I would not really see it as a bug, as long as the documentation
mentions that Org file encodings other than UTF-8 can result in corrupt
output files.

> Could you please send me a minimal iso-8859-1 test.org file and the
> associated corrupted test.odt file? I will look into this issue.

See attachment. I can only send you the org file, because I do not have
access to a working Emacs at the moment...

> 1. Do you have any specific requirement on how the component xml files
>   be encoded? A cursory look at the odt exporter suggests that it could
>   actually be emitting xml files in iso-8859-1 format while wrongly
>   claiming UTF-8 encoding as below
>
> --8<---------------cut here---------------start------------->8---
>
> --8<---------------cut here---------------end--------------->8---
>
> 2. Should the xml file be always ejected in UTF-8 irrespective of how
>   the original Org file is encoded.

Yes, that would seem a good solution to me... If the odt exporter checks
the file's encoding and then converts the content to UTF-8 (maybe using a
temporary buffer?) before the actual export, there would be no further
problems.

As for the idea that the OpenOffice XML could actually be in an encoding
other than UTF-8: I do not know how much work that would be for you to
implement in the odt exporter. It might be too much effort. I also don't
know whether such a document would open without problems in all
OpenOffice applications.

> [Notes to Self]
> [Notes from odbook]
>
> Para 3 of http://books.evc-cit.info/odbook/apa.html#appc-11-fm2xml
> says
>
> --8<---------------cut here---------------start------------->8---
> OpenDocument files are always encoded in UTF-8.
> --8<---------------cut here---------------end--------------->8---
>
> Para 2 of
> http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
> says
>
> --8<---------------cut here---------------start------------->8---
> XML 1.0 allows a document to be encoded in any character set registered
> with the Internet Assigned Numbers Authority (IANA). European documents
> are commonly encoded in one of the ISO Latin character sets, such as
> ISO-8859-1. Japanese documents commonly use Shift-JIS, and Chinese
> documents use GB2312 and Big 5.
> --8<---------------cut here---------------end--------------->8---
>
> Para 4 of
> http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
> says
>
> --8<---------------cut here---------------start------------->8---
> XML processors are not required by the XML 1.0 specification to support
> any more than UTF-8 and UTF-16, but most commonly support other
> encodings, such as US-ASCII and ISO-8859-1.
> --8<---------------cut here---------------end--------------->8---
>
>
> [Notes from XMLmind XSL-FO Converter]
>
>
> XFC supports outputting of content.xml and styles.xml in UTF-8 as well
> as ISO-8859-1.
>
> http://xml.web.cern.ch/XML/www.xmlmind.com/xfc_perso_java-4_4_0/doc/user/command_line_java.html
>
> says
>
> ,---- [see outputEncoding section]
> | For OpenDocument output (.odt), this option specifies the encoding of
> | XML content (files styles.xml and content.xml) in the output
> | document. All encodings available in the current JVM are supported. The
> | option value may be either the encoding name (e.g. ISO8859_1) or the
> | charset name (e.g. ISO-8859-1). The default value is UTF8.
> `----
>
> --
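
To illustrate the mismatch discussed above (XML bytes written as ISO-8859-1 under a declaration that claims UTF-8) and the proposed fix of always re-encoding to UTF-8 before writing, here is a minimal sketch. It is written in Python purely for illustration, not in the exporter's own Emacs Lisp, and the function names (`reencode_to_utf8`, `declared_xml_encoding`, `is_consistent`) and the sample data are hypothetical; none of this is taken from the odt exporter itself.

```python
import re


def reencode_to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes using their real encoding, then re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def declared_xml_encoding(xml_bytes: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', xml_bytes)
    return match.group(1).decode("ascii") if match else None


def is_consistent(xml_bytes: bytes) -> bool:
    """Check that the bytes decode under the declared encoding (default UTF-8)."""
    encoding = declared_xml_encoding(xml_bytes) or "utf-8"
    try:
        xml_bytes.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


# Hypothetical sample: a declaration claiming UTF-8 around Latin-1 encoded text.
header = b'<?xml version="1.0" encoding="UTF-8"?>'
body_latin1 = "<p>caf\u00e9</p>".encode("iso-8859-1")

broken = header + body_latin1                   # declaration and bytes disagree
fixed = header + reencode_to_utf8(body_latin1)  # bytes now match the declaration

print(is_consistent(broken), is_consistent(fixed))
```

The check is only a heuristic: plain ASCII content decodes identically under both encodings, so a mismatch shows up only once the file contains accented or other non-ASCII characters, which fits the symptoms described in this thread.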