From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Brand
Subject: Re: org-feed XML entities and character encoding
Date: Fri, 13 Aug 2010 21:03:52 +0200
Message-ID: <4C659718.9060906@alumni.ethz.ch>
References: <4C61AF9E.7040903@alumni.ethz.ch> <8762ze7bc8.wl%dmaus@ictsoc.de>
In-Reply-To: <8762ze7bc8.wl%dmaus@ictsoc.de>
To: David Maus
Cc: julien@danjou.info, Org Mode, zwz

Hi David

On 10-08-13 17:59 , David Maus wrote:
>> 2. request for help about an issue with multibyte character encoding
>> ====================================================================
>>
>> There is an issue with multibyte characters that appear in the input
>> as unescaped, multibyte encoded characters (not as XML entities, as XML
>> entities multibyte characters are simply substituted correctly). I
>> looked for an example with a character encoding specified in the first
>> line of the XML feed like
>>
>> and found one here:
>>   http://www.openscreencast.de/blog/rss.xml
>> [...]
>
> The problem with this feed is, that it contains raw unicode characters
> that must be converted to utf-8 before they can be properly inserted
> in the target buffer.
>
> Attached patch does this by explicitly decoding new entries according
> to their detected character encoding.
>
> Btw.: Helpful introduction to the topic gives
>
>   The Absolute Minimum Every Software Developer Absolutely, Positively
>   Must Know About Unicode and Character Sets (No Excuses!)
>
>   by Joel Spolsky
>
>   http://www.joelonsoftware.com/articles/Unicode.html

Thank you very much for your patch. It resolves this issue with
org-feed.el as expected. I tested it with the two feeds
  http://www.openscreencast.de/blog/rss.xml (declared utf-8)
and
  http://pod.drs.ch/world_music_special_mpx.xml (not declared utf-8)
which I described in more detail earlier, and with a dozen other
feeds, all with character encoding utf-8.

Michael
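
For readers of the archive, the idea behind the fix (decode each entry
according to the encoding detected for the feed) can be sketched outside
of Emacs. The following is only an illustration in Python, not the patch
itself and not org-feed.el code: the function names, the sample data and
the fallback to utf-8 for feeds without a declaration are all assumptions
made for this example.

```python
import re

# Hypothetical helper: read the encoding named in the XML declaration.
# The utf-8 fallback for feeds without a declaration is an assumption.
def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    match = re.match(
        rb'\s*<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']', raw
    )
    return match.group(1).decode("ascii") if match else default

# Hypothetical helper: turn the raw bytes of a feed into text by
# decoding them with the detected encoding.
def decode_feed(raw: bytes) -> str:
    return raw.decode(declared_encoding(raw))

# Made-up sample feed containing one multibyte character ("ü").
sample = '<?xml version="1.0" encoding="utf-8"?><title>Zürich</title>'
raw = sample.encode("utf-8")

decoded = decode_feed(raw)          # bytes interpreted as utf-8
misread = raw.decode("latin-1")     # same bytes, wrong single-byte reading

print(decoded)   # contains "Zürich"
print(misread)   # contains "ZÃ¼rich", the typical garbled form
```

The last two lines show why the explicit decoding step matters: the same
bytes give the intended text when decoded as utf-8 and garbled text when
each byte is taken as a character of its own.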