From: torys.anderson@gmail.com (Tory S. Anderson)
To: Rasmus
Cc: emacs-orgmode@gnu.org
Subject: Re: Orgmode → ODT: Certain chars break export
Date: Fri, 13 Feb 2015 11:41:35 -0500
Message-ID: <8761b569xc.fsf@gmail.com>
In-Reply-To: <87mw4hn6bx.fsf@gmx.us> (rasmus@gmx.us's message of "Fri, 13 Feb 2015 17:07:14 +0100")

Now that you've traced it to XML, there is a helpful Wikipedia page; it even mentions my specific character.[1] The primary source seems to be the w3.org spec.[2]

Rasmus writes:

> torys.anderson@gmail.com (Tory S. Anderson) writes:
>
>> From a user perspective just stripping the characters seems best to
>> me, but finding out what the characters are seems obnoxious.
>
> But maybe there is a valid way to represent such characters in XML?  At
> the very least entities must be replaced before stripping these...
>
>> Neither a quick search nor skimming the ODT doc specification[1][2] seem
>> to give any insight into a set of illegal characters.  Does elisp have
>> anything similar to Java's "isWhitespace"[3] that could be used to check
>> character features?
>
> It's an XML thing.  When I tried to open the contents.xml with Firefox it
> also says broken XML.  But I also don't know which are the characters that
> are not supported by XML.
>
> --Rasmus

Footnotes:
[1] https://en.wikipedia.org/wiki/Valid_characters_in_XML#XML_1.1
[2] http://www.w3.org/TR/xml11/#charsets
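
For illustration, here is a rough sketch of what "just stripping the characters" could look like. It is not Org's exporter code, and it is written in Python rather than elisp, purely to make the character ranges concrete. It assumes the exported file is declared as XML 1.0, whose Char production allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF; the XML 1.1 rules linked above differ. The function names are made up for this example.

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Assumption: the target document is XML 1.0, not XML 1.1.
_XML10_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml10_illegal(text):
    """Return `text` with every character XML 1.0 forbids removed."""
    return _XML10_ILLEGAL.sub("", text)


def find_xml10_illegal(text):
    """Return (offset, code point) pairs for each forbidden character."""
    return [
        (m.start(), "U+%04X" % ord(m.group()))
        for m in _XML10_ILLEGAL.finditer(text)
    ]


if __name__ == "__main__":
    # A form feed (U+000C) is used here only as an example of a
    # control character that XML 1.0 does not allow.
    sample = "page one\x0cpage two"
    print(find_xml10_illegal(sample))
    print(strip_xml10_illegal(sample))
```

Reporting the offsets first, rather than silently stripping, would at least tell the user which characters were dropped and where. Any such filter would have to run on the text content only, after entities have been replaced, as Rasmus points out.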