From mboxrd@z Thu Jan  1 00:00:00 1970
From: torys.anderson@gmail.com (Tory S. Anderson)
Subject: Re: Orgmode → ODT: Certain chars break export
Date: Fri, 13 Feb 2015 11:41:35 -0500
Message-ID: <8761b569xc.fsf@gmail.com>
References: <87r3tu5bus.fsf@gmail.com> <87386avzrl.fsf@gmx.us>
	<87egpt6drz.fsf@gmail.com> <87mw4hn6bx.fsf@gmx.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
In-Reply-To: <87mw4hn6bx.fsf@gmx.us> (rasmus@gmx.us's message of "Fri, 13 Feb
	2015 17:07:14 +0100")
To: Rasmus <rasmus@gmx.us>
Cc: emacs-orgmode@gnu.org

Now that you've identified it as an XML issue, there is a helpful Wikipedia page on the subject; it even mentions my specific character.[1] The authoritative source seems to be the W3C spec.[2]
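To make the rule concrete, here is a minimal Python sketch (not Org code; the function names are made up for illustration) of the XML 1.0 `Char` production, assuming ODT's content.xml is XML 1.0. The XML 1.1 rules in [1] and [2] are looser about control characters.

```python
# Sketch of the XML 1.0 "Char" production (W3C XML 1.0, section 2.2).
# Assumes the target file (e.g. ODT's content.xml) is XML 1.0.
# Function names are hypothetical, for illustration only.

def is_xml10_char(ch: str) -> bool:
    """Return True if the single character ch may appear in XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, newline, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD      # excludes U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_xml10(text: str) -> str:
    """Drop every character XML 1.0 forbids (most C0 controls, surrogates, ...)."""
    return "".join(ch for ch in text if is_xml10_char(ch))


if __name__ == "__main__":
    sample = "form\x0cfeed and bell\x07 vanish; tab\tstays"
    print(repr(strip_invalid_xml10(sample)))
    # prints 'formfeed and bell vanish; tab\tstays'
```

Per Rasmus's point below, a filter like this would have to run on the final exported text, after entities have been replaced.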

Rasmus <rasmus@gmx.us> writes:

> torys.anderson@gmail.com (Tory S. Anderson) writes:
>
>> From a user perspective just stripping the characters seems best to
>> me, but finding out what the characters are seems obnoxious.
>
> But maybe there is a valid way to represent such characters in XML?  At
> the very least entities must be replaced before stripping these...
>
>> Neither a quick search nor skimming the ODT doc specification[1][2] seems
>> to give any insight into a set of illegal characters. Does elisp have
>> anything similar to Java's "isWhitespace"[3] that could be used to check
>> character features?
>
> It's an XML thing.  When I tried to open the content.xml with Firefox it
> also says broken XML.  But I also don't know which are the characters that
> are not supported by XML.
>
> —Rasmus
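On whether there is a valid way to represent such characters: per [2], XML 1.1 lets most C0/C1 control characters (its `RestrictedChar` production) appear as numeric character references such as `&#xC;`, while XML 1.0 forbids them even as references. A rough Python sketch follows; the function name is made up, and it only helps if the file declares `version="1.1"`, which I assume ODT's content.xml does not.

```python
# Sketch of the XML 1.1 rules in [2]: RestrictedChar code points must not
# appear literally but may be written as numeric character references.
# U+0000, surrogates, U+FFFE and U+FFFF cannot be represented at all.
# The function name is hypothetical.

def escape_for_xml11(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF):
            continue  # not representable in XML 1.1, even as a reference
        restricted = (
            0x1 <= cp <= 0x8
            or 0xB <= cp <= 0xC
            or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F
        )
        if restricted:
            out.append(f"&#x{cp:X};")  # e.g. form feed becomes &#xC;
        elif ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_for_xml11("a\x0cb & c"))  # prints a&#xC;b &amp; c
```

So for an XML 1.0 file, stripping seems to be the only option; escaping is only an escape hatch for XML 1.1.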

Footnotes: 
[1]  https://en.wikipedia.org/wiki/Valid_characters_in_XML#XML_1.1

[2]  http://www.w3.org/TR/xml11/#charsets