From: Tobias Getzner
Subject: Re: [BUG] Mark-up handling chokes on Unicode white-space
Date: Wed, 24 Sep 2014 09:34:25 +0200
Message-ID: <1411544065.2146.1.camel@gmx.de>
References: <87a95qp8vp.fsf@gmail.com> <87ppemnqxy.fsf@gmail.com>
In-Reply-To: <87ppemnqxy.fsf@gmail.com>
To: Aaron Ecay
Cc: emacs-orgmode@gnu.org

Hi Aaron,

On Tue, 2014-09-23 at 14:15 -0400, Aaron Ecay wrote:
> org-emphasis-regexp-components is known to be a wart. You can search
> for posts on the mailing list. Some people are trying to figure out how
> to get rid of it. (You can search in particular for Nicolas Goaziou's
> posts...) Here's one thread where you can see the lay of the land:
> .

Thank you for the background info!

> All that to say, the longer-term solution is to figure out some radically
> different approach.
> In the meantime though, if you can provide a list of
> characters (by unicode name and/or code point) that you think should be
> added to that variable, someone might be able to add them.

I guess the straightforward way to define white space would be to use the
set of characters with the Unicode property WSpace=Y (White_Space); this is
also what «[:space:]», «\s», etc. should be expected to match in
Unicode-based locales. For convenience, I'm supplying a list of the code
points below.

I agree, though, that defining what counts as «white space» within the
confines of org-mode puts the cart before the horse. I'll try to find out
whether Emacs's implementation of «[:space:]» really only matches 8-bit
spaces, and if so, I'll see whether I can poke someone on the Emacs bug
tracker about it.

Best regards,
T.

────────────────────────────────────────────────────────────────────────
List of Unicode white-space

Below is the list of characters with the property White_Space set, taken
from the Unicode 7.0.0 character database. This includes line-breaking
white space such as «line feed». If line breaks are not relevant, one can
instead use the space separators (Zs; note that these do not include
control characters such as Tab) plus whichever control characters (Cc)
are wanted.

0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
────────────────────────────────────────────────────────────────────────
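For anyone who wants to experiment with the White_Space list before anything changes in org-mode or Emacs, here is a minimal sketch in Python (not Emacs Lisp) that turns those ranges into a regular-expression character class. The names WHITE_SPACE_RANGES, WHITE_SPACE, white_space_class and isspace_extras are hypothetical, made up for this illustration; this is not how org-emphasis-regexp-components is actually built. It also checks how Python's own str.isspace() compares with White_Space: the two are close but not identical, since str.isspace() additionally accepts the information separators U+001C..U+001F.

```python
import re
import sys

# Hypothetical helper data: the White_Space=Y code points from the Unicode
# 7.0.0 list above, written as inclusive (first, last) ranges.
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D),  # Cc  <control-0009>..<control-000D> (TAB..CR)
    (0x0020, 0x0020),  # Zs  SPACE
    (0x0085, 0x0085),  # Cc  <control-0085> (NEXT LINE)
    (0x00A0, 0x00A0),  # Zs  NO-BREAK SPACE
    (0x1680, 0x1680),  # Zs  OGHAM SPACE MARK
    (0x2000, 0x200A),  # Zs  EN QUAD..HAIR SPACE
    (0x2028, 0x2028),  # Zl  LINE SEPARATOR
    (0x2029, 0x2029),  # Zp  PARAGRAPH SEPARATOR
    (0x202F, 0x202F),  # Zs  NARROW NO-BREAK SPACE
    (0x205F, 0x205F),  # Zs  MEDIUM MATHEMATICAL SPACE
    (0x3000, 0x3000),  # Zs  IDEOGRAPHIC SPACE
]

# The same characters as a set, for membership tests.
WHITE_SPACE = frozenset(
    chr(cp) for first, last in WHITE_SPACE_RANGES for cp in range(first, last + 1)
)


def white_space_class(ranges):
    """Return a regex character class such as '[\\u0009-\\u000D\\u0020...]'."""
    parts = []
    for first, last in ranges:
        if first == last:
            parts.append(f"\\u{first:04X}")
        else:
            parts.append(f"\\u{first:04X}-\\u{last:04X}")
    return "[" + "".join(parts) + "]"


# Python's re understands \uXXXX escapes in str patterns.
WS_RE = re.compile(white_space_class(WHITE_SPACE_RANGES))


def isspace_extras():
    """Characters that str.isspace() accepts but White_Space does not."""
    return sorted(
        c for c in map(chr, range(sys.maxunicode + 1))
        if c.isspace() and c not in WHITE_SPACE
    )


if __name__ == "__main__":
    text = "foo\u3000*bar*\u00a0baz"
    print(WS_RE.split(text))  # ['foo', '*bar*', 'baz']
    print([f"U+{ord(c):04X}" for c in isspace_extras()])
```

The split shows an ideographic space and a no-break space both acting as separators around the emphasis markers. On a current CPython the second line should list U+001C..U+001F, which is why the sketch builds the class from the explicit list rather than relying on str.isspace().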