emacs-orgmode@gnu.org archives
From: Tobias Getzner <tobias.getzner@gmx.de>
To: Aaron Ecay <aaronecay@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: [BUG] Mark-up handling chokes on Unicode white-space
Date: Wed, 24 Sep 2014 09:34:25 +0200
Message-ID: <1411544065.2146.1.camel@gmx.de>
In-Reply-To: <87ppemnqxy.fsf@gmail.com>

Hi Aaron,

On Tue, 2014-09-23 at 14:15 -0400, Aaron Ecay wrote:
> org-emphasis-regexp-components is known to be a wart.  You can search
> for posts on the mailing list.  Some people are trying to figure out how
> to get rid of it.  (You can search in particular for Nicolas Goaziou’s
> posts...)  Here’s one thread where you can see the lay of the land:
> <http://mid.gmane.org/87zjl6ktu2.fsf@gmail.com>.

Thank you for the background info!

> All that to say, the longer-term solution is to figure out some radically
> different approach.  In the meantime though, if you can provide a list of
> characters (by unicode name and/or code point) that you think should be
> added to that variable, someone might be able to add them. 

I guess the straightforward way of defining white-space would be to
use the set of characters with the Unicode property WSpace=Y; this is
what «[:space:]», «\s», etc., should be expected to match in
Unicode-based locales. I’m supplying a list of code points below,
for convenience.

I agree though that defining what counts as «white space» within the
confines of org-mode is putting the cart before the horse. I’ll try to
ascertain whether the Emacs implementation of «[:space:]» really only
does 8-bit spaces, and if so I’ll see whether I can poke someone on the
Emacs bug tracker about this.
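
The kind of check described above can be sketched outside Emacs. The
following is only an analogy in Python, not a statement about how Emacs
implements «[:space:]»: it compares an ASCII-only white-space class with
a Unicode-aware one on a few code points from the list below. The names
SAMPLES and missed_by_ascii are made up for this illustration.

```python
import re

# A few code points from the White_Space list, chosen for illustration.
SAMPLES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00A0",
    "NARROW NO-BREAK SPACE": "\u202F",
    "IDEOGRAPHIC SPACE": "\u3000",
}

ascii_ws = re.compile(r"\s", re.ASCII)  # ASCII-only white space
unicode_ws = re.compile(r"\s")          # Unicode-aware (default for str patterns)


def missed_by_ascii(samples):
    """Return the names of characters that only the Unicode-aware class matches."""
    return [
        name
        for name, ch in samples.items()
        if unicode_ws.fullmatch(ch) and not ascii_ws.fullmatch(ch)
    ]


print(missed_by_ascii(SAMPLES))
```

A matcher limited to ASCII or 8-bit spaces would show the same pattern:
plain SPACE is recognized, while the no-break and ideographic spaces are
not.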

Best regards,
T.


──────────────────────────────────────────────────────────────────────
List of Unicode white-space

Below is the list of characters with the property White_Space set,
taken from the Unicode 7.0.0 character database. This includes
line-breaking white-space such as «line feed». If line-breaking
characters are not relevant, one can use the subset of space separators
(Zs; these do not include control characters such as Tab) and control
characters (Cc).

0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
──────────────────────────────────────────────────────────────────────
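
As a minimal sketch of how the list above could be turned into an
explicit character class, here is a Python version. It is an
illustration only and does not show how org-mode or Emacs builds its
regular expressions; WHITE_SPACE_RANGES, char_class and WHITE_SPACE_RE
are hypothetical names introduced for this example.

```python
import re

# Ranges copied from the list above (Unicode 7.0.0, White_Space property).
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085), (0x00A0, 0x00A0),
    (0x1680, 0x1680), (0x2000, 0x200A), (0x2028, 0x2028), (0x2029, 0x2029),
    (0x202F, 0x202F), (0x205F, 0x205F), (0x3000, 0x3000),
]


def char_class(ranges):
    """Build a regex character class such as [\\u0009-\\u000D\\u0020...]."""
    parts = []
    for lo, hi in ranges:
        if lo == hi:
            parts.append("\\u%04X" % lo)
        else:
            parts.append("\\u%04X-\\u%04X" % (lo, hi))
    return "[" + "".join(parts) + "]"


WHITE_SPACE_RE = re.compile(char_class(WHITE_SPACE_RANGES))

# Number of code points covered by the ranges.
TOTAL = sum(hi - lo + 1 for lo, hi in WHITE_SPACE_RANGES)
```

With this class, WHITE_SPACE_RE.split("*bold*\u00a0text") separates the
two words at the NO-BREAK SPACE, which an ASCII-only class would not do.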

Thread overview: 5+ messages
2014-09-23 12:44 [BUG] Mark-up handling chokes on unicode whitespace Tobias Getzner
2014-09-23 17:03 ` Aaron Ecay
2014-09-23 17:44   ` Tobias Getzner
2014-09-23 18:15     ` Aaron Ecay
2014-09-24  7:34       ` Tobias Getzner [this message]
