From: Nicolas Goaziou
Subject: [RFC] Alternative to sub/superscript regexp
Date: Mon, 25 Nov 2013 18:14:27 +0100
Message-ID: <87wqjw8kuk.fsf@gmail.com>
To: Org Mode List

Hello,

For the record `org-match-substring-regexp' is a variation on:

  "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)"

I think it is a bit convoluted and therefore difficult to predict. For
example, as a recent bug report showed, you may tend to interpret

  a_b[fn:1]

as

  a_{b}[fn:1]

but, in fact, it is equivalent to

  a_{b[fn}:1]

Of course, we can prevent this by forbidding "[" and "]" in the last
part of the regexp. But I wonder if there's something better to do.

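To make the footnote case concrete, here is a small Emacs Lisp sketch
that can be evaluated in a scratch buffer. It applies only the
brace-free pattern quoted above (written with balanced groups), not the
full `org-match-substring-regexp'. The names `my-old-script-re' and
`my-script-contents' are made up for illustration, and matching under
the standard syntax table is an assumption.

```elisp
;; Hypothetical names; only the brace-free pattern is reproduced here.
(defconst my-old-script-re
  "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)"
  "Brace-free part of the current sub/superscript regexp.")

(defun my-script-contents (regexp string)
  "Return what REGEXP takes as sub/superscript contents in STRING, or nil."
  ;; Assumption: use the standard syntax table so that syntax classes
  ;; do not depend on whichever buffer happens to be current.
  (with-syntax-table (standard-syntax-table)
    (when (string-match regexp string)
      (match-string 3 string))))

;; "[" is not in the excluded set, so the match runs up to the colon.
(my-script-contents my-old-script-re "a_b[fn:1]")
;; => "b[fn"
```
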
The idea behind this regexp is that we should be able to write simple
sub/superscript, including numbers and entities, without requiring
curly braces (see `org-use-sub-superscripts' docstring for details).

Maybe something like the following could be an interesting alternative:

  "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"

That is, without braces, either an asterisk or any combination of word,
number, dot, comma and backslash characters, which may start with either
a plus or a minus sign but cannot end with either a dot or a comma.

I find it arguably more predictable (no inverted class). Also, we "gain"
the following:

  a^3.14.  <=>  a^{3.14}.

At the moment,

  a^3.14.  <=>  a^{3}.14.

What do you think?


Regards,

--
Nicolas Goaziou
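
The comparison described in the message can be tried with a short Emacs
Lisp sketch like the one below. Both strings are the brace-free
patterns quoted above (the current one written with balanced groups).
The names `my-old-script-re', `my-new-script-re' and
`my-script-contents' are hypothetical, and the standard syntax table is
assumed, so results in an actual Org buffer may differ.

```elisp
;; Hypothetical names; both patterns cover only the brace-free case.
(defconst my-old-script-re
  "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)"
  "Brace-free part of the current sub/superscript regexp.")

(defconst my-new-script-re
  "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"
  "Proposed alternative from this message.")

(defun my-script-contents (regexp string)
  "Return what REGEXP takes as sub/superscript contents in STRING, or nil."
  ;; Assumption: the standard syntax table decides what "\\w" matches.
  (with-syntax-table (standard-syntax-table)
    (when (string-match regexp string)
      (match-string 3 string))))

;; Each entry is (INPUT OLD-CONTENTS NEW-CONTENTS).
(mapcar (lambda (s)
          (list s
                (my-script-contents my-old-script-re s)
                (my-script-contents my-new-script-re s)))
        '("a^3.14." "a_b[fn:1]" "x^-2,"))
;; => (("a^3.14." "3" "3.14") ("a_b[fn:1]" "b[fn" "b") ("x^-2," "-2" "-2"))
```

Under these assumptions the proposed pattern keeps "3.14" together,
leaves the trailing dot or comma out, and stops before the "[" of a
footnote reference.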