emacs-orgmode@gnu.org archives
* [RFC] Alternative to sub/superscript regexp
@ 2013-11-25 17:14 Nicolas Goaziou
  2013-11-25 17:45 ` Nick Dokos
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Nicolas Goaziou @ 2013-11-25 17:14 UTC (permalink / raw)
  To: Org Mode List

Hello,

For the record, `org-match-substring-regexp' is a variation on:

"\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)\\)"

I think it is a bit convoluted and therefore difficult to predict. For
example, as a recent bug report showed, you may tend to interpret
a_b[fn:1] as

   a_{b}[fn:1]

but, in fact, it is equivalent to

   a_{b[fn}:1]

Of course, we can prevent this by forbidding "[" and "]" in the last
part of the regexp. But I wonder if there's something better to do.
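
For illustration, the behaviour described above can be reproduced with a
small sketch. It uses the variant quoted above, written with balanced
groups, and not the full `org-match-substring-regexp'. The names
`my/old-no-brace-regexp' and `my/script-contents' are hypothetical and
not part of Org.

```elisp
;; Illustration only: these names are hypothetical, not part of Org.
(defconst my/old-no-brace-regexp
  "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)"
  "Variation on the current no-brace regexp, with balanced groups.")

(defun my/script-contents (regexp string)
  "Return what REGEXP captures as script contents (group 3) in STRING.
Return nil when REGEXP does not match."
  ;; Use the standard syntax table so that \\S- does not depend on
  ;; whatever buffer happens to be current.
  (with-syntax-table (standard-syntax-table)
    (when (string-match regexp string)
      (match-string 3 string))))

;; "[" is not listed in the inverted class, so the subscript swallows
;; it and only stops at the colon.
(my/script-contents my/old-no-brace-regexp "a_b[fn:1]") ; => "b[fn"
```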

The idea behind this regexp is that we should be able to write simple
sub/superscript, including numbers and entities, without requiring curly
braces (see `org-use-sub-superscripts' docstring for details). Maybe
something like the following could be an interesting alternative:

  "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"

That is, without braces, either an asterisk or any combination of word,
number, dot, comma and backslash characters, which may start with either
a plus or a minus sign but cannot end with either a dot or a comma.

I find it arguably more predictable (no inverted class). Also, we "gain"
the following:

  a^3.14. <=> a^{3.14}.

At the moment, a^3.14. <=> a^{3}.14.
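
As a rough check of the comparison above, the two no-brace regexps can
be tried side by side. This is only a sketch: the old regexp is the
variant quoted earlier, written with balanced groups, the brace
alternative is left out, and all `my/' names are hypothetical.

```elisp
;; Illustration only: these names are hypothetical, not part of Org.
(defconst my/old-no-brace-regexp
  "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)"
  "Variation on the current no-brace regexp, with balanced groups.")

(defconst my/new-no-brace-regexp
  "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"
  "Proposed alternative, copied from the message above.")

(defun my/script-contents (regexp string)
  "Return what REGEXP captures as script contents (group 3) in STRING.
Return nil when REGEXP does not match."
  ;; The standard syntax table keeps \\w and \\S- predictable here.
  (with-syntax-table (standard-syntax-table)
    (when (string-match regexp string)
      (match-string 3 string))))

;; The trailing dot is dropped, but the inner one is kept.
(my/script-contents my/new-no-brace-regexp "a^3.14.")   ; => "3.14"
;; The old regexp stops at the first dot.
(my/script-contents my/old-no-brace-regexp "a^3.14.")   ; => "3"
;; The footnote reference is no longer swallowed.
(my/script-contents my/new-no-brace-regexp "a_b[fn:1]") ; => "b"
;; A leading sign is still accepted.
(my/script-contents my/new-no-brace-regexp "10^-12")    ; => "-12"
```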

What do you think?


Regards,

-- 
Nicolas Goaziou


* Re: [RFC] Alternative to sub/superscript regexp
  2013-11-25 17:14 [RFC] Alternative to sub/superscript regexp Nicolas Goaziou
@ 2013-11-25 17:45 ` Nick Dokos
  2013-11-25 21:53 ` Rasmus
  2013-11-26  9:20 ` Carsten Dominik
  2 siblings, 0 replies; 7+ messages in thread
From: Nick Dokos @ 2013-11-25 17:45 UTC (permalink / raw)
  To: emacs-orgmode

Nicolas Goaziou <n.goaziou@gmail.com> writes:

> For the record `org-match-substring-regexp' is a variation on:
>
> "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)\\)"
>
> I think it is a bit convoluted and therefore difficult to predict.

Once more, Nicolas proves to be a master of understatement :-)

Nick


* Re: [RFC] Alternative to sub/superscript regexp
  2013-11-25 17:14 [RFC] Alternative to sub/superscript regexp Nicolas Goaziou
  2013-11-25 17:45 ` Nick Dokos
@ 2013-11-25 21:53 ` Rasmus
  2013-11-26  9:20 ` Carsten Dominik
  2 siblings, 0 replies; 7+ messages in thread
From: Rasmus @ 2013-11-25 21:53 UTC (permalink / raw)
  To: emacs-orgmode

Nicolas Goaziou <n.goaziou@gmail.com> writes:

> Hello,
>
> For the record `org-match-substring-regexp' is a variation on:
>
> "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)\\)"
>
> I think it is a bit convoluted and therefore difficult to predict.

Luckily this can be made less convoluted:)

> "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"
>
> That is, without braces, either an asterisk or any combination of word,
> number, dot, comma and backslash characters, which may start with either
> a plus or a minus sign but cannot end with either a dot or a comma.

Should parentheses be recognized? For example, A₍₁₎ (A_{(1)}) is
sometimes used to denote the submatrix of A obtained by removing
column 1.  Personally, I would almost always wrap such things in math
(entities being the exception), since I want them in math-mode anyway.

> What do you think?

This proposal seems better.  Thanks!

Also, it reminded me of this comic:
   http://xkcd.com/1171/

–Rasmus


* Re: [RFC] Alternative to sub/superscript regexp
  2013-11-25 17:14 [RFC] Alternative to sub/superscript regexp Nicolas Goaziou
  2013-11-25 17:45 ` Nick Dokos
  2013-11-25 21:53 ` Rasmus
@ 2013-11-26  9:20 ` Carsten Dominik
  2013-11-26 16:46   ` Nicolas Goaziou
  2 siblings, 1 reply; 7+ messages in thread
From: Carsten Dominik @ 2013-11-26  9:20 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: Org Mode List

Hi Nicolas,

I have tested this a bit, and it does pretty much what I want.
Just to be sure:  We will also support expressions with braces, right?

- Carsten

On Nov 25, 2013, at 6:14 PM, Nicolas Goaziou <n.goaziou@gmail.com> wrote:

> Hello,
> 
> For the record `org-match-substring-regexp' is a variation on:
> 
> "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)\\)"
> 
> I think it is a bit convoluted and therefore difficult to predict. For
> example, as recent bug report showed, you may tend to interpret
> a_b[fn:1] as
> 
>   a_{b}[fn:1]
> 
> but, in fact, it is equivalent to
> 
>   a_{b[fn}:1]
> 
> Of course, we can prevent this by forbidding "[" and "]" in the last
> part of the regexp. But I wonder if there's something better to do.
> 
> The idea behind this regexp is that we should be able to write simple
> sub/superscript, including numbers and entities, without requiring curly
> braces (see `org-use-sub-superscripts' docstring for details). Maybe
> something like the following could be an interesting alternative:
> 
>  "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"
> 
> That is, without braces, either an asterisk or any combination of word,
> number, dot, comma and backslash characters, which may start with either
> a plus or a minus sign but cannot end with either a dot or a comma.
> 
> I find it arguably more predictable (no inverted class). Also, we "gain"
> the following:
> 
>  a^3.14. <=> a^{3.14}.
> 
> At the moment, a^3.14. <=> a^{3}.14.
> 
> What do you think?
> 
> 
> Regards,
> 
> -- 
> Nicolas Goaziou
> 


* Re: [RFC] Alternative to sub/superscript regexp
  2013-11-26  9:20 ` Carsten Dominik
@ 2013-11-26 16:46   ` Nicolas Goaziou
  2013-12-18 22:13     ` Nicolas Goaziou
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Goaziou @ 2013-11-26 16:46 UTC (permalink / raw)
  To: Carsten Dominik; +Cc: Org Mode List

Hello,

Carsten Dominik <carsten.dominik@gmail.com> writes:

> I have tested this a bit, and it does pretty much what I want.

Great.

> Just to be sure:  We will also support expressions with braces, right?

Of course, we will. Curly braces are not ambiguous, so I'm not suggesting
changing this part of the syntax (even though it would be best to parse
them with an appropriate syntax table rather than a regexp).

OTOH, while we're at it, I wonder if we should keep grouping with
parentheses. I guess that if you want to use parentheses, you're going
to need math mode anyway, so there's no real need to make it easy to
type, e.g.:

  x^(2-i)

which will become x$^\text{(2-i)}$ anyway.

What do you think?


Regards,

-- 
Nicolas Goaziou


* Re: [RFC] Alternative to sub/superscript regexp
  2013-11-26 16:46   ` Nicolas Goaziou
@ 2013-12-18 22:13     ` Nicolas Goaziou
  2014-01-11 23:18       ` Nicolas Goaziou
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Goaziou @ 2013-12-18 22:13 UTC (permalink / raw)
  To: Carsten Dominik; +Cc: Org Mode List

[-- Attachment #1: Type: text/plain, Size: 617 bytes --]

Following up on my own message,

> OTOH, while we're at it, I wonder if we should keep grouping with
> parenthesis. I guess that if you want to use parenthesis, you're going
> to need math mode anyway, so there's no real need to make it easy to
> type, e.g. :
>
>   x^(2-i)
>
> which will become x$^\text{(2-i)}$ anyway.

I attach the suggested patch. I removed parenthesis-grouping for the
reasons I gave above. Carsten: if you'd rather keep this feature, let me
know, I'll provide another patch.

I also slightly changed the simplified regexp, as [:word:] includes
underscores in Org syntax. I used [:alnum:] instead.
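
To illustrate the difference, here is a minimal sketch. It assumes, as
stated above, a syntax table in which the underscore has word syntax,
and it builds such a table by hand instead of loading Org. Both
regexps are simplified no-brace variants, and all `my/' names are
hypothetical.

```elisp
;; Illustration only: these names are hypothetical, not part of Org.
(defconst my/word-regexp
  "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"
  "No-brace regexp relying on word syntax.")

(defconst my/alnum-regexp
  "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?[[:alnum:].,\\]*[[:alnum:]]\\)"
  "No-brace regexp relying on [:alnum:], which ignores the syntax table.")

(defun my/contents-with-word-underscore (regexp string)
  "Match REGEXP against STRING with `_' treated as a word constituent.
Return the script contents (group 3), or nil when there is no match."
  (let ((table (make-syntax-table)))   ; inherits the standard table
    (modify-syntax-entry ?_ "w" table) ; assumption: `_' has word syntax
    (with-syntax-table table
      (when (string-match regexp string)
        (match-string 3 string)))))

;; With word syntax, the second underscore is swallowed by the script.
(my/contents-with-word-underscore my/word-regexp "x_i_j")  ; => "i_j"
;; [:alnum:] never matches an underscore, so the script stops at "i".
(my/contents-with-word-underscore my/alnum-regexp "x_i_j") ; => "i"
```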

Regards,

[-- Attachment #2: 0001-Change-no-brace-sub-superscript-syntax.patch --]
[-- Type: text/x-diff, Size: 4638 bytes --]

From ffda7d038326aececd9337850d4384e4fd207459 Mon Sep 17 00:00:00 2001
From: Nicolas Goaziou <n.goaziou@gmail.com>
Date: Wed, 18 Dec 2013 22:32:13 +0100
Subject: [PATCH] Change no brace sub/superscript syntax

* lisp/org.el (org-match-substring-regexp): Simplify syntax when not
using braces.  Also remove parenthesis as grouping characters and make
the variable a defconst.
(org-match-substring-with-braces-regexp): Make the variable a defconst
and remove a useless group in regexp.
(org-use-sub-superscripts): Update docstring.
* lisp/org-element.el (org-element-subscript-parser,
  org-element-superscript-parser): Adapt to group changes.
---
 lisp/org-element.el | 17 ++++++++---------
 lisp/org.el         | 19 +++++++------------
 2 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 31add7b..17062a9 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3372,13 +3372,12 @@ Return a list whose CAR is `subscript' and CDR a plist with
 Assume point is at the underscore."
   (save-excursion
     (unless (bolp) (backward-char))
-    (let ((bracketsp (if (looking-at org-match-substring-with-braces-regexp)
-			 t
-		       (not (looking-at org-match-substring-regexp))))
+    (looking-at org-match-substring-regexp)
+    (let ((bracketsp (match-beginning 4))
 	  (begin (match-beginning 2))
-	  (contents-begin (or (match-beginning 5)
+	  (contents-begin (or (match-beginning 4)
 			      (match-beginning 3)))
-	  (contents-end (or (match-end 5) (match-end 3)))
+	  (contents-end (or (match-end 4) (match-end 3)))
 	  (post-blank (progn (goto-char (match-end 0))
 			     (skip-chars-forward " \t")))
 	  (end (point)))
@@ -3421,12 +3420,12 @@ Return a list whose CAR is `superscript' and CDR a plist with
 Assume point is at the caret."
   (save-excursion
     (unless (bolp) (backward-char))
-    (let ((bracketsp (if (looking-at org-match-substring-with-braces-regexp) t
-		       (not (looking-at org-match-substring-regexp))))
+    (looking-at org-match-substring-regexp)
+    (let ((bracketsp (match-beginning 4))
 	  (begin (match-beginning 2))
-	  (contents-begin (or (match-beginning 5)
+	  (contents-begin (or (match-beginning 4)
 			      (match-beginning 3)))
-	  (contents-end (or (match-end 5) (match-end 3)))
+	  (contents-end (or (match-end 4) (match-end 3)))
 	  (post-blank (progn (goto-char (match-end 0))
 			     (skip-chars-forward " \t")))
 	  (end (point)))
diff --git a/lisp/org.el b/lisp/org.el
index 59f55a8..4722714 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -882,7 +882,7 @@ will be parsed as single sub- or superscripts:
  10^-12  or   10^-tau    a leading sign with digits or a word
  x^2-y^3                 will be read as x^2 - y^3, because items are
 			 terminated by almost any nonword/nondigit char.
- x_{i^2} or   x^(2-i)    braces or parenthesis do grouping.
+ x_{i^2}                 braces do grouping.
 
 Still, ambiguity is possible.  So when in doubt, use {} to enclose
 the sub/superscript.  If you set this variable to the symbol `{}',
@@ -5636,21 +5636,18 @@ stacked delimiters is N.  Escaping delimiters is not possible."
 	    next (concat "\\(?:" nothing left next right "\\)+" nothing)))
     (concat left "\\(" re "\\)" right)))
 
-(defvar org-match-substring-regexp
+(defconst org-match-substring-regexp
   (concat
    "\\(\\S-\\)\\([_^]\\)\\("
-   "\\(" (org-create-multibrace-regexp "{" "}" org-match-sexp-depth) "\\)"
+   "\\(?:" (org-create-multibrace-regexp "{" "}" org-match-sexp-depth) "\\)"
    "\\|"
-   "\\(" (org-create-multibrace-regexp "(" ")" org-match-sexp-depth) "\\)"
-   "\\|"
-   "\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)\\)")
+   "\\(?:\\*\\|[+-]?[[:alnum:].,\\]*[[:alnum:]]\\)\\)")
   "The regular expression matching a sub- or superscript.")
 
-(defvar org-match-substring-with-braces-regexp
+(defconst org-match-substring-with-braces-regexp
   (concat
-   "\\(\\S-\\)\\([_^]\\)\\("
-   "\\(" (org-create-multibrace-regexp "{" "}" org-match-sexp-depth) "\\)"
-   "\\)")
+   "\\(\\S-\\)\\([_^]\\)"
+   "\\(" (org-create-multibrace-regexp "{" "}" org-match-sexp-depth) "\\)")
   "The regular expression matching a sub- or superscript, forcing braces.")
 
 (defun org-make-link-regexps ()
@@ -6087,8 +6084,6 @@ by a #."
 
 (defvar org-latex-and-related-regexp nil
   "Regular expression for highlighting LaTeX, entities and sub/superscript.")
-(defvar org-match-substring-regexp)
-(defvar org-match-substring-with-braces-regexp)
 
 (defun org-compute-latex-and-related-regexp ()
   "Compute regular expression for LaTeX, entities and sub/superscript.
-- 
1.8.5.2



* Re: [RFC] Alternative to sub/superscript regexp
  2013-12-18 22:13     ` Nicolas Goaziou
@ 2014-01-11 23:18       ` Nicolas Goaziou
  0 siblings, 0 replies; 7+ messages in thread
From: Nicolas Goaziou @ 2014-01-11 23:18 UTC (permalink / raw)
  To: Carsten Dominik; +Cc: Org Mode List

Nicolas Goaziou <n.goaziou@gmail.com> writes:

> I attach the suggested patch. I removed parenthesis-grouping for the
> reasons I gave above. Carsten: if you'd rather keep this feature, let me
> know, I'll provide another patch.

Applied, but I kept parenthesis-grouping.

