* [RFC] Alternative to sub/superscript regexp
From: Nicolas Goaziou @ 2013-11-25 17:14 UTC
To: Org Mode List

Hello,

For the record `org-match-substring-regexp' is a variation on:

  "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)"

I think it is a bit convoluted and therefore difficult to predict. For
example, as a recent bug report showed, you may tend to interpret
a_b[fn:1] as

  a_{b}[fn:1]

but, in fact, it is equivalent to

  a_{b[fn}:1]

Of course, we can prevent this by forbidding "[" and "]" in the last
part of the regexp. But I wonder if there's something better to do.

The idea behind this regexp is that we should be able to write simple
sub/superscripts, including numbers and entities, without requiring
curly braces (see the `org-use-sub-superscripts' docstring for details).
Maybe something like the following could be an interesting alternative:

  "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"

That is, without braces, either an asterisk or any combination of word,
number, dot, comma and backslash characters, which may start with either
a plus or a minus sign but cannot end with either a dot or a comma.

I find it arguably more predictable (no inverted class). Also, we "gain"
the following:

  a^3.14. <=> a^{3.14}.

At the moment, a^3.14. <=> a^{3}.14.

What do you think?

Regards,

-- 
Nicolas Goaziou
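The two regexps above can be compared with a rough Python transliteration. This is a sketch, not Org's code. It assumes Python's `\S` and `\w` behave like Emacs's `\\S-` and `\\w` for these plain ASCII samples, and it ignores Emacs syntax tables. `OLD`, `NEW` and the `script` helper are hypothetical names used only for illustration.

```python
import re

# Rough Python transliterations of the two Emacs regexps quoted above.
# Assumption: Python's \S and \w stand in for Emacs's "\\S-" and "\\w";
# Emacs syntax tables (which decide what counts as a word) are ignored.
OLD = re.compile(r'(\S)([_^])(\*|[-+]?[^-+*!@#$%^_ \t\r\n,:"?<>~;./{}=()]+)')
NEW = re.compile(r'(\S)([_^])(\*|[+-]?(?:\w|[0-9.,\\])*(\w|[0-9]))')


def script(regexp, text):
    """Hypothetical helper: return what REGEXP takes as the script (group 3)."""
    m = regexp.search(text)
    return m.group(3) if m else None


for sample in ["a_b[fn:1]", "a^3.14.", "10^-12", "x^2-y^3"]:
    print(f"{sample:10} old: {script(OLD, sample)!r:8} new: {script(NEW, sample)!r}")
```

Under these assumptions, the old pattern takes `b[fn` and `3` as the scripts of `a_b[fn:1]` and `a^3.14.`, while the proposed one takes `b` and `3.14`. This matches the behavior described in the message. Simple cases such as `10^-12` and `x^2-y^3` come out the same with both patterns.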
* Re: [RFC] Alternative to sub/superscript regexp
From: Nick Dokos @ 2013-11-25 17:45 UTC
To: emacs-orgmode

Nicolas Goaziou <n.goaziou@gmail.com> writes:

> For the record `org-match-substring-regexp' is a variation on:
>
>   "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)"
>
> I think it is a bit convoluted and therefore difficult to predict.

Once more, Nicolas proves to be a master of understatement :-)

Nick
* Re: [RFC] Alternative to sub/superscript regexp
From: Rasmus @ 2013-11-25 21:53 UTC
To: emacs-orgmode

Nicolas Goaziou <n.goaziou@gmail.com> writes:

> Hello,
>
> For the record `org-match-substring-regexp' is a variation on:
>
>   "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)"
>
> I think it is a bit convoluted and therefore difficult to predict.

Luckily, this can be made less convoluted :)

>   "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"
>
> That is, without braces, either an asterisk or any combination of word,
> number, dot, comma and backslash characters, which may start with either
> a plus or a minus sign but cannot end with either a dot or a comma.

Should parentheses be recognized? For example, A₍₁₎ (A_{(1)}) is
sometimes used to denote the submatrix of A obtained by removing
column 1.

Personally, I would almost always wrap such expressions in math anyway
(entities being the exception), since I want them in math mode.

> What do you think?

This proposal seems better. Thanks!

Also, it reminded me of this comic:

  http://xkcd.com/1171/

–Rasmus
* Re: [RFC] Alternative to sub/superscript regexp
From: Carsten Dominik @ 2013-11-26 9:20 UTC
To: Nicolas Goaziou; +Cc: Org Mode List

Hi Nicolas,

I have tested this a bit, and it does pretty much what I want.

Just to be sure: We will also support expressions with braces, right?

- Carsten
* Re: [RFC] Alternative to sub/superscript regexp
From: Nicolas Goaziou @ 2013-11-26 16:46 UTC
To: Carsten Dominik; +Cc: Org Mode List

Hello,

Carsten Dominik <carsten.dominik@gmail.com> writes:

> I have tested this a bit, and it does pretty much what I want.

Great.

> Just to be sure: We will also support expressions with braces, right?

Of course we will. Curly braces are not ambiguous, so I'm not suggesting
changing this part of the syntax (even though it would be best to parse
them with an appropriate syntax table rather than a regexp).

OTOH, while we're at it, I wonder if we should keep grouping with
parentheses. I guess that if you want to use parentheses, you're going
to need math mode anyway, so there's no real need to make it easy to
type, e.g.:

  x^(2-i)

which will become x$^\text{(2-i)}$ anyway.

What do you think?

Regards,

-- 
Nicolas Goaziou
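Here is a minimal sketch of what dropping parenthesis grouping would change, again as a Python transliteration rather than Org's code. `WITH_PARENS` and `WITHOUT_PARENS` are hypothetical, reduced patterns. Both use the no-brace alternative from the RFC, and the first adds a single, non-nested `(...)` alternative in front of it. Org builds its real parenthesis alternative with `org-create-multibrace-regexp`, which also allows nesting.

```python
import re

# No-brace alternative from the RFC (Python approximation of the Emacs regexp).
PLAIN = r"\*|[+-]?(?:\w|[0-9.,\\])*(?:\w|[0-9])"

# Hypothetical reduced patterns: one accepts a single, non-nested "(...)"
# group as the script, the other does not. Group 3 is the script in both.
WITH_PARENS = re.compile(r"(\S)([_^])(\([^()]*\)|" + PLAIN + r")")
WITHOUT_PARENS = re.compile(r"(\S)([_^])(" + PLAIN + r")")


def script(regexp, text):
    """Return group 3 of the first match, or None if nothing matches."""
    m = regexp.search(text)
    return m.group(3) if m else None


for sample in ["x^(2-i)", "A_(1)", "x^2"]:
    print(sample, script(WITH_PARENS, sample), script(WITHOUT_PARENS, sample))
```

With parenthesis grouping, `x^(2-i)` and Rasmus's `A_(1)` get `(2-i)` and `(1)` as their scripts. Without it, neither is recognized as a sub/superscript, so the user would need braces, as in `A_{(1)}`. Plain scripts such as `x^2` are unaffected.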
* Re: [RFC] Alternative to sub/superscript regexp
From: Nicolas Goaziou @ 2013-12-18 22:13 UTC
To: Carsten Dominik; +Cc: Org Mode List

Following up on my own message:

> OTOH, while we're at it, I wonder if we should keep grouping with
> parentheses. I guess that if you want to use parentheses, you're going
> to need math mode anyway, so there's no real need to make it easy to
> type, e.g.:
>
>   x^(2-i)
>
> which will become x$^\text{(2-i)}$ anyway.

I attach the suggested patch. I removed parenthesis grouping for the
reasons I gave above. Carsten: if you'd rather keep this feature, let me
know and I'll provide another patch.

I also slightly changed the simplified regexp, as [:word:] includes
underscores in Org syntax. I used [:alnum:] instead.

Regards,

[Attachment: 0001-Change-no-brace-sub-superscript-syntax.patch (text/x-diff)]

From ffda7d038326aececd9337850d4384e4fd207459 Mon Sep 17 00:00:00 2001
From: Nicolas Goaziou <n.goaziou@gmail.com>
Date: Wed, 18 Dec 2013 22:32:13 +0100
Subject: [PATCH] Change no brace sub/superscript syntax

* lisp/org.el (org-match-substring-regexp): Simplify syntax when not
using braces. Also remove parenthesis as grouping characters and make
the variable a defconst.
(org-match-substring-with-braces-regexp): Make the variable a defconst
and remove a useless group in regexp.
(org-use-sub-superscripts): Update docstring.

* lisp/org-element.el (org-element-subscript-parser,
org-element-superscript-parser): Adapt to group changes.
---
 lisp/org-element.el | 17 ++++++++---------
 lisp/org.el         | 19 +++++++------------
 2 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 31add7b..17062a9 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3372,13 +3372,12 @@ Return a list whose CAR is `subscript' and CDR a plist with
 Assume point is at the underscore."
   (save-excursion
     (unless (bolp) (backward-char))
-    (let ((bracketsp (if (looking-at org-match-substring-with-braces-regexp)
-                         t
-                       (not (looking-at org-match-substring-regexp))))
+    (looking-at org-match-substring-regexp)
+    (let ((bracketsp (match-beginning 4))
           (begin (match-beginning 2))
-          (contents-begin (or (match-beginning 5)
+          (contents-begin (or (match-beginning 4)
                               (match-beginning 3)))
-          (contents-end (or (match-end 5) (match-end 3)))
+          (contents-end (or (match-end 4) (match-end 3)))
           (post-blank (progn (goto-char (match-end 0))
                              (skip-chars-forward " \t")))
           (end (point)))
@@ -3421,12 +3420,12 @@ Return a list whose CAR is `superscript' and CDR a plist with
 Assume point is at the caret."
   (save-excursion
     (unless (bolp) (backward-char))
-    (let ((bracketsp (if (looking-at org-match-substring-with-braces-regexp) t
-                       (not (looking-at org-match-substring-regexp))))
+    (looking-at org-match-substring-regexp)
+    (let ((bracketsp (match-beginning 4))
           (begin (match-beginning 2))
-          (contents-begin (or (match-beginning 5)
+          (contents-begin (or (match-beginning 4)
                               (match-beginning 3)))
-          (contents-end (or (match-end 5) (match-end 3)))
+          (contents-end (or (match-end 4) (match-end 3)))
           (post-blank (progn (goto-char (match-end 0))
                              (skip-chars-forward " \t")))
           (end (point)))
diff --git a/lisp/org.el b/lisp/org.el
index 59f55a8..4722714 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -882,7 +882,7 @@ will be parsed as single sub- or superscripts:
  10^-12 or 10^-tau   a leading sign with digits or a word
  x^2-y^3             will be read as x^2 - y^3, because items are
                      terminated by almost any nonword/nondigit char.
- x_{i^2} or x^(2-i)  braces or parenthesis do grouping.
+ x_{i^2}             braces do grouping.
 
 Still, ambiguity is possible. So when in doubt, use {} to enclose
 the sub/superscript. If you set this variable to the symbol `{}',
@@ -5636,21 +5636,18 @@ stacked delimiters is N. Escaping delimiters is not possible."
               next (concat "\\(?:" nothing left next right "\\)+" nothing)))
     (concat left "\\(" re "\\)" right)))
 
-(defvar org-match-substring-regexp
+(defconst org-match-substring-regexp
   (concat
    "\\(\\S-\\)\\([_^]\\)\\("
-   "\\(" (org-create-multibrace-regexp "{" "}" org-match-sexp-depth) "\\)"
+   "\\(?:" (org-create-multibrace-regexp "{" "}" org-match-sexp-depth) "\\)"
    "\\|"
-   "\\(" (org-create-multibrace-regexp "(" ")" org-match-sexp-depth) "\\)"
-   "\\|"
-   "\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)\\)")
+   "\\(?:\\*\\|[+-]?[[:alnum:].,\\]*[[:alnum:]]\\)\\)")
   "The regular expression matching a sub- or superscript.")
 
-(defvar org-match-substring-with-braces-regexp
+(defconst org-match-substring-with-braces-regexp
   (concat
-   "\\(\\S-\\)\\([_^]\\)\\("
-   "\\(" (org-create-multibrace-regexp "{" "}" org-match-sexp-depth) "\\)"
-   "\\)")
+   "\\(\\S-\\)\\([_^]\\)"
+   "\\(" (org-create-multibrace-regexp "{" "}" org-match-sexp-depth) "\\)")
   "The regular expression matching a sub- or superscript, forcing braces.")
 
 (defun org-make-link-regexps ()
@@ -6087,8 +6084,6 @@ by a #."
 
 (defvar org-latex-and-related-regexp nil
   "Regular expression for highlighting LaTeX, entities and sub/superscript.")
-(defvar org-match-substring-regexp)
-(defvar org-match-substring-with-braces-regexp)
 
 (defun org-compute-latex-and-related-regexp ()
   "Compute regular expression for LaTeX, entities and sub/superscript.
-- 
1.8.5.2
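The core of the patch is a change in how the parsers read the match. Both parsers now call `looking-at` once with `org-match-substring-regexp` and take everything from its groups:

- group 2 is the `_` or `^`;
- group 4, the single group created by `org-create-multibrace-regexp`, is set only when braces are used;
- group 3 holds the whole script otherwise.

Below is a rough Python transliteration under several stated assumptions:

- `multibrace_regexp` is modeled on `org-create-multibrace-regexp`. Only the tail of that function is visible in the hunk context, so the rest of its body is an assumption.
- A depth of 3 is an arbitrary choice.
- `[0-9A-Za-z]` is an ASCII-only stand-in for `[[:alnum:]]`.
- `parse_script` and `WORD_RE` are hypothetical helpers, not Org functions.

The sketch models the patch as posted here, without parenthesis grouping.

```python
import re


def multibrace_regexp(left, right, depth):
    """Match LEFT ... RIGHT with up to DEPTH levels of nesting.

    Modeled on org-create-multibrace-regexp (an assumption of this sketch):
    the result has exactly one capturing group, holding the contents
    without the outermost delimiters.
    """
    l, r = re.escape(left), re.escape(right)
    nothing = f"[^{l}{r}]*?"
    body = nothing
    nxt = f"(?:{nothing}{l}{nothing}{r})+{nothing}"
    while depth > 1:
        depth -= 1
        body = f"{body}|{nxt}"
        nxt = f"(?:{nothing}{l}{nxt}{r})+{nothing}"
    return f"{l}({body}){r}"


# Groups: 1 = char before the marker, 2 = "_" or "^", 3 = whole script,
# 4 = brace contents (set only when braces are used).
SUBSTRING_RE = re.compile(
    r"(\S)([_^])((?:" + multibrace_regexp("{", "}", 3) + r")"
    r"|(?:\*|[+-]?[0-9A-Za-z.,\\]*[0-9A-Za-z]))"
)

# Same no-brace alternative built on \w, which (like Org's word syntax,
# per the message above) also matches "_".
WORD_RE = re.compile(r"(\S)([_^])(\*|[+-]?(?:\w|[.,\\])*\w)")


def parse_script(text, marker_pos):
    """Hypothetical helper mirroring the patched parsers' group handling."""
    # Like (unless (bolp) (backward-char)), simplified to start-of-string.
    start = marker_pos - 1 if marker_pos > 0 else marker_pos
    m = SUBSTRING_RE.match(text, start)
    if m is None:
        return None
    brackets = m.group(4) is not None  # plays the role of bracketsp
    contents = m.group(4) if brackets else m.group(3)
    kind = "subscript" if m.group(2) == "_" else "superscript"
    return kind, brackets, contents


print(parse_script("x_{i^2}", 1))
print(parse_script("x^{a_{b}}", 1))
print(parse_script("a^3.14.", 1))
print(parse_script("x_a_b", 1), WORD_RE.match("x_a_b").group(3))
```

The last line illustrates the point about `[:word:]`. A word-character class takes `a_b` as the subscript of `x_a_b`, while the alnum-only class stops at `a`.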
* Re: [RFC] Alternative to sub/superscript regexp
From: Nicolas Goaziou @ 2014-01-11 23:18 UTC
To: Carsten Dominik; +Cc: Org Mode List

Nicolas Goaziou <n.goaziou@gmail.com> writes:

> I attach the suggested patch. I removed parenthesis grouping for the
> reasons I gave above. Carsten: if you'd rather keep this feature, let me
> know and I'll provide another patch.

Applied, but I kept parenthesis grouping.
Code repository for this project: https://git.savannah.gnu.org/cgit/emacs/org-mode.git