emacs-orgmode@gnu.org archives
From: Aaron Ecay <aaronecay@gmail.com>
To: Nicolas Goaziou <n.goaziou@gmail.com>,
	"emacs-orgmode@gnu.org" <emacs-orgmode@gnu.org>
Subject: Re: [parser] subscripts and underlines interacting badly
Date: Wed, 18 Dec 2013 01:57:21 -0500
Message-ID: <87k3f2eje6.fsf@gmail.com>
In-Reply-To: <871u1b9zzq.fsf@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

On 17 December 2013, Nicolas Goaziou wrote:
> 
> Hello,
> 
> Aaron Ecay <aaronecay@gmail.com> writes:
> 
>> Since the present syntax is inadequate for representing these
>> sequences, the new syntax will have to break backwards compatibility
>> somehow in order to fix the problem.  So there’s no long-term harm in
>> having a short-term kludge that will eventually disappear.
> 
> OK. Thanks for the patch.
> 
> Though, I think you are patching the wrong location. Modifying
> `org-element--get-next-object-candidates' is expensive. It would be
> better to patch `org-element-sub/superscript-successor' and make it
> ignore underline matches with brackets followed by an underscore
> character and resume searching.

We may also have to worry about cases like '_foo bar_ .  Here it is not
enough to look at the character immediately following the (possible)
subscript; we have to take the full logic of org-emph-re into account.

But now that I think about it, patching the successor function is the
only correct way, since what org-element--get-next-object-candidates
sees is limited by the buffer restriction (narrowing).
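
To make the heuristic concrete, here is a minimal sketch of the decision
for a single underscore.  It is not the code in the patch:
`my/underline-re' is a hypothetical, much-simplified stand-in for
org-emph-re, and `my/classify-underscore' is a name invented for
illustration only.

```elisp
;; Hypothetical, simplified stand-in for `org-emph-re': an underscore,
;; a body that neither starts nor ends with whitespace, a closing
;; underscore, then whitespace, punctuation, or end of string.
(defconst my/underline-re
  "_\\([^ \t\n_]\\|[^ \t\n_][^_\n]*[^ \t\n_]\\)_\\([ \t\n.,:;!?')}-]\\|\\'\\)")

(defun my/classify-underscore (string pos)
  "Classify the underscore at index POS in STRING.
Return the symbol `subscript' or `underline'."
  (cond
   ;; Braces right after the underscore force the subscript reading.
   ((and (< (1+ pos) (length string))
         (eq (aref string (1+ pos)) ?{))
    'subscript)
   ;; A complete emphasis match starting exactly at POS means underline.
   ((eq (string-match my/underline-re string pos) pos)
    'underline)
   ;; No matching closing underscore: fall back to subscript.
   (t 'subscript)))

(my/classify-underscore "'_foo_" 1)       ; => underline
(my/classify-underscore "'_{foo}_" 1)     ; => subscript
(my/classify-underscore "'_foo bar_ ." 1) ; => underline
(my/classify-underscore "'_foo bar" 1)    ; => subscript
```

The third call is the case above: the character after "foo" is a space,
so a one-character lookahead would not see the closing underscore that
makes the whole span an underline.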

The attached patch implements this.  It also updates the fontification
to match by calling out to the parser.  That raises potential
performance issues, although with the cache it will hopefully not be a
problem in practice.  The patch also notes the new heuristic in the
manual.  The test suite passes.
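
As a rough illustration of what "calling out to the parser" means for
the highlighting code, the sketch below asks org-element which object
sits at a buffer position.  `my/object-type-at' and `my/in-subscript-p'
are hypothetical helpers, not part of the patch; org-element-context and
org-element-type are the parser functions the patch itself uses.

```elisp
(require 'org)
(require 'org-element)

(defun my/object-type-at (pos)
  "Return the type of the Org object or element at POS."
  (save-excursion
    (goto-char pos)
    ;; Font-lock code relies on the match data of its own regexp
    ;; search, so protect it while the parser runs.
    (save-match-data
      (org-element-type (org-element-context)))))

(defun my/in-subscript-p (pos)
  "Return non-nil if the parser reports a subscript at POS."
  (eq (my/object-type-at pos) 'subscript))
```

A caller would use `my/in-subscript-p' in an Org buffer to decide whether
to skip underline highlighting at a match.  Each call runs the parser,
which is where the performance concern comes from.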


[-- Attachment #2: 0001-org-element-use-brackets-to-disambiguate-subscript-u.patch --]
[-- Type: text/x-diff, Size: 5608 bytes --]

From e2044312b95f8b427ddc662cd1abf10bf4d87b2d Mon Sep 17 00:00:00 2001
From: Aaron Ecay <aaronecay@gmail.com>
Date: Sun, 15 Dec 2013 21:30:27 -0500
Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline

* lisp/org-element.el (org-element-sub/superscript-successor): use
brackets to disambiguate subscript/underline
* lisp/org.el (org-do-emphasis-faces): incorporate the above
disambiguation
* doc/org.texi: reflect these changes in the manual

In an org-syntax string like 1 or 2 below, both subscript and
underline are possible interpretations.  This patch uses the presence
of brackets to disambiguate these cases, that is, 1 is interpreted as
an underlined "foo" whereas 2 is subscript "foo" followed by
plain-text "_"

1: '_foo_
2: '_{foo}_

The in-buffer highlighting is updated to match.
---
 doc/org.texi        | 14 ++++++++++++++
 lisp/org-element.el | 22 +++++++++++++++++++---
 lisp/org.el         | 36 ++++++++++++++++++++++--------------
 3 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/doc/org.texi b/doc/org.texi
index b4c4078..3eefe9a 100644
--- a/doc/org.texi
+++ b/doc/org.texi
@@ -9739,6 +9739,17 @@ can tweak @code{org-emphasis-regexp-components}.  Beware that changing one of
 the above variables will no take effect until you reload Org, for which you
 may need to restart Emacs.
 
+When it follows an alphanumeric character, the underscore is always
+interpreted as a subscript (@pxref{Subscripts and superscripts}), and when it
+follows whitespace it is always the start of an underline (assuming a
+matching underscore is found in a proper position further along).  However,
+after a punctuation character (for example the apostrophe), the underscore
+character can be ambiguous between these two interpretations.  Org uses a
+simple heuristic for these cases: if the character following the underscore
+is an opening brace @samp{@{} or if no matching underscore is seen in the
+following text, the underscore is considered to be the start of a subscript.
+Otherwise, it is the start of underlining.
+
 @node Horizontal rules
 @subheading  Horizontal rules
 @cindex horizontal rules, markup rules
@@ -10123,6 +10134,9 @@ In addition to showing entities as UTF-8 characters, this command will also
 format sub- and superscripts in a WYSIWYM way.
 @end table
 
+For discussion of the resolution of ambiguities between the underscore as the
+introducer of a subscript vs.@ underline, see @ref{Emphasis and monospace}.
+
 @node @LaTeX{} fragments
 @subsection @LaTeX{} fragments
 @cindex @LaTeX{} fragments
diff --git a/lisp/org-element.el b/lisp/org-element.el
index 089ecfb..faa1e44 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3408,9 +3408,25 @@ Return value is a cons cell whose CAR is either `subscript' or
 `superscript' and CDR is beginning position."
   (save-excursion
     (unless (bolp) (backward-char))
-    (when (re-search-forward org-match-substring-regexp nil t)
-      (cons (if (string= (match-string 2) "_") 'subscript 'superscript)
-	    (match-beginning 2)))))
+    (let (res)
+      (while (and (not res)
+		  (re-search-forward org-match-substring-regexp nil t))
+	(goto-char (match-beginning 0))
+	(when (or
+	       ;; this subscript uses brackets -> handle as subscript
+	       ;; unconditionally
+	       (eq (aref (match-string 3) 0) ?{)
+	       ;; it is not ambiguous with an underline -> handle as
+	       ;; subscript
+	       (not (looking-at-p org-emph-re)))
+	  (setq res (cons (if (string= (match-string 2) "_")
+			      'subscript
+			    'superscript)
+			  (match-beginning 2))))
+	;; otherwise -> keep going, and let the underline
+	;; parser have it
+	(goto-char (match-end 0)))
+      res)))
 
 
 ;;;; Superscript
diff --git a/lisp/org.el b/lisp/org.el
index eeb0c33..dbddc0b 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -5721,21 +5721,29 @@ This should be called after the variable `org-link-types' has changed."
       (if (not (= (char-after (match-beginning 3))
 		  (char-after (match-beginning 4))))
 	  (progn
-	    (setq rtn t)
 	    (setq a (assoc (match-string 3) org-emphasis-alist))
-	    (font-lock-prepend-text-property (match-beginning 2) (match-end 2)
-					     'face
-					     (nth 1 a))
-	    (and (nth 2 a)
-		 (org-remove-flyspell-overlays-in
-		  (match-beginning 0) (match-end 0)))
-	    (add-text-properties (match-beginning 2) (match-end 2)
-				 '(font-lock-multiline t org-emphasis t))
-	    (when org-hide-emphasis-markers
-	      (add-text-properties (match-end 4) (match-beginning 5)
-				   '(invisible org-link))
-	      (add-text-properties (match-beginning 3) (match-end 3)
-				   '(invisible org-link)))))
+	    ;; Don't do any highlighting if this is an underscore
+	    ;; which is part of a subscript.
+	    (unless (and (string= (car a) "_")
+			 (save-excursion
+			   (goto-char (match-beginning 2))
+			   (save-match-data
+			     (eq (org-element-type (org-element-context))
+				 'subscript))))
+	      (setq rtn t)
+	      (font-lock-prepend-text-property (match-beginning 2) (match-end 2)
+					       'face
+					       (nth 1 a))
+	      (and (nth 2 a)
+		   (org-remove-flyspell-overlays-in
+		    (match-beginning 0) (match-end 0)))
+	      (add-text-properties (match-beginning 2) (match-end 2)
+				   '(font-lock-multiline t org-emphasis t))
+	      (when org-hide-emphasis-markers
+		(add-text-properties (match-end 4) (match-beginning 5)
+				     '(invisible org-link))
+		(add-text-properties (match-beginning 3) (match-end 3)
+				     '(invisible org-link))))))
       (backward-char 1))
     rtn))
 
-- 
1.8.5.1


[-- Attachment #3: Type: text/plain, Size: 25 bytes --]


Thanks,

-- 
Aaron Ecay
