emacs-orgmode@gnu.org archives
From: Aaron Ecay <aaronecay@gmail.com>
To: Nicolas Goaziou <n.goaziou@gmail.com>,
	"emacs-orgmode@gnu.org" <emacs-orgmode@gnu.org>
Subject: Re: [parser] subscripts and underlines interacting badly
Date: Sun, 15 Dec 2013 22:15:33 -0500
Message-ID: <87fvptfpuy.fsf@gmail.com>
In-Reply-To: <87lhzpx0d7.fsf@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5135 bytes --]

On 12 December 2013, Nicolas Goaziou wrote:
> No, it just means that I didn't put much thought into it. It also means
> that I would prefer something more natural (and simpler) than such an
> ad-hoc rule.
> 
> If you work on it and really think it is an improvement over existing
> situation, then I don't see why I wouldn't accept it. But I'd rather not
> consider it as a definitive answer to the problem (and include it as
> a part of a standard Org syntax implementation).

I have worked up such a patch.  You are right that it is complicated.  I
think it is important to give people the possibility to represent some
of the currently-impossible sequences while a better syntax is cooked
up, though.

Since the present syntax is inadequate for representing these
sequences, the new syntax will have to break backwards compatibility
somehow in order to fix the problem.  So there’s no long-term harm in
having a short-term kludge that will eventually disappear.

The patch is attached to this email.

> 
>> 1. You have a plan to get rid of org-use-sub-superscripts.  You might also
>> want to get rid of `org-export-with-sub-superscripts' (depending on how
>> one interprets your remark that the variable "do[es]n't make much sense
>> anyway").  Also, other parts of org (e.g. the parser) cannot change to
>> harmonize with these variables.  This means that these variables are de
>> facto deprecated, and org is headed to a future where sub/superscripts
>> are non-optional and non-configurable.
> 
> and non-intrusive, too, which isn't the case at the moment.
> 
> You cannot get rid of subscript in LaTeX (well, you probably can, but
> I guess most users don't). Why could you in Org?

As a point of fact, you can get rid of it in LaTeX.  There is even a
LaTeX package on CTAN precisely to turn off subscripts (in text mode,
not math):
<http://www.ctan.org/tex-archive/macros/latex2e/contrib/underscore>

As for why more users don’t turn it off, it probably has something to do
with the fact that subscripts and underlines in LaTeX don’t interfere
with each other.

But in any case “you can do X in LaTeX” is strange as an argument for
“you should *not* be able to do X in org”.

> 
>> 2. The current (non-optional, non-configurable) implementation of
>> X-scripts by the parser has specifically identifiable defects,
>> such as the one I mentioned whereby '_foo_, perhaps naturally
>> interpreted as underlining (among other reasons because of how it
>> is highlighted by org) is "really" a subscript.
> 
> The current implementation, with its defects, is still configurable.
> `org-export-with-sub-superscripts' works as advertised, AFAIK.

It has several bugs; see separate thread.

> I'm not really able to change Org syntax without Carsten's consent.
> 
> Anyway, I'd like any syntax change to be really discussed. Org has
> a long history of great ideas implemented without any consistent syntax
> in mind. Examples include @<tag>, Babel's #+header line with ":prop
> value" (even though every other part of Org used "key=value"),
> configurable emphasis markers and list item bullets, "comment" and
> "quote" keywords (even though Archive is a tag)...
> 
> Also, changing Org syntax isn't limited to a mere patch over
> org-element.el. Remember that most of Org doesn't use this library
> (hint).

But eventually it will (assuming the cache implementation proves robust
enough), right?  So, changes in org-element.el will eventually percolate
to the rest of org, whereas changes elsewhere will wither and dry up.

> 
> Back to the topic. As you know, I'm not really open to per-user Org
> syntax. But I will consider any syntactical change that would solve the
> problem at hand.
> 
>> I hope you realize why this situation might be frustrating to a user and
>> attempted contributor.
> 
> I don't want to be frustrating.
> 
> I try to make as clear as possible what I see as important and where
> I would like to head to. I even suggested topics to work on (e.g.
> escaped characters).

I don’t think escaped characters help with the problem that it is
presently impossible to represent the following (pseudo)-element
sequence in org syntax:

(plain-text "'") (underline "foo")

–> which expands to:

'_foo_

–> which parses to:

(plain-text "'") (subscript "foo") (plain-text "_")
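
To check how a given Org version actually parses such a string, something
like the following could be evaluated.  This is only a sketch: the name
`my/x-script-types' is made up for illustration, and the result depends on
the installed Org version, so no particular output is promised here.

```elisp
;; -*- lexical-binding: t; -*-
(require 'org)
(require 'org-element)

(defun my/x-script-types (string)
  "Return the underline/subscript object types found when parsing STRING.
Hypothetical helper for illustration only; it is not part of Org."
  (with-temp-buffer
    (org-mode)
    (insert string)
    ;; Walk the parse tree and collect the type of every underline or
    ;; subscript object, in document order.
    (org-element-map (org-element-parse-buffer) '(underline subscript)
      #'org-element-type)))

;; Compare the two spellings discussed in this thread:
;; (my/x-script-types "'_foo_")
;; (my/x-script-types "'_{foo}_")
```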

Anyway, what do escaped characters do that entities cannot?  You can
write \under{} to escape an underscore today.  It’s marginally less
convenient to type (and remember) than \_, but this seems like a
more-or-less solved problem, especially given the effort it would take
to teach e.g. org-emph-re about backslash escapes.

> 
> There's also optimization to do on cache, if you're motivated.

I think the cache mechanism is an incredible piece of engineering,
indeed.  I hope to someday be able to understand the code well enough to
contribute in that area.  But it seems hard to get into without many
hours of devoted study, which I (regretfully) don’t have time for ATM.

Thanks,

[-- Attachment #2: 0001-org-element-use-brackets-to-disambiguate-subscript-u.patch --]
[-- Type: text/x-diff, Size: 3568 bytes --]

From 0d616d65a9b6175c9e7ef225f4baff3d185b5d5b Mon Sep 17 00:00:00 2001
From: Aaron Ecay <aaronecay@gmail.com>
Date: Sun, 15 Dec 2013 21:30:27 -0500
Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline

* lisp/org-element.el (org-element--get-next-object-candidates):
use brackets to disambiguate subscript/underline

In an org-syntax string like 1 or 2 below, both subscript and
underline are possible interpretations.  This patch uses the presence
of brackets to disambiguate these cases, that is, 1 is interpreted as
an underline whereas 2 is a subscript (followed by plain-text _)

1: '_foo_
2: '_{foo}_

This change does not touch the in-buffer highlighting, which will show
underline or subscript in these cases depending on the
org-use-sub-superscripts variable, which is (close to)
deprecated (<http://mid.gmane.org/87lhzpx0d7.fsf@gmail.com>).
---
 lisp/org-element.el | 64 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 20 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 089ecfb..7082562 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -4424,26 +4424,50 @@ RESTRICTION should be looked after.
 
 Return value is an alist whose CAR is the object type and CDR its
 beginning position."
-  (delq
-   nil
-   (if (eq objects 'initial)
-       ;; When searching for the first time, look for every successor
-       ;; allowed in RESTRICTION.
-       (mapcar
-	(lambda (res)
-	  (funcall (intern (format "org-element-%s-successor" res))))
-	restriction)
-     ;; Focus on objects returned during last search.  Keep those
-     ;; still after point.  Search again objects before it.
-     (mapcar
-      (lambda (obj)
-	(if (>= (cdr obj) (point)) obj
-	  (let* ((type (car obj))
-		 (succ (or (cdr (assq type org-element-object-successor-alist))
-			   type)))
-	    (and succ
-		 (funcall (intern (format "org-element-%s-successor" succ)))))))
-      objects))))
+  (let ((res
+	 (delq
+	  nil
+	  (if (eq objects 'initial)
+	      ;; When searching for the first time, look for every successor
+	      ;; allowed in RESTRICTION.
+	      (mapcar
+	       (lambda (res)
+		 (funcall (intern (format "org-element-%s-successor" res))))
+	       restriction)
+	    ;; Focus on objects returned during last search.  Keep those
+	    ;; still after point.  Search again objects before it.
+	    (mapcar
+	     (lambda (obj)
+	       (if (>= (cdr obj) (point)) obj
+		 (let* ((type (car obj))
+			(succ (or (cdr (assq type org-element-object-successor-alist))
+				  type)))
+		   (and succ
+			(funcall (intern (format "org-element-%s-successor" succ)))))))
+	     objects)))))
+    ;; To disambiguate between underlines and subscripts, which may
+    ;; both match at the same position, the heuristic is used that if
+    ;; there are brackets, the expression is a subscript, whereas
+    ;; otherwise it is an underline.  Thus:
+    ;; '_foo_ parses as (plain-text "'") (underline "foo")
+    ;; whereas '_{foo}_ parses as
+    ;; (plain-text "'") (subscript "foo") (plain-text "_")
+    (setq res
+	  (sort res
+		(lambda (a b)
+		  (if (and (= (cdr a) (cdr b))
+			   (memq 'underline (list (car a) (car b)))
+			   (memq 'subscript (list (car a) (car b))))
+		      (save-excursion
+			(goto-char (cdr a))
+			(let ((bracketsp
+			       (org-element-property
+				:use-brackets-p (org-element-subscript-parser))))
+			  (if (eq (car a) 'underline)
+			      (not bracketsp)
+			    bracketsp)))
+		    (< (cdr a) (cdr b))))))
+    res))
 
 
 \f
-- 
1.8.5.1
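
The tie-breaking rule in the patch can be tried out in isolation with a
toy model that does not depend on org-element.  This is a simplified
sketch, not the patch itself: the `my/' functions are hypothetical,
candidates are (TYPE . POSITION) pairs as in the patch, but positions are
0-based string indices rather than buffer positions, and the bracket test
is a plain character check instead of a call to the subscript parser.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/brackets-at-p (string pos)
  "Return non-nil if STRING has \"_{\" starting at 0-based index POS."
  (and (< (1+ pos) (length string))
       (eq (aref string pos) ?_)
       (eq (aref string (1+ pos)) ?\{)))

(defun my/candidate-before-p (string a b)
  "Return non-nil if candidate A should sort before candidate B.
A and B are (TYPE . POSITION) pairs referring to STRING."
  (let ((types (list (car a) (car b))))
    (if (and (= (cdr a) (cdr b))
             (memq 'underline types)
             (memq 'subscript types))
        ;; Underline and subscript tie at the same position:
        ;; brackets favour the subscript, no brackets the underline.
        (let ((brackets (my/brackets-at-p string (cdr a))))
          (if (eq (car a) 'underline) (not brackets) brackets))
      ;; Otherwise the earlier candidate wins.
      (< (cdr a) (cdr b)))))

(defun my/sort-candidates (string candidates)
  "Return a sorted copy of CANDIDATES for STRING, leaving the input intact."
  (sort (copy-sequence candidates)
        (lambda (a b) (my/candidate-before-p string a b))))

(my/sort-candidates "'_foo_" '((subscript . 1) (underline . 1)))
;; => ((underline . 1) (subscript . 1))
(my/sort-candidates "'_{foo}_" '((underline . 1) (subscript . 1)))
;; => ((subscript . 1) (underline . 1))
```

The first candidate in the sorted list is the one the toy parser would
pick, which matches the intent described in the commit message: no
brackets means underline, brackets mean subscript.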


[-- Attachment #3: Type: text/plain, Size: 15 bytes --]

-- 
Aaron Ecay
