From mboxrd@z Thu Jan  1 00:00:00 1970
From: Aaron Ecay
Subject: Re: [parser] subscripts and underlines interacting badly
Date: Sun, 15 Dec 2013 22:15:33 -0500
Message-ID: <87fvptfpuy.fsf@gmail.com>
References: <87ppp415n4.fsf@gmail.com> <87bo0nu79v.fsf@gmail.com>
 <87haaf1bgi.fsf@gmail.com> <8761qvxg2o.fsf@gmail.com>
 <87r49ik0qw.fsf@gmail.com> <87txeevus8.fsf@gmail.com>
 <87ob4lkhmo.fsf@gmail.com> <87lhzpx0d7.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
In-Reply-To: <87lhzpx0d7.fsf@gmail.com>
To: Nicolas Goaziou, "emacs-orgmode@gnu.org"

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 12 December 2013, Nicolas Goaziou wrote:

> No, it just means that I didn't put much thought into it. It also means
> that I would prefer something more natural (and simpler) than such an
> ad-hoc rule.
>
> If you work on it and really think it is an improvement over existing
> situation, then I don't see why I wouldn't accept it.
> But I'd rather not
> consider it as a definitive answer to the problem (and include it as
> a part of a standard Org syntax implementation).

I have worked up such a patch. You are right that it is complicated.
I think it is important, though, to give people a way to represent some
of the currently impossible sequences while a better syntax is cooked
up. Since the present syntax is inadequate for representing these
sequences, the new syntax will have to break backwards compatibility
somehow in order to fix the problem. So there's no long-term harm in
having a short-term kludge that will eventually disappear.

The patch is attached to this email.

>
>> 1. You have a plan to get rid of org-use-sub-superscripts. You might also
>> want to get rid of `org-export-with-sub-superscripts' (depending on how
>> one interprets your remark that the variable "do[es]n't make much sense
>> anyway"). Also, other parts of org (e.g. the parser) cannot change to
>> harmonize with these variables. This means that these variables are de
>> facto deprecated, and org is headed to a future where sub/superscripts
>> are non-optional and non-configurable.
>
> and non-intrusive, too, which isn't the case at the moment.
>
> You cannot get rid of subscript in LaTeX (well, you probably can, but
> I guess most users don't). Why could you in Org?

As a matter of fact, you can get rid of it in Latex. There is even a
Latex package on CTAN precisely to turn off subscripts (in text mode,
not math):

As for why more users don't turn it off, it probably has something to
do with the fact that subscripts and underlines in Latex don't
interfere with each other. But in any case, "you can do X in Latex" is
a strange argument for "you should *not* be able to do X in org".

>
>> 2. The current (non-optional, non-configurable) implementation of
>> X-scripts by the parser has specifically identifiable defects,
>> such as the one I mentioned whereby '_foo_, perhaps naturally
>> interpreted as underlining (among other reasons because of how it
>> is highlighted by org) is "really" a subscript.
>
> The current implementation, with its defects, is still configurable.
> `org-export-with-sub-superscripts' works as advertised, AFAIK.

It has several bugs; see the separate thread.

> I'm not really able to change Org syntax without Carsten's consent.
>
> Anyway, I'd like any syntax change to be really discussed. Org has
> a long history of great ideas implemented without any consistent syntax
> in mind. Examples include @, Babel's #+header line with ":prop
> value" (even though every other part of Org used "key=value"),
> configurable emphasis markers and list item bullets, "comment" and
> "quote" keywords (even though Archive is a tag)...
>
> Also, changing Org syntax isn't limited to a mere patch over
> org-element.el. Remember that most of Org doesn't use this library
> (hint).

But eventually it will (assuming the cache implementation proves robust
enough), right? So changes in org-element.el will eventually percolate
to the rest of org, whereas changes elsewhere will wither and dry up.

>
> Back to the topic. As you know, I'm not really open to per-user Org
> syntax. But I will consider any syntactical change that would solve the
> problem at hand.
>
>> I hope you realize why this situation might be frustrating to a user and
>> attempted contributor.
>
> I don't want to be frustrating.
>
> I try to make as clear as possible what I see as important and where
> I would like to head to. I even suggested topics to work on (e.g.
> escaped characters).

I don't think escaped characters help with the problem that it is
presently impossible to represent the following (pseudo-)element
sequence in org syntax:

  (plain-text "'") (underline "foo")
  -> which expands to:
  '_foo_
  -> which parses to:
  (plain-text "'") (subscript "foo") (plain-text "_")

Anyway, what do escaped characters do that entities cannot? You can
write \under{} to escape an underscore today. It's marginally less
convenient to type (and remember) than \_, but this seems like a
more-or-less solved problem, especially given the effort it would take
to teach e.g. org-emph-re about backslash escapes.

>
> There's also optimization to do on cache, if you're motivated.

I think the cache mechanism is an incredible piece of engineering,
indeed. I hope someday to understand the code well enough to
contribute in that area. But it seems hard to get into without many
hours of devoted study, which I (regretfully) don't have time for at
the moment.

Thanks,

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=0001-org-element-use-brackets-to-disambiguate-subscript-u.patch

>From 0d616d65a9b6175c9e7ef225f4baff3d185b5d5b Mon Sep 17 00:00:00 2001
From: Aaron Ecay
Date: Sun, 15 Dec 2013 21:30:27 -0500
Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline

* lisp/org-element.el (org-element--get-next-object-candidates): use
brackets to disambiguate subscript/underline

In an org-syntax string like 1 or 2 below, both subscript and underline
are possible interpretations. This patch uses the presence of brackets
to disambiguate these cases: 1 is interpreted as an underline, whereas
2 is a subscript (followed by plain-text _).

1: '_foo_
2: '_{foo}_

This change does not touch the in-buffer highlighting, which will show
underline or subscript in these cases depending on the
org-use-sub-superscripts variable, which is (close to) deprecated ().
---
 lisp/org-element.el | 64 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 20 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 089ecfb..7082562 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -4424,26 +4424,50 @@ RESTRICTION should be looked after.
 
 Return value is an alist whose CAR is the object type and CDR its
 beginning position."
-  (delq
-   nil
-   (if (eq objects 'initial)
-       ;; When searching for the first time, look for every successor
-       ;; allowed in RESTRICTION.
-       (mapcar
-        (lambda (res)
-          (funcall (intern (format "org-element-%s-successor" res))))
-        restriction)
-     ;; Focus on objects returned during last search.  Keep those
-     ;; still after point.  Search again objects before it.
-     (mapcar
-      (lambda (obj)
-        (if (>= (cdr obj) (point)) obj
-          (let* ((type (car obj))
-                 (succ (or (cdr (assq type org-element-object-successor-alist))
-                           type)))
-            (and succ
-                 (funcall (intern (format "org-element-%s-successor" succ)))))))
-      objects))))
+  (let ((res
+         (delq
+          nil
+          (if (eq objects 'initial)
+              ;; When searching for the first time, look for every successor
+              ;; allowed in RESTRICTION.
+              (mapcar
+               (lambda (res)
+                 (funcall (intern (format "org-element-%s-successor" res))))
+               restriction)
+            ;; Focus on objects returned during last search.  Keep those
+            ;; still after point.  Search again objects before it.
+            (mapcar
+             (lambda (obj)
+               (if (>= (cdr obj) (point)) obj
+                 (let* ((type (car obj))
+                        (succ (or (cdr (assq type org-element-object-successor-alist))
+                                  type)))
+                   (and succ
+                        (funcall (intern (format "org-element-%s-successor" succ)))))))
+             objects)))))
+    ;; To disambiguate between underlines and subscripts, which may
+    ;; both match at the same position, the heuristic is used that if
+    ;; there are brackets, the expression is a subscript, whereas
+    ;; otherwise it is an underline.  Thus:
+    ;; '_foo_ parses as (plain-text "'") (underline "foo")
+    ;; whereas '_{foo}_ parses as
+    ;; (plain-text "'") (subscript "foo") (plain-text "_")
+    (setq res
+          (sort res
+                (lambda (a b)
+                  (if (and (= (cdr a) (cdr b))
+                           (memq 'underline (list (car a) (car b)))
+                           (memq 'subscript (list (car a) (car b))))
+                      (save-excursion
+                        (goto-char (cdr a))
+                        (let ((bracketsp
+                               (org-element-property
+                                :use-brackets-p (org-element-subscript-parser))))
+                          (if (eq (car a) 'underline)
+                              (not bracketsp)
+                            bracketsp)))
+                    (< (cdr a) (cdr b))))))
+    res))
-- 
1.8.5.1


--=-=-=
Content-Type: text/plain


-- 
Aaron Ecay
--=-=-=--
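For readers following the thread, here is a minimal standalone sketch of the tie-breaking rule the patch adds to org-element--get-next-object-candidates. It does not load or call org-element. The demo-* function names are hypothetical, and "uses brackets" is approximated by checking whether the character right after the underscore is an opening brace, rather than by asking org-element-subscript-parser for :use-brackets-p as the patch does. The sketch also assumes, as the patch relies on, that the caller builds whichever candidate comes first among those at the smallest position, so the sort order alone decides between underline and subscript.

```elisp
;; -*- lexical-binding: t -*-
;; Hypothetical standalone sketch of the patch's tie-break rule.
;; Assumption: "uses brackets" is approximated by looking for "{"
;; right after the underscore; the patch itself runs
;; `org-element-subscript-parser' and reads :use-brackets-p.

(defun demo-subscript-brackets-p (text pos)
  "Non-nil if the underscore at POS in TEXT is followed by \"{\"."
  (and (< (1+ pos) (length text))
       (eq (aref text (1+ pos)) ?\{)))

(defun demo-sort-candidates (text candidates)
  "Return CANDIDATES, an alist of (TYPE . POS), sorted by POS.
When an underline and a subscript start at the same POS, the
subscript comes first only if it uses brackets."
  (sort (copy-sequence candidates)     ; do not mutate quoted constants
        (lambda (a b)
          (if (and (= (cdr a) (cdr b))
                   (memq 'underline (list (car a) (car b)))
                   (memq 'subscript (list (car a) (car b))))
              (let ((bracketsp (demo-subscript-brackets-p text (cdr a))))
                (if (eq (car a) 'underline) (not bracketsp) bracketsp))
            (< (cdr a) (cdr b))))))

(defun demo-first-object (text candidates)
  "Return the type of the candidate that would be parsed first.
Assumes the caller takes the first entry at the smallest position."
  (let* ((sorted (demo-sort-candidates text candidates))
         (pos (apply #'min (mapcar #'cdr sorted))))
    (car (rassq pos sorted))))

;; Both candidates start at index 1, the underscore after the quote.
(demo-first-object "'_foo_" '((subscript . 1) (underline . 1)))   ; => underline
(demo-first-object "'_{foo}_" '((underline . 1) (subscript . 1))) ; => subscript
```

Without the sort, rassq would simply return whichever candidate happened to be listed first, which is how '_foo_ can end up as a subscript even though org highlights it as underlining.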