From: Nicolas Goaziou
Subject: Re: [parser] subscripts and underlines interacting badly
Date: Wed, 18 Dec 2013 16:01:35 +0100
Message-ID: <87a9fy8apc.fsf@gmail.com>
In-Reply-To: <87k3f2eje6.fsf@gmail.com> (Aaron Ecay's message of "Wed, 18 Dec 2013 01:57:21 -0500")
To: "emacs-orgmode@gnu.org"

Hello,

Aaron Ecay writes:

> The attached patch implements this. It also updates the fontification
> to match (by calling out to the parser, so there are potential
> performance issues although with the cache it will hopefully not be an
> issue in practice), and notes the new heuristic in the manual. The test
> suite passes.

Thank you. Here are some comments and the usual nitpicks.

> From e2044312b95f8b427ddc662cd1abf10bf4d87b2d Mon Sep 17 00:00:00 2001
> From: Aaron Ecay
> Date: Sun, 15 Dec 2013 21:30:27 -0500
> Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline

You need a capital after colon.

> * lisp/org-element.el (org-element-sub/superscript-successor): use
> brackets to disambiguate subscript/underline

Ditto, and a period at the end of the sentence.

> * lisp/org.el (org-do-emphasis-faces): incorporate the above
> disambiguation

I'd rather not use `org-element-context' in fontification ATM. My plan
is, indeed, to use the parser for fontification, but in a planned-out
way. Doing it too early may be counter-productive.

For now, we can accept some discrepancies between fontification and
syntax (there are many other such occurrences anyway).

> * doc/org.texi: reflect these changes in the manual

See above.

> +When it follows an alphanumeric character, the underscore is always
> +interpreted as a subscript (@pxref{Subscripts and superscripts}), and when it
> +follows whitespace it is always the start of an underline (assuming a
> +matching underscore is found in a proper position further along). However,
> +after a punctuation character (for example the apostrophe), the underscore
> +character can be ambiguous between these two interpretations. Org uses a
> +simple heuristic for these cases: if the character following the underscore
> +is an opening brace @samp{@{} or if no matching underscore is seen in the
> +following text, the underscore is considered to be the start of a subscript.
> +Otherwise, it is the start of underlining.

There is no harm in documenting it, but remember that it's not a feature
of the syntax. Maybe it could be shortened and put into a footnote
instead?

> +    (let (res)
> +      (while (and (not res)
> +                  (re-search-forward org-match-substring-regexp nil t))
> +        (goto-char (match-beginning 0))
> +        (when (or
> +               ;; this subscript uses brackets -> handle as subscript
> +               ;; unconditionally

Comments need to start with a capital and end with a period.

> +               (eq (aref (match-string 3) 0) ?{)
> +               ;; it is not ambiguous with an underline -> handle as
> +               ;; subscript
> +               (not (looking-at-p org-emph-re)))

It should be `org-looking-at-p' for compatibility with other Emacsen.

> +          (setq res (cons (if (string= (match-string 2) "_")
> +                              'subscript
> +                            'superscript)
> +                          (match-beginning 2))))
> +        ;; otherwise -> keep going, and let the underline
> +        ;; parser have it
> +        (goto-char (match-end 0)))

I think

  (save-excursion (goto-char (match-beginning 0)) ...)

is better than

  (goto-char (match-beginning 0)) ... (goto-char (match-end 0)).

> +      res)))

I suggest using

  (catch 'found ... (throw 'found (cons ...)))

instead of the RES variable: the fewer `setq', the better.


Regards,

-- 
Nicolas Goaziou
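
Taken together, the last two suggestions (`save-excursion' around the look-ahead, and `catch'/`throw' in place of RES) could be sketched roughly as follows. This is an untested illustration, not the committed code. The function name is hypothetical, and the sketch assumes Org is loaded so that `org-match-substring-regexp', `org-emph-re' and `org-looking-at-p' are defined, with match groups 2 and 3 meaning what they do in the quoted patch.

```elisp
;; Untested sketch.  `my-next-sub/superscript' is a hypothetical name;
;; the variables and `org-looking-at-p' are assumed to come from the
;; Org version discussed in this thread.
(defun my-next-sub/superscript ()
  "Return (TYPE . POSITION) for the next sub/superscript, or nil."
  (save-excursion
    (catch 'found
      (while (re-search-forward org-match-substring-regexp nil t)
        (when (or
               ;; Brackets are used: accept the match unconditionally.
               (eq (aref (match-string 3) 0) ?{)
               ;; Not ambiguous with an underline: accept the match.
               ;; Point is restored afterwards, so the search resumes
               ;; from the end of the current match.
               (save-excursion
                 (goto-char (match-beginning 0))
                 (not (org-looking-at-p org-emph-re))))
          (throw 'found
                 (cons (if (string= (match-string 2) "_")
                           'subscript
                         'superscript)
                       (match-beginning 2))))))))
```

The sketch relies on the look-ahead leaving the match data untouched, as `looking-at-p' does, so that `(match-string 2)' still refers to the sub/superscript match. When no candidate is accepted, the `while' loop ends and `catch' returns nil.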