emacs-orgmode@gnu.org archives
* [parser] subscripts and underlines interacting badly
@ 2013-12-11  2:30 Aaron Ecay
  2013-12-11  8:22 ` Nicolas Goaziou
  0 siblings, 1 reply; 14+ messages in thread
From: Aaron Ecay @ 2013-12-11  2:30 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org; +Cc: Nicolas Goaziou

[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

Hello,

I have encountered two related misbehaviors in the parser/exporter.

The first manifests if you type the following line into an org-mode
buffer and execute M-: (org-element-context) with point on the ‘f’; the
result is a subscript object, whereas I would have expected an
underline:

'_foo_
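
A minimal way to reproduce this outside an interactive session might look
like the sketch below.  `my/element-type-at' is a hypothetical helper, not
part of Org; it only wraps the `org-element-context' call described above.
The type it returns depends on the Org version in use, so no result is
claimed here.

```elisp
(require 'org)
(require 'org-element)

;; Hypothetical helper (not part of Org).
(defun my/element-type-at (text pos)
  "Return the type of the element or object at POS in TEXT.
TEXT is inserted into a temporary Org buffer; POS is a 1-based
buffer position."
  (with-temp-buffer
    (org-mode)
    (insert text)
    (goto-char pos)
    (org-element-type (org-element-context))))

;; The `f' of '_foo_ sits at buffer position 3.
(my/element-type-at "'_foo_" 3)
```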

I think both possibilities are returned by
org-element--get-next-object-candidates, and subscript “wins” because it
precedes the other in the list.  I’m not sure how this should be
addressed, but maybe Nicolas knows.

I encountered the second issue when trying to hack around the first by
setting org-use-sub-superscripts to '{}.  It seems this variable is not
considered by the parser.  I think the attached patch fixes this issue.

Thanks,

[-- Attachment #2: 0001-org-element-respect-the-org-use-sub-superscripts-var.patch --]
[-- Type: text/x-diff, Size: 1370 bytes --]

From f5f6d10e5d55f68d57abd2f7a947f72cb90081d0 Mon Sep 17 00:00:00 2001
From: Aaron Ecay <aaronecay@gmail.com>
Date: Tue, 10 Dec 2013 21:19:23 -0500
Subject: [PATCH] org-element: respect the `org-use-sub-superscripts' variable

* lisp/org-element.el (org-element-sub/superscript-successor): respect
the `org-use-sub-superscripts' variable
---
 lisp/org-element.el | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 9183a67..d0e6bf9 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3394,11 +3394,15 @@ CONTENTS is the contents of the object."
 
 Return value is a cons cell whose CAR is either `subscript' or
 `superscript' and CDR is beginning position."
-  (save-excursion
-    (unless (bolp) (backward-char))
-    (when (re-search-forward org-match-substring-regexp nil t)
-      (cons (if (string= (match-string 2) "_") 'subscript 'superscript)
-	    (match-beginning 2)))))
+  (when org-use-sub-superscripts
+    (save-excursion
+      (unless (bolp) (backward-char))
+      (when (re-search-forward (if (eq org-use-sub-superscripts '{})
+				   org-match-substring-with-braces-regexp
+				 org-match-substring-regexp)
+			       nil t)
+	(cons (if (string= (match-string 2) "_") 'subscript 'superscript)
+	      (match-beginning 2))))))
 
 
 ;;;; Superscript
-- 
1.8.5.1


[-- Attachment #3: Type: text/plain, Size: 15 bytes --]

-- 
Aaron Ecay

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [parser] subscripts and underlines interacting badly
  2013-12-11  2:30 [parser] subscripts and underlines interacting badly Aaron Ecay
@ 2013-12-11  8:22 ` Nicolas Goaziou
  2013-12-11 18:36   ` Aaron Ecay
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Goaziou @ 2013-12-11  8:22 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org

Hello,

Aaron Ecay <aaronecay@gmail.com> writes:

> I have encountered two related misbehaviors in the parser/exporter.
>
> The first manifests if you type the following line into an org-mode
> buffer and execute M-: (org-element-context) with point on the ‘f’; the
> result is a subscript object, whereas I would have expected an
> underline:
>
> '_foo_
>
> I think both possibilities are returned by
> org-element--get-next-object-candidates, and subscript “wins” because it
> precedes the other in the list.  I’m not sure how this should be
> addressed, but maybe Nicolas knows.

Actually, this is not really a parser problem but a syntax one.
underline and subscript are ambiguous, and therefore ill-defined,
because, in some situations, both can match at the same location.

This is usually not noticeable because, I think, most uses of underline
are preceded by a space (e.g. some _word_), whereas a subscript cannot
be. This is not true in your example.

This was discussed some months ago but, AFAIR, no answer was
found. Note that I suggested a change to superscript/subscript a couple
of weeks ago, but it won't solve the problem at hand. Perhaps it could
be extended to remove the ambiguity for subscript.

> I encountered the second issue when trying to hack around the first by
> setting org-use-sub-superscripts to '{}.  It seems this variable is not
> considered by the parser.  I think the attached patch fixes this
> issue.

Thanks for the patch.

Though, the parser ignores `org-use-sub-superscripts' on purpose. At the
moment `org-use-sub-superscripts' is a display variable only.

This change happened in 8.0. This also explains why
`org-export-with-sub-superscripts' is now a separate value from
`org-use-sub-superscripts'.
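
As a configuration sketch of that split, assuming the brace-only value '{}
mentioned earlier in the thread is wanted for both display and export, the
two variables would be set independently:

```elisp
;; Display only: how sub/superscripts are shown in the buffer.
(setq org-use-sub-superscripts '{})

;; Export only: how sub/superscripts are interpreted by the exporter.
(setq org-export-with-sub-superscripts '{})
```

Neither setting changes what the parser in org-element.el returns.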

The main reason for this change is that I think that customizable
syntax, unlike customizable behaviour, is not a good idea for Org
(e.g. because of portability and simplicity issues).


Regards,

-- 
Nicolas Goaziou


* Re: [parser] subscripts and underlines interacting badly
  2013-12-11  8:22 ` Nicolas Goaziou
@ 2013-12-11 18:36   ` Aaron Ecay
  2013-12-11 20:55     ` Nicolas Goaziou
  0 siblings, 1 reply; 14+ messages in thread
From: Aaron Ecay @ 2013-12-11 18:36 UTC (permalink / raw)
  To: Nicolas Goaziou, emacs-orgmode@gnu.org

Hi Nicolas,

Thanks for your comments.

On 11 December 2013, Nicolas Goaziou wrote:
> Actually, this is not really a parser problem but a syntax one.
> underline and subscript are ambiguous, and therefore ill-defined,
> because, in some situations, both can match at the same location.

I have found one case where both match, but an underline is intended.
Are there any reverse cases, i.e. where both match but a subscript is
intended?  The closest I could come up with would be something like:

The quantities X_1 and X_2 are ....

But I think, at least with default values of
org-emphasis-regexp-components, this cannot be an underline.

So, if there are indeed no such cases, the fix is just to always choose
the underline, when both underline and subscript match at the same
position.

> 
> Thanks for the patch.
> 
> Though, the parser ignores `org-use-sub-superscripts' on purpose. At the
> moment `org-use-sub-superscripts' is a display variable only.
> 
> This change happened in 8.0. This also explains why
> `org-export-with-sub-superscripts' is now a separate value from
> `org-use-sub-superscripts'.
> 
> The main reason for this change is that I think that customizable
> syntax, unlike customizable behaviour, is not a good idea for Org
> (e.g. because of portability and simplicity issues).

I understand your point.  But I think there is a danger in some cases
that the tail of “portability” will wind up wagging the dog of org-mode.
The syntax of org is an abstract mathematical object; the parser is just
one (currently the only, AFAIK) implementation of it.  So, if it proves
necessary, some behavioral aspects can be added to the parser, as long
as it is understood that they are behavioral and not driven by the
abstract syntax (we could add such a comment to my patch, for example).

I think it is advantageous to do so in this case.  In the example I
gave, two core parts of org (display and export) differ in their
interpretation of the same string.  Putting this behavior in the parser
will fix that.  It will also free future elisp code which consumes the
parser’s output* from having to worry about the value of the variables in
question.

Finally, it would allow the re-unification of the export and display
flavors of the use-subscripts variable.  It’s hard to think of a use
case that would want subscripts to be interpreted differently for
display and export.  (Although if someone has such a case, the
unification need not be undertaken: it is purely optional.)

Thanks again,

-- 
Aaron Ecay


* Re: [parser] subscripts and underlines interacting badly
  2013-12-11 18:36   ` Aaron Ecay
@ 2013-12-11 20:55     ` Nicolas Goaziou
  2013-12-12  7:56       ` Aaron Ecay
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Goaziou @ 2013-12-11 20:55 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org

Aaron Ecay <aaronecay@gmail.com> writes:

> I have found one case where both match, but an underline is intended.
> Are there any reverse cases, i.e. where both match but a subscript is
> intended?

I don't know. Perhaps something as convoluted as:

  A'_{a_-1}

But that's not the real problem: whenever we change underline syntax
(e.g. if we implement escaped characters), we will have to start over
again, as more ambiguous cases might arise.

> So, if there are indeed no such cases, the fix is just to always choose
> the underline, when both underline and subscript match at the same
> position.

As a short-term solution, it can be implemented (it's probably just
a matter of reordering successor calls). But in the long run, we really
need to properly define both syntaxes.

> I understand your point.  But I think there is a danger in some cases
> that the tail of “portability” will wind up wagging the dog of org-mode.
> The syntax of org is an abstract mathematical object; the parser is just
> one (currently the only, AFAIK) implementation of it.  So, if it proves
> necessary, some behavioral aspects can be added to the parser, as long
> as it is understood that they are behavioral and not driven by the
> abstract syntax (we could add such a comment to my patch, for
> example).

I'm strongly against behavioral parts in Org syntax (even though the
ship probably has sailed long ago). Org mode is bound to Emacs, but Org
format should be platform independent.

> I think it is advantageous to do so in this case.  In the example I
> gave, two core parts of org (display and export) differ in their
> interpretation of the same string.  Putting this behavior in the parser
> will fix that.  It will also free future elisp code which consumes the
> parser’s output* from having to worry about the value of the variables in
> question.
>
> Finally, it would allow the re-unification of the export and display
> flavors of the use-subscripts variable.  It’s hard to think of a use
> case that would want subscripts to be interpreted differently for
> display and export.  (Although if someone has such a case, the
> unification need not be undertaken: it is purely optional.)

Note that in my post, I said "at the moment". There are two variables
for historical reasons. AFAIC some `org-export-with-*' variables don't
make much sense anyway (`org-export-with-tables' comes to mind, but also
`org-export-with-sub-superscripts').

The real question is: why would we need to disable superscript/subscript
in an Org document? We probably need it because they can get in the way
sometimes. Then, we'd better provide tools to put them out of that way
instead of completely disabling them. Character escaping is one
solution.

Again, I strongly think we should focus on making Org syntax simpler
(and yet powerful) instead of piling up variables to change it on the
fly for occasional needs.


Regards,

-- 
Nicolas Goaziou


* Re: [parser] subscripts and underlines interacting badly
  2013-12-11 20:55     ` Nicolas Goaziou
@ 2013-12-12  7:56       ` Aaron Ecay
  2013-12-12 17:33         ` Nicolas Goaziou
  0 siblings, 1 reply; 14+ messages in thread
From: Aaron Ecay @ 2013-12-12  7:56 UTC (permalink / raw)
  To: Nicolas Goaziou, emacs-orgmode@gnu.org

On 11 December 2013, Nicolas Goaziou wrote:
>
> Aaron Ecay <aaronecay@gmail.com> writes:
>
>> I have found one case where both match, but an underline is intended.
>> Are there any reverse cases, i.e. where both match but a subscript is
>> intended?
>
> I don't know. Perhaps something as convoluted as:
>
>   A'_{a_-1}

Oh, yes.  Very clever.  Unfortunately, it means that a fix, even a
temporary one, cannot be just to prioritize underline over subscript as
I proposed.  =/


> As a short-term solution, it can be implemented (it's probably just
> a matter of reordering successor calls). But in the long run, we really
> need to properly define both syntaxes.

I agree.  Do you think it is possible to solve the problem while
preserving the fact that underscore is used for both subscript and
underline?  It seems very difficult.

When I think about the question, I think probably what is needed is a
representation where object boundaries are delimited by one well-defined
pair of delimiters, like {} in latex or <> in html (well, in html they
delimit tags, but the principle is the same: only one pair).  Then we
don’t have to worry about escape syntax for many characters, or
characters with multiple possible interpretations (or how many lines
org-emph-re is allowed to match across, or ...).

But that is just one idea I have had.  You must have thought about it
more, so maybe you have others.

>
> I'm strongly against behavioral parts in Org syntax (even though the
> ship probably has sailed long ago). Org mode is bound to Emacs, but Org
> format should be platform independent.

Org syntax can be un-configurable even if org-element.el implements a
(configurable) superset of it.  Given that the use-subscript variable
exists (and without taking into account more systemic solutions as
discussed above), I’m arguing that it is cleaner to implement it in
org-element, rather than in two separate places (in the regex-based
old-style parsing code in org.el and in ox.el; there’s also one
reference to the variable in org-table.el(!))

Phrased in other terms, it makes no sense (in the context of Org-Mode,
not platonic Org Syntax) for org-element to insist that "a_b" is a
subscript, if org-use-sub-superscripts =
org-export-with-sub-superscripts = nil.

Thanks,

--
Aaron Ecay


* Re: [parser] subscripts and underlines interacting badly
  2013-12-12  7:56       ` Aaron Ecay
@ 2013-12-12 17:33         ` Nicolas Goaziou
  2013-12-12 19:42           ` Aaron Ecay
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Goaziou @ 2013-12-12 17:33 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org

Hello,

Aaron Ecay <aaronecay@gmail.com> writes:

> I agree.  Do you think it is possible to solve the problem while
> preserving the fact that underscore is used for both subscript and
> underline?  It seems very difficult.

We could give priority to underline when there are no curly brackets,
priority to subscript otherwise. It sounds overly complicated though.
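
A minimal sketch of that rule, assuming both an underline and a subscript
have already been found to match at the same underscore.
`my/underscore-reading' is a hypothetical name, not an Org function, and it
works on a plain string rather than on parser candidates:

```elisp
;; Hypothetical helper (not part of Org): tie-break between underline
;; and subscript for an underscore where both readings are possible.
(defun my/underscore-reading (string pos)
  "Return `underline' or `subscript' for the underscore at POS in STRING.
POS is a zero-based index.  Return nil if there is no underscore at POS."
  (when (and (< pos (length string))
             (eq (aref string pos) ?_))
    ;; A curly bracket right after the underscore selects the subscript
    ;; reading; anything else selects the underline reading.
    (if (and (< (1+ pos) (length string))
             (eq (aref string (1+ pos)) ?{))
        'subscript
      'underline)))

(my/underscore-reading "'_foo_" 1)    ; => underline
(my/underscore-reading "'_{foo}_" 1)  ; => subscript
```

The sketch does not check whether either reading actually matches, so a
case like X_1, where only the subscript can match, is outside its scope.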

> When I think about the question, I think probably what is needed is a
> representation where object boundaries are delimited by one well-defined
> pair of delimiters, like {} in latex or <> in html (well, in html they
> delimit tags, but the principle is the same: only one pair).  Then we
> don’t have to worry about escape syntax for many characters, or
> characters with multiple possible interpretations (or how many lines
> org-emph-re is allowed to match across, or ...).
>
> But that is just one idea I have had.  You must have thought about it
> more, so maybe you have others.
>
>>
>> I'm strongly against behavioral parts in Org syntax (even though the
>> ship probably has sailed long ago). Org mode is bound to Emacs, but Org
>> format should be platform independent.
>
> Org syntax can be un-configurable even if org-element.el implements a
> (configurable) superset of it.

org-element.el is meant to implement _exactly_ Org syntax. Not
a superset of it.

> Given that the use-subscript variable exists (and without taking into
> account more systemic solutions as discussed above), I’m arguing that
> it is cleaner to implement it in org-element, rather than in two
> separate places (in the regex-based old-style parsing code in org.el
> and in ox.el; there’s also one reference to the variable in
> org-table.el(!))
>
> Phrased in other terms, it makes no sense (in the context of Org-Mode,
> not platonic Org Syntax) for org-element to insist that "a_b" is a
> subscript, if org-use-sub-superscripts =
> org-export-with-sub-superscripts = nil.

Again, `org-use-sub-superscripts' is, at the moment, a visual-only
variable. My plan is to move it out, not in.


Regards,

-- 
Nicolas Goaziou


* Re: [parser] subscripts and underlines interacting badly
  2013-12-12 17:33         ` Nicolas Goaziou
@ 2013-12-12 19:42           ` Aaron Ecay
  2013-12-12 20:47             ` Nicolas Goaziou
  0 siblings, 1 reply; 14+ messages in thread
From: Aaron Ecay @ 2013-12-12 19:42 UTC (permalink / raw)
  To: Nicolas Goaziou, emacs-orgmode@gnu.org

Hi Nicolas,

On 12 December 2013, Nicolas Goaziou wrote:

>
> We could give priority to underline when there are no curly brackets,
> priority to subscript otherwise. It sounds overly complicated though.

Your last sentence sounds very close to "don’t do it; I won’t accept
such a patch."  Is that so?


[...]


>
> Again, `org-use-sub-superscripts' is, at the moment, a visual-only
> variable. My plan is to move it out, not in.

Just to be sure I understand:

1. You have a plan to get rid of org-use-sub-superscripts.  You might also
   want to get rid of `org-export-with-sub-superscripts' (depending on how
   one interprets your remark that the variable "do[es]n't make much sense
   anyway").  Also, other parts of org (e.g. the parser) cannot change to
   harmonize with these variables.  This means that these variables are de
   facto deprecated, and org is headed to a future where sub/superscripts
   are non-optional and non-configurable.
2. The current (non-optional, non-configurable) implementation of
   X-scripts by the parser has specifically identifiable defects,
   such as the one I mentioned whereby '_foo_, perhaps naturally
   interpreted as underlining (among other reasons because of how it
   is highlighted by org) is "really" a subscript.
3. These inconsistencies cannot (or ought not) be addressed except by
   some notional change to org syntax, which only you can (ought) make,
   and which you’re not willing to discuss except in negative terms
   ("don’t do it that way").

I hope you realize why this situation might be frustrating to a user and
attempted contributor.

Thanks,
Aaron

PS I guess you might be frustrated too.  You mentioned your previous
proposal about changing the regex which recognized X-scripts.  I read
the thread at the time, and didn’t say anything because I didn’t have a
strong opinion one way or the other; it simply looked like a reasonable,
incremental change and you were getting positive feedback.  I’ve re-read
the thread, and FWIW I think you should install the change, if you have
not done so.  I again don’t have an opinion on the question about
grouping with parentheses which was left hanging at the end of the
thread.  Coming from a latex background, it would never occur to me to
use parentheses to bracket an X-script.  So it would not bother me if
you removed parenthesis-grouping as it seems you want to do.

PPS Also FWIW and again coming from a latex background, I think that
"bare" X-scripts such as a_b are always somewhat suspect.  I would be
happy if org required brackets for X-scripts, always.  I think this
would simplify the parsing problem a lot.  But I don’t know if this
could have support enough to be implemented.

--
Aaron Ecay


* Re: [parser] subscripts and underlines interacting badly
  2013-12-12 19:42           ` Aaron Ecay
@ 2013-12-12 20:47             ` Nicolas Goaziou
  2013-12-16  3:15               ` Aaron Ecay
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Goaziou @ 2013-12-12 20:47 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org

Aaron Ecay <aaronecay@gmail.com> writes:

> On 12 December 2013, Nicolas Goaziou wrote:
>>
>> We could give priority to underline when there are no curly brackets,
>> priority to subscript otherwise. It sounds overly complicated though.
>
> Your last sentence sounds very close to "don’t do it; I won’t accept
> such a patch."  Is that so?

No, it just means that I didn't put much thought into it. It also means
that I would prefer something more natural (and simpler) than such an
ad-hoc rule.

If you work on it and really think it is an improvement over existing
situation, then I don't see why I wouldn't accept it. But I'd rather not
consider it as a definitive answer to the problem (and include it as
a part of a standard Org syntax implementation).

> 1. You have a plan to get rid of org-use-sub-superscripts.  You might also
>    want to get rid of `org-export-with-sub-superscripts' (depending on how
>    one interprets your remark that the variable "do[es]n't make much sense
>    anyway").  Also, other parts of org (e.g. the parser) cannot change to
>    harmonize with these variables.  This means that these variables are de
>    facto deprecated, and org is headed to a future where sub/superscripts
>    are non-optional and non-configurable.

and non-intrusive, too, which isn't the case at the moment.

You cannot get rid of subscript in LaTeX (well, you probably can, but
I guess most users don't). Why could you in Org?

> 2. The current (non-optional, non-configurable) implementation of
>    X-scripts by the parser has specifically identifiable defects,
>    such as the one I mentioned whereby '_foo_, perhaps naturally
>    interpreted as underlining (among other reasons because of how it
>    is highlighted by org) is "really" a subscript.

The current implementation, with its defects, is still configurable.
`org-export-with-sub-superscripts' works as advertised, AFAIK.

> 3. These inconsistencies cannot (or ought not) be addressed except by
>    some notional change to org syntax, which only you can (ought) make,
>    and which you’re not willing to discuss except in negative terms
>    ("don’t do it that way").

I'm not really able to change Org syntax without Carsten's consent.

Anyway, I'd like any syntax change to be really discussed. Org has
a long history of great ideas implemented without any consistent syntax
in mind. Examples include @<tag>, Babel's #+header line with ":prop
value" (even though every other part of Org used "key=value"),
configurable emphasis markers and list item bullets, "comment" and
"quote" keywords (even though Archive is a tag)...

Also, changing Org syntax isn't limited to a mere patch over
org-element.el. Remember that most of Org doesn't use this library
(hint).

Back to the topic. As you know, I'm not really open to per-user Org
syntax. But I will consider any syntactical change that would solve the
problem at hand.

> I hope you realize why this situation might be frustrating to a user and
> attempted contributor.

I don't want to be frustrating.

I try to make as clear as possible what I see as important and where
I would like to head. I even suggested topics to work on (e.g.
escaped characters).

There's also optimization to do on cache, if you're motivated.

> PS I guess you might be frustrated too.  You mentioned your previous
> proposal about changing the regex which recognized X-scripts.  I read
> the thread at the time, and didn’t say anything because I didn’t have a
> strong opinion one way or the other; it simply looked like a reasonable,
> incremental change and you were getting positive feedback.  I’ve re-read
> the thread, and FWIW I think you should install the change, if you have
> not done so.  I again don’t have an opinion on the question about
> grouping with parentheses which was left hanging at the end of the
> thread.  Coming from a latex background, it would never occur to me to
> use parentheses to bracket an X-script.  So it would not bother me if
> you removed parenthesis-grouping as it seems you want to do.

It is not applied. I am waiting for Carsten's green light about
parenthesis-grouping removal.

> PPS Also FWIW and again coming from a latex background, I think that
> "bare" X-scripts such as a_b are always somewhat suspect.  I would be
> happy if org required brackets for X-scripts, always.  I think this
> would simplify the parsing problem a lot.  But I don’t know if this
> could have support enough to be implemented.

You are right, it would simplify parsing. But it is very handy for note
taking. I wouldn't suggest to remove it.


Regards,

-- 
Nicolas Goaziou


* Re: [parser] subscripts and underlines interacting badly
  2013-12-12 20:47             ` Nicolas Goaziou
@ 2013-12-16  3:15               ` Aaron Ecay
  2013-12-16  3:24                 ` [PATCH] quick patch to org-habit todo state keywords Ted Wiles
  2013-12-17 16:57                 ` [parser] subscripts and underlines interacting badly Nicolas Goaziou
  0 siblings, 2 replies; 14+ messages in thread
From: Aaron Ecay @ 2013-12-16  3:15 UTC (permalink / raw)
  To: Nicolas Goaziou, emacs-orgmode@gnu.org

[-- Attachment #1: Type: text/plain, Size: 5135 bytes --]

On 12 December 2013, Nicolas Goaziou wrote:
> No, it just means that I didn't put much thought into it. It also means
> that I would prefer something more natural (and simpler) than such an
> ad-hoc rule.
> 
> If you work on it and really think it is an improvement over existing
> situation, then I don't see why I wouldn't accept it. But I'd rather not
> consider it as a definitive answer to the problem (and include it as
> a part of a standard Org syntax implementation).

I have worked up such a patch.  You are right that it is complicated.  I
think it is important to give people the possibility to represent some
of the currently-impossible sequences while a better syntax is cooked
up, though.

Since the present syntax is inadequate for representing these
sequences, the new syntax will have to break backwards compatibility
somehow in order to fix the problem.  So there’s no long-term harm in
having a short-term kludge that will eventually disappear.

The patch is attached to this email.

> 
>> 1. You have a plan to get rid of org-use-sub-superscripts.  You might also
>> want to get rid of `org-export-with-sub-superscripts' (depending on how
>> one interprets your remark that the variable "do[es]n't make much sense
>> anyway").  Also, other parts of org (e.g. the parser) cannot change to
>> harmonize with these variables.  This means that these variables are de
>> facto deprecated, and org is headed to a future where sub/superscripts
>> are non-optional and non-configurable.
> 
> and non-intrusive, too, which isn't the case at the moment.
> 
> You cannot get rid of subscript in LaTeX (well, you probably can, but
> I guess most users don't). Why could you in Org?

As a point of fact, you can get rid of it in Latex.  There is even a
Latex package on CTAN precisely to turn off subscripts (in text mode,
not math):
<http://www.ctan.org/tex-archive/macros/latex2e/contrib/underscore>

As for why more users don’t turn it off, it probably has something to do
with the fact that subscripts and underlines in Latex don’t interfere
with each other.

But in any case, “you can do X in Latex” is strange as an argument for
“you should *not* be able to do X in org”.

> 
>> 2. The current (non-optional, non-configurable) implementation of
>> X-scripts by the parser has specifically identifiable defects,
>> such as the one I mentioned whereby '_foo_, perhaps naturally
>> interpreted as underlining (among other reasons because of how it
>> is highlighted by org) is "really" a subscript.
> 
> The current implementation, with its defects, is still configurable.
> `org-export-with-sub-superscripts' works as advertised, AFAIK.

It has several bugs; see separate thread.

> I'm not really able to change Org syntax without Carsten's consent.
> 
> Anyway, I'd like any syntax change to be really discussed. Org has
> a long history of great ideas implemented without any consistent syntax
> in mind. Examples include @<tag>, Babel's #+header line with ":prop
> value" (even though every other part of Org used "key=value"),
> configurable emphasis markers and list item bullets, "comment" and
> "quote" keywords (even though Archive is a tag)...
> 
> Also, changing Org syntax isn't limited to a mere patch over
> org-element.el. Remember that most of Org doesn't use this library
> (hint).

But eventually it will (assuming the cache implementation proves robust
enough), right?  So, changes in org-element.el will eventually percolate
to the rest of org, whereas changes elsewhere will wither and dry up.

> 
> Back to the topic. As you know, I'm not really open to per-user Org
> syntax. But I will consider any syntactical change that would solve the
> problem at hand.
> 
>> I hope you realize why this situation might be frustrating to a user and
>> attempted contributor.
> 
> I don't want to be frustrating.
> 
> I try to make as clear as possible what I see as important and where
> I would like to head. I even suggested topics to work on (e.g.
> escaped characters).

I don’t think escaped characters help with the problem that it is
presently impossible to represent the following (pseudo)-element
sequence in org syntax:

(plain-text "'") (underline "foo")

–> which expands to:

'_foo_

–> which parses to:

(plain-text "'") (subscript "foo") (plain-text "_")

Anyway, what do escaped characters do that entities cannot?  You can
write \under{} to escape an underscore today.  It’s marginally less
convenient to type (and remember) than \_, but this seems like a
more-or-less solved problem, especially given the effort it would take
to teach e.g. org-emph-re about backslash escapes.
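
For illustration, in Org source the entity form reads foo\under{}bar.  The
sketch below looks the entity up with `org-entity-get' from
org-entities.el; it assumes the "under" entity is defined in the installed
Org, and the exact contents of the returned entry are not claimed here.

```elisp
(require 'org-entities)

;; Look the entity up by name, without the leading backslash.
;; A non-nil result means Org knows an entity called "under".
(org-entity-get "under")
```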

> 
> There's also optimization to do on cache, if you're motivated.

I think the cache mechanism is an incredible piece of engineering,
indeed.  I hope to someday be able to understand the code well enough to
contribute in that area.  But it seems hard to get into without many
hours of devoted study, which I (regretfully) don’t have time for ATM.

Thanks,

[-- Attachment #2: 0001-org-element-use-brackets-to-disambiguate-subscript-u.patch --]
[-- Type: text/x-diff, Size: 3568 bytes --]

From 0d616d65a9b6175c9e7ef225f4baff3d185b5d5b Mon Sep 17 00:00:00 2001
From: Aaron Ecay <aaronecay@gmail.com>
Date: Sun, 15 Dec 2013 21:30:27 -0500
Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline

* lisp/org-element.el (org-element--get-next-object-candidates):
use brackets to disambiguate subscript/underline

In an org-syntax string like 1 or 2 below, both subscript and
underline are possible interpretations.  This patch uses the presence
of brackets to disambiguate these cases, that is, 1 is interpreted as
an underline whereas 2 is a subscript (followed by plain-text _)

1: '_foo_
2: '_{foo}_

This change does not touch the in-buffer highlighting, which will show
underline or subscript in these cases depending on the
org-use-sub-superscripts variable, which is (close to)
deprecated (<http://mid.gmane.org/87lhzpx0d7.fsf@gmail.com>).
---
 lisp/org-element.el | 64 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 20 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 089ecfb..7082562 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -4424,26 +4424,50 @@ RESTRICTION should be looked after.
 
 Return value is an alist whose CAR is the object type and CDR its
 beginning position."
-  (delq
-   nil
-   (if (eq objects 'initial)
-       ;; When searching for the first time, look for every successor
-       ;; allowed in RESTRICTION.
-       (mapcar
-	(lambda (res)
-	  (funcall (intern (format "org-element-%s-successor" res))))
-	restriction)
-     ;; Focus on objects returned during last search.  Keep those
-     ;; still after point.  Search again objects before it.
-     (mapcar
-      (lambda (obj)
-	(if (>= (cdr obj) (point)) obj
-	  (let* ((type (car obj))
-		 (succ (or (cdr (assq type org-element-object-successor-alist))
-			   type)))
-	    (and succ
-		 (funcall (intern (format "org-element-%s-successor" succ)))))))
-      objects))))
+  (let ((res
+	 (delq
+	  nil
+	  (if (eq objects 'initial)
+	      ;; When searching for the first time, look for every successor
+	      ;; allowed in RESTRICTION.
+	      (mapcar
+	       (lambda (res)
+		 (funcall (intern (format "org-element-%s-successor" res))))
+	       restriction)
+	    ;; Focus on objects returned during last search.  Keep those
+	    ;; still after point.  Search again objects before it.
+	    (mapcar
+	     (lambda (obj)
+	       (if (>= (cdr obj) (point)) obj
+		 (let* ((type (car obj))
+			(succ (or (cdr (assq type org-element-object-successor-alist))
+				  type)))
+		   (and succ
+			(funcall (intern (format "org-element-%s-successor" succ)))))))
+	     objects)))))
+    ;; To disambiguate between underlines and subscripts, which may
+    ;; both match at the same position, the heuristic is used that if
+    ;; there are brackets, the expression is a subscript, whereas
+    ;; otherwise it is an underline.  Thus:
+    ;; '_foo_ parses as (plain-text "'") (underline "foo")
+    ;; whereas '_{foo}_ parses as
+    ;; (plain-text "'") (subscript "foo") (plain-text "_")
+    (setq res
+	  (sort res
+		(lambda (a b)
+		  (if (and (= (cdr a) (cdr b))
+			   (memq 'underline (list (car a) (car b)))
+			   (memq 'subscript (list (car a) (car b))))
+		      (save-excursion
+			(goto-char (cdr a))
+			(let ((bracketsp
+			       (org-element-property
+				:use-brackets-p (org-element-subscript-parser))))
+			  (if (eq (car a) 'underline)
+			      (not bracketsp)
+			    bracketsp)))
+		    (< (cdr a) (cdr b))))))
+    res))
 
 
 \f
-- 
1.8.5.1


[-- Attachment #3: Type: text/plain, Size: 15 bytes --]

-- 
Aaron Ecay

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] quick patch to org-habit todo state keywords
  2013-12-16  3:15               ` Aaron Ecay
@ 2013-12-16  3:24                 ` Ted Wiles
  2013-12-16  4:27                   ` Aaron Ecay
  2013-12-17 16:57                 ` [parser] subscripts and underlines interacting badly Nicolas Goaziou
  1 sibling, 1 reply; 14+ messages in thread
From: Ted Wiles @ 2013-12-16  3:24 UTC (permalink / raw)
  Cc: emacs-orgmode@gnu.org

[-- Attachment #1: Type: text/plain, Size: 868 bytes --]


Took out the hardcoded todo-state

* lisp/org-habit.el 

  The "DONE" keyword was hardcoded in. This is my first patch (ever, to anything).

  TINYCHANGE

1 file changed, 2 insertions(+), 1 deletion(-)
 lisp/org-habit.el |    3 ++-

	Modified   lisp/org-habit.el
diff --git a/lisp/org-habit.el b/lisp/org-habit.el
index eba9037..69d0fa6 100644
--- a/lisp/org-habit.el
+++ b/lisp/org-habit.el
@@ -200,7 +200,8 @@ This list represents a \"habit\" for the rest of this module."
 	     (count 0))
 	(unless reversed (goto-char end))
 	(while (and (< count maxdays)
-		    (funcall search "- State \"DONE\".*\\[\\([^]]+\\)\\]" limit t))
+		    (funcall search (format "- State \"%s\".*\\[\\([^]]+\\)\\]" 
+					    (car (reverse (car org-todo-keywords)))) limit t))
 	  (push (time-to-days
 		 (org-time-string-to-time (match-string-no-properties 1)))
 		closed-dates)




[-- Attachment #2: 0001-Took-out-the-hardcoded-todo-state.patch --]
[-- Type: text/x-diff; name="0001-Took-out-the-hardcoded-todo-state.patch", Size: 907 bytes --]

From 1f486cdb5fffd83cd0fc065e86be4032fc368ee7 Mon Sep 17 00:00:00 2001
From: Theodore Wiles <theodore.wiles@gmail.com>
Date: Sun, 15 Dec 2013 21:56:25 -0500
Subject: [PATCH] Took out the hardcoded todo-state.

---
 lisp/org-habit.el |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org-habit.el b/lisp/org-habit.el
index eba9037..69d0fa6 100644
--- a/lisp/org-habit.el
+++ b/lisp/org-habit.el
@@ -200,7 +200,8 @@ This list represents a \"habit\" for the rest of this module."
 	     (count 0))
 	(unless reversed (goto-char end))
 	(while (and (< count maxdays)
-		    (funcall search "- State \"DONE\".*\\[\\([^]]+\\)\\]" limit t))
+		    (funcall search (format "- State \"%s\".*\\[\\([^]]+\\)\\]" 
+					    (car (reverse (car org-todo-keywords)))) limit t))
 	  (push (time-to-days
 		 (org-time-string-to-time (match-string-no-properties 1)))
 		closed-dates)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] quick patch to org-habit todo state keywords
  2013-12-16  3:24                 ` [PATCH] quick patch to org-habit todo state keywords Ted Wiles
@ 2013-12-16  4:27                   ` Aaron Ecay
  0 siblings, 0 replies; 14+ messages in thread
From: Aaron Ecay @ 2013-12-16  4:27 UTC (permalink / raw)
  To: Ted Wiles; +Cc: emacs-orgmode@gnu.org

Hi Ted,

Welcome to org-mode, and thanks for the patch!

I have one small comment:

On 15 December 2013, Ted Wiles wrote:
> 
> Took out the hardcoded todo-state
> 
> * lisp/org-habit.el 
> 
>   The "DONE" keyword was hardcoded in. This is my first patch (ever, to anything).
> 
>   TINYCHANGE
> 
> 1 file changed, 2 insertions(+), 1 deletion(-)
>  lisp/org-habit.el |    3 ++-
> 
> 	Modified   lisp/org-habit.el
> diff --git a/lisp/org-habit.el b/lisp/org-habit.el
> index eba9037..69d0fa6 100644
> --- a/lisp/org-habit.el
> +++ b/lisp/org-habit.el
> @@ -200,7 +200,8 @@ This list represents a \"habit\" for the rest of this module."
>  	     (count 0))
>  	(unless reversed (goto-char end))
>  	(while (and (< count maxdays)
> -		    (funcall search "- State \"DONE\".*\\[\\([^]]+\\)\\]" limit t))
> +		    (funcall search (format "- State \"%s\".*\\[\\([^]]+\\)\\]" 
> +					    (car (reverse (car org-todo-keywords)))) limit t))

I think this will not be the correct value when the org-todo-keywords
variable contains characters for fast todo selection (cf. the fifth
paragraph of the docstring at C-h v org-todo-keywords).  For example, my
org-todo-keywords is set to:
((sequence "TODO(t)" "NEXT(n)" "|" "DONE(d)")
 (sequence "WAIT(w@/!)" "|" "CANCELLED(c@/!)"))

I think you can get the value you want with:
(concat "\\(" (mapconcat 'regexp-quote org-done-keywords "\\|") "\\)")

The org-done-keywords variable is not documented, but it is set (locally
in each buffer) to the list of done keywords in that buffer by the code
that initializes org mode.  (I found and slightly adapted the above bit
of code from line 6283 of org.el, where some fontification regexps are
calculated.)

Aaron

PS You should make a new message to the list when you send a new patch,
rather than replying to another one.  The replies interfere with the
threading algorithms in some MUAs which parse the References or
In-Reply-To header of the email in order to determine its parent.

-- 
Aaron Ecay

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [parser] subscripts and underlines interacting badly
  2013-12-16  3:15               ` Aaron Ecay
  2013-12-16  3:24                 ` [PATCH] quick patch to org-habit todo state keywords Ted Wiles
@ 2013-12-17 16:57                 ` Nicolas Goaziou
  2013-12-18  6:57                   ` Aaron Ecay
  1 sibling, 1 reply; 14+ messages in thread
From: Nicolas Goaziou @ 2013-12-17 16:57 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org

Hello,

Aaron Ecay <aaronecay@gmail.com> writes:

> Since the present syntax is inadequate for representing these
> sequences, the new syntax will have to break backwards compatibility
> somehow in order to fix the problem.  So there’s no long-term harm in
> having a short-term kludge that will eventually disappear.

OK. Thanks for the patch.

Though, I think you are patching the wrong location. Modifying
`org-element--get-next-object-candidates' is expensive. It would be
better to patch `org-element-sub/superscript-successor' and make it
ignore underline matches with brackets followed by an underscore
character and resume searching.

> But eventually it will (assuming the cache implementation proves robust
> enough), right?  So, changes in org-element.el will eventually percolate
> to the rest of org, whereas changes elsewhere will wither and dry up.

But it will be a slow process, and, meanwhile both org-element and the
rest of Org must be handled.

> I don’t think escaped characters help with the problem that it is
> presently impossible to represent the following (pseudo)-element
> sequence in org syntax:

[...]

You are right, escaped characters cannot help us here.

> Anyway, what do escaped characters do that entities cannot?  

Not much. But they could be used in verbatim context. Also, they are
somehow inconvenient to use, as you noticed. This can be troublesome in
an environment also meant for note-taking.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [parser] subscripts and underlines interacting badly
  2013-12-17 16:57                 ` [parser] subscripts and underlines interacting badly Nicolas Goaziou
@ 2013-12-18  6:57                   ` Aaron Ecay
  2013-12-18 15:01                     ` Nicolas Goaziou
  0 siblings, 1 reply; 14+ messages in thread
From: Aaron Ecay @ 2013-12-18  6:57 UTC (permalink / raw)
  To: Nicolas Goaziou, emacs-orgmode@gnu.org

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

On 17 December 2013, Nicolas Goaziou wrote:
> 
> Hello,
> 
> Aaron Ecay <aaronecay@gmail.com> writes:
> 
>> Since the present syntax is inadequate for representing these
>> sequences, the new syntax will have to break backwards compatibility
>> somehow in order to fix the problem.  So there’s no long-term harm in
>> having a short-term kludge that will eventually disappear.
> 
> OK. Thanks for the patch.
> 
> Though, I think you are patching the wrong location. Modifying
> `org-element--get-next-object-candidates' is expensive. It would be
> better to patch `org-element-sub/superscript-successor' and make it
> ignore underline matches with brackets followed by an underscore
> character and resume searching.

We (perhaps) have to worry about cases like: '_foo bar_ .  Here it’s not
enough to look at the character immediately following the (possible)
subscript, but rather to take into account the full logic of
org-emph-re.

But now that I think about it, this is the only correct way, since what
org-element--get-next-object-candidates sees is limited by the
restriction.

The attached patch implements this.  It also updates the fontification
to match (by calling out to the parser, so there are potential
performance issues although with the cache it will hopefully not be an
issue in practice), and notes the new heuristic in the manual.  The test
suite passes.


[-- Attachment #2: 0001-org-element-use-brackets-to-disambiguate-subscript-u.patch --]
[-- Type: text/x-diff, Size: 5608 bytes --]

From e2044312b95f8b427ddc662cd1abf10bf4d87b2d Mon Sep 17 00:00:00 2001
From: Aaron Ecay <aaronecay@gmail.com>
Date: Sun, 15 Dec 2013 21:30:27 -0500
Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline

* lisp/org-element.el (org-element-sub/superscript-successor): use
brackets to disambiguate subscript/underline
* lisp/org.el (org-do-emphasis-faces): incorporate the above
disambiguation
* doc/org.texi: reflect these changes in the manual

In an org-syntax string like 1 or 2 below, both subscript and
underline are possible interpretations.  This patch uses the presence
of brackets to disambiguate these cases, that is, 1 is interpreted as
an underlined "foo" whereas 2 is subscript "foo" followed by
plain-text "_"

1: '_foo_
2: '_{foo}_

The in-buffer highlighting is updated to match.
---
 doc/org.texi        | 14 ++++++++++++++
 lisp/org-element.el | 22 +++++++++++++++++++---
 lisp/org.el         | 36 ++++++++++++++++++++++--------------
 3 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/doc/org.texi b/doc/org.texi
index b4c4078..3eefe9a 100644
--- a/doc/org.texi
+++ b/doc/org.texi
@@ -9739,6 +9739,17 @@ can tweak @code{org-emphasis-regexp-components}.  Beware that changing one of
 the above variables will no take effect until you reload Org, for which you
 may need to restart Emacs.
 
+When it follows an alphanumeric character, the underscore is always
+interpreted as a subscript (@pxref{Subscripts and superscripts}), and when it
+follows whitespace it is always the start of an underline (assuming a
+matching underscore is found in a proper position further along).  However,
+after a punctuation character (for example the apostrophe), the underscore
+character can be ambiguous between these two interpretations.  Org uses a
+simple heuristic for these cases: if the character following the underscore
+is an opening brace @samp{@{} or if no matching underscore is seen in the
+following text, the underscore is considered to be the start of a subscript.
+Otherwise, it is the start of underlining.
+
 @node Horizontal rules
 @subheading  Horizontal rules
 @cindex horizontal rules, markup rules
@@ -10123,6 +10134,9 @@ In addition to showing entities as UTF-8 characters, this command will also
 format sub- and superscripts in a WYSIWYM way.
 @end table
 
+For discussion of the resolution of ambiguities between the underscore as the
+introducer of a subscript vs.@ underline, see @ref{Emphasis and monospace}.
+
 @node @LaTeX{} fragments
 @subsection @LaTeX{} fragments
 @cindex @LaTeX{} fragments
diff --git a/lisp/org-element.el b/lisp/org-element.el
index 089ecfb..faa1e44 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3408,9 +3408,25 @@ Return value is a cons cell whose CAR is either `subscript' or
 `superscript' and CDR is beginning position."
   (save-excursion
     (unless (bolp) (backward-char))
-    (when (re-search-forward org-match-substring-regexp nil t)
-      (cons (if (string= (match-string 2) "_") 'subscript 'superscript)
-	    (match-beginning 2)))))
+    (let (res)
+      (while (and (not res)
+		  (re-search-forward org-match-substring-regexp nil t))
+	(goto-char (match-beginning 0))
+	(when (or
+	       ;; this subscript uses brackets -> handle as subscript
+	       ;; unconditionally
+	       (eq (aref (match-string 3) 0) ?{)
+	       ;; it is not ambiguous with an underline -> handle as
+	       ;; subscript
+	       (not (looking-at-p org-emph-re)))
+	  (setq res (cons (if (string= (match-string 2) "_")
+			      'subscript
+			    'superscript)
+			  (match-beginning 2))))
+	;; otherwise -> keep going, and let the underline
+	;; parser have it
+	(goto-char (match-end 0)))
+      res)))
 
 
 ;;;; Superscript
diff --git a/lisp/org.el b/lisp/org.el
index eeb0c33..dbddc0b 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -5721,21 +5721,29 @@ This should be called after the variable `org-link-types' has changed."
       (if (not (= (char-after (match-beginning 3))
 		  (char-after (match-beginning 4))))
 	  (progn
-	    (setq rtn t)
 	    (setq a (assoc (match-string 3) org-emphasis-alist))
-	    (font-lock-prepend-text-property (match-beginning 2) (match-end 2)
-					     'face
-					     (nth 1 a))
-	    (and (nth 2 a)
-		 (org-remove-flyspell-overlays-in
-		  (match-beginning 0) (match-end 0)))
-	    (add-text-properties (match-beginning 2) (match-end 2)
-				 '(font-lock-multiline t org-emphasis t))
-	    (when org-hide-emphasis-markers
-	      (add-text-properties (match-end 4) (match-beginning 5)
-				   '(invisible org-link))
-	      (add-text-properties (match-beginning 3) (match-end 3)
-				   '(invisible org-link)))))
+	    ;; Don't do any highlighting if this is an underscore
+	    ;; which is part of a subscript.
+	    (unless (and (string= (car a) "_")
+			 (save-excursion
+			   (goto-char (match-beginning 2))
+			   (save-match-data
+			     (eq (org-element-type (org-element-context))
+				 'subscript))))
+	      (setq rtn t)
+	      (font-lock-prepend-text-property (match-beginning 2) (match-end 2)
+					       'face
+					       (nth 1 a))
+	      (and (nth 2 a)
+		   (org-remove-flyspell-overlays-in
+		    (match-beginning 0) (match-end 0)))
+	      (add-text-properties (match-beginning 2) (match-end 2)
+				   '(font-lock-multiline t org-emphasis t))
+	      (when org-hide-emphasis-markers
+		(add-text-properties (match-end 4) (match-beginning 5)
+				     '(invisible org-link))
+		(add-text-properties (match-beginning 3) (match-end 3)
+				     '(invisible org-link))))))
       (backward-char 1))
     rtn))
 
-- 
1.8.5.1


[-- Attachment #3: Type: text/plain, Size: 25 bytes --]


Thanks,

-- 
Aaron Ecay

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [parser] subscripts and underlines interacting badly
  2013-12-18  6:57                   ` Aaron Ecay
@ 2013-12-18 15:01                     ` Nicolas Goaziou
  0 siblings, 0 replies; 14+ messages in thread
From: Nicolas Goaziou @ 2013-12-18 15:01 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org

Hello,

Aaron Ecay <aaronecay@gmail.com> writes:

> The attached patch implements this.  It also updates the fontification
> to match (by calling out to the parser, so there are potential
> performance issues although with the cache it will hopefully not be an
> issue in practice), and notes the new heuristic in the manual.  The test
> suite passes.

Thank you. Here are some comments and the usual nitpicks.

> From e2044312b95f8b427ddc662cd1abf10bf4d87b2d Mon Sep 17 00:00:00 2001
> From: Aaron Ecay <aaronecay@gmail.com>
> Date: Sun, 15 Dec 2013 21:30:27 -0500
> Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline

You need a capital after colon.

> * lisp/org-element.el (org-element-sub/superscript-successor): use
> brackets to disambiguate subscript/underline

Ditto, and a period at the end of the sentence.

> * lisp/org.el (org-do-emphasis-faces): incorporate the above
> disambiguation

I'd rather not use `org-element-context' in fontification ATM. My plan
is, indeed, to use the parser for fontification, but in a planned-out
way. Doing it too early may be counter-productive.

For now, we can accept some discrepancies between fontification and
syntax (there are many other such occurrences anyway).

> * doc/org.texi: reflect these changes in the manual

See above.

> +When it follows an alphanumeric character, the underscore is always
> +interpreted as a subscript (@pxref{Subscripts and superscripts}), and when it
> +follows whitespace it is always the start of an underline (assuming a
> +matching underscore is found in a proper position further along).  However,
> +after a punctuation character (for example the apostrophe), the underscore
> +character can be ambiguous between these two interpretations.  Org uses a
> +simple heuristic for these cases: if the character following the underscore
> +is an opening brace @samp{@{} or if no matching underscore is seen in the
> +following text, the underscore is considered to be the start of a subscript.
> +Otherwise, it is the start of underlining.

There is no harm in documenting it, but remember that it's not a feature
of the syntax.  Maybe it could be shortened and put into a footnote
instead?

> +    (let (res)
> +      (while (and (not res)
> +		  (re-search-forward org-match-substring-regexp nil t))
> +	(goto-char (match-beginning 0))
> +	(when (or
> +	       ;; this subscript uses brackets -> handle as subscript
> +	       ;; unconditionally

Comments need to start with a capital and end with a period.

> +	       (eq (aref (match-string 3) 0) ?{)
> +	       ;; it is not ambiguous with an underline -> handle as
> +	       ;; subscript
> +	       (not (looking-at-p org-emph-re)))

It should be `org-looking-at-p' for compatibility with other Emacsen.

> +	  (setq res (cons (if (string= (match-string 2) "_")
> +			      'subscript
> +			    'superscript)
> +			  (match-beginning 2))))
> +	;; otherwise -> keep going, and let the underline
> +	;; parser have it
> +	(goto-char (match-end 0)))

I think

  (save-excursion (goto-char (match-beginning 0)) ...) 

is better than 

  (goto-char (match-beginning 0)) ... (goto-char (match-end 0)).

> +      res)))

I suggest using (catch 'found ... (throw 'found (cons ...))) instead
of the RES variable: the less `setq', the better.
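As a rough illustration of how the two suggestions above might combine, here is a standalone sketch.  It is not the code that was committed: the two regexps are simplified, hypothetical stand-ins for `org-match-substring-regexp' and `org-emph-re', the `my/' names are invented, and plain `looking-at-p' is used because Org's compatibility wrapper is not loaded here.

```elisp
;; Sketch only.  Both regexps are simplified stand-ins, not Org's own.
(defconst my/subscript-re
  "\\(\\S-\\)\\(_\\)\\({[^}\n]*}\\|[[:alnum:]]+\\)"
  "Hypothetical: a non-blank char, an underscore, then a subscript body.")

(defconst my/underline-re "\\S-_[^_\n]+_"
  "Hypothetical: a non-blank char followed by text enclosed in underscores.")

(defun my/next-subscript ()
  "Return the position of the next unambiguous subscript marker, or nil."
  (save-excursion
    (catch 'found
      (while (re-search-forward my/subscript-re nil t)
        (when (or
               ;; A body in braces is always treated as a subscript.
               (eq (aref (match-string 3) 0) ?{)
               ;; Without braces, accept it only if the same text could
               ;; not also be read as an underline.  `looking-at-p'
               ;; leaves the match data from the search untouched.
               (save-excursion
                 (goto-char (match-beginning 0))
                 (not (looking-at-p my/underline-re))))
          (throw 'found (match-beginning 2)))))))

(defun my/next-subscript-in (text)
  "Run `my/next-subscript' on TEXT in a temporary buffer."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (my/next-subscript)))

(my/next-subscript-in "'_{foo}_")   ; => 2, braces win
(my/next-subscript-in "'_foo_")     ; => nil, left to the underline parser
(my/next-subscript-in "'_foo_ x_1") ; => 9, search resumes after the skip
```

Because the inner `save-excursion' restores point to the end of the rejected match, the loop continues from there without any explicit (goto-char (match-end 0)).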


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-12-18 15:01 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
2013-12-11  2:30 [parser] subscripts and underlines interacting badly Aaron Ecay
2013-12-11  8:22 ` Nicolas Goaziou
2013-12-11 18:36   ` Aaron Ecay
2013-12-11 20:55     ` Nicolas Goaziou
2013-12-12  7:56       ` Aaron Ecay
2013-12-12 17:33         ` Nicolas Goaziou
2013-12-12 19:42           ` Aaron Ecay
2013-12-12 20:47             ` Nicolas Goaziou
2013-12-16  3:15               ` Aaron Ecay
2013-12-16  3:24                 ` [PATCH] quick patch to org-habit todo state keywords Ted Wiles
2013-12-16  4:27                   ` Aaron Ecay
2013-12-17 16:57                 ` [parser] subscripts and underlines interacting badly Nicolas Goaziou
2013-12-18  6:57                   ` Aaron Ecay
2013-12-18 15:01                     ` Nicolas Goaziou
