* [parser] subscripts and underlines interacting badly @ 2013-12-11 2:30 Aaron Ecay 2013-12-11 8:22 ` Nicolas Goaziou 0 siblings, 1 reply; 14+ messages in thread From: Aaron Ecay @ 2013-12-11 2:30 UTC (permalink / raw) To: emacs-orgmode@gnu.org; +Cc: Nicolas Goaziou [-- Attachment #1: Type: text/plain, Size: 773 bytes --] Hello, I have encountered two related misbehaviors in the parser/exporter. The first manifests if you type the following line into an org-mode buffer and execute M-: (org-element-context) with point on the ‘f’; the result is a subscript object, whereas I would have expected an underline: '_foo_ I think both possibilities are returned by org-element--get-next-object-candidates, and subscript “wins” because it precedes the other in the list. I’m not sure how this should be addressed, but maybe Nicolas knows. I encountered the second issue when trying to hack around the first by setting org-use-sub-superscripts to '{}. It seems this variable is not considered by the parser. I think the attached patch fixes this issue. Thanks,

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-org-element-respect-the-org-use-sub-superscripts-var.patch --]
[-- Type: text/x-diff, Size: 1370 bytes --]

From f5f6d10e5d55f68d57abd2f7a947f72cb90081d0 Mon Sep 17 00:00:00 2001
From: Aaron Ecay <aaronecay@gmail.com>
Date: Tue, 10 Dec 2013 21:19:23 -0500
Subject: [PATCH] org-element: respect the `org-use-sub-superscripts' variable

* lisp/org-element.el (org-element-sub/superscript-successor): respect
the `org-use-sub-superscripts' variable
---
 lisp/org-element.el | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 9183a67..d0e6bf9 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3394,11 +3394,15 @@ CONTENTS is the contents of the object."
 
 Return value is a cons cell whose CAR is either `subscript' or
 `superscript' and CDR is beginning position."
-  (save-excursion
-    (unless (bolp) (backward-char))
-    (when (re-search-forward org-match-substring-regexp nil t)
-      (cons (if (string= (match-string 2) "_") 'subscript 'superscript)
-            (match-beginning 2)))))
+  (when org-use-sub-superscripts
+    (save-excursion
+      (unless (bolp) (backward-char))
+      (when (re-search-forward (if (eq org-use-sub-superscripts '{})
+                                   org-match-substring-with-braces-regexp
+                                 org-match-substring-regexp)
+                               nil t)
+        (cons (if (string= (match-string 2) "_") 'subscript 'superscript)
+              (match-beginning 2))))))
 
 ;;;; Superscript
 
--
1.8.5.1

[-- Attachment #3: Type: text/plain, Size: 15 bytes --]

--
Aaron Ecay

^ permalink raw reply related [flat|nested] 14+ messages in thread
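The check described in the message above (running org-element-context with point on the "f") can also be tried non-interactively. What follows is only a minimal sketch, assuming Org is loadable in the running Emacs; `my/object-type-at' is a hypothetical helper name, not part of Org. The thread reports `subscript' for the Org version of the time; other versions may well answer differently, so no result is shown.

```elisp
(require 'org)
(require 'org-element)

(defun my/object-type-at (text pos)
  "Return the type found by `org-element-context' at POS.
TEXT is inserted into a temporary Org buffer first.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (org-mode)
    (insert text)
    (goto-char pos)
    (org-element-type (org-element-context))))

;; Buffer positions start at 1, so position 3 is on the "f" of '_foo_.
(my/object-type-at "'_foo_" 3)
```

Evaluating the same call with "'_{foo}_" gives a quick way to compare the bare and the braced form under whichever Org version is installed.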
* Re: [parser] subscripts and underlines interacting badly 2013-12-11 2:30 [parser] subscripts and underlines interacting badly Aaron Ecay @ 2013-12-11 8:22 ` Nicolas Goaziou 2013-12-11 18:36 ` Aaron Ecay 0 siblings, 1 reply; 14+ messages in thread From: Nicolas Goaziou @ 2013-12-11 8:22 UTC (permalink / raw) To: emacs-orgmode@gnu.org Hello, Aaron Ecay <aaronecay@gmail.com> writes: > I have encountered two related misbehaviors in the parser/exporter. > > The first manifests if you type the following line into an org-mode > buffer and execute M-: (org-element-context) with point on the ‘f’; the > result is a subscript object, whereas I would have expected an > underline: > > '_foo_ > > I think both possibilities are returned by > org-element--get-next-object-candidates, and subscript “wins” because it > precedes the other in the list. I’m not sure how this should be > addressed, but maybe Nicolas knows. Actually, this is not really a parser problem but a syntax one. Underline and subscript are ambiguous, and therefore ill-defined, because, in some situations, both can match at the same location. This is usually not noticeable because, I think, most uses of underline begin with a space (e.g. some _word_) whereas subscript cannot. This is not true in your example. This was discussed some months ago, but, AFAIR, no answer was found. Note that I suggested a change to superscript/subscript a couple of weeks ago, but it won't solve the problem at hand. Perhaps it could be extended to remove ambiguity for subscript. > I encountered the second issue when trying to hack around the first by > setting org-use-sub-superscripts to '{}. It seems this variable is not > considered by the parser. I think the attached patch fixes this > issue. Thanks for the patch. Though, the parser ignores `org-use-sub-superscripts' on purpose. At the moment `org-use-sub-superscripts' is a display variable only. This change happened in 8.0.
This also explains why `org-export-with-sub-superscripts' is now a separate value from `org-use-sub-superscripts'. The main reason for this change is that I think that customizable syntax, unlike customizable behaviour, is not a good idea for Org (e.g. portability and simplicity issues). Regards, -- Nicolas Goaziou ^ permalink raw reply [flat|nested] 14+ messages in thread
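Since the message above treats the display variable and the export variable as separate settings, a user who wants braces to be required in both places would set both. A minimal configuration sketch, assuming the '{} value is wanted for each (both variables and that value are named in the thread; the comments are an interpretation of the message above):

```elisp
;; Display only: per the message above, this affects how the buffer
;; is shown, not what the parser returns.
(setq org-use-sub-superscripts '{})

;; Export: the separate setting consulted by the exporter.
(setq org-export-with-sub-superscripts '{})
```

The export side can also be set per file with a line such as `#+OPTIONS: ^:{}`.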
* Re: [parser] subscripts and underlines interacting badly 2013-12-11 8:22 ` Nicolas Goaziou @ 2013-12-11 18:36 ` Aaron Ecay 2013-12-11 20:55 ` Nicolas Goaziou 0 siblings, 1 reply; 14+ messages in thread From: Aaron Ecay @ 2013-12-11 18:36 UTC (permalink / raw) To: Nicolas Goaziou, emacs-orgmode@gnu.org Hi Nicolas, Thanks for your comments. 2013ko abenudak 11an, Nicolas Goaziou-ek idatzi zuen: > Actually, this is not really a parser problem but a syntax one. > underline and subscript are ambiguous, and therefore ill-defined, > because, in some situations, both can match at the same location. I have found one case where both match, but an underline is intended. Are there any reverse cases, i.e. where both match but a subscript is intended? The closest I could come up with would be something like: The quantities X_1 and X_2 are .... But I think, at least with default values of org-emphasis-regexp-components, this cannot be an underline. So, if there are indeed no such cases, the fix is just to always choose the underline, when both underline and subscript match at the same position. > > Thanks for the patch. > > Though, the parser ignores `org-use-sub-superscripts' on purpose. At the > moment `org-use-sub-superscripts' is a display variable only. > > This change happened in 8.0. This also explains why > `org-export-with-sub-superscripts' is now a separate value from > `org-use-sub-superscripts'. > > The main reason for this change is that I think that customizable > syntax, unlike to customizable behaviour, is not a good idea for Org > (e.g. portability and simplicity issues). I understand your point. But I think there is a danger in some cases that the tail of “portability” will wind up wagging the dog of org-mode. The syntax of org is an abstract mathematical object; the parser is just one (currently the only, AFAIK) implementation of it. 
So, if it proves necessary, some behavioral aspects can be added to the parser, as long as it is understood that they are behavioral and not driven by the abstract syntax (we could add such a comment to my patch, for example). I think it is advantageous to do so in this case. In the example I gave, two core parts of org (display and export) differ in their interpretation of the same string. Putting this behavior in the parser will fix that. It will also free future elisp code which consumes the parser’s output* from having to worry about the value of the variables in question. Finally, it would allow the re-unification of the export and display flavors of the use-subscripts variable. It’s hard to think of a use case that would want subscripts to be interpreted differently for display and export. (Although if someone has such a case, the unification need not be undertaken: it is purely optional.) Thanks again, -- Aaron Ecay ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [parser] subscripts and underlines interacting badly 2013-12-11 18:36 ` Aaron Ecay @ 2013-12-11 20:55 ` Nicolas Goaziou 2013-12-12 7:56 ` Aaron Ecay 0 siblings, 1 reply; 14+ messages in thread From: Nicolas Goaziou @ 2013-12-11 20:55 UTC (permalink / raw) To: emacs-orgmode@gnu.org Aaron Ecay <aaronecay@gmail.com> writes: > I have found one case where both match, but an underline is intended. > Are there any reverse cases, i.e. where both match but a subscript is > intended? I don't know. Perhaps something as convoluted as: A'_{a_-1} But that's not the real problem: whenever we change underline syntax (e.g. if we implement escaped characters), we will start over again, as more ambiguous cases might spawn. > So, if there are indeed no such cases, the fix is just to always choose > the underline, when both underline and subscript match at the same > position. As a short term solution, it can be implemented (it's probably just a matter of reordering successors calls). But in the long run, we really need to properly define both syntaxes. > I understand your point. But I think there is a danger in some cases > that the tail of “portability” will wind up wagging the dog of org-mode. > The syntax of org is an abstract mathematical object; the parser is just > one (currently the only, AFAIK) implementation of it. So, if it proves > necessary, some behavioral aspects can be added to the parser, as long > as it is understood that they are behavioral and not driven by the > abstract syntax (we could add such a comment to my patch, for > example). I'm strongly against behavioral parts in Org syntax (even though the ship probably has sailed long ago). Org mode is bound to Emacs, but Org format should be platform independent. > I think it is advantageous to do so in this case. In the example I > gave, two core parts of org (display and export) differ in their > interpretation of the same string. Putting this behavior in the parser > will fix that.
It will also free future elisp code which consumes the > parser’s output* from having to worry about the value of the variables in > question. > > Finally, it would allow the re-unification of the export and display > flavors of the use-subscripts variable. It’s hard to think of a use > case that would want subscripts to be interpreted differently for > display and export. (Although if someone has such a case, the > unification need not be undertaken: it is purely optional.) Note that in my post, I said "at the moment". There are two variables for historical reasons. AFAIC some `org-export-with-*' variables don't make much sense anyway (`org-export-with-tables' comes to mind, but also `org-export-with-sub-superscripts'). The real question is: why would we need to disable superscript/subscript in an Org document? We probably need it because they can get in the way sometimes. Then, we'd better provide tools to put them out of that way instead of completely disabling them. Character escaping is one solution. Again, I strongly think we should focus on making Org syntax simpler (and yet powerful) instead of piling up variables to change it on the fly for occasional needs. Regards, -- Nicolas Goaziou ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [parser] subscripts and underlines interacting badly 2013-12-11 20:55 ` Nicolas Goaziou @ 2013-12-12 7:56 ` Aaron Ecay 2013-12-12 17:33 ` Nicolas Goaziou 0 siblings, 1 reply; 14+ messages in thread From: Aaron Ecay @ 2013-12-12 7:56 UTC (permalink / raw) To: Nicolas Goaziou, emacs-orgmode@gnu.org 2013ko abenudak 11an, Nicolas Goaziou-ek idatzi zuen: > > Aaron Ecay <aaronecay@gmail.com> writes: > >> I have found one case where both match, but an underline is intended. >> Are there any reverse cases, i.e. where both match but a subscript is >> intended? > > I don't know. Perhaps something as convoluted as: > > A'_{a_-1} Oh, yes. Very clever. Unfortunately, it means that a fix, even a temporary one, cannot be just to prioritize underline over subscript as I proposed. =/ > As a short term solution, it can be implemented (it's probably just > a matter of reordering successors calls). But in the long run, we really > need to define properly both syntax. I agree. Do you think it is possible to solve the problem while preserving the fact that underscore is used for both subscript and underline? It seems very difficult. When I think about the question, I think probably what is needed is a representation where object boundaries are delimited by one well-defined pair of delimiters, like {} in latex or <> in html (well, in html they delimit tags, but the principle is the same: only one pair). Then we don’t have to worry about escape syntax for many characters, or characters with multiple possible interpretations (or how many lines org-emph-re is allowed to match across, or ...). But that is just one idea I have had. You must have thought about it more, so maybe you have others. > > I'm strongly against behavioral parts in Org syntax (even though the > ship probably has sailed long ago). Org mode is bound to Emacs, but Org > format should be platform independent. Org syntax can be un-configurable even if org-element.el implements a (configurable) superset of it. 
Given that the use-subscript variable exists (and without taking into account more systemic solutions as discussed above), I’m arguing that it is cleaner to implement it in org-element, rather than in two separate places (in the regex-based old-style parsing code in org.el and in ox.el; there’s also one reference to the variable in org-table.el(!)) Phrased in other terms, it makes no sense (in the context of Org-Mode, not platonic Org Syntax) for org-element to insist that "a_b" is a subscript, if org-use-sub-superscripts = org-export-with-sub-superscripts = nil. Thanks, -- Aaron Ecay ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [parser] subscripts and underlines interacting badly 2013-12-12 7:56 ` Aaron Ecay @ 2013-12-12 17:33 ` Nicolas Goaziou 2013-12-12 19:42 ` Aaron Ecay 0 siblings, 1 reply; 14+ messages in thread From: Nicolas Goaziou @ 2013-12-12 17:33 UTC (permalink / raw) To: emacs-orgmode@gnu.org Hello, Aaron Ecay <aaronecay@gmail.com> writes: > I agree. Do you think it is possible to solve the problem while > preserving the fact that underscore is used for both subscript and > underline? It seems very difficult. We could give priority to underline when there are no curly brackets, priority to subscript otherwise. It sounds overly complicated though. > When I think about the question, I think probably what is needed is a > representation where object boundaries are delimited by one well-defined > pair of delimiters, like {} in latex or <> in html (well, in html they > delimit tags, but the principle is the same: only one pair). Then we > don’t have to worry about escape syntax for many characters, or > characters with multiple possible interpretations (or how many lines > org-emph-re is allowed to match across, or ...). > > But that is just one idea I have had. You must have thought about it > more, so maybe you have others. > >> >> I'm strongly against behavioral parts in Org syntax (even though the >> ship probably has sailed long ago). Org mode is bound to Emacs, but Org >> format should be platform independent. > > Org syntax can be un-configurable even if org-element.el implements a > (configurable) superset of it. org-element.el is meant to implement _exactly_ Org syntax. Not a superset of it. 
> Given that the use-subscript variable exists (and without taking into > account more systemic solutions as discussed above), I’m arguing that > it is cleaner to implement it in org-element, rather than in two > separate places (in the regex-based old-style parsing code in org.el > and in ox.el; there’s also one reference to the variable in > org-table.el(!)) > > Phrased in other terms, it makes no sense (in the context of Org-Mode, > not platonic Org Syntax) for org-element to insist that "a_b" is a > subscript, if org-use-sub-superscripts = > org-export-with-sub-superscripts = nil. Again, `org-use-sub-superscripts' is, at the moment, a visual-only variable. My plan is to move it out, not in. Regards, -- Nicolas Goaziou ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [parser] subscripts and underlines interacting badly 2013-12-12 17:33 ` Nicolas Goaziou @ 2013-12-12 19:42 ` Aaron Ecay 2013-12-12 20:47 ` Nicolas Goaziou 0 siblings, 1 reply; 14+ messages in thread From: Aaron Ecay @ 2013-12-12 19:42 UTC (permalink / raw) To: Nicolas Goaziou, emacs-orgmode@gnu.org Hi Nicolas, 2013ko abenudak 12an, Nicolas Goaziou-ek idatzi zuen: > > We could give priority to underline when there are no curly brackets, > priority to subscript otherwise. It sounds overly complicated though. Your last sentence sounds very close to "don’t do it; I won’t accept such a patch." Is that so? [...] > > Again, `org-use-sub-superscripts' is, at the moment, a visual-only > variable. My plan is to move it out, not in. Just to be sure I understand: 1. You have a plan to get rid of org-use-sub-superscripts. You might also want to get rid of `org-export-with-sub-superscripts' (depending on how one interprets your remark that the variable "do[es]n't make much sense anyway"). Also, other parts of org (e.g. the parser) cannot change to harmonize with these variables. This means that these variables are de facto deprecated, and org is headed to a future where sub/superscripts are non-optional and non-configurable. 2. The current (non-optional, non-configurable) implementation of X-scripts by the parser has specifically identifiable defects, such as the one I mentioned whereby '_foo_, perhaps naturally interpreted as underlining (among other reasons because of how it is highlighted by org) is "really" a subscript. 3. These inconsistencies cannot (or ought not) be addressed except by some notional change to org syntax, which only you can (ought) make, and which you’re not willing to discuss except in negative terms ("don’t do it that way"). I hope you realize why this situation might be frustrating to a user and attempted contributor. Thanks, Aaron PS I guess you might be frustrated too. 
You mentioned your previous proposal about changing the regex which recognized X-scripts. I read the thread at the time, and didn’t say anything because I didn’t have a strong opinion one way or the other; it simply looked like a reasonable, incremental change and you were getting positive feedback. I’ve re-read the thread, and FWIW I think you should install the change, if you have not done so. I again don’t have an opinion on the question about grouping with parentheses which was left hanging at the end of the thread. Coming from a latex background, it would never occur to me to use parentheses to bracket an X-script. So it would not bother me if you removed parenthesis-grouping as it seems you want to do. PPS Also FWIW and again coming from a latex background, I think that "bare" X-scripts such as a_b are always somewhat suspect. I would be happy if org required brackets for X-scripts, always. I think this would simplify the parsing problem a lot. But I don’t know if this could have support enough to be implemented. -- Aaron Ecay ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [parser] subscripts and underlines interacting badly 2013-12-12 19:42 ` Aaron Ecay @ 2013-12-12 20:47 ` Nicolas Goaziou 2013-12-16 3:15 ` Aaron Ecay 0 siblings, 1 reply; 14+ messages in thread From: Nicolas Goaziou @ 2013-12-12 20:47 UTC (permalink / raw) To: emacs-orgmode@gnu.org Aaron Ecay <aaronecay@gmail.com> writes: > 2013ko abenudak 12an, Nicolas Goaziou-ek idatzi zuen: >> >> We could give priority to underline when there are no curly brackets, >> priority to subscript otherwise. It sounds overly complicated though. > > Your last sentence sounds very close to "don’t do it; I won’t accept > such a patch." Is that so? No, it just means that I didn't put much thought into it. It also means that I would prefer something more natural (and simpler) than such an ad-hoc rule. If you work on it and really think it is an improvement over existing situation, then I don't see why I wouldn't accept it. But I'd rather not consider it as a definitive answer to the problem (and include it as a part of a standard Org syntax implementation). > 1. You have a plan to get rid of org-use-sub-superscripts. You might also > want to get rid of `org-export-with-sub-superscripts' (depending on how > one interprets your remark that the variable "do[es]n't make much sense > anyway"). Also, other parts of org (e.g. the parser) cannot change to > harmonize with these variables. This means that these variables are de > facto deprecated, and org is headed to a future where sub/superscripts > are non-optional and non-configurable. and non-intrusive, too, which isn't the case at the moment. You cannot get rid of subscript in LaTeX (well, you probably can, but I guess most users don't). Why could you in Org? > 2. 
The current (non-optional, non-configurable) implementation of > X-scripts by the parser has specifically identifiable defects, > such as the one I mentioned whereby '_foo_, perhaps naturally > interpreted as underlining (among other reasons because of how it > is highlighted by org) is "really" a subscript. The current implementation, with its defects, is still configurable. `org-export-with-sub-superscripts' works as advertised, AFAIK. > 3. These inconsistencies cannot (or ought not) be addressed except by > some notional change to org syntax, which only you can (ought) make, > and which you’re not willing to discuss except in negative terms > ("don’t do it that way"). I'm not really able to change Org syntax without Carsten's consent. Anyway, I'd like any syntax change to be really discussed. Org has a long history of great ideas implemented without any consistent syntax in mind. Examples include @<tag>, Babel's #+header line with ":prop value" (even though every other part of Org used "key=value"), configurable emphasis markers and list item bullets, "comment" and "quote" keywords (even though Archive is a tag)... Also, changing Org syntax isn't limited to a mere patch over org-element.el. Remember that most of Org doesn't use this library (hint). Back to the topic. As you know, I'm not really open to per-user Org syntax. But I will consider any syntactical change that would solve the problem at hand. > I hope you realize why this situation might be frustrating to a user and > attempted contributor. I don't want to be frustrating. I try to make as clear as possible what I see as important and where I would like to head to. I even suggested topics to work on (e.g. escaped characters). There's also optimization to do on cache, if you're motivated. > PS I guess you might be frustrated too. You mentioned your previous > proposal about changing the regex which recognized X-scripts. 
I read > the thread at the time, and didn’t say anything because I didn’t have a > strong opinion one way or the other; it simply looked like a reasonable, > incremental change and you were getting positive feedback. I’ve re-read > the thread, and FWIW I think you should install the change, if you have > not done so. I again don’t have an opinion on the question about > grouping with parentheses which was left hanging at the end of the > thread. Coming from a latex background, it would never occur to me to > use parentheses to bracket an X-script. So it would not bother me if > you removed parenthesis-grouping as it seems you want to do. It is not applied. I am waiting for Carsten's green light about parenthesis-grouping removal. > PPS Also FWIW and again coming from a latex background, I think that > "bare" X-scripts such as a_b are always somewhat suspect. I would be > happy if org required brackets for X-scripts, always. I think this > would simplify the parsing problem a lot. But I don’t know if this > could have support enough to be implemented. You are right, it would simplify parsing. But it is very handy for note taking. I wouldn't suggest removing it. Regards, -- Nicolas Goaziou ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [parser] subscripts and underlines interacting badly 2013-12-12 20:47 ` Nicolas Goaziou @ 2013-12-16 3:15 ` Aaron Ecay 2013-12-16 3:24 ` [PATCH] quick patch to org-habit todo state keywords Ted Wiles 2013-12-17 16:57 ` [parser] subscripts and underlines interacting badly Nicolas Goaziou 0 siblings, 2 replies; 14+ messages in thread From: Aaron Ecay @ 2013-12-16 3:15 UTC (permalink / raw) To: Nicolas Goaziou, emacs-orgmode@gnu.org [-- Attachment #1: Type: text/plain, Size: 5135 bytes --] 2013ko abenudak 12an, Nicolas Goaziou-ek idatzi zuen: > No, it just means that I didn't put much thought into it. It also means > that I would prefer something more natural (and simpler) than such an > ad-hoc rule. > > If you work on it and really think it is an improvement over existing > situation, then I don't see why I wouldn't accept it. But I'd rather not > consider it as a definitive answer to the problem (and include it as > a part of a standard Org syntax implementation). I have worked up such a patch. You are right that it is complicated. I think it is important to give people the possibility to represent some of the currently-impossible sequences while a better syntax is cooked up, though. Since the present syntax is inadequate for representing these sequences, the new syntax will have to break backwards compatibility somehow in order to fix the problem. So there’s no long-term harm in having a short-term kludge that will eventually disappear. The patch is attached to this email. > >> 1. You have a plan to get rid of org-use-sub-superscripts. You might also >> want to get rid of `org-export-with-sub-superscripts' (depending on how >> one interprets your remark that the variable "do[es]n't make much sense >> anyway"). Also, other parts of org (e.g. the parser) cannot change to >> harmonize with these variables. This means that these variables are de >> facto deprecated, and org is headed to a future where sub/superscripts >> are non-optional and non-configurable.
> > and non-intrusive, too, which isn't the case at the moment. > > You cannot get rid of subscript in LaTeX (well, you probably can, but > I guess most users don't). Why could you in Org? As a point of fact, you can get rid of it in Latex. There is even a Latex package on CTAN precisely to turn off subscripts (in text mode, not math): <http://www.ctan.org/tex-archive/macros/latex2e/contrib/underscore> As for why more users don’t turn it off, it probably has something to do with the fact that subscripts and underlines in Latex don’t interfere with each other. But in any case “you can do X in Latex” is strange as an argument for “you should *not* be able to do X in org” > >> 2. The current (non-optional, non-configurable) implementation of >> X-scripts by the parser has specifically identifiable defects, >> such as the one I mentioned whereby '_foo_, perhaps naturally >> interpreted as underlining (among other reasons because of how it >> is highlighted by org) is "really" a subscript. > > The current implementation, with its defects, is still configurable. > `org-export-with-sub-superscripts' works as advertised, AFAIK. It has several bugs; see separate thread. > I'm not really able to change Org syntax without Carsten's consent. > > Anyway, I'd like any syntax change to be really discussed. Org has > a long history of great ideas implemented without any consistent syntax > in mind. Examples include @<tag>, Babel's #+header line with ":prop > value" (even though every other part of Org used "key=value"), > configurable emphasis markers and list item bullets, "comment" and > "quote" keywords (even though Archive is a tag)... > > Also, changing Org syntax isn't limited to a mere patch over > org-element.el. Remember that most of Org doesn't use this library > (hint). But eventually it will (assuming the cache implementation proves robust enough), right? 
So, changes in org-element.el will eventually percolate to the rest of org, whereas changes elsewhere will wither and dry up. > > Back to the topic. As you know, I'm not really open to per-user Org > syntax. But I will consider any syntactical change that would solve the > problem at hand. > >> I hope you realize why this situation might be frustrating to a user and >> attempted contributor. > > I don't want to be frustrating. > > I try to make as clear as possible what I see as important and where > I would like to head to. I even suggested topics to work on (e.g. > escaped characters). I don’t think escaped characters help with the problem that it is presently impossible to represent the following (pseudo)-element sequence in org syntax: (plain-text "'") (underline "foo") –> which expands to: '_foo_ –> which parses to: (plain-text "'") (subscript "foo") (plain-text "_") Anyway, what do escaped characters do that entities cannot? You can write \under{} to escape an underscore today. It’s marginally less convenient to type (and remember) than \_, but this seems like a more-or-less solved problem, especially given the effort it would take to teach e.g. org-emph-re about backslash escapes. > > There's also optimization to do on cache, if you're motivated. I think the cache mechanism is an incredible piece of engineering, indeed. I hope to someday be able to understand the code well enough to contribute in that area. But it seems hard to get into without many hours of devoted study, which I (regretfully) don’t have time for ATM. 
Thanks,

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-org-element-use-brackets-to-disambiguate-subscript-u.patch --]
[-- Type: text/x-diff, Size: 3568 bytes --]

From 0d616d65a9b6175c9e7ef225f4baff3d185b5d5b Mon Sep 17 00:00:00 2001
From: Aaron Ecay <aaronecay@gmail.com>
Date: Sun, 15 Dec 2013 21:30:27 -0500
Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline

* lisp/org-element.el (org-element--get-next-object-candidates): use
brackets to disambiguate subscript/underline

In an org-syntax string like 1 or 2 below, both subscript and
underline are possible interpretations.  This patch uses the presence
of brackets to disambiguate these cases, that is, 1 is interpreted as
an underline whereas 2 is a subscript (followed by plain-text _)

1: '_foo_
2: '_{foo}_

This change does not touch the in-buffer highlighting, which will show
underline or subscript in these cases depending on the
org-use-sub-superscripts variable, which is (close to) deprecated
(<http://mid.gmane.org/87lhzpx0d7.fsf@gmail.com>).
---
 lisp/org-element.el | 64 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 20 deletions(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index 089ecfb..7082562 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -4424,26 +4424,50 @@ RESTRICTION should be looked after.
 
 Return value is an alist whose CAR is the object type and CDR its
 beginning position."
-  (delq
-   nil
-   (if (eq objects 'initial)
-       ;; When searching for the first time, look for every successor
-       ;; allowed in RESTRICTION.
-       (mapcar
-        (lambda (res)
-          (funcall (intern (format "org-element-%s-successor" res))))
-        restriction)
-     ;; Focus on objects returned during last search.  Keep those
-     ;; still after point.  Search again objects before it.
-     (mapcar
-      (lambda (obj)
-        (if (>= (cdr obj) (point)) obj
-          (let* ((type (car obj))
-                 (succ (or (cdr (assq type org-element-object-successor-alist))
-                           type)))
-            (and succ
-                 (funcall (intern (format "org-element-%s-successor" succ)))))))
-      objects))))
+  (let ((res
+         (delq
+          nil
+          (if (eq objects 'initial)
+              ;; When searching for the first time, look for every successor
+              ;; allowed in RESTRICTION.
+              (mapcar
+               (lambda (res)
+                 (funcall (intern (format "org-element-%s-successor" res))))
+               restriction)
+            ;; Focus on objects returned during last search.  Keep those
+            ;; still after point.  Search again objects before it.
+            (mapcar
+             (lambda (obj)
+               (if (>= (cdr obj) (point)) obj
+                 (let* ((type (car obj))
+                        (succ (or (cdr (assq type org-element-object-successor-alist))
+                                  type)))
+                   (and succ
+                        (funcall (intern (format "org-element-%s-successor" succ)))))))
+             objects)))))
+    ;; To disambiguate between underlines and subscripts, which may
+    ;; both match at the same position, the heuristic is used that if
+    ;; there are brackets, the expression is a subscript, whereas
+    ;; otherwise it is an underline.  Thus:
+    ;; '_foo_ parses as (plain-text "'") (underline "foo")
+    ;; whereas '_{foo}_ parses as
+    ;; (plain-text "'") (subscript "foo") (plain-text "_")
+    (setq res
+          (sort res
+                (lambda (a b)
+                  (if (and (= (cdr a) (cdr b))
+                           (memq 'underline (list (car a) (car b)))
+                           (memq 'subscript (list (car a) (car b))))
+                      (save-excursion
+                        (goto-char (cdr a))
+                        (let ((bracketsp
+                               (org-element-property
+                                :use-brackets-p (org-element-subscript-parser))))
+                          (if (eq (car a) 'underline)
+                              (not bracketsp)
+                            bracketsp)))
+                    (< (cdr a) (cdr b))))))
+    res))
 
 \f
 
--
1.8.5.1

[-- Attachment #3: Type: text/plain, Size: 15 bytes --]

--
Aaron Ecay

^ permalink raw reply related [flat|nested] 14+ messages in thread
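The tie-breaking rule stated in the patch's commit message (brackets select subscript, otherwise underline) can be illustrated without Org at all. The following is a simplified sketch, not the patch's code: it works on a plain string with 0-based indices instead of a buffer, and `my/brackets-at-p' and `my/pick-candidate' are hypothetical names introduced only for this illustration.

```elisp
(defun my/brackets-at-p (string pos)
  "Return non-nil if STRING has \"_{\" starting at 0-based index POS."
  (and (< (1+ pos) (length string))
       (eq (aref string pos) ?_)
       (eq (aref string (1+ pos)) ?{)))

(defun my/pick-candidate (candidates string)
  "Return the winning (TYPE . POS) cons among CANDIDATES for STRING.
The earliest position wins.  When `underline' and `subscript' tie
at that position, brackets select `subscript', else `underline'."
  (let* ((first-pos (apply #'min (mapcar #'cdr candidates)))
         ;; Keep only the candidates starting at the earliest position.
         (tied (delq nil (mapcar (lambda (c)
                                   (and (= (cdr c) first-pos) c))
                                 candidates))))
    (if (and (assq 'underline tied) (assq 'subscript tied))
        (if (my/brackets-at-p string first-pos)
            (assq 'subscript tied)
          (assq 'underline tied))
      (car tied))))

(my/pick-candidate '((subscript . 1) (underline . 1)) "'_foo_")
;; => (underline . 1)
(my/pick-candidate '((subscript . 1) (underline . 1)) "'_{foo}_")
;; => (subscript . 1)
```

The patch itself reaches the same decision by sorting the candidate list and asking `org-element-subscript-parser' for its :use-brackets-p property; the sketch replaces that call with a direct character check to stay self-contained.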
* [PATCH] quick patch to org-habit todo state keywords 2013-12-16 3:15 ` Aaron Ecay @ 2013-12-16 3:24 ` Ted Wiles 2013-12-16 4:27 ` Aaron Ecay 2013-12-17 16:57 ` [parser] subscripts and underlines interacting badly Nicolas Goaziou 1 sibling, 1 reply; 14+ messages in thread From: Ted Wiles @ 2013-12-16 3:24 UTC (permalink / raw) Cc: emacs-orgmode@gnu.org [-- Attachment #1: Type: text/plain, Size: 868 bytes --] Took out the hardcoded todo-state * lisp/org-habit.el The "DONE" keyword was hardcoded in. This is my first patch (ever, to anything). TINYCHANGE 1 file changed, 2 insertions(+), 1 deletion(-) lisp/org-habit.el | 3 ++- Modified lisp/org-habit.el diff --git a/lisp/org-habit.el b/lisp/org-habit.el index eba9037..69d0fa6 100644 --- a/lisp/org-habit.el +++ b/lisp/org-habit.el @@ -200,7 +200,8 @@ This list represents a \"habit\" for the rest of this module." (count 0)) (unless reversed (goto-char end)) (while (and (< count maxdays) - (funcall search "- State \"DONE\".*\\[\\([^]]+\\)\\]" limit t)) + (funcall search (format "- State \"%s\".*\\[\\([^]]+\\)\\]" + (car (reverse (car org-todo-keywords)))) limit t)) (push (time-to-days (org-time-string-to-time (match-string-no-properties 1))) closed-dates) [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Took-out-the-hardcoded-todo-state.patch --] [-- Type: text/x-diff; name="0001-Took-out-the-hardcoded-todo-state.patch", Size: 907 bytes --] From 1f486cdb5fffd83cd0fc065e86be4032fc368ee7 Mon Sep 17 00:00:00 2001 From: Theodore Wiles <theodore.wiles@gmail.com> Date: Sun, 15 Dec 2013 21:56:25 -0500 Subject: [PATCH] Took out the hardcoded todo-state. --- lisp/org-habit.el | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lisp/org-habit.el b/lisp/org-habit.el index eba9037..69d0fa6 100644 --- a/lisp/org-habit.el +++ b/lisp/org-habit.el @@ -200,7 +200,8 @@ This list represents a \"habit\" for the rest of this module." 
(count 0)) (unless reversed (goto-char end)) (while (and (< count maxdays) - (funcall search "- State \"DONE\".*\\[\\([^]]+\\)\\]" limit t)) + (funcall search (format "- State \"%s\".*\\[\\([^]]+\\)\\]" + (car (reverse (car org-todo-keywords)))) limit t)) (push (time-to-days (org-time-string-to-time (match-string-no-properties 1))) closed-dates) -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] quick patch to org-habit todo state keywords 2013-12-16 3:24 ` [PATCH] quick patch to org-habit todo state keywords Ted Wiles @ 2013-12-16 4:27 ` Aaron Ecay 0 siblings, 0 replies; 14+ messages in thread From: Aaron Ecay @ 2013-12-16 4:27 UTC (permalink / raw) To: Ted Wiles; +Cc: emacs-orgmode@gnu.org Hi Ted, Welcome to org-mode, and thanks for the patch! I have one small comment: On 15 December 2013, Ted Wiles wrote: > > Took out the hardcoded todo-state > > * lisp/org-habit.el > > The "DONE" keyword was hardcoded in. This is my first patch (ever, to anything). > > TINYCHANGE > > 1 file changed, 2 insertions(+), 1 deletion(-) > lisp/org-habit.el | 3 ++- > > Modified lisp/org-habit.el > diff --git a/lisp/org-habit.el b/lisp/org-habit.el > index eba9037..69d0fa6 100644 > --- a/lisp/org-habit.el > +++ b/lisp/org-habit.el > @@ -200,7 +200,8 @@ This list represents a \"habit\" for the rest of this module." > (count 0)) > (unless reversed (goto-char end)) > (while (and (< count maxdays) > - (funcall search "- State \"DONE\".*\\[\\([^]]+\\)\\]" limit t)) > + (funcall search (format "- State \"%s\".*\\[\\([^]]+\\)\\]" > + (car (reverse (car org-todo-keywords)))) limit t)) I think this will not be the correct value when the org-todo-keywords variable contains characters for fast todo selection (cf. the fifth paragraph of the docstring at C-h v org-todo-keywords). For example, my org-todo-keywords is set to: ((sequence "TODO(t)" "NEXT(n)" "|" "DONE(d)") (sequence "WAIT(w@/!)" "|" "CANCELLED(c@/!)")) I think you can get the value you want with: (concat "\\(" (mapconcat 'regexp-quote org-done-keywords "\\|") "\\)") The org-done-keywords variable is not documented, but it is set (locally in each buffer) to the list of done keywords in that buffer by the code that initializes org mode. (I found and slightly adapted the above bit of code from line 6283 of org.el, where some fontification regexps are calculated.)
Aaron PS You should make a new message to the list when you send a new patch, rather than replying to another one. The replies interfere with the threading algorithms in some MUAs which parse the References or In-Reply-To header of the email in order to determine its parent. -- Aaron Ecay ^ permalink raw reply [flat|nested] 14+ messages in thread
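A minimal sketch of the point made in the reply above, not the code that was eventually committed. The names prefixed with my/ are hypothetical, and the keyword list is passed in as an argument so the snippet runs without Org loaded; in org-habit.el the list would presumably come from the buffer-local org-done-keywords. One deliberate difference from the expression quoted in the reply: a shy group \(?: ... \) is used instead of a capturing group, on the assumption that the surrounding code keeps reading the timestamp with (match-string-no-properties 1), as it does in the diff.

```elisp
;; Hypothetical stand-in for the `org-todo-keywords' value quoted in the reply.
(defvar my/todo-keywords
  '((sequence "TODO(t)" "NEXT(n)" "|" "DONE(d)")
    (sequence "WAIT(w@/!)" "|" "CANCELLED(c@/!)")))

(defun my/last-keyword-raw (todo-keywords)
  "Return what the expression from the patch yields for TODO-KEYWORDS.
The fast-selection suffix is still attached to the keyword."
  (car (reverse (car todo-keywords))))

(defun my/habit-done-state-regexp (done-keywords)
  "Return a regexp matching a state-change log line for any of DONE-KEYWORDS.
The keyword alternation is a shy group, so the bracketed timestamp
remains match group 1."
  (concat "- State \"\\(?:"
          (mapconcat #'regexp-quote done-keywords "\\|")
          "\\)\".*\\[\\([^]]+\\)\\]"))

(defun my/habit-log-timestamp (line done-keywords)
  "Return the timestamp text of a done-state log LINE, or nil if none matches."
  (when (string-match (my/habit-done-state-regexp done-keywords) line)
    (match-string 1 line)))

;; Usage, with a hypothetical list of done keywords:
;; (my/last-keyword-raw my/todo-keywords)
;; (my/habit-log-timestamp
;;  "- State \"CANCELLED\"  from \"WAIT\"  [2013-12-15 Sun 21:56]"
;;  '("DONE" "CANCELLED"))
```

The first call shows why the patched expression cannot be dropped into the regexp as is: it returns the entry with its selection key, which never appears in a log line. The second shows that a regexp built from the plain done keywords matches log lines for every done state, not only the last keyword of the first sequence.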
* Re: [parser] subscripts and underlines interacting badly 2013-12-16 3:15 ` Aaron Ecay 2013-12-16 3:24 ` [PATCH] quick patch to org-habit todo state keywords Ted Wiles @ 2013-12-17 16:57 ` Nicolas Goaziou 2013-12-18 6:57 ` Aaron Ecay 1 sibling, 1 reply; 14+ messages in thread From: Nicolas Goaziou @ 2013-12-17 16:57 UTC (permalink / raw) To: emacs-orgmode@gnu.org Hello, Aaron Ecay <aaronecay@gmail.com> writes: > Since the present syntax is inadequate for representing these > sequences, the new syntax will have to break backwards compatibility > somehow in order to fix the problem. So there’s no long-term harm in > having a short-term kludge that will eventually disappear. OK. Thanks for the patch. Though, I think you are patching the wrong location. Modifying `org-element--get-next-object-candidates' is expensive. It would be better to patch `org-element-sub/superscript-successor' and make it ignore underline matches with brackets followed by an underscore character and resume searching. > But eventually it will (assuming the cache implementation proves robust > enough), right? So, changes in org-element.el will eventually percolate > to the rest of org, whereas changes elsewhere will wither and dry up. But it will be a slow process, and, meanwhile both org-element and the rest of Org must be handled. > I don’t think escaped characters help with the problem that it is > presently impossible to represent the following (pseudo)-element > sequence in org syntax: [...] You are right, escaped characters cannot help us here. > Anyway, what do escaped characters do that entities cannot? Not much. But they could be used in verbatim context. Also, they are somewhat inconvenient to use, as you noticed. This can be troublesome in an environment also meant for note-taking. Regards, -- Nicolas Goaziou ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [parser] subscripts and underlines interacting badly 2013-12-17 16:57 ` [parser] subscripts and underlines interacting badly Nicolas Goaziou @ 2013-12-18 6:57 ` Aaron Ecay 2013-12-18 15:01 ` Nicolas Goaziou 0 siblings, 1 reply; 14+ messages in thread From: Aaron Ecay @ 2013-12-18 6:57 UTC (permalink / raw) To: Nicolas Goaziou, emacs-orgmode@gnu.org [-- Attachment #1: Type: text/plain, Size: 1427 bytes --] On 17 December 2013, Nicolas Goaziou wrote: > > Hello, > > Aaron Ecay <aaronecay@gmail.com> writes: > >> Since the present syntax is inadequate for representing these >> sequences, the new syntax will have to break backwards compatibility >> somehow in order to fix the problem. So there’s no long-term harm in >> having a short-term kludge that will eventually disappear. > > OK. Thanks for the patch. > > Though, I think you are patching the wrong location. Modifying > `org-element--get-next-object-candidates' is expensive. It would be > better to patch `org-element-sub/superscript-successor' and make it > ignore underline matches with brackets followed by an underscore > character and resume searching. We (perhaps) have to worry about cases like: '_foo bar_ . Here it’s not enough to look at the character immediately following the (possible) subscript, but rather to take into account the full logic of org-emph-re. But now that I think about it, this is the only correct way, since what org-element--get-next-object-candidates sees is limited by the restriction. The attached patch implements this. It also updates the fontification to match (by calling out to the parser, so there are potential performance issues although with the cache it will hopefully not be an issue in practice), and notes the new heuristic in the manual. The test suite passes. 
[-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-org-element-use-brackets-to-disambiguate-subscript-u.patch --] [-- Type: text/x-diff, Size: 5608 bytes --] From e2044312b95f8b427ddc662cd1abf10bf4d87b2d Mon Sep 17 00:00:00 2001 From: Aaron Ecay <aaronecay@gmail.com> Date: Sun, 15 Dec 2013 21:30:27 -0500 Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline * lisp/org-element.el (org-element-sub/superscript-successor): use brackets to disambiguate subscript/underline * lisp/org.el (org-do-emphasis-faces): incorporate the above disambiguation * doc/org.texi: reflect these changes in the manual In an org-syntax string like 1 or 2 below, both subscript and underline are possible interpretations. This patch uses the presence of brackets to disambiguate these cases, that is, 1 is interpreted as an underlined "foo" whereas 2 is subscript "foo" followed by plain-text "_" 1: '_foo_ 2: '_{foo}_ This the in-buffer highlighting is updated to match. --- doc/org.texi | 14 ++++++++++++++ lisp/org-element.el | 22 +++++++++++++++++++--- lisp/org.el | 36 ++++++++++++++++++++++-------------- 3 files changed, 55 insertions(+), 17 deletions(-) diff --git a/doc/org.texi b/doc/org.texi index b4c4078..3eefe9a 100644 --- a/doc/org.texi +++ b/doc/org.texi @@ -9739,6 +9739,17 @@ can tweak @code{org-emphasis-regexp-components}. Beware that changing one of the above variables will no take effect until you reload Org, for which you may need to restart Emacs. +When it follows an alphanumeric character, the underscore is always +interpreted as a subscript (@pxref{Subscripts and superscripts}), and when it +follows whitespace it is always the start of an underline (assuming a +matching underscore is found in a proper position further along). However, +after a punctuation character (for example the apostrophe), the underscore +character can be ambiguous between these two interpretations. 
Org uses a +simple heuristic for these cases: if the character following the underscore +is an opening brace @samp{@{} or if no matching underscore is seen in the +following text, the underscore is considered to be the start of a subscript. +Otherwise, it is the start of underlining. + @node Horizontal rules @subheading Horizontal rules @cindex horizontal rules, markup rules @@ -10123,6 +10134,9 @@ In addition to showing entities as UTF-8 characters, this command will also format sub- and superscripts in a WYSIWYM way. @end table +For discussion of the resolution of ambiguities between the underscore as the +introducer of a subscript vs.@ underline, see @ref{Emphasis and monospace}. + @node @LaTeX{} fragments @subsection @LaTeX{} fragments @cindex @LaTeX{} fragments diff --git a/lisp/org-element.el b/lisp/org-element.el index 089ecfb..faa1e44 100644 --- a/lisp/org-element.el +++ b/lisp/org-element.el @@ -3408,9 +3408,25 @@ Return value is a cons cell whose CAR is either `subscript' or `superscript' and CDR is beginning position." 
(save-excursion (unless (bolp) (backward-char)) - (when (re-search-forward org-match-substring-regexp nil t) - (cons (if (string= (match-string 2) "_") 'subscript 'superscript) - (match-beginning 2))))) + (let (res) + (while (and (not res) + (re-search-forward org-match-substring-regexp nil t)) + (goto-char (match-beginning 0)) + (when (or + ;; this subscript uses brackets -> handle as subscript + ;; unconditionally + (eq (aref (match-string 3) 0) ?{) + ;; it is not ambiguous with an underline -> handle as + ;; subscript + (not (looking-at-p org-emph-re))) + (setq res (cons (if (string= (match-string 2) "_") + 'subscript + 'superscript) + (match-beginning 2)))) + ;; otherwise -> keep going, and let the underline + ;; parser have it + (goto-char (match-end 0))) + res))) ;;;; Superscript diff --git a/lisp/org.el b/lisp/org.el index eeb0c33..dbddc0b 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -5721,21 +5721,29 @@ This should be called after the variable `org-link-types' has changed." (if (not (= (char-after (match-beginning 3)) (char-after (match-beginning 4)))) (progn - (setq rtn t) (setq a (assoc (match-string 3) org-emphasis-alist)) - (font-lock-prepend-text-property (match-beginning 2) (match-end 2) - 'face - (nth 1 a)) - (and (nth 2 a) - (org-remove-flyspell-overlays-in - (match-beginning 0) (match-end 0))) - (add-text-properties (match-beginning 2) (match-end 2) - '(font-lock-multiline t org-emphasis t)) - (when org-hide-emphasis-markers - (add-text-properties (match-end 4) (match-beginning 5) - '(invisible org-link)) - (add-text-properties (match-beginning 3) (match-end 3) - '(invisible org-link))))) + ;; Don't do any highlighting if this is an underscore + ;; which is part of a subscript. 
+ (unless (and (string= (car a) "_") + (save-excursion + (goto-char (match-beginning 2)) + (save-match-data + (eq (org-element-type (org-element-context)) + 'subscript)))) + (setq rtn t) + (font-lock-prepend-text-property (match-beginning 2) (match-end 2) + 'face + (nth 1 a)) + (and (nth 2 a) + (org-remove-flyspell-overlays-in + (match-beginning 0) (match-end 0))) + (add-text-properties (match-beginning 2) (match-end 2) + '(font-lock-multiline t org-emphasis t)) + (when org-hide-emphasis-markers + (add-text-properties (match-end 4) (match-beginning 5) + '(invisible org-link)) + (add-text-properties (match-beginning 3) (match-end 3) + '(invisible org-link)))))) (backward-char 1)) rtn)) -- 1.8.5.1 [-- Attachment #3: Type: text/plain, Size: 25 bytes --] Thanks, -- Aaron Ecay ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [parser] subscripts and underlines interacting badly 2013-12-18 6:57 ` Aaron Ecay @ 2013-12-18 15:01 ` Nicolas Goaziou 0 siblings, 0 replies; 14+ messages in thread From: Nicolas Goaziou @ 2013-12-18 15:01 UTC (permalink / raw) To: emacs-orgmode@gnu.org Hello, Aaron Ecay <aaronecay@gmail.com> writes: > The attached patch implements this. It also updates the fontification > to match (by calling out to the parser, so there are potential > performance issues although with the cache it will hopefully not be an > issue in practice), and notes the new heuristic in the manual. The test > suite passes. Thank you. Here are some comments and the usual nitpicks. > From e2044312b95f8b427ddc662cd1abf10bf4d87b2d Mon Sep 17 00:00:00 2001 > From: Aaron Ecay <aaronecay@gmail.com> > Date: Sun, 15 Dec 2013 21:30:27 -0500 > Subject: [PATCH] org-element: use brackets to disambiguate subscript/underline You need a capital after colon. > * lisp/org-element.el (org-element-sub/superscript-successor): use > brackets to disambiguate subscript/underline Ditto, and a period at the end of the sentence. > * lisp/org.el (org-do-emphasis-faces): incorporate the above > disambiguation I'd rather not use `org-element-context' in fontification ATM. My plan is, indeed, to use the parser for fontification, but in a planned-out way. Doing it too early may be counter-productive. For now, we can accept some discrepancies between fontification and syntax (there are many other such occurrences anyway). > * doc/org.texi: reflect these changes in the manual See above. > +When it follows an alphanumeric character, the underscore is always > +interpreted as a subscript (@pxref{Subscripts and superscripts}), and when it > +follows whitespace it is always the start of an underline (assuming a > +matching underscore is found in a proper position further along). 
However, > +after a punctuation character (for example the apostrophe), the underscore > +character can be ambiguous between these two interpretations. Org uses a > +simple heuristic for these cases: if the character following the underscore > +is an opening brace @samp{@{} or if no matching underscore is seen in the > +following text, the underscore is considered to be the start of a subscript. > +Otherwise, it is the start of underlining. There is no harm in documenting it, but remember that it's not a feature of the syntax. Maybe it could be shortened and put into a footnote instead? > + (let (res) > + (while (and (not res) > + (re-search-forward org-match-substring-regexp nil t)) > + (goto-char (match-beginning 0)) > + (when (or > + ;; this subscript uses brackets -> handle as subscript > + ;; unconditionally Comments need to start with a capital and end with a period. > + (eq (aref (match-string 3) 0) ?{) > + ;; it is not ambiguous with an underline -> handle as > + ;; subscript > + (not (looking-at-p org-emph-re))) It should be `org-looking-at-p' for compatibility with other Emacsen. > + (setq res (cons (if (string= (match-string 2) "_") > + 'subscript > + 'superscript) > + (match-beginning 2)))) > + ;; otherwise -> keep going, and let the underline > + ;; parser have it > + (goto-char (match-end 0))) I think (save-excursion (goto-char (match-beginning 0)) ...) is better than (goto-char (match-beginning 0)) ... (goto-char (match-end 0)). > + res))) I suggest to use (catch 'found ... (throw 'found (cons ...))) instead of RES variable: the less `setq', the better. Regards, -- Nicolas Goaziou ^ permalink raw reply [flat|nested] 14+ messages in thread
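A rough sketch of the control flow suggested in the review above: catch/throw in place of the RES variable, and save-excursion around the ambiguity test in place of the paired goto-char calls. It only illustrates the shape of the loop and is not the revised patch. The two regexps are simplified, hypothetical stand-ins for org-match-substring-regexp and org-emph-re, so the group numbers differ from the patch, and the built-in looking-at-p is used where the review asks for org-looking-at-p.

```elisp
;; Simplified, hypothetical stand-ins for Org's real regexps.
(defconst my/sub-re "\\S-\\([_^]\\)\\({[^}\n]*}\\|[[:alnum:]]+\\)"
  "Script matcher: group 1 is the marker, group 2 the body.")

(defconst my/underline-re "\\S-_[^_\n]*[^_ \t\n]_"
  "Underline matcher, anchored one character before the opening underscore.")

(defun my/next-sub-or-superscript ()
  "Return (TYPE . POSITION) for the next script object after point, or nil."
  (save-excursion
    (unless (bolp) (backward-char))
    (catch 'found
      (while (re-search-forward my/sub-re nil t)
        (when (or
               ;; Braces are present: handle as a script unconditionally.
               (eq (aref (match-string 2) 0) ?{)
               ;; Not ambiguous with an underline: handle as a script.
               ;; `looking-at-p' leaves the match data untouched, and
               ;; `save-excursion' restores point to the end of the match.
               (save-excursion
                 (goto-char (match-beginning 0))
                 (not (looking-at-p my/underline-re))))
          (throw 'found
                 (cons (if (string= (match-string 1) "_")
                           'subscript
                         'superscript)
                       (match-beginning 1))))))))

(defun my/scan-string (text)
  "Run `my/next-sub-or-superscript' at the start of a buffer holding TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (my/next-sub-or-superscript)))

;; Usage:
;; (my/scan-string "'_foo_")    ; left to the underline parser
;; (my/scan-string "'_{foo}_")  ; braces force a subscript
```

When the test fails, nothing is thrown and the while loop simply resumes searching from the end of the rejected match, which is the "keep going" branch of the patch. If no candidate survives, the loop ends and catch returns nil.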
end of thread, other threads:[~2013-12-18 15:01 UTC | newest]

Thread overview: 14+ messages
2013-12-11 2:30 [parser] subscripts and underlines interacting badly Aaron Ecay
2013-12-11 8:22 ` Nicolas Goaziou
2013-12-11 18:36 ` Aaron Ecay
2013-12-11 20:55 ` Nicolas Goaziou
2013-12-12 7:56 ` Aaron Ecay
2013-12-12 17:33 ` Nicolas Goaziou
2013-12-12 19:42 ` Aaron Ecay
2013-12-12 20:47 ` Nicolas Goaziou
2013-12-16 3:15 ` Aaron Ecay
2013-12-16 3:24 ` [PATCH] quick patch to org-habit todo state keywords Ted Wiles
2013-12-16 4:27 ` Aaron Ecay
2013-12-17 16:57 ` [parser] subscripts and underlines interacting badly Nicolas Goaziou
2013-12-18 6:57 ` Aaron Ecay
2013-12-18 15:01 ` Nicolas Goaziou