Hello,

worg/dev/org-syntax.org is not clear about whether the title of a
headline is allowed to be empty. I occasionally (would like to) use
headlines with empty titles, e.g., when a tag in some headline already
provides all necessary information, and an additional title would
duplicate that.

Are empty headline titles meant to be supported? If yes, then there are
at least two bugs. On

#+BEGIN_EXAMPLE
,* A
,* :B:
,* C
#+END_EXAMPLE

org-element-parse-buffer and org-match-sparse-tree make the second
headline have title ":B:" and no tags.

Best wishes
Sebastian

Hello,
Sebastian Miele <sebastian.miele@gmail.com> writes:
> worg/dev/org-syntax.org is not clear about whether the title of a
> headline is allowed to be empty. I occasionally (would like to) use
> headlines with empty titles, e.g., when a tag in some headline already
> provides all necessary information, and an additional title would
> duplicate that.
>
> Are empty headline titles meant to be supported? If yes, then there are
> at least two bugs. On
>
> #+BEGIN_EXAMPLE
> ,* A
> ,* :B:
> ,* C
> #+END_EXAMPLE
>
> org-element-parse-buffer and org-match-sparse-tree make the second
> headline have title ":B:" and no tags.
Empty headline titles are supported, and "org-syntax.org" makes it
clear.
However, empty headline + tags is ambiguous. The right solution is to
disambiguate it by inserting "something" as the headline you want to
tag, like a non-breaking space.
Regards,
--
Nicolas Goaziou
Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Empty headline titles are supported, and "org-syntax.org" makes it
> clear.
>
> However, empty headline + tags is ambiguous. The right solution is to
> disambiguate it by inserting "something" as the headline you want to
> tag, like a non-breaking space.

Yet, why not simply alter the headline parser a little bit to support
empty titles + tag? Such headlines are used in some of the tests. See
the attached patch.

Best,
Ihor

[-- Attachment #2: 0001-Parse-titeless-headlines-with-a-tag-correctly.patch --]

From a65e39125b732f6e091dbabdd63bb46c2463cc99 Mon Sep 17 00:00:00 2001
Message-Id: <a65e39125b732f6e091dbabdd63bb46c2463cc99.1621744991.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Sun, 23 May 2021 12:41:35 +0800
Subject: [PATCH] Parse titeless headlines with a tag correctly

* lisp/org-element.el (org-element-headline-parser): Do not treat tag as
headline title in headlines like "* :tag:".
---
 lisp/org-element.el | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lisp/org-element.el b/lisp/org-element.el
index ba4f0ead6..36b8f5847 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -1001,7 +1001,10 @@ (defun org-element-headline-parser (limit &optional raw-secondary-p)
                      (commentedp
                       (and (let (case-fold-search) (looking-at org-comment-string))
                            (goto-char (match-end 0))))
-                     (title-start (point))
+                     (title-start (prog1 (point)
+                                    (unless (or todo priority commentedp)
+                                      ;; Headline like "* :tag:"
+                                      (skip-syntax-backward " \t"))))
                      (tags (when (re-search-forward
                                   "[ \t]+\\(:[[:alnum:]_@#%:]+:\\)[ \t]*$"
                                   (line-end-position)
-- 
2.26.3

Hello,
Ihor Radchenko <yantar92@gmail.com> writes:
>> However, empty headline + tags is ambiguous. The right solution is to
>> disambiguate it by inserting "something" as the headline you want to
>> tag, like a non-breaking space.
>
> Yet, why not simply alter the headline parser a little bit to support
> empty titles + tag?
Because, as I wrote, this is ambiguous. You cannot distinguish the
following two cases:
* :mytag:
* :myheadline:
So, your patch would only move the problem elsewhere.
I suggest to not tag emptiness. `org-set-tags' could raise an error in
this case.
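As a minimal sketch of why a regexp alone cannot separate the two cases:
the helper name `my/trailing-tags' below is hypothetical, and the regexp
is the tag pattern that appears in the patch under discussion. Matched
against the whole line, both headlines yield the same result, so
something other than the pattern has to decide whether the text is a
title or a tag.

```elisp
;; Hypothetical helper, for illustration only; not part of Org.
(defun my/trailing-tags (line)
  "Return the trailing tag-like string of headline LINE, or nil."
  (let ((tags-re "[ \t]+\\(:[[:alnum:]_@#%:]+:\\)[ \t]*$"))
    (and (string-match tags-re line)
         (match-string 1 line))))

(my/trailing-tags "* :mytag:")       ; => ":mytag:"
(my/trailing-tags "* :myheadline:")  ; => ":myheadline:"
(my/trailing-tags "* Title :mytag:") ; => ":mytag:"
(my/trailing-tags "* Title")         ; => nil
```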
Regards,
--
Nicolas Goaziou
Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Because, as I wrote, this is ambiguous. You cannot distinguish the
> following two cases:
>
> * :mytag:
> * :myheadline:

Makes sense.

> So, your patch would only move the problem elsewhere.

Actually, my patch would solve the existing inconsistency.
Org mode already puts tag fontification on :tag: in "* :tag:" headline;
org-get-tags returns ("tag") on such headline; org-set-tags sets the
:tag: on "*" headline making it look like "* :tag:" (which is even
tested in test-org/set-tags); org-tag-line-re used across the
codebase also parses "* :tag:" as headline containing tags. It is only
the org-element parser that does not recognise the :tag: in empty
headline.

> I suggest to not tag emptiness. `org-set-tags' could raise an error in
> this case.

Of course, we can go the other way around and recognise "* :myheadline:"
as non-empty headline with no tags. Though it will be more tricky than
simply changing `org-set-tags' as you suggest.

Best,
Ihor

Ihor Radchenko <yantar92@gmail.com> writes:

> Actually, my patch would solve the existing inconsistency.
> Org mode already puts tag fontification on :tag: in "* :tag:" headline;
> org-get-tags returns ("tag") on such headline; org-set-tags sets the
> :tag: on "*" headline making it look like "* :tag:" (which is even
> tested in test-org/set-tags); org-tag-line-re used across the
> codebase also parses "* :tag:" as headline containing tags. It is only
> the org-element parser that does not recognise the :tag: in empty
> headline.
>
>> I suggest to not tag emptiness. `org-set-tags' could raise an error in
>> this case.
>
> Of course, we can go the other way around and recognise "* :myheadline:"
> as non-empty headline with no tags.

There is no spoon^W way. Or, no way is satisfactory. "Don't tag empty
headlines" is better.

In any case, I'm not convinced Org Element is wrong here.

> Though it will be more tricky than
> simply changing `org-set-tags' as you suggest.

It probably boils down to changing `org-tag-line-re'. I didn't check
though.

Regards,

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

>>> I suggest to not tag emptiness. `org-set-tags' could raise an error in
>>> this case.
>>
>> Of course, we can go the other way around and recognise "* :myheadline:"
>> as non-empty headline with no tags.
>
> There is no spoon^W way. Or, no way is satisfactory. "Don't tag empty
> headlines" is better.
>
> In any case, I'm not convinced Org Element is wrong here.

Either way is fine while it is consistent. I just tried to test some
edge cases with existing org-element code:

* TODO COMMENT :tag:

org-element-at-point returns :raw-value "".

* TODO :tag:

:raw-value ":tag:"

>> Though it will be more tricky than
>> simply changing `org-set-tags' as you suggest.
>
> It probably boils down to changing `org-tag-line-re'. I didn't check
> though.

org-tag-line-re and also org-tag-group-re, I think.

Best,
Ihor

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
> You cannot distinguish the following two cases:
>
> * :mytag:
> * :myheadline:
In my opinion, the cleanest solution would be to allow not only tags
specifications of one or more tags, but also the tags specification ":"
of zero tags in the headline. Then in
* :t:
* :t: :
the former would be a headline with empty title and tag "t", and the
latter would be a headline with title ":t:" and no tags.
Sebastian Miele <sebastian.miele@gmail.com> writes:
> #+BEGIN_EXAMPLE
> ,* A
> ,* :B:
> ,* C
> #+END_EXAMPLE
>
> org-element-parse-buffer and org-match-sparse-tree make the second
> headline have title ":B:" and no tags.
Currently org-syntax.org says that "TITLE can be made of any character
but a new line. Though, it will match after every other part have been
matched." This does not reflect the currently effective behavior that
"* :t:" is a headline with title ":t:" and no tags.
Ihor Radchenko <yantar92@gmail.com> writes:
> Either way is fine while it is consistent. I just tried to test some
> edge cases with existing org-element code:
>
> * TODO COMMENT :tag:
>
> org-element-at-point returns :raw-value "".
>
> * TODO :tag:
>
> :raw-value ":tag:"
Concerning tags, it is the expected behavior according to
org-syntax.org: "If the first word appearing in the title is “COMMENT”,
the headline will be considered as “commented”." So the headline
* TODO COMMENT :tag:
is a headline with title "COMMENT" and tag "tag". The headline is not
empty.
However, according to org-element-api.org, the :raw-value should be
"COMMENT".
But this raises another question: In my opinion the apparently effective
behavior of org-element (make the headline :commentedp, but do not
actually include "COMMENT" in the title) is preferable. So I would
prefer to change the spec (org-syntax.org) to reflect that.
Sebastian Miele <sebastian.miele@gmail.com> writes:
> Sebastian Miele <sebastian.miele@gmail.com> writes:
>> #+BEGIN_EXAMPLE
>> ,* A
>> ,* :B:
>> ,* C
>> #+END_EXAMPLE
>>
>> org-element-parse-buffer and org-match-sparse-tree make the second
>> headline have title ":B:" and no tags.
>
> Currently org-syntax.org says that "TITLE can be made of any character
> but a new line. Though, it will match after every other part have been
> matched." This does not reflect the currently effective behavior that
> "* :t:" is a headline with title ":t:" and no tags.
Can you describe what should happen in a parser grammar (i.e., BNF)? If
not, I would tend toward rethinking the structure of the Org file so
that it can be described in a grammar. Having a good grammar for Org
files will promote its acceptance beyond Emacs.
--
David Masterson
Hi David and all,
David Masterson <dsmasterson92630@outlook.com> writes:
> Sebastian Miele <sebastian.miele@gmail.com> writes:
>> Currently org-syntax.org says that "TITLE can be made of any
>> character but a new line. Though, it will match after every other
>> part have been matched." This does not reflect the currently
>> effective behavior that "* :t:" is a headline with title ":t:" and no
>> tags.
>
> Can you describe what should happen in a parser grammar (i.e., BNF)? If
> not, I would tend toward rethinking the structure of the Org file so
> that it can be described in a grammar. Having a good grammar for Org
> files will promote its acceptance beyond Emacs.
I do not know whether it can be expressed in a context-free grammar,
although it may very well be possible. However, the way I understand
the above quote from org-syntax.org (which is, I think, in the end
preferable) is concisely expressible in a regular expression language
that can distinguish between greedy and non-greedy matching of
subexpressions, including Emacs Lisp's regular expressions:
#+BEGIN_SRC elisp
(rx line-start
    (maximal-match STARS SPACE)
    (maximal-match (optional KEYWORD SPACE))
    (maximal-match (optional PRIORITY SPACE))
    (maximal-match (optional COMMENT SPACE))
    (minimal-match (optional TITLE SPACE))
    (maximal-match (optional TAGS))
    (maximal-match (optional SPACE))
    line-end)
#+END_SRC
SPACE is (1+ (any " \t")). TITLE is (1+ not-newline). In the
following, I concentrate on differences from org-syntax.org.
The above expression contains COMMENT (matching "COMMENT") not as part
of the title but as separate entity. Although this is contrary to
org-syntax.org, it is how it is implemented now, e.g., in
org-element-headline-parser.
TAGS currently effectively is (seq ":" (1+ TAG ":")). In particular,
that means a TAGS specification in a headline must define at least one
tag.
I suggest changing that into (seq ":" (0+ TAG ":")), i.e., to also
allow TAGS specifications of zero tags (just ":"). This would make it
possible to resolve the following ambiguity between TITLEs and TAGS:
#+BEGIN_SRC org
,* :t:
,* :t: :
#+END_SRC
The former headline would have empty TITLE and TAGS ":t:". The latter
headline would have TITLE ":t:" and TAGS ":".
The following toy can be used to test some cases. It is not complete,
but contains the essential.
#+BEGIN_SRC elisp
(defun f (x)
  (let ((r (rx line-start
               (maximal-match (group (1+ "*")) (1+ (any " \t")))
               (maximal-match (group (optional "TODO" (1+ (any " \t")))))
               (minimal-match (optional (group (1+ not-newline)) (1+ (any " \t"))))
               (maximal-match (group (optional (seq ":" (0+ (any "a-z") ":")))))
               (maximal-match (optional (1+ (any " \t"))))
               line-end)))
    (when (let (case-fold-search) (string-match r x))
      (list :stars (match-string 1 x)
            :todo (match-string 2 x)
            :title (let ((title (match-string 3 x))) (if title title ""))
            :tags (match-string 4 x)))))
(f "*** :t: : ") ;(:stars "***" :todo "" :title ":t:" :tags ":")
(f "*** :t: ") ;(:stars "***" :todo "" :title "" :tags ":t:")
#+END_SRC
Best wishes
Sebastian
Sebastian Miele <sebastian.miele@gmail.com> writes:
>David Masterson <dsmasterson92630@outlook.com> writes:
>> Sebastian Miele <sebastian.miele@gmail.com> writes:
>>> Currently org-syntax.org says that "TITLE can be made of any
>>> character but a new line. Though, it will match after every other
>>> part have been matched." This does not reflect the currently
>>> effective behavior that "* :t:" is a headline with title ":t:" and no
>>> tags.
>>
>> Can you describe what should happen in a parser grammar (i.e., BNF)? If
>> not, I would tend toward rethinking the structure of the Org file so
>> that it can be described in a grammar. Having a good grammar for Org
>> files will promote its acceptance beyond Emacs.
>
> [...] However, the way I understand the above quote from
> org-syntax.org (which is, I think, in the end preferable) [...]
To be clearer: Preferable to the way it currently is implemented.
In the headline "* :t:", the above quote from org-syntax.org (at least
in my way of reading it) means TAGS ":t:" (which is an "other part [to
be] matched [before the TITLE]") and TITLE "" (which is matched "after
every other part").
But the way Org currently is implemented is different in such cases (no
TAGS, the ":t:" is the TITLE).
Hello,
Sebastian Miele <sebastian.miele@gmail.com> writes:
> Sebastian Miele <sebastian.miele@gmail.com> writes:
>>David Masterson <dsmasterson92630@outlook.com> writes:
>>> Sebastian Miele <sebastian.miele@gmail.com> writes:
>>>> Currently org-syntax.org says that "TITLE can be made of any
>>>> character but a new line. Though, it will match after every other
>>>> part have been matched." This does not reflect the currently
>>>> effective behavior that "* :t:" is a headline with title ":t:" and no
>>>> tags.
>>>
>>> Can you describe what should happen in a parser grammar (i.e., BNF)? If
>>> not, I would tend toward rethinking the structure of the Org file so
>>> that it can be described in a grammar. Having a good grammar for Org
>>> files will promote its acceptance beyond Emacs.
>>
>> [...] However, the way I understand the above quote from
>> org-syntax.org (which is, I think, in the end preferable) [...]
>
> To be clearer: Preferable to the way it currently is implemented.
>
> In the headline "* :t:", the above quote from org-syntax.org (at least
> in my way of reading it) means TAGS ":t:"
This is your way of reading it, but it's unfortunately not the sole way
to look at it. Also, I'm sympathetic to Ihor Radchenko's desire to see
consistent code in that area. But being consistent means we favor one
interpretation over the other.
We can do that, of course, but I can't see which one is better. The one
you prefer might be more common, but it is also nicer to fix, by adding
some space-like character in the headline. Fixing it the other way
requires the same space after the headline. Of course it's not a big
deal either way. The whole thread isn't, actually.
Currently, what Org does in this situation is unimportant, because the
behaviour is simply undefined, which is, IMO, tolerable. If we decide to
define it, it needs to be documented.
Regards,
--
Nicolas Goaziou
Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Sebastian Miele <sebastian.miele@gmail.com> writes:
>> Sebastian Miele <sebastian.miele@gmail.com> writes:
>>> David Masterson <dsmasterson92630@outlook.com> writes:
>>>> Sebastian Miele <sebastian.miele@gmail.com> writes:
>>>>> Currently org-syntax.org says that "TITLE can be made of any
>>>>> character but a new line. Though, it will match after every other
>>>>> part have been matched." This does not reflect the currently
>>>>> effective behavior that "* :t:" is a headline with title ":t:" and no
>>>>> tags.

>>>> Can you describe what should happen in a parser grammar (i.e., BNF)? If
>>>> not, I would tend toward rethinking the structure of the Org file so
>>>> that it can be described in a grammar. Having a good grammar for Org
>>>> files will promote its acceptance beyond Emacs.

>>> [...] However, the way I understand the above quote from
>>> org-syntax.org (which is, I think, in the end preferable) [...]

>> To be clearer: Preferable to the way it currently is implemented.
>>
>> In the headline "* :t:", the above quote from org-syntax.org (at least
>> in my way of reading it) means TAGS ":t:"

> This is your way of reading it, but it's unfortunately not the sole way
> to look at it. Also, I'm sympathetic to Ihor Radchenko's desire to see
> consistent code in that area. But being consistent means we favor one
> interpretation over the other.

Is there anything wrong with that?

> Currently, what Org does in this situation is unimportant, because the
> behaviour is simply undefined, which is, IMO, tolerable. If we decide to
> define it, it needs to be documented.

But having undefined behaviors is limiting on the portability of Org
because people are unwilling to pick it up and attempt to (say) create a
(partial) Org for other platforms (iPhone, Android, etc.).

--
David Masterson

David Masterson <dsmasterson92630@outlook.com> writes:

> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>
>> Sebastian Miele <sebastian.miele@gmail.com> writes:
>
>>> Sebastian Miele <sebastian.miele@gmail.com> writes:
>>>> David Masterson <dsmasterson92630@outlook.com> writes:
>>>>> Sebastian Miele <sebastian.miele@gmail.com> writes:
>>>>>> Currently org-syntax.org says that "TITLE can be made of any
>>>>>> character but a new line. Though, it will match after every other
>>>>>> part have been matched." This does not reflect the currently
>>>>>> effective behavior that "* :t:" is a headline with title ":t:" and no
>>>>>> tags.
>
>>>>> Can you describe what should happen in a parser grammar (i.e., BNF)? If
>>>>> not, I would tend toward rethinking the structure of the Org file so
>>>>> that it can be described in a grammar. Having a good grammar for Org
>>>>> files will promote its acceptance beyond Emacs.
>
>>>> [...] However, the way I understand the above quote from
>>>> org-syntax.org (which is, I think, in the end preferable) [...]
>
>>> To be clearer: Preferable to the way it currently is implemented.
>>>
>>> In the headline "* :t:", the above quote from org-syntax.org (at least
>>> in my way of reading it) means TAGS ":t:"
>
>> This is your way of reading it, but it's unfortunately not the sole way
>> to look at it. Also, I'm sympathetic to Ihor Radchenko's desire to see
>> consistent code in that area. But being consistent means we favor one
>> interpretation over the other.
>
> Is there anything wrong with that?
>

I don't think the question is whether there is anything wrong, but
rather which interpretation to adopt. Currently, both use cases can be
catered for. However, once you lock in one interpretation, the other use
case is not possible. Which one should we favour?

My personal preference would be to keep it simple. A value of :t: is a
tag and you cannot have a heading of the format :t:, but that is just my
view.

>> Currently, what Org does in this situation is unimportant, because the
>> behaviour is simply undefined, which is, IMO, tolerable. If we decide to
>> define it, it needs to be documented.
>
> But having undefined behaviors is limiting on the portability of Org
> because people are unwilling to pick it up and attempt to (say) create a
> (partial) Org for other platforms (iPhone, Android, etc.).

This is very much a secondary consideration. While making it as easy as
possible to parse org files outside of Emacs is not a bad thing, it
should not be a primary driver for how org works. Org is an emacs mode
and I think we need to be careful when considering limiting what you can
do with it based on how easily it can be formally specified for external
tools to use. I think few org users would welcome a change which removed
a feature or required them to modify their workflow just to support the
development of non-emacs tools.

--
Tim Cross

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
> Currently, what Org does in this situation is unimportant, because the
> behaviour is simply undefined, which is, IMO, tolerable. If we decide to
> define it, it needs to be documented.
I agree that currently there is no urgent need to decide on this, but
there (hopefully) will be in the future.
I am trying to implement storage of heading elements in
org-element-cache and reusing them later in org.el functions when
getting tags, properties, schedules, etc. This, among other things,
requires consistency between org.el and org-element.el when parsing
headlines.
Best,
Ihor
Tim Cross <theophilusx@gmail.com> writes:

> David Masterson <dsmasterson92630@outlook.com> writes:
>> But having undefined behaviors is limiting on the portability of Org
>> because people are unwilling to pick it up and attempt to (say) create a
>> (partial) Org for other platforms (iPhone, Android, etc.).

> This is very much a secondary consideration. While making it as easy as
> possible to parse org files outside of Emacs is not a bad thing, it
> should not be a primary driver for how org works. Org is an emacs mode
> and I think we need to be careful when considering limiting what you can
> do with it based on how easily it can be formally specified for external
> tools to use. I think few org users would welcome a change which removed
> a feature or required them to modify their workflow just to support the
> development of non-emacs tools.

Could it be done via a secondary parser? That is, define a base level
language for Org that fits into a BNF (or..?) grammar and then a package
that could be a pre-hook to the parser that rewrites improper tidbits
into a grammatically correct form? I'm thinking, if the grammar is well
defined, the secondary parser won't be that complex.

--
David Masterson

Ihor Radchenko <yantar92@gmail.com> writes:
> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>
>> Currently, what Org does in this situation is unimportant, because the
>> behaviour is simply undefined, which is, IMO, tolerable. If we decide to
>> define it, it needs to be documented.
>
> I agree that currently there is no urgent need to decide on this, but
> there (hopefully) will be in the future.
>
> I am trying to implement storage of heading elements in
> org-element-cache and reusing them later in org.el functions when
> getting tags, properties, schedules, etc. This, among other things,
> requires consistency between org.el and org-element.el when parsing
> headlines.
Could an extensible pre-hook that runs a list of functions take care of
the inconsistencies where each function recognizes one change to the
standard grammar and adjusts the input accordingly?
--
David Masterson
David Masterson <dsmasterson92630@outlook.com> writes:
> Could an extensible pre-hook that runs a list of functions take care of
> the inconsistencies where each function recognizes one change to the
> standard grammar and adjusts the input accordingly?
Could you elaborate? For now, this sounds like unnecessary
over-complication. Why would we need to introduce deviations from
grammar in different functions?
Best,
Ihor
Ihor Radchenko <yantar92@gmail.com> writes:
> David Masterson <dsmasterson92630@outlook.com> writes:
>
>> Could an extensible pre-hook that runs a list of functions take care of
>> the inconsistencies where each function recognizes one change to the
>> standard grammar and adjusts the input accordingly?
>
> Could you elaborate? For now, this sounds like unnecessary
> over-complication. Why would we need to introduce deviations from
> grammar in different functions?
Testing the usefulness of extensions to the grammar before they're added
to the grammar..?
Thinking about it, though, I'm not sure how you could "pre-parse" a
section of the file, make changes to the file via an extension, then
roll back the file pointer and let the regular parser take over. I was
thinking of a "hook" approach where, when the parser gets to a certain
point (beginning of a header?), it could call the hooks with the current
spot as input. If the hooks try to "extract" something from the input
(to process it and hide it from the regular grammar), I have a feeling
that would mess up the base parser.
It was just an idea... ;)
--
David Masterson
Hi all,
   Here is the 4th (or so) iteration of the grammar for titles that
I think deals with most of the issues in this thread along with a
bunch of nasty test cases. The previous attempts can be inspected in
the git history, but long story short, it is extremely hard to find a
grammar that follows the principle of least surprise and you have to
use the tokenizer to ensure that the tags pattern always parses as
such so that tags don't magically switch to being the title when you
remove the rest of the contents of the title. The final example
L1648-L1665 shows many of the things that should parse as tags and do
with this tokenizer/grammar combination. The key to dealing with the
ambiguity of empty title and tags vs something that looks like tags
but parses as a title (which is surprising) is to use the tokenizer to
greedily recognize tags at the end of the line. This ensures that the
tags pattern at the end of the line always parses as tags and doesn't
switch just because the title is empty. Happy to elaborate. Best,
Tom

https://github.com/tgbugs/laundry/blob/next/laundry/heading.rkt
https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/tokenizer.rkt#L98-L140

https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L326-L367
https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L400-L558
https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L1298-L1369
https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L1371-L1419
https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L1648-L1665

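The "recognize tags first" idea can be sketched in Emacs Lisp roughly
as follows. This is only an illustration under stated assumptions, not
the laundry tokenizer and not Org's parser: the function name
`my/split-headline' is hypothetical, TODO keywords, priorities and
COMMENT are ignored, and the tag pattern is the one from the patch
earlier in this thread. The trailing tags are claimed before the title
is looked at, so an empty title never turns the tags into a title.

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

;; Hypothetical helper, for illustration only; assumes LINE starts
;; with one or more stars.
(defun my/split-headline (line)
  "Return (TITLE . TAGS) for headline LINE, matching tags first."
  (let* ((tags-re "[ \t]+\\(:[[:alnum:]_@#%:]+:\\)[ \t]*\\'")
         ;; Skip the stars but keep the following whitespace available.
         (body-start (progn (string-match "\\`\\*+" line) (match-end 0)))
         ;; Claim a tag group at the end of the line, if there is one.
         (tags-pos (string-match tags-re line body-start))
         ;; Read the match before `string-trim' can alter match data.
         (tags (and tags-pos (match-string 1 line)))
         ;; Whatever is left between the stars and the tags is the title.
         (title (string-trim
                 (substring line body-start (or tags-pos (length line))))))
    (cons title tags)))

(my/split-headline "* :t:")   ; => ("" . ":t:")
(my/split-headline "* A :t:") ; => ("A" . ":t:")
(my/split-headline "* :t: x") ; => (":t: x")
```

With this ordering, "* :t:" is always an empty title plus a tag; a
title that looks like a tag would need something after it on the line.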
Tom Gillespie <tgbugs@gmail.com> writes:
> Hi all,
> Here is the 4th (or so) iteration of the grammar for titles that
> I think deals with most of the issues in this thread along with a
> bunch of nasty test cases. The previous attempts can be inspected in
> the git history, but long story short, it is extremely hard to find a
> grammar that follows the principle of least surprise and you have to
> use the tokenizer to ensure that the tags pattern always parses as
> such so that tags don't magically switch to being the title when you
> remove the rest of the contents of the title. The final example
> L1648-L1665 shows many of the things that should parse as tags and do
> with this tokenizer/grammar combination. The key to dealing with the
> ambiguity of empty title and tags vs something that looks like tags
> but parses as a title (which is surprising) is to use the tokenizer to
> greedily recognize tags at the end of the line. This ensures that the
> tags pattern at the end of the line always parses as tags and doesn't
> switch just because the title is empty. Happy to elaborate. Best,
> Tom
>
> https://github.com/tgbugs/laundry/blob/next/laundry/heading.rkt
> https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/tokenizer.rkt#L98-L140
>
> https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L326-L367
> https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L400-L558
> https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L1298-L1369
> https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L1371-L1419
> https://github.com/tgbugs/laundry/blob/971cf35683cd60156868c12b070c2dd9e19d8d06/laundry/test.rkt#L1648-L1665
This looks interesting, but I'm not yet familiar with Racket & brag. I
assume laundry doesn't produce any output from the input Org file other
than some form of "good" or "bad" signal. True?
--
David Masterson
Hi David,
   Laundry produces a full s-expression representation of the org
parse tree (though it is still evolving). I haven't added a pass that
converts it to some Racket internal representation (probably will be
structs). If you get it installed and put #lang org at the top of an
org file you can use racket-mode to parse arbitrary org files, though
you may hit an error and will definitely encounter an
incomplete/incorrect parse since it is still a work in progress. Best,
Tom

Tom Gillespie <tgbugs@gmail.com> writes:
> Hi David,
> Laundry produces a full s-expression representation of the org
> parse tree (though it is still evolving). I haven't added a pass that
> converts it to some Racket internal representation (probably will be
> structs). If you get it installed and put #lang org at the top of an
> org file you can use racket-mode to parse arbitrary org files, though
> you may hit an error and will definitely encounter an
> incomplete/incorrect parse since it is still a work in progress. Best,
> Tom
Hi, I looked through your code. I saw some hand-written tests on
grammar. Would you be interested in converting those tests into tests in
test-org-element test suite? If the same tests are used for org-element
itself and for your parser, we can easily make sure that incorrect
parsing is avoided. Or you can even reuse the built-in tests for testing
your code (say, by remapping org-element-at-point calls in the tests
to analogous function from your parser).
Best,
Ihor
David Masterson <dsmasterson92630@outlook.com> writes:
> Testing the usefulness of extensions to the grammar before they're added
> to the grammar..?
For simple cases, there is org-element-update-syntax. Otherwise, you
would probably be better off using the usual patch/new feature workflow
and modifying the parsers directly.
Best,
Ihor
Hi Ihor,
   Yes, happy to put my test cases into the org element cases and
vice versa. My long term plan is to come up with a set of test cases
that are unambiguous and potentially ambiguous so that we can determine
the expected behavior in those cases, so this is a great first step.
Best,
Tom

Ihor Radchenko <yantar92@gmail.com> writes:
> David Masterson <dsmasterson92630@outlook.com> writes:
>> Testing the usefulness of extensions to the grammar before they're added
>> to the grammar..?
>
> For simple cases, there is org-element-update-syntax. Otherwise, you
> would probably be better off using the usual patch/new feature workflow
> and modifying the parsers directly.
Thanks
--
David Masterson
Tom Gillespie <tgbugs@gmail.com> writes:
> Hi Ihor,
> Yes, happy to put my test cases into the org element cases and
> vice versa.
Patches welcome ;)
Hi all,

Ihor Radchenko <yantar92@gmail.com> writes:

> Yet, why not simply alter the headline parser a little bit to support
> empty titles + tag? Such headlines are used in some of the tests. See
> the attached patch.

I'm in favor of this change...

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Because, as I wrote, this is ambiguous. You cannot distinguish the
> following two cases:
>
> * :mytag:
> * :myheadline:

... because I'm not convinced by the above example: I agree this is
syntactically ambiguous, but as a human I would understand that Org
would parse

* :myheadline:

as [beginning of a headline + empty heading + tag].

> So, your patch would only move the problem elsewhere.

Nicolas, do you strongly feel against this change? Is moving the
problem elsewhere creating more problems I might have overlooked?

> I suggest to not tag emptiness.

I'd rather allow empty headlines with tags, this seems a useful way
of filtering/searching contents.

WDYT?

--
Bastien

Hi Bastien,
   I am strongly in favor of this change. It simplifies the grammar
significantly, and from my work on the laundry lexer and parser, I'm
99% certain that the current behavior is a bug that is the result of
gobbling the space after the stars in the headline. The correct
implementation peeks 1 char ahead for the space, and then starts
parsing again starting with the space. This is because tags MUST be
preceded by a space, so if you incorrectly gobble the space after the
stars then that space cannot be used as the start for tags. Best,
Tom

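The effect of gobbling that space can be reproduced with a small,
self-contained sketch. The function name `my/tags-after-stars' and its
GOBBLE-SPACE argument are hypothetical and only loosely mimic what a
headline parser does after the stars; the regexp is the tag pattern
from the patch. Because the pattern requires at least one space or tab
before the tags, a search that starts after the only whitespace on the
line finds nothing.

```elisp
;; Hypothetical helper, for illustration only; not Org's parser.
(defun my/tags-after-stars (line gobble-space)
  "Search headline LINE for trailing tags, starting after the stars.
If GOBBLE-SPACE is non-nil, first skip the whitespace that follows
the stars, as a parser that consumes it would."
  (with-temp-buffer
    (insert line)
    (goto-char (point-min))
    (skip-chars-forward "*")
    (when gobble-space
      (skip-chars-forward " \t"))
    (when (re-search-forward "[ \t]+\\(:[[:alnum:]_@#%:]+:\\)[ \t]*$"
                             (line-end-position) t)
      (match-string 1))))

(my/tags-after-stars "* :tag:" t)       ; => nil, the space is gone
(my/tags-after-stars "* :tag:" nil)     ; => ":tag:"
(my/tags-after-stars "* Title :tag:" t) ; => ":tag:"
```

A non-empty title hides the problem, since the space between title and
tags is still there to be matched.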
Hello,
Bastien <bzg@gnu.org> writes:
> Ihor Radchenko <yantar92@gmail.com> writes:
>
>> Yet, why not simply alter the headline parser a little bit to support
>> empty titles + tag? Such headlines are used in some of the tests. See
>> the attached patch.
>
> I'm in favor of this change...
>
> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>
>> Because, as I wrote, this is ambiguous. You cannot distinguish the
>> following two cases:
>>
>> * :mytag:
>> * :myheadline:
>
> ... because I'm not convinced by the above example: I agree this is
> syntactically ambiguous, but as a human I would understand that Org
> would parse
>
> * :myheadline:
>
> as [beginning of a headline + empty heading + tag].
>
>> So, your patch would only move the problem elsewhere.
>
> Nicolas, do you strongly feel against this change? Is moving the
> problem elsewhere creating more problems I might have overlooked?
>
>> I suggest to not tag emptiness.
>
> I'd rather allow empty headlines with tags, this seems a useful way
> of filtering/searching contents.
>
> WDYT?
I don't have anything new to bring to this discussion. I don't feel
strongly against anything.
Regards,
--
Nicolas Goaziou
Hi Ihor,
Ihor Radchenko <yantar92@gmail.com> writes:
> Yet, why not simply alter the headline parser a little bit to support
> empty titles + tag? Such headlines are used in some of the tests. See
> the attached patch.
I've now applied this patch -- let's not close the discussion though,
and keep in mind Nicolas's caveats.
Thanks,
--
Bastien