* Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
From: Marcel Lauhoff @ 2022-09-01 8:17 UTC
To: emacs-orgmode
Hello!
I use Unicode characters as TODO keywords and noticed that the "M"
agenda view doesn't find all my TODO entries.
My TODO keywords:
┌────
│ org-todo-keywords '((sequence "≣(p)" "|" "∎(f)")
│ (sequence "◇(t)" "★" "⌛(w)" "🔥(.)" "⚙(s)" "⎇(r)" "⏵(e)" "|" "✔(d)" "✘(c)")
│ (sequence "⛱(m)" "|"))
│
└────
Digging a bit, I found that the `re' regex at the beginning of
`org-scan-tags' is responsible:
┌────
│ (re (concat "^"
│             (if start-level
│                 ;; Get the correct level to match
│                 (concat "\\*\\{" (number-to-string start-level) "\\} ")
│               org-outline-regexp)
│             " *\\(" (regexp-opt org-todo-keywords-1 'words) "\\)?"
│             " *\\(.*?\\)\\([ \t]:\\(?:" org-tag-re ":\\)+\\)?[ \t]*$"))
└────
The `(regexp-opt org-todo-keywords-1 'words)' excludes most of my TODO
keywords (but not all). Changing `'words' to `'symbols' gives me my
desired result.
Short reproducer for some TODO keywords:
┌────
│ (require 'dash) ; provides `-map'
│ (let* ((todo-keywords '("TODO" "DONE" "PROJECT" "≣" "∎" "◇" "★" "⌛" "🔥" "⚙" "⎇" "⏵" "✔" "✘"))
│        (re-words (regexp-opt todo-keywords 'words))
│        (re-symbols (regexp-opt todo-keywords 'symbols)))
│   (list
│    ;; Keywords matched by the word-delimited regexp (nil = no match)
│    (-map (lambda (kw) (when (string-match-p re-words kw) kw)) todo-keywords)
│    ;; Keywords matched by the symbol-delimited regexp
│    (-map (lambda (kw) (when (string-match-p re-symbols kw) kw)) todo-keywords)))
│ ;; Result:
│ (("TODO" "DONE" "PROJECT" nil nil nil nil nil "🔥" "⚙" nil nil "✔" "✘")
│  ("TODO" "DONE" "PROJECT" "≣" "∎" "◇" "★" "⌛" "🔥" "⚙" "⎇" "⏵" "✔" "✘"))
└────
I have no idea why "⚙" is a word and "⌛" isn't, but would you accept a
patch that changes `'words' to `'symbols'? Or is there another way
around my problem?
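A note on the difference between the two variants: with `'words',
`regexp-opt' wraps the alternatives in \< ... \>, which match only at
word boundaries, while `'symbols' uses \_< ... \_>, which also accept
symbol-constituent characters. Which class a character belongs to
depends on the syntax table in effect. The following is only a sketch
for inspecting that; it assumes it is evaluated in a buffer whose syntax
table is the one of interest (for example an Org buffer), and the
shortened keyword list is chosen purely for illustration.

```elisp
;; For each keyword, report the syntax class of its first character
;; in the current buffer's syntax table:
;;   "w" = word constituent, "_" = symbol constituent, "." = punctuation.
(mapcar (lambda (kw)
          (cons kw (string (char-syntax (string-to-char kw)))))
        '("TODO" "≣" "⌛" "🔥" "⚙" "✔"))
```

Keywords whose character is reported as "w" should be the ones that the
`'words' regexp can match; the result may differ between Emacs versions
and major modes.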
~marcel
* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
From: Ihor Radchenko @ 2022-09-03 8:14 UTC
To: Marcel Lauhoff; +Cc: emacs-orgmode
Marcel Lauhoff <ml@irq0.org> writes:
> My TODO keywords:
> ┌────
> │ org-todo-keywords '((sequence "≣(p)" "|" "∎(f)")
> │ (sequence "◇(t)" "★" "⌛(w)" "🔥(.)" "⚙(s)" "⎇(r)" "⏵(e)" "|" "✔(d)" "✘(c)")
> │ (sequence "⛱(m)" "|"))
> │
> └────
> ...
> The `(regexp-opt org-todo-keywords-1 'words)' excludes most of my TODO
> keywords (but not all). Changing `'words' to `'symbols' gives me my
> desired result.
Thanks for reporting!
Fixed on main via a1c923afd.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=a1c923afda22089cbf14c7b7a175cee10bfd297b
--
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92
* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
From: Marcel Lauhoff @ 2022-09-05 9:53 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode
> On 2022-09-03, at 10:14, Ihor Radchenko <yantar92@gmail.com> wrote:
>
> Marcel Lauhoff <ml@irq0.org> writes:
>
>> My TODO keywords:
>> ┌────
>> │ org-todo-keywords '((sequence "≣(p)" "|" "∎(f)")
>> │ (sequence "◇(t)" "★" "⌛(w)" "🔥(.)" "⚙(s)" "⎇(r)" "⏵(e)" "|" "✔(d)" "✘(c)")
>> │ (sequence "⛱(m)" "|"))
>> │
>> └────
>> ...
>> The `(regexp-opt org-todo-keywords-1 'words)' excludes most of my TODO
>> keywords (but not all). Changing `'words' to `'symbols' gives me my
>> desired result.
>
> Thanks for reporting!
> Fixed on main via a1c923afd.
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=a1c923afda22089cbf14c7b7a175cee10bfd297b
┌────
│ - " *\\(" (regexp-opt org-todo-keywords-1 'words) "\\)?"
│ + " *\\(" (regexp-opt org-todo-keywords-1 'words) " \\)?"
└────
That change doesn't seem to work for me, but maybe I'm holding it wrong or missed a follow-up commit.
The following snippet tries some regex variations matched to one of `("TODO" "≣" "TODO " "≣ ")'. a1c923afd does seem to require a space after the keyword now:
┌────
│ (require 'dash) ; provides `-map'
│ (require 's)    ; provides `s-match-strings-all'
│ (let* ((kwds '("TODO" "DONE" "PROJECT" "≣" "∎" "◇" "★" "⌛" "🔥" "⚙" "⎇" "⏵" "✔" "✘"))
│        (test-todos '("TODO" "≣" "TODO " "≣ "))
│        (regexes (list (concat "^ *\\(" (regexp-opt kwds 'words) "\\)?")       ;; orig
│                       (concat "^ *\\(" (regexp-opt kwds 'words) " \\)?")      ;; a1c923afd
│                       (concat "^ *\\(" (regexp-opt kwds 'symbols) "\\)?")     ;; symbols
│                       (concat "^ *\\(" (regexp-opt kwds 'symbols) " \\)?")))) ;; symbols + space
│   ;; Pair each regexp with its matches against every test string
│   (-map (lambda (regex)
│           (list regex
│                 (-map (lambda (todo)
│                         (s-match-strings-all regex todo))
│                       test-todos)))
│         regexes))
└────
┌────
│ (("^ *\\(\\<\\(DONE\\|PROJECT\\|TODO\\|[∎≣⌛⎇⏵◇★⚙✔✘🔥]\\)\\>\\)?"
│ ((("TODO" "TODO" "TODO"))
│ ((""))
│ (("TODO" "TODO" "TODO"))
│ ((""))))
│ ("^ *\\(\\<\\(DONE\\|PROJECT\\|TODO\\|[∎≣⌛⎇⏵◇★⚙✔✘🔥]\\)\\> \\)?"
│ (((""))
│ ((""))
│ (("TODO " "TODO " "TODO"))
│ ((""))))
│ ("^ *\\(\\_<\\(DONE\\|PROJECT\\|TODO\\|[∎≣⌛⎇⏵◇★⚙✔✘🔥]\\)\\_>\\)?"
│ ((("TODO" "TODO" "TODO"))
│ (("≣" "≣" "≣"))
│ (("TODO" "TODO" "TODO"))
│ (("≣" "≣" "≣"))))
│ ("^ *\\(\\_<\\(DONE\\|PROJECT\\|TODO\\|[∎≣⌛⎇⏵◇★⚙✔✘🔥]\\)\\_> \\)?"
│ (((""))
│ ((""))
│ (("TODO " "TODO " "TODO"))
│ (("≣ " "≣ " "≣")))))
└────
~marcel
* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
From: Ihor Radchenko @ 2022-09-05 10:29 UTC
To: Marcel Lauhoff; +Cc: emacs-orgmode
Marcel Lauhoff <ml@irq0.org> writes:
>> Thanks for reporting!
>> Fixed on main via a1c923afd.
>> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=a1c923afda22089cbf14c7b7a175cee10bfd297b
>
> ┌────
> │ - " *\\(" (regexp-opt org-todo-keywords-1 'words) "\\)?"
> │ + " *\\(" (regexp-opt org-todo-keywords-1 'words) " \\)?"
> └────
>
> That change doesn't seem to work for me, but maybe I'm holding it wrong or missed a follow-up commit.
Oops. You are indeed right. I forgot to remove the 'words argument. Now
fixed (hopefully).
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=eeb4fa8c09e6e21b86bec51328e53b4adf662945
> The following snippet tries some regex variations matched to one of `("TODO" "≣" "TODO " "≣ ")'. a1c923afd does seem to require a space after the keyword now:
Org requires a space after TODO-keyword in headlines. TODO-keywords
themselves do not need to have space.
Note that your test is artificial - todo matching is done on headlines,
not on arbitrary strings.
* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
From: Marcel Lauhoff @ 2022-09-05 10:44 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode
> On 2022-09-05, at 12:29, Ihor Radchenko <yantar92@gmail.com> wrote:
>
> Marcel Lauhoff <ml@irq0.org> writes:
>
>>> Thanks for reporting!
>>> Fixed on main via a1c923afd.
>>> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=a1c923afda22089cbf14c7b7a175cee10bfd297b
>>
>> ┌────
>> │ - " *\\(" (regexp-opt org-todo-keywords-1 'words) "\\)?"
>> │ + " *\\(" (regexp-opt org-todo-keywords-1 'words) " \\)?"
>> └────
>>
>> That change doesn't seem to work for me, but maybe I'm holding it wrong or missed a follow-up commit.
>
> Oops. You are indeed right. I forgot to remove the 'words argument. Now
> fixed (hopefully).
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=eeb4fa8c09e6e21b86bec51328e53b4adf662945
I think it should be
" *\\(" (regexp-opt org-todo-keywords-1 t) "\\)?"
1. Without the 3rd argument regexp-opt generates a non-capturing group. 'words generates a capturing group.
2. With the additional space in " \\)?" I'm no longer able to match any todo in my org files.
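To make point 1 concrete, here is a small sketch of how the optional
PAREN argument of `regexp-opt' changes the grouping of the result. The
two-keyword list is an assumption chosen for brevity; the comments
summarize the documented behaviour of `regexp-opt' and are not output
copied from a session.

```elisp
(let ((kwds '("TODO" "DONE")))
  (list (regexp-opt kwds)             ; shy group \(?: ... \) if needed, no capture
        (regexp-opt kwds t)           ; capturing group \( ... \)
        (regexp-opt kwds 'words)      ; capturing group wrapped in \< ... \>
        (regexp-opt kwds 'symbols)))  ; capturing group wrapped in \_< ... \_>
```

Code that later reads the keyword with `match-string' relies on the
capturing group being present, so dropping the argument entirely shifts
or removes that group.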
>
>> The following snippet tries some regex variations matched to one of `("TODO" "≣" "TODO " "≣ ")'. a1c923afd does seem to require a space after the keyword now:
>
> Org requires a space after TODO-keyword in headlines. TODO-keywords
> themselves do not need to have space.
Yes, I added the space variant to test the "\\)?" -> " \\)?" part of the change.
~marcel
* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
From: Ihor Radchenko @ 2022-09-05 10:54 UTC
To: Marcel Lauhoff; +Cc: emacs-orgmode
Marcel Lauhoff <ml@irq0.org> writes:
>> Oops. You are indeed right. I forgot to remove the 'words argument. Now
>> fixed (hopefully).
>> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=eeb4fa8c09e6e21b86bec51328e53b4adf662945
>
> I think it should be
>
> " *\\(" (regexp-opt org-todo-keywords-1 t) "\\)?"
>
> 1. Without the 3rd argument regexp-opt generates a non-capturing group. 'words generates a capturing group.
> 2. With the additional space in " \\)?" I'm no longer able to match any todo in my org files.
Yeah. I indeed missed the match group change.
Your variant, however, will match things like
* TODObut not really
Can you try again with the latest main?
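The "TODObut" case can be illustrated with a short sketch. The two
regexps below are hypothetical, simplified stand-ins and not the ones
used in Org: `loose' follows the variant with a plain capturing group,
and `strict' additionally requires a space after the keyword. The
keyword list and headlines are made up for the example.

```elisp
(let* ((kwds '("TODO" "≣"))
       ;; Hypothetical simplified headline regexps, not Org's own.
       ;; Group 1 is the optional wrapper, group 2 is the keyword itself.
       (loose  (concat "^\\*+ +\\(" (regexp-opt kwds t) "\\)?"))
       (strict (concat "^\\*+ +\\(" (regexp-opt kwds t) " \\)?")))
  (mapcar (lambda (headline)
            (list headline
                  ;; Keyword found by the loose regexp, or nil
                  (and (string-match loose headline)
                       (match-string 2 headline))
                  ;; Keyword found by the strict regexp, or nil
                  (and (string-match strict headline)
                       (match-string 2 headline))))
          '("* TODO buy milk" "* ≣ project" "* TODObut not really")))
```

For the third headline the loose form still reports TODO as the
keyword, while the strict form reports none because no space follows
the keyword. Both forms find the keyword in the first two headlines.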
* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
From: Marcel Lauhoff @ 2022-09-06 7:11 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode
> On 2022-09-05, at 12:54, Ihor Radchenko <yantar92@gmail.com> wrote:
>
> Marcel Lauhoff <ml@irq0.org> writes:
>
>>> Oops. You are indeed right. I forgot to remove the 'words argument. Now
>>> fixed (hopefully).
>>> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=eeb4fa8c09e6e21b86bec51328e53b4adf662945
>>
>> I think it should be
>>
>> " *\\(" (regexp-opt org-todo-keywords-1 t) "\\)?"
>>
>> 1. Without the 3rd argument regexp-opt generates a non-capturing group. 'words generates a capturing group.
>> 2. With the additional space in " \\)?" I'm no longer able to match any todo in my org files.
>
> Yeah. I indeed missed the match group change.
> Your variant, however, will match things like
> * TODObut not really
>
> Can you try again with the latest main?
Just tried the latest main (a3788bd8b). Tag/TODO searches now work with my Unicode TODO keywords.
Awesome! Thank you!