emacs-orgmode@gnu.org archives
* Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
@ 2022-09-01  8:17 Marcel Lauhoff
  2022-09-03  8:14 ` Ihor Radchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Lauhoff @ 2022-09-01  8:17 UTC (permalink / raw)
  To: emacs-orgmode

Hello!

I use Unicode characters as TODO keywords and noticed that the "M"
agenda view doesn't find all my TODO entries.

My TODO keywords:
┌────
│ org-todo-keywords '((sequence "≣(p)" "|" "∎(f)")
│ 		    (sequence "◇(t)" "★" "⌛(w)" "🔥(.)" "⚙(s)" "⎇(r)" "⏵(e)" "|" "✔(d)" "✘(c)")
│ 		    (sequence "⛱(m)" "|"))
│ 
└────

Digging a bit, I found that the `re' regex at the beginning of
`org-scan-tags' is responsible:

┌────
│ (re (concat "^"
│ 		     (if start-level
│ 			 ;; Get the correct level to match
│ 			 (concat "\\*\\{" (number-to-string start-level) "\\} ")
│ 		       org-outline-regexp)
│ 		     " *\\(" (regexp-opt org-todo-keywords-1 'words) "\\)?"
│ 		     " *\\(.*?\\)\\([ \t]:\\(?:" org-tag-re ":\\)+\\)?[ \t]*$"))
└────

The `(regexp-opt org-todo-keywords-1 'words)' excludes most of my TODO
keywords (but not all). Changing `'words' to `'symbols' gives me my
desired result.

Short reproducer for some TODO keywords:

┌────
│ (require 'dash)  ; provides `-map'
│ 
│ ;; Match every keyword against both the 'words and the 'symbols regexp.
│ (let* ((todo-keywords '("TODO" "DONE" "PROJECT" "≣" "∎" "◇" "★" "⌛" "🔥" "⚙" "⎇" "⏵" "✔" "✘"))
│        (re-words (regexp-opt todo-keywords 'words))
│        (re-symbols (regexp-opt todo-keywords 'symbols)))
│   (list
│    (-map (lambda (kw) (when (string-match-p re-words kw) kw)) todo-keywords)
│    (-map (lambda (kw) (when (string-match-p re-symbols kw) kw)) todo-keywords)))
│ ;; Result ('words first, then 'symbols):
│ (("TODO" "DONE" "PROJECT" nil nil nil nil nil "🔥" "⚙" nil nil "✔" "✘")
│  ("TODO" "DONE" "PROJECT" "≣" "∎" "◇" "★" "⌛" "🔥" "⚙" "⎇" "⏵" "✔" "✘"))
└────

I have no idea why "⚙" is a word and "⌛" isn't, but would you accept a
patch that changes `'words' to `'symbols'? Or is there another way
around my problem?
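
One way to look into the word/non-word difference is to check the syntax class of each keyword, since `\<' and `\>' only treat characters with word syntax as part of a word. This is a minimal sketch, not a diagnosis: `my/keyword-syntax-classes' is a made-up helper name, and the classes it reports depend on the syntax table of the buffer in which it is evaluated.

```elisp
(defun my/keyword-syntax-classes (keywords)
  "Return an alist of (KEYWORD . SYNTAX-CLASS) for KEYWORDS.
SYNTAX-CLASS is a one-character string such as \"w\" (word),
\"_\" (symbol) or \".\" (punctuation), looked up for the first
character of each keyword in the current syntax table.
Hypothetical helper, for illustration only."
  (mapcar (lambda (kw)
            (cons kw (string (char-syntax (string-to-char kw)))))
          keywords))

;; Evaluate this in an Org buffer to see the classes Org would use.
(my/keyword-syntax-classes '("TODO" "⚙" "⌛" "≣"))
```

A keyword reported as "w" has word syntax; any other class means the keyword is not a word as far as `\<...\>' is concerned.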

~marcel




* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
  2022-09-01  8:17 Unicode TODO keyword issues in agenda search leading to org-scan-tags regex Marcel Lauhoff
@ 2022-09-03  8:14 ` Ihor Radchenko
  2022-09-05  9:53   ` Marcel Lauhoff
  0 siblings, 1 reply; 7+ messages in thread
From: Ihor Radchenko @ 2022-09-03  8:14 UTC (permalink / raw)
  To: Marcel Lauhoff; +Cc: emacs-orgmode

Marcel Lauhoff <ml@irq0.org> writes:

> My TODO keywords:
> ┌────
> │ org-todo-keywords '((sequence "≣(p)" "|" "∎(f)")
> │ 		    (sequence "◇(t)" "★" "⌛(w)" "🔥(.)" "⚙(s)" "⎇(r)" "⏵(e)" "|" "✔(d)" "✘(c)")
> │ 		    (sequence "⛱(m)" "|"))
> │ 
> └────
> ...
> The `(regexp-opt org-todo-keywords-1 'words)' excludes most of my TODO
> keywords (but not all). Changing `'words' to `'symbols' gives me my
> desired result.

Thanks for reporting!
Fixed on main via a1c923afd.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=a1c923afda22089cbf14c7b7a175cee10bfd297b

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92



* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
  2022-09-03  8:14 ` Ihor Radchenko
@ 2022-09-05  9:53   ` Marcel Lauhoff
  2022-09-05 10:29     ` Ihor Radchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Lauhoff @ 2022-09-05  9:53 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode



> On 2022-09-03, at 10:14, Ihor Radchenko <yantar92@gmail.com> wrote:
> 
> Marcel Lauhoff <ml@irq0.org> writes:
> 
>> My TODO keywords:
>> ┌────
>> │ org-todo-keywords '((sequence "≣(p)" "|" "∎(f)")
>> │ 		    (sequence "◇(t)" "★" "⌛(w)" "🔥(.)" "⚙(s)" "⎇(r)" "⏵(e)" "|" "✔(d)" "✘(c)")
>> │ 		    (sequence "⛱(m)" "|"))
>> │ 
>> └────
>> ...
>> The `(regexp-opt org-todo-keywords-1 'words)' excludes most of my TODO
>> keywords (but not all). Changing `'words' to `'symbols' gives me my
>> desired result.
> 
> Thanks for reporting!
> Fixed on main via a1c923afd.
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=a1c923afda22089cbf14c7b7a175cee10bfd297b

┌────
│ -		     " *\\(" (regexp-opt org-todo-keywords-1 'words) "\\)?"
│ +		     " *\\(" (regexp-opt org-todo-keywords-1 'words) " \\)?"
└────

That change doesn't seem to work for me, but maybe I'm holding it wrong or missed a follow-up commit.

The following snippet tries some regex variations matched to one of `("TODO" "≣" "TODO " "≣ ")'. a1c923afd does seem to require a space after the keyword now:

┌────
│ (require 'dash)  ; provides `-map'
│ (require 's)     ; provides `s-match-strings-all'
│ (let* ((kwds '("TODO" "DONE" "PROJECT" "≣" "∎" "◇" "★" "⌛" "🔥" "⚙" "⎇" "⏵" "✔" "✘"))
│        (test-todos '("TODO" "≣" "TODO " "≣ "))
│        (regexes (list (concat "^ *\\(" (regexp-opt kwds 'words) "\\)?")  ;; orig
│ 		      (concat "^ *\\(" (regexp-opt kwds 'words) " \\)?")  ;; a1c923afd
│ 		      (concat "^ *\\(" (regexp-opt kwds 'symbols) "\\)?")  ;; symbols
│ 		      (concat "^ *\\(" (regexp-opt kwds 'symbols) " \\)?"))))  ;; symbols + space
│   (-map (lambda (regex)
│ 	  (list regex
│ 	       (-map (lambda (todo)
│ 			       (s-match-strings-all regex todo))
│ 		      test-todos)))
│   regexes))
└────

┌────
│ (("^ *\\(\\<\\(DONE\\|PROJECT\\|TODO\\|[∎≣⌛⎇⏵◇★⚙✔✘🔥]\\)\\>\\)?"
│   ((("TODO" "TODO" "TODO"))
│    ((""))
│    (("TODO" "TODO" "TODO"))
│    ((""))))
│  ("^ *\\(\\<\\(DONE\\|PROJECT\\|TODO\\|[∎≣⌛⎇⏵◇★⚙✔✘🔥]\\)\\> \\)?"
│   (((""))
│    ((""))
│    (("TODO " "TODO " "TODO"))
│    ((""))))
│  ("^ *\\(\\_<\\(DONE\\|PROJECT\\|TODO\\|[∎≣⌛⎇⏵◇★⚙✔✘🔥]\\)\\_>\\)?"
│   ((("TODO" "TODO" "TODO"))
│    (("≣" "≣" "≣"))
│    (("TODO" "TODO" "TODO"))
│    (("≣" "≣" "≣"))))
│  ("^ *\\(\\_<\\(DONE\\|PROJECT\\|TODO\\|[∎≣⌛⎇⏵◇★⚙✔✘🔥]\\)\\_> \\)?"
│   (((""))
│    ((""))
│    (("TODO " "TODO " "TODO"))
│    (("≣ " "≣ " "≣")))))
└────


~marcel





* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
  2022-09-05  9:53   ` Marcel Lauhoff
@ 2022-09-05 10:29     ` Ihor Radchenko
  2022-09-05 10:44       ` Marcel Lauhoff
  0 siblings, 1 reply; 7+ messages in thread
From: Ihor Radchenko @ 2022-09-05 10:29 UTC (permalink / raw)
  To: Marcel Lauhoff; +Cc: emacs-orgmode

Marcel Lauhoff <ml@irq0.org> writes:

>> Thanks for reporting!
>> Fixed on main via a1c923afd.
>> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=a1c923afda22089cbf14c7b7a175cee10bfd297b
>
> ┌────
> │ -		     " *\\(" (regexp-opt org-todo-keywords-1 'words) "\\)?"
> │ +		     " *\\(" (regexp-opt org-todo-keywords-1 'words) " \\)?"
> └────
>
> That change doesn't seem to work for me, but maybe I'm holding it wrong or missed a follow-up commit.

Oops. You are indeed right. I forgot to remove the 'words argument. Now
fixed (hopefully).
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=eeb4fa8c09e6e21b86bec51328e53b4adf662945

> The following snippet tries some regex variations matched to one of `("TODO" "≣" "TODO " "≣ ")'. a1c923afd does seem to require a space after the keyword now:

Org requires a space after TODO-keyword in headlines. TODO-keywords
themselves do not need to have space.

Note that your test is artificial - todo matching is done on headlines,
not on arbitrary strings.




* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
  2022-09-05 10:29     ` Ihor Radchenko
@ 2022-09-05 10:44       ` Marcel Lauhoff
  2022-09-05 10:54         ` Ihor Radchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Marcel Lauhoff @ 2022-09-05 10:44 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode



> On 2022-09-05, at 12:29, Ihor Radchenko <yantar92@gmail.com> wrote:
> 
> Marcel Lauhoff <ml@irq0.org> writes:
> 
>>> Thanks for reporting!
>>> Fixed on main via a1c923afd.
>>> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=a1c923afda22089cbf14c7b7a175cee10bfd297b
>> 
>> ┌────
>> │ -		     " *\\(" (regexp-opt org-todo-keywords-1 'words) "\\)?"
>> │ +		     " *\\(" (regexp-opt org-todo-keywords-1 'words) " \\)?"
>> └────
>> 
>> That change doesn't seem to work for me, but maybe I'm holding it wrong or missed a follow-up commit.
> 
> Oops. You are indeed right. I forgot to remove the 'words argument. Now
> fixed (hopefully).
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=eeb4fa8c09e6e21b86bec51328e53b4adf662945

I think it should be

" *\\(" (regexp-opt org-todo-keywords-1 t) "\\)?"

1. Without the second (PAREN) argument, regexp-opt generates a non-capturing group; 'words generates a capturing group.
2. With the additional space in " \\)?" I'm no longer able to match any todo in my org files.
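
The capturing-group difference in point 1 can be checked directly. A small sketch, assuming nothing beyond `regexp-opt' itself; `my/regexp-opt-group-1' is a hypothetical helper name used only here.

```elisp
(defun my/regexp-opt-group-1 (paren)
  "Match \"TODO\" against a `regexp-opt' regexp built with PAREN.
Return the text of match group 1, or nil if there is no such group.
Hypothetical helper, for illustration only."
  (let ((case-fold-search nil)
        (re (regexp-opt '("TODO" "DONE") paren)))
    (when (string-match re "TODO")
      (match-string 1 "TODO"))))

(list (my/regexp-opt-group-1 nil)      ; no PAREN: shy group, so no group 1
      (my/regexp-opt-group-1 t)        ; explicit capturing group
      (my/regexp-opt-group-1 'words))  ; capturing group inside \< ... \>
;; => (nil "TODO" "TODO")
```

So code that reads the keyword from a numbered group needs a non-nil PAREN, or its own explicit group around the `regexp-opt' result.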


> 
>> The following snippet tries some regex variations matched to one of `("TODO" "≣" "TODO " "≣ ")'. a1c923afd does seem to require a space after the keyword now:
> 
> Org requires a space after TODO-keyword in headlines. TODO-keywords
> themselves do not need to have space.

Yes, I added the space variant to test the "\\)?" -> " \\)?" part of the change.


~marcel


* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
  2022-09-05 10:44       ` Marcel Lauhoff
@ 2022-09-05 10:54         ` Ihor Radchenko
  2022-09-06  7:11           ` Marcel Lauhoff
  0 siblings, 1 reply; 7+ messages in thread
From: Ihor Radchenko @ 2022-09-05 10:54 UTC (permalink / raw)
  To: Marcel Lauhoff; +Cc: emacs-orgmode

Marcel Lauhoff <ml@irq0.org> writes:

>> Oops. You are indeed right. I forgot to remove the 'words argument. Now
>> fixed (hopefully).
>> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=eeb4fa8c09e6e21b86bec51328e53b4adf662945
>
> I think it should be
>
> " *\\(" (regexp-opt org-todo-keywords-1 t) "\\)?"
>
> 1. Without the second (PAREN) argument, regexp-opt generates a non-capturing group; 'words generates a capturing group.
> 2. With the additional space in " \\)?" I'm no longer able to match any todo in my org files.

Yeah. I indeed missed the match group change.
Your variant, however, will match things like
* TODObut not really
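
The difference can be sketched on headline strings. Everything named `my/...' below is hypothetical: `my/loose-keyword' mirrors the variant quoted above, while `my/strict-keyword' is only an assumed shape that demands a space after the keyword, not necessarily what ended up in Org.

```elisp
(defun my/headline-keyword (keyword-re headline)
  "Match KEYWORD-RE right after the stars of HEADLINE.
Return the text of match group 1, or nil.  Hypothetical helper."
  (let ((case-fold-search nil))
    (when (string-match (concat "^\\*+ +" keyword-re) headline)
      (match-string 1 headline))))

(defun my/loose-keyword (keywords headline)
  "Optional keyword from KEYWORDS with no check on what follows it."
  (my/headline-keyword
   (concat "\\(" (regexp-opt keywords t) "\\)?") headline))

(defun my/strict-keyword (keywords headline)
  "Optional keyword from KEYWORDS that must be followed by a space.
Assumed shape, for illustration only."
  (my/headline-keyword
   (concat "\\(?:\\(" (regexp-opt keywords t) "\\) \\)?") headline))

(let ((kwds '("TODO" "DONE" "≣")))
  (list (my/loose-keyword  kwds "* TODObut not really")  ; "TODO"
        (my/strict-keyword kwds "* TODObut not really")  ; nil
        (my/strict-keyword kwds "* ≣ Plan the week")))   ; "≣"
```

Requiring the trailing space inside a shy group keeps the keyword itself in group 1 and avoids relying on word or symbol boundaries, which is where the Unicode keywords ran into trouble.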

Can you try again with the latest main?




* Re: Unicode TODO keyword issues in agenda search leading to org-scan-tags regex
  2022-09-05 10:54         ` Ihor Radchenko
@ 2022-09-06  7:11           ` Marcel Lauhoff
  0 siblings, 0 replies; 7+ messages in thread
From: Marcel Lauhoff @ 2022-09-06  7:11 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode



> On 2022-09-05, at 12:54, Ihor Radchenko <yantar92@gmail.com> wrote:
> 
> Marcel Lauhoff <ml@irq0.org> writes:
> 
>>> Oops. You are indeed right. I forgot to remove the 'words argument. Now
>>> fixed (hopefully).
>>> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=eeb4fa8c09e6e21b86bec51328e53b4adf662945
>> 
>> I think it should be
>> 
>> " *\\(" (regexp-opt org-todo-keywords-1 t) "\\)?"
>> 
>> 1. Without the second (PAREN) argument, regexp-opt generates a non-capturing group; 'words generates a capturing group.
>> 2. With the additional space in " \\)?" I'm no longer able to match any todo in my org files.
> 
> Yeah. I indeed missed the match group change.
> Your variant, however, will match things like
> * TODObut not really
> 
> Can you try again with the latest main?


Just tried the latest main (a3788bd8b). Tag/TODO searches now work with my Unicode TODO keywords.
Awesome! Thank you!
