emacs-orgmode@gnu.org archives
* org-scan-tags
@ 2010-09-15  3:19 Ilya Shlyakhter
  2010-09-15 14:36 ` org-scan-tags Ilya Shlyakhter
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Ilya Shlyakhter @ 2010-09-15  3:19 UTC (permalink / raw)
  To: emacs-orgmode

In org-scan-tags, if todo-only is t, would it be possible to speed
things up by changing the regexp to go only to the lines with a TODO
keyword?
I.e. in

  (let* ((re (concat "^" outline-regexp " *\\(\\<\\("
                     (mapconcat 'regexp-quote org-todo-keywords-1 "\\|")
                     (org-re
                      "\\>\\)\\)? *\\(.*?\\)\\(:[[:alnum:]_@:]+:\\)?[ \t]*$")))

remove the first "?" if todo-only is t.   Also, regexp-opt might make
a more efficient regexp than mapconcat with regexp-quote.
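
A minimal sketch of the suggested regexp, written as a standalone function with a hypothetical name rather than as a patch to org-scan-tags. It assumes a literal "\\*+ +" headline prefix in place of outline-regexp and drops the org-re wrapper (assumed unnecessary outside XEmacs). The keyword alternation comes from regexp-opt, and the only difference between the two modes is whether the keyword group carries the trailing "?".

```elisp
;; Hypothetical helper for illustration only; not part of Org.
(require 'regexp-opt)

(defun my-headline-regexp (keywords todo-only)
  "Return a regexp for headlines that may start with one of KEYWORDS.
KEYWORDS is a list of strings such as (\"TODO\" \"DONE\").  When
TODO-ONLY is non-nil the keyword is mandatory, so a search with this
regexp never stops on a plain headline.
Groups: 1 = keyword wrapper, 2 = keyword, 3 = title, 4 = tags."
  (concat "^\\*+ +"                     ; stands in for `outline-regexp'
          ;; `regexp-opt' with `words' returns an optimized alternation
          ;; wrapped in one capture group and word boundaries.
          "\\(" (regexp-opt keywords 'words) "\\)"
          ;; The only difference between the two modes: drop the `?'.
          (if todo-only "" "?")
          " *\\(.*?\\)\\(:[[:alnum:]_@:]+:\\)?[ \t]*$"))
```

Bind case-fold-search to nil when using it, since TODO keywords are case-sensitive. With TODO-ONLY non-nil, the regexp matches "*** TODO two :tag2:" but not "** one :tag1:".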

Reason for request:  I'm writing an extension of org for setting &
checking goals, and want to quickly find entries with headlines of the
form
    GOAL ....
of which there may be relatively few in a large file.   So, stepping
through all entries and then checking them for the GOAL keyword is
very inefficient.
It would be much faster if the regexp included the GOAL as a keyword.

It would be good if the parameter todo-only could be a list of
strings, and org-scan-tags would return only the headlines where the
todo keyword is from this list.
It could use regexp-opt to make an efficient regexp for this.
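
A sketch of the GOAL use case under similar assumptions (the helper name is hypothetical and not an Org API): the keyword list goes straight into the search regexp, so re-search-forward moves from one GOAL headline to the next without stopping at any other entry. Tags, if present, are left in the returned title.

```elisp
;; Hypothetical helper for illustration only; not part of Org.
(require 'regexp-opt)

(defun my-collect-keyword-headlines (keywords)
  "Return (KEYWORD . TITLE) for every headline whose keyword is in KEYWORDS.
Only matching headlines are visited; all other lines are skipped by
the regexp engine itself."
  (let ((re (concat "^\\*+ +" (regexp-opt keywords 'words)
                    " *\\(.*?\\)[ \t]*$"))
        (case-fold-search nil)          ; match keywords case-sensitively
        found)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward re nil t)
        ;; Group 1 is the keyword, group 2 the rest of the headline.
        (push (cons (match-string-no-properties 1)
                    (match-string-no-properties 2))
              found)))
    (nreverse found)))
```

Passing a list such as '("GOAL" "TODO") collects headlines for any of those keywords in one pass.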

There also seem to be other opportunities for speeding up
org-scan-tags in this way: e.g. if the match string includes +mytag,
the regexp for the headline could include this as well.
Similarly for properties.  Maybe org-make-tags-matcher could return a
list of tags and properties that must appear in any matching entry.
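
A rough sketch of extracting a required tag and turning it into a headline prefilter. It assumes a simple match string (only explicit +tag terms count, anything after "/" is ignored, and "|" disables the shortcut) and tags written on the headline itself, i.e. no inheritance; the function names are hypothetical. The regexp is only a necessary condition, so the full matcher would still have to run on each candidate.

```elisp
;; Hypothetical helpers for illustration only; not part of Org.
(defun my-required-tags (match)
  "Return the tags every entry matching MATCH must carry locally.
Only explicit +TAG terms before any `/' are collected.  Returns nil
when MATCH contains `|', since then no single term is required."
  (unless (string-match-p "|" match)
    (let ((tag-part (car (split-string match "/")))
          (start 0)
          tags)
      ;; A +TAG term must be followed by another operator or the end,
      ;; which skips comparisons such as +LEVEL>2.
      (while (string-match "\\+\\([[:alnum:]_@]+\\)\\([-+&]\\|\\'\\)"
                           tag-part start)
        (push (match-string 1 tag-part) tags)
        (setq start (match-end 1)))
      (nreverse tags))))

(defun my-tag-prefilter-regexp (match)
  "Return a regexp every headline matching MATCH must match, or nil.
Assumes tags are written on the headline itself (no inheritance)."
  (let ((tags (my-required-tags match)))
    (when tags
      ;; One required tag is enough for a cheap necessary condition.
      (concat "^\\*+ .*:" (regexp-quote (car tags))
              ":[[:alnum:]_@:]*[ \t]*$"))))
```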

It would also help if the tags matcher expression could refer to text
properties stored on the headline, perhaps with conditions such as
:myprop=X (i.e. the same as for org properties, but the property name
must be a keyword).  It already does this for the 'org-category text
property.  Then one could, e.g., mark entries representing unmet goals
with text properties, and then use a regular org-tags-view to browse
them in a sparse tree or an agenda.
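
A sketch of how such a clause might be compiled, assuming the proposed :PROP=VALUE syntax (this is the suggestion above, not existing Org matcher syntax) and a hypothetical function name: the clause becomes a Lisp form that, evaluated with point on a headline, compares a text property at the start of that headline with a string using the standard get-text-property.

```elisp
;; Hypothetical helper for illustration only; not part of Org.
(defun my-text-prop-clause-form (clause)
  "Compile CLAUSE of the form \":PROP=VALUE\" into a Lisp form.
The form is meant to be evaluated with point on a headline; it
compares the text property PROP at the start of the headline with
the string VALUE."
  (unless (string-match "\\`:\\([^=]+\\)=\\(.*\\)\\'" clause)
    (error "Not a :PROP=VALUE clause: %s" clause))
  `(equal (get-text-property (line-beginning-position)
                             ',(intern (match-string 1 clause)))
          ,(match-string 2 clause)))
```

For example, after (put-text-property BEG END 'goal-status "unmet") on a headline, evaluating the form built from ":goal-status=unmet" on that headline yields t.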

Thanks,

ilya

* Re: org-scan-tags
  2010-09-15  3:19 org-scan-tags Ilya Shlyakhter
@ 2010-09-15 14:36 ` Ilya Shlyakhter
  2010-09-15 16:13 ` org-scan-tags Ilya Shlyakhter
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Ilya Shlyakhter @ 2010-09-15 14:36 UTC (permalink / raw)
  To: emacs-orgmode

Another question: if org-map-continue-from is used to skip parts of
the file, could that affect the correctness of org-scanner-tags?
I.e. is any code that sets org-map-continue-from also responsible for
updating org-scanner-tags?

thanks,

ilya

* Re: org-scan-tags
  2010-09-15  3:19 org-scan-tags Ilya Shlyakhter
  2010-09-15 14:36 ` org-scan-tags Ilya Shlyakhter
@ 2010-09-15 16:13 ` Ilya Shlyakhter
  2011-02-02 23:17 ` org-scan-tags Bastien
  2011-02-03  5:32 ` org-scan-tags Carsten Dominik
  3 siblings, 0 replies; 9+ messages in thread
From: Ilya Shlyakhter @ 2010-09-15 16:13 UTC (permalink / raw)
  To: emacs-orgmode

On Tue, Sep 14, 2010 at 11:19 PM, Ilya Shlyakhter <ilya_shl@alum.mit.edu> wrote:
> There also seem to be other opportunities for speeding up
> org-scan-tags in this way: e.g. if the match string includes +mytag,
> the regexp for the headline could include this as well.
> Similarly for properties.  Maybe, org-make-tags-matcher could return a
> list of tags and properties that must appear in any matching entry.

Correction: org-make-tags-matcher would also need to return the list
of _all_ tags/properties mentioned in the matcher, whose values should
be inherited.  The headline regexp would then need to match the
mentions of these tags/properties in entries as well.

It would also be good if org-scan-tags could be told to skip selected
subtrees entirely (selected either by another matcher or by a predicate
passed in).

* Re: org-scan-tags
  2010-09-15  3:19 org-scan-tags Ilya Shlyakhter
  2010-09-15 14:36 ` org-scan-tags Ilya Shlyakhter
  2010-09-15 16:13 ` org-scan-tags Ilya Shlyakhter
@ 2011-02-02 23:17 ` Bastien
  2011-02-03  5:32 ` org-scan-tags Carsten Dominik
  3 siblings, 0 replies; 9+ messages in thread
From: Bastien @ 2011-02-02 23:17 UTC (permalink / raw)
  To: Ilya Shlyakhter; +Cc: emacs-orgmode

Hi Ilya,

Ilya Shlyakhter <ilya_shl@alum.mit.edu> writes:

> In org-scan-tags, if todo-only is t, would it be possible to speed
> things up by changing the regexp to go only to the lines with a TODO
> keyword?
> I.e. in
>
>   (let* ((re (concat "^" outline-regexp " *\\(\\<\\("
>                      (mapconcat 'regexp-quote org-todo-keywords-1 "\\|")
>                      (org-re
>                       "\\>\\)\\)? *\\(.*?\\)\\(:[[:alnum:]_@:]+:\\)?[ \t]*$")))
>
> remove the first "?" if todo-only is t.   Also, regexp-opt might make
> a more efficient regexp than mapconcat with regexp-quote.

I've optimized org-scan-tags a bit following your ideas (gaining ~12%
according to elp) -- thanks for these directions.

> It would be good if the parameter todo-only could be a list of
> strings, and org-scan-tags would return only the headlines where the
> todo keyword is from this list.

This would be confusing.  In particular, org-tags-view calls
org-scan-tags with both the todo-only argument and a matcher, so
if you make the todo-only argument aware of TODO keywords, there
might be some interference between todo-only and the matcher.

I'd rather not go that route.

Thanks,

-- 
 Bastien

* Re: org-scan-tags
  2010-09-15  3:19 org-scan-tags Ilya Shlyakhter
                   ` (2 preceding siblings ...)
  2011-02-02 23:17 ` org-scan-tags Bastien
@ 2011-02-03  5:32 ` Carsten Dominik
  2011-02-03 16:13   ` org-scan-tags Carsten Dominik
  3 siblings, 1 reply; 9+ messages in thread
From: Carsten Dominik @ 2011-02-03  5:32 UTC (permalink / raw)
  To: Ilya Shlyakhter; +Cc: emacs-orgmode


On Sep 15, 2010, at 5:19 AM, Ilya Shlyakhter wrote:

> In org-scan-tags, if todo-only is t, would it be possible to speed
> things up by changing the regexp to go only to the lines with a TODO
> keyword?

I believe this may cause a problem.  The scanner needs to see at least
every parent node to be able to collect all inherited tags.
So I think that a tree like

* heading
** one :tag1:
*** TODO two :tag2:

would incorrectly miss out on :tag1:
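
To make the failure mode concrete, here is a small self-contained simulation (hypothetical names, and not Org's actual scanner): headlines are parsed with a simplified regexp, and inherited tags are kept in a per-level stack that is only updated for headlines the search actually visits. Visiting every headline gives the TODO entry both tags; visiting only TODO headlines loses :tag1:.

```elisp
;; Hypothetical simulation for illustration only; not Org's code.
(defconst my-headline-parse-re
  (concat "^\\(\\*+\\) +\\(?:\\(TODO\\) +\\)?\\(.*?\\)"
          "\\(?: +:\\([[:alnum:]_@:]+\\):\\)?[ \t]*$")
  "Groups: 1 = stars, 2 = TODO, 3 = title, 4 = tags without outer colons.")

(defun my-todo-tags (visit-re)
  "Return (TITLE . TAGS) for each TODO headline reached via VISIT-RE.
Inherited tags come from a stack of (LEVEL . LOCAL-TAGS) entries that
is only updated for headlines VISIT-RE actually stops at."
  (let ((case-fold-search nil) stack result)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward visit-re nil t)
        (save-excursion
          (beginning-of-line)
          (when (looking-at my-headline-parse-re)
            (let ((level (length (match-string 1)))
                  (todo (match-string 2))
                  (title (match-string-no-properties 3))
                  (local (and (match-string 4)
                              (split-string (match-string-no-properties 4)
                                            ":" t))))
              ;; Pop entries that cannot be ancestors of this headline.
              (while (and stack (>= (car (car stack)) level))
                (pop stack))
              (when todo
                (push (cons title (append (apply #'append (mapcar #'cdr stack))
                                          local))
                      result))
              (push (cons level local) stack))))))
    (nreverse result)))
```

With the three-line tree above in a buffer, (my-todo-tags "^\\*+ ") returns (("two" "tag1" "tag2")), whereas (my-todo-tags "^\\*+ +TODO\\>") returns (("two" "tag2")).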

- Carsten

* Re: org-scan-tags
  2011-02-03  5:32 ` org-scan-tags Carsten Dominik
@ 2011-02-03 16:13   ` Carsten Dominik
  2011-02-03 16:37     ` org-scan-tags Bastien
  0 siblings, 1 reply; 9+ messages in thread
From: Carsten Dominik @ 2011-02-03 16:13 UTC (permalink / raw)
  To: Carsten Dominik; +Cc: Ilya Shlyakhter, emacs-orgmode


On Feb 3, 2011, at 6:32 AM, Carsten Dominik wrote:

>
> On Sep 15, 2010, at 5:19 AM, Ilya Shlyakhter wrote:
>
>> In org-scan-tags, if todo-only is t, would it be possible to speed
>> things up by changing the regexp to go only to the lines with a TODO
>> keyword?
>
> I believe this may cause a problem.  The scanner needs to see at least
> every parent node to be able to collect all inherited tags.
> So I think that a tree like
>
> * heading
> ** one :tag1:
> *** TODO two :tag2:
>
> would incorrectly miss out on :tag1:


OK, here is an example where it really does fail:


* heading
** one          :tag1:
*** two
*** two         :tag2:
*** TODO two    :tag2:
*** two         :tag2:


Fold up the tree, then do

C-c / m +tag1/! RET

This should find the "TODO two", but it does not, because the
new regexp moves right past the "one" line and so tag1 is
overlooked.

- Carsten

* Re: org-scan-tags
  2011-02-03 16:13   ` org-scan-tags Carsten Dominik
@ 2011-02-03 16:37     ` Bastien
  2011-02-03 16:47       ` org-scan-tags Carsten Dominik
  2011-02-05  0:36       ` org-scan-tags Ilya Shlyakhter
  0 siblings, 2 replies; 9+ messages in thread
From: Bastien @ 2011-02-03 16:37 UTC (permalink / raw)
  To: Carsten Dominik; +Cc: Ilya Shlyakhter, emacs-orgmode

Carsten Dominik <carsten.dominik@gmail.com> writes:

> OK, here is an example where it really does fail:
>
>
> * heading
> ** one          :tag1:
> *** two
> *** two         :tag2:
> *** TODO two    :tag2:
> *** two         :tag2:
>
>
> Fold up the tree, then do
>
> C-c / m +tag1/! RET
>
> This should find the "TODO two", but it does not, because the
> new regexp moves right past the "one" line and so tag1 is
> overlooked.

Right, thanks for the detailed example.  I reverted the commit;
it should be fine again.

-- 
 Bastien

* Re: org-scan-tags
  2011-02-03 16:37     ` org-scan-tags Bastien
@ 2011-02-03 16:47       ` Carsten Dominik
  2011-02-05  0:36       ` org-scan-tags Ilya Shlyakhter
  1 sibling, 0 replies; 9+ messages in thread
From: Carsten Dominik @ 2011-02-03 16:47 UTC (permalink / raw)
  To: Bastien; +Cc: emacs-orgmode Mode

Thanks for the fast reaction, Bastien!

- Carsten

* Re: org-scan-tags
  2011-02-03 16:37     ` org-scan-tags Bastien
  2011-02-03 16:47       ` org-scan-tags Carsten Dominik
@ 2011-02-05  0:36       ` Ilya Shlyakhter
  1 sibling, 0 replies; 9+ messages in thread
From: Ilya Shlyakhter @ 2011-02-05  0:36 UTC (permalink / raw)
  To: Bastien; +Cc: emacs-orgmode, Carsten Dominik

Thanks for catching this, Carsten!

This could perhaps be fixed by doing a full lookup of the tags up the
hierarchy, rather than relying on the cached tags.
This is more expensive, but if fewer entries actually have to be
looked at (because the search only stops at TODO entries),
it might be faster overall.
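
A sketch of that fallback, assuming a simplified headline syntax and hypothetical helper names: the search regexp only stops at TODO headlines, and for each hit the tags are collected by repeatedly jumping backward to the nearest headline with fewer stars (the parent), instead of reading a cache filled during the scan. On Carsten's tree this recovers :tag1: for "TODO two"; whether it is faster overall would depend on how many TODO entries there are and how deeply they are nested.

```elisp
;; Hypothetical helpers for illustration only; not part of Org.
(defun my-headline-level ()
  "Number of stars of the headline starting at point, or 0."
  (if (looking-at "\\(\\*+\\) ") (length (match-string 1)) 0))

(defun my-local-tags ()
  "Tags written on the headline line at point."
  (save-excursion
    (beginning-of-line)
    (when (re-search-forward "[ \t]:\\([[:alnum:]_@:]+\\):[ \t]*$"
                             (line-end-position) t)
      (split-string (match-string-no-properties 1) ":" t))))

(defun my-tags-by-lookup ()
  "Return local plus inherited tags of the headline at point.
Walks up through the ancestors rather than relying on cached tags."
  (save-excursion
    (beginning-of-line)
    (let ((level (my-headline-level))
          (tags (my-local-tags)))
      (while (and (> level 1)
                  ;; Nearest preceding headline with fewer stars = parent.
                  (re-search-backward
                   (format "^\\*\\{1,%d\\} " (1- level)) nil t))
        (setq level (my-headline-level)
              tags (append (my-local-tags) tags)))
      tags)))

(defun my-todo-entries-with-tags ()
  "Visit only TODO headlines and return the full tag list of each."
  (let ((case-fold-search nil) result)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "^\\*+ +TODO\\>" nil t)
        (push (my-tags-by-lookup) result)))
    (nreverse result)))
```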

One general way to speed up searches would be to move as much work as
possible into Emacs' built-in regexp matcher.  Right now, a search
expression is parsed into an elisp form that is evaluated at each
entry and says whether the entry matches.  Each clause of a search
expression could instead be parsed into an elisp form _and_ a regexp,
such that matching the regexp would be a necessary (but not
sufficient) condition for the entry to match.  E.g. if looking for
entries with property PROP equal to 1, you could construct a regexp
that matches only such property lines.  Some things aren't expressible
in regexp language, so they would still have to be checked in lisp,
and tag lookups could not use the cache.  But if most of the filtering
is done by Emacs' regexp matcher, with only a bit of lisp filtering on
top of that, overall searches might be faster.
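
A minimal sketch of the necessary-but-not-sufficient idea, using the PROP = 1 example. All function names are hypothetical, and the property check assumes the drawer immediately follows the headline (no planning line in between). The regexp quickly finds candidate lines, and a Lisp predicate then confirms each candidate entry, so a ":PROP: 1" line in body text is found by the regexp but rejected by the predicate.

```elisp
;; Hypothetical helpers for illustration only; not part of Org.
(defun my-entry-prop-equal-p (prop value)
  "Non-nil if the entry at point has PROP set to VALUE.
Point must be on the headline.  Assumes the property drawer directly
follows the headline."
  (save-excursion
    (forward-line 1)
    (when (looking-at "[ \t]*:PROPERTIES:[ \t]*$")
      (let ((end (save-excursion
                   (re-search-forward "^[ \t]*:END:[ \t]*$" nil t))))
        (and end
             (re-search-forward
              (format "^[ \t]*:%s:[ \t]+%s[ \t]*$"
                      (regexp-quote prop) (regexp-quote value))
              end t)
             t)))))

(defun my-prefiltered-search (prefilter predicate)
  "Return headlines of entries containing PREFILTER that satisfy PREDICATE.
A PREFILTER match is necessary, not sufficient; PREDICATE is called
with point on the headline of each candidate entry."
  (let ((case-fold-search nil) hits)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward prefilter nil t)
        (let ((hit-end (point)))
          (if (not (re-search-backward "^\\*+ " nil t))
              ;; Match before the first headline: not part of any entry.
              (goto-char hit-end)
            (when (funcall predicate)
              (push (buffer-substring-no-properties
                     (point) (line-end-position))
                    hits))
            ;; Resume at the next headline so each entry is tested once.
            (end-of-line)
            (goto-char (if (re-search-forward "^\\*+ " nil t)
                           (match-beginning 0)
                         (point-max)))))))
    (nreverse hits)))
```

The predicate runs only for entries the regexp already flagged, which is where the hoped-for savings would come from.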

Thread overview: 9+ messages
2010-09-15  3:19 org-scan-tags Ilya Shlyakhter
2010-09-15 14:36 ` org-scan-tags Ilya Shlyakhter
2010-09-15 16:13 ` org-scan-tags Ilya Shlyakhter
2011-02-02 23:17 ` org-scan-tags Bastien
2011-02-03  5:32 ` org-scan-tags Carsten Dominik
2011-02-03 16:13   ` org-scan-tags Carsten Dominik
2011-02-03 16:37     ` org-scan-tags Bastien
2011-02-03 16:47       ` org-scan-tags Carsten Dominik
2011-02-05  0:36       ` org-scan-tags Ilya Shlyakhter

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
