* Boolean word/regexp search problem
From: Matt Lundin @ 2009-11-27 17:18 UTC
To: Org Mode
The word/regexp agenda search fails to work with more than one word or regexp
unless the first word or regexp is also preceded by a "+" or "-".
Take the following example.
--8<---------------cut here---------------start------------->8---
* Org-mode
Org mode is a major mode for Emacs written by Carsten Dominik.
--8<---------------cut here---------------end--------------->8---
Let's say I search for Emacs with "C-c a s [RET] Emacs". So far, so good:
this item appears in the results. But let's say I want to narrow down
the search. When I press "[" to add a search term, I see the following
prompt in the minibuffer:
[+-]Word/{Regexp} ...: Emacs +
If I complete the prompt as given ("Emacs +Carsten"), there are no
results.
The search only succeeds if I add a "+" in front of Emacs as well, i.e.,
"+Emacs +Carsten".
The same behavior occurs with exclusion ("-") and with the regexp
search (i.e., braces).
Two questions:
1) Do boolean word/regexp searches require a "+" or "-" symbol before
the first word/regexp? If so, this is a bit confusing, since tag and
property searches do not require an initial symbol. (E.g.,
"emacs+orgmode" works as a tag search.)
2) If boolean word/regexp searches do require an initial "+" or "-", could the
prompt after pressing "[" or "]" or "{" or "}" in the search results
buffer be amended to add a plus in front of the first search term?
Here is the relevant portion of the manual:
,----[10.5 Commands in the agenda buffer]
| `[ ] { }'
|
| in search view
| add new search words (`[' and `]') or new regular expressions
| (`{' and `}') to the query string. The opening bracket/brace
| will add a positive search term prefixed by `+', indicating
| that this search term must occur/match in the entry. The
| closing bracket/brace will add a negative search term which
| must not occur/match in the entry for it to be selected.
`----
Thanks,
Matt
* Re: Boolean word/regexp search problem
From: Matthew Lundin @ 2009-11-27 17:37 UTC
To: Matt Lundin; +Cc: Org Mode
O.K., I just found the variable org-agenda-search-view-search-words-only:
,----
| Non-nil means, the search string is interpreted as individual words
| The search then looks for each word separately in each entry and
| selects entries that have matches for all words.
| When nil, matching as loose words will only take place if the first
| word is preceded by + or -. If that is not the case, the search
| string will just be matched as a substring in the entry, but with
| each space character allowing for any whitespace, including newlines.
`----
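As a rough illustration of the substring behaviour that docstring describes, here is a small Python sketch. It is not Org's Emacs Lisp code; the function name `phrase_to_regex`, the sample entry, and the case-insensitive matching are assumptions made for the example. It also shows why "Emacs +Carsten" finds nothing when it is treated as a phrase: the "+" is searched for literally.

```python
import re

# Sample entry from the first message, with a line break added on purpose.
ENTRY = "* Org-mode\nOrg mode is a major mode for Emacs written by\nCarsten Dominik."


def phrase_to_regex(phrase: str) -> re.Pattern:
    """Match `phrase` as a substring of an entry.

    Each run of spaces in the phrase may match any whitespace in the
    entry, including newlines. Every other character is taken literally.
    Case-insensitive matching is an assumption of this sketch.
    """
    parts = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


def phrase_found(phrase: str, entry: str = ENTRY) -> bool:
    """Return True if the phrase occurs somewhere in the entry."""
    return phrase_to_regex(phrase).search(entry) is not None
```

With this sketch, `phrase_found("by Carsten")` succeeds across the line break and `phrase_found("Carst")` succeeds as a partial word, while `phrase_found("Emacs +Carsten")` fails because no literal "+Carsten" follows "Emacs" in the entry.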
Please disregard question one above, at least insofar as it applies to
word searches (I'm still trying to work out the regexps). But re:
question two, would it be worthwhile to add a "+" to the beginning of
the search string when org-agenda-manipulate-query is invoked in a search
agenda buffer?
Thanks,
Matt
* Re: Boolean word/regexp search problem
From: Matt Lundin @ 2009-11-27 19:54 UTC
Cc: Org Mode
Hi Carsten,
Matthew Lundin <mdl@imapmail.org> writes:
> Matt Lundin <mdl@imapmail.org> writes:
>
>> The word/regexp agenda search fails to work with more than one word or regexp
>> unless the first word or regexp is also preceded by a "+" or "-".
I've investigated this further and beg your permission to offer a few
comments/suggestions.
First, I apologize for missing the change in behavior in the
org-search-view introduced in Org 6.32. Reading the ChangeLog, I now see
the following information:
,----
| Agenda Search view: Search for substrings
|
| The default in search view (C-c a s) is now that the search expression
| is searched for as a substring, i.e. the different words must occur in
| direct sequence, and it may be only part of a word. If you want to
| look for a number of separate keywords with Boolean logic, all words
| must be preceded by + or -.
|
| This was, more-or-less, requested by John Wiegley.
`----
In particular, I see that "all words must be preceded by + or -" for a
boolean search. I've also read the manual section 10.3.5 as well as the
docstring for org-search-view and appreciate that this new behavior can
be turned off with the variable
org-agenda-search-view-search-words-only.
A few comments:
1) I'm wondering whether the substring search should be the default. I
search quite often for two or three words or regexps that I know are in
an entry (regardless of order), while I rarely search for a specific
phrase or sequence of words. Of course, others might disagree.
2) Many web and database search engines use the following convention: a
space between words becomes an automatic AND, while quotation marks
indicate searches for a phrase/substring (i.e., words in sequence).
Having missed the description of the new behavior in the ChangeLog, I
found the new default substring search a bit counter-intuitive. My vote
would be for sloppy boolean searches by default, with quotation marks
reserved for substring searches. But of course, this is not a huge
priority for org-mode development, and I have no idea how difficult it
would be to implement!
3) The new substring search changes the behavior of regexp searches. A
simple regexp search with braces (e.g., {Carst}) no longer produces any
results unless the braces are preceded by a +. This is true even if
one is searching only for a single regexp. In other words, regexp
braces now *must* always be preceded by a plus or a minus. Is this the
intended behavior?
4) Pressing "[" or "]" or "{" or "}" in the agenda buffer adds a "+" or
"-" after the first term in the minibuffer. E.g.,
--8<---------------cut here---------------start------------->8---
[+-]Word/{Regexp} ...: Emacs +
--8<---------------cut here---------------end--------------->8---
But if the user simply adds another term at the cursor (i.e., after the
"+"), the search will fail, since "Emacs" now must also be preceded by a
"+".
Thanks for reading this long email.
- Matt
* Re: Re: Boolean word/regexp search problem
From: Carsten Dominik @ 2010-01-05 11:17 UTC
To: Matt Lundin; +Cc: Org Mode
Hi Matt,
On Nov 27, 2009, at 8:54 PM, Matt Lundin wrote:
> Hi Carsten,
>
> Matthew Lundin <mdl@imapmail.org> writes:
>
>> Matt Lundin <mdl@imapmail.org> writes:
>>
>>> The word/regexp agenda search fails to work with more than one word or
>>> regexp unless the first word or regexp is also preceded by a "+" or "-".
>
> I've investigated this further and beg your permission to offer a few
> comments/suggestions.
>
> First, I apologize for missing the change in behavior in the
> org-search-view introduced in Org 6.32. Reading the ChangeLog, I now see
> the following information:
>
> ,----
> | Agenda Search view: Search for substrings
> |
> | The default in search view (C-c a s) is now that the search expression
> | is searched for as a substring, i.e. the different words must occur in
> | direct sequence, and it may be only part of a word. If you want to
> | look for a number of separate keywords with Boolean logic, all words
> | must be preceded by + or -.
> |
> | This was, more-or-less, requested by John Wiegley.
> `----
>
> In particular, I see that "all words must be preceded by + or -"
In fact, only the first word needs the "+". For any additional words the
plus is optional; only a "-" must be given explicitly. I have improved the
documentation here.
> for a boolean search. I've also read the manual section 10.3.5 as well as
> the docstring for org-search-view and appreciate that this new behavior
> can be turned off with the variable
> org-agenda-search-view-search-words-only.
>
> A few comments:
>
> 1) I'm wondering whether the substring search should be the default. I
> search quite often for two or three words or regexps that I know are in
> an entry (regardless of order), while I rarely search for a specific
> phrase or sequence of words. Of course, others might disagree.
I think the main application is actually not looking for a phrase,
but looking for a partial word - which was impossible before this
change.
>
> 2) Many web and database search engines use the following convention: a
> space between words becomes an automatic AND,
That is right.
> while quotation marks
> indicate searches for a phrase/substring (i.e., words in sequence).
Yes. This is a bit of a hassle to implement. But I agree that this
would be nice to have - if the search is Boolean. OK, this is now
in as well.
> Having missed the description of the new behavior in the ChangeLog, I
> found the new default substring search a bit counter-intuitive. My vote
> would be for sloppy boolean searches by default, with quotation marks
> reserved for substring searches. But of course, this is not a huge
> priority for org-mode development, and I have no idea how difficult it
> would be to implement!
This is really a matter of taste. In an email to me, John argues for
behavior that is consistent with Emacs itself rather than with other
programs:
> I realize that search engines work differently than Emacs in several
> cases. For example, if you type M-x search-forward, then foo, Emacs
> will do a substring search for foo, not a complete string search.
> In fact, it takes work to get Emacs to do a precise word
> search (you have to re-search, then use \<foo\>), and so it seemed
> odd to me that Org-mode made this its default.
Also, the prompt was really bad, suggesting a Boolean search in any case.
Now the prompt does a better job, I think.
> 3) The new substring search changes the behavior of regexp searches. A
> simple regexp search with braces (e.g., {Carst}) no longer produces any
> results unless the braces are preceded by a +. This is true even if
> one is searching only for a single regexp. In other words, regexp
> braces now *must* always be preceded by a plus or a minus. Is this the
> intended behavior?
This is a bug, which I just fixed. If the first thing is a regexp, this
will turn on Boolean search as well. Please verify that this is
indeed fixed.
>
> 4) Pressing "[" or "]" or "{" or "}" in the agenda buffer adds a "+" or
> "-" after the first term in the minibuffer. E.g.,
>
> --8<---------------cut here---------------start------------->8---
> [+-]Word/{Regexp} ...: Emacs +
> --8<---------------cut here---------------end--------------->8---
>
> But if the user simply adds another term at the cursor (i.e., after the
> "+"), the search will fail, since "Emacs" now must also be preceded by a
> "+".
I don't think so; see above. The additional "+" is, in fact, optional;
a space is enough.
Another improvement I made is that the "+" is only added by "[" if
the last search was Boolean. If not, you simply get back to edit
the phrase.
> Thanks for reading this long email.
Thanks for putting so much time in helping to improve Org-mode!
I have tried to improve the logic of all this a bit, but I am
sticking with the default for phrase search. It is important
to keep John Wiegley happy :-) and I quite like it this way.
The prompt is now more explicit about what is expected, and
you can default to Boolean search by setting the variable
`org-agenda-search-view-always-boolean' if you prefer.
Hope I am also keeping *you* happy this way :-)
Here is the new docstring for org-search-view, which explains
things a bit better.
--------------------------------------------------------------------------
Show all entries that contain a phrase or words or regular expressions.
With optional prefix argument TODO-ONLY, only consider entries that are
TODO entries. The argument STRING can be used to pass a default search
string into this function. If EDIT-AT is non-nil, it means that the
user should get a chance to edit this string, with cursor at position
EDIT-AT.
The search string can be viewed either as a phrase that should be found as
is, or it can be broken into a number of snippets, each of which must match
in a Boolean way to select an entry. The default depends on the variable
`org-agenda-search-view-always-boolean'.
Even if this is turned off (the default) you can always switch to
Boolean search dynamically by preceding the first word with \"+\" or
\"-\".
The default is a direct search of the whole phrase, where each space in
the search string can expand to an arbitrary amount of whitespace,
including newlines.
If using a Boolean search, the search string is split on whitespace and
each snippet is searched separately, with logical AND to select an entry.
Words prefixed with a minus must *not* occur in the entry. Words without
a prefix or prefixed with a plus must occur in the entry. Matching is
case-insensitive. Words are enclosed by word delimiters (i.e. they must
match whole words, not parts of a word) if
`org-agenda-search-view-force-full-words' is set (default is nil).
Boolean search snippets enclosed by curly braces are interpreted as
regular expressions that must or (when preceded with \"-\") must not
match in the entry.
- If the search string starts with an asterisk, search only in
  headlines.
- If (possibly after the leading star) the search string starts with an
  exclamation mark, this also means to look at TODO entries only, an
  effect that can also be achieved with a prefix argument.
- If (possibly after star and exclamation mark) the search string starts
  with a colon, this will mean that the snippets of the boolean search
  must match as full words.
This command searches the agenda files, and in addition the files listed
in `org-agenda-text-search-extra-files'.
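
The Boolean rules in this docstring can be sketched roughly as follows. This is a Python illustration only, not the Emacs Lisp implementation. The names `Query`, `parse_query` and `entry_matches` are hypothetical, regexps use Python syntax rather than Emacs syntax, and the sketch assumes that snippets contain no spaces, so quoted phrases and regexps with spaces are not handled.

```python
import re
from dataclasses import dataclass


@dataclass
class Query:
    headline_only: bool  # search string started with "*"
    todo_only: bool      # then "!"
    full_words: bool     # then ":"
    boolean: bool        # Boolean search rather than phrase search
    body: str            # search string with the leading flags removed


def parse_query(raw: str, always_boolean: bool = False) -> Query:
    """Strip the optional leading "*", "!" and ":" flags, in that order."""
    rest = raw
    flags = []
    for flag in "*!:":
        present = rest.startswith(flag)
        flags.append(present)
        if present:
            rest = rest[1:]
    # Boolean mode is on if forced by the option, or if the first
    # snippet starts with "+" or "-", or is a {regexp}.
    boolean = always_boolean or rest[:1] in ("+", "-", "{")
    return Query(flags[0], flags[1], flags[2], boolean, rest)


def entry_matches(entry: str, query: Query) -> bool:
    """Apply the Boolean snippets of `query` to `entry` with logical AND."""
    for snippet in query.body.split():
        negate = snippet.startswith("-")
        term = snippet[1:] if snippet[0] in "+-" else snippet
        if term.startswith("{") and term.endswith("}"):
            pattern = term[1:-1]                 # regexp, used as given
        else:
            pattern = re.escape(term)            # literal word
            if query.full_words:
                pattern = r"\b" + pattern + r"\b"
        found = re.search(pattern, entry, re.IGNORECASE) is not None
        if found == negate:                      # missing "+" term or present "-" term
            return False
    return True
```

In this sketch "+Emacs Carsten" and "+Emacs +Carsten" select the same entries, "Emacs +Carsten" is only Boolean when `always_boolean` is set (standing in for `org-agenda-search-view-always-boolean'), and a leading "{Carst}" switches Boolean search on by itself.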
- Carsten