From: Carsten Dominik
Subject: Re: Re: Boolean word/regexp search problem
Date: Tue, 5 Jan 2010 12:17:59 +0100
Message-ID: <242B91B8-9615-49E6-A245-ABFB3E29EDBC@gmail.com>
To: Matt Lundin
Cc: Org Mode

Hi Matt,

On Nov 27, 2009, at 8:54 PM, Matt Lundin wrote:

> Hi Carsten,
>
> Matthew Lundin writes:
>
>> Matt Lundin writes:
>>
>>> The word/regexp agenda search to work with more than one word or
>>> regexp unless the first word or regexp is also preceded by a "+" or "-".
>
> I've investigated this further and beg your permission to offer a few
> comments/suggestions.
>
> First, I apologize for missing the change in behavior in the
> org-search-view introduced in Org 6.32.
> Reading the ChangeLog, I now see the following information:
>
> ,----
> | Agenda Search view: Search for substrings
> |
> | The default in search view (C-c a s) is now that the search expression
> | is searched for as a substring, i.e. the different words must occur in
> | direct sequence, and it may be only part of a word. If you want to
> | look for a number of separate keywords with Boolean logic, all words
> | must be preceded by + or -.
> |
> | This was, more-or-less, requested by John Wiegley.
> `----
>
> In particular, I see that "all words must be preceded by + or -"

In fact, only the first word needs the "+". For any additional words the
plus is optional; only a "-" is necessary. I have improved the
documentation here.

> for a boolean search. I've also read the manual section 10.3.5 as well
> as the docstring for org-search-view and appreciate that this new
> behavior can be turned off with the variable
> org-agenda-search-view-search-words-only.
>
> A few comments:
>
> 1) I'm wondering whether the substring search should be the default. I
> search quite often for two or three words or regexps that I know are in
> an entry (regardless of order), while I rarely search for a specific
> phrase or sequence of words. Of course, others might disagree.

I think the main application is actually not looking for a phrase, but
looking for a partial word, which was impossible before this change.

> 2) Many web and database search engines use the following convention: a
> space between words becomes an automatic AND,

That is right.

> while quotation marks indicate searches for a phrase/substring (i.e.,
> words in sequence).

Yes. This is a bit of a hassle to implement, but I agree that it would be
nice to have if the search is Boolean. OK, this is now in as well.

> Having missed the description of the new behavior in the ChangeLog, I
> found the new default substring search a bit counter-intuitive.
> My vote would be for sloppy boolean searches by default, with quotation
> marks reserved for substring searches. But of course, this is not a huge
> priority for org-mode development, and I have no idea how difficult it
> would be to implement!

This is really a matter of taste. John argues in an email to me for
behavior that is consistent with Emacs itself rather than with other
programs:

> I realize that search engines work differently than Emacs in several
> cases. For example, if you type M-x search-forward, then foo, Emacs
> will do a substring search for foo, not a complete string search.
> In fact, it takes work to get Emacs to do a precise word
> search (you have to re-search, then use \), and so it seemed
> odd to me that Org-mode made this its default.

Also, the prompt was really bad, suggesting a Boolean search in any case.
Now the prompt does a better job, I think.

> 3) The new substring search changes the behavior of regexp searches. A
> simple regexp search with brackets (e.g, {Carst}) no longer produces any
> results unless the brackets are preceded by a +. This is true even if
> one is searching only for a single regexp. In other words, regexp
> brackets now *must* always be preceded by a plus or a minus. Is this the
> intended behavior?

This is a bug, which I just fixed. If the first snippet is a regexp, this
turns on Boolean search as well. Please verify that this is indeed fixed.

> 4) Pressing "[" or "]" or "{" or "}" in the agenda buffer adds a "+" or
> "-" after the first term in the minibuffer. E.g.,
>
> --8<---------------cut here---------------start------------->8---
> [+-]Word/{Regexp} ...: Emacs +
> --8<---------------cut here---------------end--------------->8---
>
> But if the user simply adds another term at the cursor (i.e., after the
> "+"), the search will fail, since "Emacs" now must also be preceded by a
> "+".

I don't think so; see above. The additional "+" is, in fact, optional, and
a space is enough.
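The rules described above for switching between phrase and Boolean search can be modelled in a few lines. This is a minimal Python sketch, not Org's Emacs Lisp implementation: `parse_search`, `Snippet`, and the `always_boolean` flag (standing in for `org-agenda-search-view-always-boolean`) are hypothetical names, and the handling of double-quoted phrases is an assumption about the Boolean-mode quoting mentioned in the reply to point 2.

```python
import re
from dataclasses import dataclass


@dataclass
class Snippet:
    required: bool   # True for "+" or no prefix, False for "-"
    text: str        # word, quoted phrase, or body of a {regexp}
    is_regexp: bool  # True if the snippet was written as {regexp}


# A token is an optionally signed "quoted phrase" or any run of non-space characters.
TOKEN_RE = re.compile(r'[+-]?"[^"]*"|\S+')


def parse_search(query: str, always_boolean: bool = False):
    """Return ("phrase", query) or ("boolean", [Snippet, ...]).

    Hypothetical model: Boolean mode is on if `always_boolean` is set or if
    the first snippet starts with "+", "-", or "{"; later words need no "+".
    """
    query = query.strip()
    if not (always_boolean or query[:1] in ("+", "-", "{")):
        return "phrase", query
    snippets = []
    for token in TOKEN_RE.findall(query):
        required = not token.startswith("-")
        if token[0] in "+-":
            token = token[1:]  # drop the sign; no sign means "+"
        is_regexp = len(token) >= 2 and token[0] == "{" and token[-1] == "}"
        is_quoted = len(token) >= 2 and token[0] == '"' and token[-1] == '"'
        if is_regexp or is_quoted:
            token = token[1:-1]  # strip the braces or quotes
        if token:
            snippets.append(Snippet(required, token, is_regexp))
    return "boolean", snippets
```

For example, `parse_search("Emacs +org")` stays a phrase search because only the first snippet decides the mode, while `parse_search("+Emacs org")` yields two required snippets, matching the point that later words need only a space.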
Another improvement I made is that "[" only adds the "+" if the last
search was Boolean. If not, you simply get back to editing the phrase.

> Thanks for reading this long email.

Thanks for putting so much time into helping to improve Org-mode!

I have tried to improve the logic of all this a bit, but I am sticking
with phrase search as the default. It is important to keep John Wiegley
happy :-) and I quite like it this way. The prompt is now more explicit
about what is expected, and you can default to Boolean search by setting
the variable `org-agenda-search-view-always-boolean' if you prefer. Hope I
am also keeping *you* happy this way :-)

Here is the new docstring for org-search-view, which explains things a
bit better.

--------------------------------------------------------------------------
Show all entries that contain a phrase or words or regular expressions.

With optional prefix argument TODO-ONLY, only consider entries that are
TODO entries. The argument STRING can be used to pass a default search
string into this function. If EDIT-AT is non-nil, it means that the
user should get a chance to edit this string, with cursor at position
EDIT-AT.

The search string can be viewed either as a phrase that should be found
as is, or it can be broken into a number of snippets, each of which must
match in a Boolean way to select an entry. The default depends on the
variable `org-agenda-search-view-always-boolean'. Even if this is turned
off (the default) you can always switch to Boolean search dynamically by
preceding the first word with \"+\" or \"-\".

The default is a direct search of the whole phrase, where each space in
the search string can expand to an arbitrary amount of whitespace,
including newlines.

If using a Boolean search, the search string is split on whitespace and
each snippet is searched separately, with logical AND to select an entry.
Words prefixed with a minus must *not* occur in the entry.
Words without a prefix or prefixed with a plus must occur in the entry.
Matching is case-insensitive. Words are enclosed by word delimiters
(i.e. they must match whole words, not parts of a word) if
`org-agenda-search-view-force-full-words' is set (default is nil).

Boolean search snippets enclosed by curly braces are interpreted as
regular expressions that must or (when preceded with \"-\") must not
match in the entry.

- If the search string starts with an asterisk, search only in headlines.
- If (possibly after the leading star) the search string starts with an
  exclamation mark, this also means to look at TODO entries only, an
  effect that can also be achieved with a prefix argument.
- If (possibly after star and exclamation mark) the search string starts
  with a colon, this will mean that the snippets of the Boolean search
  must match as full words.

This command searches the agenda files, and in addition the files listed
in `org-agenda-text-search-extra-files'.

- Carsten
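The matching rules in the org-search-view docstring above can be approximated as follows. This is a rough Python model, not Org's Emacs Lisp code: `phrase_matches`, `snippet_matches`, `boolean_matches`, and the `full_words` flag (standing in for `org-agenda-search-view-force-full-words`) are hypothetical. Python `re` syntax replaces Emacs regexps, `\b` only approximates Emacs word delimiters, and the `*`, `!`, and `:` prefixes are not modelled.

```python
import re


def phrase_matches(entry: str, phrase: str) -> bool:
    """Phrase search: substring match where each space may match any whitespace run."""
    pattern = r"\s+".join(re.escape(word) for word in phrase.split())
    return re.search(pattern, entry, re.IGNORECASE) is not None


def snippet_matches(entry: str, text: str, is_regexp: bool, full_words: bool = False) -> bool:
    """Does one Boolean snippet occur in the entry (case-insensitively)?"""
    if is_regexp:
        pattern = text  # {regexp} snippets are used as regular expressions
    else:
        pattern = re.escape(text)
        if full_words:
            pattern = r"\b" + pattern + r"\b"  # approximates Emacs word delimiters
    return re.search(pattern, entry, re.IGNORECASE) is not None


def boolean_matches(entry: str, snippets, full_words: bool = False) -> bool:
    """Logical AND over (required, text, is_regexp) triples.

    A required snippet must match; an excluded ("-") snippet must not.
    """
    return all(
        snippet_matches(entry, text, is_regexp, full_words) == required
        for required, text, is_regexp in snippets
    )
```

Note how the phrase search still finds partial words ("Ema" matches "Emacs") unless full-word matching is requested, which is the use case Carsten gives for making it the default.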