From: Samuel Wales
Subject: Re: Searching inside of attachments (pdf, odt)?
Date: Mon, 12 Oct 2009 15:59:30 -0700
Message-ID: <20524da70910121559v6f9ccdfdi5ad27c8c3db07f79@mail.gmail.com>
In-Reply-To: <20091012154016.64827lact9yeuj48@webmail.df.eu>
To: Karl Maihofer
Cc: emacs-orgmode@gnu.org

Hi Karl,

I have also been thinking about this recently, but in a different direction. I agree that searching inside attachments is important.

On Mon, Oct 12, 2009 at 06:40, Karl Maihofer wrote:
> does anyone use something like Lucene[*] with orgmode to search inside
> attachments like pdf- and odt-files? At the moment I use org for

My idea is to use the ordinary agenda search, like this:

1) Agenda search displays the headline that has the attachment.
2) Org uses an alist to pick the correct textifier for each file extension, e.g. '((".pdf" . "pdf2text")).

3) Agenda search then proceeds normally, as if the contents of the attachment were body text.

Note that we would search only the attachments of entries that the agenda would search anyway. Thus, restricting with "<" in the agenda will work properly.[1]

Also note that archived tasks still point to their attachments. With the above approach, searching the agenda files won't pollute the results with archived attachments. With an external solution, you would have to find some other way to exclude them.

IR (information retrieval) software could still be integrated. At the very least, you could choose Lucene as the back-end textifier for all extensions. That is not as fancy as integrating IR with all of its features, but it might be a simple solution.

Samuel

[1] This raises another, much more general idea. Is there a feature to restrict agenda commands (including search) to the currently displayed (or even marked) agenda results? That is, you run an agenda search, filter however you like, and then search (or run any custom agenda command) within those results. It would allow fast switching among multiple user-defined sorting strategies (kind of like filtering with "/"), which is something I've wanted. But I just thought of it now, and don't know whether it's a good idea.
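The three-step proposal above can be sketched outside Emacs to show the data flow. This is a minimal Python illustration under stated assumptions, not Org code: the `TEXTIFIERS` table only mirrors the alist idea from step 2, the function names `textify` and `search_attachments` are hypothetical, and the commands (`pdftotext`, `odt2txt`, `cat`) are assumed stand-ins for whatever textifier is installed. The `entries` list stands in for the entries the agenda would search anyway, which is what keeps archived attachments out of the results.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical analogue of the alist in step 2: file extension -> command
# that prints the attachment's text on stdout. "{file}" is replaced by the
# attachment path. These commands are assumptions; swap in any textifier
# you actually have (possibly a Lucene-based one).
TEXTIFIERS = {
    ".pdf": ["pdftotext", "{file}", "-"],
    ".odt": ["odt2txt", "{file}"],
    ".txt": ["cat", "{file}"],
}


def textify(path, textifiers=TEXTIFIERS):
    """Return the attachment's text, or None if it cannot be textified."""
    cmd = textifiers.get(Path(path).suffix.lower())
    if cmd is None:
        return None  # no textifier registered for this extension
    args = [str(path) if arg == "{file}" else arg for arg in cmd]
    try:
        result = subprocess.run(args, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return None  # the textifier program itself is not installed
    return result.stdout if result.returncode == 0 else None


def search_attachments(entries, pattern, textifiers=TEXTIFIERS):
    """Steps 1 and 3: return headlines whose attachment text matches.

    `entries` is a list of (headline, [attachment paths]) pairs standing in
    for the entries the agenda would search (a hypothetical input format).
    Matching is case-insensitive in this sketch.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for headline, attachments in entries:
        for path in attachments:
            text = textify(path, textifiers)
            if text is not None and regex.search(text):
                hits.append(headline)
                break  # one matching attachment is enough to list the headline
    return hits
```

For instance, if `notes.pdf` mentions Lucene and `pdftotext` is installed, `search_attachments([("Project notes", ["notes.pdf"])], "lucene")` would list "Project notes", just as the agenda would list a headline whose body text matched.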