From: Samuel Wales
Subject: Re: Searching inside of attachments (pdf, odt)?
Date: Tue, 13 Oct 2009 10:09:10 -0700
Message-ID: <20524da70910131009oe24948m2fcab864e2c4229a@mail.gmail.com>
In-Reply-To: <20091013100924.147106zsin75yyt0@webmail.df.eu>
To: Karl Maihofer
Cc: emacs-orgmode@gnu.org

Hi,

My idea is to keep it simple at first. Everybody will come up with great ways to integrate with his favorite IR tool. Here I want to focus on the org interface.

The org interface can be the same as any other agenda search, with all the same controls.

The back end can use special-purpose textifiers like pdf2text (or whatever) or general-purpose textifiers from IR tools.
Doesn't matter. Later, the mechanism can get more fancy if desired. But first, we should implement existing behavior.

I often move things to attachments merely because they are large. I don't want search to work differently just because I did that. Search should IMO work the same as it does for outline bodies.

This includes regexp syntax. If we use anything other than Emacs, we risk one regexp syntax for attachments and another for outline bodies. That makes me shudder.

Later, we can use the fancier IR tools, or use reverse indexes. But not everybody has IR tools installed, and reverse indexes might be premature optimization.

If you're worried about speed, this is a perfect, simple application for caching. I'd try it before concluding that it is too slow. If it is, we have a good foundation into which we can hook your favorite IR.

I don't think there's a downside to achieving compatibility and full agenda integration first, then only after that doing the fancy stuff.

Have you tried the agenda search feature yet? If not, perhaps trying it first will help ground the discussion.
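
To make the caching idea concrete, here is a minimal sketch, written in Python only for illustration. Nothing in it is existing Org code: the names `TextCache`, `text_for`, and the `textify` callback are hypothetical. It assumes a textifier is any function that maps a file path to plain text, and that a changed modification time or size means the attachment needs to be textified again. The search itself is left out on purpose, since the point above is that it should stay in Emacs so the regexp syntax matches outline bodies.

```python
import os
from typing import Callable, Dict, Tuple

# Hypothetical type: a textifier turns an attachment path into plain text
# (for example by calling an external PDF-to-text converter).
Textifier = Callable[[str], str]


class TextCache:
    """Cache extracted attachment text; re-extract only when the file changes."""

    def __init__(self, textify: Textifier) -> None:
        self._textify = textify
        # path -> ((mtime in ns, size in bytes), extracted text)
        self._entries: Dict[str, Tuple[Tuple[int, int], str]] = {}
        self.misses = 0  # how many times the textifier actually ran

    def text_for(self, path: str) -> str:
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]  # file unchanged: reuse the extracted text
        # New or modified file: run the (possibly slow) textifier once.
        self.misses += 1
        text = self._textify(path)
        self._entries[path] = (stamp, text)
        return text
```

With something like this in place, repeated searches would only pay the textifying cost for attachments that are new or have changed. Whether that is fast enough in practice is exactly what I would try before reaching for an index.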