From: Samuel Wales <samologist@gmail.com>
To: Karl Maihofer <ignoramus@gmx.de>
Cc: emacs-orgmode@gnu.org
Subject: Re: Searching inside of attachments (pdf, odt)?
Date: Mon, 12 Oct 2009 15:59:30 -0700
Message-ID: <20524da70910121559v6f9ccdfdi5ad27c8c3db07f79@mail.gmail.com>
In-Reply-To: <20091012154016.64827lact9yeuj48@webmail.df.eu>

Hi Karl,

I have been thinking about this recently also, but in a
different direction.  I agree that searching inside
attachments is important.

On Mon, Oct 12, 2009 at 06:40, Karl Maihofer <ignoramus@gmx.de> wrote:
> does anyone use something like Lucene[*] with orgmode to search inside
> attachments like pdf- and odt-files? At the moment I use org for

My idea is to use ordinary agenda search like this:

  1) agenda search displays the headline that has the
     attachment.
  2) org uses an alist to determine the correct textifier
     according to extension.  e.g. '((".pdf" . "pdf2text")).
  3) agenda searches normally (as if the contents of the
     attachment were body text).
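
A minimal sketch of step 2, assuming nothing about how Org itself
would implement it.  Every name prefixed with my- is hypothetical,
not an existing Org function or variable.  The commands are also
assumptions: the alist above says "pdf2text", while this sketch
uses poppler's pdftotext and the odt2txt utility, both of which
must be installed and must write plain text to standard output.

```elisp
;; Hypothetical names throughout: none of this is part of Org.

(defvar my-org-attach-textifiers
  '((".pdf" . "pdftotext %s -")   ; assumes poppler's pdftotext is installed
    (".odt" . "odt2txt %s"))      ; assumes odt2txt is installed
  "Alist mapping a file extension to a shell command template.
%s is replaced by the shell-quoted attachment path.  The command
must write plain text to standard output.")

(defun my-org-attach-textifier (file)
  "Return the command template registered for FILE, or nil."
  ;; With a non-nil second argument, `file-name-extension' keeps the
  ;; leading period, so the result matches keys such as \".pdf\".
  (let ((ext (file-name-extension file t)))
    ;; The final t makes the lookup case-insensitive (.PDF matches .pdf).
    (cdr (assoc-string ext my-org-attach-textifiers t))))

(defun my-org-attach-text (file)
  "Return the textified contents of FILE, or nil if no textifier applies."
  (let ((template (my-org-attach-textifier file)))
    (when template
      (shell-command-to-string
       (format template
               (shell-quote-argument (expand-file-name file)))))))

(defun my-org-attach-matches-p (file regexp)
  "Return non-nil if the textified contents of FILE match REGEXP."
  (let ((text (my-org-attach-text file)))
    (and text (string-match-p regexp text))))
```

Step 3 would then call something like my-org-attach-matches-p on
each attachment of a headline the agenda is already considering,
so the search scope stays whatever the agenda's scope is.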

Note that we are searching only attachments that the agenda
would search.  Thus, "<" in the agenda will work
properly.[1]

Also, note that archived tasks still point to their
attachments.  With the above solution, if you search agenda
files, the results won't be polluted with archived
attachments.  If you use an external solution, you would
have to find a way to exclude the archived attachments.

IR software could still be integrated.  At the very least,
you might choose Lucene as a back-end textifier for all
extensions.

It's not as fancy as integrating IR with all of the IR
features, but it might be a simple solution.


Samuel


[1] This raises another, much more general idea.  Is
there a feature to restrict agenda commands (including
search) to the currently displayed (or even marked) agenda
results?  i.e. you run an agenda search, filter however you
like, then search (or run any custom agenda command) within
those results.  It would allow fast switching among multiple
user-defined sorting strategies (kind of like filtering with
"/"), which is something I've wanted.  But I just thought of
it now, and don't know if it's a good idea.

