John's amazing indexing posts

emacs-orgmode@gnu.org archives

* John's amazing indexing posts
@ 2015-07-13  9:41 Matt Price
  2015-07-13 14:31 ` John Kitchin
  0 siblings, 1 reply; 9+ messages in thread
From: Matt Price @ 2015-07-13  9:41 UTC (permalink / raw)
  To: Org Mode

Not sure if everyone has seen John's latest post about indexing org files
with swish-e:

http://kitchingroup.cheme.cmu.edu/blog/2015/07/06/Indexing-headlines-in-org-files-with-swish-e-with-laser-sharp-results/

It's very impressive.  It strikes me as a step towards an incredibly
ambitious project that would bring file indexing inside of Emacs -- so it
would no longer be necessary to go out to a shell or a Desktop Search tool
in order to find files that contain particular search terms.  I'm looking
forward to your next steps, John!

Matt

* Re: John's amazing indexing posts
  2015-07-13  9:41 John's amazing indexing posts Matt Price
@ 2015-07-13 14:31 ` John Kitchin
  2015-07-27  5:16   ` Erik Hetzner
  0 siblings, 1 reply; 9+ messages in thread
From: John Kitchin @ 2015-07-13 14:31 UTC (permalink / raw)
  To: Matt Price; +Cc: Org Mode

Thanks Matt,

That is also my impression of where this will go. Eventually this will
move towards a database search engine, e.g. like Oleg's project at
https://github.com/wvxvw/sphinx-mode. I am not sure precisely which
direction though. Swish-e is nice, but at the moment you cannot
incrementally update the database, and full indexing is required every
time. I am not sure that is fixable, and swish-e does not do
unicode. There are half a dozen or so candidates to go forward on, and
they all have some pros and cons to think about.

It has a lot of other applications in org too, e.g. a file-system wide
agenda, tag search, etc...

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu

* Re: John's amazing indexing posts
  2015-07-13 14:31 ` John Kitchin
@ 2015-07-27  5:16   ` Erik Hetzner
  2015-07-27 13:19     ` Oleh Krehel
  0 siblings, 1 reply; 9+ messages in thread
From: Erik Hetzner @ 2015-07-27  5:16 UTC (permalink / raw)
  To: Org Mode; +Cc: John Kitchin

Hi all,

I previously hooked up org with recoll with pretty good results. I’ve
written this up for worg, but I have my ssh key on a different
machine, so I can’t push now. Here is the info for the record.

** Recoll
In order to index using the [[http://www.lesbonscomptes.com/recoll/][recoll]] search engine, you will want to add
the following to your =~/.recoll/mimeinfo= file:

#+BEGIN_SRC
.org  = text/x-org
.org_archive  = text/x-org
#+END_SRC

You will also need a shell script to convert your org mode files to
HTML in batch mode. The script takes as an argument the file to
convert and prints the output to stdout. Here is an example:

#+BEGIN_SRC sh
#!/bin/sh
# Export the Org file given as $1 to HTML and print the result on stdout.
emacs --batch --eval "(progn (find-file \"$1\") (org-html-export-as-html) (set-buffer \"*Org HTML Export*\") (princ (buffer-string)))"
#+END_SRC

You will also need the following in your =~/.recoll/mimeconf=:

#+BEGIN_SRC
[index]
text/x-org = exec /home/egh/.recoll/rclorg ;
  mimetype = text/html
#+END_SRC

Now, rebuild your recoll index. Org mode files should be converted to
HTML and indexed. It will take some time, because emacs will be
launched for each conversion. An alternative is to use [[http://pandoc.org][pandoc]] to do
the conversion. It can be configured as follows in your
=~/.recoll/mimeconf= file:

#+BEGIN_SRC
[index]
text/x-org = exec pandoc -s -f org -t html5 ;
  mimetype = text/html
#+END_SRC

If you want, you might change the pandoc template or org mode output
to generate =meta= tags that will be recognized by recoll. See
http://www.lesbonscomptes.com/recoll/usermanual/RCL.PROGRAM.html#RCL.PROGRAM.FILTERS.HTML
for details.

* Re: John's amazing indexing posts
  2015-07-27  5:16   ` Erik Hetzner
@ 2015-07-27 13:19     ` Oleh Krehel
  2015-07-27 14:19       ` John Kitchin
  2015-07-27 16:40       ` Erik Hetzner
  0 siblings, 2 replies; 9+ messages in thread
From: Oleh Krehel @ 2015-07-27 13:19 UTC (permalink / raw)
  To: Erik Hetzner; +Cc: Org Mode, John Kitchin

Hi Erik,

Erik Hetzner <egh@e6h.org> writes:
> I previously hooked up org with recoll with pretty good results.
> ...
> http://www.lesbonscomptes.com/recoll/

Thanks for the link. I tried it, and it actually works great on my
system (unlike swish-e). And I did no configuration of mimeinfo, I only
told it to index my whole org/ directory.

If anyone is interested, I've added an Emacs interface to recollq (a
shell tool that comes with recoll that you have to build yourself).  See
counsel-recoll command from
https://github.com/abo-abo/swiper/blob/master/counsel.el.

Initially, this command gives you a list of files that match the
query. After selecting the file, it's searched for the current query.
Unfortunately, the -A (abstract) switch isn't as useful as the context
that e.g. grep gives, so I went only with the file names.

regards,
Oleh

* Re: John's amazing indexing posts
  2015-07-27 13:19     ` Oleh Krehel
@ 2015-07-27 14:19       ` John Kitchin
  2015-07-27 16:40       ` Erik Hetzner
  1 sibling, 0 replies; 9+ messages in thread
From: John Kitchin @ 2015-07-27 14:19 UTC (permalink / raw)
  To: Oleh Krehel; +Cc: Erik Hetzner, Org Mode

There is also a helm-recoll package available. I came across recoll a
few weeks ago when I saw that package! It also basically worked as
advertised and I am just looking forward to some free time to figure out
how to get the super focused search I worked out for swish-e.

Has anyone tried using emacsclient for the html export? Maybe that would
speed up indexing if a whole new emacs isn't spun up each time.
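
One possible way to try this, as a rough sketch rather than a tested recoll
filter: send the export to an Emacs server that is already running (started
with, e.g., `emacs --daemon` or `M-x server-start`). Since emacsclient prints
the Lisp return value of the form rather than the exported text, the sketch
exports to a temporary file and prints that. The function name
`org_to_html_via_server` and the `EMACSCLIENT` override variable are
hypothetical, introduced only for illustration.

```sh
#!/bin/sh
# Sketch: convert one Org file to HTML through a running Emacs server.
# Assumptions: a server is already running, $1 is an absolute path, and the
# path contains no double quotes or backslashes (it is pasted into a Lisp
# string unescaped).
# EMACSCLIENT is a hypothetical override, e.g. to point at another binary.
org_to_html_via_server() {
    src=$1
    out=$(mktemp) || return 1
    # Export inside the server to a temporary file; discard the printed
    # Lisp return value.
    "${EMACSCLIENT:-emacsclient}" --eval \
        "(progn (require 'ox-html) (with-temp-buffer (insert-file-contents \"$src\") (org-mode) (org-export-to-file 'html \"$out\")))" \
        >/dev/null || { rm -f "$out"; return 1; }
    cat "$out"
    rm -f "$out"
}

# Usage inside a filter script (needs a running server):
#   org_to_html_via_server "$1"
```

Whether this is actually faster than `emacs --batch` per file would need to
be measured; the server also exports using whatever Org configuration it has
loaded.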

* Re: John's amazing indexing posts
  2015-07-27 13:19     ` Oleh Krehel
  2015-07-27 14:19       ` John Kitchin
@ 2015-07-27 16:40       ` Erik Hetzner
  2015-07-28  8:14         ` Oleh Krehel
  1 sibling, 1 reply; 9+ messages in thread
From: Erik Hetzner @ 2015-07-27 16:40 UTC (permalink / raw)
  To: Oleh Krehel; +Cc: Org Mode, John Kitchin

Hi Oleh,

On Mon, 27 Jul 2015 06:19:35 -0700,
Oleh Krehel <ohwoeowho@gmail.com> wrote:
> 
> 
> Hi Erik,
> 
> Erik Hetzner <egh@e6h.org> writes:
> > I previously hooked up org with recoll with pretty good results.
> > ...
> > http://www.lesbonscomptes.com/recoll/
> 
> Thanks for the link. I tried it, and it actually works great on my
> system (unlike swish-e). And I did no configuration of mimeinfo, I only
> told it to index my whole org/ directory.

Of course! I’ve forgotten exactly what I was intending to accomplish
by converting to HTML - I believe I was generating citations - but
recoll should be able to index plain text without issue. This will
probably work better for org files, as well, since you can search the
complete content.

> If anyone is interested, I've added an Emacs interface to recollq (a
> shell tool that comes with recoll that you have to build yourself).  See
> counsel-recoll command from
> https://github.com/abo-abo/swiper/blob/master/counsel.el.
> 
> Initially, this command gives you a list of files that match the
> query. After selecting the file, it's searched for the current query.
> Unfortunately, the -A (abstract) switch isn't as useful as the context
> that e.g. grep gives, so I went only with the file names.

This is great, especially for me, as I’ve been using ivy lately.

I believe that you can rewrite using the recoll tool directly instead
of recollq, using `recoll -t -b 'search string'`:

(defun counsel-recoll-function (string &optional _pred &rest _unused)
  "Query the recoll index for STRING."
  (if (< (length string) 3)
      (counsel-more-chars 3)
    (counsel--async-command
     ;; Quote STRING so a query containing quotes cannot break the command.
     (format "recoll -t -b %s" (shell-quote-argument string)))
    nil))

If you use `recoll -A -t 'search string'` and do some post processing
you could get snippets, too. I can’t see how to do that easily with
counsel--async-command, though.

best, Erik
--
Sent from my free software system <http://fsf.org/>.

* Re: John's amazing indexing posts
  2015-07-27 16:40       ` Erik Hetzner
@ 2015-07-28  8:14         ` Oleh Krehel
  2015-07-31  7:55           ` Xebar Saram
  0 siblings, 1 reply; 9+ messages in thread
From: Oleh Krehel @ 2015-07-28  8:14 UTC (permalink / raw)
  To: Erik Hetzner; +Cc: Org Mode, John Kitchin

Erik Hetzner <egh@e6h.org> writes:

> I believe that you can rewrite using the recoll tool directly instead
> of recollq, using `recoll -t -b 'search string'`:
>
> (defun counsel-recoll-function (string &optional _pred &rest _unused)
>   "Query the recoll index for STRING."
>   (if (< (length string) 3)
>       (counsel-more-chars 3)
>     (counsel--async-command
>      ;; Quote STRING so a query containing quotes cannot break the command.
>      (format "recoll -t -b %s" (shell-quote-argument string)))
>     nil))
>
> If you use `recoll -A -t 'search string'` and do some post processing
> you could get snippets, too. I can’t see how to do that easily with
> counsel--async-command, though.

Thanks, Erik. I've merged your pull request. So now it's very easy to
start using recoll with Emacs - outside of Emacs the only necessary
thing is:

    sudo apt-get install recoll

And inside Emacs it's:

    M-x package-install RET counsel

I did look into the abstract (-A) switch. The thing is that it just shows
some database aggregates instead of the actual line context, like grep
does. With 30 candidates and no line context, a pure list of files looks
simpler than a list of files and a list of out-of-sequence words that
each file contains.

--Oleh

* Re: John's amazing indexing posts
  2015-07-28  8:14         ` Oleh Krehel
@ 2015-07-31  7:55           ` Xebar Saram
  2015-07-31  8:31             ` Oleh Krehel
  0 siblings, 1 reply; 9+ messages in thread
From: Xebar Saram @ 2015-07-31  7:55 UTC (permalink / raw)
  To: Oleh Krehel; +Cc: Erik Hetzner, Org Mode, John Kitchin

Hi Oleh

I started using counsel and counsel-recoll and it is quite good, thanks for
this!
A small question: when I launch the command, am I supposed to see the search
term inline (like in grep), or just the name of the file it resides in?
Currently I just see the filename that contains the search term. Example
screenshot:
https://paste.xinu.at/B77QYh/

best

Z

* Re: John's amazing indexing posts
  2015-07-31  7:55           ` Xebar Saram
@ 2015-07-31  8:31             ` Oleh Krehel
  0 siblings, 0 replies; 9+ messages in thread
From: Oleh Krehel @ 2015-07-31  8:31 UTC (permalink / raw)
  To: Xebar Saram; +Cc: Erik Hetzner, Org Mode, John Kitchin

Hi Xebar,

Xebar Saram <zeltakc@gmail.com> writes:

> I started using counsel and counsel-recoll and it is quite good, thanks for this!
> A small question: when I launch the command, am I supposed to see the search term inline (like in grep), or just
> the name of the file it resides in? Currently I just see the filename that contains the search term. Example screenshot:
> https://paste.xinu.at/B77QYh/

You can't see the search term inline, only the file name.  Recoll
doesn't support inline context. It has support for abstracts, but they
are disjointed, as in not a full sentence, but a summary of
keywords. Showing those is a possibility, but at times it's just too
much information.

For instance, here's one candidate for input 'git':

text/plain	[file:///home/oleh/Dropbox/org/wiki/git.org]	[git.org]	9921	bytes	
ABSTRACT
title my git notes language en ... css articles done http git scm com
book en git branching branching workflows git branching workflows closed
2013 ... 22 00 done http git scm com book en distributed git distributed
workflows git distributed workflows closed 2013 ...
/ABSTRACT

I think the file name here is more relevant than the abstract, taking
into account that there are 59 other candidates, besides this one.
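
If someone does want to experiment with showing abstracts, the
post-processing could look roughly like the sketch below. It assumes the
output always has the shape of the sample above (a tab-separated header line
whose second field is the bracketed file URL, followed by an
ABSTRACT ... /ABSTRACT block); the function name `recoll_snippets` is
hypothetical.

```sh
#!/bin/sh
# Sketch: collapse "recoll -A -t" style output into one line per hit,
# formatted as "path: abstract text".
recoll_snippets() {
    awk -F '\t' '
        # Start of an abstract block: reset the collected text.
        /^ABSTRACT$/ { in_abs = 1; snippet = ""; next }
        # End of the block: print the path seen last and the joined text.
        /^\/ABSTRACT$/ { in_abs = 0; print path ": " snippet; next }
        # Inside the block: join the lines with single spaces.
        in_abs { snippet = (snippet == "") ? $0 : (snippet " " $0); next }
        # Header line: strip "[file://" and the trailing "]" from field 2.
        $2 ~ /^\[file:\/\// {
            path = $2
            sub(/^\[file:\/\//, "", path)
            sub(/\]$/, "", path)
        }
    '
}

# Usage (needs recoll and an existing index):
#   recoll -A -t 'git' | recoll_snippets
```

This only reshapes the abstracts; it does not make them any more readable
than the keyword summary shown above.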

--Oleh
