* John's amazing indexing posts
From: Matt Price @ 2015-07-13 9:41 UTC
To: Org Mode

Not sure if everyone has seen John's latest post about indexing org files with swish-e:

http://kitchingroup.cheme.cmu.edu/blog/2015/07/06/Indexing-headlines-in-org-files-with-swish-e-with-laser-sharp-results/

It's very impressive. It strikes me as a step towards an incredibly ambitious project that would bring file indexing inside of Emacs, so that it would no longer be necessary to go out to a shell or a desktop search tool in order to find files that contain particular search terms. I'm looking forward to your next steps, John!

Matt
* Re: John's amazing indexing posts
From: John Kitchin @ 2015-07-13 14:31 UTC
To: Matt Price; +Cc: Org Mode

Thanks Matt,

That is also my impression of where this will go. Eventually this will move towards a database search engine, e.g. like Oleg's project at https://github.com/wvxvw/sphinx-mode. I am not sure precisely which direction though. Swish-e is nice, but at the moment you cannot incrementally update the database, so a full reindex is required every time. I am not sure that is fixable, and swish-e does not handle Unicode. There are half a dozen or so candidates to go forward on, and they all have some pros and cons to think about.

It has a lot of other applications in org too, e.g. a file-system wide agenda, tag search, etc.

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
* Re: John's amazing indexing posts
From: Erik Hetzner @ 2015-07-27 5:16 UTC
To: Org Mode; +Cc: John Kitchin

Hi all,

I previously hooked up org with recoll with pretty good results. I’ve written this up for worg, but I have my ssh key on a different machine, so I can’t push now. Here is the info for the record.

** Recoll

In order to index using the [[http://www.lesbonscomptes.com/recoll/][recoll]] search engine, you will want to add the following to your =~/.recoll/mimeinfo= file:

#+BEGIN_SRC
.org = text/x-org
.org_archive = text/x-org
#+END_SRC

You will also need a shell script to convert your org mode files to HTML in batch mode. The script takes as an argument the file to convert and prints the output to stdout. Here is an example:

#+BEGIN_SRC sh
#!/bin/sh
emacs --batch --eval "(progn
  (find-file \"$1\")
  (org-html-export-as-html)
  (set-buffer \"*Org HTML Export*\")
  (princ (buffer-string)))"
#+END_SRC

You will also need the following in your =~/.recoll/mimeconf=:

#+BEGIN_SRC
[index]
text/x-org = exec /home/egh/.recoll/rclorg ; mimetype = text/html
#+END_SRC

Now, rebuild your recoll index. Org mode files should be converted to HTML and indexed. It will take some time, because emacs will be launched for each conversion.

An alternative is to use [[http://pandoc.org][pandoc]] to do the conversion. It can be configured as follows in your =~/.recoll/mimeconf= file:

#+BEGIN_SRC
[index]
text/x-org = exec pandoc -s -f org -t html5 ; mimetype = text/html
#+END_SRC

If you want, you might change the pandoc template or org mode output to generate =meta= tags that will be recognized by recoll. See http://www.lesbonscomptes.com/recoll/usermanual/RCL.PROGRAM.html#RCL.PROGRAM.FILTERS.HTML for details.
* Re: John's amazing indexing posts
From: Oleh Krehel @ 2015-07-27 13:19 UTC
To: Erik Hetzner; +Cc: Org Mode, John Kitchin

Hi Erik,

Erik Hetzner <egh@e6h.org> writes:
> I previously hooked up org with recoll with pretty good results.
> ...
> http://www.lesbonscomptes.com/recoll/

Thanks for the link. I tried it, and it actually works great on my system (unlike swish-e). And I did no configuration of mimeinfo, I only told it to index my whole org/ directory.

If anyone is interested, I've added an Emacs interface to recollq (a shell tool that comes with recoll that you have to build yourself). See the counsel-recoll command from https://github.com/abo-abo/swiper/blob/master/counsel.el.

Initially, this command gives you a list of files that match the query. After you select a file, it is searched for the current query. Unfortunately, the -A (abstract) switch isn't as useful as the context that e.g. grep gives, so I went only with the file names.

regards,
Oleh
* Re: John's amazing indexing posts
From: John Kitchin @ 2015-07-27 14:19 UTC
To: Oleh Krehel; +Cc: Erik Hetzner, Org Mode

There is also a helm-recoll package available. I came across recoll a few weeks ago when I saw that package! It also basically worked as advertised, and I am just looking forward to some free time to figure out how to get the super focused search I worked out for swish-e.

Has anyone tried using emacsclient for the html export? Maybe that would speed up indexing, since a whole new emacs would not be spun up each time.
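A minimal sketch of the emacsclient idea, not something anyone in the thread reports having tested. It assumes an Emacs server is already running (for example via `emacs --daemon`), and the function names `org_export_form` and `rclorg_client` are made up for illustration. Because `emacsclient --eval` prints the Lisp return value rather than the export itself, the sketch has Emacs write the HTML to a temporary file with `org-export-to-file` and then prints that file.

```sh
#!/bin/sh
# Hypothetical emacsclient-based variant of the rclorg filter.
# Assumption: an Emacs server is already running and has Org available.

# Print the Elisp form that exports org file $1 to the HTML file $2.
# Backslashes and double quotes in both paths are escaped for the Lisp reader.
org_export_form() {
    src=$(printf '%s' "$1" | sed 's/[\\"]/\\&/g')
    out=$(printf '%s' "$2" | sed 's/[\\"]/\\&/g')
    printf '(progn (require (quote ox-html)) (with-current-buffer (find-file-noselect "%s") (org-export-to-file (quote html) "%s")))' "$src" "$out"
}

# Export $1 through the running server and print the HTML on stdout.
# Note: the org file stays open as a buffer in the server afterwards.
rclorg_client() {
    tmp=$(mktemp) || return 1
    emacsclient --eval "$(org_export_form "$1" "$tmp")" >/dev/null &&
        cat "$tmp"
    status=$?
    rm -f "$tmp"
    return $status
}
```

A filter script would then end with `rclorg_client "$1"` in place of the `emacs --batch` call. Whether this is actually faster depends on how much of the time goes to Emacs startup rather than to the export itself.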
* Re: John's amazing indexing posts
From: Erik Hetzner @ 2015-07-27 16:40 UTC
To: Oleh Krehel; +Cc: Org Mode, John Kitchin

Hi Oleh,

On Mon, 27 Jul 2015 06:19:35 -0700, Oleh Krehel <ohwoeowho@gmail.com> wrote:
> Thanks for the link. I tried it, and it actually works great on my
> system (unlike swish-e). And I did no configuration of mimeinfo, I only
> told it to index my whole org/ directory.

Of course! I’ve forgotten exactly what I was intending to accomplish by converting to HTML - I believe I was generating citations - but recoll should be able to index plain text without issue. This will probably work better for org files, as well, since you can search the complete content.

> If anyone is interested, I've added an Emacs interface to recollq (a
> shell tool that comes with recoll that you have to build yourself). See
> counsel-recoll command from
> https://github.com/abo-abo/swiper/blob/master/counsel.el.
>
> Initially, this command gives you a list of files that match the
> query. After selecting the file, it's searched for the current query.
> Unfortunately, the -A (abstract) switch isn't as useful as the context
> that e.g. grep gives, so I went only with the file names.

This is great, especially for me, as I’ve been using ivy lately.

I believe that you can rewrite using the recoll tool directly instead of recollq, using `recoll -t -b 'search string'`:

(defun counsel-recoll-function (string &optional _pred &rest _unused)
  "Search the recoll index for STRING."
  (if (< (length string) 3)
      (counsel-more-chars 3)
    (counsel--async-command
     (format "recoll -t -b '%s'" string))
    nil))

If you use `recoll -A -t 'search string'` and do some post processing you could get snippets, too. I can’t see how to do that easily with counsel--async-command, though.

best,
Erik

--
Sent from my free software system <http://fsf.org/>.
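For trying the same query from a shell, here is a small sketch. It assumes that `recoll -t -b QUERY` prints one bare `file://` URL per hit, which matches the file names shown by counsel-recoll, and the helper names `urls_to_paths` and `recoll_files` are invented for this example.

```sh
#!/bin/sh
# Turn bare URLs, as assumed to be printed by `recoll -t -b QUERY`,
# into plain file paths. Lines that do not start with file:// are dropped.
urls_to_paths() {
    sed -n 's|^file://||p'
}

# Hypothetical convenience wrapper: list the files that match query $1.
recoll_files() {
    recoll -t -b "$1" | urls_to_paths
}
```

For example, `recoll_files git` would list one matching path per line, ready to pass to other tools.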
* Re: John's amazing indexing posts
From: Oleh Krehel @ 2015-07-28 8:14 UTC
To: Erik Hetzner; +Cc: Org Mode, John Kitchin

Erik Hetzner <egh@e6h.org> writes:

> I believe that you can rewrite using the recoll tool directly instead
> of recollq, using `recoll -t -b 'search string'`:
>
> (defun counsel-recoll-function (string &optional _pred &rest _unused)
>   "Search the recoll index for STRING."
>   (if (< (length string) 3)
>       (counsel-more-chars 3)
>     (counsel--async-command
>      (format "recoll -t -b '%s'" string))
>     nil))
>
> If you use `recoll -A -t 'search string'` and do some post processing
> you could get snippets, too. I can’t see how to do that easily with
> counsel--async-command, though.

Thanks, Erik. I've merged your pull request. So now it's very easy to start using recoll with Emacs. Outside of Emacs the only necessary thing is:

    sudo apt-get install recoll

And inside Emacs it's:

    package-install counsel

I did look into the annotation switch. The thing is that it just shows some database aggregates instead of the actual line context, like grep does. With 30 candidates and no line context, a pure list of files looks simpler than a list of files and a list of out-of-sequence words that each file contains.

--Oleh
* Re: John's amazing indexing posts
From: Xebar Saram @ 2015-07-31 7:55 UTC
To: Oleh Krehel; +Cc: Erik Hetzner, Org Mode, John Kitchin

Hi Oleh,

I started using counsel and counsel-recoll and it is quite good, thanks for this!

A small question: when I launch the command, am I supposed to see the search term inline (like in grep) or just the name of the file it resides in? Currently I just see the filename that contains the search term. Example screenshot:

https://paste.xinu.at/B77QYh/

best

Z
* Re: John's amazing indexing posts
From: Oleh Krehel @ 2015-07-31 8:31 UTC
To: Xebar Saram; +Cc: Erik Hetzner, Org Mode, John Kitchin

Hi Xebar,

Xebar Saram <zeltakc@gmail.com> writes:

> I started using counsel and counsel-recoll and it is quite good, thanks for this!
> A small question: when I launch the command, am I supposed to see the search
> term inline (like in grep) or just the name of the file it resides in?
> Currently I just see the filename that contains the search term. Example screenshot:
> https://paste.xinu.at/B77QYh/

You can't see the search term inline, only the file name. Recoll doesn't support inline context. It has support for abstracts, but they are disjointed: not a full sentence, but a summary of keywords. Showing those is a possibility, but at times it's just too much information. For instance, here's one candidate for input 'git':

    text/plain [file:///home/oleh/Dropbox/org/wiki/git.org] [git.org] 9921 bytes
    ABSTRACT
    title my git notes language en ... css articles done http git scm com book en git branching branching workflows git branching workflows closed 2013 ... 22 00 done http git scm com book en distributed git distributed workflows git distributed workflows closed 2013 ...
    /ABSTRACT

I think the file name here is more relevant than the abstract, taking into account that there are 59 other candidates besides this one.

--Oleh
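For anyone who wants to experiment with the post-processing Erik suggested, here is a rough sketch that shortens each abstract to a fixed width and prints it next to the file path. It rests on an assumption the sample above does not confirm: that `recoll -A -t` puts the result header, `ABSTRACT`, and `/ABSTRACT` each on their own unindented line. The function name `abstracts_to_lines` is invented for this example.

```sh
#!/bin/sh
# Condense `recoll -A -t QUERY` style output to "path: abstract prefix" lines.
# $1 is the maximum abstract width in characters (default 60).
# Assumption: ABSTRACT and /ABSTRACT markers sit alone on their own lines.
abstracts_to_lines() {
    awk -v width="${1:-60}" '
        # Result header: remember the path inside the first [file://...] field.
        !in_abs && match($0, /\[file:\/\/[^]]*\]/) {
            path = substr($0, RSTART + 8, RLENGTH - 9)
            next
        }
        $0 == "ABSTRACT"  { in_abs = 1; text = ""; next }
        # End of abstract: emit one line for this hit.
        $0 == "/ABSTRACT" { in_abs = 0; print path ": " substr(text, 1, width); next }
        # Inside an abstract: join its lines with single spaces.
        in_abs            { text = (text == "" ? $0 : text " " $0) }
    '
}
```

Usage would be along the lines of `recoll -A -t 'git' | abstracts_to_lines 40`. This only trims the keyword summary; it does not turn it into grep-style line context.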