emacs-orgmode@gnu.org archives
* Anyone use 3rd party search tools w/org-mode?
@ 2019-10-30 22:07 Nathan Neff
  2019-10-30 22:17 ` Jean Louis
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Nathan Neff @ 2019-10-30 22:07 UTC (permalink / raw)
  To: emacs-orgmode

Hello all,

I'm considering "indexing" my org-mode files and haven't done any research
into
this.  I'm sure there's 100 different ways to do this but wanted to ask the
list if anyone
is indexing their org-mode files and using a search tool like Solr, Elastic
or smaller indexing engines to search their org-files.

Emacs integration obviously would be a plus.

Thanks for any feedback,
--Nate


* Re: Anyone use 3rd party search tools w/org-mode?
  2019-10-30 22:07 Anyone use 3rd party search tools w/org-mode? Nathan Neff
@ 2019-10-30 22:17 ` Jean Louis
  2019-10-31 15:42   ` Fraga, Eric
  2019-10-31  1:09 ` John Kitchin
  2019-11-06 16:02 ` Roland Everaert
  2 siblings, 1 reply; 18+ messages in thread
From: Jean Louis @ 2019-10-30 22:17 UTC (permalink / raw)
  To: Nathan Neff; +Cc: emacs-orgmode

* Nathan Neff <nathan.neff@gmail.com> [2019-10-30 23:08]:
> Hello all,
> 
> I'm considering indexing my org-mode files and haven't done any research
> into
> this.  I'm sure there's 100 different ways to do this but wanted to ask the
> list if anyone
> is indexing their org-mode files and using a search tool like Solr, Elastic
> or smaller indexing engines to search their org-files.
> 
> Emacs integration obviously would be a plus.

I use `M-x grep'.

You may use tracker or any desktop search.

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-10-30 22:07 Anyone use 3rd party search tools w/org-mode? Nathan Neff
  2019-10-30 22:17 ` Jean Louis
@ 2019-10-31  1:09 ` John Kitchin
  2019-11-06 16:02 ` Roland Everaert
  2 siblings, 0 replies; 18+ messages in thread
From: John Kitchin @ 2019-10-31  1:09 UTC (permalink / raw)
  To: emacs-orgmode

I wrote https://github.com/jkitchin/scimax/blob/master/org-db.el to use
sqlite for this. It does not do full-text search; I found that too slow
with my files and sqlite.

You might also look at https://github.com/emacs-helm/helm-recoll.

I have some blog posts with similar ideas that you might find
interesting:
https://kitchingroup.cheme.cmu.edu/blog/2017/01/03/Find-stuff-in-org-mode-anywhere/

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-10-30 22:17 ` Jean Louis
@ 2019-10-31 15:42   ` Fraga, Eric
  0 siblings, 0 replies; 18+ messages in thread
From: Fraga, Eric @ 2019-10-31 15:42 UTC (permalink / raw)
  To: Jean Louis; +Cc: emacs-orgmode

On Wednesday, 30 Oct 2019 at 23:17, Jean Louis wrote:
> Me using `M-x grep'

+1

I frequently do

   find ~ -name '*.org'| xargs grep -l XXXX

to search all my org files for XXXX (and using variants of grep like
egrep for full regex).  Have never found the need for more than this
(for my uses).  YMMV, of course.
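A minimal sketch of the same search that also copes with file names
containing spaces, assuming a POSIX find and grep. The function name
org_grep and its optional directory argument are made up for illustration;
they are not part of any tool mentioned in this thread.

```shell
# List the *.org files under a directory that contain a pattern.
# Usage: org_grep PATTERN [DIR]   (DIR defaults to $HOME)
# org_grep is a hypothetical helper name, not an existing command.
org_grep() {
    # "-exec ... {} +" hands file names to grep directly, so names with
    # spaces or newlines are never split; "-e" and "--" guard against
    # patterns or file names that start with a dash.
    find "${2:-$HOME}" -type f -name '*.org' \
        -exec grep -l -e "$1" -- {} +
}
```

For example, `org_grep XXXX ~/org` would list the Org files under ~/org
that contain XXXX.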

-- 
Eric S Fraga via Emacs 27.0.50, Org release_9.2.6-552-g8c5a78

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-10-30 22:07 Anyone use 3rd party search tools w/org-mode? Nathan Neff
  2019-10-30 22:17 ` Jean Louis
  2019-10-31  1:09 ` John Kitchin
@ 2019-11-06 16:02 ` Roland Everaert
  2019-11-06 17:33   ` Russell Adams
  2019-11-06 18:10   ` Eric Abrahamsen
  2 siblings, 2 replies; 18+ messages in thread
From: Roland Everaert @ 2019-11-06 16:02 UTC (permalink / raw)
  To: emacs-orgmode

Hello all,

I am interested in a search/indexing engine targeting the org format,
too.

My interest comes from the fact that I have a growing number of org
files and, as org-mode has no file archiving feature AFAIK, searching
takes more and more time to complete.

Moving files that are no longer needed out of my org directories can be
tedious, and it is easy to move the wrong file to the wrong location.

Hence, an indexer could come in handy, especially if it is optimised
for the Org format (i.e., it knows what categories, tags, properties,
etc. are in an Org file).


Regards,

Roland.

-- 
Luke, use the FOSS

Sent from Emacs

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-06 16:02 ` Roland Everaert
@ 2019-11-06 17:33   ` Russell Adams
  2019-11-06 18:10   ` Eric Abrahamsen
  1 sibling, 0 replies; 18+ messages in thread
From: Russell Adams @ 2019-11-06 17:33 UTC (permalink / raw)
  To: emacs-orgmode

I use Recoll. It has a GUI and a CLI, and I use a script with dialog to pop up
results.

I index all my org files, all my PDFs (vendor technical documentation), email,
etc.

Works great, refreshes daily.

------------------------------------------------------------------
Russell Adams                            RLAdams@AdamsInfoServ.com

PGP Key ID:     0x1160DCB3           http://www.adamsinfoserv.com/

Fingerprint:    1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-06 16:02 ` Roland Everaert
  2019-11-06 17:33   ` Russell Adams
@ 2019-11-06 18:10   ` Eric Abrahamsen
  2019-11-06 20:09     ` John Kitchin
  2019-11-06 21:47     ` Adam Porter
  1 sibling, 2 replies; 18+ messages in thread
From: Eric Abrahamsen @ 2019-11-06 18:10 UTC (permalink / raw)
  To: emacs-orgmode

Roland Everaert <reveatwork@gmail.com> writes:

> Hello all,
>
> I am interested in a search/indexing engine targeting the org format,
> too.
>
> My interest comes from the fact that I have a growing number of org
> files and as org-mode has no file archiving feature, AFAIK, searching
> needs more and more time to complete.
>
> Moving files, that are no more necessary, outside of my org-directories,
> can be tedious and prone to moving the wrong file to the wrong location.
>
> Hence, an indexer could comes in handy, especially if it is optimised
> for the Org format (i.e.: it knows what are categories, tags,
> properties, etc in an Org file).

I think this last point is key. Most full-text search engines provide
config options for defining fields, or "facets", which in theory we
could set up to parse tags/properties/timestamps. My guess is that any
of the major contenders (solr, xapian, lucene) would work pretty much as
well as any of the others -- for our purposes, they probably only differ
in the details. Xapian might be considered "in the family" from a
license standpoint, but I don't know that that matters too much.

It would be fun to provide an Org indexing config for one of these
engines, and then build the Agenda on top of it.
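
As a rough illustration of the fields idea above, the sketch below pulls
the headline level, title, and tags out of Org files with awk and prints
them as tab-separated columns. The function name org_headlines and the
column layout are assumptions for illustration only; this is not the input
format of Solr, Xapian, or Lucene, and TODO keywords are simply left in
the title.

```shell
# Print one line per Org headline: file, level, title, tags (tab-separated).
# org_headlines is a hypothetical helper name.
org_headlines() {
    awk '
    /^\*+ / {
        level = index($0, " ") - 1          # number of leading stars
        title = substr($0, level + 2)
        sub(/^[ \t]+/, "", title)
        tags = ""
        # Org tags sit at the end of the headline, e.g. ":work:urgent:"
        if (match(title, /[ \t]+:[^ \t]+:[ \t]*$/)) {
            tags = substr(title, RSTART, RLENGTH)
            title = substr(title, 1, RSTART - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", tags)
        }
        printf "%s\t%d\t%s\t%s\n", FILENAME, level, title, tags
    }' "$@"
}
```

Each column could then be mapped to a field in whichever engine is chosen;
properties and timestamps would need further parsing that this sketch does
not attempt.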

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-06 18:10   ` Eric Abrahamsen
@ 2019-11-06 20:09     ` John Kitchin
  2019-11-07 17:04       ` Eric Abrahamsen
  2019-11-06 21:47     ` Adam Porter
  1 sibling, 1 reply; 18+ messages in thread
From: John Kitchin @ 2019-11-06 20:09 UTC (permalink / raw)
  To: emacs-orgmode

The way I got Swish to index org files was to create a script that
generated an xml file
(https://kitchingroup.cheme.cmu.edu/blog/2015/07/06/Indexing-headlines-in-org-files-with-swish-e-with-laser-sharp-results/)
or html (http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/)
that it could index. This is probably a general strategy for these tools.
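
A rough sketch of that general strategy, and not the script from the blog
posts: each Org headline is written out as a small XML record that an
external indexer could ingest. The element names docs, doc, and title and
the function name org_to_xml are hypothetical; a real indexer would need
its own schema.

```shell
# Emit one <doc> element per Org headline, with XML-escaped text.
# org_to_xml and the element names are assumptions for illustration.
org_to_xml() {
    awk '
    function esc(s) {
        gsub(/&/, "\\&amp;", s)    # must run first
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        gsub(/"/, "\\&quot;", s)
        return s
    }
    BEGIN { print "<docs>" }
    /^\*+ / {
        title = $0
        sub(/^\*+[ \t]+/, "", title)   # drop the leading stars
        printf "  <doc file=\"%s\" line=\"%d\"><title>%s</title></doc>\n", esc(FILENAME), FNR, esc(title)
    }
    END { print "</docs>" }' "$@"
}
```

Running `org_to_xml ~/org/*.org > headlines.xml` would produce a single
file to hand to the indexer; body text under each headline is left out to
keep the sketch short.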

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-06 18:10   ` Eric Abrahamsen
  2019-11-06 20:09     ` John Kitchin
@ 2019-11-06 21:47     ` Adam Porter
  1 sibling, 0 replies; 18+ messages in thread
From: Adam Porter @ 2019-11-06 21:47 UTC (permalink / raw)
  To: emacs-orgmode

Eric Abrahamsen <eric@ericabrahamsen.net> writes:

> I think this last point is key. Most full-text search engines provide
> config options for defining fields, or "facets", which in theory we
> could set up to parse tags/properties/timestamps.

Of course it's an Emacs-based tool, but please note that org-ql has
extensive, optimized support for searching Org-specific data like that.
For example, you could search for those three data types in a single
query, like:

  tags:Emacs,Org property:key1=val1,key2=val2 ts-active:on=2019-11-06

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-06 20:09     ` John Kitchin
@ 2019-11-07 17:04       ` Eric Abrahamsen
  2019-11-08 13:12         ` Roland Everaert
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Abrahamsen @ 2019-11-07 17:04 UTC (permalink / raw)
  To: emacs-orgmode

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> The way I got Swish to index org files was to create a script that
> generated an xml file
> (https://kitchingroup.cheme.cmu.edu/blog/2015/07/06/Indexing-headlines-in-org-files-with-swish-e-with-laser-sharp-results/)
> or html
> (http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/)
> that it could index. This is probably a general strategy for these tools.

That seems unfortunately roundabout, but I don't know enough about the
various FTS engines to know if they could be taught to read Org files directly...

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-07 17:04       ` Eric Abrahamsen
@ 2019-11-08 13:12         ` Roland Everaert
  2019-11-08 13:28           ` John Kitchin
  0 siblings, 1 reply; 18+ messages in thread
From: Roland Everaert @ 2019-11-08 13:12 UTC (permalink / raw)
  To: emacs-orgmode

Is it just me, or is Swish-e dead?

The URL www.swish-e.org leads to a whisky e-shop oO.

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-08 13:12         ` Roland Everaert
@ 2019-11-08 13:28           ` John Kitchin
  2019-11-08 13:51             ` Russell Adams
  0 siblings, 1 reply; 18+ messages in thread
From: John Kitchin @ 2019-11-08 13:28 UTC (permalink / raw)
  To: Roland Everaert; +Cc: org-mode-email

It could be dead. At the time I worked with it, the project had already
switched to a library form that was not directly useful to me, and the
original swish project was not being further developed. These days, I would
look to something like xapian or postgresql I think (assuming sqlite is not
sufficient for your needs).

John

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-08 13:28           ` John Kitchin
@ 2019-11-08 13:51             ` Russell Adams
  2019-11-08 14:21               ` Roland Everaert
  0 siblings, 1 reply; 18+ messages in thread
From: Russell Adams @ 2019-11-08 13:51 UTC (permalink / raw)
  To: emacs-orgmode

Recoll is Xapian-based.

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-08 13:51             ` Russell Adams
@ 2019-11-08 14:21               ` Roland Everaert
  2019-11-12 11:34                 ` Roland Everaert
  0 siblings, 1 reply; 18+ messages in thread
From: Roland Everaert @ 2019-11-08 14:21 UTC (permalink / raw)
  To: emacs-orgmode

Good to know, I will have a look at it when time permits.

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-08 14:21               ` Roland Everaert
@ 2019-11-12 11:34                 ` Roland Everaert
  2019-11-12 13:01                   ` Russell Adams
  0 siblings, 1 reply; 18+ messages in thread
From: Roland Everaert @ 2019-11-12 11:34 UTC (permalink / raw)
  To: emacs-orgmode

I had a quick look at Recoll and noticed that there is a Python API for
creating and updating the index.

Maybe something could be developed using the Python package recently
released by Karl Voit, to feed a Recoll index with Org data.

Roland.

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-12 11:34                 ` Roland Everaert
@ 2019-11-12 13:01                   ` Russell Adams
  2019-11-12 23:02                     ` briangpowell .
  0 siblings, 1 reply; 18+ messages in thread
From: Russell Adams @ 2019-11-12 13:01 UTC (permalink / raw)
  To: emacs-orgmode

To further explain my setup, I have three libraries of files: Personal,
Technical, and Business. Personal is all personal data including Org files,
Technical is all whitepapers and vendor documentation, and Business is Org
projects and other matters. Recoll is used to search all of them.

In my shell profile I have a few functions to access each library and to file
away new documents (i.e., I downloaded a whitepaper and just want to slap it
into a unique directory in the library).

#+BEGIN_EXAMPLE
  # For recoll and library
  # File a document under ~/Library/<library>/<YYYY>/<MM>/<DD>/
  _FileRecoll()  { DEST="$HOME/Library/$1/$(date +%Y/%m/%d)" ; mkdir -p "$DEST" && mv -i "$2" "$DEST" ; }
  FileTech()     { _FileRecoll "Technical" "$1" ; }
  FilePersonal() { _FileRecoll "Personal"  "$1" ; }
  FileBiz()      { _FileRecoll "Business"  "$1" ; }

  # Search one library through the TUI wrapper script
  recollt() { RECOLL_CONFDIR=~/Library/.recoll-Technical ~/scripts/recolltui.sh "$@" ; }
  recollp() { RECOLL_CONFDIR=~/Library/.recoll-Personal  ~/scripts/recolltui.sh "$@" ; }
  recollb() { RECOLL_CONFDIR=~/Library/.recoll-Business  ~/scripts/recolltui.sh "$@" ; }
#+END_EXAMPLE

I have a daily cronjob to index those directories:

#+BEGIN_EXAMPLE
  # Recoll
  00 2  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Personal/  >> "${HOME}/Library/.recoll-Personal/recollindex.log" 2>&1
  00 3  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Technical/ >> "${HOME}/Library/.recoll-Technical/recollindex.log" 2>&1
  00 4  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Business/  >> "${HOME}/Library/.recoll-Business/recollindex.log" 2>&1
#+END_EXAMPLE

Then I have a simple TUI shell script which wraps dialog around recoll's
CLI. This puts the filename on my clipboard for command-line pasting, and opens
PDFs in Firefox.

#+BEGIN_EXAMPLE
  #!/bin/sh
  # ~/scripts/recolltui.sh

  # requires recollq optional cli binary to be present from recoll package
  # uses base64, xsel, and dialog

  DB=$(mktemp)
  MENU=$(mktemp)
  trap 'rm -f -- "${DB}" "${MENU}"' INT TERM HUP EXIT

  # Make sure to customize RECOLL_CONFDIR (ie: ~/Library/.recoll-Technical) if needed

  # query recoll, save the base64 output to $DB as 3 space separated columns: row #, title, url
  recollq -e -F "title url" "$@" 2>/dev/null | nl > "$DB"

  # copy header into menu
  head -n 2 "$DB" | while read -r num rest ; do
      echo "= \"$rest\"" >> "$MENU"
  done

  # Convert results to dialog menu using row # and title + filename as list item
  # skip first two lines of results, they are not base64
  tail -n +3 "$DB" | while read -r num title url ; do
      echo "$num \"$(echo "$title" | base64 -w0 -d ) : $(basename "$(echo "$url" | base64 -w0 -d | sed 's,file://,,g')")\"" >> "$MENU"
  done

  # ask the user which results to view
  SEL=$(dialog --menu "Search results" 0 0 0 --file "$MENU" --stdout)

  # if a choice was made, open the url in firefox AND copy it to the clipboard
  [ $? -eq 0 ] && {
      URL="$(awk "\$1 == $SEL  {print \$3}" "$DB" | base64 -w0 -d)"
      echo "$URL" | sed 's,file://,,g' | xsel
      firefox "$URL"
  }

#+END_EXAMPLE

I've often thought that the dialog script could be easily replaced by an Emacs
interface, but I haven't taken the time to try to write one.

I've found that recoll's indexing in Xapian is excellent. I frequently can find
my search terms in technical documentation very rapidly. The support of many
file types makes it index well. I think my most frequent formats are text
including Org, PDF, and DOC.

I used to have a "Scrapbook" extension in Firefox which would instantly save a
webpage being viewed into my Personal library. Unfortunately that isn't
supported on modern Firefox versions so I need to find a replacement for that
functionality.

* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-12 13:01                   ` Russell Adams
@ 2019-11-12 23:02                     ` briangpowell .
  2019-11-13  8:44                       ` Roland Everaert
  0 siblings, 1 reply; 18+ messages in thread
From: briangpowell . @ 2019-11-12 23:02 UTC (permalink / raw)
  To: emacs-orgmode

Emacs (the name is short for "Editor Macros") has the fastest regular
expression engine in the world, at least among engines designed to find
and display character strings as you type them.

So please keep this in mind: for editing and searching documents, and in
some cases replacing strings, nothing is faster than Emacs and its native
regular expression engine, which is built for editing tasks, especially
searching for strings and regular expressions as you type them in.

In many other respects other engines are of course faster, but not for
editing, searching, and replacing.

This applies even to multi-gigabyte and multi-terabyte files: I suggest
you look into vlf-mode ("Very Large File Mode") and try it out, if only
for the fun and excitement of it.

So, again, GNU Emacs is by far the world's most powerful editor, and has
been for many, many years. There is no need for 3rd party tools; there may
instead be a need to investigate the "engines under the hood" and why they
work the way they do.

On Tue, Nov 12, 2019 at 8:04 AM Russell Adams <RLAdams@adamsinfoserv.com>
wrote:

> To further explain my setup, I have three libraries of files Personal,
> Technical
> and Business. Personal is all personal data including Org files, Technical
> is
> all whitepapers and vendor documentation, and Business is Org projects and
> other
> matters. Recoll is used to search all of them.
>
> In my shell profile I have a few functions to access each library, and to
> file
> away new documents (ie: I downloaded a whitepaper, and just want to slap
> it into
> a unique directory in the library).
>
> #+BEGIN_EXAMPLE
>   # For recoll and library
>   func _FileRecoll()  { DEST="$HOME/Library/$1/$(date +%Y/%m/%d)" ; mkdir
> -p $DEST ; mv -i "$2" $DEST ; }
>   func FileTech()     { _FileRecoll "Technical" "$1" ; }
>   func FilePersonal() { _FileRecoll "Personal"  "$1" ; }
>   func FileBiz()      { _FileRecoll "Business"  "$1" ; }
>
>   func recollt() { RECOLL_CONFDIR=~/Library/.recoll-Technical
> ~/scripts/recolltui.sh $@ ; }
>   func recollp() { RECOLL_CONFDIR=~/Library/.recoll-Personal
> ~/scripts/recolltui.sh $@ ; }
>   func recollb() { RECOLL_CONFDIR=~/Library/.recoll-Business
> ~/scripts/recolltui.sh $@ ; }
> #+END_EXAMPLE
>
> I have a daily cronjob to index those directories:
>
> #+BEGIN_EXAMPLE
>   # Recoll
>   00 2  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Personal/ >> "${HOME}/Library/.recoll-Personal/recollindex.log" 2>&1
>   00 3  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Technical/ >> "${HOME}/Library/.recoll-Technical/recollindex.log" 2>&1
>   00 4  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Business/ >> "${HOME}/Library/.recoll-Business/recollindex.log" 2>&1
> #+END_EXAMPLE
>
> Then I have a simple TUI shell script which wraps dialog around recoll's
> CLI. This puts the filename in my clipboard for command line pasting, and
> opens PDFs in Firefox.
>
> #+BEGIN_EXAMPLE
>   #!/bin/sh
>   # ~/scripts/recolltui.sh
>
>   # requires recollq optional cli binary to be present from recoll package
>   # uses base64, xsel, and dialog
>
>   DB=$(mktemp)
>   MENU=$(mktemp)
>   trap 'rm -f -- "${DB}" "${MENU}"' INT TERM HUP EXIT
>
>   # Make sure to customize RECOLL_CONFDIR (ie: ~/Library/.recoll-Technical) if needed
>
>   # query recoll, save the base64 output to $DB as 3 space separated columns: row #, title, url
>   recollq -e -F "title url" "$@" 2>/dev/null | nl > "$DB"
>
>   # copy header into menu
>   head -n 2 "$DB" | while read num rest ; do
>       echo "= \"$rest\"" >> "$MENU"
>   done
>
>   # Convert results to dialog menu using row # and title + filename as list item
>   # skip first two lines of results, they are not base64
>   tail -n +3 "$DB" | while read num title url ; do
>       echo "$num \"$(echo "$title" | base64 -w0 -d ) : $(basename "$(echo "$url" | base64 -w0 -d | sed 's,file://,,g')")\"" >> "$MENU"
>   done
>
>   # ask the user which results to view
>   SEL=$(dialog --menu "Search results" 0 0 0 --file "$MENU" --stdout)
>
>   # if a choice was made, open the url in firefox AND copy it to the clipboard
>   [ $? -eq 0 ] && {
>       URL="$(awk "\$1 == $SEL  {print \$3}" "$DB" | base64 -w0 -d)"
>       echo "$URL" | sed 's,file://,,g' | xsel
>       firefox "$URL"
>   }
>
> #+END_EXAMPLE
>
> I've often thought that the dialog script could be easily replaced by an
> Emacs interface, but I haven't taken the time to try to write one.
>
> I've found that recoll's indexing in Xapian is excellent. I frequently can
> find my search terms in technical documentation very rapidly. The support
> of many file types makes it index well. I think my most frequent formats
> are text including Org, PDF, and DOC.
>
> I used to have a "Scrapbook" extension in Firefox which would instantly
> save a webpage being viewed into my Personal library. Unfortunately that
> isn't supported on modern Firefox versions so I need to find a replacement
> for that functionality.
>
> On Tue, Nov 12, 2019 at 12:34:29PM +0100, Roland Everaert wrote:
> > I had a quick look at recoll and I noticed that there is a Python API
> > to update/create an index.
> >
> > Maybe something could be developed using the Python package recently
> > released by Karl Voit, to feed a recoll index with Org data.
> >
> > Roland.
>
>
> ------------------------------------------------------------------
> Russell Adams                            RLAdams@AdamsInfoServ.com
>
> PGP Key ID:     0x1160DCB3           http://www.adamsinfoserv.com/
>
> Fingerprint:    1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3
>
>
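
The decoding step in the quoted recolltui.sh script can be tried in
isolation. This is only a minimal sketch, assuming GNU coreutils `base64`
and result lines of the form "row #, base64 title, base64 url" as the
script's comments describe; `decode_result_line` and the sample values are
hypothetical and not part of recoll:

```shell
#!/bin/sh
# Sketch: decode one result line of the form "<row> <base64 title> <base64 url>".
# decode_result_line is a hypothetical helper, not part of recoll.

decode_result_line() {
    # Split the line into its three space-separated columns.
    # (An empty title would shift the columns; this sketch ignores that case.)
    set -- $1
    num=$1
    title=$(printf '%s' "$2" | base64 -d)
    url=$(printf '%s' "$3" | base64 -d)
    # Strip the file:// scheme and keep only the file name.
    path=${url#file://}
    printf '%s "%s : %s"\n' "$num" "$title" "$(basename "$path")"
}

# Build a sample line by encoding made-up values, then decode it again.
sample="1 $(printf '%s' 'Org manual' | base64) $(printf '%s' 'file:///home/user/Library/Technical/org.pdf' | base64)"
decode_result_line "$sample"
```

Each decoded line is in the "tag item" shape that the script appends to
the dialog menu file.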


* Re: Anyone use 3rd party search tools w/org-mode?
  2019-11-12 23:02                     ` briangpowell .
@ 2019-11-13  8:44                       ` Roland Everaert
  0 siblings, 0 replies; 18+ messages in thread
From: Roland Everaert @ 2019-11-13  8:44 UTC (permalink / raw)
  To: emacs-orgmode

It is not a question of searching and replacing strings in one file, but
of searching for a document or a set of documents among tens of documents
or even more, possibly in various formats.
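
To make the distinction concrete: a search across a set of documents
returns file names, not positions in one buffer. A minimal sketch, assuming
plain-text files only (Org, text) and a grep that supports -r and -I (GNU
and BSD grep do); `find_docs` is a hypothetical helper name. PDF, DOC and
similar formats would still need an indexer such as Recoll to extract their
text first.

```shell
#!/bin/sh
# Sketch: list the plain-text documents under a directory that mention a term.
# find_docs is a hypothetical helper; it only sees files grep can read as text.

find_docs() {
    dir=$1
    term=$2
    # -r recurse, -l print matching file names only, -i ignore case,
    # -I skip binary files, -F treat the term as a fixed string.
    grep -rliIF -- "$term" "$dir" | sort
}
```

For example, `find_docs ~/Library/Technical xapian` would print one
matching path per line.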

Roland.

-- 
Luke, use the FOSS

Sent from Emacs


Thread overview: 18+ messages
2019-10-30 22:07 Anyone use 3rd party search tools w/org-mode? Nathan Neff
2019-10-30 22:17 ` Jean Louis
2019-10-31 15:42   ` Fraga, Eric
2019-10-31  1:09 ` John Kitchin
2019-11-06 16:02 ` Roland Everaert
2019-11-06 17:33   ` Russell Adams
2019-11-06 18:10   ` Eric Abrahamsen
2019-11-06 20:09     ` John Kitchin
2019-11-07 17:04       ` Eric Abrahamsen
2019-11-08 13:12         ` Roland Everaert
2019-11-08 13:28           ` John Kitchin
2019-11-08 13:51             ` Russell Adams
2019-11-08 14:21               ` Roland Everaert
2019-11-12 11:34                 ` Roland Everaert
2019-11-12 13:01                   ` Russell Adams
2019-11-12 23:02                     ` briangpowell .
2019-11-13  8:44                       ` Roland Everaert
2019-11-06 21:47     ` Adam Porter
