From mboxrd@z Thu Jan 1 00:00:00 1970
From: "briangpowell ."
Subject: Re: Anyone use 3rd party search tools w/org-mode?
Date: Tue, 12 Nov 2019 18:02:10 -0500
Message-ID:
References: <87wocduh6o.fsf@gmail.com> <878sosdgfq.fsf@ericabrahamsen.net> <87h83f8vod.fsf@ericabrahamsen.net> <87eeyisea1.fsf@gmail.com> <20191108135147.GK27044@volibear> <87d0e2sb2n.fsf@gmail.com> <87k185s4ze.fsf@gmail.com> <20191112130131.GA28797@volibear>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="0000000000009b886905972e3d2c"
Return-path:
Received: from eggs.gnu.org ([2001:470:142:3::10]:45046) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iUfB3-0005kD-Nv for emacs-orgmode@gnu.org; Tue, 12 Nov 2019 18:02:27 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1iUfB1-0001wa-KX for emacs-orgmode@gnu.org; Tue, 12 Nov 2019 18:02:25 -0500
Received: from mail-wr1-x42b.google.com ([2a00:1450:4864:20::42b]:37054) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1iUfB1-0001wD-Ae for emacs-orgmode@gnu.org; Tue, 12 Nov 2019 15:02:23 -0500
Received: by mail-wr1-x42b.google.com with SMTP id t1so144123wrv.4 for ; Tue, 12 Nov 2019 15:02:22 -0800 (PST)
In-Reply-To: <20191112130131.GA28797@volibear>
List-Id: "General discussions about Org-mode."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Sender: "Emacs-orgmode"
To: emacs-orgmode

--0000000000009b886905972e3d2c
Content-Type: text/plain; charset="UTF-8"

Emacs (short for "Editor Macros") has the fastest regular-expression engine in the world, at least when you compare the engines that are programmed to find and display character strings AS YOU TYPE THEM.
So, just hoping you keep that in mind: as far as editing documents, searching documents, and in some cases replacing strings, there is nothing faster than Emacs and its native regular-expression engine, which is built for editing tasks, and in particular for searching for strings and/or regular expressions as you type them in.

In many other ways other engines are of course faster, but not for editing, searching, and replacing tasks.

And even when you talk about editing multi-gigabyte or even multi-terabyte files, I suggest you look into and try out vlf-mode (i.e. "Very Large File Mode") for that, just for the fun and excitement of it, if for nothing else.

So, again, GNU Emacs is by far the world's most powerful editor, and it has been for many, many years. There is no need for 3rd party tools; there may be a need to investigate the "engines under the hood" and why they work the way they do.

On Tue, Nov 12, 2019 at 8:04 AM Russell Adams <RLAdams@adamsinfoserv.com> wrote:
> To further explain my setup, I have three libraries of files: Personal, Technical,
> and Business. Personal is all personal data including Org files, Technical is
> all whitepapers and vendor documentation, and Business is Org projects and other
> matters. Recoll is used to search all of them.
>
> In my shell profile I have a few functions to access each library, and to file
> away new documents (ie: I downloaded a whitepaper, and just want to slap it into
> a unique directory in the library).
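
[Editor's note: that date-stamped filing idea can be sketched as a small self-contained script. The library path, helper name, and demo file below are invented for illustration; they are not from Russell's actual setup, which follows.]

```shell
#!/bin/sh
# Sketch of date-stamped filing into a per-topic library.
# Assumed layout (hypothetical): $LIBRARY/<topic>/YYYY/MM/DD/
LIBRARY="${TMPDIR:-/tmp}/library-demo"

file_into_library() {
    # $1 = topic (e.g. Technical), $2 = file to move
    dest="$LIBRARY/$1/$(date +%Y/%m/%d)"
    mkdir -p "$dest"
    mv "$2" "$dest/"
}

# demo: create a scratch file and file it away under today's date
printf 'demo whitepaper\n' > /tmp/whitepaper-demo.pdf
file_into_library Technical /tmp/whitepaper-demo.pdf
ls "$LIBRARY/Technical/$(date +%Y/%m/%d)"
```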
> 
> #+BEGIN_EXAMPLE
>   # For recoll and library
>   func _FileRecoll()  { DEST="$HOME/Library/$1/$(date +%Y/%m/%d)" ; mkdir -p $DEST ; mv -i "$2" $DEST ; }
>   func FileTech()     { _FileRecoll "Technical" "$1" ; }
>   func FilePersonal() { _FileRecoll "Personal"  "$1" ; }
>   func FileBiz()      { _FileRecoll "Business"  "$1" ; }
> 
>   func recollt() { RECOLL_CONFDIR=~/Library/.recoll-Technical ~/scripts/recolltui.sh $@ ; }
>   func recollp() { RECOLL_CONFDIR=~/Library/.recoll-Personal  ~/scripts/recolltui.sh $@ ; }
>   func recollb() { RECOLL_CONFDIR=~/Library/.recoll-Business  ~/scripts/recolltui.sh $@ ; }
> #+END_EXAMPLE
> 
> I have a daily cronjob to index those directories:
> 
> #+BEGIN_EXAMPLE
>   # Recoll
>   00 2  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Personal/  >> "${HOME}/Library/.recoll-Personal/recollindex.log" 2>&1
>   00 3  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Technical/ >> "${HOME}/Library/.recoll-Technical/recollindex.log" 2>&1
>   00 4  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Business/  >> "${HOME}/Library/.recoll-Business/recollindex.log" 2>&1
> #+END_EXAMPLE
> 
> Then I have a simple TUI shell script which wraps dialog around recoll's
> CLI. It puts the chosen filename in my clipboard for command-line pasting, and opens
> PDFs in Firefox.
> 
> #+BEGIN_EXAMPLE
>   #!/bin/sh
>   # ~/scripts/recolltui.sh
> 
>   # requires the optional recollq CLI binary from the recoll package
>   # uses base64, xsel, and dialog
> 
>   DB=$(mktemp)
>   MENU=$(mktemp)
>   trap 'rm -f -- "${DB}" "${MENU}"' INT TERM HUP EXIT
> 
>   # Make sure to customize RECOLL_CONFDIR (ie: ~/Library/.recoll-Technical) if needed
> 
>   # query recoll, save the base64 output to $DB as 3 space-separated columns: row #, title, url
>   recollq -e -F "title url" $@ 2>/dev/null | nl > $DB
> 
>   # copy header into menu
>   head -n 2 $DB | while read num rest ; do
>       echo "= \"$rest\"" >> $MENU
>   done
> 
>   # convert results to dialog menu entries using row # and title + filename as the list item
>   # skip the first two lines of results, they are not base64
>   tail -n +3 $DB | while read num title url ; do
>       echo "$num \"$(echo "$title" | base64 -w0 -d ) : $(basename "$(echo "$url" | base64 -w0 -d | sed 's,file://,,g')")\"" >> $MENU
>   done
> 
>   # ask the user which result to view
>   SEL=$(dialog --menu "Search results" 0 0 0 --file $MENU --stdout)
> 
>   # if a choice was made, open the url in firefox AND copy it to the clipboard
>   [ $? -eq 0 ] && {
>       URL="$(awk "\$1 == $SEL {print \$3}" $DB | base64 -w0 -d)"
>       echo "$URL" | sed 's,file://,,g' | xsel
>       firefox "$URL"
>   }
> 
> #+END_EXAMPLE
> 
> I've often thought that the dialog script could easily be replaced by an Emacs
> interface, but I haven't taken the time to write one.
> 
> I've found that recoll's indexing in Xapian is excellent: I can usually find
> my search terms in technical documentation very rapidly, and its support for many
> file types means it indexes nearly everything well. I think my most frequent
> formats are text (including Org), PDF, and DOC.
> 
> I used to have a "Scrapbook" extension in Firefox which would instantly save the
> webpage being viewed into my Personal library. Unfortunately that isn't
> supported on modern Firefox versions, so I need to find a replacement for that
> functionality.
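
[Editor's note: the base64 plumbing in the TUI script above can be illustrated with a small self-contained sketch. The sample title and URL are invented; the assumption, taken from the script's own comments, is that `recollq -F` emits one base64-encoded field per requested column.]

```shell
#!/bin/sh
# Sketch: decode space-separated base64 columns the way the TUI script does.
# $line stands in for one data line of `recollq -e -F "title url"` output
# (sample values are invented for this demo).
line="$(printf 'My Paper' | base64) $(printf 'file:///tmp/paper.pdf' | base64)"

# split on whitespace, then decode each column back to plain text
title=$(printf '%s\n' "$line" | awk '{print $1}' | base64 -d)
url=$(printf '%s\n' "$line" | awk '{print $2}' | base64 -d)

# strip the file:// scheme and show "title : filename", like the dialog menu entries
echo "$title : $(basename "${url#file://}")"
```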
> 
> On Tue, Nov 12, 2019 at 12:34:29PM +0100, Roland Everaert wrote:
> > I had a quick look at recoll and I noticed that there is a Python API
> > to update/create the index.
> >
> > Maybe something could be developed using the Python package recently
> > released by Karl Voit, to feed a recoll index with Org data.
> >
> > Roland.
> 
> ------------------------------------------------------------------
> Russell Adams                            RLAdams@AdamsInfoServ.com
> 
> PGP Key ID:     0x1160DCB3          http://www.adamsinfoserv.com/
> 
> Fingerprint:    1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3

--0000000000009b886905972e3d2c--