From: Roland Everaert
Subject: Re: Anyone use 3rd party search tools w/org-mode?
Date: Wed, 13 Nov 2019 09:44:38 +0100
Message-ID: <87bltgnp1l.fsf@gmail.com>
To: emacs-orgmode@gnu.org

It is not a question of searching and replacing strings in one file, but of
searching for a document, or a set of documents, among tens of documents or
more, possibly in various formats.

Roland.

briangpowell . writes:

> Emacs (its name shortened from "Editor Macros") has the fastest regular
> expression engine in the world--when you compare the engines that are
> programmed to find and display character strings AS YOU TYPE THEM.
>
> So, just hoping you keep that in mind: as far as editing documents,
> searching documents, and in some cases replacing strings, there is nothing
> faster than Emacs and its native regular expression engine, which is built
> for editing tasks--tasks that are especially related to and programmed for
> searching strings and/or regular expressions as you type them in.
>
> In many other ways, of course, other engines are faster; but not for
> editing, searching, and replacing tasks.
>
> And even when you talk about editing multi-gigabyte and even
> multi-terabyte files--I suggest you look into and try out vlf-mode
> ("Very Large File mode") for that, just for the fun and excitement of it,
> if for nothing else.
>
> So, again, GNU Emacs is by far the world's most powerful editor, and it
> has been for many, many years--there is no need for 3rd party tools;
> maybe there is a need to investigate the "engines under the hood" and why
> they work the way they do.
>
> On Tue, Nov 12, 2019 at 8:04 AM Russell Adams wrote:
>
>> To further explain my setup, I have three libraries of files: Personal,
>> Technical and Business. Personal is all personal data including Org
>> files, Technical is all whitepapers and vendor documentation, and
>> Business is Org projects and other matters. Recoll is used to search
>> all of them.
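(A side note on the setup below: each library apparently gets its own Recoll
configuration directory, which is later selected through RECOLL_CONFDIR. I
have not tried this myself, but a minimal per-library configuration would
presumably look something like the sketch here; the topdirs line and the
exact paths are my guesses, not something taken from Russell's mail.)

#+BEGIN_EXAMPLE
# Hypothetical one-time setup for a "Technical" library (paths are guesses).
# Each library gets its own Recoll configuration directory, hence its own index.
mkdir -p ~/Library/Technical ~/Library/.recoll-Technical

# recoll.conf's topdirs variable lists the directories covered by this index.
echo 'topdirs = ~/Library/Technical' > ~/Library/.recoll-Technical/recoll.conf

# Build the initial index against that configuration.
recollindex -c ~/Library/.recoll-Technical
#+END_EXAMPLE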
>>
>> In my shell profile I have a few functions to access each library, and to
>> file away new documents (ie: I downloaded a whitepaper, and just want to
>> slap it into a unique directory in the library).
>>
>> #+BEGIN_EXAMPLE
>> # For recoll and library
>> _FileRecoll() { DEST="$HOME/Library/$1/$(date +%Y/%m/%d)" ; mkdir -p $DEST ; mv -i "$2" $DEST ; }
>> FileTech() { _FileRecoll "Technical" "$1" ; }
>> FilePersonal() { _FileRecoll "Personal" "$1" ; }
>> FileBiz() { _FileRecoll "Business" "$1" ; }
>>
>> recollt() { RECOLL_CONFDIR=~/Library/.recoll-Technical ~/scripts/recolltui.sh $@ ; }
>> recollp() { RECOLL_CONFDIR=~/Library/.recoll-Personal ~/scripts/recolltui.sh $@ ; }
>> recollb() { RECOLL_CONFDIR=~/Library/.recoll-Business ~/scripts/recolltui.sh $@ ; }
>> #+END_EXAMPLE
>>
>> I have a daily cronjob to index those directories:
>>
>> #+BEGIN_EXAMPLE
>> # Recoll
>> 00 2 * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Personal/ >> "${HOME}/Library/.recoll-Personal/recollindex.log" 2>&1
>> 00 3 * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Technical/ >> "${HOME}/Library/.recoll-Technical/recollindex.log" 2>&1
>> 00 4 * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Business/ >> "${HOME}/Library/.recoll-Business/recollindex.log" 2>&1
>> #+END_EXAMPLE
>>
>> Then I have a simple TUI shell script which wraps dialog around recoll's
>> CLI. This puts the filename in my clipboard for command-line pasting, and
>> opens PDFs in Firefox.
>>
>> #+BEGIN_EXAMPLE
>> #!/bin/sh
>> # ~/scripts/recolltui.sh
>>
>> # requires the optional recollq CLI binary from the recoll package
>> # uses base64, xsel, and dialog
>>
>> DB=$(mktemp)
>> MENU=$(mktemp)
>> trap 'rm -f -- "${DB}" "${MENU}"' INT TERM HUP EXIT
>>
>> # Make sure to customize RECOLL_CONFDIR (ie: ~/Library/.recoll-Technical) if needed
>>
>> # query recoll, save the base64 output to $DB as 3 space-separated columns: row #, title, url
>> recollq -e -F "title url" $@ 2>/dev/null | nl > $DB
>>
>> # copy header into menu
>> head -n 2 $DB | while read num rest ; do
>>     echo "= \"$rest\"" >> $MENU
>> done
>>
>> # Convert results to dialog menu using row # and title + filename as list item
>> # skip first two lines of results, they are not base64
>> tail -n +3 $DB | while read num title url ; do
>>     echo "$num \"$(echo "$title" | base64 -w0 -d) : $(basename "$(echo "$url" | base64 -w0 -d | sed 's,file://,,g')")\"" >> $MENU
>> done
>>
>> # ask the user which result to view
>> SEL=$(dialog --menu "Search results" 0 0 0 --file $MENU --stdout)
>>
>> # if a choice was made, open the url in firefox AND copy it to the clipboard
>> [ $? -eq 0 ] && {
>>     URL="$(awk "\$1 == $SEL {print \$3}" $DB | base64 -w0 -d)"
>>     echo "$URL" | sed 's,file://,,g' | xsel
>>     firefox "$URL"
>> }
>> #+END_EXAMPLE
>>
>> I've often thought that the dialog script could easily be replaced by an
>> Emacs interface, but I haven't taken the time to try to write one.
>>
>> I've found that recoll's indexing in Xapian is excellent. I can frequently
>> find my search terms in technical documentation very rapidly. Its support
>> for many file types makes it index well. I think my most frequent formats
>> are text (including Org), PDF, and DOC.
>>
>> I used to have a "Scrapbook" extension in Firefox which would instantly
>> save a webpage being viewed into my Personal library.
>> Unfortunately that isn't supported on modern Firefox versions, so I need
>> to find a replacement for that functionality.
>>
>> On Tue, Nov 12, 2019 at 12:34:29PM +0100, Roland Everaert wrote:
>> > I had a quick look at recoll and I noticed that there is a Python API
>> > to update/create indexes.
>> >
>> > Maybe something could be developed using the Python package recently
>> > released by Karl Voit, to feed a recoll index with org data.
>> >
>> > Roland.
>>
>> ------------------------------------------------------------------
>> Russell Adams                            RLAdams@AdamsInfoServ.com
>>
>> PGP Key ID: 0x1160DCB3                   http://www.adamsinfoserv.com/
>>
>> Fingerprint: 1723 D8CA 4280 1EC9 557F 66E8 1154 E018 1160 DCB3

--
Luke, use the FOSS
Sent from Emacs
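P.S. Coming back to my earlier idea of feeding a recoll index with org data:
as Russell notes, recoll already indexes Org files as text, so for simple
cases a bit of shell may be enough, and the Python API would only be needed
for finer-grained control (e.g. indexing individual org entries). A rough
sketch, assuming I read the recollindex manual correctly (-i indexes the
listed files individually) and that the org files live under ~/org:

#+BEGIN_EXAMPLE
# Hypothetical helper: push recently edited org files into the Personal
# index without waiting for the nightly cron run. Paths and the one-day
# window are guesses, not tested.
reindex_org() {
    # -c selects the per-library configuration, -i (re)indexes only the named files
    find ~/org -name '*.org' -mtime -1 \
        -exec recollindex -c ~/Library/.recoll-Personal -i {} +
}
#+END_EXAMPLE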