emacs-orgmode@gnu.org archives
From: Roland Everaert <reveatwork@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: Anyone use 3rd party search tools w/org-mode?
Date: Wed, 13 Nov 2019 09:44:38 +0100
Message-ID: <87bltgnp1l.fsf@gmail.com>
In-Reply-To: <CAFm0skFcZr8t-WfhxPA7qLkLFkxONi943BKfs4JEL0ovAuW5Kg@mail.gmail.com>

It is not a question of searching and replacing strings in one file, but
of searching for a document or a set of documents among tens of documents
or even more, possibly in various formats.

Roland.
briangpowell . writes:

> Emacs (shortened name from "Editor Macros") has the fastest Regular
> Expression engine in the world--when you compare the engines that are
> programmed to find and display character strings AS YOU TYPE THEM.
>
> So, just hoping you keep that in mind: As far as editing documents and
> searching documents and in some cases replacing strings, there is nothing
> faster than Emacs and its native regular expression engine, which is built
> for editing tasks--editing tasks that are especially related to and
> programmed for searching strings and/or regular expressions as you type
> them in
>
> In many other ways, of course other engines are faster; but, not for
> editing and searching and replacing tasks
>
> And even when you talk about editing multi-gigabyte and even multi-terabyte
> files--suggest you look into and try out vlf-mode (i.e. "Very Large File
> Mode") for that, just for the fun and excitement of it, if for nothing else.
>
> So, again, GNU Emacs is by far the world's most powerful editor, and it has
> been for many, many years--there is no need for 3rd party tools, maybe
> there's a need to investigate the "engines under the hood" and why they
> work the way they do.
>
> On Tue, Nov 12, 2019 at 8:04 AM Russell Adams <RLAdams@adamsinfoserv.com>
> wrote:
>
>> To further explain my setup, I have three libraries of files: Personal,
>> Technical, and Business. Personal is all personal data including Org files,
>> Technical is all whitepapers and vendor documentation, and Business is Org
>> projects and other matters. Recoll is used to search all of them.
>>
>> In my shell profile I have a few functions to access each library, and to
>> file away new documents (i.e., I downloaded a whitepaper, and just want to
>> slap it into a unique directory in the library).
>>
>> #+BEGIN_EXAMPLE
>>   # For recoll and library
>>   _FileRecoll()  { DEST="$HOME/Library/$1/$(date +%Y/%m/%d)" ; mkdir -p "$DEST" && mv -i "$2" "$DEST" ; }
>>   FileTech()     { _FileRecoll "Technical" "$1" ; }
>>   FilePersonal() { _FileRecoll "Personal"  "$1" ; }
>>   FileBiz()      { _FileRecoll "Business"  "$1" ; }
>>
>>   recollt() { RECOLL_CONFDIR=~/Library/.recoll-Technical ~/scripts/recolltui.sh "$@" ; }
>>   recollp() { RECOLL_CONFDIR=~/Library/.recoll-Personal ~/scripts/recolltui.sh "$@" ; }
>>   recollb() { RECOLL_CONFDIR=~/Library/.recoll-Business ~/scripts/recolltui.sh "$@" ; }
>> #+END_EXAMPLE
>>
>> I have a daily cronjob to index those directories:
>>
>> #+BEGIN_EXAMPLE
>>   # Recoll
>>   00 2  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Personal/ >> "${HOME}/Library/.recoll-Personal/recollindex.log" 2>&1
>>   00 3  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Technical/ >> "${HOME}/Library/.recoll-Technical/recollindex.log" 2>&1
>>   00 4  * * * /usr/bin/recollindex -c ${HOME}/Library/.recoll-Business/ >> "${HOME}/Library/.recoll-Business/recollindex.log" 2>&1
>> #+END_EXAMPLE
>>
>> Then I have a simple TUI shell script which wraps dialog around recoll's
>> CLI. This puts the filename in my clipboard for command line pasting, and
>> opens PDFs in Firefox.
>>
>> #+BEGIN_EXAMPLE
>>   #!/bin/sh
>>   # ~/scripts/recolltui.sh
>>
>>   # requires recollq optional cli binary to be present from recoll package
>>   # uses base64, xsel, and dialog
>>
>>   DB=$(mktemp)
>>   MENU=$(mktemp)
>>   trap 'rm -f -- "${DB}" "${MENU}"' INT TERM HUP EXIT
>>
>>   # Make sure to customize RECOLL_CONFDIR (e.g. ~/Library/.recoll-Technical) if needed
>>
>>   # query recoll, save the base64 output to $DB as 3 space separated columns: row #, title, url
>>   recollq -e -F "title url" "$@" 2>/dev/null | nl > "$DB"
>>
>>   # copy header into menu
>>   head -n 2 "$DB" | while read -r num rest ; do
>>       echo "= \"$rest\"" >> "$MENU"
>>   done
>>
>>   # Convert results to dialog menu using row # and title + filename as list item
>>   # skip first two lines of results, they are not base64
>>   tail -n +3 "$DB" | while read -r num title url ; do
>>       echo "$num \"$(echo "$title" | base64 -w0 -d ) : $(basename "$(echo "$url" | base64 -w0 -d | sed 's,file://,,g')")\"" >> "$MENU"
>>   done
>>
>>   # ask the user which results to view
>>   SEL=$(dialog --menu "Search results" 0 0 0 --file "$MENU" --stdout)
>>
>>   # if a choice was made, open the url in firefox AND copy it to the clipboard
>>   [ $? -eq 0 ] && {
>>       # pass the selection to awk as a variable instead of splicing it into the program
>>       URL="$(awk -v sel="$SEL" '$1 == sel {print $3}' "$DB" | base64 -w0 -d)"
>>       echo "$URL" | sed 's,file://,,g' | xsel
>>       firefox "$URL"
>>   }
>>
>> #+END_EXAMPLE
>>
>> I've often thought that the dialog script could be easily replaced by an
>> Emacs interface, but I haven't taken the time to try to write one.
>>
>> I've found that recoll's indexing in Xapian is excellent. I frequently can
>> find my search terms in technical documentation very rapidly. The support of
>> many file types makes it index well. I think my most frequent formats are
>> text including Org, PDF, and DOC.
>>
>> I used to have a "Scrapbook" extension in Firefox which would instantly
>> save a webpage being viewed into my Personal library. Unfortunately that
>> isn't supported on modern Firefox versions so I need to find a replacement
>> for that functionality.
>>
>> On Tue, Nov 12, 2019 at 12:34:29PM +0100, Roland Everaert wrote:
>> > I had a quick look at recoll and I noticed that there is a Python API
>> > to update/create an index.
>> >
>> > Maybe something could be developed using the Python package recently
>> > released by Karl Voit, to feed a recoll index with Org data.
>> >
>> > Roland.
>>
>>
>> ------------------------------------------------------------------
>> Russell Adams                            RLAdams@AdamsInfoServ.com
>>
>> PGP Key ID:     0x1160DCB3           http://www.adamsinfoserv.com/
>>
>> Fingerprint:    1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3
>>
>>


-- 
Luke, use the FOSS

Sent from Emacs
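
The decoding step in the quoted recolltui.sh script can be tried in isolation,
without recoll installed. The following is only a minimal sketch: the helper
name decode_result_line, the sample title, and the sample path are invented
for illustration, and it assumes a base64 that accepts -d (as GNU coreutils
does). It takes one row in the shape the script stores in $DB (row number,
base64 title, base64 url) and prints the row number, the decoded title, and
the path with the file:// scheme stripped.

```shell
#!/bin/sh
# Hypothetical helper (not part of recoll): decode one result row of the form
# "<row number> <base64 title> <base64 url>".
decode_result_line() {
    # $1: row number, $2: base64-encoded title, $3: base64-encoded url
    title=$(printf '%s' "$2" | base64 -d)
    url=$(printf '%s' "$3" | base64 -d)
    # drop a leading file:// so the result is a plain filesystem path
    path=${url#file://}
    printf '%s\t%s\t%s\n' "$1" "$title" "$path"
}

# Hand-made sample input; tr removes any line wrapping base64 may add.
t=$(printf '%s' 'Xapian notes' | base64 | tr -d '\n')
u=$(printf '%s' 'file:///home/user/Library/Technical/xapian.pdf' | base64 | tr -d '\n')
decode_result_line 7 "$t" "$u"
```

Printing tab-separated fields keeps titles that contain spaces intact, which
would make the output easier to consume from another front end than the
space-separated rows used for the dialog menu.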

Thread overview: 18+ messages
2019-10-30 22:07 Anyone use 3rd party search tools w/org-mode? Nathan Neff
2019-10-30 22:17 ` Jean Louis
2019-10-31 15:42   ` Fraga, Eric
2019-10-31  1:09 ` John Kitchin
2019-11-06 16:02 ` Roland Everaert
2019-11-06 17:33   ` Russell Adams
2019-11-06 18:10   ` Eric Abrahamsen
2019-11-06 20:09     ` John Kitchin
2019-11-07 17:04       ` Eric Abrahamsen
2019-11-08 13:12         ` Roland Everaert
2019-11-08 13:28           ` John Kitchin
2019-11-08 13:51             ` Russell Adams
2019-11-08 14:21               ` Roland Everaert
2019-11-12 11:34                 ` Roland Everaert
2019-11-12 13:01                   ` Russell Adams
2019-11-12 23:02                     ` briangpowell .
2019-11-13  8:44                       ` Roland Everaert [this message]
2019-11-06 21:47     ` Adam Porter
