From: John Kitchin <jkitchin@andrew.cmu.edu>
To: Oleg Sivokon <olegsivokon@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: Extract document structure from Org file
Date: Fri, 03 Jul 2015 10:20:04 -0400
Message-ID: <m24mllmigb.fsf@andrew.cmu.edu>
In-Reply-To: <87egkpfxe3.fsf@gmail.com>
That sounds really cool. I recently hacked together a swish-e index of my
org files (there might have been 3000+!):
http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/

I just updated it to index the HTML version of an org file so that I can
take advantage of the structure in the search:
http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/

It would be cool to have more granular searching, though.

Is your Info project visible anywhere? I can imagine a close-file hook
function that updates the database automatically.
Oleg Sivokon writes:
> Hello list!
>
> Suppose I wanted to extract the structure from an Org document, where,
> what's important for me would be to have it categorically divided into
> headers, paragraphs of text, technical information and inclusion of
> other documents (code snippets). How would I do it?
>
> The reason I'm asking is that I've a small project I work on, where I'm
> trying to enhance the search in documents by using indexing combined
> with queries based on things like distance between words, frequency of a
> word appearing in a document and so on. (I'm using Sphinx for it.)
> I've tried to do this with Info pages, and I liked the results, however,
> in order to do this more intelligently, I'd like to index the documents
> with better granularity (i.e. so that later on I could search assigning
> different weights to words appearing in headers and words appearing in
> comments).
>
> Best.
>
> Oleg
--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu