From: John Kitchin <jkitchin@andrew.cmu.edu>
To: Oleg Sivokon <olegsivokon@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: Extract document structure from Org file
Date: Fri, 03 Jul 2015 10:20:04 -0400
Message-ID: <m24mllmigb.fsf@andrew.cmu.edu>
In-Reply-To: <87egkpfxe3.fsf@gmail.com>

That sounds really cool. I recently hacked together a swish-e index of my
org files (there might have been 3000+!):
http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/

I just updated it to index the HTML version of an org file so that I can
take advantage of the structure in the search:
http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/

It would be cool to have more granular searching, though.
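
A rough sketch of what a more granular split might look like, written in
Python rather than Emacs Lisp. It assumes that line-by-line regex
classification is good enough, which it is not for all of Org syntax, and
it is not how swish-e or org-element work. The function name `split_org`,
the chunk kinds, and the `WEIGHTS` table are made up for illustration.

```python
import re

# Simplified patterns; real Org syntax has many more cases (assumption).
HEADING = re.compile(r"^(\*+)\s+(.*)$")
BEGIN_SRC = re.compile(r"^\s*#\+begin_src\b", re.IGNORECASE)
END_SRC = re.compile(r"^\s*#\+end_src\b", re.IGNORECASE)
KEYWORD = re.compile(r"^\s*#\+\w+:")
COMMENT = re.compile(r"^\s*#\s")

# Hypothetical per-kind weights an indexer might apply later.
WEIGHTS = {"heading": 3.0, "paragraph": 1.0, "src": 0.5,
           "comment": 0.5, "keyword": 0.25}


def split_org(text):
    """Split Org text into a list of (kind, content) tuples."""
    chunks = []
    para = []   # lines of the paragraph being collected
    src = None  # lines of the source block being collected, if any

    def flush():
        # Emit the pending paragraph, if there is one.
        if para:
            chunks.append(("paragraph", " ".join(para)))
            para.clear()

    for line in text.splitlines():
        if src is not None:
            # Inside a source block: collect verbatim until #+end_src.
            if END_SRC.match(line):
                chunks.append(("src", "\n".join(src)))
                src = None
            else:
                src.append(line)
            continue
        heading = HEADING.match(line)
        if heading:
            flush()
            chunks.append(("heading", heading.group(2)))
        elif BEGIN_SRC.match(line):
            flush()
            src = []
        elif KEYWORD.match(line):
            flush()
            chunks.append(("keyword", line.strip()))
        elif COMMENT.match(line) or line.strip() == "#":
            flush()
            chunks.append(("comment", line.strip()[1:].strip()))
        elif not line.strip():
            flush()  # a blank line ends the current paragraph
        else:
            para.append(line.strip())
    flush()
    if src is not None:
        # Unterminated source block: keep what was collected.
        chunks.append(("src", "\n".join(src)))
    return chunks
```

Each chunk could then be written out as a separate field or document, so
the indexer can score a hit in a heading differently from a hit in a
comment.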

Is your info project visible anywhere? I can imagine a close-file hook
function that updates the database automatically.

Oleg Sivokon writes:

> Hello list!
>
> Suppose I wanted to extract the structure from an Org document, where,
> what's important for me would be to have it categorically divided into
> headers, paragraphs of text, technical information and inclusion of
> other documents (code snippets).  How would I do it?
>
> The reason I'm asking is that I've a small project I work on, where I'm
> trying to enhance the search in documents by using indexing combined
> with queries based on things like distance between words, frequency of a
> word appearing in a document and so on.  (I'm using Sphinx for it.)
> I've tried to do this with Info pages, and I liked the results, however,
> in order to do this more intelligently, I'd like to index the documents
> with better granularity (i.e. so that later on I could search assigning
> different weights to words appearing in headers and words appearing in
> comments).
>
> Best.
>
> Oleg

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
