From: John Kitchin <jkitchin@andrew.cmu.edu>
To: Oleg Sivokon <olegsivokon@gmail.com>,
	"emacs-orgmode@gnu.org" <emacs-orgmode@gnu.org>
Subject: Re: Extract document structure from Org file
Date: Sat, 04 Jul 2015 11:54:39 -0400
Message-ID: <m2pp47ex4w.fsf@andrew.cmu.edu>
In-Reply-To: <87a8vdfacd.fsf@gmail.com>

I worked out a new version of the swish-e org indexer. It indexes a
custom XML representation of the org file, which you may find
interesting for your project.

http://kitchingroup.cheme.cmu.edu/blog/2015/07/04/An-xml-representation-of-an-org-document-for-indexing-with-swish-e/

It enables a search like this:

swish-e -f index-org2xml.swish-e -w src-block.language=python -w src-block=diffusion

to find org files with a Python source block containing the word
"diffusion".
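
To make the query above more concrete, here is a rough sketch of the kind
of XML such a search could run against. It is not the indexer from the
blog post (that one is written in Emacs Lisp); the function name
org_to_xml, the root element name org-document, and the regex-based
parsing are all assumptions made for illustration. Only the src-block
element and its language attribute are taken from the query itself.

```python
import re
import xml.etree.ElementTree as ET

# Matches "#+BEGIN_SRC <language> ..." up to the next "#+END_SRC" line.
# A regex is a simplification; a real indexer would use a proper org parser.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+BEGIN_SRC[ \t]+(\S+)[^\n]*\n(.*?)^[ \t]*#\+END_SRC",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def org_to_xml(org_text):
    """Return an XML string with one <src-block> element per source block.

    Hypothetical helper: the element layout is assumed, not taken from
    the actual swish-e indexer.
    """
    root = ET.Element("org-document")  # assumed root element name
    for match in SRC_BLOCK.finditer(org_text):
        block = ET.SubElement(root, "src-block", language=match.group(1))
        block.text = match.group(2)
    return ET.tostring(root, encoding="unicode")


sample = "* Notes\n#+BEGIN_SRC python\nprint('diffusion')\n#+END_SRC\n"
xml_text = org_to_xml(sample)
```

With a document shaped like this, a search on src-block.language and
src-block amounts to filtering on the attribute and the element text.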

I think swish-e supports ranking
(http://swish-e.org/docs/swish-faq.html#how_is_ranking_calculated_) too,
but I have not tried it.

It is pretty interesting overall!



Oleg Sivokon writes:

> John Kitchin <jkitchin@andrew.cmu.edu> writes:
>
>> You would use org-element.  Try org-element-parse-buffer and
>> org-element-map and maybe org-element-interpret-data.  There's also a
>> bunch of regexp for identifying/finding particular types of elements.
>
> Thanks! I'm already looking into it.
>
>> That sounds really cool. I recently hacked a swish-e index of my org
>> files (there might have been 3000+!):
>> http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/
>>
>> I just updated it to index the html version of an org-file so that I
>> can take advantage of the structure in the search:
>> http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/
>> It would be cool to have more granular searching though.
>>
>> Is your info project visible anywhere? I can imagine a close-file
>> hook function that updates the database automatically.
>
> Whoa, that's a lot of Org files :) What I wrote so far is on Github, but
> it's in a very early stage, so it's not something you could just drop
> into your Emacs directory and start using right away.
> https://github.com/wvxvw/sphinx-mode
> I've also looked into Swish some time ago.  I also thought about using
> Nepomuk, but, in the latter case, I have to admit, I didn't make it
> through the documentation.
>
> The difference in using Sphinx is that it has ranking, and it has a
> relatively terse way of specifying searching criteria.  For example, you
> could ask to search for "some words in this phrase"/3 and it would look
> for occurrences of 3 of the 5 words given between the quotes.  Or, you could
> ask it to search for @node "R" @contents "printf" "format", and this
> would search for node titles mentioning "R" and having contents with
> words "printf" and "format".
> I have to admit I didn't master it fully (there are far more options and
> settings) but it does something that seems reasonable (if I compare it
> to M-x info-apropos).
>
> I'm also still trying to learn what the best way to do indexing is, so
> the project is still very raw, but I'll get there one day :)
>
> The ultimate goal is also to write a more human-friendly interface to
> Sphinx, where one could ask questions in a subset of natural language :)
> (but that's a very long way into the future!)
>
> PS. I see that many posts on this list are titled with [O].  What does
> it mean, should I do that too?
>
> Best.
>
> Oleg
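
The "some words in this phrase"/3 search quoted above is a quorum match:
a document qualifies if it contains at least 3 of the 5 query words. A
minimal sketch of that idea follows. It is not how Sphinx implements it
(no stemming, no ranking, no index), and the function name quorum_match
is hypothetical.

```python
import re


def quorum_match(text, phrase, threshold):
    """Return True if at least `threshold` distinct words of `phrase`
    occur in `text`. Case-insensitive, whole-word comparison only."""
    words_in_text = set(re.findall(r"\w+", text.lower()))
    query_words = set(re.findall(r"\w+", phrase.lower()))
    return len(query_words & words_in_text) >= threshold


# Three of the five query words (this, phrase, some) appear in the text.
hit = quorum_match("this phrase has some text",
                   "some words in this phrase", 3)
# Only one query word (words) appears in the text.
miss = quorum_match("only words here",
                    "some words in this phrase", 3)
```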

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
