From: John Kitchin
Subject: Re: Extract document structure from Org file
Date: Sat, 04 Jul 2015 11:54:39 -0400
To: Oleg Sivokon, "emacs-orgmode@gnu.org"
References: <87egkpfxe3.fsf@gmail.com> <87a8vdfacd.fsf@gmail.com>
In-reply-to: <87a8vdfacd.fsf@gmail.com>
List-Id: "General discussions about Org-mode."

I worked out a new version of the swish-e org indexer that indexes
custom xml representing the org file that you may find interesting for
your project.

http://kitchingroup.cheme.cmu.edu/blog/2015/07/04/An-xml-representation-of-an-org-document-for-indexing-with-swish-e/

It enables a search like this:

swish-e -f index-org2xml.swish-e -w src-block.language=python -w src-block=diffusion

to find org files with a python source block containing the word
diffusion.

I think swish-e supports ranking
(http://swish-e.org/docs/swish-faq.html#how_is_ranking_calculated_)
too, but I have not tried it.

It is pretty interesting overall!

Oleg Sivokon writes:

> John Kitchin writes:
>
>> You would use org-element.
>> Try org-element-parse-buffer and
>> org-element-map and maybe org-element-interpret-data. There's also a
>> bunch of regexps for identifying/finding particular types of elements.
>
> Thanks! I'm already looking into it.
>
>> That sounds really cool. I recently hacked a swish-e index of my org
>> files (there might have been 3000+!):
>> http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/
>>
>> I just updated it to index the html version of an org-file so that I
>> can take advantage of the structure in the search:
>> http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/
>> It would be cool to have more granular searching though.
>>
>> Is your info project visible
>> anywhere? I can imagine a close-file hook function that updates the
>> database automatically.
>
> Whoa, that's a lot of Org files :) What I wrote so far is on Github, but
> it's in a very early stage, so it's not something you could just drop
> into your Emacs directory and start using right away.
> https://github.com/wvxvw/sphinx-mode
> I've also looked into Swish some time ago. I also thought about using
> Nepomuk, but, in the latter case, I have to admit, I didn't make it
> through the documentation.
>
> The difference in using Sphinx is that it has ranking, and it has a
> relatively terse way of specifying search criteria. For example, you
> could ask to search for "some words in this phrase"/3 and it would look
> for occurrences of 3 of the 5 words given between the quotes. Or, you
> could ask it to search for @node "R" @contents "printf" "format", and
> this would search for node titles mentioning "R" and having contents
> with the words "printf" and "format".
> I have to admit I didn't master it fully (there are far more options
> and settings) but it does something that seems reasonable (if I compare
> it to M-x info-apropos).
>
> I'm also still trying to learn what's the best way to do indexing, so
> the project is still very raw, but I'll get there one day :)
>
> The ultimate goal is also to write a more human-friendly interface to
> Sphinx, where one could ask questions in a subset of natural language :)
> (but that's a very long way into the future!)
>
> PS. I see that many posts on this list are titled with [O]. What does
> it mean? Should I do that too?
>
> Best.
>
> Oleg

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu