From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Kitchin
Subject: Re: Extract document structure from Org file
Date: Fri, 03 Jul 2015 10:20:04 -0400
To: Oleg Sivokon
Cc: emacs-orgmode@gnu.org
In-reply-to: <87egkpfxe3.fsf@gmail.com>
References: <87egkpfxe3.fsf@gmail.com>
List-Id: "General discussions about Org-mode."

That sounds really cool. I recently hacked together a swish-e index of
my org files (there might have been 3000+!):

http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/

I just updated it to index the HTML version of each org file so that I
can take advantage of the structure in the search:

http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/

It would be cool to have more granular searching, though. Is your Info
project visible anywhere? I can imagine a close-file hook function that
updates the database automatically.

Oleg Sivokon writes:

> Hello list!
>
> Suppose I wanted to extract the structure from an Org document, where,
> what's important for me would be to have it categorically divided into
> headers, paragraphs of text, technical information and inclusion of
> other documents (code snippets). How would I do it?
>
> The reason I'm asking is that I've a small project I work on, where I'm
> trying to enhance the search in documents by using indexing combined
> with queries based on things like distance between words, frequency of a
> word appearing in a document and so on. (I'm using Sphinx for it.)
> I've tried to do this with Info pages, and I liked the results, however,
> in order to do this more intelligently, I'd like to index the documents
> with better granularity (i.e. so that later on I could search assigning
> different weights to words appearing in headers and words appearing in
> comments).
>
> Best.
>
> Oleg

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
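
For the question quoted above (dividing an Org document into headers,
paragraphs, technical information and included code), a rough sketch in
Python might show the idea. This is not Org's own parser (Emacs ships
org-element for that), and it is not the swish-e or Sphinx setup
discussed in this thread. The function name classify_org, the element
kinds, and the regular expressions are all assumptions made for
illustration, and they cover only a small subset of Org syntax.

```python
import re
from typing import List, Tuple

# Simplified patterns (assumptions): real Org syntax has many more cases.
HEADLINE = re.compile(r"^(\*+)\s+(.*)$")            # "* Title", "** Sub"
KEYWORD = re.compile(r"^#\+(\w+):\s*(.*)$")         # "#+TITLE: ..."
BEGIN_SRC = re.compile(r"^#\+begin_src\b", re.IGNORECASE)
END_SRC = re.compile(r"^#\+end_src\b", re.IGNORECASE)


def classify_org(text: str) -> List[Tuple[str, str]]:
    """Split Org text into (kind, content) pairs.

    Kinds used here (hypothetical names): 'headline', 'paragraph',
    'keyword', 'include', 'src'.
    """
    elements: List[Tuple[str, str]] = []
    paragraph: List[str] = []  # lines of the paragraph being collected
    src = None                 # lines of the open source block, or None

    def flush_paragraph() -> None:
        # Emit the collected paragraph lines as one element, if any.
        if paragraph:
            elements.append(("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()

        # Inside a source block, collect lines until #+END_SRC.
        if src is not None:
            if END_SRC.match(stripped):
                elements.append(("src", "\n".join(src)))
                src = None
            else:
                src.append(line)
            continue

        headline = HEADLINE.match(line)
        keyword = KEYWORD.match(stripped)
        if BEGIN_SRC.match(stripped):
            flush_paragraph()
            src = []
        elif headline:
            flush_paragraph()
            # The title may still contain TODO keywords and tags.
            elements.append(("headline", headline.group(2)))
        elif keyword:
            flush_paragraph()
            is_include = keyword.group(1).lower() == "include"
            elements.append(("include" if is_include else "keyword",
                             keyword.group(2)))
        elif not stripped:
            flush_paragraph()  # a blank line ends the paragraph
        else:
            paragraph.append(stripped)

    flush_paragraph()
    if src is not None:
        # Unterminated block: keep what was collected.
        elements.append(("src", "\n".join(src)))
    return elements
```

Each (kind, content) pair could then be handed to an indexer as a
separate field, so that a search can weight words in headlines
differently from words in paragraphs or code.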