From: Oleg Sivokon
Subject: Extract document structure from Org file
Date: Fri, 03 Jul 2015 11:39:16 +0300
Message-ID: <87egkpfxe3.fsf@gmail.com>
To: emacs-orgmode@gnu.org

Hello list!

Suppose I wanted to extract the structure of an Org document, where what
matters to me is having it categorically divided into headers, paragraphs
of text, technical information and inclusions of other documents (code
snippets). How would I do it?

The reason I'm asking is that I have a small project I'm working on, where
I'm trying to enhance search in documents by using indexing combined with
queries based on things like the distance between words, the frequency of a
word in a document and so on. (I'm using Sphinx for it.) I've tried this
with Info pages and liked the results; however, to do it more
intelligently, I'd like to index the documents with finer granularity
(i.e. so that later on I could search while assigning different weights to
words appearing in headers and words appearing in comments).

Best.

Oleg