From mboxrd@z Thu Jan  1 00:00:00 1970
From: Oleg Sivokon <olegsivokon@gmail.com>
Subject: Extract document structure from Org file
Date: Fri, 03 Jul 2015 11:39:16 +0300
Message-ID: <87egkpfxe3.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
To: emacs-orgmode@gnu.org

Hello list!

Suppose I want to extract the structure of an Org document.  What
matters to me is having it divided categorically into headers,
paragraphs of text, technical information, and inclusions of other
documents (code snippets).  How would I do it?

The reason I'm asking is that I'm working on a small project where I'm
trying to enhance document search by combining indexing with queries
based on things like the distance between words, the frequency of a
word in a document, and so on.  (I'm using Sphinx for it.)  I've tried
this with Info pages and liked the results.  However, to do it more
intelligently, I'd like to index the documents with finer granularity,
so that later on I could search while assigning different weights to
words appearing in headers and words appearing in comments.
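
To make the kind of division I have in mind concrete, here is a rough
sketch in Python.  It is only an illustration under my own assumptions:
the function name classify and the four labels (heading, keyword, src,
paragraph) are hypothetical, and the naive line-by-line matching is not
how Org itself parses documents.

```python
import re

# Naive patterns for a few Org constructs (an approximation, not Org's grammar).
HEADING = re.compile(r"^(\*+)\s+(.*)$")
KEYWORD = re.compile(r"^\s*#\+(\w+):\s*(.*)$")
BEGIN_SRC = re.compile(r"^\s*#\+begin_src\b", re.IGNORECASE)
END_SRC = re.compile(r"^\s*#\+end_src\b", re.IGNORECASE)


def classify(text):
    """Split Org text into (kind, payload) pairs.

    kind is one of the hypothetical labels:
    "heading", "keyword", "src" or "paragraph".
    """
    items = []
    para = []   # lines of the paragraph being collected
    src = None  # lines of the open source block, or None

    def flush():
        # Emit the pending paragraph, if any.
        if para:
            items.append(("paragraph", " ".join(para)))
            para.clear()

    for line in text.splitlines():
        if src is not None:
            # Inside a source block: collect verbatim until #+end_src.
            if END_SRC.match(line):
                items.append(("src", "\n".join(src)))
                src = None
            else:
                src.append(line)
            continue
        heading = HEADING.match(line)
        keyword = KEYWORD.match(line)
        if BEGIN_SRC.match(line):
            flush()
            src = []
        elif heading:
            flush()
            items.append(("heading", heading.group(2)))
        elif keyword:
            # Covers lines such as #+TITLE: and #+INCLUDE:.
            flush()
            items.append(
                ("keyword", keyword.group(1).upper() + ": " + keyword.group(2))
            )
        elif line.strip() == "":
            flush()
        else:
            para.append(line.strip())
    flush()
    if src is not None:
        # Unterminated block: keep what was collected.
        items.append(("src", "\n".join(src)))
    return items
```

Each resulting pair could then go into a separate index field, so that
the headings and the body text can be weighted differently at query
time.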

Best.

Oleg