From: MidLifeXis at PerlMonks
To: Rustom Mody, emacs-orgmode
Subject: Re: Formal description of Org files
Date: Fri, 15 Apr 2011 10:27:18 -0700 (PDT)
Message-ID: <20123.23043.qm@web81608.mail.mud.yahoo.com>

This is something I would like to see adopted by the org community as well.  One of my main reasons is that it would let other languages, given the proper configuration information (TODO keywords, etc.), at least deal with org files.  I am not quite sure where the lines should be drawn, but I am envisioning something like:

* The ORG format definition and actions

The format of the .org file (ORG) itself and a set of actions that are well behaved (add node, move node up, node id, property value, etc).  A core set of validation tests could be created to exercise any compliant org library.

* The configuration for a set of ORG files

The configuration for a given org (note the small "org") application, defining particular tweaks that an application can use to make sense of a properly formatted ORG file, such as valid TODO keywords, etc.  This area is muddy in my mind.

* The org application itself - how the org files are used in a particular application, or the ideas and data that a set of ORG files represents, or how an exporter handles the data thrown at it, or....

But that is just /my/ dream.
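
To make the split between the ORG format and the per-application configuration a little more concrete, here is a minimal Python sketch.  It is only an illustration under my own assumptions: the names `Headline` and `parse_headline` are hypothetical, the headline pattern is a simplification, and none of this is taken from org itself.  The point is that the same line parses differently depending on which TODO keywords the configuration supplies.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Headline:
    level: int            # number of leading stars
    todo: Optional[str]   # TODO keyword, if the configuration recognises one
    title: str            # remaining headline text


def parse_headline(line, todo_keywords):
    """Parse one headline; return None if the line is not a headline.

    Simplified assumption: a headline is one or more stars at the start
    of the line, followed by at least one space and the rest of the text.
    """
    match = re.match(r"^(\*+) +(.*)$", line)
    if match is None:
        return None
    stars, rest = match.groups()
    first, _, remainder = rest.partition(" ")
    # Only the configuration can say whether the first word is a keyword.
    if first in todo_keywords:
        return Headline(len(stars), first, remainder.strip())
    return Headline(len(stars), None, rest.strip())


line = "** WAITING Reply from the list"
# Hypothetical configurations for two different org applications.
print(parse_headline(line, {"TODO", "DONE"}))
print(parse_headline(line, {"TODO", "WAITING", "DONE"}))
```

With the first configuration WAITING is just part of the title; with the second it is a TODO keyword.  A shared validation suite could pin down exactly this kind of behaviour for any compliant library.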




From: Rustom Mody <rustompmody@gmail.com>
To: emacs-orgmode <emacs-orgmode@gnu.org>
Sent: Fri, April 15, 2011 12:13:45 PM
Subject: Re: [O] Formal description of Org files

Hi,

> I have a question.
>
> At FOSDEM, someone asked me if there was a formal description of the structure of Org files, in some language
> that would be the input for a parser (or parser generator?) so that Org files could be easily parsed.
>
> Unfortunately I did not catch the name of the format description language that could be
> used for something, nor did I catch the name of the person who talked to me.
>
> Can anyone help out here?  Let me know what language to use, and maybe help work on such a formal description?  I
> think it would be useful to have....

Tools like yacc (bison, antlr, etc.) are all 'executable BNF' languages.  When they work they can make the code an order of magnitude smaller and development/programming correspondingly easier.

That said, I see a few hitches.

1. Grammatical handling of languages is based on the assumption of a clear and well-defined set of tokens/lexemes.  I expect this would be harder in org than in the typical programming languages for which yacc etc. are used.  For example, in most 'normal' languages there are comments and strings.  These involve some non-trivial handling which is entirely hidden from the grammar by being pushed into the lexer.

2. As a rule, parsing is done on the full program (IDEs are the exception to the rule).  Sensible parsing of program fragments, where the fragmenting could be quite arbitrary, is a bit of a research problem.

3. As I see it, the main declarative tool (somewhat akin to grammars) that org uses is regular expressions.  In other words, org is written as regexps strung together with programming logic, i.e. vanilla elisp.  An alternative that stays within the regular framework (not using the heavy guns of context-free parsing) may be ragel: http://www.complang.org/ragel/
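
A rough Python sketch of what "regexps strung together with programming logic" can look like, touching on points 1 and 3 above.  It is an illustration only, not how org itself is implemented: the two patterns are simplified assumptions, and `parse_outline` and `bold_spans` are hypothetical names.  Note that `*` is a headline marker at the start of a line but an emphasis marker in the middle of one, so what counts as a token depends on context, which is the difficulty for a conventional lexer.

```python
import re

# Simplified assumptions about the syntax, for illustration only.
HEADLINE_RE = re.compile(r"^(\*+) +(.*)$")      # stars at line start
BOLD_RE = re.compile(r"\*(\S(?:.*?\S)?)\*")     # *emphasis* inside a line


def parse_outline(text):
    """Build a nested outline from headline depth using a stack."""
    root = {"level": 0, "title": None, "body": [], "children": []}
    stack = [root]
    for line in text.splitlines():
        match = HEADLINE_RE.match(line)
        if match:
            level = len(match.group(1))
            node = {"level": level, "title": match.group(2).strip(),
                    "body": [], "children": []}
            # Pop until the top of the stack is a shallower headline.
            while stack[-1]["level"] >= level:
                stack.pop()
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # Anything else is body text of the current headline.
            stack[-1]["body"].append(line)
    return root


def bold_spans(line):
    """Return the text of each *emphasised* span in a body line."""
    return BOLD_RE.findall(line)


sample = "* Parsing\nSome *bold* text\n** Lexing\n* Tools\n"
tree = parse_outline(sample)
print([child["title"] for child in tree["children"]])
print(bold_spans(tree["children"][0]["body"][0]))
```

The regexps do the declarative part; the stack that turns star counts into nesting is the hand-written logic that a grammar would otherwise express.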

Rusi