From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rustom Mody
Subject: Re: Formal description of Org files
Date: Fri, 15 Apr 2011 22:43:45 +0530
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=bcaec51d2a7c15898d04a0f82af8
List-Id: "General discussions about Org-mode."
To: emacs-orgmode

--bcaec51d2a7c15898d04a0f82af8
Content-Type: text/plain; charset=ISO-8859-1

> Hi,
>
> I have a question.
>
> At FOSDEM, someone asked me if there was a formal description of the
> structure of Org files, in some language that would be the input for a
> parser (or parser generator?) so that Org files could be easily parsed.
>
> Unfortunately I did not catch the name of the format description
> language that could be used for something, nor did I catch the name of
> the person who talked to me.
>
> Can anyone help out here? Let me know what language to use, and maybe
> help work on such a formal description? I think it would be useful to
> have....

Tools like yacc (bison, ANTLR, etc.) are all 'executable BNF' languages.
When they work, they can make the code an order of magnitude smaller and
development correspondingly easier.

That said, I see a few hitches.

1. Grammar-based handling of a language assumes a clear, well-defined set
   of tokens/lexemes. I expect this to be harder for Org than for the
   typical programming languages that yacc etc. are used for. For example,
   most 'normal' languages have comments and strings. These need some
   non-trivial handling, which is entirely hidden from the grammar by
   being pushed into the lexer.

2. As a rule, parsing is done on the full program (IDEs are the
   exception). Sensible parsing of program fragments, where the
   fragmenting could be quite arbitrary, is a bit of a research problem.

3. As I see it, the main declarative tool (somewhat akin to grammars) that
   Org uses is regular expressions. IOW, Org is written with regexps
   strung together with programming logic, i.e. vanilla elisp. An
   alternative that stays within the regular framework (without bringing
   in the heavy guns of context-free parsing) may be Ragel:
   http://www.complang.org/ragel/

Rusi
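PS. To make point 3 concrete, here is a minimal sketch of the "regexps strung together with programming logic" style. It is written in Python rather than elisp, and it is not Org's actual code or its real regexps: the pattern HEADLINE_RE, the function parse_headlines, and the fixed TODO/DONE keyword set are all assumptions made up for illustration. The regexp recognises a single headline line; the nesting, which a regular expression alone cannot express, is tracked by ordinary code with a stack.

```python
import re

# Hypothetical, simplified headline pattern (NOT Org's real regexp):
# stars, optional TODO/DONE keyword, title, optional :tag:list: at the end.
HEADLINE_RE = re.compile(
    r"^(\*+)\s+(?:(TODO|DONE)\s+)?(.*?)(?:\s+(:[\w@:]+:))?\s*$"
)


def parse_headlines(text):
    """Return one dict per headline, with the path of enclosing titles."""
    outline = []
    stack = []  # titles of the currently open headlines, outermost first
    for line in text.splitlines():
        m = HEADLINE_RE.match(line)
        if m is None:
            continue  # body text is simply skipped in this sketch
        stars, keyword, title, tags = m.groups()
        level = len(stars)
        # Programming logic, not grammar: close every headline at this
        # level or deeper, then open the new one.
        del stack[level - 1:]
        stack.append(title)
        outline.append({
            "level": level,
            "keyword": keyword,
            "path": list(stack),
            "tags": tags.strip(":").split(":") if tags else [],
        })
    return outline


if __name__ == "__main__":
    sample = (
        "* TODO Write grammar :org:parser:\n"
        "Some body text\n"
        "** Lexer notes\n"
        "* DONE Ship it\n"
    )
    for entry in parse_headlines(sample):
        print(entry)
```

Even this toy shows the hitch from point 1: the headline regexp is doing the lexer's job and part of the parser's job at once, and every further construct (blocks, drawers, timestamps) would need its own regexp plus more glue code.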
--bcaec51d2a7c15898d04a0f82af8--