emacs-orgmode@gnu.org archives
From: Rustom Mody <rustompmody@gmail.com>
To: Bastien <bzg@gnu.org>
Cc: emacs-orgmode <emacs-orgmode@gnu.org>
Subject: Re: Request for worg page -- escaping questions
Date: Sun, 20 Apr 2014 13:29:35 +0530
Message-ID: <CAJ+TeoeB1_igG70LFEaCwZXfOKjKHX2iSHXO+ytAOZrYX-6nug@mail.gmail.com>
In-Reply-To: <87ha5qg849.fsf@bzg.ath.cx>


[-- Attachment #1.1: Type: text/plain, Size: 992 bytes --]

On Fri, Apr 18, 2014 at 5:30 PM, Bastien <bzg@gnu.org> wrote:

> Rustom Mody <rustompmody@gmail.com> writes:
>
> > Trouble is you guys are the hares that make us (me at least) into
> > tortoises --
> > you add significant functionality faster than I can keep with the
> > 'what' (leave aside how and details).
>
> Well, remember that Worg is here to *stay* -- so let's start with
> basic stuff, not bleeding-edge features.  There is a lot of basic
> stuff we still need to document... and we also need to clean-up
> stuff... but agreed that's pretty boring :/
>
> > Still I'll try my hand.
> > What I would like is a graphic -- an automata-diagram made perhaps by
> > dot.
> > Can that easily/conveniently go up onto worg?
>
> Of course!  Thanks in advance,
>
>
Thanks to Alan's dot, I've come up with something (attached).
Leave aside complete, it's not even really a work-in-progress :-)
It's just meant to give a sense of the direction I was asking for.

Would such a doc be appropriate for worg?

Regards,
Rusi

[-- Attachment #2: orgdfa.org --]
[-- Type: application/octet-stream, Size: 4250 bytes --]

# -*- org-confirm-babel-evaluate: nil -*-
#+TITLE: Org Mode Automaton
#+OPTIONS: toc:nil
* Preliminary Analogy
** Terminology
Below, /programming language/ is used rather generically, i.e.
- C, Elisp and Python are programming languages
- So are LaTeX and HTML
- And of course org-mode!
** Tokenization
Most people understand that programming languages have /syntax/. But before syntax can come into play, one needs to work out the basic elements that the syntax organizes. This is called /tokenization/ (or /lexical analysis/, if one prefers jargon). Typical tokens of a programming language are:
- identifiers (variables)
- numerals (number literals)
- keywords (reserved words such as =if=, which are best distinguished from ordinary identifiers)
And two critical ones:
- comments
- strings

For example, in most programming languages an identifier must start with a letter, but thereafter both letters and digits are allowed. This implies that the tokenizer must be in a different /state/ when
- the first character of an identifier is seen
- the remaining characters are being seen
- something other than an identifier is being seen
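
The three states above can be sketched as a tiny state machine. This is only an illustration, not how any real tokenizer is written: the names =State=, =step= and =identifiers= are made up for this example, numerals are simply lumped into the "other" state, and underscores are ignored.

#+begin_src python :exports code
from enum import Enum, auto


class State(Enum):
    OTHER = auto()        # something other than an identifier is being seen
    IDENT_FIRST = auto()  # the first character of an identifier was just seen
    IDENT_REST = auto()   # the remaining characters are being seen


def step(state, ch):
    """Return the next state after reading one character."""
    if state is State.OTHER:
        # Only a letter may start an identifier.
        return State.IDENT_FIRST if ch.isalpha() else State.OTHER
    # Inside an identifier, letters and digits both continue it.
    return State.IDENT_REST if ch.isalnum() else State.OTHER


def identifiers(text):
    """Collect the identifiers found in text (hypothetical helper)."""
    found, current, state = [], "", State.OTHER
    for ch in text:
        state = step(state, ch)
        if state is State.OTHER:
            if current:
                found.append(current)
            current = ""
        else:
            current += ch
    if current:
        found.append(current)
    return found
#+end_src

Note that =step= gives a digit a different meaning depending on the state: in =OTHER= it cannot start an identifier, while in the two identifier states it continues one.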

In fact, tokenizing a programming language can involve an automaton with hundreds of states. Most programmers need not think about or even know of this level, which we may call the micro-level of the automaton.

However, there is a higher level of the tokenizing automaton that a programmer does need to understand. I'll call that the macro-level.

My wish in writing this is to start with what I understand (reasonably!) well, the C automaton, and then work out (with help from the devs and others) what the org-mode automaton is, which I expect is far more intricate than the C one.
** C Automaton
#+name: cdfa
#+begin_src dot :file cgraph.png :exports results
digraph {
  Default -> Comment [label = "/*"];
  Comment -> Default [label = "*/"];
  Default -> String  [label = "\""];
  String -> Default  [label = "\""];
}
#+end_src

#+results: cdfa
[[file:cgraph.png]]

In words this says:
- By default the C tokenizer is in the *Default* state
- When a "/*" is seen it goes into the *Comment* state
- When a "*/" is seen in the *Comment* state it returns to the *Default* state (corresponds to the intuition that the comment ends)

  The above line also indicates why it's necessary to have such a doc: I do not know how to write the string consisting of star (*) / without the * starting a bold section
- Likewise the " both starts and ends the *String* state
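
As a rough sketch of the same diagram in executable form, the transitions can be put in a table and used to split a piece of source text into chunks. The names =TRANSITIONS= and =segments= are invented for this illustration, and the code follows only the simplified diagram above (no escapes, no character literals, no =//= comments), so it is not a real C lexer.

#+begin_src python :exports code
# (current state, delimiter) -> next state, exactly the four edges of the diagram
TRANSITIONS = {
    ("Default", "/*"): "Comment",
    ("Comment", "*/"): "Default",
    ("Default", '"'): "String",
    ("String", '"'): "Default",
}


def segments(source):
    """Split source into (state, text) chunks following the diagram."""
    chunks, state, buf, i = [], "Default", "", 0
    while i < len(source):
        for delim in ("/*", "*/", '"'):
            target = TRANSITIONS.get((state, delim))
            if target and source.startswith(delim, i):
                break
        else:
            # No edge applies here: stay in the current state.
            buf += source[i]
            i += 1
            continue
        if state == "Default":
            # An opening delimiter belongs to the chunk it starts.
            if buf:
                chunks.append((state, buf))
            buf = delim
        else:
            # A closing delimiter belongs to the chunk it ends.
            chunks.append((state, buf + delim))
            buf = ""
        state = target
        i += len(delim)
    if buf:
        chunks.append((state, buf))
    return chunks
#+end_src

Because the *Comment* state has no edge labelled with a double-quote, a quote inside a comment is just comment text, and likewise a "/*" inside a string is just string text.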

Now there are many implications even from the above (over-simplified) diagram:
1. Comments cannot contain strings -- Ok
2. Strings cannot contain comments -- Ok
3. Comments cannot contain a "*/"
4. Strings cannot contain a double-quote

3 is true of C. \\
4, however, is not true of C: a string can contain a double-quote, with the aid of the notion of...
** Escaping
C allows a double quote to appear inside a string if it is *escaped* with a backslash (=\=)
#+name: cdfa2
#+begin_src dot :file cgraph2.png :exports results
digraph {
  Default -> Comment [label = "/*"];
  Comment -> Default [label = "*/"];
  Default -> String  [label = "\""];
  String -> Default  [label = "\""];
  String -> EscString [label = "\"\\\""];
  EscString -> String [label = "[\"|\\\]"];
}
#+end_src
#+results: cdfa2
[[file:cgraph2.png]]

What this addition is saying is that:
1. In the *String* state a backslash puts the automaton into a special *EscString* state
2. In the *EscString* state either a double-quote or a backslash puts it back into the *String* state
3. Nothing else is acceptable in the *EscString* state (in this simplified diagram; real C also accepts other escapes such as =\n=)
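
The three points above can be sketched as a small scanner for a single string. The function name =scan_string= is made up for this example, and it deliberately implements only the simplified diagram: anything other than a double-quote or a backslash after a backslash is rejected, which is stricter than real C.

#+begin_src python :exports code
def scan_string(source, start=0):
    """Return the index just past the closing quote of the string at start."""
    if source[start] != '"':
        raise ValueError("expected an opening double quote")
    state = "String"
    for i in range(start + 1, len(source)):
        ch = source[i]
        if state == "String":
            if ch == "\\":
                state = "EscString"   # point 1: backslash enters EscString
            elif ch == '"':
                return i + 1          # unescaped quote: back to Default
        else:
            # point 3: only a double-quote or a backslash is acceptable here
            if ch not in ('"', "\\"):
                raise ValueError(f"unsupported escape \\{ch} at index {i}")
            state = "String"          # point 2: back to String
    raise ValueError("unterminated string")
#+end_src

With this, an escaped double-quote no longer ends the string, because it is consumed in the *EscString* state rather than in the *String* state.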

So much for analogies and preliminaries. The purpose of this writeup is to flesh out the following:
* Org mode Automaton
#+name: orgdfa
#+begin_src dot :file orggraph.png :exports results
digraph {
  Default -> "HeaderLine(Q1)" [label = "\"^#+\""];
  "HeaderLine(Q1)" -> Default [label = "newline"];
  Default -> TableMode  [label = "\"|\""];
  TableMode -> Default  [label = "Note1."];
  Default -> "??"  [label = "\"^:\""];
  Default -> "???"  [label = "\"^'\"Note2"];
}
#+end_src

#+RESULTS: orgdfa
[[file:orggraph.png]]



Notes
1. A line that starts with a non-"|" character \\
2. A single-quote at start of line \\
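
A very rough sketch of the part of this diagram that is already pinned down, under loud assumptions: the function name =line_states= is invented, only markers in column 0 are considered, and the two transitions whose target states are still unknown (the ones labelled with ":" and with a single-quote) are left out entirely. It is a guess at the direction, not a description of how org-mode actually parses a buffer.

#+begin_src python :exports code
def line_states(text):
    """Return (state, line) pairs, one per line, per the draft diagram."""
    result = []
    for line in text.splitlines():
        if line.startswith("#+"):
            state = "HeaderLine"   # left again at the newline
        elif line.startswith("|"):
            state = "TableMode"    # left at the first line not starting with |
        else:
            state = "Default"
        result.append((state, line))
    return result
#+end_src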

Questions
1. Should that be HeaderLine or OptionKeywordLine?
