From: Eric Schulte <eric.schulte@gmx.com>
To: "Thomas S. Dye" <tsd@tsdye.com>
Cc: Michael Hannon <jm_hannon@yahoo.com>,
	Org-Mode List <emacs-orgmode@gnu.org>
Subject: Re: Babel: communicating irregular data to R source-code block
Date: Sun, 22 Apr 2012 11:58:40 -0400
Message-ID: <87ipgrn4by.fsf@gmx.com>
In-Reply-To: m1397w1toe.fsf@tsdye.com


tsd@tsdye.com (Thomas S. Dye) writes:

> Aloha Michael,
>
> Michael Hannon <jm_hannon@yahoo.com> writes:
>
>> Greetings.  I'm sitting in on a weekly, informal, "brown-bag" seminar on data
>> technologies in statistics.  There are more people attending the seminar than
>> there are weeks in which to give talks, so I may get by with being my usual,
>> passive-slug self.
>>
>> But I thought it might be useful to have a contingency plan and decided that
>> giving a brief talk about Babel might be useful/instructive.  I thought (and
>> think) that mushing together (with attribution) some of the content of the
>> paper [1] by The Gang of Four and the content of Eric's talk [2] might be a
>> good approach.  (BTW, if this isn't legal, desirable, permissible, etc., this
>> would be a good time to tell me.)
>>

I would be happy for you to re-use these materials.

>>
>> I liked the Pascal's Triangle example (which morphed from elisp to Python, or
>> vice versa, in the two references), but I was afraid that the elisp routine
>> "pst-check", used as a check on the correctness of the previously-generated
>> Pascal's triangle, might be too esoteric for this audience, not to mention me.
>> (The recursive Fibonacci function is virtually identical in all languages,
>> but the second part is more obscure.)
>>

I was giving a presentation to a local lisp/scheme user group, so I
figured I'd spare them the pain of trying to read python code :).

>>
>> I thought it should be possible to use R to do the same sanity check, as R
>> would be much more-familiar to this audience (and its use would still
>> demonstrate the meta-language feature of Babel).
>>
>> Unfortunately, I haven't been able to find a way to communicate the output of
>> the Pascal's Triangle example to an R source-code block.  The gist of the
>> problem seems to be that regardless of how I try to grab the data (scan,
>> readLines, etc.) Babel always ends up trying to read a data frame (table) and
>> I get an error similar to:
>>

I present some options below specific to Tom's discussion, but another
option may be to use the ":results output" option on a python code block
which prints the table to STDOUT, and then use something like readLines
to read the resulting string into R.
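
A rough sketch of that first option follows.  It is written for Python 3
and builds the triangle iteratively; the helper name format_triangle is
made up for this illustration and is not part of Babel.  In an Org file
the body would go in a python block with ":results output", so that the
printed text, rather than a table, is what a later block receives.

```python
def pascals_triangle(n):
    """Return rows 0..n of Pascal's triangle as a list of lists."""
    rows = [[1]]
    for _ in range(n):
        prev = rows[-1]
        # Each entry is the sum of the two entries above it.
        rows.append([a + b for a, b in zip([0] + prev, prev + [0])])
    return rows


def format_triangle(rows):
    """Render one row per line, values separated by single spaces."""
    return "\n".join(" ".join(str(v) for v in row) for row in rows)


# Print to STDOUT so the rows keep their natural, unequal lengths.
print(format_triangle(pascals_triangle(5)))
```

On the R side one could then split each line on whitespace (for example
with strsplit) to recover rows of differing lengths.  How the string
reaches R depends on the Babel setup, so treat that part as an
assumption.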

>>
>> <<<<<<
>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>> : line 1 did not have 5 elements
>>
>> Enter a frame number, or 0 to exit   
>>
>> 1: read.table("/tmp/babel-3780tje/R-import-3780Akj", header = FALSE, row.names
>> = NULL, sep = "
>>>>>>>>
>>
>> If I construct a table "by hand" with all of the cells occupied, everything
>> goes OK.  For instance:
>>
>> <<<<<<
>> #+TBLNAME: some-junk
>> | 1 | 0 | 0 | 0 |
>> | 1 | 1 | 0 | 0 |
>> | 1 | 2 | 1 | 0 |
>> | 1 | 3 | 3 | 1 | 
>>
>> #+NAME: read-some-junk(sj_input=some-junk)
>> #+BEGIN_SRC R
>>
>> rowSums(sj_input)
>>
>> #+END_SRC  
>>
>> #+RESULTS: read-some-junk
>> | 1 |
>> | 2 |
>> | 4 |
>> | 8 |
>>>>>>>>
>>
>> But the following gives the kind of error I described above:
>>
>> <<<<<<
>> #+name: pascals_triangle
>> #+begin_src python :var n=5 :exports none :return pascals_triangle(5)
>> def pascals_triangle(n):
>>     if n == 0:
>>         return [[1]]
>>     prev_triangle = pascals_triangle(n-1)
>>     prev_row = prev_triangle[n-1]
>>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>>     return prev_triangle + [this_row]
>>
>> pascals_triangle(n)
>> #+end_src
>
> A few things are wrong at this point.  It seems the JSS article has
> an error in the header of the pascals_triangle source block.  AFAIK
> there is no header argument :return.  I don't know how :return
> pascals_triangle(5) got there, but am fairly certain it shouldn't be.
>

The :return header argument *is* a supported header argument of python
code blocks and is not an error.  The python code block should run w/o
error and without the extra "return pascals_triangle(n)" at the bottom.
The following works for me.

#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :return pascals_triangle(5)
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = list(map(sum, zip([0] + prev_row, prev_row + [0])))
    return prev_triangle + [this_row]

#+end_src

#+RESULTS: pascals_triangle
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

[...]
>
> I vaguely remember that it once was possible to pass variables in
> through the name line, but I couldn't find this syntax in some fairly
> recent documentation.

This style of passing arguments is still supported, but not necessarily
encouraged by the documentation.

> It does appear to work still using a recent Org-mode.  If I rename the
> results and then pass that to the source code block, all is well.
>
> #+RESULTS: pascals-tri
> | 1 |   |    |    |   |   |
> | 1 | 1 |    |    |   |   |
> | 1 | 2 |  1 |    |   |   |
> | 1 | 3 |  3 |  1 |   |   |
> | 1 | 4 |  6 |  4 | 1 |   |
> | 1 | 5 | 10 | 10 | 5 | 1 |
>
>   
> #+name: pst-checkR(p=pascals-tri)
> #+BEGIN_SRC R
> p
> #+END_SRC
>
> #+RESULTS: pst-checkR
>
> | 1 | nil | nil | nil | nil | nil |
> | 1 |   1 | nil | nil | nil | nil |
> | 1 |   2 |   1 | nil | nil | nil |
> | 1 |   3 |   3 |   1 | nil | nil |
> | 1 |   4 |   6 |   4 | 1   | nil |
> | 1 |   5 |  10 |  10 | 5   | 1   |
>
> This looks like a bug to me, but Eric S. will know better what might be
> going on.

The above is due to the inability of R (or at least of the read.table
function) to read in tables whose rows have different lengths.  The
process of writing to an Org-mode table and *then* referencing that
table as Tom suggests above has the side effect of filling in blank
spots in the final exported table, turning what would otherwise be
something like

1
1  1
1  2  1

into something like

1  ""  ""
1   1  ""
1   2  1

You could also use a function like the following to explicitly fill in
these missing cells, padding every row to the length of the longest one.

#+name: padded_pascals_triangle
#+begin_src emacs-lisp :var data=pascals_triangle
  (let ((max-length (apply #'max (mapcar #'length data))))
    (mapcar (lambda (row)
              (append row (make-vector (- max-length (length row)) "") nil))
            data))
#+end_src
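
For comparison, here is the same padding idea in Python.  This is only
an illustrative sketch: pad_rows is a hypothetical helper, not something
provided by Org-mode, and it assumes the input is a list of lists such
as the one returned by pascals_triangle.

```python
def pad_rows(rows, fill=""):
    """Pad every row with `fill` so all rows match the longest one."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


ragged = [[1], [1, 1], [1, 2, 1]]
for row in pad_rows(ragged):
    print(row)
```

Padding with empty strings mirrors the elisp version above; a different
fill value (0, say) may be more convenient if the consumer expects a
purely numeric table.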

> I can't do much more than this, but I'm optimistic things will be
> sorted out before your turn to speak at the seminar rolls around.
>
> Thanks for bringing the error in the JSS article to light.
>
> All the best,
> Tom
>

I often have to explicitly convert data read into R code blocks as a
table into some other data structure like a vector or a matrix.  I run
into this myself when trying to use the statistical functions of R.  It
generally takes a while to look up the function to do the conversion,
but I imagine that there is a reason why people who know more R than I
do chose to make tables the default data type for data read into R
blocks.

Best,

Combining the examples above yields the following,


[-- Attachment #2: example.org --]

#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :return pascals_triangle(5) :results vector
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = list(map(sum, zip([0] + prev_row, prev_row + [0])))
    return prev_triangle + [this_row]

#+end_src

#+name: padded_pascals_triangle
#+begin_src emacs-lisp :var data=pascals_triangle
  (let ((max-length (apply #'max (mapcar #'length data))))
    (mapcar (lambda (row)
              (append row (make-vector (- max-length (length row)) "") nil))
            data))
#+end_src

#+begin_src R :var data=padded_pascals_triangle
data
#+end_src

#+RESULTS:
| 1 | nil | nil | nil | nil | nil |
| 1 |   1 | nil | nil | nil | nil |
| 1 |   2 |   1 | nil | nil | nil |
| 1 |   3 |   3 |   1 | nil | nil |
| 1 |   4 |   6 |   4 | 1   | nil |
| 1 |   5 |  10 |  10 | 5   | 1   |



>
>>>>>>>>
>>
>> Note that I don't really want to do rowSums in this case.  I'm just trying to
>> demonstrate the error.
>>
>> Of course, it's clear that the first line does NOT contain five elements, nor
>> does the second, etc., as all of the above-diagonal elements are blanks.
>>
>> But I've been unable to find an R input function that doesn't end up treating
>> the source data as a table, i.e., in the context of Babel source blocks -- R
>> is "happy" to read a lower-diagonal structure.  See the appendix for an
>> example.
>>
>> Any suggestions?  Note that I'm happy to acknowledge that my own ignorance of
>> R and/or Babel might be the source of the problem.  If so, please enlighten
>> me.
>>
>> Thanks.
>>
>> -- Mike
>>
>> [1] http://www.jstatsoft.org/v46/i03
>> [2] https://github.com/eschulte/babel-presentation
>>
>> <<<<<<
>> Appendix
>> --------
>>
>>
>> $ cat pascal.dat
>> 1
>> 1 1
>> 1 2 1
>> 1 3 3 1
>> 1 4 6 4 1
>>
>> $ R --vanilla < pascal.R
>>
>> R version 2.15.0 (2012-03-30)
>> Copyright (C) 2012 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>> .
>> .
>> .
>>
>>> x <- readLines("pascal.dat")
>>> x
>> [1] "1"         "1 1"       "1 2 1"     "1 3 3 1"   "1 4 6 4 1"
>>> str(x)
>>  chr [1:5] "1" "1 1" "1 2 1" "1 3 3 1" "1 4 6 4 1"
>>> 
>>> y <- scan("pascal.dat")
>> Read 15 items
>>> y
>>  [1] 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1
>>> str(y)
>>  num [1:15] 1 1 1 1 2 1 1 3 3 1 ...
>>> 
>>> z <- read.table("pascal.dat", header=FALSE)
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>>   line 1 did not have 5 elements
>> Calls: read.table -> scan
>> Execution halted
>>
>>

-- 
Eric Schulte
http://cs.unm.edu/~eschulte/
