From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Schulte
Subject: Re: Babel: communicating irregular data to R source-code block
Date: Sun, 22 Apr 2012 11:58:40 -0400
Message-ID: <87ipgrn4by.fsf@gmx.com>
References: <1335039472.9075.YahooMailNeo@web161901.mail.bf1.yahoo.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===-=-="
Received: from eggs.gnu.org ([208.118.235.92]:59604)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1SM16M-0006no-Tt for emacs-orgmode@gnu.org;
	Sun, 22 Apr 2012 14:01:37 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1SM16K-0005VT-1J for emacs-orgmode@gnu.org;
	Sun, 22 Apr 2012 14:01:34 -0400
Received: from mailout-us.gmx.com ([74.208.5.67]:58341 helo=mailout-us.mail.com)
	by eggs.gnu.org with smtp (Exim 4.71) (envelope-from )
	id 1SM16J-0005VF-QN for emacs-orgmode@gnu.org;
	Sun, 22 Apr 2012 14:01:31 -0400
List-Id: "General discussions about Org-mode."
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
To: "Thomas S. Dye"
Cc: Michael Hannon, Org-Mode List

--===-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

tsd@tsdye.com (Thomas S. Dye) writes:

> Aloha Michael,
>
> Michael Hannon writes:
>
>> Greetings.  I'm sitting in on a weekly, informal, "brown-bag" seminar on data
>> technologies in statistics.  There are more people attending the seminar than
>> there are weeks in which to give talks, so I may get by with being my usual,
>> passive-slug self.
>>
>> But I thought it might be useful to have a contingency plan and decided that
>> giving a brief talk about Babel might be useful/instructive.  I thought (and
>> think) that mushing together (with attribution) some of the content of the
>> paper [1] by The Gang of Four and the content of Eric's talk [2] might be a
>> good approach.  (BTW, if this isn't legal, desirable, permissible, etc., this
>> would be a good time to tell me.)
>>

I would be happy for you to re-use these materials.

>>
>> I liked the Pascal's Triangle example (which morphed from elisp to Python, or
>> vice versa, in the two references), but I was afraid that the elisp routine
>> "pst-check", used as a check on the correctness of the previously-generated
>> Pascal's triangle, might be too esoteric for this audience, not to mention me.
>> (The recursive Fibonacci function is virtually identical in all languages,
>> but the second part is more obscure.)
>>

I was giving a presentation to a local lisp/scheme user group, so I
figured I'd spare them the pain of trying to read python code :).

>>
>> I thought it should be possible to use R to do the same sanity check, as R
>> would be much more-familiar to this audience (and its use would still
>> demonstrate the meta-language feature of Babel).
>>
>> Unfortunately, I haven't been able to find a way to communicate the output of
>> the Pascal's Triangle example to an R source-code block.  The gist of the
>> problem seems to be that regardless of how I try to grab the data (scan,
>> readLines, etc.) Babel always ends up trying to read a data frame (table) and
>> I get an error similar to:
>>

I present some options below specific to Tom's discussion, but another
option may be to use the ":results output" option on a python code block
which prints the table to STDOUT, and then use something like readLines
to read from the resulting string into R.
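
As a minimal sketch of that ":results output" idea, the Python side could
print one space-separated line per row, with no padding, so that nothing
downstream has to be treated as a table.  Everything here is an
illustration under stated assumptions: Python 3 is assumed (so the row is
built with a list comprehension rather than the list-returning map of
Python 2), and the helper names format_rows and parse_rows are
hypothetical.  parse_rows only stands in for the per-line split that the
R side would do after readLines.

```python
# Sketch only: Python 3 assumed; format_rows and parse_rows are
# hypothetical helper names, not part of Babel or of the original blocks.

def pascals_triangle(n):
    """Return rows 0..n of Pascal's triangle as a ragged list of lists."""
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n - 1)
    prev_row = prev_triangle[-1]
    # Each entry is the sum of the two entries above it.
    this_row = [a + b for a, b in zip([0] + prev_row, prev_row + [0])]
    return prev_triangle + [this_row]


def format_rows(rows):
    """Render each row as one space-separated line, without padding."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in rows)


def parse_rows(text):
    """Split printed text back into integer rows, one row per line."""
    return [[int(tok) for tok in line.split()] for line in text.splitlines()]


if __name__ == "__main__":
    # Under ":results output", the printed text is what the block returns.
    print(format_rows(pascals_triangle(5)))
```

Because every row is an independent line of text, rows of different
lengths cause no trouble; the consumer decides how to split each line.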
>>
>> <<<<<<
>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>> : line 1 did not have 5 elements
>>
>> Enter a frame number, or 0 to exit
>>
>> 1: read.table("/tmp/babel-3780tje/R-import-3780Akj", header = FALSE, row.names
>> = NULL, sep = "
>>>>>>>>
>>
>> If I construct a table "by hand" with all of the cells occupied, everything
>> goes OK.  For instance:
>>
>> <<<<<<
>> #+TBLNAME: some-junk
>> | 1 | 0 | 0 | 0 |
>> | 1 | 1 | 0 | 0 |
>> | 1 | 2 | 1 | 0 |
>> | 1 | 3 | 3 | 1 |
>>
>> #+NAME: read-some-junk(sj_input=some-junk)
>> #+BEGIN_SRC R
>>
>> rowSums(sj_input)
>>
>> #+END_SRC
>>
>> #+RESULTS: read-some-junk
>> | 1 |
>> | 2 |
>> | 4 |
>> | 8 |
>>>>>>>>
>>
>> But the following gives the kind of error I described above:
>>
>> <<<<<<
>> #+name: pascals_triangle
>> #+begin_src python :var n=5 :exports none :return pascals_triangle(5)
>> def pascals_triangle(n):
>>     if n == 0:
>>         return [[1]]
>>     prev_triangle = pascals_triangle(n-1)
>>     prev_row = prev_triangle[n-1]
>>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>>     return prev_triangle + [this_row]
>>
>> pascals_triangle(n)
>> #+end_src
>
> A few things are wrong at this point.  It seems the JSS article has
> an error in the header of the pascals_triangle source block.  AFAIK
> there is no header argument :return.  I don't know how :return
> pascals_triangle(5) got there, but am fairly certain it shouldn't be.
>

The :return header argument *is* a supported header argument of python
code blocks and is not an error.  The python code block should run w/o
error, and without the extra "pascals_triangle(n)" call at the bottom.
The following works for me.

#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :return pascals_triangle(5)
  def pascals_triangle(n):
      if n == 0:
          return [[1]]
      prev_triangle = pascals_triangle(n-1)
      prev_row = prev_triangle[n-1]
      this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
      return prev_triangle + [this_row]
#+end_src

#+RESULTS: pascals_triangle
| 1 |   |    |    |   |   |
| 1 | 1 |    |    |   |   |
| 1 | 2 |  1 |    |   |   |
| 1 | 3 |  3 |  1 |   |   |
| 1 | 4 |  6 |  4 | 1 |   |
| 1 | 5 | 10 | 10 | 5 | 1 |

[...]

>
> I vaguely remember that it once was possible to pass variables in
> through the name line, but I couldn't find this syntax in some fairly
> recent documentation.

This style of passing arguments is still supported, but not necessarily
encouraged by the documentation.

> It does appear to work still using a recent Org-mode.  If I rename the
> results and then pass that to the source code block, all is well.
>
> #+RESULTS: pascals-tri
> | 1 |   |    |    |   |   |
> | 1 | 1 |    |    |   |   |
> | 1 | 2 |  1 |    |   |   |
> | 1 | 3 |  3 |  1 |   |   |
> | 1 | 4 |  6 |  4 | 1 |   |
> | 1 | 5 | 10 | 10 | 5 | 1 |
>
>
> #+name: pst-checkR(p=pascals-tri)
> #+BEGIN_SRC R
> p
> #+END_SRC
>
> #+RESULTS: pst-checkR
>
> | 1 | nil | nil | nil | nil | nil |
> | 1 | 1   | nil | nil | nil | nil |
> | 1 | 2   | 1   | nil | nil | nil |
> | 1 | 3   | 3   | 1   | nil | nil |
> | 1 | 4   | 6   | 4   | 1   | nil |
> | 1 | 5   | 10  | 10  | 5   | 1   |
>
> This looks like a bug to me, but Eric S. will know better what might be
> going on.

The above is due to the inability of R (or at least of the read.table
function) to read in tables whose rows have different lengths.  The
process of writing to an Org-mode table and *then* referencing that
table, as Tom suggests above, has the side effect of filling in the
blank spots in the final exported table, turning what would otherwise
be something like

  1
  1 1
  1 2 1

into something like

  1 "" ""
  1 1  ""
  1 2  1

You could also use a function like the following to explicitly fill in
these missing cells.

#+name: padded_pascals_triangle
#+begin_src emacs-lisp :var data=pascals_triangle
  (let ((max-length (apply #'max (mapcar #'length data))))
    (mapcar (lambda (row)
              (append row (make-vector (- max-length (length row)) "") nil))
            data))
#+end_src

> I can't do much more than this, but I'm optimistic things will be
> sorted out before your turn to speak at the seminar rolls around.
>
> Thanks for bringing the error in the JSS article to light.
>
> All the best,
> Tom
>

I often have to explicitly convert data read into R code blocks as a
table into some other data structure like a vector or a matrix.  I run
into this myself when trying to use the statistical functions of R.  It
generally takes a while to look up the function to do the conversion,
but I imagine that there is a reason why people who know more R than I
do chose to make tables the default data type for data read into R
blocks.

Best,

Combining the examples above yields the following,

--===-=-=
Content-Type: text/x-org
Content-Disposition: inline; filename=example.org

#+name: pascals_triangle
#+begin_src python :var n=5 :exports none :return pascals_triangle(5) :results vector
  def pascals_triangle(n):
      if n == 0:
          return [[1]]
      prev_triangle = pascals_triangle(n-1)
      prev_row = prev_triangle[n-1]
      this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
      return prev_triangle + [this_row]
#+end_src

#+name: padded_pascals_triangle
#+begin_src emacs-lisp :var data=pascals_triangle
  (let ((max-length (apply #'max (mapcar #'length data))))
    (mapcar (lambda (row)
              (append row (make-vector (- max-length (length row)) "") nil))
            data))
#+end_src

#+begin_src R :var data=padded_pascals_triangle
  data
#+end_src

#+RESULTS:
| 1 | nil | nil | nil | nil | nil |
| 1 | 1   | nil | nil | nil | nil |
| 1 | 2   | 1   | nil | nil | nil |
| 1 | 3   | 3   | 1   | nil | nil |
| 1 | 4   | 6   | 4   | 1   | nil |
| 1 | 5   | 10  | 10  | 5   | 1   |

--===-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

>
>>>>>>>>
>>
>> Note that I don't really want to do rowSums in this case.  I'm just trying to
>> demonstrate the error.
>>
>> Of course, it's clear that the first line does NOT contain five elements, nor
>> does the second, etc., as all of the above-diagonal elements are blanks.
>>
>> But I've been unable to find an R input function that doesn't end up treating
>> the source data as a table, i.e., in the context of Babel source blocks -- R
>> is "happy" to read a lower-diagonal structure.  See the appendix for an
>> example.
>>
>> Any suggestions?  Note that I'm happy to acknowledge that my own ignorance of
>> R and/or Babel might be the source of the problem.  If so, please enlighten
>> me.
>>
>> Thanks.
>>
>> -- Mike
>>
>> [1] http://www.jstatsoft.org/v46/i03
>> [2] https://github.com/eschulte/babel-presentation
>>
>> <<<<<<
>> Appendix
>> --------
>>
>>
>> $ cat pascal.dat
>> 1
>> 1 1
>> 1 2 1
>> 1 3 3 1
>> 1 4 6 4 1
>>
>> $ R --vanilla < pascal.R
>>
>> R version 2.15.0 (2012-03-30)
>> Copyright (C) 2012 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>> .
>> .
>> .
>>
>>> x <- readLines("pascal.dat")
>>> x
>> [1] "1"         "1 1"       "1 2 1"     "1 3 3 1"   "1 4 6 4 1"
>>> str(x)
>>  chr [1:5] "1" "1 1" "1 2 1" "1 3 3 1" "1 4 6 4 1"
>>>
>>> y <- scan("pascal.dat")
>> Read 15 items
>>> y
>>  [1] 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1
>>> str(y)
>>  num [1:15] 1 1 1 1 2 1 1 3 3 1 ...
>>>
>>> z <- read.table("pascal.dat", header=FALSE)
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>   line 1 did not have 5 elements
>> Calls: read.table -> scan
>> Execution halted
>>
>>

-- 
Eric Schulte
http://cs.unm.edu/~eschulte/

--===-=-=--