From: tsd@tsdye.com (Thomas S. Dye)
Subject: Re: Babel: communicating irregular data to R source-code block
Date: Mon, 23 Apr 2012 06:46:50 -1000
In-Reply-To: <87ipgrn4by.fsf@gmx.com> (Eric Schulte's message of "Sun, 22 Apr 2012 11:58:40 -0400")
To: Eric Schulte
Cc: Michael Hannon, Org-Mode List

Hi Eric,

Eric Schulte writes:

> tsd@tsdye.com (Thomas S. Dye) writes:
>
>> Aloha Michael,
>>
>> Michael Hannon writes:
>>
>>> Greetings. I'm sitting in on a weekly, informal, "brown-bag" seminar on data
>>> technologies in statistics. There are more people attending the seminar than
>>> there are weeks in which to give talks, so I may get by with being my usual,
>>> passive-slug self.
>>>
>>> But I thought it might be useful to have a contingency plan and decided that
>>> giving a brief talk about Babel might be useful/instructive. I thought (and
>>> think) that mushing together (with attribution) some of the content of the
>>> paper [1] by The Gang of Four and the content of Eric's talk [2] might be a
>>> good approach. (BTW, if this isn't legal, desirable, permissible, etc., this
>>> would be a good time to tell me.)
>>>
>
> I would be happy for you to re-use these materials.
>
>>>
>>> I liked the Pascal's Triangle example (which morphed from elisp to Python, or
>>> vice versa, in the two references), but I was afraid that the elisp routine
>>> "pst-check", used as a check on the correctness of the previously-generated
>>> Pascal's triangle, might be too esoteric for this audience, not to mention me.
>>> (The recursive Fibonacci function is virtually identical in all languages,
>>> but the second part is more obscure.)
>>>
>
> I was giving a presentation to a local lisp/scheme user group, so I
> figured I'd spare them the pain of trying to read python code :).
>
>>>
>>> I thought it should be possible to use R to do the same sanity check, as R
>>> would be much more-familiar to this audience (and its use would still
>>> demonstrate the meta-language feature of Babel).
>>>
>>> Unfortunately, I haven't been able to find a way to communicate the output of
>>> the Pascal's Triangle example to an R source-code block. The gist of the
>>> problem seems to be that regardless of how I try to grab the data (scan,
>>> readLines, etc.) Babel always ends up trying to read a data frame (table) and
>>> I get an error similar to:
>>>
>
> I present some options below specific to Tom's discussion, but another
> option may be to use the ":results output" option on a python code block
> which prints the table to STDOUT, and then use something like readLines
> to read from the resulting string into R.
>

I didn't have any luck with :results output, but didn't spend much time
trying to figure it out.

>>>
>>> <<<<<<
>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>>   : line 1 did not have 5 elements
>>>
>>> Enter a frame number, or 0 to exit
>>>
>>> 1: read.table("/tmp/babel-3780tje/R-import-3780Akj", header = FALSE, row.names
>>> = NULL, sep = "
>>>>>>>>>
>>>
>>> If I construct a table "by hand" with all of the cells occupied, everything
>>> goes OK. For instance:
>>>
>>> <<<<<<
>>> #+TBLNAME: some-junk
>>> | 1 | 0 | 0 | 0 |
>>> | 1 | 1 | 0 | 0 |
>>> | 1 | 2 | 1 | 0 |
>>> | 1 | 3 | 3 | 1 |
>>>
>>> #+NAME: read-some-junk(sj_input=some-junk)
>>> #+BEGIN_SRC R
>>>
>>> rowSums(sj_input)
>>>
>>> #+END_SRC
>>>
>>> #+RESULTS: read-some-junk
>>> | 1 |
>>> | 2 |
>>> | 4 |
>>> | 8 |
>>>>>>>>>
>>>
>>> But the following gives the kind of error I described above:
>>>
>>> <<<<<<
>>> #+name: pascals_triangle
>>> #+begin_src python :var n=5 :exports none :return pascals_triangle(5)
>>> def pascals_triangle(n):
>>>     if n == 0:
>>>         return [[1]]
>>>     prev_triangle = pascals_triangle(n-1)
>>>     prev_row = prev_triangle[n-1]
>>>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>>>     return prev_triangle + [this_row]
>>>
>>> pascals_triangle(n)
>>> #+end_src
>>
>> A few things are wrong at this point. It seems the JSS article has
>> an error in the header of the pascals_triangle source block. AFAIK
>> there is no header argument :return. I don't know how :return
>> pascals_triangle(5) got there, but am fairly certain it shouldn't be.
>>
>
> The :return header argument *is* a supported header argument of python
> code blocks and is not an error. The python code block should run w/o
> error and without the extra "return pascals_triangle(n)" at the bottom.
> The following works for me.
>
> #+name: pascals_triangle
> #+begin_src python :var n=5 :exports none :return pascals_triangle(5)
> def pascals_triangle(n):
>     if n == 0:
>         return [[1]]
>     prev_triangle = pascals_triangle(n-1)
>     prev_row = prev_triangle[n-1]
>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>     return prev_triangle + [this_row]
>
> #+end_src
>
> #+RESULTS: pascals_triangle
> | 1 |   |    |    |   |   |
> | 1 | 1 |    |    |   |   |
> | 1 | 2 |  1 |    |   |   |
> | 1 | 3 |  3 |  1 |   |   |
> | 1 | 4 |  6 |  4 | 1 |   |
> | 1 | 5 | 10 | 10 | 5 | 1 |
>
> [...]

I'm beginning to see why you have strong feelings about python. In the
code above, the blank line before #+end_src is necessary and must not
contain any spaces, and :var n can be set to anything, since it is
declared for initialization only.

The code in the JSS article doesn't run for me with a recent Org-mode
unless I add a blank line before #+end_src, or remove the :return header
argument. If I remove the :return header argument, then the need for the
blank line goes away. The following code block seems to work:

#+name: pascals-triangle
#+begin_src python :var n=2 :exports none
def pascals_triangle(n):
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n-1)
    prev_row = prev_triangle[n-1]
    this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
    return prev_triangle + [this_row]
return pascals_triangle(n)
#+end_src

#+RESULTS: pascals-triangle
| 1 |   |   |
| 1 | 1 |   |
| 1 | 2 | 1 |

I'm guessing that the need for a blank line when using :return has
arisen since the JSS article was published, because the article was
generated from source code and didn't show any errors.

If I have this right (a big if), then might it be possible to
re-establish the old behavior so the JSS code works?

>>
>> I vaguely remember that it once was possible to pass variables in
>> through the name line, but I couldn't find this syntax in some fairly
>> recent documentation.
>
> This style of passing arguments is still supported, but not necessarily
> encouraged by the documentation.
>
>> It does appear to work still using a recent Org-mode. If I rename the
>> results and then pass that to the source code block, all is well.
>>
>> #+RESULTS: pascals-tri
>> | 1 |   |    |    |   |   |
>> | 1 | 1 |    |    |   |   |
>> | 1 | 2 |  1 |    |   |   |
>> | 1 | 3 |  3 |  1 |   |   |
>> | 1 | 4 |  6 |  4 | 1 |   |
>> | 1 | 5 | 10 | 10 | 5 | 1 |
>>
>>
>> #+name: pst-checkR(p=pascals-tri)
>> #+BEGIN_SRC R
>> p
>> #+END_SRC
>>
>> #+RESULTS: pst-checkR
>>
>> | 1 | nil | nil | nil | nil | nil |
>> | 1 | 1   | nil | nil | nil | nil |
>> | 1 | 2   | 1   | nil | nil | nil |
>> | 1 | 3   | 3   | 1   | nil | nil |
>> | 1 | 4   | 6   | 4   | 1   | nil |
>> | 1 | 5   | 10  | 10  | 5   | 1   |
>>
>> This looks like a bug to me, but Eric S. will know better what might be
>> going on.
>
> The above is due to the inability of R (or at least of the read.table
> function) to read in tables with different row length. The process of
> writing to an Org-mode table and *then* referencing that table as Tom
> suggests above has the side effect of filling in blank spots in the
> final exported table, turning what would otherwise be something like
>
> 1
> 1 1
> 1 2 1
>
> into something like
>
> 1 "" ""
> 1 1 ""
> 1 2 1
>

Thanks for this explanation. It makes sense that mapping a python data
structure to an R data structure would involve an intermediate
representation.

All the best,
Tom

> You could also use a function like the following to explicitly fill in
> these missing lines.
>
> #+name: padded_pascals_triangle
> #+begin_src emacs-lisp :var data=pascals_triangle
> (let ((max-length (apply #'max (mapcar #'length data))))
>   (mapcar (lambda (row)
>             (append row (make-vector (- max-length (length row)) "") nil))
>           data))
> #+end_src
>
>> I can't do much more than this, but I'm optimistic things will be
>> sorted out before your turn to speak at the seminar rolls around.
>>
>> Thanks for bringing the error in the JSS article to light.
>>
>> All the best,
>> Tom
>>
>
> I often have to explicitly convert data read into R code blocks as a
> table into some other data structure like a vector or a matrix. I run
> into this myself when trying to use the statistical functions of R. It
> generally takes a while to look up the function to do the conversion,
> but I imagine that there is a reason why people who know more R than I
> do chose to make tables the default data type for data read into R
> blocks.
>
> Best,
>
> Combining the examples above yields the following,
>
>
> #+name: pascals_triangle
> #+begin_src python :var n=5 :exports none :return pascals_triangle(5) :results vector
> def pascals_triangle(n):
>     if n == 0:
>         return [[1]]
>     prev_triangle = pascals_triangle(n-1)
>     prev_row = prev_triangle[n-1]
>     this_row = map(sum, zip([0] + prev_row, prev_row + [0]))
>     return prev_triangle + [this_row]
>
> #+end_src
>
> #+name: padded_pascals_triangle
> #+begin_src emacs-lisp :var data=pascals_triangle
> (let ((max-length (apply #'max (mapcar #'length data))))
>   (mapcar (lambda (row)
>             (append row (make-vector (- max-length (length row)) "") nil))
>           data))
> #+end_src
>
> #+begin_src R :var data=padded_pascals_triangle
> data
> #+end_src
>
> #+RESULTS:
> | 1 | nil | nil | nil | nil | nil |
> | 1 | 1   | nil | nil | nil | nil |
> | 1 | 2   | 1   | nil | nil | nil |
> | 1 | 3   | 3   | 1   | nil | nil |
> | 1 | 4   | 6   | 4   | 1   | nil |
> | 1 | 5   | 10  | 10  | 5   | 1   |
>
>
>>
>>>>>>>>>
>>>
>>> Note that I don't really want to do rowSums in this case. I'm just trying to
>>> demonstrate the error.
>>>
>>> Of course, it's clear that the first line does NOT contain five elements, nor
>>> does the second, etc., as all of the above-diagonal elements are blanks.
>>>
>>> But I've been unable to find an R input function that doesn't end up treating
>>> the source data as a table, i.e., in the context of Babel source blocks -- R
>>> is "happy" to read a lower-diagonal structure. See the appendix for an
>>> example.
>>>
>>> Any suggestions? Note that I'm happy to acknowledge that my own ignorance of
>>> R and/or Babel might be the source of the problem. If so, please enlighten
>>> me.
>>>
>>> Thanks.
>>>
>>> -- Mike
>>>
>>> [1] http://www.jstatsoft.org/v46/i03
>>> [2] https://github.com/eschulte/babel-presentation
>>>
>>> <<<<<<
>>> Appendix
>>> --------
>>>
>>>
>>> $ cat pascal.dat
>>> 1
>>> 1 1
>>> 1 2 1
>>> 1 3 3 1
>>> 1 4 6 4 1
>>>
>>> $ R --vanilla < pascal.R
>>>
>>> R version 2.15.0 (2012-03-30)
>>> Copyright (C) 2012 The R Foundation for Statistical Computing
>>> ISBN 3-900051-07-0
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>> .
>>> .
>>> .
>>>
>>>> x <- readLines("pascal.dat")
>>>> x
>>> [1] "1"         "1 1"       "1 2 1"     "1 3 3 1"   "1 4 6 4 1"
>>>> str(x)
>>>  chr [1:5] "1" "1 1" "1 2 1" "1 3 3 1" "1 4 6 4 1"
>>>>
>>>> y <- scan("pascal.dat")
>>> Read 15 items
>>>> y
>>>  [1] 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1
>>>> str(y)
>>>  num [1:15] 1 1 1 1 2 1 1 3 3 1 ...
>>>>
>>>> z <- read.table("pascal.dat", header=FALSE)
>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>   line 1 did not have 5 elements
>>> Calls: read.table -> scan
>>> Execution halted
>>>
>>>

--
Thomas S. Dye
http://www.tsdye.com
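
The padding step in the quoted emacs-lisp block can also be sketched in Python. This is a minimal sketch under stated assumptions, not code from the thread: pad_rows and its fill parameter are hypothetical names, and list(map(...)) is an assumption for Python 3, where map returns an iterator rather than a list. With fill=0 the padded table is all numeric, so the row sums should come out as powers of two, which is the same kind of check rowSums performs on the hand-built some-junk table.

```python
def pascals_triangle(n):
    """Return rows 0..n of Pascal's triangle as ragged lists."""
    if n == 0:
        return [[1]]
    prev_triangle = pascals_triangle(n - 1)
    prev_row = prev_triangle[n - 1]
    # list(...) is an assumption for Python 3, where map() is lazy.
    this_row = list(map(sum, zip([0] + prev_row, prev_row + [0])))
    return prev_triangle + [this_row]


def pad_rows(rows, fill=""):
    """Pad each row with `fill` up to the length of the longest row.

    pad_rows and fill are hypothetical names; the default mirrors the
    empty-string padding used by the emacs-lisp block.
    """
    max_length = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (max_length - len(row)) for row in rows]


# Rectangular, all-numeric table: every row now has the same length.
padded = pad_rows(pascals_triangle(3), fill=0)
row_sums = [sum(row) for row in padded]
```

Padding before the data leaves Python keeps every row the same length, which is the condition read.table complains about in the error above; whether Babel then hands R the zeros or empty cells depends on the fill value chosen.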