Re: org tables and R

From: Carsten Dominik <dominik@science.uva.nl>
To: Dan Davison <davison@stats.ox.ac.uk>
Cc: emacs org-mode mailing list <emacs-orgmode@gnu.org>
Subject: Re: org tables and R
Date: Thu, 22 Jan 2009 09:09:08 +0100
Message-ID: <DBC20123-704B-4972-A8F4-0A371F73FF58@uva.nl>
In-Reply-To: <20081230193550.GA7961@stats.ox.ac.uk>

Hi everyone,

should we include Dan's code as a contributed package or
even merge it into org-plot?

- Carsten

On Dec 30, 2008, at 8:35 PM, Dan Davison wrote:

> Hi all,
>
> I've had a go at taking the org tables and R thing a bit further. I'm
> using two different #+ lines in the org buffer: Lines starting with
> #+TBLR: are in the standard org style (option:value) and can be used
> to specify certain transformations of the table and standard plots of
> the table data. In lines starting #+TBLR:: you can supply literal R
> code, giving you full control over what you do with the table. M-x
> org-table-R-apply makes happen whatever has been specified in those
> lines. As long as the transformation results in something reasonably
> one- or two-dimensional, then this is output to the org-buffer as an
> org table (you can choose whether or not it replaces the original
> table). You need to have R running in an inferior-ess-mode
> buffer. Then, if you have this table,
>
> | rowname | col1 | col2 |
> |---------+------+------|
> | row 1   |    1 |    2 |
> | row 2   |    3 |    4 |
> | total   |      |      |
> #+TBLR:: x[3,] <- x[1,] + x[2,]
> #+TBLR: rownames:1
>
> org-table-R-apply turns it into
>
> | rownames(x) | col1 | col2 |
> |-------------+------+------|
> | row 1       |    1 |    2 |
> | row 2       |    3 |    4 |
> | total       |    4 |    6 |
>
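For readers without the package to hand, here is a rough sketch of what the #+TBLR:: line in that example amounts to on the R side. It assumes the table reaches R as a data frame named x, with the first column used as row names (rownames:1) and the empty cells read as NA; the object org-table-R actually builds may differ.

```r
# Hypothetical stand-in for the org table after rownames:1 has been applied.
# Representing the empty "total" cells as NA is an assumption.
x <- data.frame(col1 = c(1, 3, NA),
                col2 = c(2, 4, NA),
                row.names = c("row 1", "row 2", "total"))

# The literal code from the #+TBLR:: line: row 3 becomes row 1 + row 2.
x[3, ] <- x[1, ] + x[2, ]

print(x)
```

The "total" row then holds the column sums, which matches the table shown above.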
> The action:<something> option specifies off-the-shelf actions, without
> having to write any R code. E.g.
>
> | col1 | col2 |
> |------+------|
> |    1 |    2 |
> |    3 |    4 |
> #+TBLR: action:transpose
>
> produces
>
> |      | V1 | V2 |
> |------+----+----|
> | col1 |  1 |  3 |
> | col2 |  2 |  4 |
>
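The V1 and V2 headers in the transposed table are what plain R produces when a data frame without explicit row names is transposed and converted back. A minimal sketch, assuming action:transpose boils down to something like as.data.frame(t(x)) (that is a guess, not taken from the package source):

```r
# The two-column table from the example, as a data frame with default row names.
x <- data.frame(col1 = c(1, 3), col2 = c(2, 4))

# t() returns a matrix without column names; as.data.frame() then
# fills in V1, V2, ... and keeps the old column names as row names.
tx <- as.data.frame(t(x))

print(tx)
```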
> and
>
> #+TBLR: action:plot columns:((1)(2)) lines:t rownames:1
>
> would plot column 2 against column 1.
>
> You can mix user-code and off-the-shelf code: in this somewhat
> solipsistic example user-supplied code is used to extract the day of
> week, and then action:tabulate is used to build a 2-way table:
>
> | author              | date                            |
> |---------------------+---------------------------------|
> | Carsten Dominik     | Thu, 12 Jun 2008 12:51:54 +0200 |
> | Carsten Dominik     | Wed, 11 Jun 2008 08:57:39 +0200 |
> | Adam Spiers         | Wed, 11 Jun 2008 12:06:23 +0100 |
> | Eddward DeVilla     | Wed, 11 Jun 2008 12:15:11 -0500 |
> | Eddward DeVilla     | Wed, 11 Jun 2008 20:09:50 -0500 |
> | Harri Kiiskinen     | Wed, 04 Jun 2008 16:38:37 +0200 |
> | Carsten Dominik     | Thu, 12 Jun 2008 14:15:49 +0200 |
> | Harri Kiiskinen     | Thu, 12 Jun 2008 14:31:49 +0200 |
> | Carsten Dominik     | Thu, 12 Jun 2008 16:17:59 +0200 |
> | Manoj Srivastava    | Mon, 09 Jun 2008 01:52:03 -0500 |
> | Daniel Clemente     | Wed, 04 Jun 2008 16:35:01 +0200 |
> | Carsten Dominik     | Mon, 9 Jun 2008 09:56:09 +0200  |
> | Carsten Dominik     | Tue, 10 Jun 2008 10:05:24 +0200 |
> | Adam Spiers         | Tue, 10 Jun 2008 10:57:52 +0100 |
> | Manuel Hermenegildo | Tue, 10 Jun 2008 13:50:44 +0200 |
> | Christian Egli      | Tue, 24 Jun 2008 13:27:05 +0200 |
> #+TBLR: columns:(1 2) action:tabulate
> #+TBLR:: x[,2] <- substr(x[,2], 1, 3)
>
> results in
>
> |                     | Mon | Thu | Tue | Wed |
> |---------------------+-----+-----+-----+-----|
> | Adam Spiers         |   0 |   0 |   1 |   1 |
> | Carsten Dominik     |   1 |   3 |   1 |   1 |
> | Christian Egli      |   0 |   0 |   1 |   0 |
> | Daniel Clemente     |   0 |   0 |   0 |   1 |
> | Eddward DeVilla     |   0 |   0 |   0 |   2 |
> | Harri Kiiskinen     |   0 |   1 |   0 |   1 |
> | Manoj Srivastava    |   1 |   0 |   0 |   0 |
> | Manuel Hermenegildo |   0 |   0 |   1 |   0 |
> #+TBLR: action:barplot rownames:1 columns:(1 2 3 4) showcode:t
>
> The #+TBLR: line below that result table produces a bar plot of the data.
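
The tabulate example can be approximated in plain R as follows. This is only a sketch on a five-row subset of the table above; it assumes the data arrives as a data frame named x with character columns, and that action:tabulate is roughly a call to table() on the selected columns.

```r
# A subset of the author/date table; the column names mirror the org table.
x <- data.frame(
  author = c("Carsten Dominik", "Carsten Dominik", "Adam Spiers",
             "Adam Spiers", "Manoj Srivastava"),
  date = c("Thu, 12 Jun 2008 12:51:54 +0200",
           "Wed, 11 Jun 2008 08:57:39 +0200",
           "Wed, 11 Jun 2008 12:06:23 +0100",
           "Tue, 10 Jun 2008 10:57:52 +0100",
           "Mon, 09 Jun 2008 01:52:03 -0500"),
  stringsAsFactors = FALSE
)

# User-supplied step from the #+TBLR:: line: keep only the day of the week.
x[, 2] <- substr(x[, 2], 1, 3)

# Assumed equivalent of columns:(1 2) action:tabulate: a 2-way count table.
counts <- table(x[, 1], x[, 2])

print(counts)
```

Passing counts to barplot() would then draw the bars; with several columns, barplot(t(counts), beside = TRUE) is one way to get side-by-side bars per author.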
>
> There are more details below. The code is at
>
> http://www.stats.ox.ac.uk/~davison/software/org-table-R/org-table-R.el
>
> It would be great to get any feedback on this. My thought was that
> something like this has the potential to provide a unified plotting
> and table formula interface, which might be attractive to people who
> know and/or like and/or want to learn R. There's lots more that could
> be done with this, and there must be all sorts of bugs in it at this
> stage. But if there's any interest in it then it could be
> improved. Anyway, read on if you're interested in hearing more details
> about the options and actions available.
>
> Dan
>
> Currently, the available actions are
>
> - plot
>   A simple plot of the x and y values. If no x-values are specified
>   then the y values are plotted against 1,2,...,length(y). If
>   lines:t then the points are joined by lines.
> - lines
>   Equivalent to action:plot lines:t
> - points
>   Equivalent to action:plot lines:nil
> - barplot
>   Create a bar plot. A vertical bar is drawn for each row, with
>   height given by the value in that row. If multiple columns are
>   selected the bars for different columns are placed side-by-side.
> - hist
>   A histogram
> - density
>   A smoothed histogram
> - image
>   A plot of a table in which each cell is coloured according to its
>   numeric value.
> - tabulate
>   Create a table containing counts of the distinct values of the
>   columns selected (if v columns are selected, the table will be
>   v-dimensional, giving the counts of joint occurrences of the
>   different values of the columns).
> - transpose
>   Transpose the table
>
>
> Apart from tabulate and transpose, those produce plots of the
> selected columns using the R function of the same name (type
> e.g. ?barplot at the R prompt to see the help page).
>
> In addition to the action: option, the following options can be given
> on the #+TBLR: line:
>
> - showcode:t
>   org-table-R-apply creates an R function which hopefully implements
>   the requested actions (explicit user-supplied code comes first;
>   off-the-shelf afterwards). With this option that function
>   definition is displayed in a new R-mode buffer. That could serve as
>   a starting point for fine-tuning the behaviour. One option would be
>   to edit that function definition (say you call it f), save it in a
>   file, and then use
> #+TBLR:: source("/path/to/file.R") ; f(x)
>
> - rownames:<integer>
>   Specifies that the column with the given number contains the names
>   of the rows of the table. These must be unique.
>
> - replace:t
>   The original org-table is replaced by the text output (which will be
>   an org-table if the result is like a 1- or 2-dimensional array).
>
> - columns:<lisp-list>
>   This specifies the columns that the off-the-shelf action will
>   operate on (e.g. the columns you want to plot). The simplest case
>   is columns:j, where j is an integer. This could also be written
>   columns:(j). columns:((1)(2 3)) says that you want a graphic in
>   which columns 2 and 3 are plotted on the y-axis, and column 1 is
>   plotted on the x-axis. What form that will take depends on the
>   plotting function used (action:<something>). It might involve
>   multiple plots in a single figure, although to be fair I haven't
>   implemented most of the multiple column options so you're likely to
>   get an error with anything except for
>   action:<plot/lines/points>. I've given a description of how columns
>   are specified, and what sort of behaviour might be expected, in the
>   docstring to org-table-R-make-index-vectors. Basically, my
>   intention was that columns:((1)(2 3)) should correspond to
>   xy.coords(x=1, y=c(2,3)) in R. (See ?xy.coords if you want to get
>   involved in this.)
>
> - lines:t
>   When action:plot is given, this means that the points are joined
>   with lines. That's the same behaviour as action:lines.
>
> - output-to-buffer:t
>   This specifies that the text output from R goes into the org
>   buffer. You shouldn't normally need to use this option as the code
>   tries to work out whether it's appropriate. The rule it follows is
>   that the org buffer gets the output if any bespoke code has been
>   supplied on the #+TBLR:: line, or if an action: has been requested
>   that results in text (action:<tabulate/transpose> at the moment).
>
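To make the columns: option more concrete, the intended correspondence between columns:((1)(2 3)) and xy.coords can be sketched like this. The data frame, its column names, and the variables xcol and ycols are all made up for illustration; how org-table-R-make-index-vectors really builds its index vectors is described in its docstring, not here.

```r
# Hypothetical table: column 1 holds x-values, columns 2 and 3 hold y-values.
x <- data.frame(t = 1:4, a = c(2, 4, 6, 8), b = c(1, 3, 5, 7))

# columns:((1)(2 3)) read as "x is column 1, y is columns 2 and 3".
xcol  <- 1
ycols <- c(2, 3)

# One xy.coords() result per y column, each pairing it with the shared x column.
coords <- lapply(ycols, function(j) xy.coords(x = x[[xcol]], y = x[[j]]))

# Each element is a list with components x and y, ready to hand to plot() or lines().
str(coords[[1]][c("x", "y")])
```

For an actual figure, something like matplot(x[[xcol]], as.matrix(x[ycols]), type = "l") would draw both series against column 1 in a single plot.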
> p.s.
> I agree with Eric that we could do with a way of referencing tables
> from remote areas of an org file.
>
>
> -- 
> http://www.stats.ox.ac.uk/~davison