From: Dan Davison
Subject: org tables and R
Date: Tue, 30 Dec 2008 19:35:50 +0000
Message-ID: <20081230193550.GA7961@stats.ox.ac.uk>
To: emacs org-mode mailing list

Hi all,

I've had a go at taking the org tables and R thing a bit further. I'm
using two different #+ lines in the org buffer: Lines starting with
#+TBLR: are in the standard org style (option:value) and can be used to
specify certain transformations of the table and standard plots of the
table data. In lines starting #+TBLR:: you can supply literal R code,
giving you full control over what you do with the table. M-x
org-table-R-apply makes happen whatever has been specified in those
lines.
As long as the transformation results in something reasonably one- or
two-dimensional, then this is output to the org-buffer as an org table
(you can choose whether or not it replaces the original table).

You need to have R running in an inferior-ess-mode buffer. Then, if you
have this table,

| rowname | col1 | col2 |
|---------+------+------|
| row 1   |    1 |    2 |
| row 2   |    3 |    4 |
| total   |      |      |
#+TBLR:: x[3,] <- x[1,] + x[2,]
#+TBLR: rownames:1

org-table-R-apply turns it into

| rownames(x) | col1 | col2 |
|-------------+------+------|
| row 1       |    1 |    2 |
| row 2       |    3 |    4 |
| total       |    4 |    6 |

The action: option specifies off-the-shelf actions, without having to
write any R code. E.g.

| col1 | col2 |
|------+------|
|    1 |    2 |
|    3 |    4 |
#+TBLR: action:transpose

produces

|      | V1 | V2 |
|------+----+----|
| col1 |  1 |  3 |
| col2 |  2 |  4 |

and

#+TBLR: action:plot columns:((1)(2)) lines:t rownames:1

would plot column 2 against column 1.

You can mix user-code and off-the-shelf code: in this somewhat
solipsistic example user-supplied code is used to extract the day of
week, and then action:tabulate is used to build a 2-way table:

| author              | date                            |
|---------------------+---------------------------------|
| Carsten Dominik     | Thu, 12 Jun 2008 12:51:54 +0200 |
| Carsten Dominik     | Wed, 11 Jun 2008 08:57:39 +0200 |
| Adam Spiers         | Wed, 11 Jun 2008 12:06:23 +0100 |
| Eddward DeVilla     | Wed, 11 Jun 2008 12:15:11 -0500 |
| Eddward DeVilla     | Wed, 11 Jun 2008 20:09:50 -0500 |
| Harri Kiiskinen     | Wed, 04 Jun 2008 16:38:37 +0200 |
| Carsten Dominik     | Thu, 12 Jun 2008 14:15:49 +0200 |
| Harri Kiiskinen     | Thu, 12 Jun 2008 14:31:49 +0200 |
| Carsten Dominik     | Thu, 12 Jun 2008 16:17:59 +0200 |
| Manoj Srivastava    | Mon, 09 Jun 2008 01:52:03 -0500 |
| Daniel Clemente     | Wed, 04 Jun 2008 16:35:01 +0200 |
| Carsten Dominik     | Mon, 9 Jun 2008 09:56:09 +0200  |
| Carsten Dominik     | Tue, 10 Jun 2008 10:05:24 +0200 |
| Adam Spiers         | Tue, 10 Jun 2008 10:57:52 +0100 |
| Manuel Hermenegildo | Tue, 10 Jun 2008 13:50:44 +0200 |
| Christian Egli      | Tue, 24 Jun 2008 13:27:05 +0200 |
#+TBLR: columns:(1 2) action:tabulate
#+TBLR:: x[,2] <- substr(x[,2], 1, 3)

results in

|                     | Mon | Thu | Tue | Wed |
|---------------------+-----+-----+-----+-----|
| Adam Spiers         |   0 |   0 |   1 |   1 |
| Carsten Dominik     |   1 |   3 |   1 |   1 |
| Christian Egli      |   0 |   0 |   1 |   0 |
| Daniel Clemente     |   0 |   0 |   0 |   1 |
| Eddward DeVilla     |   0 |   0 |   0 |   2 |
| Harri Kiiskinen     |   0 |   1 |   0 |   1 |
| Manoj Srivastava    |   1 |   0 |   0 |   0 |
| Manuel Hermenegildo |   0 |   0 |   1 |   0 |
#+TBLR: action:barplot rownames:1 columns:(1 2 3 4) showcode:t

The #+TBLR: line below that table produces a bar plot of the data.

There are more details below. The code is at

http://www.stats.ox.ac.uk/~davison/software/org-table-R/org-table-R.el

It would be great to get any feedback on this. My thought was that
something like this has the potential to provide a unified plotting and
table formula interface, which might be attractive to people who know
and/or like and/or want to learn R. There's lots more that could be done
with this, and there must be all sorts of bugs in it at this stage. But
if there's any interest in it then it could be improved. Anyway, read on
if you're interested in hearing more details about the options and
actions available.

Dan


Currently, the available actions are

- plot
  A simple plot of the x and y values. If no x-values are specified
  then the y values are plotted against 1,2,...,length(y). If lines:t
  then the points are joined by lines.
- lines
  Equivalent to action:plot lines:t
- points
  Equivalent to action:plot lines:nil
- barplot
  Create a bar plot. A vertical bar is drawn for each row, with height
  given by the value in that row. If multiple columns are selected the
  bars for different columns are placed side-by-side.
- hist
  A histogram
- density
  A smoothed histogram
- image
  A plot of a table in which each cell is coloured according to its
  numeric value.
- tabulate
  Create a table containing counts of the distinct values of the
  columns selected (if v columns are selected, the table will be
  v-dimensional, giving the counts of joint occurrences of the
  different values of the columns).
- transpose
  Transpose the table.

Apart from tabulate and transpose, these produce plots of the selected
columns using the R function of the same name (type e.g. ?barplot at the
R prompt to see the help page).

In addition to the action: option, the following options can be given on
the #+TBLR: line:

- showcode:t
  org-table-R-apply creates an R function which hopefully implements
  the requested actions (explicit user-supplied code comes first;
  off-the-shelf afterwards). With this option that function definition
  is displayed in a new R-mode buffer. That could serve as a starting
  point for fine-tuning the behaviour. One option would be to edit that
  function definition (say you call it f), save it in a file, and then
  use
  #+TBLR:: source("/path/to/file.R") ; f(x)
- rownames:n
  Specifies that column n contains the names of the rows of the table.
  These must be unique.
- replace:t
  The original org-table is replaced by the text output (which will be
  an org-table if the result is like a 1- or 2-dimensional array).
- columns:
  This specifies the columns that the off-the-shelf action will operate
  on (e.g. the columns you want to plot). The simplest case is
  columns:j, where j is an integer. This could also be written
  columns:(j). columns:((1)(2 3)) says that you want a graphic in which
  columns 2 and 3 are plotted on the y-axis, and column 1 is plotted on
  the x-axis. What form that will take depends on the plotting function
  used (action:). It might involve multiple plots in a single figure,
  although to be fair I haven't implemented most of the multiple column
  options so you're likely to get an error with anything except for
  action:.
  I've given a description of how columns are specified, and what sort
  of behaviour might be expected, in the docstring to
  org-table-R-make-index-vectors. Basically, my intention was that
  columns:((1)(2 3)) should correspond to xy.coords(x=1, y=c(2,3)) in
  R. (See ?xy.coords if you want to get involved in this.)
- lines:t
  When action:plot is given, this means that the points are joined with
  lines. That's the same behaviour as action:lines.
- output-to-buffer:t
  This specifies that the text output from R goes into the org buffer.
  You shouldn't normally need to use this option as the code tries to
  work out whether it's appropriate. The rule it follows is that the
  org buffer gets the output if any bespoke code has been supplied on
  the #+TBLR:: line, or if an action: has been requested that results
  in text (action: at the moment).

p.s. I agree with Eric that we could do with a way of referencing tables
from remote areas of an org file.

--
http://www.stats.ox.ac.uk/~davison
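
For anyone who wants to try the R side of two of the examples above at a
plain R prompt, here is a minimal sketch. It assumes the table reaches R
as a data frame (called x in the #+TBLR:: lines); that representation is
an assumption, and the function that org-table-R-apply actually builds
may differ (showcode:t shows the real thing). The name y and the
four-row subset of the author table are only for illustration.

```r
# Assumption: the org table is available in R as a data frame, with the
# rownames:1 column already turned into row names.
x <- data.frame(col1 = c(1, 3, NA), col2 = c(2, 4, NA),
                row.names = c("row 1", "row 2", "total"))

# The user-supplied line from the first example: fill in the total row.
x[3, ] <- x[1, ] + x[2, ]
print(x)

# The tabulate example, on the first four rows of the author table.
# `y` is a hypothetical stand-in; in the org buffer it would again be x.
y <- data.frame(
  author = c("Carsten Dominik", "Carsten Dominik",
             "Adam Spiers", "Eddward DeVilla"),
  date = c("Thu, 12 Jun 2008 12:51:54 +0200",
           "Wed, 11 Jun 2008 08:57:39 +0200",
           "Wed, 11 Jun 2008 12:06:23 +0100",
           "Wed, 11 Jun 2008 12:15:11 -0500"),
  stringsAsFactors = FALSE
)

# User-supplied step: keep only the day of the week.
y[, 2] <- substr(y[, 2], 1, 3)

# Roughly what action:tabulate with columns:(1 2) amounts to: a 2-way
# table of counts, authors in rows and days in columns.
counts <- table(y[, 1], y[, 2])
print(counts)
```

Here table() does the counting and sorts the row and column labels
alphabetically, which matches the ordering in the tabulated result
shown earlier.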