Hi Johannes,

Johannes Rainer <johannes.rainer@i-med.ac.at> writes:

> well, I'm using emacs/org for my data analyses in R. I thus combine
> documentation (i.e. the conclusions drawn from e.g. plots created in
> R) and the R code to perform the analysis in my org file. Since I'm
> analysing high throughput data some tasks to handle the data are quite
> time consuming.

As a variation on Allen's Makefile approach, I structure my (Python) code
to be idempotent.  The "real" code never (re)runs unless its output
either doesn't exist or is older than the input.  Normally when org runs
the code's higher level interface it's essentially a no-op and very
fast.  If something does change it's usually because I'm doing
development and will be running the code outside of org enough that it's
reached its "no-op state" before I go back into emacs to refresh/edit
the org document.  Also, I keep things synchronous to avoid having
multiple, competing processes running at once.
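
A minimal sketch of that staleness check, assuming inputs and outputs
are plain files compared by modification time.  The names needs_update
and run_if_stale are made up for illustration and are not taken from
the real code:

```python
import os


def needs_update(inputs, output):
    """Return True if output is missing or older than any input file."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    # Stale as soon as one input was modified after the output was written.
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


def run_if_stale(inputs, output, build):
    """Call build(inputs, output) only when output is stale.

    Returns True if the expensive step ran, False if it was a no-op.
    """
    if needs_update(inputs, output):
        build(inputs, output)
        return True
    return False
```

Called from an org source block, run_if_stale returns almost
immediately once the output is up to date, so re-evaluating the
document stays cheap.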

What I want is to take this approach and generalize it into a more formal
workflow system.  One which handles caching files and in-memory objects
in a versioned store to provide data provenance and idempotent running
in a general sense.  There's already been some work on this ("bein",
"pyutilib workflow" and "sumatra" to name some Python oriented ones) but
so far nothing that "just fits".  I'm in the middle of trying my hand at
something better, but it's not yet usable.

-Brett.