We do this routinely. Our jobs can take up to a week to finish, and they usually run on a cluster. Our setup relies on the following behavior of a script.

You can run the script at any time, and it works out the state of the calculation. If the script has never been run before, it submits the job to a queue and exits. If the job is still in the queue, it exits, and if the job is done, it gives you the result. We manage this in Python, using exceptions to control the workflow. You can even have the script output an org task marked TODO, so you get an agenda item reminding you to check later whether the job is done.
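As a rough illustration of the TODO idea, here is a minimal sketch of a function that formats such an org entry. The name org_todo, its arguments, and the heading text are hypothetical and not taken from our code; only the org syntax (a TODO heading followed by a SCHEDULED line with an active timestamp) is standard.

```python
from datetime import date, timedelta


def org_todo(jobid, days=1, today=None):
    """Return an org heading that schedules a check of job `jobid`.

    Hypothetical helper for illustration only.
    """
    today = today or date.today()
    due = today + timedelta(days=days)
    # Org active timestamps look like <2014-04-18 Fri>.
    # %a gives the English day name under Python's default "C" locale.
    stamp = due.strftime("<%Y-%m-%d %a>")
    return f"* TODO Check job {jobid}\n  SCHEDULED: {stamp}"


if __name__ == "__main__":
    print(org_todo("1234"))
```

If the code block's output is inserted into the org file, the heading shows up in the agenda on the scheduled day.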

Essentially, it works like this: when you run the script, some code checks the state, which in our case means looking for certain output files and flag files. If the output files do not exist, we submit the job to a queue and create a jobid file containing the output of the queue submission command. The script then raises a JobSubmitted exception, which we can handle if we want to.

Later, I run the code block again. A jobid file is found, so the script reads the jobid and checks whether the job is still in the queue. If so, a JobQueued exception is raised, which we can handle if we want to. If not, we delete the jobid file and proceed to analyze the output files.

All of that is done behind the scenes.
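The workflow described above can be sketched roughly as follows. This is a simplified illustration, not the jasp implementation: the function get_result, the file name OUTPUT, and the submit and in_queue callables are assumptions that stand in for whatever your code and queue system provide. Here the jobid file holds just the job id returned by submit.

```python
from pathlib import Path


class JobSubmitted(Exception):
    """Raised right after the job has been handed to the queue."""


class JobQueued(Exception):
    """Raised when the queue still lists the job."""


def get_result(workdir, submit, in_queue, output_name="OUTPUT"):
    """Return the contents of the output file, or raise to report job state.

    submit(workdir) -> str   hypothetical: submits the job, returns a job id
    in_queue(jobid) -> bool  hypothetical: True while the queue lists the job
    """
    workdir = Path(workdir)
    jobid_file = workdir / "jobid"
    output_file = workdir / output_name

    if jobid_file.exists():
        jobid = jobid_file.read_text().strip()
        if in_queue(jobid):
            raise JobQueued(jobid)
        # The job has left the queue: remove the flag file and carry on.
        jobid_file.unlink()

    if not output_file.exists():
        # Never run before (or the job left no output): submit it.
        jobid = submit(workdir)
        jobid_file.write_text(jobid)
        raise JobSubmitted(jobid)

    # The job is done; hand back the raw output for analysis.
    return output_file.read_text()
```

In an org-babel code block you would call get_result and catch the two exceptions, for example printing "submitted" or "still queued" instead of a result. Because the function returns immediately in every state, the block can be re-run at any time without blocking Emacs.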

You can see how this is done at https://github.com/jkitchin/jasp/blob/master/jasp/jasp_extensions.py#L416

This is a highly customized approach for a very specific code, but I think you could adapt the ideas to other codes and computing environments.

John

-----------------------------------
John Kitchin
Associate Professor
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
http://kitchingroup.cheme.cmu.edu



On Thu, Apr 17, 2014 at 4:45 AM, Christoph Groth <christoph@grothesque.org> wrote:
Dear all,

I’m considering the use of org-babel for “reproducible science”.  Is
anyone successfully using org-babel for research that involves
calculations that typically take a few minutes up to several hours on a
single CPU (i.e. up to a few minutes on a cluster)?  It seems to me that
currently org-babel is not well suited for such an usage profile, but
I’d love to be taught otherwise.

Many thanks,
Christoph