We routinely do this. We run jobs that may take up to a week to finish,
usually on a cluster, and our setup relies on the script behaving as
follows.

1. You can run the script any time you want, and it works out the state of
the calculation on its own. If the script has never been run before, it
submits the job to a queue and exits. If the job is still in the queue, it
exits. If the job is done, it gives you the result. We manage this in
Python, using exceptions to control the workflow. You can even have the
script output an org-mode task marked TODO, so that you get an item on
your agenda reminding you to check later whether the job is done.
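Here is a minimal sketch of the caller's side in Python. The exception names
JobSubmitted and JobQueued are the ones used in this thread. The class
definitions and the run_or_report helper are hypothetical illustrations,
not the jasp code. The sketch shows how the exceptions can steer the
workflow and turn a pending job into an org TODO line instead of a result:

```python
class JobSubmitted(Exception):
    """Raised right after a job has been put on the queue."""


class JobQueued(Exception):
    """Raised when a previously submitted job is still in the queue."""


def org_todo(message):
    # An org-mode headline marked TODO, so it shows up on the agenda.
    return f"* TODO {message}"


def run_or_report(calculation, *args):
    """Hypothetical wrapper: return the result of `calculation`, or an
    org TODO line if the job was just submitted or is still queued."""
    try:
        return calculation(*args)
    except JobSubmitted as e:
        return org_todo(f"job submitted: {e}")
    except JobQueued as e:
        return org_todo(f"job still queued: {e}")
```

In an org-babel block with `:results raw`, the returned TODO line would
land in the buffer as a real agenda item. A finished job's result is
passed through unchanged.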

Essentially, it works like this. When you run the script, some code checks
the state, which in our case means looking for certain output files and
flag files. If the output files do not exist, we submit the job to a queue
and create a jobid file containing the output of the queue submission
command. The script then raises a JobSubmitted exception, which we can
handle if we want to.
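A rough sketch of the submission step, assuming hypothetical file names
(results.out for the output, jobid for the job id file) and a hypothetical
submit command ("qsub", "run.sh"). Adapt all of these to your code and
queueing system. The command is a parameter, so the function can be tried
without a cluster:

```python
import os
import subprocess


class JobSubmitted(Exception):
    """Raised right after a job has been put on the queue."""


def submit_if_needed(workdir, output_file="results.out", jobid_file="jobid",
                     submit_cmd=("qsub", "run.sh")):
    """Submit a job unless its output or a jobid file already exists.

    File names and submit_cmd are placeholders (assumptions), not the
    names any particular code uses.
    """
    output_path = os.path.join(workdir, output_file)
    jobid_path = os.path.join(workdir, jobid_file)
    if os.path.exists(output_path) or os.path.exists(jobid_path):
        return None  # already finished or already submitted
    # Record what the submission command prints (typically the job id).
    result = subprocess.run(list(submit_cmd), cwd=workdir,
                            capture_output=True, text=True, check=True)
    jobid = result.stdout.strip()
    with open(jobid_path, "w") as f:
        f.write(jobid)
    raise JobSubmitted(jobid)
```

Raising instead of returning means the rest of the script (the analysis)
never runs on a job that has only just been submitted.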

Later, I run the code block again. This time a jobid file is found, so the
script reads the job id and checks whether the job is still in the queue.
If it is, a JobQueued exception is raised, which we can handle if we want
to. If not, we delete the jobid file and proceed to analyze the output
files.
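The second pass could look roughly like this. The queue check is an
assumption: it runs `qstat JOBID` and treats a non-zero exit status as
"no longer in the queue". Some PBS/Torque setups behave this way, but check
your scheduler. The in_queue parameter lets you swap in your own check:

```python
import os
import subprocess


class JobQueued(Exception):
    """Raised when a previously submitted job is still in the queue."""


def job_in_queue(jobid):
    # Assumption: `qstat JOBID` exits non-zero once the job has left the
    # queue. Replace with whatever your queueing system provides.
    result = subprocess.run(["qstat", jobid], capture_output=True, text=True)
    return result.returncode == 0


def check_job(workdir, jobid_file="jobid", in_queue=job_in_queue):
    """Raise JobQueued while the job is queued; otherwise clean up.

    Returns True if a finished job was found (its jobid file is removed),
    or False if there was no jobid file at all.
    """
    jobid_path = os.path.join(workdir, jobid_file)
    if not os.path.exists(jobid_path):
        return False
    with open(jobid_path) as f:
        jobid = f.read().strip()
    if in_queue(jobid):
        raise JobQueued(jobid)
    os.remove(jobid_path)  # job left the queue; outputs can be analyzed
    return True
```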

All of that is done behind the scenes.

You can see how this is done at
https://github.com/jkitchin/jasp/blob/master/jasp/jasp_extensions.py#L416

This is a highly customized approach for one specific code, but I think you
could adapt the ideas to other codes and computing environments.

John

-----------------------------------
John Kitchin
Associate Professor
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
http://kitchingroup.cheme.cmu.edu



On Thu, Apr 17, 2014 at 4:45 AM, Christoph Groth
<christoph@grothesque.org> wrote:

> Dear all,
>
> I’m considering the use of org-babel for “reproducible science”.  Is
> anyone successfully using org-babel for research that involves
> calculations that typically take a few minutes up to several hours on a
> single CPU (i.e. up to a few minutes on a cluster)?  It seems to me that
> currently org-babel is not well suited for such a usage profile, but
> I’d love to be taught otherwise.
>
> Many thanks,
> Christoph