On Sat, Apr 19, 2014 at 5:49 PM, Christoph Groth <christoph@grothesque.org> wrote:
The following message is a courtesy copy of an article
that has been posted to gmane.emacs.orgmode as well.

Thank you, John, for your detailed reply.

> we routinely do this, in the following way. We run jobs that may take
> up to a week to finish, and they are usually run on a cluster. Our
> setup relies on the following behavior for a script.
>
> 1. you can run the script anytime you want, and it can tell the state
> of the calculation by some means. If the script has never been run
> before, it submits the job to a queue and exits. If the job is still
> in the queue, it exits, and if the job is done, it gives you the
> result.

Returning immediately with whatever state the long-running computation
is in currently seems indeed to be a good solution.  I think I will
set up something similar.  Would you share your experience on the
following issues?

- How do you interface such jobs from orgmode?  With org-babel, do
  you execute Python code, or do you run shell commands?

We just have code blocks in org-mode. They are usually Python blocks, but we can also use shell, emacs-lisp, etc. Anything that can run a system command and capture its output will do.
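
For illustration only, here is a minimal sketch of what such a Python block might call, following the "submit, still queued, or done" behavior quoted above. It is not the actual setup described in this thread. The scheduler command (qsub), the file names job.sh, jobid and result.txt, and the function names are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def qsub_submit(jobdir):
    """Submit jobdir/job.sh and return the job id printed by the scheduler.

    Assumption: a PBS/Torque-style `qsub`; replace with your own scheduler.
    """
    proc = subprocess.run(
        ["qsub", "job.sh"], cwd=jobdir,
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()


def check_job(jobdir, submit=qsub_submit):
    """Report the state of a calculation without ever waiting for it."""
    jobdir = Path(jobdir)
    result = jobdir / "result.txt"  # hypothetical: written by job.sh when done
    jobid = jobdir / "jobid"        # hypothetical: marks "already submitted"

    # Finished: hand back the result.
    if result.exists():
        return "done: " + result.read_text().strip()

    # Submitted earlier but no result yet: report and return.
    if jobid.exists():
        return "queued or running as job " + jobid.read_text().strip()

    # Never run before: submit, remember the job id, and return.
    jobdir.mkdir(parents=True, exist_ok=True)
    jobid.write_text(submit(jobdir))
    return "submitted as job " + jobid.read_text().strip()
```

In an org-mode source block one would then evaluate something like print(check_job("calc-001")) as often as desired; each evaluation returns immediately with the current state.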
 

- Do you run your Emacs on the master node of the cluster?  Or does your
  setup involve running emacs on the machine you are working on and
  talking to the cluster over the network?

Currently, we run Emacs on the master node. Once upon a time I had a sophisticated ssh setup that allowed me to work on a local machine: it would rsync the necessary files to the cluster, ssh some commands to run the jobs, and then rsync the files back once they were done. It was pretty sweet, but I have stopped using and maintaining it.
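
A rough sketch of how a local-machine workflow of that kind could look, assuming key-based ssh access to the cluster. This is not the original setup, which is not shown in this thread. The host name, directory names and helper functions are hypothetical.

```python
import shlex
import subprocess


def push_cmd(local_dir, host, remote_dir):
    """rsync command copying the contents of local_dir to the cluster."""
    return ["rsync", "-az", local_dir.rstrip("/") + "/",
            f"{host}:{remote_dir.rstrip('/')}/"]


def remote_cmd(host, remote_dir, command):
    """ssh command running `command` inside remote_dir on the cluster."""
    return ["ssh", host, f"cd {shlex.quote(remote_dir)} && {command}"]


def pull_cmd(local_dir, host, remote_dir):
    """rsync command copying the results back from the cluster."""
    return ["rsync", "-az", f"{host}:{remote_dir.rstrip('/')}/",
            local_dir.rstrip("/") + "/"]


def run(cmd):
    """Run one of the commands above and return its standard output."""
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout
```

The three steps would then be run(push_cmd(...)), run(remote_cmd(..., "qsub job.sh")) and, once the job has finished, run(pull_cmd(...)). Building the argument lists separately from running them makes it easy to inspect what would be executed.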
 

Cheers,
Christoph