From: John Kitchin
Subject: Re: babel and long-running computations
Date: Fri, 18 Apr 2014 09:20:46 -0400
To: Christoph Groth
Cc: "emacs-orgmode@gnu.org"
In-Reply-To: <87lhv4qr7l.fsf@grothesque.org>
List-Id: "General discussions about Org-mode."

We routinely do this, in the following way. We run jobs that may take up
to a week to finish, and they are usually run on a cluster. Our setup
relies on the following behavior for a script.

You can run the script anytime you want, and it can tell the state of the
calculation by some means. If the script has never been run before, it
submits the job to a queue and exits. If the job is still in the queue, it
exits, and if the job is done, it gives you the result. We manage this in
Python using exceptions to control the workflow. You can even have the
script output an Org task marked TODO, so you get an item on your agenda
reminding you to check later whether the job is done.

Essentially the way it works is: when you run a script, there is some code
that checks the state, which in our case means looking for certain output
files and flag files. If the output files do not exist, we submit the job
to a queue and create a jobid file containing the output from the queue
submission command. Then the script raises a JobSubmitted exception, which
we can handle if we want to.

Later, I run the code block again. A jobid file is found, so the script
gets the jobid and checks if the job is still in the queue. If so, a
JobQueued exception is raised, which we can handle if we want to. If not,
we delete the jobid file and proceed to analyze the output files.

All of that is done behind the scenes. You can see how this is done at
https://github.com/jkitchin/jasp/blob/master/jasp/jasp_extensions.py#L416

This is a highly customized approach to a very specific code, but I think
you could adapt the ideas to other codes and computing environments.

John

-----------------------------------
John Kitchin
Associate Professor
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
http://kitchingroup.cheme.cmu.edu

On Thu, Apr 17, 2014 at 4:45 AM, Christoph Groth wrote:

> Dear all,
>
> I’m considering the use of org-babel for “reproducible science”. Is
> anyone successfully using org-babel for research that involves
> calculations that typically take a few minutes up to several hours on a
> single CPU (i.e. up to a few minutes on a cluster)? It seems to me that
> currently org-babel is not well suited for such a usage profile, but
> I’d love to be taught otherwise.
>
> Many thanks,
> Christoph
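
As a rough illustration of the workflow described in the reply above, here
is a minimal Python sketch. It is not the jasp implementation. The file
names "jobid" and "output.dat", the functions run and check, and the three
callables submit, in_queue, and analyze are all hypothetical stand-ins for
whatever your queue system provides. Resubmitting when a job has left the
queue without producing output is also an assumption; the message does not
say what happens in that case.

```python
from pathlib import Path


class JobSubmitted(Exception):
    """Raised right after a job has been handed to the queue."""


class JobQueued(Exception):
    """Raised when a previously submitted job is still in the queue."""


def run(workdir, submit, in_queue, analyze):
    """Submit the job, report it as queued, or analyze its output.

    Hypothetical callables supplied by the caller:
      submit()        -> job id as a string
      in_queue(jobid) -> True while the job is queued or running
      analyze(path)   -> result computed from the output file
    """
    workdir = Path(workdir)
    jobid_file = workdir / "jobid"    # hypothetical flag-file name
    output = workdir / "output.dat"   # hypothetical output-file name

    # A jobid file means the job was submitted on an earlier run.
    if jobid_file.exists():
        jobid = jobid_file.read_text().strip()
        if in_queue(jobid):
            raise JobQueued(jobid)
        jobid_file.unlink()  # the job has left the queue

    # No output yet: submit and remember the job id.
    # (Assumption: this also resubmits a job that left the queue
    # without writing any output.)
    if not output.exists():
        jobid = submit()
        jobid_file.write_text(jobid)
        raise JobSubmitted(jobid)

    return analyze(output)


def check(workdir, submit, in_queue, analyze):
    """Call run() and turn the two exceptions into Org TODO headings."""
    try:
        return run(workdir, submit, in_queue, analyze)
    except JobSubmitted as exc:
        return f"* TODO Check on job {exc} (just submitted)"
    except JobQueued as exc:
        return f"* TODO Check on job {exc} (still queued)"
```

Calling check from a babel source block each time the block is evaluated
would then give either a TODO heading to revisit later or the analyzed
result, without the block ever waiting on the job itself.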