Dear all,
I’m considering the use of org-babel for “reproducible science”. Is anyone successfully using org-babel for research that involves calculations that typically take a few minutes up to several hours on a single CPU (i.e. up to a few minutes on a cluster)? It seems to me that currently org-babel is not well suited for such a usage profile, but I’d love to be taught otherwise.
Many thanks,
Christoph
Hi Christoph,
Generally, org-babel is suited for long-running computations. Its
caching can help you avoid rerunning code chunks. And long runtimes do
not conflict with the idea of reproducible research; they just may not
be very comfortable for the user.
In my experience, the best way to avoid your problems is to structure
your code and analysis in small, logical steps and to save intermediate
results, so that you can restart in the middle of your analysis. Apart
from that, all other frameworks for reproducible research that I know of
have the same "limitations".
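A minimal Python sketch of the "save intermediate results" advice, assuming each step of the analysis is a function whose result can be pickled. The `checkpoint` helper and the `results/` file names are illustrative only; they are not part of org-babel.

```python
import os
import pickle


def checkpoint(path, compute):
    """Return the result stored at `path`, computing and saving it first if missing.

    `compute` is a zero-argument callable for one step of the analysis.
    Re-running the whole script skips every step whose file already exists,
    so an interrupted analysis restarts from the last completed step.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    # Write to a temporary file first so an interrupted run never leaves
    # a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(result, f)
    os.replace(tmp, path)
    return result


if __name__ == "__main__":
    os.makedirs("results", exist_ok=True)
    # Hypothetical two-step analysis: each step is cached independently.
    raw = checkpoint("results/raw.pkl", lambda: [x * x for x in range(10)])
    total = checkpoint("results/total.pkl", lambda: sum(raw))
    print(total)
```

Deleting one of the files forces just that step to be recomputed on the next run.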
Best wishes
Julian
On 17/04/14 10:45, Christoph Groth wrote:
> Dear all,
>
> I’m considering the use of org-babel for “reproducible science”. Is
> anyone successfully using org-babel for research that involves
> calculations that typically take a few minutes up to several hours on a
> single CPU (i.e. up to a few minutes on a cluster)? It seems to me that
> currently org-babel is not well suited for such an usage profile, but
> I’d love to be taught otherwise.
>
> Many thanks,
> Christoph
Hi Julian,
> Generally, org-babel is suited for long running computations. Its
> caching can help you avoid rerunning code chunks. And long runtime does
> not conflict with the idea of reproducible research, it just may be not
> very comfortable for the user.
I agree of course that it’s generally a good idea to structure the
analysis in small logical steps and to save intermediate results. Let
me narrow down my question then: does org-babel support working with
tasks that take several minutes to execute? By this I mean (most
important first):
- Not freezing the editor during the execution of a task
- Being able to execute multiple independent tasks in parallel
- Being able to interrupt a running task
- Being able to inspect the incomplete output of a running task
I’d love to hear about any frameworks or workflows that fulfill these
requirements.
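One way to get all four properties outside the editor is to run each task as a separate operating-system process. The sketch below is a minimal illustration using Python's standard `subprocess` module; the `TaskRunner` class and the log file names are hypothetical, not an existing org-babel feature.

```python
import subprocess
import sys
import time


class TaskRunner:
    """Run independent commands as background processes, one log file each.

    start() returns at once (the caller is never blocked), several tasks can
    run in parallel, any task can be interrupted, and its partial output can
    be read from the log while it is still running.
    """

    def __init__(self):
        self.tasks = {}  # name -> (subprocess.Popen, log path)

    def start(self, name, cmd, log_path):
        with open(log_path, "w") as log:
            # The child gets its own copy of the file descriptor, so closing
            # ours when the with-block ends does not stop its output.
            proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        self.tasks[name] = (proc, log_path)

    def status(self, name):
        """Return "running", or the exit code once the process has finished."""
        code = self.tasks[name][0].poll()
        return "running" if code is None else code

    def interrupt(self, name):
        proc = self.tasks[name][0]
        proc.terminate()  # SIGTERM on POSIX
        proc.wait()

    def output_so_far(self, name):
        with open(self.tasks[name][1]) as f:
            return f.read()


if __name__ == "__main__":
    # Stand-in for a long computation: prints a line every half second.
    slow = "import time\nfor i in range(4):\n    print('step', i, flush=True)\n    time.sleep(0.5)"
    runner = TaskRunner()
    runner.start("a", [sys.executable, "-c", slow], "a.log")
    runner.start("b", [sys.executable, "-c", slow], "b.log")
    time.sleep(1.2)
    print(runner.output_so_far("a"))  # partial output of a running task
    runner.interrupt("b")
    print("b:", runner.status("b"))
    runner.tasks["a"][0].wait()
    print("a:", runner.status("a"))
```

The child processes must flush their output (here `flush=True`) for partial results to appear in the log promptly.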
Cheers
Christoph
PS.
I’m using ipython notebooks but I’m not happy with them because of the
freezing problem and the complete lack of isolation of tasks within a
single notebook (they live in a common mutable namespace). I think a
useful framework must be “functional” at the highest level for caching
and dependencies to be useful.
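To illustrate what "functional" caching could mean: if each task is a pure function of its arguments, its result can be stored under a hash of those arguments and reused safely. A minimal sketch, assuming the arguments pickle deterministically (numbers, strings, tuples); `disk_cached` is a made-up name, and this sketch does not notice when the function's code changes.

```python
import functools
import hashlib
import os
import pickle


def disk_cached(cache_dir):
    """Decorator: cache a pure function's results on disk, keyed by its arguments.

    This is only safe if the function is "functional": its result depends on
    nothing but its arguments (no shared mutable namespace).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key on the function's qualified name plus a pickle of the inputs.
            payload = pickle.dumps(
                (func.__module__, func.__qualname__, args, sorted(kwargs.items()))
            )
            key = hashlib.sha256(payload).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            os.makedirs(cache_dir, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

If one cached step's result is passed as an argument to another, changing an upstream input also changes the downstream key, so simple dependencies are tracked through the arguments, as long as those results pickle deterministically.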
Christoph Groth <christoph@grothesque.org> writes:
> I’d love to hear about any frameworks or workflows that fulfill these
> requirements.
here is a link about that topic:
,-----------------------------------------------
| http://draketo.de/english/emacs/parallel-babel
`-----------------------------------------------
--
cheers,
Thorsten
Thorsten Jolitz <tjolitz@gmail.com> writes:
>> I’d love to hear about any frameworks or workflows that fulfill these
>> requirements.
>
> here is a link about that topic:
>
> ,-----------------------------------------------
> | http://draketo.de/english/emacs/parallel-babel
> `-----------------------------------------------
Thanks. I had found this one myself, but this way of working does not
seem to allow capturing the results of the calculation in orgmode.
So probably a better way is to have some caching machinery in the
background. I see that Babel has built-in caching, but there seems to be
no way to handle dependencies.
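Dependency tracking of the kind `make` provides can be layered on top of file-based results. The following sketch is an assumption, not anything Babel offers: it records a content hash of each input file next to the output and re-runs a step only when one of those inputs changes. `build` and the `.deps.json` stamp file are hypothetical names.

```python
import hashlib
import json
import os


def file_digest(path):
    """SHA-256 of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build(target, deps, recipe):
    """Run recipe(target, deps) unless target exists and no dependency changed.

    The digests of the dependencies used for the last build are stored in
    `<target>.deps.json`; any difference triggers a rebuild.
    Returns True if the recipe ran.
    """
    stamp = target + ".deps.json"
    current = {d: file_digest(d) for d in deps}
    if os.path.exists(target) and os.path.exists(stamp):
        with open(stamp) as f:
            if json.load(f) == current:
                return False
    recipe(target, deps)
    with open(stamp, "w") as f:
        json.dump(current, f)
    return True
```

Chaining calls, with one step's target as the next step's dependency, propagates a change in an early input down the whole pipeline.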
Christoph
Hey Christoph,
Not sure if this would help you, but I've been playing with the idea of
extracting some of the functionality that Org babel provides and created
a small tool for doing reproducible runs using Org mode syntax:
https://gist.github.com/wallyqs/10989253
It is based on the Org Ruby parser implementation, so not all the latest
syntax and features are covered...
I named it `org-converge` in the sense that a reproducible run should
converge into the same result, and also because in the long run I hope
to achieve functionality similar to (or at least wrap around) what is
currently possible with tools like Chef, Ansible, etc.
I've found it useful for doing reproducible runs along with other tools
like Chef and Capistrano, and also for some development work.
If you have some ideas or feature requests, feedback is welcome :)
The repository is here: https://github.com/wallyqs/org-converge
Hope this helps,
- Wally
On Thu, Apr 17, 2014 at 9:45 PM, Christoph Groth <christoph@grothesque.org> wrote:
> Thorsten Jolitz <tjolitz@gmail.com> writes:
>
> >> I’d love to hear about any frameworks or workflows that fulfill these
> >> requirements.
> >
> > here is a link about that topic:
> >
> > ,-----------------------------------------------
> > | http://draketo.de/english/emacs/parallel-babel
> > `-----------------------------------------------
>
> Thanks. I had found this one myself - this way of working does not seem
> to allow to capture the results of the calculation in orgmode.
>
> So probably a better way is to have some caching machinery in the
> background. I see that babel has built-in caching but there seems to be
> no way to support dependencies.
>
> Christoph
Christoph Groth <christoph@grothesque.org> writes:
> Thorsten Jolitz <tjolitz@gmail.com> writes:
>
>>> I’d love to hear about any frameworks or workflows that fulfill these
>>> requirements.
>>
>> here is a link about that topic:
>>
>> ,-----------------------------------------------
>> | http://draketo.de/english/emacs/parallel-babel
>> `-----------------------------------------------
>
> Thanks. I had found this one myself - this way of working does not seem
> to allow to capture the results of the calculation in orgmode.
>
> So probably a better way is to have some caching machinery in the
> background. I see that babel has built-in caching but there seems to be
> no way to support dependencies.
>
> Christoph
Hi Christoph,
Currently Babel does not have support for long-running computations,
however this feature has been discussed and there are previous
messages on the mailing list and even some comments in the code
outlining a potential implementation strategy. So this may change
if/when someone finds the time and drive to tackle the implementation.
Best,
--
Eric Schulte
https://cs.unm.edu/~eschulte
PGP: 0x614CA05D
On Thursday, 17 Apr 2014 at 11:52, Christoph Groth wrote:
[...]
> - Not freezing the editor during the execution of a task
This is a problem, but I solve it by running multiple instances of
Emacs. The difficulty arises if you wish to edit the same file in more
than one Emacs... but if you can avoid this, multiple instances work.
> - Being able to execute multiple independent tasks in parallel
Ditto.
> - Being able to interrupt a running task
Trickier, but easy enough via the shell, of course (kill xxx).
> - Being able to inspect the incomplete output of a running task
Tricky as well, unless the task sends its output to a file or even
"tee"s the output so that you can "tail -f" that file.
--
: Eric S Fraga (0xFFFCF67D), Emacs 24.4.50.2, Org release_8.2.5h-898-g11f8ef
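The "write to a file and `tail -f` it" approach can also be scripted. A minimal Python sketch, assuming the task's output is redirected to a log file; `follow` is a hypothetical helper that yields complete lines as they appear and stops when the process exits. Interrupting the task corresponds to `proc.terminate()` (SIGTERM on POSIX), the programmatic equivalent of `kill`.

```python
import subprocess
import sys
import time


def follow(log_path, proc, poll_interval=0.2):
    """Yield complete lines appended to log_path while proc runs, like `tail -f`.

    Stops after proc has exited and the rest of the file has been read.
    """
    with open(log_path) as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if line.endswith("\n"):
                yield line[:-1]
                continue
            if proc.poll() is None:
                # No complete line yet: rewind over any partial line and wait.
                f.seek(pos)
                time.sleep(poll_interval)
                continue
            # The process has exited: emit whatever is left and stop.
            yield from (line + f.read()).splitlines()
            return


if __name__ == "__main__":
    job = "import time\nfor i in range(3):\n    print('iteration', i, flush=True)\n    time.sleep(0.5)"
    # Send the task's output to a file instead of the terminal.
    with open("job.log", "w") as log:
        proc = subprocess.Popen([sys.executable, "-c", job], stdout=log, stderr=subprocess.STDOUT)
    for line in follow("job.log", proc):
        print(line)
```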
Eric Schulte <schulte.eric@gmail.com> writes:
> Currently Babel does not have support for long-running computations,
> however this feature has been discussed and there are previous
> messages on the mailing list and even some comments in the code
> outlining a potential implementation strategy. So this may change
> if/when someone finds the time and drive to tackle the implementation.
Hmm, I searched the list archives before asking of course, but I didn’t
find any previous discussion. Could you please provide some references?
Thanks,
Christoph
Christoph Groth <christoph@grothesque.org> writes:
> Eric Schulte <schulte.eric@gmail.com> writes:
>
>> Currently Babel does not have support for long-running computations,
>> however this feature has been discussed and there are previous
>> messages on the mailing list and even some comments in the code
>> outlining a potential implementation strategy. So this may change
>> if/when someone finds the time and drive to tackle the implementation.
>
> Hmm, I searched the list archives before asking of course, but I didn’t
> find any previous discussion. Could you please provide some references?
>
One example:
http://thread.gmane.org/gmane.emacs.orgmode/52852
--
Nick
We routinely do this, in the following way. We run jobs that may take up
to a week to finish, and they are usually run on a cluster. Our setup
relies on the following behavior for a script.
1. You can run the script anytime you want, and it can tell the state of
the calculation by some means. If the script has never been run before,
it submits the job to a queue and exits. If the job is still in the
queue, it exits, and if the job is done, it gives you the result.
We manage this in Python using exceptions to control the workflow. You
can even have the script output an Org task marked TODO so that you get
an item on your agenda reminding you to check later whether it is done.
Essentially, the way it works is this: when you run a script, some code
checks the state, which in our case means looking for certain output
files and flag files. If the output files do not exist, we submit the
job to a queue and create a jobid file containing the output of the
queue submission command. Then the script raises a JobSubmitted
exception, which we can handle if we want to.
Later, I run the code block again. A jobid file is found, so the script
gets the jobid and checks whether the job is still in the queue. If so,
a JobQueued exception is raised, which we can handle if we want to. If
not, we delete the jobid file and proceed to analyze the output files.
All of that is done behind the scenes.
You can see how this is done at
https://github.com/jkitchin/jasp/blob/master/jasp/jasp_extensions.py#L416
This is a highly customized approach for a very specific code, but I
think you could adapt the ideas to other codes and computing
environments.
John
-----------------------------------
John Kitchin
Associate Professor
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
http://kitchingroup.cheme.cmu.edu
On Thu, Apr 17, 2014 at 4:45 AM, Christoph Groth <christoph@grothesque.org> wrote:
> Dear all,
>
> I’m considering the use of org-babel for “reproducible science”. Is
> anyone successfully using org-babel for research that involves
> calculations that typically take a few minutes up to several hours on a
> single CPU (i.e. up to a few minutes on a cluster)? It seems to me that
> currently org-babel is not well suited for such an usage profile, but
> I’d love to be taught otherwise.
>
> Many thanks,
> Christoph
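A minimal Python sketch of the submit/check/analyze pattern John describes. It is not the jasp code linked above; the exception names follow his description, while `submit`, `in_queue`, `analyze`, the `jobid` file name and the `OUTPUT` file name are assumptions supplied by the caller.

```python
import os


class JobSubmitted(Exception):
    """Raised right after a new job has been handed to the queue."""


class JobQueued(Exception):
    """Raised when a previously submitted job is still in the queue."""


def run_or_fetch(workdir, submit, in_queue, analyze, output_name="OUTPUT"):
    """Return the analyzed result, or raise if the calculation is not finished.

    Hypothetical callables supplied by the caller:
      submit(workdir) -> str    submit the job, return the queue's job id
      in_queue(jobid) -> bool   True while the queue still lists the job
      analyze(workdir)          read the finished output files
    `output_name` is a placeholder for whatever file the real code writes.
    All state lives in files, so the block can be re-run at any time and
    always returns (or raises) immediately.
    """
    jobid_file = os.path.join(workdir, "jobid")
    if os.path.exists(jobid_file):
        with open(jobid_file) as f:
            jobid = f.read().strip()
        if in_queue(jobid):
            raise JobQueued(jobid)
        os.remove(jobid_file)  # the job has left the queue
        return analyze(workdir)
    if not os.path.exists(os.path.join(workdir, output_name)):
        jobid = submit(workdir)
        with open(jobid_file, "w") as f:
            f.write(jobid)
        raise JobSubmitted(jobid)
    return analyze(workdir)


def org_report(workdir, submit, in_queue, analyze):
    """Turn the outcome into Org text, e.g. for a code block with `:results raw`."""
    try:
        return str(run_or_fetch(workdir, submit, in_queue, analyze))
    except (JobSubmitted, JobQueued) as exc:
        return f"* TODO check job {exc.args[0]} in {workdir}"
```

Printed from a Babel block with `:results raw`, a pending job could then appear as a TODO heading that the agenda picks up.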
Thank you, John, for your detailed reply.
> we routinely do this, in the following way. We run jobs that may take
> up to a week to finish, and they are usually run on a cluster. Our
> setup relies on the following behavior for a script.
>
> 1. you can run the script anytime you want, and it can tell the state
> of the calculation by some means. If the script has never been run
> before, it submits the job to a queue and exits. If the job is still
> in the queue, it exits, and if the job is done, it gives you the
> result.
Returning immediately with whatever state the long-running computation
is currently in does indeed seem to be a good solution. I think I will
set up something similar. Would you share your experience on the
following issues?
- How do you interface with such jobs from orgmode? With org-babel, do
you execute Python code, or do you run shell commands?
- Do you run your Emacs on the master node of the cluster? Or does your
setup involve running emacs on the machine you are working on and
talking to the cluster over the network?
Cheers,
Christoph
On Sat, Apr 19, 2014 at 5:49 PM, Christoph Groth <christoph@grothesque.org> wrote:
> The following message is a courtesy copy of an article
> that has been posted to gmane.emacs.orgmode as well.
>
> Thank you, John, for your detailed reply.
>
> > we routinely do this, in the following way. We run jobs that may take
> > up to a week to finish, and they are usually run on a cluster. Our
> > setup relies on the following behavior for a script.
> >
> > 1. you can run the script anytime you want, and it can tell the state
> > of the calculation by some means. If the script has never been run
> > before, it submits the job to a queue and exits. If the job is still
> > in the queue, it exits, and if the job is done, it gives you the
> > result.
>
> Returning immediately with whatever state the long-running computation
> is in currently seems indeed to be a good solution. I think I will
> setup something similar. Would you share your experience on the
> following issues?
>
> - How do you interface such jobs from orgmode? With org-babel, do
> you execute Python code, or do you run shell commands?
>
We just have code blocks in org-mode. They are usually Python blocks,
but we can also do shell, emacs-lisp, etc. Anything that can run a
system command and get its output will do.
>
> - Do you run your Emacs on the master node of the cluster? Or does your
> setup involve running emacs on the machine you are working on and
> talking to the cluster over the network?
>
Currently, we run Emacs on the master node. Once upon a time I had a
sophisticated ssh setup that allowed me to do this on a local machine:
rsync the necessary files to the cluster, ssh some commands to run the
jobs, and then, when they were done, rsync the files back. It was pretty
sweet, but I have stopped using and maintaining it.
>
> Cheers,
> Christoph
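The local-machine variant John describes (rsync the inputs to the cluster, run commands over ssh, rsync the results back once the jobs are done) could be sketched as three small helpers. Host names, directories and the remote command are placeholders, and the `run` parameter exists only so the command lines can be inspected without a cluster.

```python
import shlex
import subprocess


def _remote(host, remote_dir):
    # rsync's "host:path/" form; the trailing slash syncs directory contents.
    return f"{host}:{remote_dir.rstrip('/')}/"


def push(local_dir, host, remote_dir, run=subprocess.run):
    """Copy the local input files to the cluster."""
    cmd = ["rsync", "-az", local_dir.rstrip("/") + "/", _remote(host, remote_dir)]
    run(cmd, check=True)
    return cmd


def run_remote(host, remote_dir, command, run=subprocess.run):
    """Run `command` (e.g. a job submission) in remote_dir on the cluster."""
    cmd = ["ssh", host, f"cd {shlex.quote(remote_dir)} && {command}"]
    run(cmd, check=True)
    return cmd


def pull(local_dir, host, remote_dir, run=subprocess.run):
    """Copy the results back once the jobs are done."""
    cmd = ["rsync", "-az", _remote(host, remote_dir), local_dir.rstrip("/") + "/"]
    run(cmd, check=True)
    return cmd
```

Keeping `pull` separate from `push` and `run_remote` matches the asynchronous workflow: submit now, fetch results on a later run.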
You are describing what I see as a need to run processes
asynchronously. We use the Torque queue system for this. It takes care
of scheduling and running jobs on a cluster.
On Thu, Apr 17, 2014 at 6:52 AM, Christoph Groth <christoph@grothesque.org> wrote:
> Hi Julian,
>
> > Generally, org-babel is suited for long running computations. Its
> > caching can help you avoid rerunning code chunks. And long runtime does
> > not conflict with the idea of reproducible research, it just may be not
> > very comfortable for the user.
>
> I agree of course that it’s generally a good idea to structure the
> analysis in small logical steps and to save intermediate results. Let
> me narrow down my question then: does org-babel support working with
> tasks that take several minutes to execute? By this I mean (most
> important first):
>
> - Not freezing the editor during the execution of a task
>
Your code block should exit if a job is submitted, and record the job id
so that you can check it later. Alternatively, you could do this in
IPython notebooks if you have Python functions that return job ids.
> - Being able to execute multiple independent tasks in parallel
>
No problem. This is what queue systems were designed for.
> - Being able to interrupt a running task
>
You do this with the queue commands, e.g. qdel jobid
> - Being able to inspect the incomplete output of a running task
>
This is just a matter of checking the output files in the running jobs'
directories.
>
> I’d love to hear about any frameworks or workflows that fulfill these
> requirements.
>
> Cheers
>
> Christoph
>
>
> PS.
>
> I’m using ipython notebooks but I’m not happy with them because of the
> freezing problem and the complete lack of isolation of tasks within a
> single notebook (they live in a common mutable namespace). I think a
> useful framework must be “functional” at the highest level for caching
> and dependencies to be useful.
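Thin Python wrappers around the Torque commands mentioned above could look like this. A sketch only: it assumes `qsub` prints the job id on standard output and that `qstat <jobid>` fails for jobs the server no longer knows; both are worth checking against your site's configuration. The function names are hypothetical.

```python
import subprocess


def qsub(script, run=subprocess.run):
    """Submit a job script and return the job id that qsub prints on stdout."""
    result = run(["qsub", script], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def submit_and_record(script, jobid_file, run=subprocess.run):
    """Submit, store the job id for later checks, and return immediately."""
    jobid = qsub(script, run=run)
    with open(jobid_file, "w") as f:
        f.write(jobid + "\n")
    return jobid


def still_known(jobid, run=subprocess.run):
    """Assumption: `qstat <jobid>` exits non-zero once the server no longer
    knows the job. Torque may keep completed jobs listed for a while, so a
    real check may also need to look at the job state."""
    result = run(["qstat", jobid], capture_output=True, text=True)
    return result.returncode == 0


def cancel(jobid, run=subprocess.run):
    """Interrupt a queued or running job, i.e. `qdel jobid`."""
    run(["qdel", jobid], check=True)
```

The injectable `run` argument keeps the wrappers testable on a machine without a queue system.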