* babel and long-running computations
From: Christoph Groth @ 2014-04-17 8:45 UTC
To: emacs-orgmode

Dear all,

I’m considering the use of org-babel for “reproducible science”. Is anyone successfully using org-babel for research that involves calculations that typically take a few minutes up to several hours on a single CPU (i.e. up to a few minutes on a cluster)? It seems to me that org-babel is currently not well suited for such a usage profile, but I’d love to be taught otherwise.

Many thanks,
Christoph
* Re: babel and long-running computations
From: Julian Gehring @ 2014-04-17 9:47 UTC
To: emacs-orgmode

Hi Christoph,

Generally, org-babel is suited for long-running computations. Its caching can help you avoid rerunning code chunks. And a long runtime does not conflict with the idea of reproducible research; it just may not be very comfortable for the user.

In my experience, you can best avoid these problems by structuring your code and analysis in small, logical steps and saving intermediate results, so that you can restart in the middle of your analysis. Otherwise, all other frameworks for reproducible research I know have the same "limitations".

Best wishes
Julian
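A minimal sketch of the "small steps with saved intermediate results" idea from the message above, written in Python. It is not part of org-babel and not something Julian posted: the helper name `cached_step`, the `cache` directory, and the pickle-based storage are all assumptions made for illustration.

```python
import hashlib
import pickle
from pathlib import Path


def cached_step(name, func, *args, cache_dir="cache"):
    """Run func(*args) once and reuse the pickled result on later calls.

    Hypothetical helper: the name, signature, and file layout are
    illustrative assumptions, not an existing API.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The key covers the step name and its arguments, so a changed input
    # leads to a recomputation instead of a stale result.
    key = hashlib.sha256(pickle.dumps((name, args))).hexdigest()[:16]
    path = cache_dir / f"{name}-{key}.pkl"
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = func(*args)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

Each step of an analysis would then be wrapped in one call, for example `cached_step("fit", fit_model, data)`, where `fit_model` and `data` stand for whatever the analysis defines. Passing the result of one cached step as an argument to the next gives a crude form of dependency tracking, since the arguments are part of the cache key; changes to the code of `func` itself are not detected.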
* Re: babel and long-running computations
From: Christoph Groth @ 2014-04-17 10:52 UTC
To: emacs-orgmode

Hi Julian,

> Generally, org-babel is suited for long running computations. Its
> caching can help you avoid rerunning code chunks. And long runtime does
> not conflict with the idea of reproducible research, it just may be not
> very comfortable for the user.

I agree of course that it’s generally a good idea to structure the analysis in small logical steps and to save intermediate results. Let me narrow down my question then: does org-babel support working with tasks that take several minutes to execute? By this I mean (most important first):

- Not freezing the editor during the execution of a task
- Being able to execute multiple independent tasks in parallel
- Being able to interrupt a running task
- Being able to inspect the incomplete output of a running task

I’d love to hear about any frameworks or workflows that fulfill these requirements.

Cheers
Christoph

PS. I’m using ipython notebooks but I’m not happy with them because of the freezing problem and the complete lack of isolation of tasks within a single notebook (they live in a common mutable namespace). I think a useful framework must be “functional” at the highest level for caching and dependencies to be useful.
* Re: babel and long-running computations
From: Thorsten Jolitz @ 2014-04-17 12:06 UTC
To: emacs-orgmode

Christoph Groth <christoph@grothesque.org> writes:

> I’d love to hear about any frameworks or workflows that fulfill these
> requirements.

here is a link about that topic:

,-----------------------------------------------
| http://draketo.de/english/emacs/parallel-babel
`-----------------------------------------------

--
cheers,
Thorsten
* Re: babel and long-running computations
From: Christoph Groth @ 2014-04-17 12:45 UTC
To: emacs-orgmode

Thorsten Jolitz <tjolitz@gmail.com> writes:

>> I’d love to hear about any frameworks or workflows that fulfill these
>> requirements.
>
> here is a link about that topic:
>
> ,-----------------------------------------------
> | http://draketo.de/english/emacs/parallel-babel
> `-----------------------------------------------

Thanks. I had found this one myself. This way of working does not seem to allow capturing the results of the calculation in orgmode.

So probably a better way is to have some caching machinery in the background. I see that babel has built-in caching but there seems to be no way to support dependencies.

Christoph
* Re: babel and long-running computations
From: Waldemar Quevedo @ 2014-04-17 14:58 UTC
To: Christoph Groth
Cc: emacs-orgmode

Hey Christoph,

Not sure if this would help you, but I've been playing with the idea of extracting some of the functionality that Org babel provides, and I created a small tool for doing reproducible runs using Org mode syntax: https://gist.github.com/wallyqs/10989253

It is based on the Org ruby parser implementation, so not all the latest syntax and features are covered...

I named it `org-converge` in the sense that a reproducible run should converge to the same result, and also because in the long run I hope to achieve functionality similar to (or at least wrap around) what is currently possible with tools like Chef, Ansible, etc.

I've found it useful for doing reproducible runs along with other tools like Chef and Capistrano, and also for some development work. If you have some ideas or feature requests, the feedback is welcome :)

The repository is here: https://github.com/wallyqs/org-converge

Hope this helps,

- Wally
* Re: babel and long-running computations
From: Eric Schulte @ 2014-04-17 15:29 UTC
To: Christoph Groth
Cc: emacs-orgmode

Christoph Groth <christoph@grothesque.org> writes:

> Thorsten Jolitz <tjolitz@gmail.com> writes:
>
>>> I’d love to hear about any frameworks or workflows that fulfill these
>>> requirements.
>>
>> here is a link about that topic:
>>
>> ,-----------------------------------------------
>> | http://draketo.de/english/emacs/parallel-babel
>> `-----------------------------------------------
>
> Thanks. I had found this one myself - this way of working does not seem
> to allow to capture the results of the calculation in orgmode.
>
> So probably a better way is to have some caching machinery in the
> background. I see that babel has built-in caching but there seems to be
> no way to support dependencies.
>
> Christoph

Hi Christoph,

Currently Babel does not have support for long-running computations. However, this feature has been discussed, and there are previous messages on the mailing list and even some comments in the code outlining a potential implementation strategy. So this may change if/when someone finds the time and drive to tackle the implementation.

Best,

--
Eric Schulte
https://cs.unm.edu/~eschulte
PGP: 0x614CA05D
* Re: babel and long-running computations
From: Christoph Groth @ 2014-04-17 16:29 UTC
To: emacs-orgmode

Eric Schulte <schulte.eric@gmail.com> writes:

> Currently Babel does not have support for long-running computations,
> however this feature has been discussed and there are previous
> messages on the mailing list and even some comments in the code
> outlining a potential implementation strategy. So this may change
> if/when someone finds the time and drive to tackle the implementation.

Hmm, I searched the list archives before asking of course, but I didn’t find any previous discussion. Could you please provide some references?

Thanks,
Christoph
* Re: babel and long-running computations
From: Nick Dokos @ 2014-04-17 16:44 UTC
To: emacs-orgmode

Christoph Groth <christoph@grothesque.org> writes:

> Eric Schulte <schulte.eric@gmail.com> writes:
>
>> Currently Babel does not have support for long-running computations,
>> however this feature has been discussed and there are previous
>> messages on the mailing list and even some comments in the code
>> outlining a potential implementation strategy. So this may change
>> if/when someone finds the time and drive to tackle the implementation.
>
> Hmm, I searched the list archives before asking of course, but I didn’t
> find any previous discussion. Could you please provide some references?

One example: http://thread.gmane.org/gmane.emacs.orgmode/52852

--
Nick
* Re: babel and long-running computations
From: Eric S Fraga @ 2014-04-17 15:33 UTC
To: Christoph Groth
Cc: emacs-orgmode

On Thursday, 17 Apr 2014 at 11:52, Christoph Groth wrote:

[...]

> - Not freezing the editor during the execution of a task

This is a problem but I solve this by running multiple instances of emacs. The difficulty arises if you wish to edit the same file in more than one emacs... but if you can avoid this, multiple instances work.

> - Being able to execute multiple independent tasks in parallel

Ditto.

> - Being able to interrupt a running task

Trickier but easy enough via the shell, of course (kill xxx).

> - Being able to inspect the incomplete output of a running task

Tricky as well unless the task sends its output to a file or even "tee"s the output so that you can "tail -f" that file.

--
: Eric S Fraga (0xFFFCF67D), Emacs 24.4.50.2, Org release_8.2.5h-898-g11f8ef
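The last two points above (interrupting a task, and sending its output to a file so that the incomplete output can be read) can be sketched in Python roughly as follows. This is an illustrative sketch only, not anything org-babel provides; the function names `start_task`, `peek`, and `stop_task` are made up for this example.

```python
import subprocess
from pathlib import Path


def start_task(command, log_path):
    """Start command in the background, sending stdout and stderr to log_path."""
    with open(log_path, "w") as log:
        # Popen returns immediately, so the caller is not blocked while the
        # task runs. The child keeps its own copy of the file handle, so the
        # parent can close its copy when the "with" block ends.
        return subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)


def peek(log_path, n=10):
    """Return the last n lines written so far, like a one-shot "tail"."""
    return Path(log_path).read_text().splitlines()[-n:]


def stop_task(proc, timeout=5.0):
    """Interrupt a running task, similar to "kill <pid>" from the shell."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The task ignored the polite request; force it to stop.
        proc.kill()
        proc.wait()
    return proc.returncode
```

Several tasks can be started one after another with `start_task`, each with its own log file, and they then run in parallel. Note that `peek` only sees what the task has already flushed to the file, so a task that buffers its output heavily will appear silent for a while.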
* Re: babel and long-running computations
From: John Kitchin @ 2014-04-20 13:29 UTC
To: Christoph Groth
Cc: emacs-orgmode@gnu.org

You are describing what I see as a need to run processes asynchronously. We use the Torque queue system for this. It takes care of scheduling and running jobs on a cluster.

On Thu, Apr 17, 2014 at 6:52 AM, Christoph Groth <christoph@grothesque.org> wrote:

> I agree of course that it’s generally a good idea to structure the
> analysis in small logical steps and to save intermediate results. Let
> me narrow down my question then: does org-babel support working with
> tasks that take several minutes to execute? By this I mean (most
> important first):
>
> - Not freezing the editor during the execution of a task

Your code block should exit once a job is submitted, and record the job ID so that you can check it later. Alternatively, you could do this in ipython notebooks if you have python functions that return job IDs.

> - Being able to execute multiple independent tasks in parallel

No problem. This is what queue systems were designed for.

> - Being able to interrupt a running task

You do this with the queue commands, e.g. qdel jobid

> - Being able to inspect the incomplete output of a running task

This is just checking the output files in the running jobs' directories.
* Re: babel and long-running computations
From: John Kitchin @ 2014-04-18 13:20 UTC
To: Christoph Groth
Cc: emacs-orgmode@gnu.org

We routinely do this, in the following way. We run jobs that may take up to a week to finish, and they are usually run on a cluster. Our setup relies on the following behavior for a script.

1. You can run the script anytime you want, and it can tell the state of the calculation by some means. If the script has never been run before, it submits the job to a queue and exits. If the job is still in the queue, it exits, and if the job is done, it gives you the result.

We manage this in python using exceptions to control the workflow. You can even have the script output an org-task marked TODO so you get an item on your agenda to check it later to see if it is done.

Essentially the way it works is this: when you run a script, there is some code that checks the state, which in our case means looking for certain output files and flag files. If the output files do not exist, we submit the job to a queue and create a jobid file containing the output from the queue submission command. Then the script raises a JobSubmitted exception, which we can handle if we want to.

Later, I run the code block again. A jobid file is found, so the script gets the jobid and checks if the job is still in the queue. If so, a JobQueued exception is raised, which we can handle if we want to. If not, we delete the jobid file and proceed to analyze the output files.

All of that is done behind the scenes. You can see how this is done at https://github.com/jkitchin/jasp/blob/master/jasp/jasp_extensions.py#L416

This is a highly customized approach to a very specific code, but you could adapt the ideas to other codes and computing environments, I think.

John

-----------------------------------
John Kitchin
Associate Professor
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
http://kitchingroup.cheme.cmu.edu
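The submit-or-check workflow described in the message above can be sketched in Python as follows. This is a simplified illustration written for this thread, not the jasp code linked there. The exception names `JobSubmitted` and `JobQueued` come from the message; the function `get_result`, the file names `jobid` and `result.out`, and the two callables `submit` and `in_queue` are assumptions.

```python
from pathlib import Path


class JobSubmitted(Exception):
    """Raised right after a new job has been handed to the queue."""


class JobQueued(Exception):
    """Raised when a previously submitted job is still in the queue."""


def get_result(workdir, submit, in_queue,
               output_name="result.out", jobid_name="jobid"):
    """Return the output if it exists; otherwise submit or report the job.

    submit(workdir) must return a job id string and in_queue(jobid) must
    return True while the job is queued or running. Both are hypothetical
    hooks that wrap whatever queue system is in use.
    """
    workdir = Path(workdir)
    output = workdir / output_name
    jobid_file = workdir / jobid_name

    if jobid_file.exists():
        jobid = jobid_file.read_text().strip()
        if in_queue(jobid):
            raise JobQueued(jobid)
        # The job has left the queue, so the bookkeeping file can go.
        jobid_file.unlink()

    if output.exists():
        return output.read_text()

    # No output and no pending job: submit and return control at once.
    # In this sketch a job that left the queue without writing output
    # also ends up here and is simply resubmitted.
    jobid = submit(workdir)
    jobid_file.write_text(jobid)
    raise JobSubmitted(jobid)
```

In a code block one would call `get_result` inside `try`/`except` and print a short status line for `JobSubmitted` and `JobQueued`, so that the block always returns quickly. With Torque, `submit` would presumably wrap `qsub` and `in_queue` would wrap `qstat`; those wrappers are left out here because their details depend on the local setup.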
* Re: babel and long-running computations
From: Christoph Groth @ 2014-04-19 21:49 UTC
To: emacs-orgmode
Cc: John Kitchin

Thank you, John, for your detailed reply.

> we routinely do this, in the following way. We run jobs that may take
> up to a week to finish, and they are usually run on a cluster. Our
> setup relies on the following behavior for a script.
>
> 1. you can run the script anytime you want, and it can tell the state
> of the calculation by some means. If the script has never been run
> before, it submits the job to a queue and exits. If the job is still
> in the queue, it exits, and if the job is done, it gives you the
> result.

Returning immediately with whatever state the long-running computation is currently in does indeed seem to be a good solution. I think I will set up something similar. Would you share your experience on the following issues?

- How do you interface such jobs from orgmode? With org-babel, do you execute Python code, or do you run shell commands?

- Do you run your Emacs on the master node of the cluster? Or does your setup involve running emacs on the machine you are working on and talking to the cluster over the network?

Cheers,
Christoph
* Re: babel and long-running computations
From: John Kitchin @ 2014-04-20 13:21 UTC
To: Christoph Groth, emacs-orgmode@gnu.org

On Sat, Apr 19, 2014 at 5:49 PM, Christoph Groth <christoph@grothesque.org> wrote:

> Returning immediately with whatever state the long-running computation
> is in currently seems indeed to be a good solution. I think I will
> setup something similar. Would you share your experience on the
> following issues?
>
> - How do you interface such jobs from orgmode? With org-babel, do
> you execute Python code, or do you run shell commands?

We just have code blocks in org-mode. They are usually python blocks, but we can also do shell, emacs-lisp, etc. Anything that can run a system command and get the output will do.

> - Do you run your Emacs on the master node of the cluster? Or does your
> setup involve running emacs on the machine you are working on and
> talking to the cluster over the network?

Currently, we run emacs on the master node. Once upon a time I had a sophisticated ssh setup that would allow me to do this on a local machine: rsync the necessary files to the cluster, ssh some commands to run the jobs, and then, when they were done, rsync the files back. It was pretty sweet, but I have stopped using and maintaining it.