From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Leha
Subject: Re: Emacs/ESS/org freezes/hangs on big data/RAM (~256GB) processes when run in org/babel
Date: Sat, 20 Jun 2015 22:20:14 +0100
List-Id: "General discussions about Org-mode."
To: emacs-orgmode@gnu.org

Rainer M Krug writes:

> Andreas Leha writes:
>
>> Hi Rainer,
>
> Hi Andreas,
>
>> Rainer M Krug writes:
>>> "Charles C. Berry" writes:
>>>
>>>> On Wed, 17 Jun 2015, William Denton wrote:
>>>>
>>>>> On 17 June 2015, Xebar Saram wrote:
>>>>>
>>>>>> I do a lot of modeling work that involves huge datasets and
>>>>>> process-intensive R computations (such as complex mixed models,
>>>>>> GAMMs, etc.).  In RStudio all works well, yet when I eval R code
>>>>>> blocks from org-mode it works fine for small, simple processes,
>>>>>> but 90% of the time when dealing with complex models and big
>>>>>> data (up to 256GB) it will just freeze Emacs/ESS.  Sometimes I
>>>>>> can C-c or C-g it, and other times I need to physically kill
>>>>>> Emacs.
>>>>>
>>>>> I've been having the same problem for a while, but wasn't able to
>>>>> isolate it any more than large data sets, lack of memory, and
>>>>> heavy CPU usage.  Sometimes everything hangs and I need to power
>>>>> cycle the computer. :(
>>>>
>>>> And you (both) have `ess-eval-visibly' set to nil, right?
>>>>
>>>> I do statistical genomics, which can be compute intensive.
>>>> Sometimes processes need to run for a while, and I get impatient
>>>> having to wait.
>>>>
>>>> I wrote (and use) ox-ravel[1] to speed up my write-run-revise
>>>> cycle in org-mode.
>>>>
>>>> Basically, ravel exports Org mode to a format that knitr (and the
>>>> like) can run, turning src blocks into `code chunks'.  That allows
>>>> me to set the cache=TRUE chunk option, etc.  I run knitr on the
>>>> exported document to initialize objects for long-running
>>>> computations or to produce a finished report.
>>>>
>>>> When I start a session, I run knitr in the R session; then all the
>>>> cached objects are loaded in and ready to use.
>>>>
>>>> If I write a src block I know will take a long time to export, I
>>>> export from org mode to update the knitr document and re-knit it
>>>> to refresh the cache.
>>>
>>> I have a similar workflow, only that I use a package-like approach,
>>> i.e. I tangle function definitions into a folder ./R, data into
>>> ./data (which makes it possible to share org-defined variables with
>>> R running outside org), and scripts, i.e. the things which do an
>>> analysis, import data, ... i.e. which might take long, into a
>>> folder ./scripts/.  I then add the usual R package infrastructure
>>> files (DESCRIPTION, NAMESPACE, ...).
>>>
>>> Then I have one file tangled into ./scripts/init.R:
>>>
>>> #+begin_src R :tangle ./scripts/init.R
>>> library(devtools)
>>> load_all()
>>> #+end_src
>>>
>>> and one for the analysis:
>>>
>>> #+begin_src R :tangle ./scripts/myAnalysis.R
>>> ## Do some really time intensive and horribly complicated and
>>> ## important stuff here
>>> save(
>>>   fileNames,
>>>   bw,
>>>   cols,
>>>   labels,
>>>   fit,
>>>   dens,
>>>   gof,
>>>   gofPerProf,
>>>   file = "./cache/results.myAnalysis.rds"
>>> )
>>> #+end_src
>>>
>>> Now after tangling, I have my code easily available in a new R
>>> session:
>>>
>>> 1) start R in the directory in which the DESCRIPTION file is,
>>> 2) run source("./scripts/init.R")
>>>
>>> and I have all my functions and data available.
>>>
>>> To run an analysis, I do
>>>
>>> 3) source("./scripts/myAnalysis.R")
>>>
>>> and the results are saved to the cache file.
>>>
>>> To analyse the data further, I can then simply use
>>>
>>> #+begin_src R :tangle ./scripts/myAnalysis.R
>>> fitSing <- attach("./cache/results.myAnalysis.rds")
>>> #+end_src
>>>
>>> so the results won't interfere with my environment in R.
>>>
>>> I can finally remove the attached environment by doing
>>>
>>> #+begin_src R :tangle ./scripts/myAnalysis.R
>>> detach(
>>>   name = attr(fitSing, "name"),
>>>   character.only = TRUE
>>> )
>>> #+end_src
>>>
>>> Through this caching and compartmentalizing, I can easily do some
>>> things outside org and some inside, and easily combine all the
>>> data.
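For reference, the save()/attach()/detach() caching pattern above can be sketched end to end in a single, self-contained R session. All file and object names below are only illustrative (a toy lm() fit stands in for the long-running analysis); they are not the ones from Rainer's project:

```r
## Minimal sketch of the save/attach/detach caching pattern.
## Illustrative names only; the "analysis" is a stand-in toy model.

dir.create("cache", showWarnings = FALSE)

## Stand-in for the long-running analysis: fit a model and cache it.
fit <- lm(mpg ~ wt, data = mtcars)
gof <- summary(fit)$r.squared
save(fit, gof, file = "cache/results.demo.RData")

## In a later session: attach() the cached file instead of load()ing
## it, so the cached objects sit on the search path rather than
## landing in (and possibly clobbering) the global environment.
rm(fit, gof)
cached <- attach("cache/results.demo.RData")
r2 <- gof                 # resolved via the attached environment

## Remove the attached environment by the name attach() recorded,
## exactly as in the detach() block above.
detach(name = attr(cached, "name"), character.only = TRUE)
```

Note that attach() only works on files written by save(); objects written with saveRDS() would instead be read back with readRDS() into a variable of your choice.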
>>>
>>> Further advantage: I can actually create the package and send it to
>>> somebody for testing and review, and it should run out of the box,
>>> as all dependencies are defined in the DESCRIPTION file.
>>>
>>> I am using this approach at the moment for a project which will
>>> also result in a paper.  By executing all the scripts, one will be
>>> able to import the raw data, do the analysis, and create all graphs
>>> used in the paper.
>>>
>>> Hope this gives you another idea of how one can handle long-running
>>> analyses in R in org,
>>>
>>> Cheers,
>>>
>>> Rainer
>>
>> That is a cool workflow.  I especially like the fact that you end up
>> with an R package.
>
> Thanks.  Yes - the idea of having a package at the end was one main
> reason why I am using this approach.
>
>> So, I'll try again: is there any chance to see a working example of
>> this?  I'd love to see that.
>
> Let's say I am working on it.  I am working on a project which uses
> this workflow, and when it is finished, the package will be available
> as an electronic appendix to the paper.
>
> But I will see if I can condense an example and blog it - I'll let
> you know when it is done.

Thanks!  Either way, I am really looking forward to this.

Regards,
Andreas