From: Andreas Leha
Subject: Re: Emacs/ESS/org freezes/hangs on big data/ RAM(~256GB) processes when run in org/babel
Date: Fri, 19 Jun 2015 23:31:06 +0100
To: emacs-orgmode@gnu.org

Hi Rainer,

Rainer M Krug writes:

> "Charles C. Berry" writes:
>
>> On Wed, 17 Jun 2015, William Denton wrote:
>>
>>> On 17 June 2015, Xebar Saram wrote:
>>>
>>>> I do a lot of modeling work that involves using huge datasets and
>>>> running process-intensive R processes (such as complex mixed
>>>> models, GAMMs, etc.).
>>>> In RStudio all works well, yet when I use the org-mode eval on R
>>>> code blocks it works well for small simple processes, but 90% of
>>>> the time when dealing with complex models and big data (up to
>>>> 256GB) it will just freeze Emacs/ESS. Sometimes I can C-c or C-g
>>>> it, and other times I need to physically kill Emacs.
>>>
>>> I've been having the same problem for a while, but wasn't able to
>>> isolate it any more than large data sets, lack of memory, and heavy
>>> CPU usage. Sometimes everything hangs and I need to power cycle the
>>> computer. :(
>>>
>>
>> And you (both) have `ess-eval-visibly' set to nil, right?
>>
>> I do statistical genomics, which can be compute intensive. Sometimes
>> processes need to run for a while, and I get impatient having to
>> wait.
>>
>> I wrote (and use) ox-ravel[1] to speed up my write-run-revise cycle
>> in org-mode.
>>
>> Basically, ravel will export Org mode to a format that knitr (and
>> the like) can run - turning src blocks into `code chunks'. That
>> allows me to set the cache=TRUE chunk option, etc. I run knitr on
>> the exported document to initialize objects for long running
>> computations or to produce a finished report.
>>
>> When I start a session, I run knitr in the R session, then all the
>> cached objects are loaded in and ready to use.
>>
>> If I write a src block I know will take a long time to export, I
>> export from org mode to update the knitr document and re-knit it to
>> refresh the cache.
>
> I have a similar workflow, only that I use a package-like approach,
> i.e. I tangle function definitions into a folder ./R, data into
> ./data (which makes it possible to share org-defined variables with
> R running outside org), and scripts, i.e. the things which do an
> analysis, import data, ... i.e. which might take long, into a folder
> ./scripts/. I then add the usual R package infrastructure files
> (DESCRIPTION, NAMESPACE, ...).
> Then I have one file tangled into ./scripts/init.R:
>
> #+begin_src R :tangle ./scripts/init.R
> library(devtools)
> load_all()
> #+end_src
>
> and one for the analysis:
>
> #+begin_src R :tangle ./scripts/myAnalysis.R
> ## Do some really time intensive and horribly complicated and
> ## important stuff here
> save(
>   fileNames,
>   bw,
>   cols,
>   labels,
>   fit,
>   dens,
>   gof,
>   gofPerProf,
>   file = "./cache/results.myAnalysis.rds"
> )
> #+end_src
>
> Now after tangling, I have my code easily available in a new R
> session:
>
> 1) start R in the directory in which the DESCRIPTION file is,
> 2) run source("./scripts/init.R")
>
> and I have all my functions and data available.
>
> To run an analysis, I do
>
> 3) source("./scripts/myAnalysis.R")
>
> and the results are saved in a file.
>
> To analyse the data further, I can then simply use
>
> #+begin_src R :tangle ./scripts/myAnalysis.R
> fitSing <- attach("./cache/results.myAnalysis.rds")
> #+end_src
>
> so they won't interfere with my environment in R.
>
> I can finally remove the attached environment by doing
>
> #+begin_src R :tangle ./scripts/myAnalysis.R
> detach(
>   name = attr(fitSing, "name"),
>   character.only = TRUE
> )
> #+end_src
>
> Through this caching and compartmentalising, I can easily do some
> things outside org and some inside, and easily combine all the data.
>
> Further advantage: I can actually create the package and send it to
> somebody for testing and review, and it should run out of the box,
> as all dependencies are declared in the DESCRIPTION file.
>
> I am using this approach at the moment for a project which will also
> result in a paper. By executing all the scripts, one will be able to
> import the raw data, do the analysis, and create all graphs used in
> the paper.
>
> Hope this gives you another idea how one can handle long running
> analyses in R in org.
>
> Cheers,
>
> Rainer

That is a cool workflow. I especially like the fact that you end up
with an R package.
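For anyone else hitting the freeze: the setting Chuck asks about above
is, if I understand it correctly, just this in the init file (untested
sketch on my side):

#+begin_src emacs-lisp
;; Ask ESS to send code to the R process invisibly, so Emacs does not
;; echo (and block on) every line of a long-running evaluation.
(setq ess-eval-visibly nil)
#+end_src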
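And just to check my understanding of your save/attach/detach round
trip, here is a minimal self-contained sketch (with made-up object
names and a temporary file in place of your ./cache/ path):

#+begin_src R
## Sketch of the cache round trip: save() some results, attach() the
## file as a separate environment, read from it, then detach().
fit <- lm(dist ~ speed, data = cars)
cache <- tempfile(fileext = ".RData")
save(fit, file = cache)

rm(fit)                       # simulate a fresh session
res <- attach(cache)          # objects live in their own environment
print(coef(fit))              # 'fit' is found via the search path
detach(name = attr(res, "name"), character.only = TRUE)
#+end_src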
So, I'll try my luck again: is there any chance to see a working
example of this? I'd love to see that.

Thanks,
Andreas