From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Leha
Subject: Re: Emacs/ESS/org freezes/hangs on big data/RAM (~256GB) processes when run in org/babel
Date: Sat, 20 Jun 2015 22:20:14 +0100
List-Id: "General discussions about Org-mode."
To: emacs-orgmode@gnu.org

Rainer M Krug writes:

> Andreas Leha writes:
>
>> Hi Rainer,
>
> Hi Andreas,
>
>> Rainer M Krug writes:
>>> "Charles C. Berry" writes:
>>>
>>>> On Wed, 17 Jun 2015, William Denton wrote:
>>>>
>>>>> On 17 June 2015, Xebar Saram wrote:
>>>>>
>>>>>> I do a lot of modeling work that involves huge datasets and
>>>>>> process-intensive R computations (such as complex mixed models,
>>>>>> GAMMs, etc.).  In RStudio all works well, yet when I eval R code
>>>>>> blocks from org-mode it works fine for small, simple processes,
>>>>>> but 90% of the time when dealing with complex models and big
>>>>>> data (up to 256GB) it will just freeze Emacs/ESS.  Sometimes I
>>>>>> can C-c or C-g it, and other times I need to physically kill
>>>>>> Emacs.
>>>>>
>>>>> I've been having the same problem for a while, but wasn't able to
>>>>> isolate it any more than large data sets, lack of memory, and
>>>>> heavy CPU usage.  Sometimes everything hangs and I need to power
>>>>> cycle the computer. :(
>>>>
>>>> And you (both) have `ess-eval-visibly' set to nil, right?
>>>>
>>>> I do statistical genomics, which can be compute intensive.
>>>> Sometimes processes need to run for a while, and I get impatient
>>>> having to wait.
>>>>
>>>> I wrote (and use) ox-ravel[1] to speed up my write-run-revise
>>>> cycle in org-mode.
>>>>
>>>> Basically, ravel exports Org mode to a format that knitr (and the
>>>> like) can run, turning src blocks into `code chunks'.  That allows
>>>> me to set the cache=TRUE chunk option, etc.  I run knitr on the
>>>> exported document to initialize objects for long-running
>>>> computations or to produce a finished report.
>>>>
>>>> When I start a session, I run knitr in the R session; then all the
>>>> cached objects are loaded in and ready to use.
>>>>
>>>> If I write a src block I know will take a long time to export, I
>>>> export from org mode to update the knitr document and re-knit it
>>>> to refresh the cache.
>>>
>>> I have a similar workflow, only that I use a package-like approach,
>>> i.e. I tangle function definitions into a folder ./R, data into
>>> ./data (which makes it possible to share org-defined variables with
>>> R running outside org), and scripts, i.e. the things which do an
>>> analysis, import data, ... i.e. which might take long, into a
>>> folder ./scripts/.  I then add the usual R package infrastructure
>>> files (DESCRIPTION, NAMESPACE, ...).
>>>
>>> Then I have one file tangled into ./scripts/init.R:
>>>
>>> #+begin_src R :tangle ./scripts/init.R
>>> library(devtools)
>>> load_all()
>>> #+end_src
>>>
>>> and one for the analysis:
>>>
>>> #+begin_src R :tangle ./scripts/myAnalysis.R
>>> ## Do some really time intensive and horribly complicated and
>>> ## important stuff here
>>> save(
>>>   fileNames,
>>>   bw,
>>>   cols,
>>>   labels,
>>>   fit,
>>>   dens,
>>>   gof,
>>>   gofPerProf,
>>>   file = "./cache/results.myAnalysis.rds"
>>> )
>>> #+end_src
>>>
>>> Now after tangling, I have my code easily available in a new R
>>> session:
>>>
>>> 1) start R in the directory in which the DESCRIPTION file is,
>>> 2) run source("./scripts/init.R")
>>>
>>> and I have all my functions and data available.
>>>
>>> To run an analysis, I do
>>>
>>> 3) source("./scripts/myAnalysis.R")
>>>
>>> and the results are saved to the cache file.
>>>
>>> To analyse the data further, I can then simply use
>>>
>>> #+begin_src R :tangle ./scripts/myAnalysis.R
>>> fitSing <- attach("./cache/results.myAnalysis.rds")
>>> #+end_src
>>>
>>> so the results won't interfere with my environment in R.
>>>
>>> I can finally remove the attached environment by doing
>>>
>>> #+begin_src R :tangle ./scripts/myAnalysis.R
>>> detach(
>>>   name = attr(fitSing, "name"),
>>>   character.only = TRUE
>>> )
>>> #+end_src
>>>
>>> Through this caching and compartmentalizing, I can easily do some
>>> things outside org and some inside, and easily combine all the
>>> data.
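For reference, the save()/attach()/detach() caching pattern above can be sketched end to end in a single, self-contained R session. All file and object names below are only illustrative (a toy lm() fit stands in for the long-running analysis); they are not the ones from Rainer's project:

```r
## Minimal sketch of the save/attach/detach caching pattern.
## Illustrative names only; the "analysis" is a stand-in toy model.

dir.create("cache", showWarnings = FALSE)

## Stand-in for the long-running analysis: fit a model and cache it.
fit <- lm(mpg ~ wt, data = mtcars)
gof <- summary(fit)$r.squared
save(fit, gof, file = "cache/results.demo.RData")

## In a later session: attach() the cached file instead of load()ing
## it, so the cached objects sit on the search path rather than
## landing in (and possibly clobbering) the global environment.
rm(fit, gof)
cached <- attach("cache/results.demo.RData")
r2 <- gof                 # resolved via the attached environment

## Remove the attached environment by the name attach() recorded,
## exactly as in the detach() block above.
detach(name = attr(cached, "name"), character.only = TRUE)
```

Note that attach() only works on files written by save(); objects written with saveRDS() would instead be read back with readRDS() into a variable of your choice.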
>>>
>>> Further advantage: I can actually create the package and send it to
>>> somebody for testing and review, and it should run out of the box,
>>> as all dependencies are defined in the DESCRIPTION file.
>>>
>>> I am using this approach at the moment for a project which will
>>> also result in a paper.  By executing all the scripts, one will be
>>> able to import the raw data, do the analysis, and create all graphs
>>> used in the paper.
>>>
>>> Hope this gives you another idea of how one can handle long-running
>>> analyses in R in org,
>>>
>>> Cheers,
>>>
>>> Rainer
>>
>> That is a cool workflow.  I especially like the fact that you end up
>> with an R package.
>
> Thanks.  Yes - the idea of having a package at the end was one main
> reason why I am using this approach.
>
>> So, I'll try again: is there any chance to see a working example of
>> this?  I'd love to see that.
>
> Let's say I am working on it.  I am working on a project which uses
> this workflow, and when it is finished, the package will be available
> as an electronic appendix to the paper.
>
> But I will see if I can condense an example and blog it - I'll let
> you know when it is done.

Thanks!  Either way, I am really looking forward to this.

Regards,
Andreas