From mboxrd@z Thu Jan  1 00:00:00 1970
From: John Hendy <jw.hendy@gmail.com>
Subject: Re: Bug? R: Org babel block execution *drastically* slower than in
 ESS session directly
Date: Thu, 1 Nov 2012 09:53:51 -0500
Message-ID: <CA+M2ft_TZphAy=VuyhCuhGbVFvYuaUFpyLQFZZ931LmEAaVe4w@mail.gmail.com>
References: <CA+M2ft-gh3jPc9o9m0STC4Z5fNzJ93AQk2D_UPQiBjK6G37ouA@mail.gmail.com>
	<874nlappb1.fsf@tajo.ucsd.edu>
	<CA+M2ft9tixOALZf_TykmNBQXFXHZWewqq_fAsGTbPUYifEc2VA@mail.gmail.com>
	<878vam1jvh.fsf@tajo.ucsd.edu> <jw.hendy@gmail.com>
	<CA+M2ft9stDuOc5EsiMVMLCbuYsEqvS8nOcwXXUfVuzsg=LOHEQ@mail.gmail.com>
	<3477.1351723988@alphaville>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=bcaec51f9555f849db04cd702fb4
Return-path: <emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org>
Received: from eggs.gnu.org ([208.118.235.92]:36202)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jw.hendy@gmail.com>) id 1TTw9c-0002BK-Lu
	for emacs-orgmode@gnu.org; Thu, 01 Nov 2012 10:54:00 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jw.hendy@gmail.com>) id 1TTw9Z-0006HT-36
	for emacs-orgmode@gnu.org; Thu, 01 Nov 2012 10:53:56 -0400
Received: from mail-ob0-f169.google.com ([209.85.214.169]:33534)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jw.hendy@gmail.com>) id 1TTw9Y-0006GV-S3
	for emacs-orgmode@gnu.org; Thu, 01 Nov 2012 10:53:53 -0400
Received: by mail-ob0-f169.google.com with SMTP id va7so2904640obc.0
	for <emacs-orgmode@gnu.org>; Thu, 01 Nov 2012 07:53:51 -0700 (PDT)
In-Reply-To: <3477.1351723988@alphaville>
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-orgmode>,
	<mailto:emacs-orgmode-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-orgmode>
List-Post: <mailto:emacs-orgmode@gnu.org>
List-Help: <mailto:emacs-orgmode-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-orgmode>,
	<mailto:emacs-orgmode-request@gnu.org?subject=subscribe>
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
To: nicholas.dokos@hp.com
Cc: emacs-orgmode@gnu.org, cberry@tajo.ucsd.edu

--bcaec51f9555f849db04cd702fb4
Content-Type: text/plain; charset=UTF-8

On Wed, Oct 31, 2012 at 5:53 PM, Nick Dokos <nicholas.dokos@hp.com> wrote:

> John Hendy <jw.hendy@gmail.com> wrote:
>
> > On Wed, Oct 31, 2012 at 3:12 PM, <cberry@tajo.ucsd.edu> wrote:
> >
> >     John Hendy <jw.hendy@gmail.com> writes:
> >
> >     > On Wed, Oct 31, 2012 at 11:41 AM, <cberry@tajo.ucsd.edu> wrote:
> >     > John Hendy <jw.hendy@gmail.com> writes:
> >     >
> >     >> I edited the subject to be more concise/clear. I let orgmode chug
> >     >> away on reading in some ~10-30mb csv files for nearly 30min.
> >     >
> >     > [rest deleted]
> >     >
> >     > You need an ECM.
> >     >
> >     > I did my best to provide one, other than the file, which I offered
> >     > to provide if others requested that I upload it somewhere. Since
> >     > you have done so, so have I:
> >     > - https://docs.google.com/open?id=0BzQupOSnvw08WHdabHh5VVczRGM
> >     >
> >     > Let me know if that doesn't work. I put it on Google docs and
> >     > sometimes have issues with the sharing settings...
> >
> >     Not an ECM in my book, but ...
> >
> > What else would you like? I provided:
> > - the config
> > - the data
> > - how to [attempt to] reproduce
> > - the org-mode text
> >
>
> Smaller set of data I'd guess :-) But it does not seem to be the
> size of the data that matters.
>
> >
> >
> >     On my 4 year old MacBook:
> >
> >     ,----
> >     |
> >     | #+PROPERTY: session *R*
> >     |
> >     | #+name: bigcsv
> >     | #+begin_src R
> >     | bigcsv <- Sys.glob("~/Downloads/*.csv")
> >     | #+end_src
> >     |
> >     | #+RESULTS: bigcsv
> >     | : /Users/cberry/Downloads/test-file.csv
> >     |
> >     | #+name: readbig
> >     | #+begin_src R :results output
> >     |   system.time(
> >     |     tmp <- read.csv(bigcsv)
> >     |     )
> >     |
> >     | #+end_src
> >     |
> >     | #+RESULTS: readbig
> >     | :    user  system elapsed
> >     | :   5.679   0.306   6.002
> >     |
> >     `----
> >
> >     About the same as running from ESS.
> >
> > Not sure what to say. Looking for ways to troubleshoot or confirm. Since
> > you can't confirm, any suggestions on where I should look for my issue?
> > I can't explain it! All I know is that org chugs and chugs and the direct
> > execution in ESS session is lightning fast.
> >
>
> A few things to try in no particular order:
>
>
This was extremely helpful. Thanks for the suggestions.

Here's my attempt at an ECM, though I'm going to keep using the big file
since that's what actually triggers the problem and I've already uploaded it :)
- Using emacs config here: http://pastebin.com/raw.php?i=iTbRtCE9
- Using this org-mode file:

#+begin_src org

* headline

#+begin_src R :session r :results silent
# file here:
# https://docs.google.com/uc?export=download&confirm=no_antivirus&id=0BzQupOSnvw08WHdabHh5VVczRGM
data <- read.csv("path/to/file.csv")
#+end_src

#+end_src

- Execute block with C-c C-c after downloading and changing path

> o run top (or whatever equivalent is available on your OS) and see
>   whether the CPU (or one of the CPUs) gets pegged at 100% utilization
>   and stays there. If yes, that's an indication of an infinite loop
>   somewhere.
>
>
- quit any other instances of emacs/R
- start `top` in terminal
- execute block
- Use '<' '>' to sort back and forth between cpu and ram

Observations
- R is at 80-100% cpu for about 5sec
- Then emacs shifts to fairly constant ~100% cpu usage
- After about a minute, the minibuffer expands to ~1/3 of the window height
and fills with the csv data
- Finished after ~5min total time
- So, R took about 5sec, emacs took another 5min to finish
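
To put numbers on the observations above rather than watching top by eye,
a sketch like the following could log both processes once per second. It
assumes the procps version of top found on most Linux systems (-b batch
mode, -d delay, -n iterations) with its default column layout, and that the
processes show up as "emacs..." and "R". The function names filter_top and
log_cpu are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes procps top and its default columns
# (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND).
# The process names "emacs" and "R" are assumptions; adjust as needed.

# Keep only the rows whose last column (COMMAND) starts with "emacs"
# or is exactly "R", and print PID, %CPU, %MEM and the command name.
filter_top() {
    awk '$NF ~ /^emacs/ || $NF == "R" { print $1, $9, $10, $NF }'
}

# Take one sample per second for the given number of samples.
log_cpu() {
    top -b -d 1 -n "$1" | filter_top
}
```

Something like `log_cpu 300 > cpu.log`, started just before executing the
block, should cover a ~5min run.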


> o run vmstat (or equivalent) and see if any of the counters are out of
>   whack. That requires some experience though.
>
>
I'll skip for now; no experience with that.


> o use elp-instrument-package to instrument org and run the test, getting
>   a profile. I'm not sure whether the results will be useful, since you
>   are going to interrupt the test when you run out of patience, but it
>   cannot hurt and it might tell you something useful.
>
> o run your ECM on a different computer/OS/emacs installation. Being able
>   to compare things side by side is often very useful.
>
> o Halve your file and run the test on each half (but that's probably not
>   the problem given Chuck's results).
>
> o Reinstall org from scratch - you might have some corruption in one of
>   the compiled files that's causing it to go into an infinite loop.
>
>
- `cd ~/.elisp`
- `sudo rm -r org.git`
- `git clone git://orgmode.org/org-mode.git org.git`
- `cd org.git && make clean && make && make doc`
- Quit previous emacs instance; reopen
- Remove (require 'org-install) per prompt; restart again
- Repeat `top` experiment

Results:
- Didn't even see R flash on the screen this time; emacs just jumped to 100%
- After 1min 10sec, the minibuffer filled with data
- At that point I quit, as I think it will be a repeat of the above


> o Turn on debug-on-quit, start your test, wait a bit and then interrupt
>   it. Check the backtrace.  Do it again and check whether the backtrace
>   looks the same. That's often an indication of an infinite loop
>   (inferring an infinite loop from a two element sample is statistically
>   suspect of course, but surprisingly effective nevertheless). The point
>   here is that the infinite loop is in emacs and the backtrace tells you
>   something about the parties involved.
>
>
- =M-x customize-variable RET debug-on-quit RET=
- Toggled to on; saved for current session
- Waited about a min (till the minibuffer filled), then did C-g

I don't have experience with debugging. C-g brings me to a *Backtrace* buffer,
which is empty except for the line "Debugger entered--Lisp error: (quit)"

Thanks for the suggestions and help. That was quite above and beyond. Much
appreciated.


Best regards,
John

> These are obviously not independent and the results of one experiment will
> have to guide you in what you try next.
>
> Good luck,
> Nick
>

--bcaec51f9555f849db04cd702fb4--