From mboxrd@z Thu Jan  1 00:00:00 1970
From: John Kitchin <jkitchin@andrew.cmu.edu>
Subject: Re: [Orgmode] Slow speed of week and month views
Date: Sat, 5 Aug 2017 18:17:09 -0400
Message-ID: <CAJ51EToLCm5zDLKu8XeuqEWrLhHZF+OoNkviPSivZbFttzF8=A@mail.gmail.com>
References: <87efsre565.fsf@grothesque.org> <878tiyr9tg.fsf@alphapapa.net>
	<2017-08-05T23-06-00@devnull.Karl-Voit.at>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="001a1148f57c8e2cd6055608f92f"
In-Reply-To: <2017-08-05T23-06-00@devnull.Karl-Voit.at>
To: Karl Voit <news1142@karl-voit.at>
Cc: "emacs-orgmode@gnu.org" <emacs-orgmode@gnu.org>

--001a1148f57c8e2cd6055608f92f
Content-Type: text/plain; charset="UTF-8"

I can think of two possibilities for a future approach (besides a deep dive
on profiling the current elisp to improve the speed there). They both
involve some substantial coding though, and would probably add
dependencies. I am curious what anyone thinks about these, or if there are
other ideas.

One is to use the new dynamic module capability to write an org parser in
C, or a dedicated agenda function, which would presumably be faster than
elisp. This seems hard, and for me it would certainly be a multiyear
project! The downside is the need to compile the module. I don't know how
easy it would be to make this work across platforms while keeping the
relatively easy install org-mode currently has. A side benefit, though,
would be a C library that others could use to expand where org-mode is
used.

The other way that might work is to rely more heavily on a cached version
of the files, perhaps in a different format than elisp, that is faster to
work with. The approach I have explored here is to index org files into
a sqlite database. The idea then would be to generate the agenda from a SQL
query. I use something like this already to "find stuff in orgmode
anywhere". One of the reasons I wrote this is the org-agenda list of files
isn't practical for me because my files are so scattered on my file system.
I had a need to be able to find TODOs in research projects in a pretty wide
range of locations.
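
To make the idea concrete, here is a minimal sketch in Python (chosen only
for illustration; it is not the org-db code). The table layout, the column
names and the deliberately naive headline regexp are all assumptions made
for the example. It indexes headlines and SCHEDULED dates into sqlite, then
builds a week agenda from a single query instead of re-reading the files.

```python
import re
import sqlite3

# Hypothetical schema for illustration; org-db.el defines its own tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS headlines (
    filename  TEXT NOT NULL,
    level     INTEGER NOT NULL,
    todo      TEXT,              -- TODO keyword, or NULL
    title     TEXT NOT NULL,
    scheduled TEXT               -- ISO date such as 2017-08-07, or NULL
);
CREATE INDEX IF NOT EXISTS idx_scheduled ON headlines (scheduled);
"""

# Deliberately naive: only the TODO/DONE keywords, no tags or priorities.
HEADLINE = re.compile(r"^(\*+)\s+(?:(TODO|DONE)\s+)?(.*\S)\s*$")
SCHEDULED = re.compile(r"SCHEDULED:\s*<(\d{4}-\d{2}-\d{2})")


def index_org_text(conn, filename, text):
    """Replace the rows for FILENAME with the headlines found in TEXT."""
    conn.execute("DELETE FROM headlines WHERE filename = ?", (filename,))
    rowid = None  # row of the most recent headline, if any
    for line in text.splitlines():
        m = HEADLINE.match(line)
        if m:
            stars, todo, title = m.groups()
            cur = conn.execute(
                "INSERT INTO headlines (filename, level, todo, title) "
                "VALUES (?, ?, ?, ?)",
                (filename, len(stars), todo, title),
            )
            rowid = cur.lastrowid
            continue
        s = SCHEDULED.search(line)
        if s and rowid is not None:
            # Attach the date to the headline this planning line follows.
            conn.execute(
                "UPDATE headlines SET scheduled = ? WHERE rowid = ?",
                (s.group(1), rowid),
            )
    conn.commit()


def agenda(conn, start, end):
    """Open TODO items scheduled in [start, end), as ISO date strings."""
    return conn.execute(
        "SELECT scheduled, todo, title, filename FROM headlines "
        "WHERE todo = 'TODO' AND scheduled >= ? AND scheduled < ? "
        "ORDER BY scheduled, filename",
        (start, end),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
index_org_text(
    conn,
    "projects.org",
    "* TODO Write report\n  SCHEDULED: <2017-08-07 Mon>\n",
)
for row in agenda(conn, "2017-08-07", "2017-08-14"):
    print(row)
```

Storing dates as ISO strings keeps the range comparison a plain string
comparison, so the index on scheduled can serve week and month views alike.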

The code I use is at
https://github.com/jkitchin/scimax/blob/master/org-db.el, and from one
database I can find headlines, contacts, locations, TODO headlines across
my file system, all the files that contain a particular link, and my own
recent org files. This approach relies on emacsql, and a set of hook
functions to update the database whenever a file is changed. It is not
robust, e.g. the file could be out of sync with the db if it is modified
outside emacs, but this works well enough for me so far. Updated files get
reindexed whenever emacs is idle. That is a compromise between walking the
file system all the time (or daily) and trying to use inotify, and you can
always run a command to prune/sync all the files any time you want.
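
One way to cover the out-of-sync case is to record each file's modification
time at indexing and compare it on the next sync. The sketch below is in
Python for illustration only and is not how org-db does it; the files table
and the open_db, needs_reindex and sync names are hypothetical. It reindexes
files whose mtime changed (including edits made outside emacs) and prunes
database entries for files that no longer exist.

```python
import os
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) files table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "filename TEXT PRIMARY KEY, mtime_ns INTEGER NOT NULL)"
    )
    return conn


def needs_reindex(conn, filename):
    """True if FILENAME is unknown or changed on disk since it was indexed."""
    row = conn.execute(
        "SELECT mtime_ns FROM files WHERE filename = ?", (filename,)
    ).fetchone()
    return row is None or row[0] != os.stat(filename).st_mtime_ns


def sync(conn, filenames, reindex):
    """Prune vanished files and reindex changed ones.

    REINDEX is a callback taking a filename; returns (reindexed, pruned).
    """
    known = [f for (f,) in conn.execute("SELECT filename FROM files")]
    pruned = [f for f in known if not os.path.exists(f)]
    conn.executemany(
        "DELETE FROM files WHERE filename = ?", [(f,) for f in pruned]
    )
    reindexed = []
    for f in filenames:
        if os.path.exists(f) and needs_reindex(conn, f):
            reindex(f)
            conn.execute(
                "INSERT OR REPLACE INTO files (filename, mtime_ns) "
                "VALUES (?, ?)",
                (f, os.stat(f).st_mtime_ns),
            )
            reindexed.append(f)
    conn.commit()
    return reindexed, pruned
```

Running sync from an idle timer costs one stat call per file, which is much
cheaper than parsing every file, but it still only sees the files it is told
about.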

sqlite is ok, but with emacsql you cannot put strings in it directly (at
least when I wrote the org-db code), which has limited it for full-text
search so far. Also with text, the db got up to about 0.5 GB in size, and
started slowing down. So it doesn't have text in it for now. It has all the
other limitations of sqlite too: limited support for locking, single
process, ...

I am moderately motivated to switch from sqlite to MongoDB, but the support
for Mongo in emacs is pretty crummy (I tried writing a few traditional
interfaces, but the performance was not that good, and limited since Mongo
uses bson, and it is just not the same as json!). Why Mongo? Mostly because
the Mongo query language is basically json and easy to generate in Emacs,
unlike sql. Also, it is flexible and easy to adapt to new things, e.g.
indexing src-blocks or tables or whatever org-element you want. (And I want
to use Mongo for something else too ;). Obviously these all add
dependencies, and might not be suitable for the core org-mode distribution.
But I do think it is important to think about ways to scale org-mode while
maintaining compatibility with the core.
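
As an illustration of why the query language is easy to generate, a Mongo
query is just nested data. The sketch below uses Python dicts where elisp
would use alists or plists; the field names (todo, scheduled) and the choice
to store dates as ISO strings are assumptions for the example, while $in,
$gte and $lt are standard MongoDB query operators. Nothing here talks to a
server; it only builds and prints the query document.

```python
import json
from datetime import date, timedelta


def agenda_query(start, days, todo_keywords=("TODO",)):
    """Build a Mongo-style filter for items scheduled in [start, start+days)."""
    end = start + timedelta(days=days)
    return {
        "todo": {"$in": list(todo_keywords)},
        "scheduled": {"$gte": start.isoformat(), "$lt": end.isoformat()},
    }


# A week view starting Monday 2017-08-07, serialized as the JSON a driver
# or command-line client would receive.
print(json.dumps(agenda_query(date(2017, 8, 7), 7), indent=2))
```

Widening the view from a week to a month only changes the days argument,
and adding another condition is one more key in the dictionary.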

The main point of the database was to get a query language, persistence and
good performance. I have also used caches to speed up using bibtex files,
and my org-contacts with reasonable performance. These have been all elisp,
with no additional dependencies. Maybe one could do something similar to
keep an agenda cache that is persistent and updated via hook functions.
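
A dependency-free version of that could look roughly like the sketch below
(Python for illustration; an elisp version would more likely write a printed
lisp form to a file). The AgendaCache class, its method names and the JSON
file format are all hypothetical. Each file's agenda entries are stored
under its name, a save hook would call update for just the file that
changed, and the view is built from the cache alone.

```python
import json
import os


class AgendaCache:
    """Per-file agenda entries, persisted as JSON between sessions."""

    def __init__(self, path):
        self.path = path
        self.entries = {}  # filename -> list of [iso_date, title]
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.entries = json.load(f)

    def update(self, filename, entries):
        """Replace the entries for one file, e.g. from an after-save hook."""
        self.entries[filename] = [list(e) for e in entries]
        self._save()

    def forget(self, filename):
        """Drop a file that was deleted or removed from the agenda."""
        self.entries.pop(filename, None)
        self._save()

    def _save(self):
        # Write to a temporary file first so a crash cannot leave a
        # half-written cache behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.entries, f)
        os.replace(tmp, self.path)

    def between(self, start, end):
        """Sorted (date, title, filename) tuples with start <= date < end."""
        hits = [
            (day, title, filename)
            for filename, items in self.entries.items()
            for day, title in items
            if start <= day < end
        ]
        return sorted(hits)
```

The cost of a week or month view then depends on the number of cached
entries rather than on the size of the org files, at the price of the same
staleness risk when files change outside emacs.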

Thoughts?

John

-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


On Sat, Aug 5, 2017 at 5:09 PM, Karl Voit <devnull@karl-voit.at> wrote:

> Thanks for the insight!
>
> * Adam Porter <adam@alphapapa.net> wrote:
> >
> > But doing that would, as Carsten said, require rewriting a lot of code.
> > Essentially you'd be creating a new agenda system, so you'd have to
> > reimplement a lot of existing code.  You could do it in parallel, rather
> > than replacing existing code, so you wouldn't have to break the existing
> > agenda system.  But I don't think there's any way to shortcut writing
> > the new system.  I don't think there's any "low hanging fruit."
>
> My daily agenda takes twenty seconds, my weekly approximately a
> minute and generating a monthly agenda is something I can only do
> when I plan to leave the computer for a longer break.
>
> Org-mode does not scale well, I'm afraid.
>
> Repeatedly going through all agenda files does not seem to be a very
> reasonable design choice for the general use-case. I would like to
> see an alternative approach for the future of Org-mode.
>
> --
> get mail|git|SVN|photos|postings|SMS|phonecalls|RSS|CSV|XML into Org-mode:
>        > get Memacs from https://github.com/novoid/Memacs <
> Personal Information Management > http://Karl-Voit.at/tags/pim/
> Emacs-related > http://Karl-Voit.at/tags/emacs/
>
>
>

--001a1148f57c8e2cd6055608f92f--