From: Derek Feichtinger <derek.feichtinger@psi.ch>
To: John Kitchin <jkitchin@andrew.cmu.edu>
Cc: Tim Cross <theophilusx@gmail.com>, emacs-orgmode@gnu.org
Subject: Re: state of the art in org-mode tables e.g. join, etc
Date: Mon, 22 Feb 2021 09:27:50 +0100 [thread overview]
Message-ID: <87r1l8tsl5.fsf@psi.ch> (raw)
In-Reply-To: <CAJ51ETrQzgP3DGespaGx0Yj5gLSDciVtV-9G4gM33-xZdHxNug@mail.gmail.com>
Hi John,
I invested time some years ago in preparing babel examples, and a lot of
the description went into using tables. The most detailed documents I
had for elisp and python.
In order to be productive, e.g. for producing all kinds of scientific
graphs, but also for doing the finances and planning for our scientific
computing section I ended up the same as you with mostly going to python
and leveraging Pandas. I think all of us end up using ":colnames no" as
the most convenient solution.
https://github.com/dfeich/org-babel-examples/blob/master/python3/python3-babel.org
(especially look at the Pandas section 10)
In that file I also tangle a python library "orgbabelhelper" that is
available in Conda and PyPi. I mainly use that to work with my tables.
Best regards
Derek
--
Paul Scherrer Institut
Dr. Derek Feichtinger Phone: +41 56 310 47 33
Group Head HPC and Emerging Technologies Email: derek.feichtinger@psi.ch
Building/Room No. OHSA/D17
Forschungsstrasse 111
CH-5232 Villigen PSI
On Sun, Feb 21 2021, John Kitchin <jkitchin@andrew.cmu.edu> wrote:
> For fun, here is the sqlite equivalent of the Pandas example using the same
> tables as before
>
>
> ** aggregation example
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday | Red | 30 | 11 |
> | Monday | Blue | 25 | 3 |
> | Tuesday | Red | 51 | 12 |
> | Tuesday | Red | 45 | 15 |
> | Tuesday | Blue | 33 | 18 |
> | Wednesday | Red | 27 | 23 |
> | Wednesday | Blue | 12 | 16 |
> | Wednesday | Blue | 15 | 15 |
> | Thursday | Red | 39 | 24 |
> | Thursday | Red | 41 | 29 |
> | Thursday | Red | 49 | 30 |
> | Friday | Blue | 7 | 5 |
> | Friday | Blue | 6 | 8 |
> | Friday | Blue | 11 | 9 |
>
>
> #+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes
> drop table if exists testtable;
> create table testtable(Day str, Color str, Level int, Quantity int);
> .mode csv testtable
> .import $orgtable testtable
> select Color, count(*) from testtable group by Color;
> #+end_src
>
> #+RESULTS:
> | Color | count(*) |
> |-------+----------|
> | Blue | 7 |
> | Red | 7 |
>
> ** join example
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
> | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
> | onion | 1.3 | 4.4 | 1.3 | 9.0 |
> | egg | 0 | 18.3 | 31.9 | 18.3 |
> | rice | 0.2 | 0 | 1.5 | 16.0 |
> | bread | 0.7 | 0.7 | 3.3 | 16.0 |
> | orange | 3.1 | 11.9 | 1.3 | 17.6 |
> | banana | 2.1 | 9.9 | 0.9 | 18.5 |
> | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
> | nut | 2.6 | 1.3 | 4.9 | 7.2 |
> | corn | 4.7 | 1.8 | 2.8 | 21.3 |
>
>
> #+name: recipe
> | type | quty |
> |----------+------|
> | onion | 70 |
> | tomatoe | 120 |
> | eggplant | 300 |
> | tofu | 100 |
>
>
> #+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames
> yes
> drop table if exists nutrition;
> drop table if exists recipe;
> create table nutrition(type str, Fiber float, Sugar float, Protein float,
> Carb float);
> create table recipe(type str, quty int);
>
> .mode csv nutrition
> .import $nut nutrition
>
> .mode csv recipe
> .import $rec recipe
>
> select * from recipe, nutrition where recipe.type=nutrition.type;
> #+end_src
>
> #+RESULTS:
> | type | quty | type | Fiber | Sugar | Protein | Carb |
> |----------+------+----------+-------+-------+---------+------|
> | onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9.0 |
> | tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
> | eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
> | tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
>
>
> John
>
> -----------------------------------
> Professor John Kitchin
> Doherty Hall A207F
> Department of Chemical Engineering
> Carnegie Mellon University
> Pittsburgh, PA 15213
> 412-268-7803
> @johnkitchin
> http://kitchingroup.cheme.cmu.edu
>
>
>
> On Sun, Feb 21, 2021 at 10:03 AM John Kitchin <jkitchin@andrew.cmu.edu>
> wrote:
>
>> Thanks Tim and Greg. I had mostly come to the same conclusions that it is
>> probably best to outsource this. I worked out some examples from
>> the orgtbljoin and orgaggregate packages with Pandas below, in case anyone
>> is interested in seeing how it works. A key point is using the ":colnames
>> no" header args to get the column names for Pandas. It seems like a pretty
>> good approach.
>>
>> * org-mode tables with Pandas
>> ** Aggregating from a table
>>
>> Examples from https://github.com/tbanel/orgaggregate
>>
>>
>> #+NAME: original
>> | Day | Color | Level | Quantity |
>> |-----------+-------+-------+----------|
>> | Monday | Red | 30 | 11 |
>> | Monday | Blue | 25 | 3 |
>> | Tuesday | Red | 51 | 12 |
>> | Tuesday | Red | 45 | 15 |
>> | Tuesday | Blue | 33 | 18 |
>> | Wednesday | Red | 27 | 23 |
>> | Wednesday | Blue | 12 | 16 |
>> | Wednesday | Blue | 15 | 15 |
>> | Thursday | Red | 39 | 24 |
>> | Thursday | Red | 41 | 29 |
>> | Thursday | Red | 49 | 30 |
>> | Friday | Blue | 7 | 5 |
>> | Friday | Blue | 6 | 8 |
>> | Friday | Blue | 11 | 9 |
>>
>>
>> #+BEGIN_SRC ipython :var data=original :colnames no
>> import pandas as pd
>>
>> pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
>> #+END_SRC
>>
>> #+RESULTS:
>> :results:
>> # Out [1]:
>> # text/plain
>> : Color
>> : Blue 7
>> : Red 7
>> : dtype: int64
>> :end:
>>
>> The categorical stuff here is just to get the days sorted the same way as
>> the example. It is otherwise not needed. I feel there should be a more
>> clever way to do this, but didn't think of it.
>>
>> #+BEGIN_SRC ipython :var data=original :colnames no
>> df = pd.DataFrame(data[1:], columns=data[0])
>> days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
>> 'Saturday', 'Sunday']
>> df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
>>
>> (df
>> .groupby('Day')
>> .agg({'Level': 'mean',
>> 'Quantity': 'sum'})
>> .sort_values('Day'))
>> #+END_SRC
>>
>> #+RESULTS:
>> :results:
>> # Out [2]:
>> # text/plain
>> : Level Quantity
>> : Day
>> : Monday 27.5 14
>> : Tuesday 43.0 45
>> : Wednesday 18.0 54
>> : Thursday 43.0 83
>> : Friday 8.0 22
>> : Saturday NaN 0
>> : Sunday NaN 0
>>
>>
>> [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
>> :end:
>>
>> ** Joining tables
>>
>> Example from https://github.com/tbanel/orgtbljoin
>>
>> #+name: nutrition
>> | type | Fiber | Sugar | Protein | Carb |
>> |----------+-------+-------+---------+------|
>> | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
>> | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
>> | onion | 1.3 | 4.4 | 1.3 | 9.0 |
>> | egg | 0 | 18.3 | 31.9 | 18.3 |
>> | rice | 0.2 | 0 | 1.5 | 16.0 |
>> | bread | 0.7 | 0.7 | 3.3 | 16.0 |
>> | orange | 3.1 | 11.9 | 1.3 | 17.6 |
>> | banana | 2.1 | 9.9 | 0.9 | 18.5 |
>> | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
>> | nut | 2.6 | 1.3 | 4.9 | 7.2 |
>> | corn | 4.7 | 1.8 | 2.8 | 21.3 |
>>
>>
>> #+name: recipe
>> | type | quty |
>> |----------+------|
>> | onion | 70 |
>> | tomatoe | 120 |
>> | eggplant | 300 |
>> | tofu | 100 |
>>
>>
>> #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
>> nutrition = pd.DataFrame(nut[1:], columns=nut[0])
>> rec = pd.DataFrame(recipe[1:], columns=recipe[0])
>>
>> pd.merge(rec, nutrition, on='type')
>> #+END_SRC
>>
>> #+RESULTS:
>> :results:
>> # Out [4]:
>> # text/plain
>> : type quty Fiber Sugar Protein Carb
>> : 0 onion 70 1.3 4.4 1.3 9.0
>> : 1 tomatoe 120 0.6 2.1 0.8 3.4
>> : 2 eggplant 300 2.5 3.2 0.8 8.6
>> : 3 tofu 100 0.7 0.5 6.6 1.4
>> :end:
>>
>>
>> John
>>
>> -----------------------------------
>> Professor John Kitchin
>> Doherty Hall A207F
>> Department of Chemical Engineering
>> Carnegie Mellon University
>> Pittsburgh, PA 15213
>> 412-268-7803
>> @johnkitchin
>> http://kitchingroup.cheme.cmu.edu
>>
>>
>>
>> On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com> wrote:
>>
>>>
>>> Greg Minshall <minshall@umich.edu> writes:
>>>
>>> > John,
>>> >
>>> >> Is there a state of the art in using org-tables as little databases
>>> >> with joins and stuff?
>>> >
>>> > i have to admit i do all that with an R code source block. (the dplyr
>>> > package has the relevant joins, e.g. dplyr::inner_join().) and, in R,
>>> > ":colnames yes" as a header argument gives you header lines on results.
>>> > (maybe that's ?now? for "all" languages?)
>>> >
>>>
>>> For really complex joins and ad hoc queries, I would do similar or put
>>> the data into sqlite. For more simple ones, I just define a table which
>>> uses table formulas to extract the values from the other tables - the
>>> downside being the tables need to have the same data ordering or the
>>> formulas need to be somewhat complex. Provided the tables have the same
>>> number of records in the same order, table formulas are usually fairly
>>> easy.
>>>
>>> I did think about writing some elisp functions to use in my table
>>> formulas to make things easier, but then decided I was just re-inventing
>>> and well defined database solution and figured when I need it, just use
>>> sqlite. However, it has been a while since I needed this level of
>>> complexity, so perhaps things have moved on and there are better ways
>>> now.
>>>
>>> --
>>> Tim Cross
>>>
>>>
next prev parent reply other threads:[~2021-02-22 8:39 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-20 21:15 state of the art in org-mode tables e.g. join, etc John Kitchin
2021-02-21 4:40 ` Greg Minshall
2021-02-21 6:45 ` Tim Cross
2021-02-21 15:03 ` John Kitchin
2021-02-21 16:23 ` John Kitchin
2021-02-22 6:52 ` Cook, Malcolm
2021-02-22 8:12 ` Greg Minshall
2021-02-22 15:21 ` Cook, Malcolm
2021-02-22 18:41 ` Greg Minshall
2021-02-25 14:50 ` John Kitchin
2021-02-22 8:27 ` Derek Feichtinger [this message]
2021-02-24 22:21 ` John Kitchin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.orgmode.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r1l8tsl5.fsf@psi.ch \
--to=derek.feichtinger@psi.ch \
--cc=emacs-orgmode@gnu.org \
--cc=jkitchin@andrew.cmu.edu \
--cc=theophilusx@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).