From: John Kitchin <jkitchin@andrew.cmu.edu>
To: Tim Cross <theophilusx@gmail.com>
Cc: org-mode-email <emacs-orgmode@gnu.org>
Subject: Re: state of the art in org-mode tables e.g. join, etc
Date: Sun, 21 Feb 2021 10:03:01 -0500 [thread overview]
Message-ID: <CAJ51ETpKCuGiSX9GJPeyUY6y6M9Nn8PHWyVidtP6ts=MqrnbiQ@mail.gmail.com> (raw)
In-Reply-To: <875z2lgbco.fsf@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 5262 bytes --]
Thanks Tim and Greg. I had mostly come to the same conclusions that it is
probably best to outsource this. I worked out some examples from
the orgtbljoin and orgaggregate packages with Pandas below, in case anyone
is interested in seeing how it works. A key point is using the ":colnames
no" header args to get the column names for Pandas. It seems like a pretty
good approach.
* org-mode tables with Pandas
** Aggregating from a table
Examples from https://github.com/tbanel/orgaggregate
#+NAME: original
| Day | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday | Red | 30 | 11 |
| Monday | Blue | 25 | 3 |
| Tuesday | Red | 51 | 12 |
| Tuesday | Red | 45 | 15 |
| Tuesday | Blue | 33 | 18 |
| Wednesday | Red | 27 | 23 |
| Wednesday | Blue | 12 | 16 |
| Wednesday | Blue | 15 | 15 |
| Thursday | Red | 39 | 24 |
| Thursday | Red | 41 | 29 |
| Thursday | Red | 49 | 30 |
| Friday | Blue | 7 | 5 |
| Friday | Blue | 6 | 8 |
| Friday | Blue | 11 | 9 |
#+BEGIN_SRC ipython :var data=original :colnames no
import pandas as pd
pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
#+END_SRC
#+RESULTS:
:results:
# Out [1]:
# text/plain
: Color
: Blue 7
: Red 7
: dtype: int64
:end:
The categorical stuff here is just to get the days sorted the same way as
the example. It is otherwise not needed. I feel there should be a more
clever way to do this, but didn't think of it.
#+BEGIN_SRC ipython :var data=original :colnames no
df = pd.DataFrame(data[1:], columns=data[0])
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
'Sunday']
df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
(df
.groupby('Day')
.agg({'Level': 'mean',
'Quantity': 'sum'})
.sort_values('Day'))
#+END_SRC
#+RESULTS:
:results:
# Out [2]:
# text/plain
: Level Quantity
: Day
: Monday 27.5 14
: Tuesday 43.0 45
: Wednesday 18.0 54
: Thursday 43.0 83
: Friday 8.0 22
: Saturday NaN 0
: Sunday NaN 0
[[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
:end:
** Joining tables
Example from https://github.com/tbanel/orgtbljoin
#+name: nutrition
| type | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
| tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
| onion | 1.3 | 4.4 | 1.3 | 9.0 |
| egg | 0 | 18.3 | 31.9 | 18.3 |
| rice | 0.2 | 0 | 1.5 | 16.0 |
| bread | 0.7 | 0.7 | 3.3 | 16.0 |
| orange | 3.1 | 11.9 | 1.3 | 17.6 |
| banana | 2.1 | 9.9 | 0.9 | 18.5 |
| tofu | 0.7 | 0.5 | 6.6 | 1.4 |
| nut | 2.6 | 1.3 | 4.9 | 7.2 |
| corn | 4.7 | 1.8 | 2.8 | 21.3 |
#+name: recipe
| type | quty |
|----------+------|
| onion | 70 |
| tomatoe | 120 |
| eggplant | 300 |
| tofu | 100 |
#+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
nutrition = pd.DataFrame(nut[1:], columns=nut[0])
rec = pd.DataFrame(recipe[1:], columns=recipe[0])
pd.merge(rec, nutrition, on='type')
#+END_SRC
#+RESULTS:
:results:
# Out [4]:
# text/plain
: type quty Fiber Sugar Protein Carb
: 0 onion 70 1.3 4.4 1.3 9.0
: 1 tomatoe 120 0.6 2.1 0.8 3.4
: 2 eggplant 300 2.5 3.2 0.8 8.6
: 3 tofu 100 0.7 0.5 6.6 1.4
:end:
John
-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com> wrote:
>
> Greg Minshall <minshall@umich.edu> writes:
>
> > John,
> >
> >> Is there a state of the art in using org-tables as little databases
> >> with joins and stuff?
> >
> > i have to admit i do all that with an R code source block. (the dplyr
> > package has the relevant joins, e.g. dplyr::inner_join().) and, in R,
> > ":colnames yes" as a header argument gives you header lines on results.
> > (maybe that's ?now? for "all" languages?)
> >
>
> For really complex joins and ad hoc queries, I would do similar or put
> the data into sqlite. For more simple ones, I just define a table which
> uses table formulas to extract the values from the other tables - the
> downside being the tables need to have the same data ordering or the
> formulas need to be somewhat complex. Provided the tables have the same
> number of records in the same order, table formulas are usually fairly
> easy.
>
> I did think about writing some elisp functions to use in my table
> formulas to make things easier, but then decided I was just re-inventing
> and well defined database solution and figured when I need it, just use
> sqlite. However, it has been a while since I needed this level of
> complexity, so perhaps things have moved on and there are better ways
> now.
>
> --
> Tim Cross
>
>
[-- Attachment #2: Type: text/html, Size: 7026 bytes --]
next prev parent reply other threads:[~2021-02-21 15:06 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-20 21:15 state of the art in org-mode tables e.g. join, etc John Kitchin
2021-02-21 4:40 ` Greg Minshall
2021-02-21 6:45 ` Tim Cross
2021-02-21 15:03 ` John Kitchin [this message]
2021-02-21 16:23 ` John Kitchin
2021-02-22 6:52 ` Cook, Malcolm
2021-02-22 8:12 ` Greg Minshall
2021-02-22 15:21 ` Cook, Malcolm
2021-02-22 18:41 ` Greg Minshall
2021-02-25 14:50 ` John Kitchin
2021-02-22 8:27 ` Derek Feichtinger
2021-02-24 22:21 ` John Kitchin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.orgmode.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJ51ETpKCuGiSX9GJPeyUY6y6M9Nn8PHWyVidtP6ts=MqrnbiQ@mail.gmail.com' \
--to=jkitchin@andrew.cmu.edu \
--cc=emacs-orgmode@gnu.org \
--cc=theophilusx@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).