Thanks Tim and Greg. I had mostly come to the same conclusions that it is probably best to outsource this. I worked out some examples from the orgtbljoin and orgaggregate packages with Pandas below, in case anyone is interested in seeing how it works. A key point is using the ":colnames no" header args to get the column names for Pandas. It seems like a pretty good approach.
* org-mode tables with Pandas
** Aggregating from a table
Examples from
https://github.com/tbanel/orgaggregate#+NAME: original
| Day | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday | Red | 30 | 11 |
| Monday | Blue | 25 | 3 |
| Tuesday | Red | 51 | 12 |
| Tuesday | Red | 45 | 15 |
| Tuesday | Blue | 33 | 18 |
| Wednesday | Red | 27 | 23 |
| Wednesday | Blue | 12 | 16 |
| Wednesday | Blue | 15 | 15 |
| Thursday | Red | 39 | 24 |
| Thursday | Red | 41 | 29 |
| Thursday | Red | 49 | 30 |
| Friday | Blue | 7 | 5 |
| Friday | Blue | 6 | 8 |
| Friday | Blue | 11 | 9 |
#+BEGIN_SRC ipython :var data=original :colnames no
import pandas as pd
pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
#+END_SRC
#+RESULTS:
:results:
# Out [1]:
# text/plain
: Color
: Blue 7
: Red 7
: dtype: int64
:end:
The categorical stuff here is just to get the days sorted the same way as the example. It is otherwise not needed. I feel there should be a more clever way to do this, but didn't think of it.
#+BEGIN_SRC ipython :var data=original :colnames no
df = pd.DataFrame(data[1:], columns=data[0])
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
(df
.groupby('Day')
.agg({'Level': 'mean',
'Quantity': 'sum'})
.sort_values('Day'))
#+END_SRC
#+RESULTS:
:results:
# Out [2]:
# text/plain
: Level Quantity
: Day
: Monday 27.5 14
: Tuesday 43.0 45
: Wednesday 18.0 54
: Thursday 43.0 83
: Friday 8.0 22
: Saturday NaN 0
: Sunday NaN 0
[[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
:end:
** Joining tables
Example from
https://github.com/tbanel/orgtbljoin#+name: nutrition
| type | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
| tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
| onion | 1.3 | 4.4 | 1.3 | 9.0 |
| egg | 0 | 18.3 | 31.9 | 18.3 |
| rice | 0.2 | 0 | 1.5 | 16.0 |
| bread | 0.7 | 0.7 | 3.3 | 16.0 |
| orange | 3.1 | 11.9 | 1.3 | 17.6 |
| banana | 2.1 | 9.9 | 0.9 | 18.5 |
| tofu | 0.7 | 0.5 | 6.6 | 1.4 |
| nut | 2.6 | 1.3 | 4.9 | 7.2 |
| corn | 4.7 | 1.8 | 2.8 | 21.3 |
#+name: recipe
| type | quty |
|----------+------|
| onion | 70 |
| tomatoe | 120 |
| eggplant | 300 |
| tofu | 100 |
#+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
nutrition = pd.DataFrame(nut[1:], columns=nut[0])
rec = pd.DataFrame(recipe[1:], columns=recipe[0])
pd.merge(rec, nutrition, on='type')
#+END_SRC
#+RESULTS:
:results:
# Out [4]:
# text/plain
: type quty Fiber Sugar Protein Carb
: 0 onion 70 1.3 4.4 1.3 9.0
: 1 tomatoe 120 0.6 2.1 0.8 3.4
: 2 eggplant 300 2.5 3.2 0.8 8.6
: 3 tofu 100 0.7 0.5 6.6 1.4
:end:
John
-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803