Thanks Tim and Greg. I had mostly come to the same conclusions that it is probably best to outsource this. I worked out some examples from the orgtbljoin and orgaggregate packages with Pandas below, in case anyone is interested in seeing how it works. A key point is using the ":colnames no" header args to get the column names for Pandas. It seems like a pretty good approach.

* org-mode tables with Pandas
** Aggregating from a table

Examples from https://github.com/tbanel/orgaggregate


#+NAME: original
| Day       | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday    | Red   |    30 |       11 |
| Monday    | Blue  |    25 |        3 |
| Tuesday   | Red   |    51 |       12 |
| Tuesday   | Red   |    45 |       15 |
| Tuesday   | Blue  |    33 |       18 |
| Wednesday | Red   |    27 |       23 |
| Wednesday | Blue  |    12 |       16 |
| Wednesday | Blue  |    15 |       15 |
| Thursday  | Red   |    39 |       24 |
| Thursday  | Red   |    41 |       29 |
| Thursday  | Red   |    49 |       30 |
| Friday    | Blue  |     7 |        5 |
| Friday    | Blue  |     6 |        8 |
| Friday    | Blue  |    11 |        9 |


#+BEGIN_SRC ipython :var data=original :colnames no
import pandas as pd

pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
#+END_SRC

#+RESULTS:
:results:
# Out [1]:
# text/plain
: Color
: Blue    7
: Red     7
: dtype: int64
:end:

The categorical stuff here is just to get the days sorted the same way as the example. It is otherwise not needed. I feel there should be a more clever way to do this, but didn't think of it.

#+BEGIN_SRC ipython :var data=original :colnames no
df = pd.DataFrame(data[1:], columns=data[0])
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)

(df
 .groupby('Day')
 .agg({'Level': 'mean',
       'Quantity': 'sum'})
 .sort_values('Day'))
#+END_SRC

#+RESULTS:
:results:
# Out [2]:
# text/plain
:            Level  Quantity
: Day
: Monday      27.5        14
: Tuesday     43.0        45
: Wednesday   18.0        54
: Thursday    43.0        83
: Friday       8.0        22
: Saturday     NaN         0
: Sunday       NaN         0

[[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
:end:

** Joining tables

Example from https://github.com/tbanel/orgtbljoin

#+name: nutrition
| type     | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |


#+name: recipe
| type     | quty |
|----------+------|
| onion    |   70 |
| tomatoe  |  120 |
| eggplant |  300 |
| tofu     |  100 |


#+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
nutrition = pd.DataFrame(nut[1:], columns=nut[0])
rec = pd.DataFrame(recipe[1:], columns=recipe[0])

pd.merge(rec, nutrition, on='type')
#+END_SRC

#+RESULTS:
:results:
# Out [4]:
# text/plain
:        type  quty  Fiber  Sugar  Protein  Carb
: 0     onion    70    1.3    4.4      1.3   9.0
: 1   tomatoe   120    0.6    2.1      0.8   3.4
: 2  eggplant   300    2.5    3.2      0.8   8.6
: 3      tofu   100    0.7    0.5      6.6   1.4
:end:


John

-----------------------------------
Professor John Kitchin 
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803


On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com> wrote:

Greg Minshall <minshall@umich.edu> writes:

> John,
>
>> Is there a state of the art in using org-tables as little databases
>> with joins and stuff?
>
> i have to admit i do all that with an R code source block.  (the dplyr
> package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
> ":colnames yes" as a header argument gives you header lines on results.
> (maybe that's ?now? for "all" languages?)
>

For really complex joins and ad hoc queries, I would do similar or put
the data into sqlite. For more simple ones, I just define a table which
uses table formulas to extract the values from the other tables - the
downside being the tables need to have the same data ordering or the
formulas need to be somewhat complex. Provided the tables have the same
number of records in the same order, table formulas are usually fairly
easy.

I did think about writing some elisp functions to use in my table
formulas to make things easier, but then decided I was just re-inventing
and well defined database solution and figured when I need it, just use
sqlite. However, it has been a while since I needed this level of
complexity, so perhaps things have moved on and there are better ways
now.

--
Tim Cross