For fun, here is the sqlite equivalent of the Pandas example using the same tables as before ** aggregation example Examples from https://github.com/tbanel/orgaggregate #+NAME: original | Day | Color | Level | Quantity | |-----------+-------+-------+----------| | Monday | Red | 30 | 11 | | Monday | Blue | 25 | 3 | | Tuesday | Red | 51 | 12 | | Tuesday | Red | 45 | 15 | | Tuesday | Blue | 33 | 18 | | Wednesday | Red | 27 | 23 | | Wednesday | Blue | 12 | 16 | | Wednesday | Blue | 15 | 15 | | Thursday | Red | 39 | 24 | | Thursday | Red | 41 | 29 | | Thursday | Red | 49 | 30 | | Friday | Blue | 7 | 5 | | Friday | Blue | 6 | 8 | | Friday | Blue | 11 | 9 | #+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes drop table if exists testtable; create table testtable(Day str, Color str, Level int, Quantity int); .mode csv testtable .import $orgtable testtable select Color, count(*) from testtable group by Color; #+end_src #+RESULTS: | Color | count(*) | |-------+----------| | Blue | 7 | | Red | 7 | ** join example Example from https://github.com/tbanel/orgtbljoin #+name: nutrition | type | Fiber | Sugar | Protein | Carb | |----------+-------+-------+---------+------| | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | | onion | 1.3 | 4.4 | 1.3 | 9.0 | | egg | 0 | 18.3 | 31.9 | 18.3 | | rice | 0.2 | 0 | 1.5 | 16.0 | | bread | 0.7 | 0.7 | 3.3 | 16.0 | | orange | 3.1 | 11.9 | 1.3 | 17.6 | | banana | 2.1 | 9.9 | 0.9 | 18.5 | | tofu | 0.7 | 0.5 | 6.6 | 1.4 | | nut | 2.6 | 1.3 | 4.9 | 7.2 | | corn | 4.7 | 1.8 | 2.8 | 21.3 | #+name: recipe | type | quty | |----------+------| | onion | 70 | | tomatoe | 120 | | eggplant | 300 | | tofu | 100 | #+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames yes drop table if exists nutrition; drop table if exists recipe; create table nutrition(type str, Fiber float, Sugar float, Protein float, Carb float); create table recipe(type str, quty int); .mode csv nutrition .import $nut nutrition .mode csv recipe .import $rec recipe select * from recipe, nutrition where recipe.type=nutrition.type; #+end_src #+RESULTS: | type | quty | type | Fiber | Sugar | Protein | Carb | |----------+------+----------+-------+-------+---------+------| | onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9.0 | | tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | | eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | | tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 | John ----------------------------------- Professor John Kitchin Doherty Hall A207F Department of Chemical Engineering Carnegie Mellon University Pittsburgh, PA 15213 412-268-7803 @johnkitchin http://kitchingroup.cheme.cmu.edu On Sun, Feb 21, 2021 at 10:03 AM John Kitchin wrote: > Thanks Tim and Greg. I had mostly come to the same conclusions that it is > probably best to outsource this. I worked out some examples from > the orgtbljoin and orgaggregate packages with Pandas below, in case anyone > is interested in seeing how it works. A key point is using the ":colnames > no" header args to get the column names for Pandas. It seems like a pretty > good approach. > > * org-mode tables with Pandas > ** Aggregating from a table > > Examples from https://github.com/tbanel/orgaggregate > > > #+NAME: original > | Day | Color | Level | Quantity | > |-----------+-------+-------+----------| > | Monday | Red | 30 | 11 | > | Monday | Blue | 25 | 3 | > | Tuesday | Red | 51 | 12 | > | Tuesday | Red | 45 | 15 | > | Tuesday | Blue | 33 | 18 | > | Wednesday | Red | 27 | 23 | > | Wednesday | Blue | 12 | 16 | > | Wednesday | Blue | 15 | 15 | > | Thursday | Red | 39 | 24 | > | Thursday | Red | 41 | 29 | > | Thursday | Red | 49 | 30 | > | Friday | Blue | 7 | 5 | > | Friday | Blue | 6 | 8 | > | Friday | Blue | 11 | 9 | > > > #+BEGIN_SRC ipython :var data=original :colnames no > import pandas as pd > > pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size() > #+END_SRC > > #+RESULTS: > :results: > # Out [1]: > # text/plain > : Color > : Blue 7 > : Red 7 > : dtype: int64 > :end: > > The categorical stuff here is just to get the days sorted the same way as > the example. It is otherwise not needed. I feel there should be a more > clever way to do this, but didn't think of it. > > #+BEGIN_SRC ipython :var data=original :colnames no > df = pd.DataFrame(data[1:], columns=data[0]) > days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', > 'Saturday', 'Sunday'] > df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True) > > (df > .groupby('Day') > .agg({'Level': 'mean', > 'Quantity': 'sum'}) > .sort_values('Day')) > #+END_SRC > > #+RESULTS: > :results: > # Out [2]: > # text/plain > : Level Quantity > : Day > : Monday 27.5 14 > : Tuesday 43.0 45 > : Wednesday 18.0 54 > : Thursday 43.0 83 > : Friday 8.0 22 > : Saturday NaN 0 > : Sunday NaN 0 > > > [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]] > :end: > > ** Joining tables > > Example from https://github.com/tbanel/orgtbljoin > > #+name: nutrition > | type | Fiber | Sugar | Protein | Carb | > |----------+-------+-------+---------+------| > | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | > | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | > | onion | 1.3 | 4.4 | 1.3 | 9.0 | > | egg | 0 | 18.3 | 31.9 | 18.3 | > | rice | 0.2 | 0 | 1.5 | 16.0 | > | bread | 0.7 | 0.7 | 3.3 | 16.0 | > | orange | 3.1 | 11.9 | 1.3 | 17.6 | > | banana | 2.1 | 9.9 | 0.9 | 18.5 | > | tofu | 0.7 | 0.5 | 6.6 | 1.4 | > | nut | 2.6 | 1.3 | 4.9 | 7.2 | > | corn | 4.7 | 1.8 | 2.8 | 21.3 | > > > #+name: recipe > | type | quty | > |----------+------| > | onion | 70 | > | tomatoe | 120 | > | eggplant | 300 | > | tofu | 100 | > > > #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no > nutrition = pd.DataFrame(nut[1:], columns=nut[0]) > rec = pd.DataFrame(recipe[1:], columns=recipe[0]) > > pd.merge(rec, nutrition, on='type') > #+END_SRC > > #+RESULTS: > :results: > # Out [4]: > # text/plain > : type quty Fiber Sugar Protein Carb > : 0 onion 70 1.3 4.4 1.3 9.0 > : 1 tomatoe 120 0.6 2.1 0.8 3.4 > : 2 eggplant 300 2.5 3.2 0.8 8.6 > : 3 tofu 100 0.7 0.5 6.6 1.4 > :end: > > > John > > ----------------------------------- > Professor John Kitchin > Doherty Hall A207F > Department of Chemical Engineering > Carnegie Mellon University > Pittsburgh, PA 15213 > 412-268-7803 > @johnkitchin > http://kitchingroup.cheme.cmu.edu > > > > On Sun, Feb 21, 2021 at 1:54 AM Tim Cross wrote: > >> >> Greg Minshall writes: >> >> > John, >> > >> >> Is there a state of the art in using org-tables as little databases >> >> with joins and stuff? >> > >> > i have to admit i do all that with an R code source block. (the dplyr >> > package has the relevant joins, e.g. dplyr::inner_join().) and, in R, >> > ":colnames yes" as a header argument gives you header lines on results. >> > (maybe that's ?now? for "all" languages?) >> > >> >> For really complex joins and ad hoc queries, I would do similar or put >> the data into sqlite. For more simple ones, I just define a table which >> uses table formulas to extract the values from the other tables - the >> downside being the tables need to have the same data ordering or the >> formulas need to be somewhat complex. Provided the tables have the same >> number of records in the same order, table formulas are usually fairly >> easy. >> >> I did think about writing some elisp functions to use in my table >> formulas to make things easier, but then decided I was just re-inventing >> and well defined database solution and figured when I need it, just use >> sqlite. However, it has been a while since I needed this level of >> complexity, so perhaps things have moved on and there are better ways >> now. >> >> -- >> Tim Cross >> >>