From: "Cook, Malcolm" <MEC@stowers.org>
To: John Kitchin <jkitchin@andrew.cmu.edu>,
Tim Cross <theophilusx@gmail.com>
Cc: org-mode-email <emacs-orgmode@gnu.org>
Subject: RE: state of the art in org-mode tables e.g. join, etc
Date: Mon, 22 Feb 2021 06:52:30 +0000 [thread overview]
Message-ID: <DM6PR20MB341042993D9907F94AF618ACBE819@DM6PR20MB3410.namprd20.prod.outlook.com> (raw)
In-Reply-To: <CAJ51ETrQzgP3DGespaGx0Yj5gLSDciVtV-9G4gM33-xZdHxNug@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 12018 bytes --]
John,
Checkout what R sqldf package makes easy:
** aggregation example
Examples from https://github.com/tbanel/orgaggregate
#+NAME: original
| Day | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday | Red | 30 | 11 |
| Monday | Blue | 25 | 3 |
| Tuesday | Red | 51 | 12 |
| Tuesday | Red | 45 | 15 |
| Tuesday | Blue | 33 | 18 |
| Wednesday | Red | 27 | 23 |
| Wednesday | Blue | 12 | 16 |
| Wednesday | Blue | 15 | 15 |
| Thursday | Red | 39 | 24 |
| Thursday | Red | 41 | 29 |
| Thursday | Red | 49 | 30 |
| Friday | Blue | 7 | 5 |
| Friday | Blue | 6 | 8 |
| Friday | Blue | 11 | 9 |
#+PROPERTY: header-args:R :session *R*
#+begin_src R :results none
library(sqldf)
#+end_src
#+begin_src R :var original=original :colnames yes
sqldf('select Color, count(*) from original group by Color;')
#+end_src
#+RESULTS:
| Color | count(*) |
|-------+----------|
| Blue | 7 |
| Red | 7 |
** join example
Example from https://github.com/tbanel/orgtbljoin
#+name: nutrition
| type | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
| tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
| onion | 1.3 | 4.4 | 1.3 | 9.0 |
| egg | 0 | 18.3 | 31.9 | 18.3 |
| rice | 0.2 | 0 | 1.5 | 16.0 |
| bread | 0.7 | 0.7 | 3.3 | 16.0 |
| orange | 3.1 | 11.9 | 1.3 | 17.6 |
| banana | 2.1 | 9.9 | 0.9 | 18.5 |
| tofu | 0.7 | 0.5 | 6.6 | 1.4 |
| nut | 2.6 | 1.3 | 4.9 | 7.2 |
| corn | 4.7 | 1.8 | 2.8 | 21.3 |
#+name: recipe
| type | quty |
|----------+------|
| onion | 70 |
| tomatoe | 120 |
| eggplant | 300 |
| tofu | 100 |
#+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes
sqldf('select * from recipe, nutrition where recipe.type=nutrition.type')
#+end_src
#+RESULTS:
| type | quty | type | Fiber | Sugar | Protein | Carb |
|----------+------+----------+-------+-------+---------+------|
| onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9 |
| tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
| eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
| tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
This should also be possible but I cannot get it to work now:
#+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes :prologue sqldf(' :epilogue ')
select * from recipe, nutrition where recipe.type=nutrition.type
#+end_src
From: Emacs-orgmode <emacs-orgmode-bounces+mec=stowers.org@gnu.org> On Behalf Of John Kitchin
Sent: Sunday, February 21, 2021 10:24
To: Tim Cross <theophilusx@gmail.com>
Cc: org-mode-email <emacs-orgmode@gnu.org>
Subject: Re: state of the art in org-mode tables e.g. join, etc
ATTENTION: This email came from an external source. Do not open attachments or click on links from unknown senders or unexpected emails.
For fun, here is the sqlite equivalent of the Pandas example using the same tables as before
** aggregation example
Examples from https://github.com/tbanel/orgaggregate
#+NAME: original
| Day | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday | Red | 30 | 11 |
| Monday | Blue | 25 | 3 |
| Tuesday | Red | 51 | 12 |
| Tuesday | Red | 45 | 15 |
| Tuesday | Blue | 33 | 18 |
| Wednesday | Red | 27 | 23 |
| Wednesday | Blue | 12 | 16 |
| Wednesday | Blue | 15 | 15 |
| Thursday | Red | 39 | 24 |
| Thursday | Red | 41 | 29 |
| Thursday | Red | 49 | 30 |
| Friday | Blue | 7 | 5 |
| Friday | Blue | 6 | 8 |
| Friday | Blue | 11 | 9 |
#+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes
drop table if exists testtable;
create table testtable(Day str, Color str, Level int, Quantity int);
.mode csv testtable
.import $orgtable testtable
select Color, count(*) from testtable group by Color;
#+end_src
#+RESULTS:
| Color | count(*) |
|-------+----------|
| Blue | 7 |
| Red | 7 |
** join example
Example from https://github.com/tbanel/orgtbljoin
#+name: nutrition
| type | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
| tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
| onion | 1.3 | 4.4 | 1.3 | 9.0 |
| egg | 0 | 18.3 | 31.9 | 18.3 |
| rice | 0.2 | 0 | 1.5 | 16.0 |
| bread | 0.7 | 0.7 | 3.3 | 16.0 |
| orange | 3.1 | 11.9 | 1.3 | 17.6 |
| banana | 2.1 | 9.9 | 0.9 | 18.5 |
| tofu | 0.7 | 0.5 | 6.6 | 1.4 |
| nut | 2.6 | 1.3 | 4.9 | 7.2 |
| corn | 4.7 | 1.8 | 2.8 | 21.3 |
#+name: recipe
| type | quty |
|----------+------|
| onion | 70 |
| tomatoe | 120 |
| eggplant | 300 |
| tofu | 100 |
#+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames yes
drop table if exists nutrition;
drop table if exists recipe;
create table nutrition(type str, Fiber float, Sugar float, Protein float, Carb float);
create table recipe(type str, quty int);
.mode csv nutrition
.import $nut nutrition
.mode csv recipe
.import $rec recipe
select * from recipe, nutrition where recipe.type=nutrition.type;
#+end_src
#+RESULTS:
| type | quty | type | Fiber | Sugar | Protein | Carb |
|----------+------+----------+-------+-------+---------+------|
| onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9.0 |
| tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
| eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
| tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 |
John
-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
On Sun, Feb 21, 2021 at 10:03 AM John Kitchin <jkitchin@andrew.cmu.edu<mailto:jkitchin@andrew.cmu.edu>> wrote:
Thanks Tim and Greg. I had mostly come to the same conclusions that it is probably best to outsource this. I worked out some examples from the orgtbljoin and orgaggregate packages with Pandas below, in case anyone is interested in seeing how it works. A key point is using the ":colnames no" header args to get the column names for Pandas. It seems like a pretty good approach.
* org-mode tables with Pandas
** Aggregating from a table
Examples from https://github.com/tbanel/orgaggregate
#+NAME: original
| Day | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday | Red | 30 | 11 |
| Monday | Blue | 25 | 3 |
| Tuesday | Red | 51 | 12 |
| Tuesday | Red | 45 | 15 |
| Tuesday | Blue | 33 | 18 |
| Wednesday | Red | 27 | 23 |
| Wednesday | Blue | 12 | 16 |
| Wednesday | Blue | 15 | 15 |
| Thursday | Red | 39 | 24 |
| Thursday | Red | 41 | 29 |
| Thursday | Red | 49 | 30 |
| Friday | Blue | 7 | 5 |
| Friday | Blue | 6 | 8 |
| Friday | Blue | 11 | 9 |
#+BEGIN_SRC ipython :var data=original :colnames no
import pandas as pd
pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
#+END_SRC
#+RESULTS:
:results:
# Out [1]:
# text/plain
: Color
: Blue 7
: Red 7
: dtype: int64
:end:
The categorical stuff here is just to get the days sorted the same way as the example. It is otherwise not needed. I feel there should be a more clever way to do this, but didn't think of it.
#+BEGIN_SRC ipython :var data=original :colnames no
df = pd.DataFrame(data[1:], columns=data[0])
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
(df
.groupby('Day')
.agg({'Level': 'mean',
'Quantity': 'sum'})
.sort_values('Day'))
#+END_SRC
#+RESULTS:
:results:
# Out [2]:
# text/plain
: Level Quantity
: Day
: Monday 27.5 14
: Tuesday 43.0 45
: Wednesday 18.0 54
: Thursday 43.0 83
: Friday 8.0 22
: Saturday NaN 0
: Sunday NaN 0
[[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
:end:
** Joining tables
Example from https://github.com/tbanel/orgtbljoin
#+name: nutrition
| type | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
| tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
| onion | 1.3 | 4.4 | 1.3 | 9.0 |
| egg | 0 | 18.3 | 31.9 | 18.3 |
| rice | 0.2 | 0 | 1.5 | 16.0 |
| bread | 0.7 | 0.7 | 3.3 | 16.0 |
| orange | 3.1 | 11.9 | 1.3 | 17.6 |
| banana | 2.1 | 9.9 | 0.9 | 18.5 |
| tofu | 0.7 | 0.5 | 6.6 | 1.4 |
| nut | 2.6 | 1.3 | 4.9 | 7.2 |
| corn | 4.7 | 1.8 | 2.8 | 21.3 |
#+name: recipe
| type | quty |
|----------+------|
| onion | 70 |
| tomatoe | 120 |
| eggplant | 300 |
| tofu | 100 |
#+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
nutrition = pd.DataFrame(nut[1:], columns=nut[0])
rec = pd.DataFrame(recipe[1:], columns=recipe[0])
pd.merge(rec, nutrition, on='type')
#+END_SRC
#+RESULTS:
:results:
# Out [4]:
# text/plain
: type quty Fiber Sugar Protein Carb
: 0 onion 70 1.3 4.4 1.3 9.0
: 1 tomatoe 120 0.6 2.1 0.8 3.4
: 2 eggplant 300 2.5 3.2 0.8 8.6
: 3 tofu 100 0.7 0.5 6.6 1.4
:end:
John
-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com<mailto:theophilusx@gmail.com>> wrote:
Greg Minshall <minshall@umich.edu<mailto:minshall@umich.edu>> writes:
> John,
>
>> Is there a state of the art in using org-tables as little databases
>> with joins and stuff?
>
> i have to admit i do all that with an R code source block. (the dplyr
> package has the relevant joins, e.g. dplyr::inner_join().) and, in R,
> ":colnames yes" as a header argument gives you header lines on results.
> (maybe that's ?now? for "all" languages?)
>
For really complex joins and ad hoc queries, I would do similar or put
the data into sqlite. For more simple ones, I just define a table which
uses table formulas to extract the values from the other tables - the
downside being the tables need to have the same data ordering or the
formulas need to be somewhat complex. Provided the tables have the same
number of records in the same order, table formulas are usually fairly
easy.
I did think about writing some elisp functions to use in my table
formulas to make things easier, but then decided I was just re-inventing
and well defined database solution and figured when I need it, just use
sqlite. However, it has been a while since I needed this level of
complexity, so perhaps things have moved on and there are better ways
now.
--
Tim Cross
[-- Attachment #2: Type: text/html, Size: 26562 bytes --]
next prev parent reply other threads:[~2021-02-22 6:53 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-20 21:15 state of the art in org-mode tables e.g. join, etc John Kitchin
2021-02-21 4:40 ` Greg Minshall
2021-02-21 6:45 ` Tim Cross
2021-02-21 15:03 ` John Kitchin
2021-02-21 16:23 ` John Kitchin
2021-02-22 6:52 ` Cook, Malcolm [this message]
2021-02-22 8:12 ` Greg Minshall
2021-02-22 15:21 ` Cook, Malcolm
2021-02-22 18:41 ` Greg Minshall
2021-02-25 14:50 ` John Kitchin
2021-02-22 8:27 ` Derek Feichtinger
2021-02-24 22:21 ` John Kitchin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.orgmode.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DM6PR20MB341042993D9907F94AF618ACBE819@DM6PR20MB3410.namprd20.prod.outlook.com \
--to=mec@stowers.org \
--cc=emacs-orgmode@gnu.org \
--cc=jkitchin@andrew.cmu.edu \
--cc=theophilusx@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).