emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: John Kitchin <jkitchin@andrew.cmu.edu>
To: Tim Cross <theophilusx@gmail.com>
Cc: org-mode-email <emacs-orgmode@gnu.org>
Subject: Re: state of the art in org-mode tables e.g. join, etc
Date: Sun, 21 Feb 2021 11:23:38 -0500	[thread overview]
Message-ID: <CAJ51ETrQzgP3DGespaGx0Yj5gLSDciVtV-9G4gM33-xZdHxNug@mail.gmail.com> (raw)
In-Reply-To: <CAJ51ETpKCuGiSX9GJPeyUY6y6M9Nn8PHWyVidtP6ts=MqrnbiQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 8747 bytes --]

For fun, here is the sqlite equivalent of the Pandas example using the same
tables as before


** aggregation example

Examples from https://github.com/tbanel/orgaggregate


#+NAME: original
| Day       | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday    | Red   |    30 |       11 |
| Monday    | Blue  |    25 |        3 |
| Tuesday   | Red   |    51 |       12 |
| Tuesday   | Red   |    45 |       15 |
| Tuesday   | Blue  |    33 |       18 |
| Wednesday | Red   |    27 |       23 |
| Wednesday | Blue  |    12 |       16 |
| Wednesday | Blue  |    15 |       15 |
| Thursday  | Red   |    39 |       24 |
| Thursday  | Red   |    41 |       29 |
| Thursday  | Red   |    49 |       30 |
| Friday    | Blue  |     7 |        5 |
| Friday    | Blue  |     6 |        8 |
| Friday    | Blue  |    11 |        9 |


#+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes
drop table if exists testtable;
create table testtable(Day str, Color str, Level int, Quantity int);
.mode csv testtable
.import $orgtable testtable
select Color, count(*) from testtable group by Color;
#+end_src

#+RESULTS:
| Color | count(*) |
|-------+----------|
| Blue  |        7 |
| Red   |        7 |

** join example

Example from https://github.com/tbanel/orgtbljoin

#+name: nutrition
| type     | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |


#+name: recipe
| type     | quty |
|----------+------|
| onion    |   70 |
| tomatoe  |  120 |
| eggplant |  300 |
| tofu     |  100 |


#+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames
yes
drop table if exists nutrition;
drop table if exists recipe;
create table nutrition(type str, Fiber float, Sugar float, Protein float,
Carb float);
create table recipe(type str, quty int);

.mode csv nutrition
.import $nut nutrition

.mode csv recipe
.import $rec recipe

select * from recipe, nutrition where recipe.type=nutrition.type;
#+end_src

#+RESULTS:
| type     | quty | type     | Fiber | Sugar | Protein | Carb |
|----------+------+----------+-------+-------+---------+------|
| onion    |   70 | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| tomatoe  |  120 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |


John

-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu



On Sun, Feb 21, 2021 at 10:03 AM John Kitchin <jkitchin@andrew.cmu.edu>
wrote:

> Thanks Tim and Greg. I had mostly come to the same conclusions that it is
> probably best to outsource this. I worked out some examples from
> the orgtbljoin and orgaggregate packages with Pandas below, in case anyone
> is interested in seeing how it works. A key point is using the ":colnames
> no" header args to get the column names for Pandas. It seems like a pretty
> good approach.
>
> * org-mode tables with Pandas
> ** Aggregating from a table
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day       | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday    | Red   |    30 |       11 |
> | Monday    | Blue  |    25 |        3 |
> | Tuesday   | Red   |    51 |       12 |
> | Tuesday   | Red   |    45 |       15 |
> | Tuesday   | Blue  |    33 |       18 |
> | Wednesday | Red   |    27 |       23 |
> | Wednesday | Blue  |    12 |       16 |
> | Wednesday | Blue  |    15 |       15 |
> | Thursday  | Red   |    39 |       24 |
> | Thursday  | Red   |    41 |       29 |
> | Thursday  | Red   |    49 |       30 |
> | Friday    | Blue  |     7 |        5 |
> | Friday    | Blue  |     6 |        8 |
> | Friday    | Blue  |    11 |        9 |
>
>
> #+BEGIN_SRC ipython :var data=original :colnames no
> import pandas as pd
>
> pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [1]:
> # text/plain
> : Color
> : Blue    7
> : Red     7
> : dtype: int64
> :end:
>
> The categorical stuff here is just to get the days sorted the same way as
> the example. It is otherwise not needed. I feel there should be a more
> clever way to do this, but didn't think of it.
>
> #+BEGIN_SRC ipython :var data=original :colnames no
> df = pd.DataFrame(data[1:], columns=data[0])
> days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
> 'Saturday', 'Sunday']
> df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
>
> (df
>  .groupby('Day')
>  .agg({'Level': 'mean',
>        'Quantity': 'sum'})
>  .sort_values('Day'))
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [2]:
> # text/plain
> :            Level  Quantity
> : Day
> : Monday      27.5        14
> : Tuesday     43.0        45
> : Wednesday   18.0        54
> : Thursday    43.0        83
> : Friday       8.0        22
> : Saturday     NaN         0
> : Sunday       NaN         0
>
>
> [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
> :end:
>
> ** Joining tables
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type     | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | egg      |     0 |  18.3 |    31.9 | 18.3 |
> | rice     |   0.2 |     0 |     1.5 | 16.0 |
> | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
> | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
> | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
> | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
> | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
> | corn     |   4.7 |   1.8 |     2.8 | 21.3 |
>
>
> #+name: recipe
> | type     | quty |
> |----------+------|
> | onion    |   70 |
> | tomatoe  |  120 |
> | eggplant |  300 |
> | tofu     |  100 |
>
>
> #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
> nutrition = pd.DataFrame(nut[1:], columns=nut[0])
> rec = pd.DataFrame(recipe[1:], columns=recipe[0])
>
> pd.merge(rec, nutrition, on='type')
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [4]:
> # text/plain
> :        type  quty  Fiber  Sugar  Protein  Carb
> : 0     onion    70    1.3    4.4      1.3   9.0
> : 1   tomatoe   120    0.6    2.1      0.8   3.4
> : 2  eggplant   300    2.5    3.2      0.8   8.6
> : 3      tofu   100    0.7    0.5      6.6   1.4
> :end:
>
>
> John
>
> -----------------------------------
> Professor John Kitchin
> Doherty Hall A207F
> Department of Chemical Engineering
> Carnegie Mellon University
> Pittsburgh, PA 15213
> 412-268-7803
> @johnkitchin
> http://kitchingroup.cheme.cmu.edu
>
>
>
> On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com> wrote:
>
>>
>> Greg Minshall <minshall@umich.edu> writes:
>>
>> > John,
>> >
>> >> Is there a state of the art in using org-tables as little databases
>> >> with joins and stuff?
>> >
>> > i have to admit i do all that with an R code source block.  (the dplyr
>> > package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
>> > ":colnames yes" as a header argument gives you header lines on results.
>> > (maybe that's ?now? for "all" languages?)
>> >
>>
>> For really complex joins and ad hoc queries, I would do similar or put
>> the data into sqlite. For more simple ones, I just define a table which
>> uses table formulas to extract the values from the other tables - the
>> downside being the tables need to have the same data ordering or the
>> formulas need to be somewhat complex. Provided the tables have the same
>> number of records in the same order, table formulas are usually fairly
>> easy.
>>
>> I did think about writing some elisp functions to use in my table
>> formulas to make things easier, but then decided I was just re-inventing
>> and well defined database solution and figured when I need it, just use
>> sqlite. However, it has been a while since I needed this level of
>> complexity, so perhaps things have moved on and there are better ways
>> now.
>>
>> --
>> Tim Cross
>>
>>

[-- Attachment #2: Type: text/html, Size: 11492 bytes --]

  reply	other threads:[~2021-02-21 16:24 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-20 21:15 state of the art in org-mode tables e.g. join, etc John Kitchin
2021-02-21  4:40 ` Greg Minshall
2021-02-21  6:45   ` Tim Cross
2021-02-21 15:03     ` John Kitchin
2021-02-21 16:23       ` John Kitchin [this message]
2021-02-22  6:52         ` Cook, Malcolm
2021-02-22  8:12           ` Greg Minshall
2021-02-22 15:21             ` Cook, Malcolm
2021-02-22 18:41               ` Greg Minshall
2021-02-25 14:50           ` John Kitchin
2021-02-22  8:27         ` Derek Feichtinger
2021-02-24 22:21           ` John Kitchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJ51ETrQzgP3DGespaGx0Yj5gLSDciVtV-9G4gM33-xZdHxNug@mail.gmail.com \
    --to=jkitchin@andrew.cmu.edu \
    --cc=emacs-orgmode@gnu.org \
    --cc=theophilusx@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).