emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* state of the art in org-mode tables e.g. join, etc
@ 2021-02-20 21:15 John Kitchin
  2021-02-21  4:40 ` Greg Minshall
  0 siblings, 1 reply; 12+ messages in thread
From: John Kitchin @ 2021-02-20 21:15 UTC (permalink / raw)
  To: org-mode-email

[-- Attachment #1: Type: text/plain, Size: 2221 bytes --]

Is there a state of the art in using org-tables as little databases with
joins and stuff?

This package https://github.com/tbanel/orgtbljoin

seems close, but not quite what I had in mind. I don't want to modify
tables in place, or create dynamic tables. I do want to combine tables in
memory though for subsequent calculations.

I was thinking more like given the tables from that page:

#+name: quantities
| type-1   | quty |
|----------+------|
| onion    |   70 |
| tomatoe  |  120 |
| eggplant |  300 |
| tofu     |  100 |

#+name: nutrition
| type-2   | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |

you would be able to do something like this:

#+BEGIN_SRC emacs-lisp
(join "quantities" "type-1" "nutrition" "type-2")
#+END_SRC

#+RESULTS:
| type-1   | quty | Fiber | Sugar | Protein | Carb |
|----------+------+-------+-------+---------+------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |
| tomatoe  |  120 |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |

or, to sum the Fiber column:

#+BEGIN_SRC emacs-lisp :var data=(join "quantities" "type-1" "nutrition"
"type-2")
(cl-loop for row in data sum (nth 2 row)))
#+END_SRC

#+RESULTS:
: 5.1000000000000005

I have spent a lot of time with Pandas in Python lately,  and trying to
think through what some analogues with org-tables would be. I would be very
happy not to reinvent the wheel here!


John

-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu

[-- Attachment #2: Type: text/html, Size: 3117 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: state of the art in org-mode tables e.g. join, etc
  2021-02-20 21:15 state of the art in org-mode tables e.g. join, etc John Kitchin
@ 2021-02-21  4:40 ` Greg Minshall
  2021-02-21  6:45   ` Tim Cross
  0 siblings, 1 reply; 12+ messages in thread
From: Greg Minshall @ 2021-02-21  4:40 UTC (permalink / raw)
  To: John Kitchin; +Cc: org-mode-email

John,

> Is there a state of the art in using org-tables as little databases
> with joins and stuff?

i have to admit i do all that with an R code source block.  (the dplyr
package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
":colnames yes" as a header argument gives you header lines on results.
(maybe that's ?now? for "all" languages?)

Greg

https://dplyr.tidyverse.org/articles/two-table.html?q=join#mutating-joins


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: state of the art in org-mode tables e.g. join, etc
  2021-02-21  4:40 ` Greg Minshall
@ 2021-02-21  6:45   ` Tim Cross
  2021-02-21 15:03     ` John Kitchin
  0 siblings, 1 reply; 12+ messages in thread
From: Tim Cross @ 2021-02-21  6:45 UTC (permalink / raw)
  To: emacs-orgmode


Greg Minshall <minshall@umich.edu> writes:

> John,
>
>> Is there a state of the art in using org-tables as little databases
>> with joins and stuff?
>
> i have to admit i do all that with an R code source block.  (the dplyr
> package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
> ":colnames yes" as a header argument gives you header lines on results.
> (maybe that's ?now? for "all" languages?)
>

For really complex joins and ad hoc queries, I would do similar or put
the data into sqlite. For more simple ones, I just define a table which
uses table formulas to extract the values from the other tables - the
downside being the tables need to have the same data ordering or the
formulas need to be somewhat complex. Provided the tables have the same
number of records in the same order, table formulas are usually fairly
easy.

I did think about writing some elisp functions to use in my table
formulas to make things easier, but then decided I was just re-inventing
and well defined database solution and figured when I need it, just use
sqlite. However, it has been a while since I needed this level of
complexity, so perhaps things have moved on and there are better ways
now.

--
Tim Cross


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: state of the art in org-mode tables e.g. join, etc
  2021-02-21  6:45   ` Tim Cross
@ 2021-02-21 15:03     ` John Kitchin
  2021-02-21 16:23       ` John Kitchin
  0 siblings, 1 reply; 12+ messages in thread
From: John Kitchin @ 2021-02-21 15:03 UTC (permalink / raw)
  To: Tim Cross; +Cc: org-mode-email

[-- Attachment #1: Type: text/plain, Size: 5262 bytes --]

Thanks Tim and Greg. I had mostly come to the same conclusions that it is
probably best to outsource this. I worked out some examples from
the orgtbljoin and orgaggregate packages with Pandas below, in case anyone
is interested in seeing how it works. A key point is using the ":colnames
no" header args to get the column names for Pandas. It seems like a pretty
good approach.

* org-mode tables with Pandas
** Aggregating from a table

Examples from https://github.com/tbanel/orgaggregate


#+NAME: original
| Day       | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday    | Red   |    30 |       11 |
| Monday    | Blue  |    25 |        3 |
| Tuesday   | Red   |    51 |       12 |
| Tuesday   | Red   |    45 |       15 |
| Tuesday   | Blue  |    33 |       18 |
| Wednesday | Red   |    27 |       23 |
| Wednesday | Blue  |    12 |       16 |
| Wednesday | Blue  |    15 |       15 |
| Thursday  | Red   |    39 |       24 |
| Thursday  | Red   |    41 |       29 |
| Thursday  | Red   |    49 |       30 |
| Friday    | Blue  |     7 |        5 |
| Friday    | Blue  |     6 |        8 |
| Friday    | Blue  |    11 |        9 |


#+BEGIN_SRC ipython :var data=original :colnames no
import pandas as pd

pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
#+END_SRC

#+RESULTS:
:results:
# Out [1]:
# text/plain
: Color
: Blue    7
: Red     7
: dtype: int64
:end:

The categorical stuff here is just to get the days sorted the same way as
the example. It is otherwise not needed. I feel there should be a more
clever way to do this, but didn't think of it.

#+BEGIN_SRC ipython :var data=original :colnames no
df = pd.DataFrame(data[1:], columns=data[0])
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
'Sunday']
df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)

(df
 .groupby('Day')
 .agg({'Level': 'mean',
       'Quantity': 'sum'})
 .sort_values('Day'))
#+END_SRC

#+RESULTS:
:results:
# Out [2]:
# text/plain
:            Level  Quantity
: Day
: Monday      27.5        14
: Tuesday     43.0        45
: Wednesday   18.0        54
: Thursday    43.0        83
: Friday       8.0        22
: Saturday     NaN         0
: Sunday       NaN         0

[[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
:end:

** Joining tables

Example from https://github.com/tbanel/orgtbljoin

#+name: nutrition
| type     | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |


#+name: recipe
| type     | quty |
|----------+------|
| onion    |   70 |
| tomatoe  |  120 |
| eggplant |  300 |
| tofu     |  100 |


#+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
nutrition = pd.DataFrame(nut[1:], columns=nut[0])
rec = pd.DataFrame(recipe[1:], columns=recipe[0])

pd.merge(rec, nutrition, on='type')
#+END_SRC

#+RESULTS:
:results:
# Out [4]:
# text/plain
:        type  quty  Fiber  Sugar  Protein  Carb
: 0     onion    70    1.3    4.4      1.3   9.0
: 1   tomatoe   120    0.6    2.1      0.8   3.4
: 2  eggplant   300    2.5    3.2      0.8   8.6
: 3      tofu   100    0.7    0.5      6.6   1.4
:end:


John

-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu



On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com> wrote:

>
> Greg Minshall <minshall@umich.edu> writes:
>
> > John,
> >
> >> Is there a state of the art in using org-tables as little databases
> >> with joins and stuff?
> >
> > i have to admit i do all that with an R code source block.  (the dplyr
> > package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
> > ":colnames yes" as a header argument gives you header lines on results.
> > (maybe that's ?now? for "all" languages?)
> >
>
> For really complex joins and ad hoc queries, I would do similar or put
> the data into sqlite. For more simple ones, I just define a table which
> uses table formulas to extract the values from the other tables - the
> downside being the tables need to have the same data ordering or the
> formulas need to be somewhat complex. Provided the tables have the same
> number of records in the same order, table formulas are usually fairly
> easy.
>
> I did think about writing some elisp functions to use in my table
> formulas to make things easier, but then decided I was just re-inventing
> and well defined database solution and figured when I need it, just use
> sqlite. However, it has been a while since I needed this level of
> complexity, so perhaps things have moved on and there are better ways
> now.
>
> --
> Tim Cross
>
>

[-- Attachment #2: Type: text/html, Size: 7026 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: state of the art in org-mode tables e.g. join, etc
  2021-02-21 15:03     ` John Kitchin
@ 2021-02-21 16:23       ` John Kitchin
  2021-02-22  6:52         ` Cook, Malcolm
  2021-02-22  8:27         ` Derek Feichtinger
  0 siblings, 2 replies; 12+ messages in thread
From: John Kitchin @ 2021-02-21 16:23 UTC (permalink / raw)
  To: Tim Cross; +Cc: org-mode-email

[-- Attachment #1: Type: text/plain, Size: 8747 bytes --]

For fun, here is the sqlite equivalent of the Pandas example using the same
tables as before


** aggregation example

Examples from https://github.com/tbanel/orgaggregate


#+NAME: original
| Day       | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday    | Red   |    30 |       11 |
| Monday    | Blue  |    25 |        3 |
| Tuesday   | Red   |    51 |       12 |
| Tuesday   | Red   |    45 |       15 |
| Tuesday   | Blue  |    33 |       18 |
| Wednesday | Red   |    27 |       23 |
| Wednesday | Blue  |    12 |       16 |
| Wednesday | Blue  |    15 |       15 |
| Thursday  | Red   |    39 |       24 |
| Thursday  | Red   |    41 |       29 |
| Thursday  | Red   |    49 |       30 |
| Friday    | Blue  |     7 |        5 |
| Friday    | Blue  |     6 |        8 |
| Friday    | Blue  |    11 |        9 |


#+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes
drop table if exists testtable;
create table testtable(Day str, Color str, Level int, Quantity int);
.mode csv testtable
.import $orgtable testtable
select Color, count(*) from testtable group by Color;
#+end_src

#+RESULTS:
| Color | count(*) |
|-------+----------|
| Blue  |        7 |
| Red   |        7 |

** join example

Example from https://github.com/tbanel/orgtbljoin

#+name: nutrition
| type     | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |


#+name: recipe
| type     | quty |
|----------+------|
| onion    |   70 |
| tomatoe  |  120 |
| eggplant |  300 |
| tofu     |  100 |


#+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames
yes
drop table if exists nutrition;
drop table if exists recipe;
create table nutrition(type str, Fiber float, Sugar float, Protein float,
Carb float);
create table recipe(type str, quty int);

.mode csv nutrition
.import $nut nutrition

.mode csv recipe
.import $rec recipe

select * from recipe, nutrition where recipe.type=nutrition.type;
#+end_src

#+RESULTS:
| type     | quty | type     | Fiber | Sugar | Protein | Carb |
|----------+------+----------+-------+-------+---------+------|
| onion    |   70 | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| tomatoe  |  120 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |


John

-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu



On Sun, Feb 21, 2021 at 10:03 AM John Kitchin <jkitchin@andrew.cmu.edu>
wrote:

> Thanks Tim and Greg. I had mostly come to the same conclusions that it is
> probably best to outsource this. I worked out some examples from
> the orgtbljoin and orgaggregate packages with Pandas below, in case anyone
> is interested in seeing how it works. A key point is using the ":colnames
> no" header args to get the column names for Pandas. It seems like a pretty
> good approach.
>
> * org-mode tables with Pandas
> ** Aggregating from a table
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day       | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday    | Red   |    30 |       11 |
> | Monday    | Blue  |    25 |        3 |
> | Tuesday   | Red   |    51 |       12 |
> | Tuesday   | Red   |    45 |       15 |
> | Tuesday   | Blue  |    33 |       18 |
> | Wednesday | Red   |    27 |       23 |
> | Wednesday | Blue  |    12 |       16 |
> | Wednesday | Blue  |    15 |       15 |
> | Thursday  | Red   |    39 |       24 |
> | Thursday  | Red   |    41 |       29 |
> | Thursday  | Red   |    49 |       30 |
> | Friday    | Blue  |     7 |        5 |
> | Friday    | Blue  |     6 |        8 |
> | Friday    | Blue  |    11 |        9 |
>
>
> #+BEGIN_SRC ipython :var data=original :colnames no
> import pandas as pd
>
> pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [1]:
> # text/plain
> : Color
> : Blue    7
> : Red     7
> : dtype: int64
> :end:
>
> The categorical stuff here is just to get the days sorted the same way as
> the example. It is otherwise not needed. I feel there should be a more
> clever way to do this, but didn't think of it.
>
> #+BEGIN_SRC ipython :var data=original :colnames no
> df = pd.DataFrame(data[1:], columns=data[0])
> days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
> 'Saturday', 'Sunday']
> df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
>
> (df
>  .groupby('Day')
>  .agg({'Level': 'mean',
>        'Quantity': 'sum'})
>  .sort_values('Day'))
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [2]:
> # text/plain
> :            Level  Quantity
> : Day
> : Monday      27.5        14
> : Tuesday     43.0        45
> : Wednesday   18.0        54
> : Thursday    43.0        83
> : Friday       8.0        22
> : Saturday     NaN         0
> : Sunday       NaN         0
>
>
> [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
> :end:
>
> ** Joining tables
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type     | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | egg      |     0 |  18.3 |    31.9 | 18.3 |
> | rice     |   0.2 |     0 |     1.5 | 16.0 |
> | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
> | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
> | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
> | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
> | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
> | corn     |   4.7 |   1.8 |     2.8 | 21.3 |
>
>
> #+name: recipe
> | type     | quty |
> |----------+------|
> | onion    |   70 |
> | tomatoe  |  120 |
> | eggplant |  300 |
> | tofu     |  100 |
>
>
> #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
> nutrition = pd.DataFrame(nut[1:], columns=nut[0])
> rec = pd.DataFrame(recipe[1:], columns=recipe[0])
>
> pd.merge(rec, nutrition, on='type')
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [4]:
> # text/plain
> :        type  quty  Fiber  Sugar  Protein  Carb
> : 0     onion    70    1.3    4.4      1.3   9.0
> : 1   tomatoe   120    0.6    2.1      0.8   3.4
> : 2  eggplant   300    2.5    3.2      0.8   8.6
> : 3      tofu   100    0.7    0.5      6.6   1.4
> :end:
>
>
> John
>
> -----------------------------------
> Professor John Kitchin
> Doherty Hall A207F
> Department of Chemical Engineering
> Carnegie Mellon University
> Pittsburgh, PA 15213
> 412-268-7803
> @johnkitchin
> http://kitchingroup.cheme.cmu.edu
>
>
>
> On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com> wrote:
>
>>
>> Greg Minshall <minshall@umich.edu> writes:
>>
>> > John,
>> >
>> >> Is there a state of the art in using org-tables as little databases
>> >> with joins and stuff?
>> >
>> > i have to admit i do all that with an R code source block.  (the dplyr
>> > package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
>> > ":colnames yes" as a header argument gives you header lines on results.
>> > (maybe that's ?now? for "all" languages?)
>> >
>>
>> For really complex joins and ad hoc queries, I would do similar or put
>> the data into sqlite. For more simple ones, I just define a table which
>> uses table formulas to extract the values from the other tables - the
>> downside being the tables need to have the same data ordering or the
>> formulas need to be somewhat complex. Provided the tables have the same
>> number of records in the same order, table formulas are usually fairly
>> easy.
>>
>> I did think about writing some elisp functions to use in my table
>> formulas to make things easier, but then decided I was just re-inventing
>> and well defined database solution and figured when I need it, just use
>> sqlite. However, it has been a while since I needed this level of
>> complexity, so perhaps things have moved on and there are better ways
>> now.
>>
>> --
>> Tim Cross
>>
>>

[-- Attachment #2: Type: text/html, Size: 11492 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: state of the art in org-mode tables e.g. join, etc
  2021-02-21 16:23       ` John Kitchin
@ 2021-02-22  6:52         ` Cook, Malcolm
  2021-02-22  8:12           ` Greg Minshall
  2021-02-25 14:50           ` John Kitchin
  2021-02-22  8:27         ` Derek Feichtinger
  1 sibling, 2 replies; 12+ messages in thread
From: Cook, Malcolm @ 2021-02-22  6:52 UTC (permalink / raw)
  To: John Kitchin, Tim Cross; +Cc: org-mode-email

[-- Attachment #1: Type: text/plain, Size: 12018 bytes --]


John,

Checkout what R sqldf package makes easy:

** aggregation example

Examples from https://github.com/tbanel/orgaggregate


#+NAME: original
| Day       | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday    | Red   |    30 |       11 |
| Monday    | Blue  |    25 |        3 |
| Tuesday   | Red   |    51 |       12 |
| Tuesday   | Red   |    45 |       15 |
| Tuesday   | Blue  |    33 |       18 |
| Wednesday | Red   |    27 |       23 |
| Wednesday | Blue  |    12 |       16 |
| Wednesday | Blue  |    15 |       15 |
| Thursday  | Red   |    39 |       24 |
| Thursday  | Red   |    41 |       29 |
| Thursday  | Red   |    49 |       30 |
| Friday    | Blue  |     7 |        5 |
| Friday    | Blue  |     6 |        8 |
| Friday    | Blue  |    11 |        9 |

#+PROPERTY: header-args:R  :session *R*

#+begin_src R :results none
library(sqldf)
#+end_src


#+begin_src R :var original=original :colnames yes
sqldf('select Color, count(*) from original group by Color;')
#+end_src

#+RESULTS:
| Color | count(*) |
|-------+----------|
| Blue  |        7 |
| Red   |        7 |



** join example

Example from https://github.com/tbanel/orgtbljoin

#+name: nutrition
| type     | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |


#+name: recipe
| type     | quty |
|----------+------|
| onion    |   70 |
| tomatoe  |  120 |
| eggplant |  300 |
| tofu     |  100 |


#+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes
sqldf('select * from recipe, nutrition where recipe.type=nutrition.type')
#+end_src

#+RESULTS:
| type     | quty | type     | Fiber | Sugar | Protein | Carb |
|----------+------+----------+-------+-------+---------+------|
| onion    |   70 | onion    |   1.3 |   4.4 |     1.3 |    9 |
| tomatoe  |  120 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |



This should also be possible but I cannot get it to work now:

#+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes :prologue sqldf(' :epilogue ')
select * from recipe, nutrition where recipe.type=nutrition.type
#+end_src





From: Emacs-orgmode <emacs-orgmode-bounces+mec=stowers.org@gnu.org> On Behalf Of John Kitchin
Sent: Sunday, February 21, 2021 10:24
To: Tim Cross <theophilusx@gmail.com>
Cc: org-mode-email <emacs-orgmode@gnu.org>
Subject: Re: state of the art in org-mode tables e.g. join, etc

ATTENTION: This email came from an external source. Do not open attachments or click on links from unknown senders or unexpected emails.

For fun, here is the sqlite equivalent of the Pandas example using the same tables as before


** aggregation example

Examples from https://github.com/tbanel/orgaggregate


#+NAME: original
| Day       | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday    | Red   |    30 |       11 |
| Monday    | Blue  |    25 |        3 |
| Tuesday   | Red   |    51 |       12 |
| Tuesday   | Red   |    45 |       15 |
| Tuesday   | Blue  |    33 |       18 |
| Wednesday | Red   |    27 |       23 |
| Wednesday | Blue  |    12 |       16 |
| Wednesday | Blue  |    15 |       15 |
| Thursday  | Red   |    39 |       24 |
| Thursday  | Red   |    41 |       29 |
| Thursday  | Red   |    49 |       30 |
| Friday    | Blue  |     7 |        5 |
| Friday    | Blue  |     6 |        8 |
| Friday    | Blue  |    11 |        9 |


#+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes
drop table if exists testtable;
create table testtable(Day str, Color str, Level int, Quantity int);
.mode csv testtable
.import $orgtable testtable
select Color, count(*) from testtable group by Color;
#+end_src

#+RESULTS:
| Color | count(*) |
|-------+----------|
| Blue  |        7 |
| Red   |        7 |

** join example

Example from https://github.com/tbanel/orgtbljoin

#+name: nutrition
| type     | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |


#+name: recipe
| type     | quty |
|----------+------|
| onion    |   70 |
| tomatoe  |  120 |
| eggplant |  300 |
| tofu     |  100 |


#+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames yes
drop table if exists nutrition;
drop table if exists recipe;
create table nutrition(type str, Fiber float, Sugar float, Protein float, Carb float);
create table recipe(type str, quty int);

.mode csv nutrition
.import $nut nutrition

.mode csv recipe
.import $rec recipe

select * from recipe, nutrition where recipe.type=nutrition.type;
#+end_src

#+RESULTS:
| type     | quty | type     | Fiber | Sugar | Protein | Carb |
|----------+------+----------+-------+-------+---------+------|
| onion    |   70 | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| tomatoe  |  120 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |


John

-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


On Sun, Feb 21, 2021 at 10:03 AM John Kitchin <jkitchin@andrew.cmu.edu<mailto:jkitchin@andrew.cmu.edu>> wrote:
Thanks Tim and Greg. I had mostly come to the same conclusions that it is probably best to outsource this. I worked out some examples from the orgtbljoin and orgaggregate packages with Pandas below, in case anyone is interested in seeing how it works. A key point is using the ":colnames no" header args to get the column names for Pandas. It seems like a pretty good approach.

* org-mode tables with Pandas
** Aggregating from a table

Examples from https://github.com/tbanel/orgaggregate


#+NAME: original
| Day       | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday    | Red   |    30 |       11 |
| Monday    | Blue  |    25 |        3 |
| Tuesday   | Red   |    51 |       12 |
| Tuesday   | Red   |    45 |       15 |
| Tuesday   | Blue  |    33 |       18 |
| Wednesday | Red   |    27 |       23 |
| Wednesday | Blue  |    12 |       16 |
| Wednesday | Blue  |    15 |       15 |
| Thursday  | Red   |    39 |       24 |
| Thursday  | Red   |    41 |       29 |
| Thursday  | Red   |    49 |       30 |
| Friday    | Blue  |     7 |        5 |
| Friday    | Blue  |     6 |        8 |
| Friday    | Blue  |    11 |        9 |


#+BEGIN_SRC ipython :var data=original :colnames no
import pandas as pd

pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
#+END_SRC

#+RESULTS:
:results:
# Out [1]:
# text/plain
: Color
: Blue    7
: Red     7
: dtype: int64
:end:

The categorical stuff here is just to get the days sorted the same way as the example. It is otherwise not needed. I feel there should be a more clever way to do this, but didn't think of it.

#+BEGIN_SRC ipython :var data=original :colnames no
df = pd.DataFrame(data[1:], columns=data[0])
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)

(df
 .groupby('Day')
 .agg({'Level': 'mean',
       'Quantity': 'sum'})
 .sort_values('Day'))
#+END_SRC

#+RESULTS:
:results:
# Out [2]:
# text/plain
:            Level  Quantity
: Day
: Monday      27.5        14
: Tuesday     43.0        45
: Wednesday   18.0        54
: Thursday    43.0        83
: Friday       8.0        22
: Saturday     NaN         0
: Sunday       NaN         0

[[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
:end:

** Joining tables

Example from https://github.com/tbanel/orgtbljoin

#+name: nutrition
| type     | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |


#+name: recipe
| type     | quty |
|----------+------|
| onion    |   70 |
| tomatoe  |  120 |
| eggplant |  300 |
| tofu     |  100 |


#+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
nutrition = pd.DataFrame(nut[1:], columns=nut[0])
rec = pd.DataFrame(recipe[1:], columns=recipe[0])

pd.merge(rec, nutrition, on='type')
#+END_SRC

#+RESULTS:
:results:
# Out [4]:
# text/plain
:        type  quty  Fiber  Sugar  Protein  Carb
: 0     onion    70    1.3    4.4      1.3   9.0
: 1   tomatoe   120    0.6    2.1      0.8   3.4
: 2  eggplant   300    2.5    3.2      0.8   8.6
: 3      tofu   100    0.7    0.5      6.6   1.4
:end:


John

-----------------------------------
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com<mailto:theophilusx@gmail.com>> wrote:

Greg Minshall <minshall@umich.edu<mailto:minshall@umich.edu>> writes:

> John,
>
>> Is there a state of the art in using org-tables as little databases
>> with joins and stuff?
>
> i have to admit i do all that with an R code source block.  (the dplyr
> package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
> ":colnames yes" as a header argument gives you header lines on results.
> (maybe that's ?now? for "all" languages?)
>

For really complex joins and ad hoc queries, I would do similar or put
the data into sqlite. For more simple ones, I just define a table which
uses table formulas to extract the values from the other tables - the
downside being the tables need to have the same data ordering or the
formulas need to be somewhat complex. Provided the tables have the same
number of records in the same order, table formulas are usually fairly
easy.

I did think about writing some elisp functions to use in my table
formulas to make things easier, but then decided I was just re-inventing
and well defined database solution and figured when I need it, just use
sqlite. However, it has been a while since I needed this level of
complexity, so perhaps things have moved on and there are better ways
now.

--
Tim Cross

[-- Attachment #2: Type: text/html, Size: 26562 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: state of the art in org-mode tables e.g. join, etc
  2021-02-22  6:52         ` Cook, Malcolm
@ 2021-02-22  8:12           ` Greg Minshall
  2021-02-22 15:21             ` Cook, Malcolm
  2021-02-25 14:50           ` John Kitchin
  1 sibling, 1 reply; 12+ messages in thread
From: Greg Minshall @ 2021-02-22  8:12 UTC (permalink / raw)
  To: Cook, Malcolm; +Cc: Tim Cross, org-mode-email, John Kitchin

Malcolm,

> Checkout what R sqldf package makes easy:

very nice!

Greg

ps -- (feeling a challenge... :) for base R, dplyr::inner_join, the
following seem to work (i apologize that i don't know how people embed
org-frags in e-mail, or how important that format might be?)
----
 #+NAME: original
 | Day       | Color | Level | Quantity |
 |-----------+-------+-------+----------|
 | Monday    | Red   |    30 |       11 |
 | Monday    | Blue  |    25 |        3 |
 | Tuesday   | Red   |    51 |       12 |
 | Tuesday   | Red   |    45 |       15 |
 | Tuesday   | Blue  |    33 |       18 |
 | Wednesday | Red   |    27 |       23 |
 | Wednesday | Blue  |    12 |       16 |
 | Wednesday | Blue  |    15 |       15 |
 | Thursday  | Red   |    39 |       24 |
 | Thursday  | Red   |    41 |       29 |
 | Thursday  | Red   |    49 |       30 |
 | Friday    | Blue  |     7 |        5 |
 | Friday    | Blue  |     6 |        8 |
 | Friday    | Blue  |    11 |        9 |

 #+PROPERTY: header-args:R  :session *R*
 #+begin_src R :results none
   library(dplyr)
 #+end_src

 #+begin_src R :var original=original :colnames yes
   as.data.frame(table(Color=original$Color))
 #+end_src

 #+RESULTS:
 | Color | Freq |
 |-------+------|
 | Blue  |    7 |
 | Red   |    7 |



*** join example

 Example from https://github.com/tbanel/orgtbljoin

 #+name: nutrition
 | type     | Fiber | Sugar | Protein | Carb |
 |----------+-------+-------+---------+------|
 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
 | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
 | egg      |     0 |  18.3 |    31.9 | 18.3 |
 | rice     |   0.2 |     0 |     1.5 | 16.0 |
 | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
 | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
 | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
 | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
 | corn     |   4.7 |   1.8 |     2.8 | 21.3 |


 #+name: recipe
 | type     | quty |
 |----------+------|
 | onion    |   70 |
 | tomatoe  |  120 |
 | eggplant |  300 |
 | tofu     |  100 |


 #+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes
   dplyr::inner_join(nutrition, recipe)
 #+end_src

 #+RESULTS:
 | type     | Fiber | Sugar | Protein | Carb | quty |
 |----------+-------+-------+---------+------+------|
 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |  300 |
 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |  120 |
 | onion    |   1.3 |   4.4 |     1.3 |    9 |   70 |
 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |  100 |


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: state of the art in org-mode tables e.g. join, etc
  2021-02-21 16:23       ` John Kitchin
  2021-02-22  6:52         ` Cook, Malcolm
@ 2021-02-22  8:27         ` Derek Feichtinger
  2021-02-24 22:21           ` John Kitchin
  1 sibling, 1 reply; 12+ messages in thread
From: Derek Feichtinger @ 2021-02-22  8:27 UTC (permalink / raw)
  To: John Kitchin; +Cc: Tim Cross, emacs-orgmode

Hi John,

I invested time some years ago in preparing babel examples, and a lot of
the description went into using tables. The most detailed documents I
had for elisp and python.

In order to be productive, e.g. for producing all kinds of scientific
graphs, but also for doing the finances and planning for our scientific
computing section I ended up the same as you with mostly going to python
and leveraging Pandas. I think all of us end up using ":colnames no" as
the most convenient solution.

https://github.com/dfeich/org-babel-examples/blob/master/python3/python3-babel.org

(especially look at the Pandas section 10)

In that file I also tangle a python library "orgbabelhelper" that is
available in Conda and PyPi. I mainly use that to work with my tables.

Best regards
Derek

-- 
Paul Scherrer Institut
Dr. Derek Feichtinger                   Phone:   +41 56 310 47 33
Group Head HPC and Emerging Technologies  Email: derek.feichtinger@psi.ch
Building/Room No. OHSA/D17
Forschungsstrasse 111
CH-5232 Villigen PSI 

On Sun, Feb 21 2021, John Kitchin <jkitchin@andrew.cmu.edu> wrote:

> For fun, here is the sqlite equivalent of the Pandas example using the same
> tables as before
>
>
> ** aggregation example
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day       | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday    | Red   |    30 |       11 |
> | Monday    | Blue  |    25 |        3 |
> | Tuesday   | Red   |    51 |       12 |
> | Tuesday   | Red   |    45 |       15 |
> | Tuesday   | Blue  |    33 |       18 |
> | Wednesday | Red   |    27 |       23 |
> | Wednesday | Blue  |    12 |       16 |
> | Wednesday | Blue  |    15 |       15 |
> | Thursday  | Red   |    39 |       24 |
> | Thursday  | Red   |    41 |       29 |
> | Thursday  | Red   |    49 |       30 |
> | Friday    | Blue  |     7 |        5 |
> | Friday    | Blue  |     6 |        8 |
> | Friday    | Blue  |    11 |        9 |
>
>
> #+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes
> drop table if exists testtable;
> create table testtable(Day str, Color str, Level int, Quantity int);
> .mode csv testtable
> .import $orgtable testtable
> select Color, count(*) from testtable group by Color;
> #+end_src
>
> #+RESULTS:
> | Color | count(*) |
> |-------+----------|
> | Blue  |        7 |
> | Red   |        7 |
>
> ** join example
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type     | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | egg      |     0 |  18.3 |    31.9 | 18.3 |
> | rice     |   0.2 |     0 |     1.5 | 16.0 |
> | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
> | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
> | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
> | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
> | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
> | corn     |   4.7 |   1.8 |     2.8 | 21.3 |
>
>
> #+name: recipe
> | type     | quty |
> |----------+------|
> | onion    |   70 |
> | tomatoe  |  120 |
> | eggplant |  300 |
> | tofu     |  100 |
>
>
> #+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames
> yes
> drop table if exists nutrition;
> drop table if exists recipe;
> create table nutrition(type str, Fiber float, Sugar float, Protein float,
> Carb float);
> create table recipe(type str, quty int);
>
> .mode csv nutrition
> .import $nut nutrition
>
> .mode csv recipe
> .import $rec recipe
>
> select * from recipe, nutrition where recipe.type=nutrition.type;
> #+end_src
>
> #+RESULTS:
> | type     | quty | type     | Fiber | Sugar | Protein | Carb |
> |----------+------+----------+-------+-------+---------+------|
> | onion    |   70 | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | tomatoe  |  120 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | eggplant |  300 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tofu     |  100 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
>
>
> John
>
> -----------------------------------
> Professor John Kitchin
> Doherty Hall A207F
> Department of Chemical Engineering
> Carnegie Mellon University
> Pittsburgh, PA 15213
> 412-268-7803
> @johnkitchin
> http://kitchingroup.cheme.cmu.edu
>
>
>
> On Sun, Feb 21, 2021 at 10:03 AM John Kitchin <jkitchin@andrew.cmu.edu>
> wrote:
>
>> Thanks Tim and Greg. I had mostly come to the same conclusions that it is
>> probably best to outsource this. I worked out some examples from
>> the orgtbljoin and orgaggregate packages with Pandas below, in case anyone
>> is interested in seeing how it works. A key point is using the ":colnames
>> no" header args to get the column names for Pandas. It seems like a pretty
>> good approach.
>>
>> * org-mode tables with Pandas
>> ** Aggregating from a table
>>
>> Examples from https://github.com/tbanel/orgaggregate
>>
>>
>> #+NAME: original
>> | Day       | Color | Level | Quantity |
>> |-----------+-------+-------+----------|
>> | Monday    | Red   |    30 |       11 |
>> | Monday    | Blue  |    25 |        3 |
>> | Tuesday   | Red   |    51 |       12 |
>> | Tuesday   | Red   |    45 |       15 |
>> | Tuesday   | Blue  |    33 |       18 |
>> | Wednesday | Red   |    27 |       23 |
>> | Wednesday | Blue  |    12 |       16 |
>> | Wednesday | Blue  |    15 |       15 |
>> | Thursday  | Red   |    39 |       24 |
>> | Thursday  | Red   |    41 |       29 |
>> | Thursday  | Red   |    49 |       30 |
>> | Friday    | Blue  |     7 |        5 |
>> | Friday    | Blue  |     6 |        8 |
>> | Friday    | Blue  |    11 |        9 |
>>
>>
>> #+BEGIN_SRC ipython :var data=original :colnames no
>> import pandas as pd
>>
>> pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
>> #+END_SRC
>>
>> #+RESULTS:
>> :results:
>> # Out [1]:
>> # text/plain
>> : Color
>> : Blue    7
>> : Red     7
>> : dtype: int64
>> :end:
>>
>> The categorical stuff here is just to get the days sorted the same way as
>> the example. It is otherwise not needed. I feel there should be a more
>> clever way to do this, but didn't think of it.
>>
>> #+BEGIN_SRC ipython :var data=original :colnames no
>> df = pd.DataFrame(data[1:], columns=data[0])
>> days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
>> 'Saturday', 'Sunday']
>> df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
>>
>> (df
>>  .groupby('Day')
>>  .agg({'Level': 'mean',
>>        'Quantity': 'sum'})
>>  .sort_values('Day'))
>> #+END_SRC
>>
>> #+RESULTS:
>> :results:
>> # Out [2]:
>> # text/plain
>> :            Level  Quantity
>> : Day
>> : Monday      27.5        14
>> : Tuesday     43.0        45
>> : Wednesday   18.0        54
>> : Thursday    43.0        83
>> : Friday       8.0        22
>> : Saturday     NaN         0
>> : Sunday       NaN         0
>>
>>
>> [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
>> :end:
>>
>> ** Joining tables
>>
>> Example from https://github.com/tbanel/orgtbljoin
>>
>> #+name: nutrition
>> | type     | Fiber | Sugar | Protein | Carb |
>> |----------+-------+-------+---------+------|
>> | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
>> | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
>> | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
>> | egg      |     0 |  18.3 |    31.9 | 18.3 |
>> | rice     |   0.2 |     0 |     1.5 | 16.0 |
>> | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
>> | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
>> | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
>> | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
>> | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
>> | corn     |   4.7 |   1.8 |     2.8 | 21.3 |
>>
>>
>> #+name: recipe
>> | type     | quty |
>> |----------+------|
>> | onion    |   70 |
>> | tomatoe  |  120 |
>> | eggplant |  300 |
>> | tofu     |  100 |
>>
>>
>> #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
>> nutrition = pd.DataFrame(nut[1:], columns=nut[0])
>> rec = pd.DataFrame(recipe[1:], columns=recipe[0])
>>
>> pd.merge(rec, nutrition, on='type')
>> #+END_SRC
>>
>> #+RESULTS:
>> :results:
>> # Out [4]:
>> # text/plain
>> :        type  quty  Fiber  Sugar  Protein  Carb
>> : 0     onion    70    1.3    4.4      1.3   9.0
>> : 1   tomatoe   120    0.6    2.1      0.8   3.4
>> : 2  eggplant   300    2.5    3.2      0.8   8.6
>> : 3      tofu   100    0.7    0.5      6.6   1.4
>> :end:
>>
>>
>> John
>>
>> -----------------------------------
>> Professor John Kitchin
>> Doherty Hall A207F
>> Department of Chemical Engineering
>> Carnegie Mellon University
>> Pittsburgh, PA 15213
>> 412-268-7803
>> @johnkitchin
>> http://kitchingroup.cheme.cmu.edu
>>
>>
>>
>> On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com> wrote:
>>
>>>
>>> Greg Minshall <minshall@umich.edu> writes:
>>>
>>> > John,
>>> >
>>> >> Is there a state of the art in using org-tables as little databases
>>> >> with joins and stuff?
>>> >
>>> > i have to admit i do all that with an R code source block.  (the dplyr
>>> > package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
>>> > ":colnames yes" as a header argument gives you header lines on results.
>>> > (maybe that's ?now? for "all" languages?)
>>> >
>>>
>>> For really complex joins and ad hoc queries, I would do similar or put
>>> the data into sqlite. For more simple ones, I just define a table which
>>> uses table formulas to extract the values from the other tables - the
>>> downside being the tables need to have the same data ordering or the
>>> formulas need to be somewhat complex. Provided the tables have the same
>>> number of records in the same order, table formulas are usually fairly
>>> easy.
>>>
>>> I did think about writing some elisp functions to use in my table
>>> formulas to make things easier, but then decided I was just re-inventing
>>> and well defined database solution and figured when I need it, just use
>>> sqlite. However, it has been a while since I needed this level of
>>> complexity, so perhaps things have moved on and there are better ways
>>> now.
>>>
>>> --
>>> Tim Cross
>>>
>>>




^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: state of the art in org-mode tables e.g. join, etc
  2021-02-22  8:12           ` Greg Minshall
@ 2021-02-22 15:21             ` Cook, Malcolm
  2021-02-22 18:41               ` Greg Minshall
  0 siblings, 1 reply; 12+ messages in thread
From: Cook, Malcolm @ 2021-02-22 15:21 UTC (permalink / raw)
  To: Greg Minshall; +Cc: Tim Cross, org-mode-email, John Kitchin

[-- Attachment #1: Type: text/plain, Size: 2901 bytes --]

Greg,

Of course, I’m not surprised by the results of your efforts.  Nice!

I myself don’t prefer the tidyverse, mainly except for ggplot, and instead find myself reaching for sqldf or data.tables where such benefit is needed.

YMMV,

Malcolm

From: Greg Minshall <minshall@umich.edu>
Sent: Monday, February 22, 2021 02:13
To: Cook, Malcolm <MEC@stowers.org>
Cc: John Kitchin <jkitchin@andrew.cmu.edu>; Tim Cross <theophilusx@gmail.com>; org-mode-email <emacs-orgmode@gnu.org>
Subject: Re: state of the art in org-mode tables e.g. join, etc

ATTENTION: This email came from an external source. Do not open attachments or click on links from unknown senders or unexpected emails.


Malcolm,

> Checkout what R sqldf package makes easy:

very nice!

Greg

ps -- (feeling a challenge... :) for base R, dplyr::inner_join, the
following seem to work (i apologize that i don't know how people embed
org-frags in e-mail, or how important that format might be?)
----
#+NAME: original
| Day | Color | Level | Quantity |
|-----------+-------+-------+----------|
| Monday | Red | 30 | 11 |
| Monday | Blue | 25 | 3 |
| Tuesday | Red | 51 | 12 |
| Tuesday | Red | 45 | 15 |
| Tuesday | Blue | 33 | 18 |
| Wednesday | Red | 27 | 23 |
| Wednesday | Blue | 12 | 16 |
| Wednesday | Blue | 15 | 15 |
| Thursday | Red | 39 | 24 |
| Thursday | Red | 41 | 29 |
| Thursday | Red | 49 | 30 |
| Friday | Blue | 7 | 5 |
| Friday | Blue | 6 | 8 |
| Friday | Blue | 11 | 9 |

#+PROPERTY: header-args:R :session *R*
#+begin_src R :results none
library(dplyr)
#+end_src

#+begin_src R :var original=original :colnames yes
as.data.frame(table(Color=original$Color))
#+end_src

#+RESULTS:
| Color | Freq |
|-------+------|
| Blue | 7 |
| Red | 7 |



*** join example

Example from https://github.com/tbanel/orgtbljoin

#+name: nutrition
| type | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
| tomatoe | 0.6 | 2.1 | 0.8 | 3.4 |
| onion | 1.3 | 4.4 | 1.3 | 9.0 |
| egg | 0 | 18.3 | 31.9 | 18.3 |
| rice | 0.2 | 0 | 1.5 | 16.0 |
| bread | 0.7 | 0.7 | 3.3 | 16.0 |
| orange | 3.1 | 11.9 | 1.3 | 17.6 |
| banana | 2.1 | 9.9 | 0.9 | 18.5 |
| tofu | 0.7 | 0.5 | 6.6 | 1.4 |
| nut | 2.6 | 1.3 | 4.9 | 7.2 |
| corn | 4.7 | 1.8 | 2.8 | 21.3 |


#+name: recipe
| type | quty |
|----------+------|
| onion | 70 |
| tomatoe | 120 |
| eggplant | 300 |
| tofu | 100 |


#+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes
dplyr::inner_join(nutrition, recipe)
#+end_src

#+RESULTS:
| type | Fiber | Sugar | Protein | Carb | quty |
|----------+-------+-------+---------+------+------|
| eggplant | 2.5 | 3.2 | 0.8 | 8.6 | 300 |
| tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | 120 |
| onion | 1.3 | 4.4 | 1.3 | 9 | 70 |
| tofu | 0.7 | 0.5 | 6.6 | 1.4 | 100 |

[-- Attachment #2: Type: text/html, Size: 6130 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: state of the art in org-mode tables e.g. join, etc
  2021-02-22 15:21             ` Cook, Malcolm
@ 2021-02-22 18:41               ` Greg Minshall
  0 siblings, 0 replies; 12+ messages in thread
From: Greg Minshall @ 2021-02-22 18:41 UTC (permalink / raw)
  To: Cook, Malcolm; +Cc: Tim Cross, org-mode-email, John Kitchin

Malcolm, thanks, and, yes, i'm of mixed mind, myself.  cheers, Greg


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: state of the art in org-mode tables e.g. join, etc
  2021-02-22  8:27         ` Derek Feichtinger
@ 2021-02-24 22:21           ` John Kitchin
  0 siblings, 0 replies; 12+ messages in thread
From: John Kitchin @ 2021-02-24 22:21 UTC (permalink / raw)
  To: Derek Feichtinger; +Cc: Tim Cross, emacs-orgmode

Thanks for the link! It looks like some useful functions there. It would
be nice to integrate some of those with the rich output of a Jupyter
kernel so you could get native org tables automatically in org-mode.

Derek Feichtinger <derek.feichtinger@psi.ch> writes:

> Hi John,
>
> I invested time some years ago in preparing babel examples, and a lot of
> the description went into using tables. The most detailed documents I
> had for elisp and python.
>
> In order to be productive, e.g. for producing all kinds of scientific
> graphs, but also for doing the finances and planning for our scientific
> computing section I ended up the same as you with mostly going to python
> and leveraging Pandas. I think all of us end up using ":colnames no" as
> the most convenient solution.
>
> https://github.com/dfeich/org-babel-examples/blob/master/python3/python3-babel.org
>
> (especially look at the Pandas section 10)
>
> In that file I also tangle a python library "orgbabelhelper" that is
> available in Conda and PyPi. I mainly use that to work with my tables.
>
> Best regards
> Derek


--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: state of the art in org-mode tables e.g. join, etc
  2021-02-22  6:52         ` Cook, Malcolm
  2021-02-22  8:12           ` Greg Minshall
@ 2021-02-25 14:50           ` John Kitchin
  1 sibling, 0 replies; 12+ messages in thread
From: John Kitchin @ 2021-02-25 14:50 UTC (permalink / raw)
  To: Cook, Malcolm; +Cc: Tim Cross, org-mode-email

That is remarkably slim code to get those results!

Cook, Malcolm <MEC@stowers.org> writes:

> John,
>
> Checkout what R sqldf package makes easy:
>
> ** aggregation example
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day       | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday    | Red   |    30 |       11 |
> | Monday    | Blue  |    25 |        3 |
> | Tuesday   | Red   |    51 |       12 |
> | Tuesday   | Red   |    45 |       15 |
> | Tuesday   | Blue  |    33 |       18 |
> | Wednesday | Red   |    27 |       23 |
> | Wednesday | Blue  |    12 |       16 |
> | Wednesday | Blue  |    15 |       15 |
> | Thursday  | Red   |    39 |       24 |
> | Thursday  | Red   |    41 |       29 |
> | Thursday  | Red   |    49 |       30 |
> | Friday    | Blue  |     7 |        5 |
> | Friday    | Blue  |     6 |        8 |
> | Friday    | Blue  |    11 |        9 |
>
> #+PROPERTY: header-args:R  :session *R*
>
> #+begin_src R :results none
> library(sqldf)
> #+end_src
>
>
> #+begin_src R :var original=original :colnames yes
> sqldf('select Color, count(*) from original group by Color;')
> #+end_src
>
> #+RESULTS:
> | Color | count(*) |
> |-------+----------|
> | Blue  |        7 |
> | Red   |        7 |
>
>
>
> ** join example
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type     | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | egg      |     0 |  18.3 |    31.9 | 18.3 |
> | rice     |   0.2 |     0 |     1.5 | 16.0 |
> | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
> | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
> | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
> | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
> | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
> | corn     |   4.7 |   1.8 |     2.8 | 21.3 |
>
>
> #+name: recipe
> | type     | quty |
> |----------+------|
> | onion    |   70 |
> | tomatoe  |  120 |
> | eggplant |  300 |
> | tofu     |  100 |
>
>
> #+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes
> sqldf('select * from recipe, nutrition where recipe.type=nutrition.type')
> #+end_src
>
> #+RESULTS:
> | type     | quty | type     | Fiber | Sugar | Protein | Carb |
> |----------+------+----------+-------+-------+---------+------|
> | onion    |   70 | onion    |   1.3 |   4.4 |     1.3 |    9 |
> | tomatoe  |  120 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | eggplant |  300 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tofu     |  100 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
>
>
>
> This should also be possible but I cannot get it to work now:
>
> #+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes :prologue sqldf(' :epilogue ')
> select * from recipe, nutrition where recipe.type=nutrition.type
> #+end_src
>
>
>
>
>
> From: Emacs-orgmode <emacs-orgmode-bounces+mec=stowers.org@gnu.org> On Behalf Of John Kitchin
> Sent: Sunday, February 21, 2021 10:24
> To: Tim Cross <theophilusx@gmail.com>
> Cc: org-mode-email <emacs-orgmode@gnu.org>
> Subject: Re: state of the art in org-mode tables e.g. join, etc
>
> ATTENTION: This email came from an external source. Do not open attachments or click on links from unknown senders or unexpected emails.
>
> For fun, here is the sqlite equivalent of the Pandas example using the same tables as before
>
>
> ** aggregation example
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day       | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday    | Red   |    30 |       11 |
> | Monday    | Blue  |    25 |        3 |
> | Tuesday   | Red   |    51 |       12 |
> | Tuesday   | Red   |    45 |       15 |
> | Tuesday   | Blue  |    33 |       18 |
> | Wednesday | Red   |    27 |       23 |
> | Wednesday | Blue  |    12 |       16 |
> | Wednesday | Blue  |    15 |       15 |
> | Thursday  | Red   |    39 |       24 |
> | Thursday  | Red   |    41 |       29 |
> | Thursday  | Red   |    49 |       30 |
> | Friday    | Blue  |     7 |        5 |
> | Friday    | Blue  |     6 |        8 |
> | Friday    | Blue  |    11 |        9 |
>
>
> #+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes
> drop table if exists testtable;
> create table testtable(Day str, Color str, Level int, Quantity int);
> .mode csv testtable
> .import $orgtable testtable
> select Color, count(*) from testtable group by Color;
> #+end_src
>
> #+RESULTS:
> | Color | count(*) |
> |-------+----------|
> | Blue  |        7 |
> | Red   |        7 |
>
> ** join example
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type     | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | egg      |     0 |  18.3 |    31.9 | 18.3 |
> | rice     |   0.2 |     0 |     1.5 | 16.0 |
> | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
> | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
> | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
> | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
> | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
> | corn     |   4.7 |   1.8 |     2.8 | 21.3 |
>
>
> #+name: recipe
> | type     | quty |
> |----------+------|
> | onion    |   70 |
> | tomatoe  |  120 |
> | eggplant |  300 |
> | tofu     |  100 |
>
>
> #+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames yes
> drop table if exists nutrition;
> drop table if exists recipe;
> create table nutrition(type str, Fiber float, Sugar float, Protein float, Carb float);
> create table recipe(type str, quty int);
>
> .mode csv nutrition
> .import $nut nutrition
>
> .mode csv recipe
> .import $rec recipe
>
> select * from recipe, nutrition where recipe.type=nutrition.type;
> #+end_src
>
> #+RESULTS:
> | type     | quty | type     | Fiber | Sugar | Protein | Carb |
> |----------+------+----------+-------+-------+---------+------|
> | onion    |   70 | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | tomatoe  |  120 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | eggplant |  300 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tofu     |  100 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
>
>
> John
>
> -----------------------------------
> Professor John Kitchin
> Doherty Hall A207F
> Department of Chemical Engineering
> Carnegie Mellon University
> Pittsburgh, PA 15213
> 412-268-7803
> @johnkitchin
> http://kitchingroup.cheme.cmu.edu
>
>
> On Sun, Feb 21, 2021 at 10:03 AM John Kitchin <jkitchin@andrew.cmu.edu<mailto:jkitchin@andrew.cmu.edu>> wrote:
> Thanks Tim and Greg. I had mostly come to the same conclusions that it is probably best to outsource this. I worked out some examples from the orgtbljoin and orgaggregate packages with Pandas below, in case anyone is interested in seeing how it works. A key point is using the ":colnames no" header args to get the column names for Pandas. It seems like a pretty good approach.
>
> * org-mode tables with Pandas
> ** Aggregating from a table
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day       | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday    | Red   |    30 |       11 |
> | Monday    | Blue  |    25 |        3 |
> | Tuesday   | Red   |    51 |       12 |
> | Tuesday   | Red   |    45 |       15 |
> | Tuesday   | Blue  |    33 |       18 |
> | Wednesday | Red   |    27 |       23 |
> | Wednesday | Blue  |    12 |       16 |
> | Wednesday | Blue  |    15 |       15 |
> | Thursday  | Red   |    39 |       24 |
> | Thursday  | Red   |    41 |       29 |
> | Thursday  | Red   |    49 |       30 |
> | Friday    | Blue  |     7 |        5 |
> | Friday    | Blue  |     6 |        8 |
> | Friday    | Blue  |    11 |        9 |
>
>
> #+BEGIN_SRC ipython :var data=original :colnames no
> import pandas as pd
>
> pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [1]:
> # text/plain
> : Color
> : Blue    7
> : Red     7
> : dtype: int64
> :end:
>
> The categorical stuff here is just to get the days sorted the same way as the example. It is otherwise not needed. I feel there should be a more clever way to do this, but didn't think of it.
>
> #+BEGIN_SRC ipython :var data=original :colnames no
> df = pd.DataFrame(data[1:], columns=data[0])
> days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
> df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
>
> (df
>  .groupby('Day')
>  .agg({'Level': 'mean',
>        'Quantity': 'sum'})
>  .sort_values('Day'))
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [2]:
> # text/plain
> :            Level  Quantity
> : Day
> : Monday      27.5        14
> : Tuesday     43.0        45
> : Wednesday   18.0        54
> : Thursday    43.0        83
> : Friday       8.0        22
> : Saturday     NaN         0
> : Sunday       NaN         0
>
> [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
> :end:
>
> ** Joining tables
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type     | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | egg      |     0 |  18.3 |    31.9 | 18.3 |
> | rice     |   0.2 |     0 |     1.5 | 16.0 |
> | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
> | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
> | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
> | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
> | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
> | corn     |   4.7 |   1.8 |     2.8 | 21.3 |
>
>
> #+name: recipe
> | type     | quty |
> |----------+------|
> | onion    |   70 |
> | tomatoe  |  120 |
> | eggplant |  300 |
> | tofu     |  100 |
>
>
> #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
> nutrition = pd.DataFrame(nut[1:], columns=nut[0])
> rec = pd.DataFrame(recipe[1:], columns=recipe[0])
>
> pd.merge(rec, nutrition, on='type')
> #+END_SRC
>
> #+RESULTS:
> :results:
> # Out [4]:
> # text/plain
> :        type  quty  Fiber  Sugar  Protein  Carb
> : 0     onion    70    1.3    4.4      1.3   9.0
> : 1   tomatoe   120    0.6    2.1      0.8   3.4
> : 2  eggplant   300    2.5    3.2      0.8   8.6
> : 3      tofu   100    0.7    0.5      6.6   1.4
> :end:
>
>
> John
>
> -----------------------------------
> Professor John Kitchin
> Doherty Hall A207F
> Department of Chemical Engineering
> Carnegie Mellon University
> Pittsburgh, PA 15213
> 412-268-7803
> @johnkitchin
> http://kitchingroup.cheme.cmu.edu
>
>
> On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com<mailto:theophilusx@gmail.com>> wrote:
>
> Greg Minshall <minshall@umich.edu<mailto:minshall@umich.edu>> writes:
>
>> John,
>>
>>> Is there a state of the art in using org-tables as little databases
>>> with joins and stuff?
>>
>> i have to admit i do all that with an R code source block.  (the dplyr
>> package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
>> ":colnames yes" as a header argument gives you header lines on results.
>> (maybe that's ?now? for "all" languages?)
>>
>
> For really complex joins and ad hoc queries, I would do similar or put
> the data into sqlite. For more simple ones, I just define a table which
> uses table formulas to extract the values from the other tables - the
> downside being the tables need to have the same data ordering or the
> formulas need to be somewhat complex. Provided the tables have the same
> number of records in the same order, table formulas are usually fairly
> easy.
>
> I did think about writing some elisp functions to use in my table
> formulas to make things easier, but then decided I was just re-inventing
> and well defined database solution and figured when I need it, just use
> sqlite. However, it has been a while since I needed this level of
> complexity, so perhaps things have moved on and there are better ways
> now.


--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-02-25 14:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-02-20 21:15 state of the art in org-mode tables e.g. join, etc John Kitchin
2021-02-21  4:40 ` Greg Minshall
2021-02-21  6:45   ` Tim Cross
2021-02-21 15:03     ` John Kitchin
2021-02-21 16:23       ` John Kitchin
2021-02-22  6:52         ` Cook, Malcolm
2021-02-22  8:12           ` Greg Minshall
2021-02-22 15:21             ` Cook, Malcolm
2021-02-22 18:41               ` Greg Minshall
2021-02-25 14:50           ` John Kitchin
2021-02-22  8:27         ` Derek Feichtinger
2021-02-24 22:21           ` John Kitchin

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).