From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp1 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id eBv7AhF3MmCqOAAA0tVLHw (envelope-from ) for ; Sun, 21 Feb 2021 15:06:57 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp1 with LMTPS id B2w9OhB3MmD0IgAAbx9fmQ (envelope-from ) for ; Sun, 21 Feb 2021 15:06:56 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 76629B21F for ; Sun, 21 Feb 2021 16:06:55 +0100 (CET) Received: from localhost ([::1]:57322 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lDqJy-0002qU-8t for larch@yhetil.org; Sun, 21 Feb 2021 10:06:54 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:38048) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lDqGU-00007d-84 for emacs-orgmode@gnu.org; Sun, 21 Feb 2021 10:03:20 -0500 Received: from mail-wm1-x331.google.com ([2a00:1450:4864:20::331]:39703) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1lDqGP-0005mR-SV for emacs-orgmode@gnu.org; Sun, 21 Feb 2021 10:03:16 -0500 Received: by mail-wm1-x331.google.com with SMTP id v62so12077575wmg.4 for ; Sun, 21 Feb 2021 07:03:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=andrew-cmu-edu.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=E1gIaLPBBZQiv+xjtU8ikXKA6SSBIx9huqbpcPKvpuY=; b=1OOMHxsTIMeOaHwsxhnoOlpA0ZioDyMMWFPKwzT5NGZWbyNZk355V/Y/QxXmfEjgnM MSYx5ZLz5AEAPsLq6LTaQoEMekhXLC6b/GDN3LhWiiOnGNW23q5iBrExwBjMkW+Zegb9 Os193cHczI2gWObiWwls56Zvj7sUvzRMLo6SWxEO28BXOSNhbgwFTuqskjBzDJFkN7TW 28ioIheiW0XNVU2Zw6r0mVh41OiREeHdip6NNa1tIp3bxEcpD8A2cS5cti2g4di3Eqw6 8zegWBy/z75Wa8a14E4BWmUbg3UdNKN/iR48FXOk8H3ziq7TcbMXoxlTx9L2PJxQqphE i8lA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=E1gIaLPBBZQiv+xjtU8ikXKA6SSBIx9huqbpcPKvpuY=; b=SM2uWeOsWsVXsFZIwsoroE8Cn/f5KCqvAdgcsIAgOoTm5xWK+8wFIq0ajF7/IbbtJi sduSdDYR28Oo0swuXcADNJfDVFIzxpneXYkv+PcSTJ8uX9O4bbNKR0JYTQNiwXyY8lTc aNXrH37iOwdIxIkbqjbkYbV0FozOrcUA3oBCA9tk6sHT9MjsEDm7l5artn8P77V6zEB3 DvhTZdlqjOuYnAWV8pyXksi36kEXWwJ9YXfAkQAEGOF7hiGZpOVPj8B/lKK5wXfMrE6d clGqUat1cQpYQLPELVA9jTZWPEkmPvzqkuEEcLj+nKEDOEeIaOF6w8O7TB38Hk9nJpJE TqTA== X-Gm-Message-State: AOAM530zfmCFcrGass4aXPNouvoYtyGem8YErqsGhAZB8TcnDc3/vkwR UZyifn8hAAsyq+L+3fAt5XfLvfqanpFAOftDclA= X-Google-Smtp-Source: ABdhPJwVLosuthcw/ziVL0JNtweRw4HphA3H/eeQIQmIMbwRrzHp1pf9A05K4rOBmTZPF1APdDECnsX1Suvs1QGxcn4= X-Received: by 2002:a7b:c755:: with SMTP id w21mr9463217wmk.81.1613919792182; Sun, 21 Feb 2021 07:03:12 -0800 (PST) MIME-Version: 1.0 References: <529055.1613882450@apollo2.minshall.org> <875z2lgbco.fsf@gmail.com> In-Reply-To: <875z2lgbco.fsf@gmail.com> From: John Kitchin Date: Sun, 21 Feb 2021 10:03:01 -0500 Message-ID: Subject: Re: state of the art in org-mode tables e.g. join, etc To: Tim Cross Content-Type: multipart/alternative; boundary="000000000000e8b76605bbd9fbf4" Received-SPF: pass client-ip=2a00:1450:4864:20::331; envelope-from=johnrkitchin@gmail.com; helo=mail-wm1-x331.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-orgmode@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: org-mode-email Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org Sender: "Emacs-orgmode" X-Migadu-Flow: FLOW_IN X-Migadu-Spam-Score: -2.47 Authentication-Results: aspmx1.migadu.com; dkim=pass header.d=andrew-cmu-edu.20150623.gappssmtp.com header.s=20150623 header.b=1OOMHxsT; dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=andrew.cmu.edu (policy=none); spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Queue-Id: 76629B21F X-Spam-Score: -2.47 X-Migadu-Scanner: scn1.migadu.com X-TUID: VypnhDK/tmR9 --000000000000e8b76605bbd9fbf4 Content-Type: text/plain; charset="UTF-8" Thanks Tim and Greg. I had mostly come to the same conclusions that it is probably best to outsource this. I worked out some examples from the orgtbljoin and orgaggregate packages with Pandas below, in case anyone is interested in seeing how it works. A key point is using the ":colnames no" header args to get the column names for Pandas. It seems like a pretty good approach. * org-mode tables with Pandas ** Aggregating from a table Examples from https://github.com/tbanel/orgaggregate #+NAME: original | Day | Color | Level | Quantity | |-----------+-------+-------+----------| | Monday | Red | 30 | 11 | | Monday | Blue | 25 | 3 | | Tuesday | Red | 51 | 12 | | Tuesday | Red | 45 | 15 | | Tuesday | Blue | 33 | 18 | | Wednesday | Red | 27 | 23 | | Wednesday | Blue | 12 | 16 | | Wednesday | Blue | 15 | 15 | | Thursday | Red | 39 | 24 | | Thursday | Red | 41 | 29 | | Thursday | Red | 49 | 30 | | Friday | Blue | 7 | 5 | | Friday | Blue | 6 | 8 | | Friday | Blue | 11 | 9 | #+BEGIN_SRC ipython :var data=original :colnames no import pandas as pd pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size() #+END_SRC #+RESULTS: :results: # Out [1]: # text/plain : Color : Blue 7 : Red 7 : dtype: int64 :end: The categorical stuff here is just to get the days sorted the same way as the example. It is otherwise not needed. I feel there should be a more clever way to do this, but didn't think of it. #+BEGIN_SRC ipython :var data=original :colnames no df = pd.DataFrame(data[1:], columns=data[0]) days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'] df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True) (df .groupby('Day') .agg({'Level': 'mean', 'Quantity': 'sum'}) .sort_values('Day')) #+END_SRC #+RESULTS: :results: # Out [2]: # text/plain : Level Quantity : Day : Monday 27.5 14 : Tuesday 43.0 45 : Wednesday 18.0 54 : Thursday 43.0 83 : Friday 8.0 22 : Saturday NaN 0 : Sunday NaN 0 [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]] :end: ** Joining tables Example from https://github.com/tbanel/orgtbljoin #+name: nutrition | type | Fiber | Sugar | Protein | Carb | |----------+-------+-------+---------+------| | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | | onion | 1.3 | 4.4 | 1.3 | 9.0 | | egg | 0 | 18.3 | 31.9 | 18.3 | | rice | 0.2 | 0 | 1.5 | 16.0 | | bread | 0.7 | 0.7 | 3.3 | 16.0 | | orange | 3.1 | 11.9 | 1.3 | 17.6 | | banana | 2.1 | 9.9 | 0.9 | 18.5 | | tofu | 0.7 | 0.5 | 6.6 | 1.4 | | nut | 2.6 | 1.3 | 4.9 | 7.2 | | corn | 4.7 | 1.8 | 2.8 | 21.3 | #+name: recipe | type | quty | |----------+------| | onion | 70 | | tomatoe | 120 | | eggplant | 300 | | tofu | 100 | #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no nutrition = pd.DataFrame(nut[1:], columns=nut[0]) rec = pd.DataFrame(recipe[1:], columns=recipe[0]) pd.merge(rec, nutrition, on='type') #+END_SRC #+RESULTS: :results: # Out [4]: # text/plain : type quty Fiber Sugar Protein Carb : 0 onion 70 1.3 4.4 1.3 9.0 : 1 tomatoe 120 0.6 2.1 0.8 3.4 : 2 eggplant 300 2.5 3.2 0.8 8.6 : 3 tofu 100 0.7 0.5 6.6 1.4 :end: John ----------------------------------- Professor John Kitchin Doherty Hall A207F Department of Chemical Engineering Carnegie Mellon University Pittsburgh, PA 15213 412-268-7803 @johnkitchin http://kitchingroup.cheme.cmu.edu On Sun, Feb 21, 2021 at 1:54 AM Tim Cross wrote: > > Greg Minshall writes: > > > John, > > > >> Is there a state of the art in using org-tables as little databases > >> with joins and stuff? > > > > i have to admit i do all that with an R code source block. (the dplyr > > package has the relevant joins, e.g. dplyr::inner_join().) and, in R, > > ":colnames yes" as a header argument gives you header lines on results. > > (maybe that's ?now? for "all" languages?) > > > > For really complex joins and ad hoc queries, I would do similar or put > the data into sqlite. For more simple ones, I just define a table which > uses table formulas to extract the values from the other tables - the > downside being the tables need to have the same data ordering or the > formulas need to be somewhat complex. Provided the tables have the same > number of records in the same order, table formulas are usually fairly > easy. > > I did think about writing some elisp functions to use in my table > formulas to make things easier, but then decided I was just re-inventing > and well defined database solution and figured when I need it, just use > sqlite. However, it has been a while since I needed this level of > complexity, so perhaps things have moved on and there are better ways > now. > > -- > Tim Cross > > --000000000000e8b76605bbd9fbf4 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks Tim and Greg. I had mostly come to the same conclus= ions that it is probably best to outsource this. I worked out some examples= from the=C2=A0orgtbljoin and orgaggregate=C2=A0packages with Pandas below,= in case anyone is interested in seeing how it works. A key point is using = the ":colnames no" header args to get the column names for Pandas= . It seems like a pretty good approach.

* org-mode table= s with Pandas
** Aggregating from a table

Examples from https://github.com/tbanel/orgag= gregate


#+NAME: original
| Day =C2=A0 =C2=A0 =C2=A0 | Col= or | Level | Quantity |
|-----------+-------+-------+----------|
| Mo= nday =C2=A0 =C2=A0| Red =C2=A0 | =C2=A0 =C2=A030 | =C2=A0 =C2=A0 =C2=A0 11 = |
| Monday =C2=A0 =C2=A0| Blue =C2=A0| =C2=A0 =C2=A025 | =C2=A0 =C2=A0 = =C2=A0 =C2=A03 |
| Tuesday =C2=A0 | Red =C2=A0 | =C2=A0 =C2=A051 | =C2= =A0 =C2=A0 =C2=A0 12 |
| Tuesday =C2=A0 | Red =C2=A0 | =C2=A0 =C2=A045 |= =C2=A0 =C2=A0 =C2=A0 15 |
| Tuesday =C2=A0 | Blue =C2=A0| =C2=A0 =C2=A0= 33 | =C2=A0 =C2=A0 =C2=A0 18 |
| Wednesday | Red =C2=A0 | =C2=A0 =C2=A02= 7 | =C2=A0 =C2=A0 =C2=A0 23 |
| Wednesday | Blue =C2=A0| =C2=A0 =C2=A012= | =C2=A0 =C2=A0 =C2=A0 16 |
| Wednesday | Blue =C2=A0| =C2=A0 =C2=A015 = | =C2=A0 =C2=A0 =C2=A0 15 |
| Thursday =C2=A0| Red =C2=A0 | =C2=A0 =C2= =A039 | =C2=A0 =C2=A0 =C2=A0 24 |
| Thursday =C2=A0| Red =C2=A0 | =C2=A0= =C2=A041 | =C2=A0 =C2=A0 =C2=A0 29 |
| Thursday =C2=A0| Red =C2=A0 | = =C2=A0 =C2=A049 | =C2=A0 =C2=A0 =C2=A0 30 |
| Friday =C2=A0 =C2=A0| Blue= =C2=A0| =C2=A0 =C2=A0 7 | =C2=A0 =C2=A0 =C2=A0 =C2=A05 |
| Friday =C2= =A0 =C2=A0| Blue =C2=A0| =C2=A0 =C2=A0 6 | =C2=A0 =C2=A0 =C2=A0 =C2=A08 || Friday =C2=A0 =C2=A0| Blue =C2=A0| =C2=A0 =C2=A011 | =C2=A0 =C2=A0 =C2= =A0 =C2=A09 |


#+BEGIN_SRC ipython :var data=3Doriginal :colnames= no
import pandas as pd

pd.DataFrame(data[1:], columns=3Ddata[0])= .groupby('Color').size()
#+END_SRC

#+RESULTS:
:results= :
# Out [1]:
# text/plain
: Color
: Blue =C2=A0 =C2=A07
: Re= d =C2=A0 =C2=A0 7
: dtype: int64
:end:

The categorical stuff h= ere is just to get the days sorted the same way as the example. It is other= wise not needed. I feel there should be a more clever way to do this, but d= idn't think of it.

#+BEGIN_SRC ipython :var data=3Dorigin= al :colnames no
df =3D pd.DataFrame(data[1:], columns=3Ddata[0])
days= =3D ['Monday', 'Tuesday', 'Wednesday', 'Thursd= ay', 'Friday', 'Saturday', 'Sunday']
df['= ;Day'] =3D pd.Categorical(df['Day'], categories=3Ddays, ordered= =3DTrue)

(df
=C2=A0.groupby('Day')
=C2=A0.agg({'Le= vel': 'mean',
=C2=A0 =C2=A0 =C2=A0 =C2=A0'Quantity':= 'sum'})
=C2=A0.sort_values('Day'))
#+END_SRC

= #+RESULTS:
:results:
# Out [2]:
# text/plain
: =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0Level =C2=A0Quantity
: Day
: Monday =C2=A0= =C2=A0 =C2=A027.5 =C2=A0 =C2=A0 =C2=A0 =C2=A014
: Tuesday =C2=A0 =C2=A0= 43.0 =C2=A0 =C2=A0 =C2=A0 =C2=A045
: Wednesday =C2=A0 18.0 =C2=A0 =C2= =A0 =C2=A0 =C2=A054
: Thursday =C2=A0 =C2=A043.0 =C2=A0 =C2=A0 =C2=A0 = =C2=A083
: Friday =C2=A0 =C2=A0 =C2=A0 8.0 =C2=A0 =C2=A0 =C2=A0 =C2=A022=
: Saturday =C2=A0 =C2=A0 NaN =C2=A0 =C2=A0 =C2=A0 =C2=A0 0
: Sunday = =C2=A0 =C2=A0 =C2=A0 NaN =C2=A0 =C2=A0 =C2=A0 =C2=A0 0

[[file:/var/f= olders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
= :end:

** Joining tables

Example from https://github.com/tbanel/orgtbljoin

#+= name: nutrition
| type =C2=A0 =C2=A0 | Fiber | Sugar | Protein | Carb |<= br>|----------+-------+-------+---------+------|
| eggplant | =C2=A0 2.5= | =C2=A0 3.2 | =C2=A0 =C2=A0 0.8 | =C2=A08.6 |
| tomatoe =C2=A0| =C2=A0= 0.6 | =C2=A0 2.1 | =C2=A0 =C2=A0 0.8 | =C2=A03.4 |
| onion =C2=A0 =C2= =A0| =C2=A0 1.3 | =C2=A0 4.4 | =C2=A0 =C2=A0 1.3 | =C2=A09.0 |
| egg =C2= =A0 =C2=A0 =C2=A0| =C2=A0 =C2=A0 0 | =C2=A018.3 | =C2=A0 =C2=A031.9 | 18.3 = |
| rice =C2=A0 =C2=A0 | =C2=A0 0.2 | =C2=A0 =C2=A0 0 | =C2=A0 =C2=A0 1.= 5 | 16.0 |
| bread =C2=A0 =C2=A0| =C2=A0 0.7 | =C2=A0 0.7 | =C2=A0 =C2= =A0 3.3 | 16.0 |
| orange =C2=A0 | =C2=A0 3.1 | =C2=A011.9 | =C2=A0 =C2= =A0 1.3 | 17.6 |
| banana =C2=A0 | =C2=A0 2.1 | =C2=A0 9.9 | =C2=A0 =C2= =A0 0.9 | 18.5 |
| tofu =C2=A0 =C2=A0 | =C2=A0 0.7 | =C2=A0 0.5 | =C2=A0= =C2=A0 6.6 | =C2=A01.4 |
| nut =C2=A0 =C2=A0 =C2=A0| =C2=A0 2.6 | =C2= =A0 1.3 | =C2=A0 =C2=A0 4.9 | =C2=A07.2 |
| corn =C2=A0 =C2=A0 | =C2=A0 = 4.7 | =C2=A0 1.8 | =C2=A0 =C2=A0 2.8 | 21.3 |


#+name: recipe
= | type =C2=A0 =C2=A0 | quty |
|----------+------|
| onion =C2=A0 =C2= =A0| =C2=A0 70 |
| tomatoe =C2=A0| =C2=A0120 |
| eggplant | =C2=A0300= |
| tofu =C2=A0 =C2=A0 | =C2=A0100 |


#+BEGIN_SRC ipython :va= r nut=3Dnutrition recipe=3Drecipe :colnames no
nutrition =3D pd.DataFram= e(nut[1:], columns=3Dnut[0])
rec =3D pd.DataFrame(recipe[1:], columns=3D= recipe[0])

pd.merge(rec, nutrition, on=3D'type')
#+END_SR= C

#+RESULTS:
:results:
# Out [4]:
# text/plain
: =C2=A0 = =C2=A0 =C2=A0 =C2=A0type =C2=A0quty =C2=A0Fiber =C2=A0Sugar =C2=A0Protein = =C2=A0Carb
: 0 =C2=A0 =C2=A0 onion =C2=A0 =C2=A070 =C2=A0 =C2=A01.3 =C2= =A0 =C2=A04.4 =C2=A0 =C2=A0 =C2=A01.3 =C2=A0 9.0
: 1 =C2=A0 tomatoe =C2= =A0 120 =C2=A0 =C2=A00.6 =C2=A0 =C2=A02.1 =C2=A0 =C2=A0 =C2=A00.8 =C2=A0 3.= 4
: 2 =C2=A0eggplant =C2=A0 300 =C2=A0 =C2=A02.5 =C2=A0 =C2=A03.2 =C2=A0= =C2=A0 =C2=A00.8 =C2=A0 8.6
: 3 =C2=A0 =C2=A0 =C2=A0tofu =C2=A0 100 =C2= =A0 =C2=A00.7 =C2=A0 =C2=A00.5 =C2=A0 =C2=A0 =C2=A06.6 =C2=A0 1.4
:end:<= br>


John

----------------------------------= -
Professor John Kitchin=C2=A0
Doherty Hall A207F
Department of Ch= emical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213412-268-7803



Greg Minshall <m= inshall@umich.edu> writes:

> John,
>
>> Is there a state of the art in using org-tables as little database= s
>> with joins and stuff?
>
> i have to admit i do all that with an R code source block.=C2=A0 (the = dplyr
> package has the relevant joins, e.g. dplyr::inner_join().)=C2=A0 and, = in R,
> ":colnames yes" as a header argument gives you header lines = on results.
> (maybe that's ?now? for "all" languages?)
>

For really complex joins and ad hoc queries, I would do similar or put
the data into sqlite. For more simple ones, I just define a table which
uses table formulas to extract the values from the other tables - the
downside being the tables need to have the same data ordering or the
formulas need to be somewhat complex. Provided the tables have the same
number of records in the same order, table formulas are usually fairly
easy.

I did think about writing some elisp functions to use in my table
formulas to make things easier, but then decided I was just re-inventing and well defined database solution and figured when I need it, just use
sqlite. However, it has been a while since I needed this level of
complexity, so perhaps things have moved on and there are better ways
now.

--
Tim Cross

--000000000000e8b76605bbd9fbf4--