From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp0 ([2001:41d0:2:c151::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id +JxdF2a5N2ArBwAA0tVLHw (envelope-from ) for ; Thu, 25 Feb 2021 14:51:18 +0000 Received: from aspmx2.migadu.com ([2001:41d0:2:c151::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp0 with LMTPS id GMwSE2a5N2BNUQAA1q6Kng (envelope-from ) for ; Thu, 25 Feb 2021 14:51:18 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx2.migadu.com (Postfix) with ESMTPS id 4945C16163 for ; Thu, 25 Feb 2021 15:51:17 +0100 (CET) Received: from localhost ([::1]:42610 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lFHz2-0002FX-6B for larch@yhetil.org; Thu, 25 Feb 2021 09:51:16 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:38630) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lFHyX-0002FJ-Nf for emacs-orgmode@gnu.org; Thu, 25 Feb 2021 09:50:45 -0500 Received: from mail-qk1-x72a.google.com ([2607:f8b0:4864:20::72a]:34189) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1lFHyV-0004eH-5O for emacs-orgmode@gnu.org; Thu, 25 Feb 2021 09:50:45 -0500 Received: by mail-qk1-x72a.google.com with SMTP id x124so5850913qkc.1 for ; Thu, 25 Feb 2021 06:50:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:references:user-agent:from:to:cc:subject:in-reply-to :message-id:date:mime-version; bh=BBR3MdtXD2fWLvwUwzmfhksvZNCZvhck13lrVcFR1pc=; b=C54l4jizW/cpk68aar/fbwaEf4vALMPVy6EN8rcsZyzZcXggMt80nVfB7vbuPm9quh uOVbt4P+Q6kD/m2ViW878mjMtRRjkgWWgC5ueO7bncOE2PiNGdO+JcFy4xG+uTfeO2P+ 6iuwu7yQdLnllK9Vby57wLbH8T50LrsZBprBjHv6BLHl3rOj3Wps3KYtPgeL9llrBXBq P5K3IdajtXqQ2gMlZQ6Mdkm1upoDW8yMRoqvb/bC3umhYYLoWTBlKMoZCRQBvP9M8cBM ihXFRSoTjSSxbODxP8htjVHocbpMl6TzP7LA919F8B+m06Ze65VgZATDcA6KX+1SsMLb vJKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:references:user-agent:from:to:cc:subject :in-reply-to:message-id:date:mime-version; bh=BBR3MdtXD2fWLvwUwzmfhksvZNCZvhck13lrVcFR1pc=; b=C7qvQCXarb8xoH3NpEDjqbrYniUTR+6EbNdqjIPF1VSynot0uv1hAakSmXkTUEBoXA twD+gPLxvtw9dFHZg7OklwVYRKvwdmfhUvOq604Ng07pqTzu9YJgH4eOf9ARnk9HARhs V4+Z4k+l39DTv0Nx4uSIA/MKXzV+RMB5LYs5SyphSpwZu9A+hG7vXnum5yYFO1x7nIwO yugbc9oCAl7fmE0iqsmrm+fu6yUSd0ahYNN5dHgaTBq1X9sLgm1s5Ui/aIQ6bTprodn5 91vvxsOGhPT00s45CKjzNKR5qj4hfbU5AXjJr95wlZwFFkS3Hl0cIa0BFCkW0f7Bq0YV oW0g== X-Gm-Message-State: AOAM531ZRSJdpWGgi0Zy5FCBLq9AgaqJzH/dx7HaVlu/+/PxT3HyuamD w4Tiyfvr9d6ki6tKVH0DiFLkC3ouU7n4Dg== X-Google-Smtp-Source: ABdhPJzpxA/PM7djFN0sK9WN6MRHk+99uz8zKsVPhJNih28maEP0sReKRDPhpr3eN3YEzqZ0jzh8iw== X-Received: by 2002:a37:9bc3:: with SMTP id d186mr2993709qke.240.1614264641646; Thu, 25 Feb 2021 06:50:41 -0800 (PST) Received: from Johns-iMac.local ([2601:547:901:1830:7521:bd61:2aed:109a]) by smtp.gmail.com with ESMTPSA id z188sm2879389qkb.40.2021.02.25.06.50.40 (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 25 Feb 2021 06:50:41 -0800 (PST) References: <529055.1613882450@apollo2.minshall.org> <875z2lgbco.fsf@gmail.com> User-agent: mu4e 1.4.13; emacs 27.1.90 From: John Kitchin To: "Cook, Malcolm" Subject: Re: state of the art in org-mode tables e.g. join, etc In-reply-to: Message-ID: Date: Thu, 25 Feb 2021 09:50:40 -0500 MIME-Version: 1.0 Content-Type: text/plain Received-SPF: pass client-ip=2607:f8b0:4864:20::72a; envelope-from=johnrkitchin@gmail.com; helo=mail-qk1-x72a.google.com X-Spam_score_int: -14 X-Spam_score: -1.5 X-Spam_bar: - X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-orgmode@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tim Cross , org-mode-email Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org Sender: "Emacs-orgmode" X-Migadu-Flow: FLOW_IN X-Migadu-Spam-Score: -1.27 Authentication-Results: aspmx2.migadu.com; dkim=fail ("headers rsa verify failed") header.d=gmail.com header.s=20161025 header.b=C54l4jiz; dmarc=fail reason="SPF not aligned (relaxed)" header.from=andrew.cmu.edu (policy=none); spf=pass (aspmx2.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Queue-Id: 4945C16163 X-Spam-Score: -1.27 X-Migadu-Scanner: scn1.migadu.com X-TUID: oHhCfsdYp0Cr That is remarkably slim code to get those results! Cook, Malcolm writes: > John, > > Checkout what R sqldf package makes easy: > > ** aggregation example > > Examples from https://github.com/tbanel/orgaggregate > > > #+NAME: original > | Day | Color | Level | Quantity | > |-----------+-------+-------+----------| > | Monday | Red | 30 | 11 | > | Monday | Blue | 25 | 3 | > | Tuesday | Red | 51 | 12 | > | Tuesday | Red | 45 | 15 | > | Tuesday | Blue | 33 | 18 | > | Wednesday | Red | 27 | 23 | > | Wednesday | Blue | 12 | 16 | > | Wednesday | Blue | 15 | 15 | > | Thursday | Red | 39 | 24 | > | Thursday | Red | 41 | 29 | > | Thursday | Red | 49 | 30 | > | Friday | Blue | 7 | 5 | > | Friday | Blue | 6 | 8 | > | Friday | Blue | 11 | 9 | > > #+PROPERTY: header-args:R :session *R* > > #+begin_src R :results none > library(sqldf) > #+end_src > > > #+begin_src R :var original=original :colnames yes > sqldf('select Color, count(*) from original group by Color;') > #+end_src > > #+RESULTS: > | Color | count(*) | > |-------+----------| > | Blue | 7 | > | Red | 7 | > > > > ** join example > > Example from https://github.com/tbanel/orgtbljoin > > #+name: nutrition > | type | Fiber | Sugar | Protein | Carb | > |----------+-------+-------+---------+------| > | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | > | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | > | onion | 1.3 | 4.4 | 1.3 | 9.0 | > | egg | 0 | 18.3 | 31.9 | 18.3 | > | rice | 0.2 | 0 | 1.5 | 16.0 | > | bread | 0.7 | 0.7 | 3.3 | 16.0 | > | orange | 3.1 | 11.9 | 1.3 | 17.6 | > | banana | 2.1 | 9.9 | 0.9 | 18.5 | > | tofu | 0.7 | 0.5 | 6.6 | 1.4 | > | nut | 2.6 | 1.3 | 4.9 | 7.2 | > | corn | 4.7 | 1.8 | 2.8 | 21.3 | > > > #+name: recipe > | type | quty | > |----------+------| > | onion | 70 | > | tomatoe | 120 | > | eggplant | 300 | > | tofu | 100 | > > > #+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes > sqldf('select * from recipe, nutrition where recipe.type=nutrition.type') > #+end_src > > #+RESULTS: > | type | quty | type | Fiber | Sugar | Protein | Carb | > |----------+------+----------+-------+-------+---------+------| > | onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9 | > | tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | > | eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | > | tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 | > > > > This should also be possible but I cannot get it to work now: > > #+begin_src R :var recipe=recipe :var nutrition=nutrition :colnames yes :prologue sqldf(' :epilogue ') > select * from recipe, nutrition where recipe.type=nutrition.type > #+end_src > > > > > > From: Emacs-orgmode On Behalf Of John Kitchin > Sent: Sunday, February 21, 2021 10:24 > To: Tim Cross > Cc: org-mode-email > Subject: Re: state of the art in org-mode tables e.g. join, etc > > ATTENTION: This email came from an external source. Do not open attachments or click on links from unknown senders or unexpected emails. > > For fun, here is the sqlite equivalent of the Pandas example using the same tables as before > > > ** aggregation example > > Examples from https://github.com/tbanel/orgaggregate > > > #+NAME: original > | Day | Color | Level | Quantity | > |-----------+-------+-------+----------| > | Monday | Red | 30 | 11 | > | Monday | Blue | 25 | 3 | > | Tuesday | Red | 51 | 12 | > | Tuesday | Red | 45 | 15 | > | Tuesday | Blue | 33 | 18 | > | Wednesday | Red | 27 | 23 | > | Wednesday | Blue | 12 | 16 | > | Wednesday | Blue | 15 | 15 | > | Thursday | Red | 39 | 24 | > | Thursday | Red | 41 | 29 | > | Thursday | Red | 49 | 30 | > | Friday | Blue | 7 | 5 | > | Friday | Blue | 6 | 8 | > | Friday | Blue | 11 | 9 | > > > #+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes > drop table if exists testtable; > create table testtable(Day str, Color str, Level int, Quantity int); > .mode csv testtable > .import $orgtable testtable > select Color, count(*) from testtable group by Color; > #+end_src > > #+RESULTS: > | Color | count(*) | > |-------+----------| > | Blue | 7 | > | Red | 7 | > > ** join example > > Example from https://github.com/tbanel/orgtbljoin > > #+name: nutrition > | type | Fiber | Sugar | Protein | Carb | > |----------+-------+-------+---------+------| > | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | > | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | > | onion | 1.3 | 4.4 | 1.3 | 9.0 | > | egg | 0 | 18.3 | 31.9 | 18.3 | > | rice | 0.2 | 0 | 1.5 | 16.0 | > | bread | 0.7 | 0.7 | 3.3 | 16.0 | > | orange | 3.1 | 11.9 | 1.3 | 17.6 | > | banana | 2.1 | 9.9 | 0.9 | 18.5 | > | tofu | 0.7 | 0.5 | 6.6 | 1.4 | > | nut | 2.6 | 1.3 | 4.9 | 7.2 | > | corn | 4.7 | 1.8 | 2.8 | 21.3 | > > > #+name: recipe > | type | quty | > |----------+------| > | onion | 70 | > | tomatoe | 120 | > | eggplant | 300 | > | tofu | 100 | > > > #+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames yes > drop table if exists nutrition; > drop table if exists recipe; > create table nutrition(type str, Fiber float, Sugar float, Protein float, Carb float); > create table recipe(type str, quty int); > > .mode csv nutrition > .import $nut nutrition > > .mode csv recipe > .import $rec recipe > > select * from recipe, nutrition where recipe.type=nutrition.type; > #+end_src > > #+RESULTS: > | type | quty | type | Fiber | Sugar | Protein | Carb | > |----------+------+----------+-------+-------+---------+------| > | onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9.0 | > | tomatoe | 120 | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | > | eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | > | tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 | > > > John > > ----------------------------------- > Professor John Kitchin > Doherty Hall A207F > Department of Chemical Engineering > Carnegie Mellon University > Pittsburgh, PA 15213 > 412-268-7803 > @johnkitchin > http://kitchingroup.cheme.cmu.edu > > > On Sun, Feb 21, 2021 at 10:03 AM John Kitchin > wrote: > Thanks Tim and Greg. I had mostly come to the same conclusions that it is probably best to outsource this. I worked out some examples from the orgtbljoin and orgaggregate packages with Pandas below, in case anyone is interested in seeing how it works. A key point is using the ":colnames no" header args to get the column names for Pandas. It seems like a pretty good approach. > > * org-mode tables with Pandas > ** Aggregating from a table > > Examples from https://github.com/tbanel/orgaggregate > > > #+NAME: original > | Day | Color | Level | Quantity | > |-----------+-------+-------+----------| > | Monday | Red | 30 | 11 | > | Monday | Blue | 25 | 3 | > | Tuesday | Red | 51 | 12 | > | Tuesday | Red | 45 | 15 | > | Tuesday | Blue | 33 | 18 | > | Wednesday | Red | 27 | 23 | > | Wednesday | Blue | 12 | 16 | > | Wednesday | Blue | 15 | 15 | > | Thursday | Red | 39 | 24 | > | Thursday | Red | 41 | 29 | > | Thursday | Red | 49 | 30 | > | Friday | Blue | 7 | 5 | > | Friday | Blue | 6 | 8 | > | Friday | Blue | 11 | 9 | > > > #+BEGIN_SRC ipython :var data=original :colnames no > import pandas as pd > > pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size() > #+END_SRC > > #+RESULTS: > :results: > # Out [1]: > # text/plain > : Color > : Blue 7 > : Red 7 > : dtype: int64 > :end: > > The categorical stuff here is just to get the days sorted the same way as the example. It is otherwise not needed. I feel there should be a more clever way to do this, but didn't think of it. > > #+BEGIN_SRC ipython :var data=original :colnames no > df = pd.DataFrame(data[1:], columns=data[0]) > days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'] > df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True) > > (df > .groupby('Day') > .agg({'Level': 'mean', > 'Quantity': 'sum'}) > .sort_values('Day')) > #+END_SRC > > #+RESULTS: > :results: > # Out [2]: > # text/plain > : Level Quantity > : Day > : Monday 27.5 14 > : Tuesday 43.0 45 > : Wednesday 18.0 54 > : Thursday 43.0 83 > : Friday 8.0 22 > : Saturday NaN 0 > : Sunday NaN 0 > > [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]] > :end: > > ** Joining tables > > Example from https://github.com/tbanel/orgtbljoin > > #+name: nutrition > | type | Fiber | Sugar | Protein | Carb | > |----------+-------+-------+---------+------| > | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | > | tomatoe | 0.6 | 2.1 | 0.8 | 3.4 | > | onion | 1.3 | 4.4 | 1.3 | 9.0 | > | egg | 0 | 18.3 | 31.9 | 18.3 | > | rice | 0.2 | 0 | 1.5 | 16.0 | > | bread | 0.7 | 0.7 | 3.3 | 16.0 | > | orange | 3.1 | 11.9 | 1.3 | 17.6 | > | banana | 2.1 | 9.9 | 0.9 | 18.5 | > | tofu | 0.7 | 0.5 | 6.6 | 1.4 | > | nut | 2.6 | 1.3 | 4.9 | 7.2 | > | corn | 4.7 | 1.8 | 2.8 | 21.3 | > > > #+name: recipe > | type | quty | > |----------+------| > | onion | 70 | > | tomatoe | 120 | > | eggplant | 300 | > | tofu | 100 | > > > #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no > nutrition = pd.DataFrame(nut[1:], columns=nut[0]) > rec = pd.DataFrame(recipe[1:], columns=recipe[0]) > > pd.merge(rec, nutrition, on='type') > #+END_SRC > > #+RESULTS: > :results: > # Out [4]: > # text/plain > : type quty Fiber Sugar Protein Carb > : 0 onion 70 1.3 4.4 1.3 9.0 > : 1 tomatoe 120 0.6 2.1 0.8 3.4 > : 2 eggplant 300 2.5 3.2 0.8 8.6 > : 3 tofu 100 0.7 0.5 6.6 1.4 > :end: > > > John > > ----------------------------------- > Professor John Kitchin > Doherty Hall A207F > Department of Chemical Engineering > Carnegie Mellon University > Pittsburgh, PA 15213 > 412-268-7803 > @johnkitchin > http://kitchingroup.cheme.cmu.edu > > > On Sun, Feb 21, 2021 at 1:54 AM Tim Cross > wrote: > > Greg Minshall > writes: > >> John, >> >>> Is there a state of the art in using org-tables as little databases >>> with joins and stuff? >> >> i have to admit i do all that with an R code source block. (the dplyr >> package has the relevant joins, e.g. dplyr::inner_join().) and, in R, >> ":colnames yes" as a header argument gives you header lines on results. >> (maybe that's ?now? for "all" languages?) >> > > For really complex joins and ad hoc queries, I would do similar or put > the data into sqlite. For more simple ones, I just define a table which > uses table formulas to extract the values from the other tables - the > downside being the tables need to have the same data ordering or the > formulas need to be somewhat complex. Provided the tables have the same > number of records in the same order, table formulas are usually fairly > easy. > > I did think about writing some elisp functions to use in my table > formulas to make things easier, but then decided I was just re-inventing > and well defined database solution and figured when I need it, just use > sqlite. However, it has been a while since I needed this level of > complexity, so perhaps things have moved on and there are better ways > now. -- Professor John Kitchin Doherty Hall A207F Department of Chemical Engineering Carnegie Mellon University Pittsburgh, PA 15213 412-268-7803 @johnkitchin http://kitchingroup.cheme.cmu.edu