From: tbanelwebmin
Subject: [ANN] faster org-table-to-lisp
To: emacs-orgmode@gnu.org
Message-ID: <820681a6-4973-f016-6425-4afb9c9486a7@free.fr>
Date: Thu, 30 Apr 2020 08:34:32 +0200

Hi The List.

Here is an alternative, faster version of org-table-to-lisp. It can be
more than 100 times faster.

#+BEGIN_SRC elisp
(defun org-table-to-lisp-faster (&optional org-table-at-p-done)
  "Convert the table at point to a Lisp structure.
The structure will be a list.  Each item is either the symbol `hline'
for a horizontal separator line, or a list of field values as strings.
The table is taken from the buffer at point.
When the optional ORG-TABLE-AT-P-DONE parameter is not nil, it is
assumed that (org-at-table-p) was already called."
  (or org-table-at-p-done (org-at-table-p) (user-error "No table at point"))
  (save-excursion
    (goto-char (org-table-begin))
    (let ((end (org-table-end))
          (row)
          (table))
      ;; One iteration per table line.
      (while (< (point) end)
        (setq row nil)
        (search-forward "|" end)
        (if (looking-at "-")
            ;; Horizontal separator: skip the line, record `hline'.
            (progn
              (search-forward "\n" end)
              (push 'hline table))
          ;; Data row: read one field per iteration until end of line.
          (while (not (search-forward-regexp "\\=\n" end t))
            (unless (search-forward-regexp "\\=\\s-*\\([^|]*\\)" end t)
              (user-error "Malformed table at char %s" (point)))
            (let ((b (match-beginning 1))
                  (e (match-end 1)))
              ;; Move back over trailing blanks of the field.
              (and (search-backward-regexp "[^ \t]" b t)
                   (forward-char 1))
              (push
               (buffer-substring-no-properties b (point))
               row)
              (goto-char (1+ e))))
          (push (nreverse row) table)))
      (nreverse table))))
#+END_SRC

Below is an example of a large table borrowed from the Datamash
software. On my PC, the reproducible benches show:

- Traditional org-table-to-lisp: 130 seconds
- Alternative org-table-to-lisp: 0.8 seconds (not compiled)

It is faster because it operates directly on the buffer with
(search-forward-regexp), whereas the standard function splits a string
extracted from the buffer.
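
To make the two strategies concrete, here is a minimal sketch on a
single table row. The helper names my/row-by-split and my/row-by-scan
are hypothetical: they exist neither in Org nor in the function above.
The split version only approximates the standard approach, and it
drops empty fields.

#+BEGIN_SRC elisp
;; Illustration only; both helpers are hypothetical, not Org code.

(defun my/row-by-split (line)
  "Parse LINE, a string such as \"| a | b |\", by splitting it.
This allocates a list of new strings from a copy of the row.
Empty fields are dropped because OMIT-NULLS is t."
  (split-string line "[ \t]*|[ \t]*" t))

(defun my/row-by-scan ()
  "Parse the table row on the current line by moving point across it.
Each field is matched in place, anchored at point with \\=."
  (save-excursion
    (beginning-of-line)
    (let ((eol (line-end-position))
          (row nil))
      ;; Skip the leading vertical bar.
      (search-forward "|" eol)
      ;; Match one field and its closing bar per iteration.
      (while (re-search-forward "\\=[ \t]*\\([^|\n]*?\\)[ \t]*|" eol t)
        (push (match-string-no-properties 1) row))
      (nreverse row))))

(my/row-by-split "| a | b |")   ; => ("a" "b")

(with-temp-buffer
  (insert "| a | b |\n")
  (goto-char (point-min))
  (my/row-by-scan))             ; => ("a" "b")
#+END_SRC

The scanning version never builds an intermediate string for the whole
row; it only extracts the fields themselves.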

This function is a drop-in replacement for the standard one. It can
benefit Babel and Gnuplot.

Would it make sense to upgrade the Org Mode code base?

Beware! The optional parameter has a different meaning in each
function:

- for the traditional function, it is a string representing an Org table
- for the alternative function, it is a Boolean telling whether
  (org-at-table-p) has been called or not

This difference does not affect the use cases in the code base. There,
the function is always called either without a parameter, or as:

#+BEGIN_SRC elisp
(org-table-to-lisp
  (buffer-substring-no-properties
    (org-table-begin)
    (org-table-end)))
#+END_SRC

Here is the reproducible bench. It is a self-contained Org Mode file to
be opened in Emacs.

wget http://tbanelwebmin.free.fr/OrgMode/bench-org-table-to-lisp.org.gz
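
Regarding the optional parameter: code that holds a table as a string
could avoid depending on its meaning by inserting the string into a
temporary buffer and calling the function at point, without a
parameter. A minimal sketch, using the standard org-table-to-lisp; the
helper name my/table-string-to-lisp is hypothetical and not part of
Org.

#+BEGIN_SRC elisp
(require 'org)

;; Hypothetical helper, not part of Org.
(defun my/table-string-to-lisp (table-string)
  "Return TABLE-STRING, the text of an Org table, as a Lisp structure."
  (with-temp-buffer
    (org-mode)
    (insert table-string)
    (goto-char (point-min))
    ;; No parameter is passed, so the call does not rely on what the
    ;; optional parameter means.
    (org-table-to-lisp)))

(my/table-string-to-lisp "| a | b |\n|---+---|\n| 1 | 2 |\n")
;; => (("a" "b") hline ("1" "2"))
#+END_SRC

The result has the shape described in the docstring: a list whose items
are either the symbol hline or a list of field strings.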