From mboxrd@z Thu Jan 1 00:00:00 1970
From: Utkarsh Singh
To: Nicolas Goaziou
Cc: 47885@debbugs.gnu.org
Resent-CC: emacs-orgmode@gnu.org
Subject: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
Date: Wed, 28 Apr 2021 14:07:37 +0530
Message-ID: <87tunqby9a.fsf@gmail.com>
In-Reply-To: <875z07jx6n.fsf@nicolasgoaziou.fr> (Nicolas Goaziou's message of "Tue, 27 Apr 2021 22:21:20 +0200")
References: <87czuq9958.fsf@gmail.com> <8735vmelfs.fsf@nicolasgoaziou.fr>
 <87k0oyfj4y.fsf@gmail.com> <87im4h9irn.fsf@nicolasgoaziou.fr>
 <87zgxpwqa7.fsf@gmail.com> <875z07jx6n.fsf@nicolasgoaziou.fr>
List-Id: "General discussions about Org-mode."
X-GNU-PR-Message: followup 47885
X-GNU-PR-Package: org-mode
X-GNU-PR-Keywords: patch
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Hi,

On 2021-04-27, 22:21 +0200, Nicolas Goaziou wrote:

>> + When using org-table-import interactively if we failed to guess
>> separator then we will be left with a user-error message and an
>> 'unconverted table'. We can make use of 'temp-buffer' to import our
>> file after successfully conversion.
>
> I'm not sure to understand what you mean.

Note: I would advise you to apply patch no. 2 before trying out the
following example.

1. Download the attached CSV file; let's call it example.csv.
2. Go to the *scratch* buffer.
3. Use 'M-x org-table-import' to import example.csv as an Org table.

You will see that even though org-table-guess-separator fails to guess a
separator (the first two lines of example.csv use commas and the third
uses colons, so no candidate separator appears on every line), the
unconverted region is still inserted into our buffer.

>> + Conversion part of org-table-convert-region make a distinction between
>> '(4) (comma separator) and rest of the separator we should either string
>> version of comma as AND condition or rewrite to simplify it.
>
> Ditto. But it can be the object of another patch. Let's concentrate on
> `org-table-guess-separator' first.
>
>> I am willing to do these possible changes but currently waiting for your
>> review for org-table-guess-separator as there can be more serious bugs
>> lurking around on my code which I am considering base for these
>> changes.
>
> You should definitely write tests for this function. Here's a start:
>
> (ert-deftest test-org-table/guess-separator ()
>   "Test `test-org-table/guess-separator'."
>   ;; Test space separator.
>   (should
>    (equal " "
>           (org-test-with-temp-text "a b\nc d"
>             (org-table-guess-separator (point-min) (point-max)))))
>   (should
>    (equal " "
>           (org-test-with-temp-text "a b\nc d"
>             (org-table-guess-separator (point-min) (point-max)))))
>   ;; Test "inverted" region.
>   (should
>    (equal " "
>           (org-test-with-temp-text "a b\nc d"
>             (org-table-guess-separator (point-max) (point-min)))))
>   ;; Do not error on empty region.
>   (should-not
>    (org-test-with-temp-text ""
>      (org-table-guess-separator (point-max) (point-min))))
>   (should-not
>    (org-test-with-temp-text " \n"
>      (org-table-guess-separator (point-max) (point-min)))))
>

I will surely do more testing.

I would also like to simplify the condition for guessing SPACE as the
separator, for the following reasons:

+ field1 'this is field2' 'this is field3' :: In this case there are
  still SPACEs inside the quoted fields (quoted with ' here).
+ Since SPACE is the last candidate separator, I think searching for a
  line which doesn't contain a space is enough.

Required patches:

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=0001-org-table.el-org-table-import-add-yes-and-no-prompt.patch
Content-Description: patch1

>From 6b112927de73c43edfd08254217808ebff42772a Mon Sep 17 00:00:00 2001
From: Utkarsh Singh
Date: Wed, 28 Apr 2021 10:26:46 +0530
Subject: [PATCH 1/3] org-table.el (org-table-import): add yes-and-no prompt

Add a yes-or-no prompt for files that don't have a .txt, .tsv or .csv
extension.
---
 lisp/org/org-table.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 0e93fb271f..e0b2be6892 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -938,7 +938,8 @@ org-table-import
 - regexp  When a regular expression, use it to match the separator."
   (interactive "f\nP")
   (when (and (called-interactively-p 'any)
-             (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
+             (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
+             (not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension. Do you still want to continue? ")))
     (user-error "Cannot import such file"))
   (unless (bolp) (insert "\n"))
   (let ((beg (point))
-- 
2.31.1

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=0002-org-table.el-org-table-convert-region-move-out-separ.patch
Content-Description: patch2

>From 9bb017cfc8284075e04faf5496ed560ba48d5bbc Mon Sep 17 00:00:00 2001
From: Utkarsh Singh
Date: Wed, 28 Apr 2021 10:42:32 +0530
Subject: [PATCH 2/3] org-table.el (org-table-convert-region): move out separator-guessing

1. Move separator guessing code to org-table-guess-separator (new
function).
2. Add semicolon, colon and SPACE to the list of known separators
(separators which we can guess).
---
 lisp/org/org-table.el | 49 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index e0b2be6892..295f7a9b90 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -846,6 +846,39 @@ org-table-create
       (goto-char pos))
     (org-table-align)))
 
+(defun org-table-guess-separator (beg0 end0)
+  "Guess separator for region BEG0 to END0.
+
+Candidate separators, in order of preference: comma, TAB,
+semicolon, colon and SPACE.
+
+For each candidate, search for a line that does not contain it;
+if such a line exists, try the next candidate, otherwise return
+the candidate as a string.  Return nil when no candidate fits."
+  (let* ((beg (save-excursion
+                (goto-char (min beg0 end0))
+                (skip-chars-forward " \t\n")
+                (if (eobp) (point) (line-beginning-position))))
+         (end (save-excursion
+                (goto-char (max beg0 end0))
+                (skip-chars-backward " \t\n" beg)
+                (if (= beg (point)) (point) (line-end-position))))
+         (sep-regexp
+          (list (list "," (rx bol (1+ (not (or ?\n ?,))) eol))
+                (list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
+                (list ";" (rx bol (1+ (not (or ?\n ?\;))) eol))
+                (list ":" (rx bol (1+ (not (or ?\n ?:))) eol))
+                (list " " (rx bol (1+ (not (or ?\n ?\s))) eol)))))
+    (unless (= beg end)
+      (save-excursion
+        (goto-char beg)
+        (catch :found
+          (pcase-dolist (`(,sep ,regexp) sep-regexp)
+            (save-excursion
+              (unless (re-search-forward regexp end t)
+                (throw :found sep))))
+          nil)))))
+
 ;;;###autoload
 (defun org-table-convert-region (beg0 end0 &optional separator)
   "Convert region to a table.
@@ -862,10 +895,7 @@ org-table-convert-region
 integer When a number, use that many spaces, or a TAB, as field separator
 regexp  When a regular expression, use it to match the separator
 nil     When nil, the command tries to be smart and figure out the
-        separator in the following way:
-        - when each line contains a TAB, assume TAB-separated material
-        - when each line contains a comma, assume CSV material
-        - else, assume one or more SPACE characters as separator."
+        separator using `org-table-guess-separator'."
   (interactive "r\nP")
   (let* ((beg (min beg0 end0))
          (end (max beg0 end0))
@@ -882,13 +912,10 @@ org-table-convert-region
       (if (bolp) (backward-char 1) (end-of-line 1))
       (setq end (point-marker))
       ;; Get the right field separator
-      (unless separator
-        (goto-char beg)
-        (setq separator
-              (cond
-               ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
-               ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
-               (t 1))))
+      (when (and (not separator)
+                 (not (setq separator
+                            (org-table-guess-separator beg end))))
+        (user-error "Failed to guess separator"))
       (goto-char beg)
       (if (equal separator '(4))
           (while (< (point) end)
-- 
2.31.1

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=0003-org-table.el-org-table-import-add-file-prompt.patch
Content-Description: patch3

>From fef97ffe27ff908647c45f1b066a845e71a0926f Mon Sep 17 00:00:00 2001
From: Utkarsh Singh
Date: Wed, 28 Apr 2021 14:01:31 +0530
Subject: [PATCH 3/3] org-table.el (org-table-import): add file prompt

---
 lisp/org/org-table.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 295f7a9b90..e904903576 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -963,7 +963,8 @@ org-table-import
 - (64)    Prompt for a regular expression as field separator.
 - integer When a number, use that many spaces, or a TAB, as field separator.
 - regexp  When a regular expression, use it to match the separator."
-  (interactive "f\nP")
+  (interactive (list (read-file-name "Import file: ")
+                     current-prefix-arg))
   (when (and (called-interactively-p 'any)
              (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
              (not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension. Do you still want to continue? ")))
-- 
2.31.1

--=-=-=
Content-Type: application/octet-stream
Content-Disposition: attachment; filename=example.csv
Content-Transfer-Encoding: base64
Content-Description: csv file

dGhpcyxpcyxhbixleGFtcGxlLGNzdgpvbixmaXJzdCx0d28sbGluZSx3ZSxoYXZlLGNvbW1hLGFz
LHNlcGVyYXRvcgpidXQ6bm93OndlOnN3aXRjaGVkOnRvOmNvbG9uOmFzOnNlcGFyYXRvcgo=

--=-=-=
Content-Type: text/plain


-- 
Utkarsh Singh
http://utkarshsingh.xyz

--=-=-=--