From: Utkarsh Singh <utkarsh190601@gmail.com>
To: Nicolas Goaziou <mail@nicolasgoaziou.fr>
Cc: 47885@debbugs.gnu.org
Subject: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
Date: Wed, 28 Apr 2021 14:07:37 +0530
Message-ID: <87tunqby9a.fsf@gmail.com>
In-Reply-To: <875z07jx6n.fsf@nicolasgoaziou.fr> (Nicolas Goaziou's message of "Tue, 27 Apr 2021 22:21:20 +0200")
[-- Attachment #1: Type: text/plain, Size: 2733 bytes --]
Hi,
On 2021-04-27, 22:21 +0200, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:
>> + When using org-table-import interactively, if we fail to guess the
>> separator we are left with a user-error message and an
>> 'unconverted table'. We can make use of a 'temp-buffer' to import our
>> file only after a successful conversion.
>
> I'm not sure I understand what you mean.
Note: I would advise you to apply patch no. 2 before trying out the
following example.
1. Download the attached CSV file; let's call it example.csv.
2. Go to the *scratch* buffer.
3. Use 'M-x org-table-import' to import example.csv as an Org table.
You will see that even though org-table-guess-separator failed to guess
the separator, the unconverted region is still left in our buffer.
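The failed guess in step 3 can be reproduced without the attachment. The following is only a minimal sketch, assuming the contents of example.csv as attached to this message. `my/guess-separator-in-string' is a hypothetical helper that mirrors the search loop of the proposed org-table-guess-separator on a string; it is not the function from patch 2, and it does not trim leading or trailing blank lines the way the patch does.

```elisp
(require 'rx)
(require 'pcase)

(defun my/guess-separator-in-string (text)
  "Return the first separator found on every non-empty line of TEXT.
Candidates are tried in the order comma, TAB, semicolon, colon,
SPACE.  Return nil when no candidate qualifies or TEXT is blank.
Hypothetical helper for illustration, not part of Org."
  (when (string-match-p "[^ \t\n]" text)
    (with-temp-buffer
      (insert text)
      (catch :found
        (pcase-dolist (`(,sep ,regexp)
                       (list (list "," (rx bol (1+ (not (any "\n,"))) eol))
                             (list "\t" (rx bol (1+ (not (any "\n\t"))) eol))
                             (list ";" (rx bol (1+ (not (any "\n;"))) eol))
                             (list ":" (rx bol (1+ (not (any "\n:"))) eol))
                             (list " " (rx bol (1+ (not (any "\n "))) eol))))
          (goto-char (point-min))
          ;; REGEXP matches a non-empty line that lacks SEP.  When no
          ;; such line exists, SEP is present on every non-empty line.
          (unless (re-search-forward regexp nil t)
            (throw :found sep)))
        nil))))

;; Uniform input: every line contains a comma.
(my/guess-separator-in-string "a,b\nc,d")   ; => ","

;; Contents of example.csv: the third line switches to colons, so no
;; candidate is present on every line and the guess fails.
(my/guess-separator-in-string
 (concat "this,is,an,example,csv\n"
         "on,first,two,line,we,have,comma,as,seperator\n"
         "but:now:we:switched:to:colon:as:separator\n"))   ; => nil
```

With a nil result, org-table-convert-region from patch 2 signals the user-error only after org-table-import has already inserted the file, which is why the raw text stays in the buffer.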
>> + The conversion part of org-table-convert-region makes a distinction between
>> '(4) (comma separator) and the rest of the separators. We should either also
>> test for the string version of comma in that condition or rewrite it to simplify it.
>
> Ditto. But it can be the object of another patch. Let's concentrate on
> `org-table-guess-separator' first.
>
>> I am willing to make these changes, but I am currently waiting for your
>> review of org-table-guess-separator, as there may be more serious bugs
>> lurking in my code, which I am using as the base for these
>> changes.
>
> You should definitely write tests for this function. Here's a start:
>
> (ert-deftest test-org-table/guess-separator ()
>   "Test `test-org-table/guess-separator'."
>   ;; Test space separator.
>   (should
>    (equal " "
>           (org-test-with-temp-text "a b\nc d"
>             (org-table-guess-separator (point-min) (point-max)))))
>   (should
>    (equal " "
>           (org-test-with-temp-text "a b\nc d"
>             (org-table-guess-separator (point-min) (point-max)))))
>   ;; Test "inverted" region.
>   (should
>    (equal " "
>           (org-test-with-temp-text "a b\nc d"
>             (org-table-guess-separator (point-max) (point-min)))))
>   ;; Do not error on empty region.
>   (should-not
>    (org-test-with-temp-text ""
>      (org-table-guess-separator (point-max) (point-min))))
>   (should-not
>    (org-test-with-temp-text " \n"
>      (org-table-guess-separator (point-max) (point-min)))))
>
I will surely do more testing.
I would also like to simplify the condition for guessing SPACE as the
separator, for the following reasons:
+ field1 'this is field2' 'this is field3' :: In this case we still have
SPACE inside the quotes (' in this case).
+ Since SPACE is our last valid separator, I think searching for a line
which doesn't contain a space is more than enough.
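The simplified SPACE condition described above can be sketched as follows. This is an illustration under assumptions, not code from the patches: `my/every-line-has-space-p' is a hypothetical name, and the check says nothing about how quoted fields are split afterwards.

```elisp
(defun my/every-line-has-space-p (text)
  "Return non-nil when no non-empty line of TEXT lacks a SPACE.
Hypothetical helper for illustration, not part of Org."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; The regexp matches a whole non-empty line without any SPACE.
    ;; Finding one such line is enough to reject SPACE as separator.
    (not (re-search-forward "^[^\n ]+$" nil t))))

;; Quoted fields contain extra spaces, but every line still has a
;; SPACE, so SPACE remains a candidate.
(my/every-line-has-space-p
 "field1 'this is field2' 'this is field3'\nf1 f2 f3")   ; => t

;; The second line has no SPACE at all, so SPACE is rejected.
(my/every-line-has-space-p "a b\ncd")   ; => nil
```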
Required patches:
[-- Attachment #2: patch1 --]
[-- Type: text/x-patch, Size: 1110 bytes --]
From 6b112927de73c43edfd08254217808ebff42772a Mon Sep 17 00:00:00 2001
From: Utkarsh Singh <utkarsh190601@gmail.com>
Date: Wed, 28 Apr 2021 10:26:46 +0530
Subject: [PATCH 1/3] org-table.el (org-table-import): add yes-or-no prompt
Add a yes-or-no prompt for files which don't have .txt, .tsv or .csv
as file extension.
---
lisp/org/org-table.el | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 0e93fb271f..e0b2be6892 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -938,7 +938,8 @@ org-table-import
- regexp When a regular expression, use it to match the separator."
(interactive "f\nP")
(when (and (called-interactively-p 'any)
- (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
+ (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
+ (not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension. Do you still want to continue? ")))
(user-error "Cannot import such file"))
(unless (bolp) (insert "\n"))
(let ((beg (point))
--
2.31.1
[-- Attachment #3: patch2 --]
[-- Type: text/x-patch, Size: 3523 bytes --]
From 9bb017cfc8284075e04faf5496ed560ba48d5bbc Mon Sep 17 00:00:00 2001
From: Utkarsh Singh <utkarsh190601@gmail.com>
Date: Wed, 28 Apr 2021 10:42:32 +0530
Subject: [PATCH 2/3] org-table.el (org-table-convert-region): move out
separator-guessing
1. Move separator guessing code to org-table-guess-separator (new
function).
2. Add semicolon, colon and SPACE to the list of known separators
(separators which we can guess).
---
lisp/org/org-table.el | 49 +++++++++++++++++++++++++++++++++----------
1 file changed, 38 insertions(+), 11 deletions(-)
diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index e0b2be6892..295f7a9b90 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -846,6 +846,39 @@ org-table-create
(goto-char pos))
(org-table-align)))
+(defun org-table-guess-separator (beg0 end0)
+  "Guess separator for region BEG0 to END0.
+
+Preferred separators, in order of preference:
+comma, TAB, semicolon, colon or SPACE.
+
+Search for a line which doesn't contain the current separator.
+If one is found, try the next preferred separator; otherwise
+return the separator as a string."
+  (let* ((beg (save-excursion
+                (goto-char (min beg0 end0))
+                (skip-chars-forward " \t\n")
+                (if (eobp) (point) (line-beginning-position))))
+         (end (save-excursion
+                (goto-char (max beg0 end0))
+                (skip-chars-backward " \t\n" beg)
+                (if (= beg (point)) (point) (line-end-position))))
+         (sep-regexp
+          (list (list "," (rx bol (1+ (not (or ?\n ?,))) eol))
+                (list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
+                (list ";" (rx bol (1+ (not (or ?\n ?\;))) eol))
+                (list ":" (rx bol (1+ (not (or ?\n ?:))) eol))
+                (list " " (rx bol (1+ (not (or ?\n ?\s))) eol)))))
+    (unless (= beg end)
+      (save-excursion
+        (goto-char beg)
+        (catch :found
+          (pcase-dolist (`(,sep ,regexp) sep-regexp)
+            (save-excursion
+              (unless (re-search-forward regexp end t)
+                (throw :found sep))))
+          nil)))))
+
;;;###autoload
(defun org-table-convert-region (beg0 end0 &optional separator)
"Convert region to a table.
@@ -862,10 +895,7 @@ org-table-convert-region
integer When a number, use that many spaces, or a TAB, as field separator
regexp When a regular expression, use it to match the separator
nil When nil, the command tries to be smart and figure out the
- separator in the following way:
- - when each line contains a TAB, assume TAB-separated material
- - when each line contains a comma, assume CSV material
- - else, assume one or more SPACE characters as separator."
+ separator using `org-table-guess-separator'."
(interactive "r\nP")
(let* ((beg (min beg0 end0))
(end (max beg0 end0))
@@ -882,13 +912,10 @@ org-table-convert-region
(if (bolp) (backward-char 1) (end-of-line 1))
(setq end (point-marker))
;; Get the right field separator
- (unless separator
- (goto-char beg)
- (setq separator
- (cond
- ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
- ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
- (t 1))))
+ (when (and (not separator)
+ (not (setq separator
+ (org-table-guess-separator beg end))))
+ (user-error "Failed to guess separator"))
(goto-char beg)
(if (equal separator '(4))
(while (< (point) end)
--
2.31.1
[-- Attachment #4: patch3 --]
[-- Type: text/x-patch, Size: 1103 bytes --]
From fef97ffe27ff908647c45f1b066a845e71a0926f Mon Sep 17 00:00:00 2001
From: Utkarsh Singh <utkarsh190601@gmail.com>
Date: Wed, 28 Apr 2021 14:01:31 +0530
Subject: [PATCH 3/3] org-table.el (org-table-import): add file prompt
---
lisp/org/org-table.el | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 295f7a9b90..e904903576 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -963,7 +963,8 @@ org-table-import
- (64) Prompt for a regular expression as field separator.
- integer When a number, use that many spaces, or a TAB, as field separator.
- regexp When a regular expression, use it to match the separator."
- (interactive "f\nP")
+  (interactive (list (read-file-name "Import file: ")
+                     current-prefix-arg))
(when (and (called-interactively-p 'any)
(not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
(not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension. Do you still want to continue? ")))
--
2.31.1
[-- Attachment #5: csv file --]
[-- Type: application/octet-stream, Size: 110 bytes --]
this,is,an,example,csv
on,first,two,line,we,have,comma,as,seperator
but:now:we:switched:to:colon:as:separator
[-- Attachment #6: Type: text/plain, Size: 43 bytes --]
--
Utkarsh Singh
http://utkarshsingh.xyz