emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* [PATCH] org-table-import: Make it more smarter for interactive use
@ 2021-04-19  4:43 Utkarsh Singh
  2021-04-19  8:19 ` Nicolas Goaziou
  0 siblings, 1 reply; 33+ messages in thread
From: Utkarsh Singh @ 2021-04-19  4:43 UTC (permalink / raw)
  To: emacs-orgmode, bug-gnu-emacs

Hi,

My previous patch proposed to add support for importing file with
arbitrary name and building upon that this patch tries to make use of it
by making org-table-import smarter by simply adding more separators
(delimiters).

Currently org-table-import 'smartly' guesses only COMMA, TAB and SPACE
as separator whereas this patch tries to add support for ';'(SEMICOLON)
and ':' (COLON).

Here is an example org-table generated using =M-x org-table-import=
/etc/passwd (uses COLON as separator) with private information removed.

| bin                    | x |     1 |     1 |                                | /                     | /usr/bin/nologin   |
| daemon                 | x |     2 |     2 |                                | /                     | /usr/bin/nologin   |
| mail                   | x |     8 |    12 |                                | /var/spool/mail       | /usr/bin/nologin   |
| ftp                    | x |    14 |    11 |                                | /srv/ftp              | /usr/bin/nologin   |
| http                   | x |    33 |    33 |                                | /srv/http             | /usr/bin/nologin   |
| nobody                 | x | 65534 | 65534 | Nobody                         | /                     | /usr/bin/nologin   |
| dbus                   | x |    81 |    81 | System Message Bus             | /                     | /usr/bin/nologin   |
| systemd-journal-remote | x |   981 |   981 | systemd Journal Remote         | /                     | /usr/bin/nologin   |
| systemd-network        | x |   980 |   980 | systemd Network Management     | /                     | /usr/bin/nologin   |
| systemd-oom            | x |   979 |   979 | systemd Userspace OOM Killer   | /                     | /usr/bin/nologin   |
| systemd-resolve        | x |   978 |   978 | systemd Resolver               | /                     | /usr/bin/nologin   |
| systemd-timesync       | x |   977 |   977 | systemd Time Synchronization   | /                     | /usr/bin/nologin   |
| systemd-coredump       | x |   976 |   976 | systemd Core Dumper            | /                     | /usr/bin/nologin   |
| avahi                  | x |   974 |   974 | Avahi mDNS/DNS-SD daemon       | /                     | /usr/bin/nologin   |
| colord                 | x |   973 |   973 | Color management daemon        | /var/lib/colord       | /usr/bin/nologin   |
| rtkit                  | x |   133 |   133 | RealtimeKit                    | /proc                 | /usr/bin/nologin   |
| transmission           | x |   169 |   169 | Transmission BitTorrent Daemon | /var/lib/transmission | /usr/bin/nologin   |
| geoclue                | x |   972 |   972 | Geoinformation service         | /var/lib/geoclue      | /usr/bin/nologin   |
| usbmux                 | x |   140 |   140 | usbmux user                    | /                     | /usr/bin/nologin   |


diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index ab66859d6a..5ee4af612b 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -846,6 +846,35 @@ org-table-create
       (goto-char pos))
     (org-table-align)))
 
+
+(defun org-table-guess-separator (beg0 end0)
+  "Guess separator for `org-table-convert-region' for region BEG0 to END0.
+
+List of preferred separator:
+comma, TAB, ';', ':' or SPACE
+
+If region contains a line which doesn't contain the required
+separator then discard the separator and search again using next
+separator."
+  (let ((beg (save-excursion
+	       (goto-char (min beg0 end0))
+	       (beginning-of-line 1)
+	       (point)))
+	(end (save-excursion
+	       (goto-char (max beg0 end0))
+	       (end-of-line 1)
+	       (if (bolp) (backward-char 1) (end-of-line 1))
+	       (point))))
+    (save-excursion
+      (goto-char beg)
+      (cond
+       ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
+       ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
+       ((not (re-search-forward "^[^\n;]+$" end t)) ";")
+       ((not (re-search-forward "^[^\n:]+$" end t)) ":")
+       ((not (re-search-forward "^\\([^'\"][^\n\s][^'\"]\\)+$" end t)) " ")
+       (t nil)))))
+
 ;;;###autoload
 (defun org-table-convert-region (beg0 end0 &optional separator)
   "Convert region to a table.
@@ -862,10 +891,7 @@ org-table-convert-region
 integer  When a number, use that many spaces, or a TAB, as field separator
 regexp   When a regular expression, use it to match the separator
 nil      When nil, the command tries to be smart and figure out the
-         separator in the following way:
-         - when each line contains a TAB, assume TAB-separated material
-         - when each line contains a comma, assume CSV material
-         - else, assume one or more SPACE characters as separator."
+         separator using `org-table-guess-seperator'."
   (interactive "r\nP")
   (let* ((beg (min beg0 end0))
 	 (end (max beg0 end0))
@@ -881,14 +907,9 @@ org-table-convert-region
       (goto-char end)
       (if (bolp) (backward-char 1) (end-of-line 1))
       (setq end (point-marker))
-      ;; Get the right field separator
-      (unless separator
-	(goto-char beg)
-	(setq separator
-	      (cond
-	       ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
-	       ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
-	       (t 1))))
+      (if (and (not separator)
+               (not (setq separator (org-table-guess-separator beg end))))
+          (error "Unable to guess suitable separator."))
       (goto-char beg)
       (if (equal separator '(4))
 	  (while (< (point) end)
@@ -921,12 +942,8 @@ org-table-convert-region
 (defun org-table-import (file separator)
   "Import FILE as a table.
 
-The command tries to be smart and figure out the separator in the
-following way:
-
-- when each line contains a TAB, assume TAB-separated material;
-- when each line contains a comma, assume CSV material;
-- else, assume one or more SPACE characters as separator.
+The command tries to be smart and figure out the separator using
+`org-table-guess-seperator'.
 
 When non-nil, SEPARATOR specifies the field separator in the
 lines.  It can have the following values:

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-04-19  4:43 [PATCH] org-table-import: Make it more smarter for interactive use Utkarsh Singh
@ 2021-04-19  8:19 ` Nicolas Goaziou
  2021-04-19 14:23   ` Utkarsh Singh
  0 siblings, 1 reply; 33+ messages in thread
From: Nicolas Goaziou @ 2021-04-19  8:19 UTC (permalink / raw)
  To: Utkarsh Singh; +Cc: bug-gnu-emacs, emacs-orgmode

Hello,

Utkarsh Singh <utkarsh190601@gmail.com> writes:

> My previous patch proposed to add support for importing file with
> arbitrary name and building upon that this patch tries to make use of it
> by making org-table-import smarter by simply adding more separators
> (delimiters).

Good idea, thank you. Some comments follow.

> +(defun org-table-guess-separator (beg0 end0)
> +  "Guess separator for `org-table-convert-region' for region BEG0 to END0.
> +
> +List of preferred separator:
> +comma, TAB, ';', ':' or SPACE

I suggest to use full names everywhere: comma, TAB, semicolon, colon, or
SPACE.

> +If region contains a line which doesn't contain the required
> +separator then discard the separator and search again using next
> +separator."
> +  (let ((beg (save-excursion
> +	       (goto-char (min beg0 end0))
> +	       (beginning-of-line 1)
> +	       (point)))

  (beginning-of-line 1) + (point) -> (line-beginning-position)

since you don't intent to move point.

> +	(end (save-excursion
> +	       (goto-char (max beg0 end0))
> +	       (end-of-line 1)
> +	       (if (bolp) (backward-char 1) (end-of-line 1))

I'm not sure about what you mean above. First, the second call to
end-of-line is useless, since you're already at the end of the line.
Second, what is wrong if point is at an empty line? Why do you want to
move it back?

> +	       (point))))

You may want to use `line-end-position'.

> +    (save-excursion
> +      (goto-char beg)
> +      (cond
> +       ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
> +       ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
> +       ((not (re-search-forward "^[^\n;]+$" end t)) ";")
> +       ((not (re-search-forward "^[^\n:]+$" end t)) ":")
> +       ((not (re-search-forward "^\\([^'\"][^\n\s][^'\"]\\)+$" end t)) " ")
> +       (t nil)))))

I think you need to wrap `save-excursion' around each
`re-search-forward' call. Otherwise each test starts at the first line
containing the separator previously tested.
> +
>  ;;;###autoload
>  (defun org-table-convert-region (beg0 end0 &optional separator)
>    "Convert region to a table.
> @@ -862,10 +891,7 @@ org-table-convert-region
>  integer  When a number, use that many spaces, or a TAB, as field separator
>  regexp   When a regular expression, use it to match the separator
>  nil      When nil, the command tries to be smart and figure out the
> -         separator in the following way:
> -         - when each line contains a TAB, assume TAB-separated material
> -         - when each line contains a comma, assume CSV material
> -         - else, assume one or more SPACE characters as separator."
> +         separator using `org-table-guess-seperator'."

I wonder if creating a new function is warranted here. You could add the
news checks after those already present in the code, couldn't you?


Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-04-19  8:19 ` Nicolas Goaziou
@ 2021-04-19 14:23   ` Utkarsh Singh
  2021-04-20 13:40     ` Nicolas Goaziou
  0 siblings, 1 reply; 33+ messages in thread
From: Utkarsh Singh @ 2021-04-19 14:23 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: 47885, emacs-orgmode

On 2021-04-19, 10:19 +0200, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:

>> My previous patch proposed to add support for importing file with
>> arbitrary name and building upon that this patch tries to make use of it
>> by making org-table-import smarter by simply adding more separators
>> (delimiters).
>
> Good idea, thank you. Some comments follow.
>
>> +(defun org-table-guess-separator (beg0 end0)
>> +  "Guess separator for `org-table-convert-region' for region BEG0 to END0.
>> +
>> +List of preferred separator:
>> +comma, TAB, ';', ':' or SPACE
>
> I suggest to use full names everywhere: comma, TAB, semicolon, colon, or
> SPACE.
>
>> +If region contains a line which doesn't contain the required
>> +separator then discard the separator and search again using next
>> +separator."
>> +  (let ((beg (save-excursion
>> +	       (goto-char (min beg0 end0))
>> +	       (beginning-of-line 1)
>> +	       (point)))
>
>   (beginning-of-line 1) + (point) -> (line-beginning-position)
>
> since you don't intent to move point.
>
>> +	(end (save-excursion
>> +	       (goto-char (max beg0 end0))
>> +	       (end-of-line 1)
>> +	       (if (bolp) (backward-char 1) (end-of-line 1))
>
> I'm not sure about what you mean above. First, the second call to
> end-of-line is useless, since you're already at the end of the line.
> Second, what is wrong if point is at an empty line? Why do you want to
> move it back?
>
>> +	       (point))))
>
> You may want to use `line-end-position'.
>
>> +    (save-excursion
>> +      (goto-char beg)
>> +      (cond
>> +       ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
>> +       ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
>> +       ((not (re-search-forward "^[^\n;]+$" end t)) ";")
>> +       ((not (re-search-forward "^[^\n:]+$" end t)) ":")
>> +       ((not (re-search-forward "^\\([^'\"][^\n\s][^'\"]\\)+$" end t)) " ")
>> +       (t nil)))))
>
> I think you need to wrap `save-excursion' around each
> `re-search-forward' call. Otherwise each test starts at the first line
> containing the separator previously tested.
>> +
>>  ;;;###autoload
>>  (defun org-table-convert-region (beg0 end0 &optional separator)
>>    "Convert region to a table.
>> @@ -862,10 +891,7 @@ org-table-convert-region
>>  integer  When a number, use that many spaces, or a TAB, as field separator
>>  regexp   When a regular expression, use it to match the separator
>>  nil      When nil, the command tries to be smart and figure out the
>> -         separator in the following way:
>> -         - when each line contains a TAB, assume TAB-separated material
>> -         - when each line contains a comma, assume CSV material
>> -         - else, assume one or more SPACE characters as separator."
>> +         separator using `org-table-guess-seperator'."

Thanks for reviewing the patch!

> I wonder if creating a new function is warranted here. You could add the
> news checks after those already present in the code, couldn't you?

At first I was also reluctant in creating a new function but decided to
do so because:

+ org-table-convert-region is currently doing two thing 'guessing the
separator' and 'converting the region'.  I thought it was a good idea to
separate out function into it's atomic operations.

+ Current guessing technique is quite basic as it assumes that data
(file that has to be imported) has no error/inconsistency in it.  I
would like to show you the doc string of Python's CSV library
implementation to guess separator (region inside """):

"""
Looks for text enclosed between two identical quotes
(the probable quotechar) which are preceded and followed
by the same character (the probable delimiter).
For example:
    ,'some text',
The quote with the most wins, same with the delimiter.
If there is no quotechar the delimiter can't be determined
this way.
"""

And if this functions fails then we have:

"""
The delimiter /should/ occur the same number of times on
each row. However, due to malformed data, it may not. We don't want
an all or nothing approach, so we allow for small variations in this
number.
1) build a table of the frequency of each character on every line.
2) build a table of frequencies of this frequency (meta-frequency?),
e.g.  'x occurred 5 times in 10 rows, 6 times in 1000 rows,
7 times in 2 rows'
3) use the mode of the meta-frequency to determine the /expected/
frequency for that character
4) find out how often the character actually meets that goal
5) the character that best meets its goal is the delimiter
For performance reasons, the data is evaluated in chunks, so it can
try and evaluate the smallest portion of the data possible, evaluating
additional chunks as necessary.
"""

I tried to do similar in Elisp but currently facing some issues due to
my inexperience in functional programming.  Also moving the 'guessing'
part out the function may lead to development of even better algorithm
than Python counterpart.

Modified version of concerned function:

(defun org-table-guess-separator (beg0 end0)
  "Guess separator for `org-table-convert-region' for region BEG0 to END0.

List of preferred separator:
comma, TAB, semicolon, colon or SPACE.

If region contains a line which doesn't contain the required
separator then discard the separator and search again using next
separator."
  (let* ((beg (save-excursion
		(goto-char (min beg0 end0))
		(line-beginning-position)))
	 (end (save-excursion
		(goto-char (max beg0 end0))
		(line-end-position)))
	 (sep-rexp '((","  "^[^\n,]+$")
		     ("\t" "^[^\n\t]+$")
		     (";"  "^[^\n;]+$")
		     (":"  "^[^\n:]+$")
		     (" "  "^\\([^'\"][^\n\s][^'\"]\\)+$")))
	 (tmp (car sep-rexp))
	 sep)
    (save-excursion
      (goto-char beg)
      (while (and (not sep)
		  (if (save-excursion
			(not (re-search-forward (nth 1 tmp) end t)))
		      (setq sep (nth 0 tmp))
		    (setq sep-rexp (cdr sep-rexp))
		    (setq tmp (car sep-rexp)))))
    sep)))

Version without using iteration:

(defun org-table-guess-separator (beg0 end0)
  "Guess separator for `org-table-convert-region' for region BEG0 to END0.

List of preferred separator:
COMMA, TAB, SEMICOLON, COLON or SPACE.

If region contains a line which doesn't contain the required
separator then discard the separator and search again using next
separator."
  (let ((beg (save-excursion
	       (goto-char (min beg0 end0))
	       (line-beginning-position)))
	(end (save-excursion
	       (goto-char (max beg0 end0))
	       (line-end-position))))
    (save-excursion
      (goto-char beg)
      (cond
       ((save-excursion (not (re-search-forward "^[^\n,]+$" end t))) ",")
       ((save-excursion (not (re-search-forward "^[^\n\t]+$" end t))) "\t")
       ((save-excursion (not (re-search-forward "^[^\n;]+$" end t))) ";")
       ((save-excursion (not (re-search-forward "^[^\n:]+$" end t))) ":")
       ((save-excursion (not (re-search-forward "^\\([^'\"][^\n\s][^'\"]\\)+$" end t))) " ")
       (t nil)))))

--
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-04-19 14:23   ` Utkarsh Singh
@ 2021-04-20 13:40     ` Nicolas Goaziou
  2021-04-20 17:15       ` Utkarsh Singh
  2021-04-23  4:58       ` Utkarsh Singh
  0 siblings, 2 replies; 33+ messages in thread
From: Nicolas Goaziou @ 2021-04-20 13:40 UTC (permalink / raw)
  To: Utkarsh Singh; +Cc: 47885, emacs-orgmode

Hello,

Utkarsh Singh <utkarsh190601@gmail.com> writes:

> At first I was also reluctant in creating a new function but decided to
> do so because:
>
> + org-table-convert-region is currently doing two thing 'guessing the
> separator' and 'converting the region'.  I thought it was a good idea to
> separate out function into it's atomic operations.

I understand, but there is sometimes a (difficult) line to draw between
"separating concerns" and "function proliferation". Anyway, that's fine
here.

> + Current guessing technique is quite basic as it assumes that data
> (file that has to be imported) has no error/inconsistency in it.  I
> would like to show you the doc string of Python's CSV library
> implementation to guess separator (region inside """):
>
> """
> Looks for text enclosed between two identical quotes
> (the probable quotechar) which are preceded and followed
> by the same character (the probable delimiter).
> For example:
>     ,'some text',
> The quote with the most wins, same with the delimiter.
> If there is no quotechar the delimiter can't be determined
> this way.
> """
>
> And if this functions fails then we have:
>
> """
> The delimiter /should/ occur the same number of times on
> each row. However, due to malformed data, it may not. We don't want
> an all or nothing approach, so we allow for small variations in this
> number.
> 1) build a table of the frequency of each character on every line.
> 2) build a table of frequencies of this frequency (meta-frequency?),
> e.g.  'x occurred 5 times in 10 rows, 6 times in 1000 rows,
> 7 times in 2 rows'
> 3) use the mode of the meta-frequency to determine the /expected/
> frequency for that character
> 4) find out how often the character actually meets that goal
> 5) the character that best meets its goal is the delimiter
> For performance reasons, the data is evaluated in chunks, so it can
> try and evaluate the smallest portion of the data possible, evaluating
> additional chunks as necessary.
> """

For the problem we're trying to solve, this sounds like over-engineering
to me. Do we want so badly to guess a separator?

> I tried to do similar in Elisp but currently facing some issues due to
> my inexperience in functional programming.  Also moving the 'guessing'
> part out the function may lead to development of even better algorithm
> than Python counterpart.
>
> Modified version of concerned function:
>
> (defun org-table-guess-separator (beg0 end0)
>   "Guess separator for `org-table-convert-region' for region BEG0 to END0.
>
> List of preferred separator:
> comma, TAB, semicolon, colon or SPACE.
>
> If region contains a line which doesn't contain the required
> separator then discard the separator and search again using next
> separator."
>   (let* ((beg (save-excursion
> 		(goto-char (min beg0 end0))
> 		(line-beginning-position)))
> 	 (end (save-excursion
> 		(goto-char (max beg0 end0))
> 		(line-end-position)))

Thinking again about it, this needs extra care, as end0 might end up on
an empty line. You tried to avoid this in your first function, but
I think this was not sufficient either. Actually, beg0 could also start
on an empty line.

This needs to be tested extensively, but as a first approximation,
I think `beg' needs to be defined as:

  (save-excursion
    (goto-char (min beg0 end0))
    (skip-chars-forward " \t\n")
    (if (eobp) (point) (line-beginning-position)))

and `end' as

  (save-excursion
    (goto-char (max beg end0))
    (skip-chars-backward " \t\n" beg)
    (if (= beg (point)) (point) (line-end-position)))

Then you need to bail out if beg = end.

> 	 (sep-rexp '((","  "^[^\n,]+$")

sep-rexp -> sep-regexp

> 		     ("\t" "^[^\n\t]+$")
> 		     (";"  "^[^\n;]+$")
> 		     (":"  "^[^\n:]+$")
> 		     (" "  "^\\([^'\"][^\n\s][^'\"]\\)+$")))

At this point, I suggest to use `rx' macro instead.

> 	 (tmp (car sep-rexp))
> 	 sep)
>     (save-excursion
>       (goto-char beg)
>       (while (and (not sep)
> 		  (if (save-excursion
> 			(not (re-search-forward (nth 1 tmp) end t)))
> 		      (setq sep (nth 0 tmp))
> 		    (setq sep-rexp (cdr sep-rexp))
> 		    (setq tmp (car sep-rexp)))))

I suggest this (yes, I like pattern-matching, `car' and `cdr' are so
80's) instead:

  (save-excursion
    (goto-char beg)
    (catch :found
      (pcase-dolist (`(,sep ,regexp) sep-regexp)
        (save-excursion
          (unless (re-search-forward regexp end t)
          (throw :found sep))))
    nil))

Again all this needs to extensively tested, as there are a lot of
dangers lurking around.

Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-04-20 13:40     ` Nicolas Goaziou
@ 2021-04-20 17:15       ` Utkarsh Singh
  2021-04-23  4:58       ` Utkarsh Singh
  1 sibling, 0 replies; 33+ messages in thread
From: Utkarsh Singh @ 2021-04-20 17:15 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: 47885, emacs-orgmode

Hi,

On 2021-04-20, 15:40 +0200, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:

> For the problem we're trying to solve, this sounds like over-engineering
> to me. Do we want so badly to guess a separator?

Earlier I took is as an assignment to learn Elisp but now I don't think
we should increase complexity this much.

> Thinking again about it, this needs extra care, as end0 might end up on
> an empty line. You tried to avoid this in your first function, but
> I think this was not sufficient either. Actually, beg0 could also start
> on an empty line.
>
> This needs to be tested extensively, but as a first approximation,
> I think `beg' needs to be defined as:
>
>   (save-excursion
>     (goto-char (min beg0 end0))
>     (skip-chars-forward " \t\n")
>     (if (eobp) (point) (line-beginning-position)))
>
> and `end' as
>
>   (save-excursion
>     (goto-char (max beg end0))
>     (skip-chars-backward " \t\n" beg)
>     (if (= beg (point)) (point) (line-end-position)))
>
> Then you need to bail out if beg = end.
>
>> 	 (sep-rexp '((","  "^[^\n,]+$")
>
> sep-rexp -> sep-regexp
>
>> 		     ("\t" "^[^\n\t]+$")
>> 		     (";"  "^[^\n;]+$")
>> 		     (":"  "^[^\n:]+$")
>> 		     (" "  "^\\([^'\"][^\n\s][^'\"]\\)+$")))
>
> At this point, I suggest to use `rx' macro instead.
>
> I suggest this (yes, I like pattern-matching, `car' and `cdr' are so
> 80's) instead:
>
>   (save-excursion
>     (goto-char beg)
>     (catch :found
>       (pcase-dolist (`(,sep ,regexp) sep-regexp)
>         (save-excursion
>           (unless (re-search-forward regexp end t)
>           (throw :found sep))))
>     nil))
>

Thanks! I was not aware of pcase-dolist function.

Function after doing the necessary changes:

(defun org-table-guess-separator (beg0 end0)
  "Guess separator for `org-table-convert-region' for region BEG0 to END0.

List of preferred separator:
comma, TAB, semicolon, colon or SPACE.

If region contains a line which doesn't contain the required
separator then discard the separator and search again using next
separator."
  (let* ((beg (save-excursion
                (goto-char (min beg0 end0))
                (skip-chars-forward " \t\n")
                (if (eobp) (point) (line-beginning-position))))
	 (end (save-excursion
                (goto-char (max beg end0))
                (skip-chars-backward " \t\n" beg)
                (if (= beg (point)) (point) (line-end-position))))
         (sep-regexp '((","  (rx bol (1+ (not (or ?\n ?,))) eol))
		       ("\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
		       (";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
		       (":"  (rx bol (1+ (not (or ?\n ?:))) eol))
		       (" "  (rx bol (1+ (not (or ?' ?\" ))
                                         (not (or ?\s ?\;))
                                         (not (or ?' ?\"))) eol))))
         sep)
    (unless (= beg end)
      (save-excursion
        (goto-char beg)
        (catch :found
          (pcase-dolist (`(,sep ,regexp) sep-regexp)
            (save-excursion
              (unless (re-search-forward (eval regexp) end t)
                (throw :found sep))))
          nil)))))

> Again all this needs to extensively tested, as there are a lot of
> dangers lurking around.

Summary of things that still requires a review:

+ Setting boundary right

+ When using SPACE as separator is it sufficient to check for all for
all non quoted SPACE's?

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-04-20 13:40     ` Nicolas Goaziou
  2021-04-20 17:15       ` Utkarsh Singh
@ 2021-04-23  4:58       ` Utkarsh Singh
  2021-04-27 20:21         ` bug#47885: " Nicolas Goaziou
  1 sibling, 1 reply; 33+ messages in thread
From: Utkarsh Singh @ 2021-04-23  4:58 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: 47885, emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1093 bytes --]

Hi,

On 2021-04-20, 15:40 +0200, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:

> Again all this needs to extensively tested, as there are a lot of
> dangers lurking around.

I am attaching my patch which also include my previous suggestion of
including yes-or-no prompt to org-table-import to allow file which don't
have csv, tsv or txt as extension.  Here are some concerns
with require your attention:

+ When using org-table-import interactively if we failed to guess
separator then we will be left with a user-error message and an
'unconverted table'.  We can make use of 'temp-buffer' to import our
file after successfully conversion.

+ Conversion part of org-table-convert-region make a distinction between
'(4) (comma separator) and rest of the separator we should either string
version of comma as AND condition or rewrite to simplify it.

I am willing to do these possible changes but currently waiting for your
review for org-table-guess-separator as there can be more serious bugs
lurking around on my code which I am considering base for these changes.

All the best,
Utkarsh


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: org-table --]
[-- Type: text/x-patch, Size: 5430 bytes --]

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 0e93fb271f..84bc981fec 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -846,6 +846,42 @@ org-table-create
       (goto-char pos))
     (org-table-align)))
 
+
+(defun org-table-guess-separator (beg0 end0)
+  "Guess separator for `org-table-convert-region' for region BEG0 to END0.
+
+List of preferred separator:
+comma, TAB, semicolon, colon or SPACE.
+
+If region contains a line which doesn't contain the required
+separator then discard the separator and search again using next
+separator."
+  (let* ((beg (save-excursion
+                (goto-char (min beg0 end0))
+                (skip-chars-forward " \t\n")
+                (if (eobp) (point) (line-beginning-position))))
+	 (end (save-excursion
+                (goto-char (max beg end0))
+                (skip-chars-backward " \t\n" beg)
+                (if (= beg (point)) (point) (line-end-position))))
+         (sep-regexp '((","  (rx bol (1+ (not (or ?\n ?,))) eol))
+		       ("\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
+		       (";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
+		       (":"  (rx bol (1+ (not (or ?\n ?:))) eol))
+		       (" "  (rx bol (1+ (not (or ?' ?\" ))
+                                         (not (or ?\s ?\;))
+                                         (not (or ?' ?\"))) eol))))
+         sep)
+    (unless (= beg end)
+      (save-excursion
+        (goto-char beg)
+        (catch :found
+          (pcase-dolist (`(,sep ,regexp) sep-regexp)
+            (save-excursion
+              (unless (re-search-forward (eval regexp) end t)
+                (throw :found sep))))
+          nil)))))
+
 ;;;###autoload
 (defun org-table-convert-region (beg0 end0 &optional separator)
   "Convert region to a table.
@@ -859,20 +895,19 @@ org-table-convert-region
 (4)     Use the comma as a field separator
 (16)    Use a TAB as field separator
 (64)    Prompt for a regular expression as field separator
-integer  When a number, use that many spaces, or a TAB, as field separator
-regexp   When a regular expression, use it to match the separator
-nil      When nil, the command tries to be smart and figure out the
-         separator in the following way:
-         - when each line contains a TAB, assume TAB-separated material
-         - when each line contains a comma, assume CSV material
-         - else, assume one or more SPACE characters as separator."
+integer When a number, use that many spaces, or a TAB, as field separator
+regexp  When a regular expression, use it to match the separator
+nil     When nil, the command tries to be smart and figure out the
+        separator using `org-table-guess-seperator'."
   (interactive "r\nP")
   (let* ((beg (min beg0 end0))
 	 (end (max beg0 end0))
 	 re)
+
     (if (> (count-lines beg end) org-table-convert-region-max-lines)
 	(user-error "Region is longer than `org-table-convert-region-max-lines' (%s) lines; not converting"
 		    org-table-convert-region-max-lines)
+
       (when (equal separator '(64))
 	(setq separator (read-regexp "Regexp for field separator")))
       (goto-char beg)
@@ -881,17 +916,13 @@ org-table-convert-region
       (goto-char end)
       (if (bolp) (backward-char 1) (end-of-line 1))
       (setq end (point-marker))
-      ;; Get the right field separator
-      (unless separator
-	(goto-char beg)
-	(setq separator
-	      (cond
-	       ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
-	       ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
-	       (t 1))))
+      (when (and (not separator)
+                 (not (setq separator
+                            (org-table-guess-separator (beg end)))))
+        (user-error "Failed to guess separator"))
       (goto-char beg)
       (if (equal separator '(4))
-	  (while (< (point) end)
+          (while (< (point) end)
 	    ;; parse the csv stuff
 	    (cond
 	     ((looking-at "^") (insert "| "))
@@ -905,7 +936,7 @@ org-table-convert-region
 	(setq re (cond
 		  ((equal separator '(4)) "^\\|\"?[ \t]*,[ \t]*\"?")
 		  ((equal separator '(16)) "^\\|\t")
-		  ((integerp separator)
+	          ((integerp separator)
 		   (if (< separator 1)
 		       (user-error "Number of spaces in separator must be >= 1")
 		     (format "^ *\\| *\t *\\| \\{%d,\\}" separator)))
@@ -921,12 +952,8 @@ org-table-convert-region
 (defun org-table-import (file separator)
   "Import FILE as a table.
 
-The command tries to be smart and figure out the separator in the
-following way:
-
-- when each line contains a TAB, assume TAB-separated material;
-- when each line contains a comma, assume CSV material;
-- else, assume one or more SPACE characters as separator.
+The command tries to be smart and figure out the separator using
+`org-table-guess-seperator'.
 
 When non-nil, SEPARATOR specifies the field separator in the
 lines.  It can have the following values:
@@ -938,7 +965,8 @@ org-table-import
 - regexp  When a regular expression, use it to match the separator."
   (interactive "f\nP")
   (when (and (called-interactively-p 'any)
-	     (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
+	     (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
+             (not (yes-or-no-p "File does not havs .txt .txt .csv as extension.  Do you still want to continue? ")))
     (user-error "Cannot import such file"))
   (unless (bolp) (insert "\n"))
   (let ((beg (point))

[-- Attachment #3: Type: text/plain, Size: 43 bytes --]


-- 
Utkarsh Singh
http://utkarshsingh.xyz

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-04-23  4:58       ` Utkarsh Singh
@ 2021-04-27 20:21         ` Nicolas Goaziou
  2021-04-28  8:37           ` Utkarsh Singh
  0 siblings, 1 reply; 33+ messages in thread
From: Nicolas Goaziou @ 2021-04-27 20:21 UTC (permalink / raw)
  To: Utkarsh Singh; +Cc: 47885

Hello,

Utkarsh Singh <utkarsh190601@gmail.com> writes:

> I am attaching my patch which also include my previous suggestion of
> including yes-or-no prompt to org-table-import to allow file which don't
> have csv, tsv or txt as extension.  

I suggest to make several patches. Do not try to stuff as many changes
as possible in a single one, please.

> + When using org-table-import interactively if we failed to guess
> separator then we will be left with a user-error message and an
> 'unconverted table'.  We can make use of 'temp-buffer' to import our
> file after successfully conversion.

I'm not sure to understand what you mean.

> + Conversion part of org-table-convert-region make a distinction between
> '(4) (comma separator) and rest of the separator we should either string
> version of comma as AND condition or rewrite to simplify it.

Ditto. But it can be the object of another patch. Let's concentrate on
`org-table-guess-separator' first.

> I am willing to do these possible changes but currently waiting for your
> review for org-table-guess-separator as there can be more serious bugs
> lurking around on my code which I am considering base for these
> changes.

You should definitely write tests for this function. Here's a start:

    (ert-deftest test-org-table/guess-separator ()
      "Test `test-org-table/guess-separator'."
      ;; Test space separator.
      (should
       (equal " "
              (org-test-with-temp-text "a b\nc d"
                (org-table-guess-separator (point-min) (point-max)))))
      (should
       (equal " "
              (org-test-with-temp-text "a b\nc d"
                (org-table-guess-separator (point-min) (point-max)))))
      ;; Test "inverted" region.
      (should
       (equal " "
              (org-test-with-temp-text "a b\nc d"
                (org-table-guess-separator (point-max) (point-min)))))
      ;; Do not error on empty region.
      (should-not
       (org-test-with-temp-text ""
         (org-table-guess-separator (point-max) (point-min))))
      (should-not
       (org-test-with-temp-text "   \n"
         (org-table-guess-separator (point-max) (point-min)))))

> +	 (end (save-excursion
> +                (goto-char (max beg end0))

This should be beg0 instead of beg above.

> +         (sep-regexp '((","  (rx bol (1+ (not (or ?\n ?,))) eol))
> +		       ("\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
> +		       (";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
> +		       (":"  (rx bol (1+ (not (or ?\n ?:))) eol))
> +		       (" "  (rx bol (1+ (not (or ?' ?\" ))
> +                                         (not (or ?\s ?\;))
> +                                         (not (or ?' ?\"))) eol))))

Use
         (sep-regexp
          (list (list ","  (rx bol (1+ (not (or ?\n ?,))) eol))
	        (list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
	        (list ";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
	        (list ":"  (rx bol (1+ (not (or ?\n ?:))) eol))
	        (list " "  (rx bol (1+ (not (or ?' ?\" ))
                                       (not (or ?\s ?\;))
                                       (not (or ?' ?\")))
                               eol))))

so you don't need eval below, and rx forms become constants when
compiled.

> +         sep)

This `sep' binding can be removed.

> +    (unless (= beg end)
> +      (save-excursion
> +        (goto-char beg)
> +        (catch :found
> +          (pcase-dolist (`(,sep ,regexp) sep-regexp)
> +            (save-excursion
> +              (unless (re-search-forward (eval regexp) end t)

You can drop the `eval'.

>    (when (and (called-interactively-p 'any)
> -	     (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
> +	     (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
> +             (not (yes-or-no-p "File does not havs .txt .txt .csv as extension.  Do you still want to continue? ")))

"does not have" and ".txt" -> ".tsv" I guess.

Also please provide a patch with a commit message, possibly using `git
format-patch'.

Thanks!

Regards,
-- 
Nicolas Goaziou




^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-04-27 20:21         ` bug#47885: " Nicolas Goaziou
@ 2021-04-28  8:37           ` Utkarsh Singh
  2021-04-28 16:38             ` Maxim Nikulin
  0 siblings, 1 reply; 33+ messages in thread
From: Utkarsh Singh @ 2021-04-28  8:37 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: 47885

[-- Attachment #1: Type: text/plain, Size: 2733 bytes --]

Hi,

On 2021-04-27, 22:21 +0200, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:

>> + When using org-table-import interactively if we failed to guess
>> separator then we will be left with a user-error message and an
>> 'unconverted table'.  We can make use of 'temp-buffer' to import our
>> file after successfully conversion.
>
> I'm not sure to understand what you mean.

Note: I will advice you to apply patch no. 2 before trying out the
following example.

1. Download the attached CSV file.  We can call this example.csv

2. Go to *scratch* buffer.

3. Use 'M-x org-table-import' to import example.csv as org-table.

You will see even thought org-table-guess-separator failed in guessing
separator we are still left with unconverted region added to our buffer.

>> + Conversion part of org-table-convert-region make a distinction between
>> '(4) (comma separator) and rest of the separator we should either string
>> version of comma as AND condition or rewrite to simplify it.
>
> Ditto. But it can be the object of another patch. Let's concentrate on
> `org-table-guess-separator' first.
>
>> I am willing to do these possible changes but currently waiting for your
>> review for org-table-guess-separator as there can be more serious bugs
>> lurking around on my code which I am considering base for these
>> changes.
>
> You should definitely write tests for this function. Here's a start:
>
>     (ert-deftest test-org-table/guess-separator ()
>       "Test `test-org-table/guess-separator'."
>       ;; Test space separator.
>       (should
>        (equal " "
>               (org-test-with-temp-text "a b\nc d"
>                 (org-table-guess-separator (point-min) (point-max)))))
>       (should
>        (equal " "
>               (org-test-with-temp-text "a b\nc d"
>                 (org-table-guess-separator (point-min) (point-max)))))
>       ;; Test "inverted" region.
>       (should
>        (equal " "
>               (org-test-with-temp-text "a b\nc d"
>                 (org-table-guess-separator (point-max) (point-min)))))
>       ;; Do not error on empty region.
>       (should-not
>        (org-test-with-temp-text ""
>          (org-table-guess-separator (point-max) (point-min))))
>       (should-not
>        (org-test-with-temp-text "   \n"
>          (org-table-guess-separator (point-max) (point-min)))))
>

I will surely do more testing.

I would also like to simplify the condition for guessing SPACE as
separator due to following cases:

+ field1 'this is field2' 'this is field3' :: In this case we still have
SPACE inside quote (' in this case).

+ Since SPACE is our last valid separator I think searching for a line
which doesn't contains space is more than enough.

Required patch:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch1 --]
[-- Type: text/x-patch, Size: 1110 bytes --]

From 6b112927de73c43edfd08254217808ebff42772a Mon Sep 17 00:00:00 2001
From: Utkarsh Singh <utkarsh190601@gmail.com>
Date: Wed, 28 Apr 2021 10:26:46 +0530
Subject: [PATCH 1/3] org-table.el (org-table-import): add yes-and-no prompt

Add a yes and no prompt for files which don't have .txt, .tsv OR .csv
as file extensions.
---
 lisp/org/org-table.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 0e93fb271f..e0b2be6892 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -938,7 +938,8 @@ org-table-import
 - regexp  When a regular expression, use it to match the separator."
   (interactive "f\nP")
   (when (and (called-interactively-p 'any)
-	     (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
+	     (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
+             (not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension.  Do you still want to continue? ")))
     (user-error "Cannot import such file"))
   (unless (bolp) (insert "\n"))
   (let ((beg (point))
-- 
2.31.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch2 --]
[-- Type: text/x-patch, Size: 3523 bytes --]

From 9bb017cfc8284075e04faf5496ed560ba48d5bbc Mon Sep 17 00:00:00 2001
From: Utkarsh Singh <utkarsh190601@gmail.com>
Date: Wed, 28 Apr 2021 10:42:32 +0530
Subject: [PATCH 2/3] org-table.el (org-table-convert-region): move out
 separator-guessing

1. Move separator guessing code to org-table-guess-separator (new
function).
2. Add semicolon, colon and SPACE to the list of know separator
(separator which we can guess).
---
 lisp/org/org-table.el | 49 +++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index e0b2be6892..295f7a9b90 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -846,6 +846,39 @@ org-table-create
       (goto-char pos))
     (org-table-align)))
 
+(defun org-table-guess-separator (beg0 end0)
+  "Guess separator for region BEG0 to END0.
+
+List of preferred separator (in order of preference):
+comma, TAB, semicolon, colon or SPACE.
+
+Search for a line which doesn't contain a separator if found
+search again using next preferred separator or else return
+separator as string."
+  (let* ((beg (save-excursion
+                (goto-char (min beg0 end0))
+                (skip-chars-forward " \t\n")
+                (if (eobp) (point) (line-beginning-position))))
+	 (end (save-excursion
+                (goto-char (max beg0 end0))
+                (skip-chars-backward " \t\n" beg)
+                (if (= beg (point)) (point) (line-end-position))))
+         (sep-regexp
+          (list (list ","  (rx bol (1+ (not (or ?\n ?,))) eol))
+		(list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
+		(list ";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
+		(list ":"  (rx bol (1+ (not (or ?\n ?:))) eol))
+		(list " "  (rx bol (1+ (not (or ?\n ?\s))) eol)))))
+    (unless (= beg end)
+      (save-excursion
+        (goto-char beg)
+        (catch :found
+          (pcase-dolist (`(,sep ,regexp) sep-regexp)
+            (save-excursion
+              (unless (re-search-forward regexp end t)
+                (throw :found sep))))
+          nil)))))
+
 ;;;###autoload
 (defun org-table-convert-region (beg0 end0 &optional separator)
   "Convert region to a table.
@@ -862,10 +895,7 @@ org-table-convert-region
 integer  When a number, use that many spaces, or a TAB, as field separator
 regexp   When a regular expression, use it to match the separator
 nil      When nil, the command tries to be smart and figure out the
-         separator in the following way:
-         - when each line contains a TAB, assume TAB-separated material
-         - when each line contains a comma, assume CSV material
-         - else, assume one or more SPACE characters as separator."
+         separator using `org-table-guess-seperator'."
   (interactive "r\nP")
   (let* ((beg (min beg0 end0))
 	 (end (max beg0 end0))
@@ -882,13 +912,10 @@ org-table-convert-region
       (if (bolp) (backward-char 1) (end-of-line 1))
       (setq end (point-marker))
       ;; Get the right field separator
-      (unless separator
-	(goto-char beg)
-	(setq separator
-	      (cond
-	       ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
-	       ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
-	       (t 1))))
+      (when (and (not separator)
+                 (not (setq separator
+                            (org-table-guess-separator beg end))))
+        (user-error "Failed to guess separator"))
       (goto-char beg)
       (if (equal separator '(4))
 	  (while (< (point) end)
-- 
2.31.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: patch3 --]
[-- Type: text/x-patch, Size: 1103 bytes --]

From fef97ffe27ff908647c45f1b066a845e71a0926f Mon Sep 17 00:00:00 2001
From: Utkarsh Singh <utkarsh190601@gmail.com>
Date: Wed, 28 Apr 2021 14:01:31 +0530
Subject: [PATCH 3/3] org-table.el (org-table-import): add file prompt

---
 lisp/org/org-table.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 295f7a9b90..e904903576 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -963,7 +963,8 @@ org-table-import
 - (64)    Prompt for a regular expression as field separator.
 - integer When a number, use that many spaces, or a TAB, as field separator.
 - regexp  When a regular expression, use it to match the separator."
-  (interactive "f\nP")
+  (interactive (list (read-file-name "Import file: ")
+                     (prefix-numeric-value current-prefix-arg)))
   (when (and (called-interactively-p 'any)
 	     (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
              (not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension.  Do you still want to continue? ")))
-- 
2.31.1


[-- Attachment #5: csv file --]
[-- Type: application/octet-stream, Size: 110 bytes --]

this,is,an,example,csv
on,first,two,line,we,have,comma,as,seperator
but:now:we:switched:to:colon:as:separator

[-- Attachment #6: Type: text/plain, Size: 43 bytes --]


-- 
Utkarsh Singh
http://utkarshsingh.xyz

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-04-28  8:37           ` Utkarsh Singh
@ 2021-04-28 16:38             ` Maxim Nikulin
  2021-05-10 18:36               ` Utkarsh Singh
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-04-28 16:38 UTC (permalink / raw)
  To: emacs-orgmode

On 28/04/2021 15:37, Utkarsh Singh wrote:
> +List of preferred separator (in order of preference):
> +comma, TAB, semicolon, colon or SPACE.
I will hardly be using this feature heavily, so I do not insist that the 
following must be taken into account. Just some considerations...

There are locales where comma is used as decimal separator 23,5 (e.g. 
ru_RU.UTF-8). Office software and applications oriented to office users 
often use semicolon as field separator. There are still may be plenty of 
numbers with fractional part, so with commas. Likely ";" should be tried 
at first for such locales.

However the same user may have enough CSV files that are really "comma 
separated", e.g. results of numerical simulation where localization is 
intentionally ignored, data obtained from some equipment, etc.

Some files (e.g. downloaded bank statements) may be in legacy 8-bit 
encoding instead of UTF-8.

As a result, sometimes the only convenient way is to try various options 
with interactive preview.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-04-28 16:38             ` Maxim Nikulin
@ 2021-05-10 18:36               ` Utkarsh Singh
  2021-05-12 17:08                 ` Maxim Nikulin
  0 siblings, 1 reply; 33+ messages in thread
From: Utkarsh Singh @ 2021-05-10 18:36 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Hi Maxim,

Sorry for the late reply!

On 2021-04-28, 23:38 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:

> On 28/04/2021 15:37, Utkarsh Singh wrote:
>> +List of preferred separator (in order of preference):
>> +comma, TAB, semicolon, colon or SPACE.
> I will hardly be using this feature heavily, so I do not insist that
> the following must be taken into account. Just some considerations...
>
> There are locales where comma is used as decimal separator 23,5
> (e.g. ru_RU.UTF-8). Office software and applications oriented to
> office users often use semicolon as field separator. There are still
> may be plenty of numbers with fractional part, so with commas. Likely
> ";" should be tried at first for such locales.
>
> However the same user may have enough CSV files that are really "comma
> separated", e.g. results of numerical simulation where localization is
> intentionally ignored, data obtained from some equipment, etc.
>
> Some files (e.g. downloaded bank statements) may be in legacy 8-bit
> encoding instead of UTF-8.
>
> As a result, sometimes the only convenient way is to try various
> options with interactive preview.

What do we mean by interactive preview?  Does this mean that we should
present a user with a list of possible delimiters using minibuffer?

If you are suggesting this then rather trying 'various options' based on
locals you can use =org-table-import= function directly as it accepts
separator as it's argument.

For ex (please review my usage of alist):

#+begin_src elisp
(defvar my-separator-alist '(("comma" . ",")
			     ("tab" . "\t")
			     ("semicolon" . ";")
			     ("colon" . ":")
			     ("space" . " ")))

(defun my/table-import (file separator)
  (interactive (list (read-file-name "Import CSV file: " nil nil t)
		     (cdr (assoc (completing-read "Separator: " my-separator-alist)
				 my-separator-alist))))
  (org-table-import file separator))
#+end_src

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-10 18:36               ` Utkarsh Singh
@ 2021-05-12 17:08                 ` Maxim Nikulin
  2021-05-14 14:54                   ` Utkarsh Singh
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-05-12 17:08 UTC (permalink / raw)
  To: emacs-orgmode

On 11/05/2021 01:36, Utkarsh Singh wrote:
> What do we mean by interactive preview?  Does this mean that we should
> present a user with a list of possible delimiters using minibuffer?

I mean something like the dialog that LibreOffice shows on opening of a 
csv file. There are various options and table preview that allows to 
check if options are selected correctly before dropping to usual 
spreadsheet interface. I have no idea what is the best way to implement 
something like this in emacs. Likely it is out of scope of the discussed 
patch. I do not know what is your use cases. My intention was to show 
that CSV import could be quite cumbersome.

Some users believe that CSV is a reliable portable format and expect 
support of most features from Excel. Actually there are plenty of 
dialects and no way to determine which one was used (e.g. no header that 
defines field or string separator).

> For ex (please review my usage of alist):
> 
> #+begin_src elisp
> (defvar my-separator-alist '(("comma" . ",")
> 			     ("tab" . "\t")
> 			     ("semicolon" . ";")
> 			     ("colon" . ":")
> 			     ("space" . " ")))
> 
> (defun my/table-import (file separator)
>    (interactive (list (read-file-name "Import CSV file: " nil nil t)
> 		     (cdr (assoc (completing-read "Separator: " my-separator-alist)
> 				 my-separator-alist))))
>    (org-table-import file separator))
> #+end_src

With such function it is necessary to open file at first to see what 
separator is used inside.

My comments were related to the "guess" part of your patch. Comma is 
tried first. Consider the following file

A;1,2;3,4;5,6
B;7,8;9,1;11,12
C;13,14;15,16;17,18

Decimal separator is ",". Field separator is ";" but there are plenty of 
"," in each row.

LANG=fr_FR.UTF-8 python3 -c "import locale as l; l.setlocale(l.LC_ALL, 
''); print(l.format_string('%.2f', 123456.789))"
123456,79

Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is 
that order in which separator candidates are tried should depend on 
active locale.

I do not insist that interactive preview or smarter approach to guess 
separator have to be implemented. Feel free to disregard my comments. I 
am just not sure whether you are aware of limitations for noticeable 
part of users.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-12 17:08                 ` Maxim Nikulin
@ 2021-05-14 14:54                   ` Utkarsh Singh
  2021-05-15  9:13                     ` Bastien
  2021-05-16 16:24                     ` Maxim Nikulin
  0 siblings, 2 replies; 33+ messages in thread
From: Utkarsh Singh @ 2021-05-14 14:54 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode, mail

On 2021-05-13, 00:08 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:

> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
> that order in which separator candidates are tried should depend on
> active locale.
>
> I do not insist that interactive preview or smarter approach to guess
> separator have to be implemented. Feel free to disregard my
> comments. I am just not sure whether you are aware of limitations for
> noticeable part of users.

I am willing to work on this problem but before this can you identify
any other locale with similar problem or a resource from where I can
information's about locale.

I would also like to hear from the maintainers on what the think about
the issue and are they willing for an workaround?

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-14 14:54                   ` Utkarsh Singh
@ 2021-05-15  9:13                     ` Bastien
  2021-05-15 10:10                       ` Utkarsh Singh
  2021-05-16 16:24                     ` Maxim Nikulin
  1 sibling, 1 reply; 33+ messages in thread
From: Bastien @ 2021-05-15  9:13 UTC (permalink / raw)
  To: Utkarsh Singh; +Cc: Maxim Nikulin, emacs-orgmode, mail

Hi Utkarsh and Maxim,

Utkarsh Singh <utkarsh190601@gmail.com> writes:

> On 2021-05-13, 00:08 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:
>
>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>> that order in which separator candidates are tried should depend on
>> active locale.
>>
>> I do not insist that interactive preview or smarter approach to guess
>> separator have to be implemented. 

I don't think the interactive preview would be that useful.  For the
smarter approach, I think it will always be somewhat fragile, I don't
think it is worth the addional complexity in Org's code.

> I am willing to work on this problem but before this can you identify
> any other locale with similar problem or a resource from where I can
> information's about locale.
>
> I would also like to hear from the maintainers on what the think about
> the issue and are they willing for an workaround?

There you have it.  Perhaps you can continue to implement the feature
the way you want it, and if it feels complete and predictable, submit
it again?

Thanks,

-- 
 Bastien


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-15  9:13                     ` Bastien
@ 2021-05-15 10:10                       ` Utkarsh Singh
  2021-05-15 10:30                         ` Bastien
  2021-05-17  5:29                         ` Bastien
  0 siblings, 2 replies; 33+ messages in thread
From: Utkarsh Singh @ 2021-05-15 10:10 UTC (permalink / raw)
  To: Bastien; +Cc: Maxim Nikulin, emacs-orgmode, mail

On 2021-05-15, 11:13 +0200, Bastien <bzg@gnu.org> wrote:

> Hi Utkarsh and Maxim,
>
> Utkarsh Singh <utkarsh190601@gmail.com> writes:
>
>> On 2021-05-13, 00:08 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:
>>
>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>> that order in which separator candidates are tried should depend on
>>> active locale.
>>>
>>> I do not insist that interactive preview or smarter approach to guess
>>> separator have to be implemented. 
>
> I don't think the interactive preview would be that useful.  For the
> smarter approach, I think it will always be somewhat fragile, I don't
> think it is worth the addional complexity in Org's code.

Ok.

For now can you review the patches I proposed earlier in this thread?

Thank you,
Utkarsh Singh

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-15 10:10                       ` Utkarsh Singh
@ 2021-05-15 10:30                         ` Bastien
  2021-05-15 11:09                           ` Utkarsh Singh
  2021-05-17  5:29                         ` Bastien
  1 sibling, 1 reply; 33+ messages in thread
From: Bastien @ 2021-05-15 10:30 UTC (permalink / raw)
  To: Utkarsh Singh; +Cc: Maxim Nikulin, emacs-orgmode, mail

Hi Utkarsh,

Utkarsh Singh <utkarsh190601@gmail.com> writes:

> For now can you review the patches I proposed earlier in this
> thread?

Do these patches provide a complete and predictable solution?
If so, can you merge them into a single patch against master?

Thanks,

-- 
 Bastien


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-15 10:30                         ` Bastien
@ 2021-05-15 11:09                           ` Utkarsh Singh
  0 siblings, 0 replies; 33+ messages in thread
From: Utkarsh Singh @ 2021-05-15 11:09 UTC (permalink / raw)
  To: Bastien; +Cc: Maxim Nikulin, emacs-orgmode, mail

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

On 2021-05-15, 12:30 +0200, Bastien <bzg@gnu.org> wrote:

> Hi Utkarsh,
>
> Utkarsh Singh <utkarsh190601@gmail.com> writes:
>
>> For now can you review the patches I proposed earlier in this
>> thread?
>
> Do these patches provide a complete and predictable solution?
> If so, can you merge them into a single patch against master?
>

Here are the patches:

I have separated out 'adding of prompt' to patch2 as it doesn't
have to do anything with separator guessing part.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch1 --]
[-- Type: text/x-patch, Size: 3563 bytes --]

From 659e5e95bd8e024ac4730cf410a565b9e975b669 Mon Sep 17 00:00:00 2001
From: Utkarsh Singh <utkarsh190601@gmail.com>
Date: Sat, 15 May 2021 16:30:36 +0530
Subject: [PATCH 1/2] org-table-convert-region: move out separator-guessing

1. Move separator guessing code to org-table-guess-separator (new
function).
2. Add semicolon, colon and SPACE to the list of know separator
(separator which we can guess).
---
 lisp/org-table.el | 49 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/lisp/org-table.el b/lisp/org-table.el
index cc69542..d3242da 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -837,6 +837,39 @@ SIZE is a string Columns x Rows like for example \"3x2\"."
       (goto-char pos))
     (org-table-align)))
 
+(defun org-table-guess-separator (beg0 end0)
+  "Guess separator for region BEG0 to END0.
+
+List of preferred separator (in order of preference):
+comma, TAB, semicolon, colon or SPACE.
+
+Search for a line which doesn't contain a separator if found
+search again using next preferred separator or else return
+separator as string."
+  (let* ((beg (save-excursion
+                (goto-char (min beg0 end0))
+                (skip-chars-forward " \t\n")
+                (if (eobp) (point) (line-beginning-position))))
+	 (end (save-excursion
+                (goto-char (max beg0 end0))
+                (skip-chars-backward " \t\n" beg)
+                (if (= beg (point)) (point) (line-end-position))))
+         (sep-regexp
+          (list (list ","  (rx bol (1+ (not (or ?\n ?,))) eol))
+		(list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
+		(list ";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
+		(list ":"  (rx bol (1+ (not (or ?\n ?:))) eol))
+		(list " "  (rx bol (1+ (not (or ?\n ?\s))) eol)))))
+    (unless (= beg end)
+      (save-excursion
+        (goto-char beg)
+        (catch :found
+          (pcase-dolist (`(,sep ,regexp) sep-regexp)
+            (save-excursion
+              (unless (re-search-forward regexp end t)
+                (throw :found sep))))
+          nil)))))
+
 ;;;###autoload
 (defun org-table-convert-region (beg0 end0 &optional separator)
   "Convert region to a table.
@@ -853,10 +886,7 @@ following values:
 integer  When a number, use that many spaces, or a TAB, as field separator
 regexp   When a regular expression, use it to match the separator
 nil      When nil, the command tries to be smart and figure out the
-         separator in the following way:
-         - when each line contains a TAB, assume TAB-separated material
-         - when each line contains a comma, assume CSV material
-         - else, assume one or more SPACE characters as separator."
+         separator using `org-table-guess-seperator'."
   (interactive "r\nP")
   (let* ((beg (min beg0 end0))
 	 (end (max beg0 end0))
@@ -873,13 +903,10 @@ nil      When nil, the command tries to be smart and figure out the
       (if (bolp) (backward-char 1) (end-of-line 1))
       (setq end (point-marker))
       ;; Get the right field separator
-      (unless separator
-	(goto-char beg)
-	(setq separator
-	      (cond
-	       ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
-	       ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
-	       (t 1))))
+      (when (and (not separator)
+                 (not (setq separator
+                            (org-table-guess-separator beg end))))
+        (user-error "Failed to guess separator"))
       (goto-char beg)
       (if (equal separator '(4))
 	  (while (< (point) end)
-- 
2.31.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch2 --]
[-- Type: text/x-patch, Size: 1060 bytes --]

From 328d59dd2d2fc797e5819f43b40c96ffa45cd62f Mon Sep 17 00:00:00 2001
From: Utkarsh Singh <utkarsh190601@gmail.com>
Date: Sat, 15 May 2021 16:32:32 +0530
Subject: [PATCH 2/2] org-table-import: add file prompt

---
 lisp/org-table.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org-table.el b/lisp/org-table.el
index d3242da..20144c9 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -954,7 +954,8 @@ lines.  It can have the following values:
 - (64)    Prompt for a regular expression as field separator.
 - integer When a number, use that many spaces, or a TAB, as field separator.
 - regexp  When a regular expression, use it to match the separator."
-  (interactive "f\nP")
+  (interactive (list (read-file-name "Import file: ")
+                     (prefix-numeric-value current-prefix-arg)))
   (when (and (called-interactively-p 'any)
 	     (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
              (not (yes-or-no-p "The file's extension is not .txt, .tsv or .csv.  Import? ")))
-- 
2.31.1


[-- Attachment #4: Type: text/plain, Size: 43 bytes --]


-- 
Utkarsh Singh
http://utkarshsingh.xyz

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-14 14:54                   ` Utkarsh Singh
  2021-05-15  9:13                     ` Bastien
@ 2021-05-16 16:24                     ` Maxim Nikulin
  2021-05-17 16:30                       ` Utkarsh Singh
  2021-05-18 10:24                       ` Utkarsh Singh
  1 sibling, 2 replies; 33+ messages in thread
From: Maxim Nikulin @ 2021-05-16 16:24 UTC (permalink / raw)
  To: emacs-orgmode

On 14/05/2021 21:54, Utkarsh Singh wrote:
> On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:
> 
>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>> that order in which separator candidates are tried should depend on
>> active locale.
>>
> I am willing to work on this problem but before this can you identify
> any other locale with similar problem or a resource from where I can
> information's about locale.

I do not think list of locales should be hard coded. I am not familiar 
with elisp enough to tell you if locale properties such as decimal 
separator are exposed through some function. I expect, it is quite 
probable. As a last measure, some number, e.g. 0.5 or 1.5 could be 
formatted using a locale-aware function and result string could be 
checked if it contains ",".




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-15 10:10                       ` Utkarsh Singh
  2021-05-15 10:30                         ` Bastien
@ 2021-05-17  5:29                         ` Bastien
  2021-05-17 16:27                           ` Utkarsh Singh
  2021-06-01 16:23                           ` Maxim Nikulin
  1 sibling, 2 replies; 33+ messages in thread
From: Bastien @ 2021-05-17  5:29 UTC (permalink / raw)
  To: Utkarsh Singh; +Cc: Maxim Nikulin, emacs-orgmode, mail

Hi Utkarsh,

Utkarsh Singh <utkarsh190601@gmail.com> writes:

> For now can you review the patches I proposed earlier in this
> thread?

Not until both you and Maxim are confident this is useful, complete
and predictable.

Also, if you resend the patch for review, please use a proper commit
message: https://orgmode.org/worg/org-contribute.html#commit-messages

Thanks,

-- 
 Bastien


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-17  5:29                         ` Bastien
@ 2021-05-17 16:27                           ` Utkarsh Singh
  2021-06-01 16:23                           ` Maxim Nikulin
  1 sibling, 0 replies; 33+ messages in thread
From: Utkarsh Singh @ 2021-05-17 16:27 UTC (permalink / raw)
  To: Bastien; +Cc: Maxim Nikulin, emacs-orgmode, mail

On 2021-05-17, 07:29 +0200, Bastien <bzg@gnu.org> wrote:

> Hi Utkarsh,
>
> Utkarsh Singh <utkarsh190601@gmail.com> writes:
>
>> For now can you review the patches I proposed earlier in this
>> thread?
>
> Not until both you and Maxim are confident this is useful, complete
> and predictable.

Ok.

> Also, if you resend the patch for review, please use a proper commit
> message: https://orgmode.org/worg/org-contribute.html#commit-messages

Thanks for the guidance.  I was just trying to fit commit message in 50
words but next time I will keep this in mind.

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-16 16:24                     ` Maxim Nikulin
@ 2021-05-17 16:30                       ` Utkarsh Singh
  2021-05-18 10:24                       ` Utkarsh Singh
  1 sibling, 0 replies; 33+ messages in thread
From: Utkarsh Singh @ 2021-05-17 16:30 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

On 2021-05-16, 23:24 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:

> On 14/05/2021 21:54, Utkarsh Singh wrote:
>> On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:
>> 
>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>> that order in which separator candidates are tried should depend on
>>> active locale.
>>>
>> I am willing to work on this problem but before this can you identify
>> any other locale with similar problem or a resource from where I can
>> information's about locale.
>
> I do not think list of locales should be hard coded. I am not familiar 
> with elisp enough to tell you if locale properties such as decimal 
> separator are exposed through some function. I expect, it is quite 
> probable. As a last measure, some number, e.g. 0.5 or 1.5 could be 
> formatted using a locale-aware function and result string could be 
> checked if it contains ",".

I will go through the Elisp manual to find solution for this problem.
If you now any other better resource fell free to mail me.

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-16 16:24                     ` Maxim Nikulin
  2021-05-17 16:30                       ` Utkarsh Singh
@ 2021-05-18 10:24                       ` Utkarsh Singh
  2021-05-18 12:31                         ` Maxim Nikulin
  1 sibling, 1 reply; 33+ messages in thread
From: Utkarsh Singh @ 2021-05-18 10:24 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

On 2021-05-16, 23:24 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:

> On 14/05/2021 21:54, Utkarsh Singh wrote:
>> On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:
>> 
>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>> that order in which separator candidates are tried should depend on
>>> active locale.
>>>
>> I am willing to work on this problem but before this can you identify
>> any other locale with similar problem or a resource from where I can
>> information's about locale.
>
> I do not think list of locales should be hard coded. I am not familiar 
> with elisp enough to tell you if locale properties such as decimal 
> separator are exposed through some function. I expect, it is quite 
> probable. As a last measure, some number, e.g. 0.5 or 1.5 could be 
> formatted using a locale-aware function and result string could be 
> checked if it contains ",".

Can you test this function:

(defun org-table--comma-as-decimal-sep ()
  "Return nil or 2 if separator is dot or comma respectively."
  (string-search "," (format "%f" 10)))

To test I am using:
$ LANG=de_DE.UTF-8 emacs -Q

But I am getting this as warning:
(process:1787): Gtk-WARNING **: 15:40:49.375: Locale not supported by C library.
	Using the fallback 'C' locale.
-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-18 10:24                       ` Utkarsh Singh
@ 2021-05-18 12:31                         ` Maxim Nikulin
  2021-05-18 15:05                           ` Utkarsh Singh
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-05-18 12:31 UTC (permalink / raw)
  To: emacs-orgmode

It seems, there is no reliable way to work with numbers in a 
locale-aware way in emacs.  I am still against hard-coded list of 
locales. Requirement to customize a variable is rather inconvenient. 
Considering such properties as a part of translation is a little better 
but I prefer to avoid it.

The question may be risen in emacs-devel but I am unsure if I will 
participate in discussion.

On 18/05/2021 17:24, Utkarsh Singh wrote:
> On 2021-05-16, 23:24 +0700, Maxim Nikulin wrote:
> 
>> On 14/05/2021 21:54, Utkarsh Singh wrote:
>>> On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:
>>>
>>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>>> that order in which separator candidates are tried should depend on
>>>> active locale.
>>>>
>>> I am willing to work on this problem but before this can you identify
>>> any other locale with similar problem or a resource from where I can
>>> information's about locale.
>>
>> I do not think list of locales should be hard coded. I am not familiar
>> with elisp enough to tell you if locale properties such as decimal
>> separator are exposed through some function. I expect, it is quite
>> probable. As a last measure, some number, e.g. 0.5 or 1.5 could be
>> formatted using a locale-aware function and result string could be
>> checked if it contains ",".
> 
> Can you test this function:
> 
> (defun org-table--comma-as-decimal-sep ()
>    "Return nil or 2 if separator is dot or comma respectively."
>    (string-search "," (format "%f" 10)))

No, it does not work. `format' always uses dot. It is reasonable when 
e.g. during writing a config file or during data exchange when locales 
must be ignored.

I was too optimistic. I did not expect that support of locales are so 
poor in Emcacs. I do not see any traces of localeconv(3) in sources that 
would allow to get value of decimal_point directly.

Numbers are forced to use "C" locale and I have not noticed any way to 
override it. Initial settings:

http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n1490

http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n2861

setlocale (LC_NUMERIC, "C");

> To test I am using:
> $ LANG=de_DE.UTF-8 emacs -Q
> 
> But I am getting this as warning:
> (process:1787): Gtk-WARNING **: 15:40:49.375: Locale not supported by C library.
> 	Using the fallback 'C' locale.

You get this error due to you have not generated this locale. On debian 
& ubuntu

dpkg-reconfigure locales

allows to select desired locales and performs all necessary actions.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-18 12:31                         ` Maxim Nikulin
@ 2021-05-18 15:05                           ` Utkarsh Singh
  0 siblings, 0 replies; 33+ messages in thread
From: Utkarsh Singh @ 2021-05-18 15:05 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

On 2021-05-18, 19:31 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:

> The question may be risen in emacs-devel but I am unsure if I will 
> participate in discussion.

Why?

>> Can you test this function:
>> 
>> (defun org-table--comma-as-decimal-sep ()
>>    "Return nil or 2 if separator is dot or comma respectively."
>>    (string-search "," (format "%f" 10)))
>
> No, it does not work. `format' always uses dot. It is reasonable when 
> e.g. during writing a config file or during data exchange when locales 
> must be ignored.
>
> I was too optimistic. I did not expect that support of locales are so 
> poor in Emcacs. I do not see any traces of localeconv(3) in sources that 
> would allow to get value of decimal_point directly.
>
> Numbers are forced to use "C" locale and I have not noticed any way to 
> override it. Initial settings:
>
> http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n1490
>
> http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n2861
>
> setlocale (LC_NUMERIC, "C");
>

Hmm, so this means that Elisp cannot read something like this '10,1' as
floating point number.

>> To test I am using: $ LANG=de_DE.UTF-8 emacs -Q
>> 
>> But I am getting this as warning:
>> (process:1787): Gtk-WARNING **: 15:40:49.375: Locale not supported by C library.
>> 	Using the fallback 'C' locale.
>
> You get this error due to you have not generated this locale. On debian 
> & ubuntu
>
> dpkg-reconfigure locales
>
> allows to select desired locales and performs all necessary actions.

Thanks!  I have fixed it now.

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-05-17  5:29                         ` Bastien
  2021-05-17 16:27                           ` Utkarsh Singh
@ 2021-06-01 16:23                           ` Maxim Nikulin
  2021-06-01 17:46                             ` Utkarsh Singh
  1 sibling, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-01 16:23 UTC (permalink / raw)
  To: emacs-orgmode; +Cc: Bastien, Utkarsh Singh

On 17/05/2021 12:29, Bastien wrote:
> Utkarsh Singh <utkarsh190601@gmail.com> writes:
>> For now can you review the patches I proposed earlier in this
>> thread?
> 
> Not until both you and Maxim are confident this is useful, complete
> and predictable.

I have too many points to object to consider my opinion as objective.

I am unsure if colon as a separator (passwd "db") is widely used case. I 
was surprised that it is impossible to implement locale-aware detection 
of semicolon as separator due to limitations of Emacs that is not 
friendly in respect to internationalization of number formatting.

Bastien, I do not know how much tests you prefer to have in Org. Nicolas 
asked for extensive tests 
https://orgmode.org/list/875z07jx6n.fsf@nicolasgoaziou.fr I think, 
before starting work on tests, it is necessary to decide if the patches 
are acceptable in general.

Personally, I have realized that I would prefer to have anything related 
to CSV (besides very basic features) in a dedicated package, e.g. in 
csv-mode. It might define a special yank handler for copy-paste to 
org-mode. Unsure if the author of csv-mode agrees with my point of view. 
Another option is to pass files through python code to take advantage of 
more advanced heuristics.

On 15/05/2021 18:09, https://orgmode.org/list/87im3kdzi5.fsf@gmail.com 
Utkarsh Singh wrote:
> --- a/lisp/org-table.el
> +++ b/lisp/org-table.el
> @@ -954,7 +954,8 @@ lines.  It can have the following values:
>  - (64)    Prompt for a regular expression as field separator.
>  - integer When a number, use that many spaces, or a TAB, as field separator.
>  - regexp  When a regular expression, use it to match the separator."
> -  (interactive "f\nP")
> +  (interactive (list (read-file-name "Import file: ")
> +                     (prefix-numeric-value current-prefix-arg)))
>    (when (and (called-interactively-p 'any)

Sending patches, I am afraid to break something I was not aware about.

I have read docstrings for the modified functions and tried to import 
the following file "tbl.csv" with some CSV features. LibreOffice imports 
it correctly, it even normalizes 66.3e-35 to 6.63e-34 (I can not say 
that I always appreciate such silent modifications).

1,Word,66.3e-35
2,Unquoted cell,2.7
3,"Quoted cell",3.14
4,"Cell ""with quotes""",2021-06-01
5,"Next cell is empty",""
6,"Cell with new
Line",6.28

My optimistic expectation was (OK, I did not believe I got such result):

| 1 | Word               |   66.3e-35 |
| 2 | Unquoted cell      |        2.7 |
| 3 | Quoted cell        |       3.14 |
| 4 | Cell "with quotes" | 2021-06-01 |
| 5 | Next cell is empty |            |
| 6 | Cell with new      |       6.28 |
|   | Line               |            |

Org 9.1.9 M-x org-table-import
and C-u M-x org-table-import
actual results are close enough to my expectations

|     1 | Word               |   66.3e-35 |
|     2 | Unquoted cell      |        2.7 |
|     3 | Quoted cell        |       3.14 |
|     4 | Cell "with quotes" | 2021-06-01 |
|     5 | Next cell is empty |            |
|     6 | "Cell with new     |            |
| Line" | 6.28               |            |

M-x org-table-import RET tbl RET
completes file name to tbl.csv

Org 9.4.5+patches M-x org-table-import

| 1,Word,66.3e-35 |            |                      |           |
| 2,Unquoted      | cell,2.7   |                      |           |
| 3,"Quoted       | cell",3.14 |                      |           |
| 4,"Cell         | ""with     | quotes""",2021-06-01 |           |
| 5,"Next         | cell       | is                   | empty","" |
| 6,"Cell         | with       | new                  |           |
| Line",6.28      |            |                      |           |

Org 9.4.5+patches C-u M-x org-table-import

| 1,Word,66.3e-35                     |
| 2,Unquoted cell,2.7                 |
| 3,"Quoted cell",3.14                |
| 4,"Cell ""with quotes""",2021-06-01 |
| 5,"Next cell is empty",""           |
| 6,"Cell with new                    |
| Line",6.28                          |

M-x org-table-import RET tbl RET
complains that file name extension is not txt, tsv or csv.

So my personal conclusion is that CSV file is imported incorrectly in 
both cases: with guessed separator and with explicitly requested through 
prefix argument. Completion works a bit worse too.

One more note concerning locale support.

On 18/05/2021 22:05, Utkarsh Singh wrote:
 > On 2021-05-18, 19:31 +0700, Maxim Nikulin wrote:
>> The question may be risen in emacs-devel but I am unsure if I will 
>> participate in discussion.
> 
> Why?

I am aware of some problems related to localization but I do not have 
consistent vision what API emacs should have. I have no idea what 
information is available in Windows. That is why I expect that 
discussion may be time consuming while I am not sure that someone will 
be ready to implement new features.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-06-01 16:23                           ` Maxim Nikulin
@ 2021-06-01 17:46                             ` Utkarsh Singh
  2021-06-02 12:06                               ` Maxim Nikulin
  0 siblings, 1 reply; 33+ messages in thread
From: Utkarsh Singh @ 2021-06-01 17:46 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: Bastien, emacs-orgmode

Hi Maxim,

First of all I would like to thank you for testing out patches and
taking time to write a detailed report about it.

On 2021-06-01, 23:23 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:

> On 17/05/2021 12:29, Bastien wrote:
>> Utkarsh Singh <utkarsh190601@gmail.com> writes:
>>> For now can you review the patches I proposed earlier in this
>>> thread?
>> 
>> Not until both you and Maxim are confident this is useful, complete
>> and predictable.
>
> I have too many points to object to consider my opinion as objective.
>
> Org 9.4.5+patches M-x org-table-import
>
> | 1,Word,66.3e-35 |            |                      |           |
> | 2,Unquoted      | cell,2.7   |                      |           |
> | 3,"Quoted       | cell",3.14 |                      |           |
> | 4,"Cell         | ""with     | quotes""",2021-06-01 |           |
> | 5,"Next         | cell       | is                   | empty","" |
> | 6,"Cell         | with       | new                  |           |
> | Line",6.28      |            |                      |           |
>
> So my personal conclusion is that CSV file is imported incorrectly in 
> both cases: with guessed separator and with explicitly requested through 
> prefix argument. Completion works a bit worse too.

Currently `org-table-guess-separator' returns "," (COMMA as string) and
`org-table-covert-region' uses '(4) to represent COMMA as separator
which is the main cause of breakdown in importing.

To make importing work well we have to:

+ Guess right separator (`org-table-guess-separator')
+ Parse CSV with this separator (`org-table-covert-region')

As far as I can tell "guessing part" works well and now we just have to
make parser work well with new separators.

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-06-01 17:46                             ` Utkarsh Singh
@ 2021-06-02 12:06                               ` Maxim Nikulin
  2021-06-02 15:08                                 ` Utkarsh Singh
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-02 12:06 UTC (permalink / raw)
  To: emacs-orgmode

On 02/06/2021 00:46, Utkarsh Singh wrote:
>> Org 9.4.5+patches M-x org-table-import
>>
>> | 1,Word,66.3e-35 |            |                      |           |
>> | 2,Unquoted      | cell,2.7   |                      |           |
>> | 3,"Quoted       | cell",3.14 |                      |           |
>> | 4,"Cell         | ""with     | quotes""",2021-06-01 |           |
>> | 5,"Next         | cell       | is                   | empty","" |
>> | 6,"Cell         | with       | new                  |           |
>> | Line",6.28      |            |                      |           |
>>
>> So my personal conclusion is that CSV file is imported incorrectly in
>> both cases: with guessed separator and with explicitly requested through
>> prefix argument. Completion works a bit worse too.
> 
> Currently `org-table-guess-separator' returns "," (COMMA as string) and
> `org-table-covert-region' uses '(4) to represent COMMA as separator
> which is the main cause of breakdown in importing.
> 
> To make importing work well we have to:
> 
> + Guess right separator (`org-table-guess-separator')
> + Parse CSV with this separator (`org-table-covert-region')
> 
> As far as I can tell "guessing part" works well and now we just have to
> make parser work well with new separators.

Notice that for "C-u M-x org-table-import" heuristics is not necessary, 
the separator is specified explicitly. I see that your intention was to 
improve user interface of org-table-import, but actually you broke it by 
the "org-table-import: add file prompt" patch. I have not tried it, but 
my expectation is that user prompt can be customized while keeping all 
other things working. Maybe the docstring should be updated to make it 
more friendly to novices (like me while reviewing your patch).





^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-06-02 12:06                               ` Maxim Nikulin
@ 2021-06-02 15:08                                 ` Utkarsh Singh
  2021-06-02 16:44                                   ` Maxim Nikulin
  0 siblings, 1 reply; 33+ messages in thread
From: Utkarsh Singh @ 2021-06-02 15:08 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

On 2021-06-02, 19:06 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:

> Notice that for "C-u M-x org-table-import" heuristics is not necessary, 
> the separator is specified explicitly. I see that your intention was to 
> improve user interface of org-table-import, but actually you broke it by 
> the "org-table-import: add file prompt" patch. I have not tried it, but 
> my expectation is that user prompt can be customized while keeping all 
> other things working. Maybe the docstring should be updated to make it 
> more friendly to novices (like me while reviewing your patch).

Thanks for pointing out the error, locally you can fix it by (patches
for patches):

;;;###autoload
 (defun org-table-import (file separator)
@@ -955,12 +971,13 @@ lines.  It can have the following values:
 - integer When a number, use that many spaces, or a TAB, as field separator.
 - regexp  When a regular expression, use it to match the separator."
   (interactive (list (read-file-name "Import file: ")
-                     (prefix-numeric-value current-prefix-arg)))
+                     current-prefix-arg))

Currently I am trying to refactor CSV parsing by applying techniques
used in pcsv library (MELPA package) which I think you will also enjoy
to play with!

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-06-02 15:08                                 ` Utkarsh Singh
@ 2021-06-02 16:44                                   ` Maxim Nikulin
  2021-06-04  4:04                                     ` Utkarsh Singh
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-02 16:44 UTC (permalink / raw)
  To: emacs-orgmode

On 02/06/2021 22:08, Utkarsh Singh wrote:
> ;;;###autoload
>   (defun org-table-import (file separator)
> @@ -955,12 +971,13 @@ lines.  It can have the following values:
>   - integer When a number, use that many spaces, or a TAB, as field separator.
>   - regexp  When a regular expression, use it to match the separator."
>     (interactive (list (read-file-name "Import file: ")
> -                     (prefix-numeric-value current-prefix-arg)))
> +                     current-prefix-arg))

It seems, prefix argument works now. Let me remind that file name 
completion was working better before your change.

I have noticed a couple of error messages. Unsure what is going wrong, I 
hope, command to launch emacs is correct (-Q -L ~/src/org-mode/lisp 
test.org).

At startup:
Eager macro-expansion failure: (error "rx ‘not’ syntax error: (or 10 44)")

In response to M-x org-table-import:
rx-check-not: rx ‘not’ syntax error: (or 10 44)

> Currently I am trying to refactor CSV parsing by applying techniques
> used in pcsv library (MELPA package) which I think you will also enjoy
> to play with!

I do not know opinion of Org maintainers. Personally I believe that org 
can take advantage of Emacs core features or of other packages if they 
are available and fallback to minimal implementation otherwise. Unsure 
whether borrowing code from pcsv can cause license issues. Current CSV 
parser is not perfect but it works reasonably well.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-06-02 16:44                                   ` Maxim Nikulin
@ 2021-06-04  4:04                                     ` Utkarsh Singh
  2021-06-05 12:40                                       ` Maxim Nikulin
  0 siblings, 1 reply; 33+ messages in thread
From: Utkarsh Singh @ 2021-06-04  4:04 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

On 2021-06-02, 23:44 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:

> On 02/06/2021 22:08, Utkarsh Singh wrote:
>> ;;;###autoload
>>   (defun org-table-import (file separator)
>> @@ -955,12 +971,13 @@ lines.  It can have the following values:
>>   - integer When a number, use that many spaces, or a TAB, as field separator.
>>   - regexp  When a regular expression, use it to match the separator."
>>     (interactive (list (read-file-name "Import file: ")
>> -                     (prefix-numeric-value current-prefix-arg)))
>> +                     current-prefix-arg))
>
> It seems, prefix argument works now. Let me remind that file name 
> completion was working better before your change.
>
> I have noticed a couple of error messages. Unsure what is going wrong, I 
> hope, command to launch emacs is correct (-Q -L ~/src/org-mode/lisp 
> test.org).
>
> At startup:
> Eager macro-expansion failure: (error "rx ‘not’ syntax error: (or 10 44)")
>
> In response to M-x org-table-import:
> rx-check-not: rx ‘not’ syntax error: (or 10 44)
>

Is `rx' library loading correctly? If no then you can try:

diff --git a/lisp/org-table.el b/lisp/org-table.el
index 20144c91b..9278f91bb 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -38,6 +38,7 @@
 (require 'org-macs)
 (require 'org-compat)
 (require 'org-keys)
+(require 'rx)

Can you also share test.org so that I can test locally?

>> Currently I am trying to refactor CSV parsing by applying techniques
>> used in pcsv library (MELPA package) which I think you will also enjoy
>> to play with!
>
> I do not know opinion of Org maintainers. Personally I believe that org 
> can take advantage of Emacs core features or of other packages if they 
> are available and fallback to minimal implementation otherwise. Unsure 
> whether borrowing code from pcsv can cause license issues. Current CSV 
> parser is not perfect but it works reasonably well.

Yes, I think you are right about how Org should make use of existing
feature rather than including everyting with itself but for now I am
considering this as an opportunity to learn how Org and Emacs work and
will leave it upto the maintainers if they find anything useful to be
including into Org itself.

Now on the topic of CSV parser, I have succesfully implemented a CSV
parser in less than 65 LOC which also preserves newline character but I
am facing a dilemma on how should I represent it as a Org table.  For
ex:

Row with newline as CSV data:

6,"Cell with new
Line",6.28

Row with newline as Lisp data:

("6" "Cell with new\nLine" "6.28")

Row with newline a Org table:

| 6 | Cell with new      |       6.28 |
|   | Line               |            |

Rows in Org mode are separated using a newline-character which makes
"Cell with new" and "Line" as different cells which is different from
how Libreoffice Calc represents it.  Libreoffice Calc has a much
powerfull representation of a cell which is out of scope for any plain
text tables.  Possible solutions for this problem are:

1. Replace newline with a space character.  What will happen when
imported line as a really long line?

2. Represent every newline as distinct cell.  I was trying to implement
this method but the algorithm that I came up with requires time
complexity of O(n^2) to just print each row which is too much for large
CSV files.

(defun org-table--print-row (row)
  (let ((maxlines 1)
        list1 list2 tmp)
    ;; get maxlines
    (dolist (i row)
      (if (not (stringp i))
          (push (list i) list1)
        (push (string-lines i) list1)
        (setq tmp (length (car list1)))
        (when (< maxlines tmp)
          (setq maxlines tmp))))
    ;; normalize row
    (dolist (i list1)
      (setq tmp (length i))
      (when (< tmp maxlines)
        (setq i (append i (make-list (- maxlines tmp) ""))))
      (push i list2))
    ;; print row
    (dolist (i list2)
      (setq tmp (point))
      (dolist (j i))))) <-- Printing is still not complete

Thank you,
Utkarsh Singh

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-06-04  4:04                                     ` Utkarsh Singh
@ 2021-06-05 12:40                                       ` Maxim Nikulin
  2021-06-05 17:50                                         ` Utkarsh Singh
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-05 12:40 UTC (permalink / raw)
  To: emacs-orgmode

On 04/06/2021 11:04, Utkarsh Singh wrote:
> On 2021-06-02, 23:44 +0700, Maxim Nikulin wrote:
> 
>> On 02/06/2021 22:08, Utkarsh Singh wrote:
>>> ;;;###autoload
>>>    (defun org-table-import (file separator)
>>> @@ -955,12 +971,13 @@ lines.  It can have the following values:
>>>    - integer When a number, use that many spaces, or a TAB, as field separator.
>>>    - regexp  When a regular expression, use it to match the separator."
>>>      (interactive (list (read-file-name "Import file: ")
>>> -                     (prefix-numeric-value current-prefix-arg)))
>>> +                     current-prefix-arg))
>>
>> It seems, prefix argument works now. Let me remind that file name
>> completion was working better before your change.

If you are still going to update this patch, I recommend you to check if 
`list' form of `interactive' declaration is supported by Emacs 24.3. 
Recently Kyle had to almost revert one my patch. Fortunately a simpler 
alternative supported by old Emacs versions were already discussed.

>> I have noticed a couple of error messages. Unsure what is going wrong, I
>> hope, command to launch emacs is correct (-Q -L ~/src/org-mode/lisp
>> test.org).
>>
>> At startup:
>> Eager macro-expansion failure: (error "rx ‘not’ syntax error: (or 10 44)")
>>
>> In response to M-x org-table-import:
>> rx-check-not: rx ‘not’ syntax error: (or 10 44)
> 
> Is `rx' library loading correctly? If no then you can try:
> 
> +(require 'rx)

This line should be added to org-table.el, but notice "rx ‘not’ syntax 
error: (or ". This logical operators have a bit different meaning for 
regular expression.

Nicolas suggested you to use `list' to avoid `eval'. Certainly `eval' 
should be avoided whenever possible, but there is alternative with "`" 
and "," instead of "'". Though I often prefer `list'.

> 
> Can you also share test.org so that I can test locally?

I can reproduce it with an empty org file. Emacs-26.3. So I am curious 
if your tested your patch at all.

> Now on the topic of CSV parser, I have succesfully implemented a CSV
> parser in less than 65 LOC which also preserves newline character but I
> am facing a dilemma on how should I represent it as a Org table.  For
> ex:
...
> 6,"Cell with new
> Line",6.28

Due to
https://orgmode.org/worg/org-faq.html#Tables
"Will there ever be support for multiple lines in a table field? No."
(there are may be some tricks in Org manual) any variant is acceptable, 
even current one. Since during export split cell is considered as 
different rows, it is responsibility of the user to preprocess original 
file and import it in a form suitable for Org.

However independently of chosen variant, it would be great to get list 
of warnings with ambiguous lines similar to compiler errors. Such 
feature is not implemented in Org yet, the most close is `org-lint' 
result (its behavior differs from compiler errors).

> 2. Represent every newline as distinct cell.  I was trying to implement
> this method but the algorithm that I came up with requires time
> complexity of O(n^2) to just print each row which is too much for large
> CSV files.
> 
> (defun org-table--print-row (row)
>    (let ((maxlines 1)
>          list1 list2 tmp)
>      ;; get maxlines
>      (dolist (i row)

I expect complexity O(number of lines in result). I do not like 
excessive usage of `dolist'. I would try slicing subrows using `mapcar' 
while there is at least one non-nil element.

P.S. Have you read "Structure and Interpretation of Computer Programs" 
by Abelson, Sussman, and Sussman?
https://sarabander.github.io/sicp/html/index.xhtml
or https://mitpress.mit.edu/sites/default/files/sicp/index.html
While reading, it is necessary to realize that elisp is not scheme, tail 
recursion should be avoided in elisp and there are over differences.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-06-05 12:40                                       ` Maxim Nikulin
@ 2021-06-05 17:50                                         ` Utkarsh Singh
  2021-06-09 12:15                                           ` Maxim Nikulin
  2021-09-26  8:40                                           ` Bastien
  0 siblings, 2 replies; 33+ messages in thread
From: Utkarsh Singh @ 2021-06-05 17:50 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

On 2021-06-05, 19:40 +0700, Maxim Nikulin <manikulin@gmail.com> wrote:

> This line should be added to org-table.el, but notice "rx ‘not’ syntax 
> error: (or ". This logical operators have a bit different meaning for 
> regular expression.
>
> Nicolas suggested you to use `list' to avoid `eval'. Certainly `eval' 
> should be avoided whenever possible, but there is alternative with "`" 
> and "," instead of "'". Though I often prefer `list'.

Done.

>> Can you also share test.org so that I can test locally?
>
> I can reproduce it with an empty org file. Emacs-26.3. So I am curious 
> if your tested your patch at all.

Yes you are right I was not testing my patches using 'emacs -Q' and I am
apologetic about it.  I just "tested" most of them by M-x eval-buffer on
the patched file and then checked the output of `with-temp-buffer'.

>> Now on the topic of CSV parser, I have succesfully implemented a CSV
>> parser in less than 65 LOC which also preserves newline character but I
>> am facing a dilemma on how should I represent it as a Org table.  For
>> ex:
> ...
>> 6,"Cell with new
>> Line",6.28
>
> Due to
> https://orgmode.org/worg/org-faq.html#Tables
> "Will there ever be support for multiple lines in a table field? No."
> (there are may be some tricks in Org manual) any variant is acceptable, 
> even current one. Since during export split cell is considered as 
> different rows, it is responsibility of the user to preprocess original 
> file and import it in a form suitable for Org.
>
> However independently of chosen variant, it would be great to get list 
> of warnings with ambiguous lines similar to compiler errors. Such 
> feature is not implemented in Org yet, the most close is `org-lint' 
> result (its behavior differs from compiler errors).

After reading FAQ about multiple lines in table field I don't think this
PATCH makes much sense as my main driving force for this patch was to
simplify regular expression in `org-table-convert-region' and somehow
add newlines support.

I also think `org-table-convert-region' is doing a fine job and I should
keep my "I want to contribute...." habit in check.

> I expect complexity O(number of lines in result). I do not like 
> excessive usage of `dolist'. I would try slicing subrows using `mapcar' 
> while there is at least one non-nil element.

Yes! Currently I am trying to sequentially read Elisp manual and I have
completed the section on 'Mapping Function' yesterday.  I think it's a
really nice language feature that I can make use of.

> P.S. Have you read "Structure and Interpretation of Computer Programs" 
> by Abelson, Sussman, and Sussman?
> https://sarabander.github.io/sicp/html/index.xhtml
> or https://mitpress.mit.edu/sites/default/files/sicp/index.html
> While reading, it is necessary to realize that elisp is not scheme, tail 
> recursion should be avoided in elisp and there are over differences.

Thank you! I have seen most of the decision on Lisp includes SICP but I
am currently stopping myself to Elisp manual to learn something useful
about this timeless editor and learn thing in a step-by-step manner.

I also tried Racket after reading:

https://dustycloud.org/blog/racket-is-an-acceptable-python
https://beautifulracket.com/

But stopped myself for the reason same as above.

-- 
Utkarsh Singh
http://utkarshsingh.xyz


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-06-05 17:50                                         ` Utkarsh Singh
@ 2021-06-09 12:15                                           ` Maxim Nikulin
  2021-09-26  8:40                                           ` Bastien
  1 sibling, 0 replies; 33+ messages in thread
From: Maxim Nikulin @ 2021-06-09 12:15 UTC (permalink / raw)
  To: emacs-orgmode

On 06/06/2021 00:50, Utkarsh Singh wrote:
> On 2021-06-05, 19:40 +0700, Maxim Nikulin wrote:
> 
>> This line should be added to org-table.el, but notice "rx ‘not’ syntax
>> error: (or ". This logical operators have a bit different meaning for
>> regular expression.
> 
> I just "tested" most of them by M-x eval-buffer on
> the patched file and then checked the output of `with-temp-buffer'.

I get "Eager macro-expansion failure: (error "rx ‘not’ syntax error: (or 
10 32)")" even in response to M-x eval-buffer.

> After reading FAQ about multiple lines in table field I don't think this
> PATCH makes much sense as my main driving force for this patch was to
> simplify regular expression in `org-table-convert-region' and somehow
> add newlines support.

You may check whether csv-mode have got support of newline inside cell 
already and try to sell your parser to them.

There are regular discussions how wonderful it would be to extend org 
table syntax. In my opinion org spreadsheet is overloaded already. On 
the other hand, newline in table cell sometimes is really useful. I have 
seen an idea to put some macro to single-line cell that expands to 
format-specific representation during export ({{{nl}}} => <br> in the 
case of HTML). Such macro does not require extension of syntax, so it 
may be viable at least in user configuration.

>> P.S. Have you read "Structure and Interpretation of Computer Programs"
> I also tried Racket after reading:

SICP is dedicated to general concepts rather than to particular 
language, that is why I consider it as a "special case".




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use
  2021-06-05 17:50                                         ` Utkarsh Singh
  2021-06-09 12:15                                           ` Maxim Nikulin
@ 2021-09-26  8:40                                           ` Bastien
  1 sibling, 0 replies; 33+ messages in thread
From: Bastien @ 2021-09-26  8:40 UTC (permalink / raw)
  To: Utkarsh Singh; +Cc: Maxim Nikulin, emacs-orgmode

Hi,

I lost track of the discussion in this thread.  

My understanding is that there is no agreement on what feature 
should be added to Org regarding "smarter interactive table import"
and a patch is not yet ready for the suggested feature.

I'm marking this as canceled on updates.orgmode.org but feel free
to correct me if needed.

Thanks,

-- 
 Bastien


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-09-26  8:41 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-19  4:43 [PATCH] org-table-import: Make it more smarter for interactive use Utkarsh Singh
2021-04-19  8:19 ` Nicolas Goaziou
2021-04-19 14:23   ` Utkarsh Singh
2021-04-20 13:40     ` Nicolas Goaziou
2021-04-20 17:15       ` Utkarsh Singh
2021-04-23  4:58       ` Utkarsh Singh
2021-04-27 20:21         ` bug#47885: " Nicolas Goaziou
2021-04-28  8:37           ` Utkarsh Singh
2021-04-28 16:38             ` Maxim Nikulin
2021-05-10 18:36               ` Utkarsh Singh
2021-05-12 17:08                 ` Maxim Nikulin
2021-05-14 14:54                   ` Utkarsh Singh
2021-05-15  9:13                     ` Bastien
2021-05-15 10:10                       ` Utkarsh Singh
2021-05-15 10:30                         ` Bastien
2021-05-15 11:09                           ` Utkarsh Singh
2021-05-17  5:29                         ` Bastien
2021-05-17 16:27                           ` Utkarsh Singh
2021-06-01 16:23                           ` Maxim Nikulin
2021-06-01 17:46                             ` Utkarsh Singh
2021-06-02 12:06                               ` Maxim Nikulin
2021-06-02 15:08                                 ` Utkarsh Singh
2021-06-02 16:44                                   ` Maxim Nikulin
2021-06-04  4:04                                     ` Utkarsh Singh
2021-06-05 12:40                                       ` Maxim Nikulin
2021-06-05 17:50                                         ` Utkarsh Singh
2021-06-09 12:15                                           ` Maxim Nikulin
2021-09-26  8:40                                           ` Bastien
2021-05-16 16:24                     ` Maxim Nikulin
2021-05-17 16:30                       ` Utkarsh Singh
2021-05-18 10:24                       ` Utkarsh Singh
2021-05-18 12:31                         ` Maxim Nikulin
2021-05-18 15:05                           ` Utkarsh Singh

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).