* [BUG] ob-shell: results missing leading quotes
@ 2024-04-28 13:15 Matt
2024-04-29 11:58 ` Ihor Radchenko
0 siblings, 1 reply; 14+ messages in thread
From: Matt @ 2024-04-28 13:15 UTC (permalink / raw)
To: emacs-orgmode
While investigating "[BUG] ob-shell: :shebang changes interpretation of :cmdline" (https://list.orgmode.org/orgmode/18f01342a2f.124ad27612732529.8693431365849276517@excalamus.com/), behavior like the following was observed in which leading quote marks are sometimes removed from results:
#+begin_src bash :results raw
echo \"\"1\"\"
#+end_src
#+RESULTS:
""1""
#+begin_src bash
echo \"\"1\"\"
#+end_src
#+RESULTS:
: 1""
#+begin_src bash :results output
echo \"\"1\"\"
#+end_src
#+RESULTS:
: ""1""
Org mode version 9.7-pre (release_9.6.27-1393-ge0f24a @ /home/ahab/Projects/org-mode/lisp/), commit e0f24a3f6
--
Matt Trzcinski
Emacs Org contributor (ob-shell)
Learn more about Org mode at https://orgmode.org
Support Org development at https://liberapay.com/org-mode
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] ob-shell: results missing leading quotes
2024-04-28 13:15 [BUG] ob-shell: results missing leading quotes Matt
@ 2024-04-29 11:58 ` Ihor Radchenko
2024-05-01 10:48 ` Max Nikulin
0 siblings, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2024-04-29 11:58 UTC (permalink / raw)
To: Matt; +Cc: emacs-orgmode
Matt <matt@excalamus.com> writes:
> #+begin_src bash
> echo \"\"1\"\"
> #+end_src
>
> #+RESULTS:
> : 1""
Fixed, partially, on main.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=89c68683f
The problem was auto-detection of table data - the algorithm used CSV
parser as a fallback and ""1"" is a very peculiar data from the point of
view of CSV spec.
I now changed the code to not try parsing CSV when a single-line data
does not contain commas.
The new behavior is
#+begin_src bash
echo \"\"1\"\"
#+end_src
#+RESULTS:
: 1
The nested "..." are stripped on purpose via `org-babel-string-read'.
Although this function dates back to R output processing and I do not
fully understand why stripping nested quotes is useful for all possible
babel backends. But that's a completely different story.
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] ob-shell: results missing leading quotes
2024-04-29 11:58 ` Ihor Radchenko
@ 2024-05-01 10:48 ` Max Nikulin
2024-05-01 12:19 ` Importing "quoted" strings in `org-babel-import-elisp-from-file' (was: [BUG] ob-shell: results missing leading quotes) Ihor Radchenko
0 siblings, 1 reply; 14+ messages in thread
From: Max Nikulin @ 2024-05-01 10:48 UTC (permalink / raw)
To: emacs-orgmode
On 29/04/2024 18:58, Ihor Radchenko wrote:
>
> The nested "..." are stripped on purpose via `org-babel-string-read'.
> Although this function dates back to R output processing and I do not
> fully understand why stripping nested quotes is useful for all possible
> babel backends. But that's a completely different story.
I would say heuristics is quite fragile
#+begin_src elisp :results verbatim
(org-babel-string-read "\"1 2\" 3 \"4 5\"")
#+end_src
#+RESULTS:
: "1 2\" 3 \"4 5"
Leading and trailing quote characters are not stripped in the case of
#+begin_src elisp :results verbatim
(org-babel-read "\"1 2\" 3 \"4 5\"" t)
#+end_src
#+RESULTS:
: "\"1 2\" 3 \"4 5\""
^ permalink raw reply [flat|nested] 14+ messages in thread
* Importing "quoted" strings in `org-babel-import-elisp-from-file' (was: [BUG] ob-shell: results missing leading quotes)
2024-05-01 10:48 ` Max Nikulin
@ 2024-05-01 12:19 ` Ihor Radchenko
2024-05-02 17:56 ` Importing "quoted" strings in `org-babel-import-elisp-from-file' Max Nikulin
0 siblings, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2024-05-01 12:19 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> On 29/04/2024 18:58, Ihor Radchenko wrote:
>>
>> The nested "..." are stripped on purpose via `org-babel-string-read'.
>> Although this function dates back to R output processing and I do not
>> fully understand why stripping nested quotes is useful for all possible
>> babel backends. But that's a completely different story.
>
> I would say heuristics is quite fragile
>
> #+begin_src elisp :results verbatim
> (org-babel-string-read "\"1 2\" 3 \"4 5\"")
> #+end_src
>
> #+RESULTS:
> : "1 2\" 3 \"4 5"
>
> Leading and trailing quote characters are not stripped in the case of
>
> #+begin_src elisp :results verbatim
> (org-babel-read "\"1 2\" 3 \"4 5\"" t)
> #+end_src
Let's zoom out a bit.
`org-babel-string-read' is used exclusively by `org-babel-import-elisp-from-file'
Its effect is making the following tables look same for Org babel:
| 1 | 2 |
| 1 | 2 |
| "1" | "2" |
| 1 | 2 |
I cannot find justification - this behavior has been introduced in ob-R
from the very beginning (over 15 years ago) and then ported to ob-core.
IMHO, the current behavior is not correct. Moreover, it is unexpected.
Consider
#+begin_src C :results output table :includes <stdio.h>
printf("1 2\n\"1\" \"2\"");
#+end_src
#+RESULTS[30c03a8e3f8a71e03d7892e6817cb45713bec424]:
| 1 | 2 |
| 1 | 2 |
Why are the quotes dropped?
And replacing the call to org-babel-string-read with org-babel-read does not
break any tests...
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Importing "quoted" strings in `org-babel-import-elisp-from-file'
2024-05-01 12:19 ` Importing "quoted" strings in `org-babel-import-elisp-from-file' (was: [BUG] ob-shell: results missing leading quotes) Ihor Radchenko
@ 2024-05-02 17:56 ` Max Nikulin
2024-05-03 12:06 ` Ihor Radchenko
0 siblings, 1 reply; 14+ messages in thread
From: Max Nikulin @ 2024-05-02 17:56 UTC (permalink / raw)
To: emacs-orgmode
On 01/05/2024 19:19, Ihor Radchenko wrote:
> And replacing the call to org-babel-string-read with org-babel-read does not
> break any tests...
Since
005e68294 2009-06-11 17:04:42 -0700 Eric Schulte: making progress
bringing org-babel-R.el into the new evaluation schema
`org-babel-string-read' has been stripping quotes, but `org-string-read'
got this feature later in
60a8ba556 2011-03-02 07:55:39 -0700 Eric Schulte: ob: read string
variable values wrapped in double quotes, removing the quotes
So now outer quotes are stripped twice.
(org-babel-string-read "\"\"1\"\"")
"1"
(org-babel-read "\"\"1\"\"" t)
"\"\"1\"\""
There are still more subtle differences between these functions:
(org-babel-string-read "\"1\" 2 \"3\"")
"1\" 2 \"3"
(org-babel-read "\"1\" 2 \"3\"" t)
"\"1\" 2 \"3\""
What I do not like in `org-babel-read' is false positive for escaped
quote when actually backslash is escaped:
(org-babel-read "\"1\\\\\" 2 \\\\\"3\"" t)
"1\\"
(org-babel-read "\"1\\\\|\" 2 \\\\|\"3\"" t)
"\"1\\\\|\" 2 \\\\|\"3\""
Test suite is far from being exhaustive.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Importing "quoted" strings in `org-babel-import-elisp-from-file'
2024-05-02 17:56 ` Importing "quoted" strings in `org-babel-import-elisp-from-file' Max Nikulin
@ 2024-05-03 12:06 ` Ihor Radchenko
2024-05-04 7:53 ` Max Nikulin
0 siblings, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2024-05-03 12:06 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> What I do not like in `org-babel-read' is false positive for escaped
> quote when actually backslash is escaped:
>
> (org-babel-read "\"1\\\\\" 2 \\\\\"3\"" t)
> "1\\"
This is a bug.
Fixed, on main.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=2028bb15c
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Importing "quoted" strings in `org-babel-import-elisp-from-file'
2024-05-03 12:06 ` Ihor Radchenko
@ 2024-05-04 7:53 ` Max Nikulin
2024-05-04 8:03 ` Ihor Radchenko
0 siblings, 1 reply; 14+ messages in thread
From: Max Nikulin @ 2024-05-04 7:53 UTC (permalink / raw)
To: emacs-orgmode
[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]
On 03/05/2024 19:06, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>> What I do not like in `org-babel-read' is false positive for escaped
>> quote when actually backslash is escaped:
>>
>> (org-babel-read "\"1\\\\\" 2 \\\\\"3\"" t)
>> "1\\"
>
> Fixed, on main.
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=2028bb15c
I have no idea if "other\n\"string\"\nlines" may be passed
`org-babel-read', but it is not discarded by the current regexp:
"^[[:space:]]*\"\\(.*\\)\"[[:space:]]*$"
Is there a reason why it is necessary to call `read' twice on the same
content? From my point of view, result of first call may be returned.
Does `read' have other role than unescaping backslash-protected
characters? Likely it can be done by `replace-regexp-in-string', see the
attachment. I have tried regexp on the following string:
(let ((cases '(("" . nil)
("\"" . nil)
("\"\"" . t)
("\"\\\"" . nil)
("\"\\\\\"" . t)
("a" . nil)
("\"a\"" . t)
("\\\"a\\\"" . nil)
("\\\"a\\\\\"" . nil)
("\"a\\\"\"" . t)
("\"aa\\\"bb\"" . t)
(" \"aa( bb\"" . t))))
[-- Attachment #2: patch-ob-read-quoted-string.diff --]
[-- Type: text/x-patch, Size: 2050 bytes --]
diff --git a/lisp/ob-core.el b/lisp/ob-core.el
index 470db9fe6..a44d27cba 100644
--- a/lisp/ob-core.el
+++ b/lisp/ob-core.el
@@ -3346,6 +3346,26 @@ (defun org-babel-script-escape (str &optional force)
(t str))))
(condition-case nil (org-babel-read escaped) (error escaped))))
+(defconst org-babel--quoted-string-regexp
+ (rx string-start
+ (zero-or-more (or space ?\n))
+ ?\"
+ (group
+ (zero-or-more
+ ;; Anything besides double quotes ended not with backslash.
+ (zero-or-one (zero-or-more (not ?\"))
+ (not (or ?\" ?\\)))
+ ;; Skip backslashes escaping themselves.
+ (zero-or-more "\\\\")
+ ;; Escaped quotes are allowed.
+ (zero-or-one "\\\"")))
+ ?\"
+ (zero-or-more (or space ?\n))
+ string-end)
+ "Regexp matching single string in double quotes.
+Group 1 is text inside quotes. String may be optionally padded with
+spaces. Backslashes quote other characters.")
+
(defun org-babel-read (cell &optional inhibit-lisp-eval)
"Convert the string value of CELL to a number if appropriate.
Otherwise if CELL looks like Lisp (meaning it starts with a
@@ -3361,15 +3381,11 @@ (defun org-babel-read (cell &optional inhibit-lisp-eval)
;; FIXME: Arbitrary code evaluation.
(eval (read cell) t))
((save-match-data
- (and (string-match "^[[:space:]]*\"\\(.*\\)\"[[:space:]]*$" cell)
- ;; CELL is a single string
- (with-temp-buffer
- (insert cell)
- (goto-char 1)
- (read (current-buffer))
- (skip-chars-forward "[:space:]")
- (eobp))))
- (read cell))
+ (and (string-match org-babel--quoted-string-regexp cell)
+ ;; Unquote characters escaped by backslashes similar to `read'.
+ (replace-regexp-in-string
+ "\\\\\\(?:\\(.\\)\\|\n\\)" "\\1"
+ (match-string 1 cell) 'fixedcase nil))))
(t (org-no-properties cell))))
(defun org-babel--string-to-number (string)
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Importing "quoted" strings in `org-babel-import-elisp-from-file'
2024-05-04 7:53 ` Max Nikulin
@ 2024-05-04 8:03 ` Ihor Radchenko
2024-05-04 11:17 ` Max Nikulin
0 siblings, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2024-05-04 8:03 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
>>> (org-babel-read "\"1\\\\\" 2 \\\\\"3\"" t)
>>> "1\\"
>>
>> Fixed, on main.
>> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=2028bb15c
>
> I have no idea if "other\n\"string\"\nlines" may be passed
> `org-babel-read', but it is not discarded by the current regexp:
>
> "^[[:space:]]*\"\\(.*\\)\"[[:space:]]*$"
I do not see why we should limit things to single-line strings.
> Is there a reason why it is necessary to call `read' twice on the same
> content? From my point of view, result of first call may be returned.
Yes, it can.
> Does `read' have other role than unescaping backslash-protected
> characters? Likely it can be done by `replace-regexp-in-string', see the
> attachment. ...
> +(defconst org-babel--quoted-string-regexp
> ...
1. read is faster
2. read is less maintenance - we can rely upon robust implementation
provided by Emacs itself instead of doing something custom, with
potential bugs.
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Importing "quoted" strings in `org-babel-import-elisp-from-file'
2024-05-04 8:03 ` Ihor Radchenko
@ 2024-05-04 11:17 ` Max Nikulin
2024-05-04 11:51 ` Ihor Radchenko
0 siblings, 1 reply; 14+ messages in thread
From: Max Nikulin @ 2024-05-04 11:17 UTC (permalink / raw)
To: emacs-orgmode
On 04/05/2024 15:03, Ihor Radchenko wrote:
> Max Nikulin writes:
>> I have no idea if "other\n\"string\"\nlines" may be passed
>> `org-babel-read', but it is not discarded by the current regexp:
>>
>> "^[[:space:]]*\"\\(.*\\)\"[[:space:]]*$"
>
> I do not see why we should limit things to single-line strings.
Quotes are not stripped:
(org-babel-read "\"abc\nsdf\"" t)
"\"abc
sdf\""
(org-babel-read "(\n\"abc\"\n" t)
progn: End of file during parsing
My conclusion that the current regexp may give both false positives and
false negatives. `read` errors are not handled, but likely it is a
result of wrong guarding regexp.
> 1. read is faster
I would not argue since I have no benchmark. My expectation was that
replace is mostly no-op, so original string is returned while using
`read' requires some overhead for a temporary buffer.
> 2. read is less maintenance - we can rely upon robust implementation
> provided by Emacs itself instead of doing something custom, with
> potential bugs.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-05-04 19:34 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-28 13:15 [BUG] ob-shell: results missing leading quotes Matt
2024-04-29 11:58 ` Ihor Radchenko
2024-05-01 10:48 ` Max Nikulin
2024-05-01 12:19 ` Importing "quoted" strings in `org-babel-import-elisp-from-file' (was: [BUG] ob-shell: results missing leading quotes) Ihor Radchenko
2024-05-02 17:56 ` Importing "quoted" strings in `org-babel-import-elisp-from-file' Max Nikulin
2024-05-03 12:06 ` Ihor Radchenko
2024-05-04 7:53 ` Max Nikulin
2024-05-04 8:03 ` Ihor Radchenko
2024-05-04 11:17 ` Max Nikulin
2024-05-04 11:51 ` Ihor Radchenko
2024-05-04 14:55 ` Max Nikulin
2024-05-04 19:34 ` Ihor Radchenko
2024-05-04 15:58 ` Max Nikulin
2024-05-04 19:35 ` Ihor Radchenko
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).