emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: Matt <matt@excalamus.com>
To: "Max Nikulin" <manikulin@gmail.com>
Cc: "emacs-orgmode" <emacs-orgmode@gnu.org>
Subject: [PATCH] Re: [BUG] ob-shell: :shebang changes interpretation of :cmdline
Date: Sun, 21 Apr 2024 17:09:06 +0200	[thread overview]
Message-ID: <18f01342a2f.124ad27612732529.8693431365849276517@excalamus.com> (raw)
In-Reply-To: <ujamo1$5t9$1@ciao.gmane.io>

[-- Attachment #1: Type: text/plain, Size: 7954 bytes --]


 ---- On Sat, 18 Nov 2023 16:54:39 +0100  Max Nikulin  wrote --- 
 > Hi,
 > 
 > Trying to figure out the origin of the confusion with
 > "bash -c bash /path/to/file-containing-the-source-code.sh"
 > I have faced an inconsistency with :cmdline treatment in ob-shell.el. I 
 > expect same results in the following cases:
 > 
 > #+begin_src bash :cmdline 1 2 3
 >    printf "%s\n" "$1"
 > #+end_src
 > 
 > #+RESULTS:
 > : 1
 > 
 > #+begin_src bash :cmdline 1 2 3 :shebang #!/bin/bash
 >    printf "%s\n" "$1"
 > #+end_src
 > 
 > #+RESULTS:
 > : 1 2 3
 > 
 > Emacs-28, Org is the current git HEAD.

AFAIU, the inconsistency is due to how the characters following :cmdline are interpreted when the subprocess call is made.

Consider the following, when only :cmdline is used:

# Evaluates like:
#
#     bash -c "./sh-script-8GJzdG 1 2 3"
#
#+begin_src bash :cmdline 1 2 3
echo \"$1\"
#+end_src

#+RESULTS:
: 1

# Evaluates like:
#
#     bash -c "./sh-script-8GJzdG \"1 2\" 3"
#
#+begin_src bash :cmdline "1 2" 3
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2

For :cmdline alone, the characters following :cmdline are passed as though each is quoted.  That is, separate arguments are delimited by one or more spaces.  The first example is equivalent to the following:

# Evaluates like:
#
#     bash -c "./sh-script-8GJzdG \"1\" \"2\" \"3\""
#
#+begin_src bash :cmdline 1 2 3
echo \"$1\"
#+end_src

#+RESULTS:
: 1

How would you expect :cmdline "1 2 3" to be evaluated?

#+begin_src bash :cmdline "1 2 3"
echo \"$1\"
#+end_src

My expectation would be that it evaluates like:

  bash -c "./sh-script-8GJzdG \"1 2 3\""

It turns out, however, that it's evaluated exactly like :cmdline 1 2 3, or :cmdline "1" "2" "3".  The result is "1".

To make the block evaluate as expected requires an extra set of parentheses:

# Evaluates like:
#
#     bash -c "./sh-script-8GJzdG \"1 2 3\""
#
#+begin_src bash :cmdline "\"1 2 3\""
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2 3

This, however, appears to be separate from the reported issue[fn:1].

Now, consider :cmdline paired with :shebang, called with the same values as above.

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "1 2 3"
#
#+begin_src bash :cmdline 1 2 3 :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2 3

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1 2\" 3"
#
#+begin_src bash :cmdline "1 2" 3 :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2" 3"

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "1 2 3"
#
#+begin_src bash :cmdline "1 2 3" :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2 3

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1 2 3\""
#
#+begin_src bash :cmdline "\"1 2 3\"" :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1 2 3""

# Evaluates like:
#
#     /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1\" \"2\" \"3\""
#
#+begin_src bash :cmdline "1" "2" "3" :shebang #!/usr/bin/env bash
echo \"$1\"
#+end_src

#+RESULTS:
: 1" "2" "3""

An immediate observation is that the output results don't format correctly.  If you change the results type to "raw", however, you'll see that the Org results match those from a terminal, like xfce4-terminal.  The fact that raw output matches output from the terminal means that the formatting issue is (also) separate from the bug we're trying to fix.  That is, the bug we're trying to fix occurs in how the subprocess call is made, not in how the result is formatted.

In ob-shell, the subprocess call is made with 'process-file'.  Arguments are determined casewise:

1. shebang+cmdline
2. cmdline

The characters following :cmdline are received by the 'cmdline' argument to 'org-babel-sh-evaluate' as a string.  Both cases put this string into a list for the ARGS of 'process-file':

| header               | 'org-babel-sh-evaluate' | process-file ARGS     |
|                      | cmdline variable value  | shebang+cmdline       |
|----------------------+-------------------------+-----------------------|
| :cmdline 1 2 3       | "1 2 3"                 | ("1 2 3")             |
| :cmdline "1 2" 3"    | "\"1 2\" 3"             | ("\"1 2\" 3")         |
| :cmdline "1" "2" "3" | "\"1\" \"2\" \"3\""     | ("\"1\" \"2\" \"3\"") |

| header               | 'org-babel-sh-evaluate' | process-file ARGS     |
|                      | cmdline variable value  | cmdline               |
|----------------------+-------------------------+-----------------------|
| :cmdline 1 2 3       | "1 2 3"                 | ("1 2 3")             |
| :cmdline "1 2" 3"    | "\"1 2\" 3"             | ("\"1 2\" 3")         |
| :cmdline "1" "2" "3" | "\"1\" \"2\" \"3\""     | ("\"1\" \"2\" \"3\"") |

Notice that the ARGS passed to 'process-file' are the same for both cases.  The problem is that the "block equivalent shell calls" are *not* the same.  If we arrange the equivalent shell calls from the blocks given above into a table, we see that the forms are different:

| header               | cmdline variable value | shebang+cmdline call                                   |
|----------------------+------------------------+--------------------------------------------------------|
| :cmdline 1 2 3       | "1 2 3"                | /tmp/babel-Xd6rGS/sh-script-61jvMa "1 2 3"             |
| :cmdline "1 2" 3"    | "\"1 2\" 3"            | /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1 2\" 3"         |
| :cmdline "1" "2" "3" | "\"1\" \"2\" \"3\""    | /tmp/babel-Xd6rGS/sh-script-61jvMa "\"1\" \"2\" \"3\"" |

| header               | cmdline variable value | cmdline call                                   |
|----------------------+------------------------+------------------------------------------------|
| :cmdline 1 2 3       | "1 2 3"                | bash -c "./sh-script-8GJzdG 1 2 3"             |
| :cmdline "1 2" 3"    | "\"1 2\" 3"            | bash -c "./sh-script-8GJzdG \"1 2\" 3"         |
| :cmdline "1" "2" "3" | "\"1\" \"2\" \"3\""    | bash -c "./sh-script-8GJzdG \"1\" \"2\" \"3\"" |

The reported bug exists because shebang+cmdline interprets the characters following :cmdline as a *single* string.  Without :shebang, a lone :cmdline interprets them as space delimited.

One possible solution is to reformat the 'process-file' ARGS for the shebang+cmdline case so that characters following :cmdline are interpreted as space delimited.  This is possible using 'split-string-and-unquote':

(split-string-and-unquote "1 2 3")             -> ("1" "2" "3")
(split-string-and-unquote "\"1 2\" 3")         -> ("1 2" "3")
(split-string-and-unquote "\"1\" \"2\" \"3\"") -> ("1" "2" "3")

Whether this is a solution, in part, depends on the perennial problem of shell blocks: knowing what's wrong means knowing what's right.

The proposed solution assumes we intend to parse the characters following :cmdline as space delimited and grouped by quotes.  However, AFAICT, the parsing issue makes this solution ambiguous.

Thoughts?

--
Matt Trzcinski
Emacs Org contributor (ob-shell)
Learn more about Org mode at https://orgmode.org
Support Org development at https://liberapay.com/org-mode

[fn:1] AFAICT, it's due to how headers are parsed by 'org-babel-parse-header-arguments' using 'org-babel-read'.  The cell "\"1 2 3\"" (corresponding to :cmdline "1 2 3") is reduced through 'string-match' to "1 2 3".  The cell "1 2 3" (corresponding to :cmdline 1 2 3), on the other hand, passes through.  The result is that :cmdline "1 2 3" and :cmdline 1 2 3 become indistinguishable.  I mention this because it's easy to get confused by this issue which, AFAICT, is independent of the one we're trying to fix.  The reported issue appears only to be related to how the result of :cmdline header parsing is passed to the subprocess.

[-- Attachment #2: 0001-testing-lisp-test-ob-shell.el-Test-shebang-cmdline-r.patch --]
[-- Type: application/octet-stream, Size: 2070 bytes --]

From acf27407c43ba48be21b8f75a1c791d2c45db57a Mon Sep 17 00:00:00 2001
From: Matthew Trzcinski <matt@excalamus.com>
Date: Sun, 21 Apr 2024 16:02:40 +0200
Subject: [PATCH 1/3] testing/lisp/test-ob-shell.el: Test shebang/cmdline
 results

* test-ob-shell.el
(test-cmdline-alone-and-with-shebang-have-same-result): Test that
results when using the cmdline header alone are the same as when
paired with the shebang header.
---
 testing/lisp/test-ob-shell.el | 37 +++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/testing/lisp/test-ob-shell.el b/testing/lisp/test-ob-shell.el
index 8cebd8467..b0c15b07d 100644
--- a/testing/lisp/test-ob-shell.el
+++ b/testing/lisp/test-ob-shell.el
@@ -362,6 +362,43 @@ echo 3
       "- 1\n- 2\n- 3\n"
       (buffer-substring-no-properties (point) (point-max))))))
 
+(ert-deftest test-cmdline-alone-and-with-shebang-have-same-result ()
+  "Pass arguments to a block.  Don't use shebang.  Then use
+shebang set to the same language as the block.  The result should
+be the same."
+  (should (equal
+           (org-test-with-temp-text
+               "#+begin_src bash :cmdline 1 2 3
+echo \"$1\"
+<point>
+#+end_src"
+             (org-babel-execute-src-block))
+           (org-test-with-temp-text
+               "#+begin_src bash :cmdline 1 2 3 :shebang #!/usr/bin/env bash
+echo \"$1\"
+<point>
+#+end_src"
+             (org-babel-execute-src-block)))))
+
+(ert-deftest test-cmdline-alone-and-with-shebang-have-same-result-2 ()
+  "Pass arguments to a block.  Don't use shebang.  Then use
+shebang set to the same language as the block.  The result should
+be the same."
+  (should (equal
+           (org-test-with-temp-text
+               "#+begin_src bash :cmdline \"1 2\" 3
+echo \"$1\"
+<point>
+#+end_src"
+             (org-babel-execute-src-block))
+           (org-test-with-temp-text
+               "#+begin_src bash :cmdline \"1 2\" 3 :shebang #!/usr/bin/env bash
+echo \"$1\"
+<point>
+#+end_src"
+             (org-babel-execute-src-block)))))
+
+
 \f
 ;;; Standard output
 
-- 
2.41.0


[-- Attachment #3: 0002-lisp-ob-shell.el-Add-comments-to-apply-call.patch --]
[-- Type: application/octet-stream, Size: 1664 bytes --]

From e3e71a3478666af4a4a9aa92402cd438aec52b46 Mon Sep 17 00:00:00 2001
From: Matthew Trzcinski <matt@excalamus.com>
Date: Sun, 21 Apr 2024 16:16:59 +0200
Subject: [PATCH 2/3] lisp/ob-shell.el: Add comments to `apply' call

* lisp/ob-shell.el (org-babel-sh-evaluate): Comment the arguments to
`apply'.  These correspond to the arguments of the function it calls.
Eldoc is only able to document the parameters to `apply' and not the
function.
---
 lisp/ob-shell.el | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lisp/ob-shell.el b/lisp/ob-shell.el
index 35d9e9376..307360e3c 100644
--- a/lisp/ob-shell.el
+++ b/lisp/ob-shell.el
@@ -322,11 +322,11 @@ return the value of the last statement in BODY."
 	      (with-temp-buffer
                 (with-connection-local-variables
                  (apply #'process-file
-                        (if shebang (file-local-name script-file)
-                          shell-file-name)
-		        stdin-file
-                        (current-buffer)
-                        nil
+                        (if shebang (file-local-name script-file) shell-file-name) ; PROGRAM
+		        stdin-file                                                 ; INFILE
+                        (current-buffer)                                           ; BUFFER
+                        nil                                                        ; DISPLAY
+                        ;; ARGS
                         (if shebang (when cmdline (list cmdline))
                           (list shell-command-switch
                                 (concat (file-local-name script-file)  " " cmdline)))))
-- 
2.41.0


[-- Attachment #4: 0003-lisp-ob-shell.el-Fix-cmdline-shebang-inconsistencies.patch --]
[-- Type: application/octet-stream, Size: 1590 bytes --]

From 39e2f5ea53f8f6d86b1e4e07725a9a25ac7326ce Mon Sep 17 00:00:00 2001
From: Matthew Trzcinski <matt@excalamus.com>
Date: Sun, 21 Apr 2024 16:27:25 +0200
Subject: [PATCH 3/3] lisp/ob-shell.el: Fix :cmdline/:shebang inconsistencies

* ob-shell.el (org-babel-sh-evaluate): The reported bug exists because
shebang+cmdline interprets the characters following :cmdline as a
single string (single argument).  Without :shebang, a lone :cmdline
considers the characters as being space delimited (separate
arguments).  This change updates the shebang+cmdline condition so that
results are the same as a lone :cmdline header.

Reported-by: "Max Nikulin" <manikulin@gmail.com>
Link: https://list.orgmode.org/orgmode/ujamo1$5t9$1@ciao.gmane.io/
---
 lisp/ob-shell.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/ob-shell.el b/lisp/ob-shell.el
index 307360e3c..d379acdc6 100644
--- a/lisp/ob-shell.el
+++ b/lisp/ob-shell.el
@@ -327,7 +327,8 @@ return the value of the last statement in BODY."
                         (current-buffer)                                           ; BUFFER
                         nil                                                        ; DISPLAY
                         ;; ARGS
-                        (if shebang (when cmdline (list cmdline))
+                        (if shebang
+                            (when cmdline (split-string-and-unquote cmdline))
                           (list shell-command-switch
                                 (concat (file-local-name script-file)  " " cmdline)))))
 		(buffer-string))))
-- 
2.41.0


  parent reply	other threads:[~2024-04-21 15:10 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-18 15:54 [BUG] ob-shell: :shebang changes interpretation of :cmdline Max Nikulin
2023-11-18 18:09 ` Matt
2023-12-04 13:58   ` Ihor Radchenko
2023-12-04 20:41     ` Matt
2023-11-18 18:20 ` [BUG] ob-shell: :cmdline fails with single argument (was Re: [BUG] ob-shell: :shebang changes interpretation of :cmdline) Matt
2023-11-19  6:57   ` Max Nikulin
2023-11-19  7:57     ` Matt
2024-04-21 15:09 ` Matt [this message]
2024-04-23 10:28   ` [PATCH] Re: [BUG] ob-shell: :shebang changes interpretation of :cmdline Ihor Radchenko
2024-04-24 10:33     ` Max Nikulin
2024-04-24 12:52       ` Ihor Radchenko
2024-04-25 10:06         ` Max Nikulin
2024-04-26 11:49         ` Ihor Radchenko
2024-04-27 10:31           ` Max Nikulin
2024-04-27 13:37             ` Max Nikulin
2024-04-28 12:34             ` Ihor Radchenko
2024-04-23 10:51   ` Max Nikulin
2024-04-23 17:08     ` Max Nikulin
2024-04-26 13:09     ` [DISCUSSION] The meaning of :cmdline header argument across babel backends (was: [PATCH] Re: [BUG] ob-shell: :shebang changes interpretation of :cmdline) Ihor Radchenko
2024-04-27 10:53       ` [DISCUSSION] The meaning of :cmdline header argument across babel backends Max Nikulin
2024-04-29 13:33         ` Ihor Radchenko
2024-05-01 17:48           ` Matt
2024-05-01 18:01             ` Ihor Radchenko
2024-05-02 18:50               ` Matt
2024-05-03 12:12                 ` Ihor Radchenko
2024-04-26 13:12     ` [PATCH] Re: [BUG] ob-shell: :shebang changes interpretation of :cmdline Ihor Radchenko
2024-04-27  7:43       ` Matt
2024-04-27  7:48         ` Ihor Radchenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=18f01342a2f.124ad27612732529.8693431365849276517@excalamus.com \
    --to=matt@excalamus.com \
    --cc=emacs-orgmode@gnu.org \
    --cc=manikulin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).