* greedy substitution in org-open-file
@ 2021-01-20 16:08 Maxim Nikulin
2021-02-12 7:16 ` Kyle Meyer
0 siblings, 1 reply; 7+ messages in thread
From: Maxim Nikulin @ 2021-01-20 16:08 UTC (permalink / raw)
To: emacs-orgmode
Looking into the code related to 'pty problem with
start-process-shell-command, I have realized that the following case is
not handled correctly:
#+begin_src elisp
(setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
#+end_src
I hope, I adapted an example from [[help:org-file-apps]] correctly. When
I try to open the following link using C-c C-o
[[file:test-%1f.pdf::42]]
I get
Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done
I believe, it should be
Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done
Or
Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done
Org mode version 9.4.4 (release_9.4.4-164-g7a9a8a
The source of the problem is that %s substitution is expanded at first
and regexp groups are replaced later. Ideally, it should be performed in
a single pass. I have found format-spec function but I am unsure
concerning it, maybe it is avoided intentionally.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: greedy substitution in org-open-file
2021-01-20 16:08 greedy substitution in org-open-file Maxim Nikulin
@ 2021-02-12 7:16 ` Kyle Meyer
2021-02-12 16:46 ` Maxim Nikulin
0 siblings, 1 reply; 7+ messages in thread
From: Kyle Meyer @ 2021-02-12 7:16 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: emacs-orgmode
Maxim Nikulin writes:
> Looking into the code related to 'pty problem with
> start-process-shell-command, I have realized that the following case is
> not handled correctly:
>
> #+begin_src elisp
> (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
> #+end_src
Not relevant for the underlying issue, but doesn't xpdf require a colon
before the page number (i.e. ":%1")?
> I hope, I adapted an example from [[help:org-file-apps]] correctly. When
> I try to open the following link using C-c C-o
>
> [[file:test-%1f.pdf::42]]
>
> I get
>
> Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done
>
> I believe, it should be
>
> Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done
>
> Or
>
> Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done
Yep, it should show up as "Running xpdf /tmp/test-\%1f.pdf :42...done",
with the backslash coming from shell-quote-argument.
> The source of the problem is that %s substitution is expanded at first
> and regexp groups are replaced later. Ideally, it should be performed in
> a single pass. I have found format-spec function but I am unsure
> concerning it, maybe it is avoided intentionally.
I believe format-spec requires the placeholder to be A-z:
(format-spec "xpdf %s" '((?s . "a"))) => "xpdf a"
(format-spec "xpdf %s %1" '((?s . "a") (?1 . "b"))) ;; Invalid format string
What about flipping the processing, handling the %N placeholders first
and then formatting the file name? Seems to work on my end, though I
haven't tested it thoroughly.
diff --git a/lisp/org.el b/lisp/org.el
index c61b8fb56..31dbe352a 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8714,12 +8714,6 @@ (defun org-open-file (path &optional in-emacs line search)
;; Remove quotes around the file name - we'll use shell-quote-argument.
(while (string-match "['\"]%s['\"]" cmd)
(setq cmd (replace-match "%s" t t cmd)))
- (setq cmd (replace-regexp-in-string
- "%s"
- (shell-quote-argument (convert-standard-filename file))
- cmd
- nil t))
-
;; Replace "%1", "%2" etc. in command with group matches from regex
(save-match-data
(let ((match-index 1)
@@ -8731,7 +8725,11 @@ (defun org-open-file (path &optional in-emacs line search)
(while (string-match regex cmd)
(setq cmd (replace-match replace-with t t cmd))))
(setq match-index (+ match-index 1)))))
-
+ (setq cmd (replace-regexp-in-string
+ "%s"
+ (shell-quote-argument (convert-standard-filename file))
+ cmd
+ nil t))
(save-window-excursion
(message "Running %s...done" cmd)
(start-process-shell-command cmd nil cmd)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: greedy substitution in org-open-file
2021-02-12 7:16 ` Kyle Meyer
@ 2021-02-12 16:46 ` Maxim Nikulin
2021-02-13 4:38 ` Kyle Meyer
0 siblings, 1 reply; 7+ messages in thread
From: Maxim Nikulin @ 2021-02-12 16:46 UTC (permalink / raw)
To: emacs-orgmode
On 12/02/2021 14:16, Kyle Meyer wrote:
>> #+begin_src elisp
>> (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
>> #+end_src
>
> Not relevant for the underlying issue, but doesn't xpdf require a colon
> before the page number (i.e. ":%1")?
At least for the application in debian & ubuntu xpdf package, page
number should be specified without a colon. It is Xt interface to
poppler PDF library, recently its maintainer decided to switch to
xpopple project as upstream. UI is derived from old version of xpdf.
Latest original xpdf version is based on Qt and might have different
convention in respect to page numbers.
> I believe format-spec requires the placeholder to be A-z:
>
> (format-spec "xpdf %s" '((?s . "a"))) => "xpdf a"
> (format-spec "xpdf %s %1" '((?s . "a") (?1 . "b"))) ;; Invalid format string
You are right. I missed that format-spec allows to specify field width,
so digits could not be used.
> What about flipping the processing, handling the %N placeholders first
> and then formatting the file name? Seems to work on my end, though I
> haven't tested it thoroughly.
I could anticipate similar problems if named destinations are involved.
I have not checked but I expect that internal links might have "%s" in
their names at least for some file types. That is why I would strongly
prefer substitutions performed in a single pass. I do not like it, but
it seems that simplified variant of format-spec is better. It should
allows substitutions with digit. I hope, single digit should be enough.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: greedy substitution in org-open-file
2021-02-12 16:46 ` Maxim Nikulin
@ 2021-02-13 4:38 ` Kyle Meyer
2021-02-15 17:04 ` Maxim Nikulin
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Kyle Meyer @ 2021-02-13 4:38 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: emacs-orgmode
Maxim Nikulin writes:
> On 12/02/2021 14:16, Kyle Meyer wrote:
>> Not relevant for the underlying issue, but doesn't xpdf require a colon
>> before the page number (i.e. ":%1")?
>
> At least for the application in debian & ubuntu xpdf package, page
> number should be specified without a colon. It is Xt interface to
> poppler PDF library, recently its maintainer decided to switch to
> xpopple project as upstream. UI is derived from old version of xpdf.
> Latest original xpdf version is based on Qt and might have different
> convention in respect to page numbers.
Okay. Fwiw the xpdf version I have that requires ":" before the page is
4.03.
>> What about flipping the processing, handling the %N placeholders first
>> and then formatting the file name? Seems to work on my end, though I
>> haven't tested it thoroughly.
>
> I could anticipate similar problems if named destinations are involved.
> I have not checked but I expect that internal links might have "%s" in
> their names at least for some file types.
Indeed, flipping the order unsurprisingly just flips which placeholders
can be problematic. A very contrived example:
(setq org-file-apps
'(("\\.pdf::\\([A-z%]+\\)\\'" . "doesntmatter %s %1")))
;; file:/tmp/test.pdf::a%sb =>
;; Running doesntmatter /tmp/test.pdf a/tmp/test.pdfb...done
> That is why I would strongly prefer substitutions performed in a
> single pass. I do not like it, but it seems that simplified variant of
> format-spec is better. It should allows substitutions with digit. I
> hope, single digit should be enough.
True, realistically I don't think anyone has a command in org-file-apps
that relies on more than a couple of capture groups. All right, here's
a format-spec-inspired fix. At the very least it needs doc updates and
a comment or two.
diff --git a/lisp/org.el b/lisp/org.el
index 5b1443c4e..e8f60fd83 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional add-auto-mode)
(when add-auto-mode
(mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist))))
+(defun org--open-file-format-spec (format specification)
+ (with-temp-buffer
+ (insert format)
+ (goto-char (point-min))
+ (while (search-forward "%" nil t)
+ (cond ((eq (char-after) ?%)
+ (delete-char 1))
+ ((looking-at "[s0-9]")
+ (replace-match
+ (or (cdr (assoc (match-string 0) specification))
+ (error "Invalid format string"))
+ 'fixed-case 'literal)
+ (delete-region (1- (match-beginning 0)) (match-beginning 0)))
+ (t
+ (error "Invalid format string"))))
+ (buffer-string)))
+
;;;###autoload
(defun org-open-file (path &optional in-emacs line search)
"Open the file at PATH.
@@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line search)
;; Remove quotes around the file name - we'll use shell-quote-argument.
(while (string-match "['\"]%s['\"]" cmd)
(setq cmd (replace-match "%s" t t cmd)))
- (setq cmd (replace-regexp-in-string
- "%s"
- (shell-quote-argument (convert-standard-filename file))
- cmd
- nil t))
-
- ;; Replace "%1", "%2" etc. in command with group matches from regex
- (save-match-data
- (let ((match-index 1)
- (number-of-groups (- (/ (length link-match-data) 2) 1)))
- (set-match-data link-match-data)
- (while (<= match-index number-of-groups)
- (let ((regex (concat "%" (number-to-string match-index)))
- (replace-with (match-string match-index dlink)))
- (while (string-match regex cmd)
- (setq cmd (replace-match replace-with t t cmd))))
- (setq match-index (+ match-index 1)))))
-
+ (setq cmd
+ (org--open-file-format-spec
+ cmd
+ (cons
+ (cons "s" (shell-quote-argument
+ (convert-standard-filename file)))
+ (let ((ngroups (- (/ (length link-match-data) 2) 1)))
+ (and (> ngroups 0)
+ (progn
+ (set-match-data link-match-data)
+ (mapcar (lambda (n)
+ (cons (number-to-string n)
+ (match-string-no-properties n dlink)))
+ (number-sequence 1 ngroups))))))))
(save-window-excursion
(message "Running %s...done" cmd)
(start-process-shell-command cmd nil cmd)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: greedy substitution in org-open-file
2021-02-13 4:38 ` Kyle Meyer
@ 2021-02-15 17:04 ` Maxim Nikulin
2021-03-03 12:47 ` Maxim Nikulin
2021-03-21 12:36 ` Maxim Nikulin
2 siblings, 0 replies; 7+ messages in thread
From: Maxim Nikulin @ 2021-02-15 17:04 UTC (permalink / raw)
To: emacs-orgmode
On 13/02/2021 11:38, Kyle Meyer wrote:
>
> All right, here's
> a format-spec-inspired fix. At the very least it needs doc updates and
> a comment or two.
Thank you. I am hardly familiar with elisp so it would be difficult for
me to express the same. My comments are mostly a matter of taste.
Sorry, I have not tried to run the code yet.
> diff --git a/lisp/org.el b/lisp/org.el
> index 5b1443c4e..e8f60fd83 100644
> --- a/lisp/org.el
> +++ b/lisp/org.el
> @@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional add-auto-mode)
> (when add-auto-mode
> (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist))))
>
> +(defun org--open-file-format-spec (format specification)
> + (with-temp-buffer
> + (insert format)
> + (goto-char (point-min))
> + (while (search-forward "%" nil t)
> + (cond ((eq (char-after) ?%)
> + (delete-char 1))
> + ((looking-at "[s0-9]")
Personally I do not see a reason to limit substitutions just to just
"%s". I would consider "[a-zA-Z0-9]".
> + (replace-match
> + (or (cdr (assoc (match-string 0) specification))
> + (error "Invalid format string"))
I think, substitution character in the error message could be useful
during debugging, something like (format "Invalid format character %%%s"
(match-string 0)).
> + 'fixed-case 'literal)
> + (delete-region (1- (match-beginning 0)) (match-beginning 0)))
> + (t
> + (error "Invalid format string"))))
> + (buffer-string)))
> +
> ;;;###autoload
> (defun org-open-file (path &optional in-emacs line search)
> "Open the file at PATH.
> @@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line search)
> ;; Remove quotes around the file name - we'll use shell-quote-argument.
> (while (string-match "['\"]%s['\"]" cmd)
> (setq cmd (replace-match "%s" t t cmd)))
> - (setq cmd (replace-regexp-in-string
> - "%s"
> - (shell-quote-argument (convert-standard-filename file))
> - cmd
> - nil t))
> -
> - ;; Replace "%1", "%2" etc. in command with group matches from regex
> - (save-match-data
> - (let ((match-index 1)
> - (number-of-groups (- (/ (length link-match-data) 2) 1)))
> - (set-match-data link-match-data)
> - (while (<= match-index number-of-groups)
> - (let ((regex (concat "%" (number-to-string match-index)))
> - (replace-with (match-string match-index dlink)))
> - (while (string-match regex cmd)
> - (setq cmd (replace-match replace-with t t cmd))))
> - (setq match-index (+ match-index 1)))))
> -
> + (setq cmd
> + (org--open-file-format-spec
> + cmd
> + (cons
> + (cons "s" (shell-quote-argument
> + (convert-standard-filename file)))
> + (let ((ngroups (- (/ (length link-match-data) 2) 1)))
> + (and (> ngroups 0)
> + (progn
> + (set-match-data link-match-data)
> + (mapcar (lambda (n)
> + (cons (number-to-string n)
> + (match-string-no-properties n dlink)))
Should not be numbered substitutions passed through shell-quote-argument
as well? Besides just page numbers the link could be named internal
anchor where more characters are allowed. It is the reason why in
general I prefer bare exec to shell commands.
I am unsure concerning optional matches as
"\\(?:\\.pdf\\)\\(?:::\\([0-9]+\\)\\)?\\'"
Maybe they should not be used at all in favor of separate entries with
and without page number. Maybe nil should coalesce to empty string "".
Certainly I am against "nil" string for a missed group. By the way, in
the original format-spec, cdr is applied after the check whether assoc
value is nil.
> + (number-sequence 1 ngroups))))))))
> (save-window-excursion
> (message "Running %s...done" cmd)
> (start-process-shell-command cmd nil cmd)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: greedy substitution in org-open-file
2021-02-13 4:38 ` Kyle Meyer
2021-02-15 17:04 ` Maxim Nikulin
@ 2021-03-03 12:47 ` Maxim Nikulin
2021-03-21 12:36 ` Maxim Nikulin
2 siblings, 0 replies; 7+ messages in thread
From: Maxim Nikulin @ 2021-03-03 12:47 UTC (permalink / raw)
To: emacs-orgmode
Discussion of the original patch:
https://orgmode.org/list/4B51D104.9090502@jboecker.de/T/#u
https://lists.gnu.org/archive/html/emacs-orgmode/2010-01/msg00450.html
https://orgmode.org/list/4BB9C078.9050202@jboecker.de/
> The patch may make it a little easier to break things by
> misconfiguring org-file-apps.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: greedy substitution in org-open-file
2021-02-13 4:38 ` Kyle Meyer
2021-02-15 17:04 ` Maxim Nikulin
2021-03-03 12:47 ` Maxim Nikulin
@ 2021-03-21 12:36 ` Maxim Nikulin
2 siblings, 0 replies; 7+ messages in thread
From: Maxim Nikulin @ 2021-03-21 12:36 UTC (permalink / raw)
To: emacs-orgmode
[-- Attachment #1: Type: text/plain, Size: 3465 bytes --]
On 13/02/2021 11:38, Kyle Meyer wrote:
>
> +(defun org--open-file-format-spec (format specification)
> + (with-temp-buffer
> + (insert format)
> + (goto-char (point-min))
> + (while (search-forward "%" nil t)
> + (cond ((eq (char-after) ?%)
> + (delete-char 1))
> + ((looking-at "[s0-9]")
> + (replace-match
> + (or (cdr (assoc (match-string 0) specification))
> + (error "Invalid format string"))
> + 'fixed-case 'literal)
> + (delete-region (1- (match-beginning 0)) (match-beginning 0)))
Finally I managed to convince myself that delete-region does not change
position in the buffer, so "%s" or "%1" in specification are not a
problem. I am aware that this implementation is a simplified version of
format-spec, but I am still unsure if jumping over a buffer is the best
choice.
I am in doubts if strings or characters should be used in the spec:
'(("s" . "file.pdf")) vs. '((?s . "file.pdf")), but really it does not
matter.
I have created some tests for this function, see the attachment.
Actually I do not like such style of tests since first failure stops
whole test and it is hard to get general impression to which degree the
function under the test is broken, but I am not aware of a better way.
Recently I asked Ihor a similar question:
https://orgmode.org/list/s324b0$74g$1@ciao.gmane.io
I know that functions called from one point are not in favor in org
sources, but I do not mind to have additional helper function to add
tests that all substitutions are properly escaped.
> + (let ((ngroups (- (/ (length link-match-data) 2) 1)))
> + (and (> ngroups 0)
> + (progn
> + (set-match-data link-match-data)
> + (mapcar (lambda (n)
> + (cons (number-to-string n)
> + (match-string-no-properties n dlink)))
> + (number-sequence 1 ngroups))))))))
Matter of taste: it seems that with (number-sequence 1 ngroups 1)) it is
possible to avoid (> ngroups 0).
I have spent some time evaluating how to make errors more helpful to
users. I am unsure if multiline message is acceptable to dump content of
specification. For a while, a place in the format where error has
happened (combined with a different approach to parse format string)
(defun org--open-file-format-spec (format specification)
(apply #'concat
(nreverse
(let ((result nil)
(token-end 0))
(while (string-match "%\\(.?\\)" format token-end)
(let ((token-start (match-beginning 0))
(subst (match-string-no-properties 1 format)))
(push (substring format token-end token-start) result)
(push (if (equal subst "%")
"%"
(or (cdr (assoc subst specification))
(error "Unknown substitution: '%s<ERROR ->%s'"
(substring format 0 token-start)
(substring format token-start nil))))
result))
(setq token-end (match-end 0)))
(push (substring format token-end nil) result)))))
To my surprise neither ^ nor \\` in string-match regexp works if
start-pos is not zero.
[-- Attachment #2: test-org--open-file-format-spec.patch --]
[-- Type: text/x-patch, Size: 3006 bytes --]
diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index 78cd29576..b6e42dc99 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -8236,6 +8236,72 @@ two
(call-interactively #'org-paste-subtree)
(buffer-string)))))
+(ert-deftest org-test/org--open-file-format-spec ()
+ "Test `org-open-file' helper `org--open-file-format-spec'."
+ (let ((def-spec '(("s" . "file.pdf") ("1" . "10") ("2" . "pattern"))))
+ (should (equal "zathura --page 10 --find pattern file.pdf"
+ (org--open-file-format-spec
+ "zathura --page %1 --find %2 %s" def-spec)))
+ (should (equal "no subst"
+ (org--open-file-format-spec
+ "no subst" def-spec)))
+ (should (equal "simple file.pdf"
+ (org--open-file-format-spec
+ "simple %s" def-spec)))
+ (should (equal "with --page 10 file.pdf"
+ (org--open-file-format-spec
+ "with --page %1 %s" def-spec)))
+ (should (equal "file.pdf at start"
+ (org--open-file-format-spec
+ "%s at start" def-spec)))
+ (should (equal "in the file.pdf middle"
+ (org--open-file-format-spec
+ "in the %s middle" def-spec)))
+ (should (equal "literal %"
+ (org--open-file-format-spec
+ "literal %%" def-spec)))
+ (should (equal "literal %s in the middle"
+ (org--open-file-format-spec
+ "literal %%s in the middle" def-spec)))
+ (should (equal "% literal at start"
+ (org--open-file-format-spec
+ "%% literal at start" def-spec)))
+ (should (equal "" (org--open-file-format-spec "" def-spec)))
+ (should (equal "adjucent 10file.pdf substitutions"
+ (org--open-file-format-spec
+ "adjucent %1%s substitutions" def-spec))))
+
+ (should (equal "many -f first -s second -t third -e eigth file.pdf"
+ (org--open-file-format-spec
+ "many -f %1 -s %2 -t %3 -e %8 %s"
+ '(("1" . "first") ("2" . "second") ("3" . "third")
+ ("4" . "fourth") ("5" . "firth") ("6" . "sixth")
+ ("7" . "seventh") ("8" . "eigth") ("s" . "file.pdf")))))
+
+ ;; I am afraid to add recursive substitutions like ("%s" . "recursive %s").
+ (should (equal "no-recursion-with file-%1.pdf --page 10"
+ (org--open-file-format-spec
+ "no-recursion-with %s --page %1"
+ '(("s" . "file-%1.pdf") ("1" . "10")))))
+ (should (equal "no-recursion-with --search printf-%s file.pdf"
+ (org--open-file-format-spec
+ "no-recursion-with --search %1 %s"
+ '(("s" . "file.pdf") ("1" . "printf-%s")))))
+
+ (let* ((err
+ (should-error (org--open-file-format-spec "invalid-end %" ())
+ :type 'error))
+ (text (cadr err)))
+ (should (and (stringp text)
+ (string-match-p "’invalid-end <ERROR ->%’" text))))
+ (let* ((err
+ (should-error (org--open-file-format-spec "missed --subst %1 %s"
+ '(("s" . "file.pdf")))
+ :type 'error))
+ (text (cadr err)))
+ (should (and (stringp text)
+ (string-match-p "’missed --subst <ERROR ->%1 %s’" text)))))
+
(provide 'test-org)
;;; test-org.el ends here
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-03-21 12:37 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-20 16:08 greedy substitution in org-open-file Maxim Nikulin
2021-02-12 7:16 ` Kyle Meyer
2021-02-12 16:46 ` Maxim Nikulin
2021-02-13 4:38 ` Kyle Meyer
2021-02-15 17:04 ` Maxim Nikulin
2021-03-03 12:47 ` Maxim Nikulin
2021-03-21 12:36 ` Maxim Nikulin
Code repositories for project(s) associated with this inbox:
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).