emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* greedy substitution in org-open-file
@ 2021-01-20 16:08 Maxim Nikulin
  2021-02-12  7:16 ` Kyle Meyer
  0 siblings, 1 reply; 13+ messages in thread
From: Maxim Nikulin @ 2021-01-20 16:08 UTC (permalink / raw)
  To: emacs-orgmode

Looking into the code related to 'pty problem with 
start-process-shell-command, I have realized that the following case is 
not handled correctly:

#+begin_src elisp
   (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
#+end_src

I hope, I adapted an example from [[help:org-file-apps]] correctly. When 
I try to open the following link using C-c C-o

[[file:test-%1f.pdf::42]]

I get

Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done

I believe, it should be

Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done

Or

Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done

Org mode version 9.4.4 (release_9.4.4-164-g7a9a8a

The source of the problem is that %s substitution is expanded at first 
and regexp groups are replaced later. Ideally, it should be performed in 
a single pass. I have found format-spec function but I am unsure 
concerning it, maybe it is avoided intentionally.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: greedy substitution in org-open-file
  2021-01-20 16:08 greedy substitution in org-open-file Maxim Nikulin
@ 2021-02-12  7:16 ` Kyle Meyer
  2021-02-12 16:46   ` Maxim Nikulin
  0 siblings, 1 reply; 13+ messages in thread
From: Kyle Meyer @ 2021-02-12  7:16 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin writes:

> Looking into the code related to 'pty problem with 
> start-process-shell-command, I have realized that the following case is 
> not handled correctly:
>
> #+begin_src elisp
>    (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
> #+end_src

Not relevant for the underlying issue, but doesn't xpdf require a colon
before the page number (i.e. ":%1")?

> I hope, I adapted an example from [[help:org-file-apps]] correctly. When 
> I try to open the following link using C-c C-o
>
> [[file:test-%1f.pdf::42]]
>
> I get
>
> Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done
>
> I believe, it should be
>
> Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done
>
> Or
>
> Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done

Yep, it should show up as "Running xpdf /tmp/test-\%1f.pdf :42...done",
with the backslash coming from shell-quote-argument.

> The source of the problem is that %s substitution is expanded at first 
> and regexp groups are replaced later. Ideally, it should be performed in 
> a single pass. I have found format-spec function but I am unsure 
> concerning it, maybe it is avoided intentionally.

I believe format-spec requires the placeholder to be A-z:

  (format-spec "xpdf %s" '((?s . "a")))  => "xpdf a"
  (format-spec "xpdf %s %1" '((?s . "a") (?1 . "b")))  ;; Invalid format string

What about flipping the processing, handling the %N placeholders first
and then formatting the file name?  Seems to work on my end, though I
haven't tested it thoroughly.

diff --git a/lisp/org.el b/lisp/org.el
index c61b8fb56..31dbe352a 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8714,12 +8714,6 @@ (defun org-open-file (path &optional in-emacs line search)
       ;; Remove quotes around the file name - we'll use shell-quote-argument.
       (while (string-match "['\"]%s['\"]" cmd)
 	(setq cmd (replace-match "%s" t t cmd)))
-      (setq cmd (replace-regexp-in-string
-		 "%s"
-		 (shell-quote-argument (convert-standard-filename file))
-		 cmd
-		 nil t))
-
       ;; Replace "%1", "%2" etc. in command with group matches from regex
       (save-match-data
 	(let ((match-index 1)
@@ -8731,7 +8725,11 @@ (defun org-open-file (path &optional in-emacs line search)
 	      (while (string-match regex cmd)
 		(setq cmd (replace-match replace-with t t cmd))))
 	    (setq match-index (+ match-index 1)))))
-
+      (setq cmd (replace-regexp-in-string
+		 "%s"
+		 (shell-quote-argument (convert-standard-filename file))
+		 cmd
+		 nil t))
       (save-window-excursion
 	(message "Running %s...done" cmd)
 	(start-process-shell-command cmd nil cmd)



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: greedy substitution in org-open-file
  2021-02-12  7:16 ` Kyle Meyer
@ 2021-02-12 16:46   ` Maxim Nikulin
  2021-02-13  4:38     ` Kyle Meyer
  0 siblings, 1 reply; 13+ messages in thread
From: Maxim Nikulin @ 2021-02-12 16:46 UTC (permalink / raw)
  To: emacs-orgmode

On 12/02/2021 14:16, Kyle Meyer wrote:
>> #+begin_src elisp
>>     (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
>> #+end_src
> 
> Not relevant for the underlying issue, but doesn't xpdf require a colon
> before the page number (i.e. ":%1")?

At least for the application in debian & ubuntu xpdf package, page 
number should be specified without a colon. It is Xt interface to 
poppler PDF library, recently its maintainer decided to switch to 
xpopple project as upstream. UI is derived from old version of xpdf. 
Latest original xpdf version is based on Qt and might have different 
convention in respect to page numbers.

> I believe format-spec requires the placeholder to be A-z:
> 
>    (format-spec "xpdf %s" '((?s . "a")))  => "xpdf a"
>    (format-spec "xpdf %s %1" '((?s . "a") (?1 . "b")))  ;; Invalid format string

You are right. I missed that format-spec allows to specify field width, 
so digits could not be used.

> What about flipping the processing, handling the %N placeholders first
> and then formatting the file name?  Seems to work on my end, though I
> haven't tested it thoroughly.

I could anticipate similar problems if named destinations are involved. 
I have not checked but I expect that internal links might have "%s" in 
their names at least for some file types. That is why I would strongly 
prefer substitutions performed in a single pass. I do not like it, but 
it seems that simplified variant of format-spec is better. It should 
allows substitutions with digit. I hope, single digit should be enough.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: greedy substitution in org-open-file
  2021-02-12 16:46   ` Maxim Nikulin
@ 2021-02-13  4:38     ` Kyle Meyer
  2021-02-15 17:04       ` Maxim Nikulin
                         ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Kyle Meyer @ 2021-02-13  4:38 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin writes:

> On 12/02/2021 14:16, Kyle Meyer wrote:

>> Not relevant for the underlying issue, but doesn't xpdf require a colon
>> before the page number (i.e. ":%1")?
>
> At least for the application in debian & ubuntu xpdf package, page 
> number should be specified without a colon. It is Xt interface to 
> poppler PDF library, recently its maintainer decided to switch to 
> xpopple project as upstream. UI is derived from old version of xpdf. 
> Latest original xpdf version is based on Qt and might have different 
> convention in respect to page numbers.

Okay.  Fwiw the xpdf version I have that requires ":" before the page is
4.03.

>> What about flipping the processing, handling the %N placeholders first
>> and then formatting the file name?  Seems to work on my end, though I
>> haven't tested it thoroughly.
>
> I could anticipate similar problems if named destinations are involved. 
> I have not checked but I expect that internal links might have "%s" in 
> their names at least for some file types.

Indeed, flipping the order unsurprisingly just flips which placeholders
can be problematic.  A very contrived example:

  (setq org-file-apps
        '(("\\.pdf::\\([A-z%]+\\)\\'" . "doesntmatter %s %1")))

  ;; file:/tmp/test.pdf::a%sb =>
  ;; Running doesntmatter /tmp/test.pdf a/tmp/test.pdfb...done

> That is why I would strongly prefer substitutions performed in a
> single pass. I do not like it, but it seems that simplified variant of
> format-spec is better. It should allows substitutions with digit. I
> hope, single digit should be enough.

True, realistically I don't think anyone has a command in org-file-apps
that relies on more than a couple of capture groups.  All right, here's
a format-spec-inspired fix.  At the very least it needs doc updates and
a comment or two.

diff --git a/lisp/org.el b/lisp/org.el
index 5b1443c4e..e8f60fd83 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional add-auto-mode)
    (when add-auto-mode
      (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist))))
 
+(defun org--open-file-format-spec (format specification)
+  (with-temp-buffer
+    (insert format)
+    (goto-char (point-min))
+    (while (search-forward "%" nil t)
+      (cond ((eq (char-after) ?%)
+             (delete-char 1))
+            ((looking-at "[s0-9]")
+             (replace-match
+              (or (cdr (assoc (match-string 0) specification))
+                  (error "Invalid format string"))
+              'fixed-case 'literal)
+             (delete-region (1- (match-beginning 0)) (match-beginning 0)))
+            (t
+             (error "Invalid format string"))))
+    (buffer-string)))
+
 ;;;###autoload
 (defun org-open-file (path &optional in-emacs line search)
   "Open the file at PATH.
@@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line search)
       ;; Remove quotes around the file name - we'll use shell-quote-argument.
       (while (string-match "['\"]%s['\"]" cmd)
 	(setq cmd (replace-match "%s" t t cmd)))
-      (setq cmd (replace-regexp-in-string
-		 "%s"
-		 (shell-quote-argument (convert-standard-filename file))
-		 cmd
-		 nil t))
-
-      ;; Replace "%1", "%2" etc. in command with group matches from regex
-      (save-match-data
-	(let ((match-index 1)
-	      (number-of-groups (- (/ (length link-match-data) 2) 1)))
-	  (set-match-data link-match-data)
-	  (while (<= match-index number-of-groups)
-	    (let ((regex (concat "%" (number-to-string match-index)))
-		  (replace-with (match-string match-index dlink)))
-	      (while (string-match regex cmd)
-		(setq cmd (replace-match replace-with t t cmd))))
-	    (setq match-index (+ match-index 1)))))
-
+      (setq cmd
+            (org--open-file-format-spec
+             cmd
+             (cons
+              (cons "s" (shell-quote-argument
+                         (convert-standard-filename file)))
+              (let ((ngroups (- (/ (length link-match-data) 2) 1)))
+                (and (> ngroups 0)
+                     (progn
+                       (set-match-data link-match-data)
+                       (mapcar (lambda (n)
+                                 (cons (number-to-string n)
+                                       (match-string-no-properties n dlink)))
+                               (number-sequence 1 ngroups))))))))
       (save-window-excursion
 	(message "Running %s...done" cmd)
 	(start-process-shell-command cmd nil cmd)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: greedy substitution in org-open-file
  2021-02-13  4:38     ` Kyle Meyer
@ 2021-02-15 17:04       ` Maxim Nikulin
  2021-03-03 12:47       ` Maxim Nikulin
  2021-03-21 12:36       ` Maxim Nikulin
  2 siblings, 0 replies; 13+ messages in thread
From: Maxim Nikulin @ 2021-02-15 17:04 UTC (permalink / raw)
  To: emacs-orgmode

On 13/02/2021 11:38, Kyle Meyer wrote:
>
> All right, here's
> a format-spec-inspired fix.  At the very least it needs doc updates and
> a comment or two.

Thank you. I am hardly familiar with elisp so it would be difficult for 
me to express the same. My comments are mostly a matter of taste.

Sorry, I have not tried to run the code yet.

> diff --git a/lisp/org.el b/lisp/org.el
> index 5b1443c4e..e8f60fd83 100644
> --- a/lisp/org.el
> +++ b/lisp/org.el
> @@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional add-auto-mode)
>      (when add-auto-mode
>        (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist))))
>   
> +(defun org--open-file-format-spec (format specification)
> +  (with-temp-buffer
> +    (insert format)
> +    (goto-char (point-min))
> +    (while (search-forward "%" nil t)
> +      (cond ((eq (char-after) ?%)
> +             (delete-char 1))
> +            ((looking-at "[s0-9]")

Personally I do not see a reason to limit substitutions just to just 
"%s". I would consider "[a-zA-Z0-9]".

> +             (replace-match
> +              (or (cdr (assoc (match-string 0) specification))
> +                  (error "Invalid format string"))

I think, substitution character in the error message could be useful 
during debugging, something like (format "Invalid format character %%%s" 
(match-string 0)).

> +              'fixed-case 'literal)
> +             (delete-region (1- (match-beginning 0)) (match-beginning 0)))
> +            (t
> +             (error "Invalid format string"))))
> +    (buffer-string)))
> +
>   ;;;###autoload
>   (defun org-open-file (path &optional in-emacs line search)
>     "Open the file at PATH.
> @@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line search)
>         ;; Remove quotes around the file name - we'll use shell-quote-argument.
>         (while (string-match "['\"]%s['\"]" cmd)
>   	(setq cmd (replace-match "%s" t t cmd)))
> -      (setq cmd (replace-regexp-in-string
> -		 "%s"
> -		 (shell-quote-argument (convert-standard-filename file))
> -		 cmd
> -		 nil t))
> -
> -      ;; Replace "%1", "%2" etc. in command with group matches from regex
> -      (save-match-data
> -	(let ((match-index 1)
> -	      (number-of-groups (- (/ (length link-match-data) 2) 1)))
> -	  (set-match-data link-match-data)
> -	  (while (<= match-index number-of-groups)
> -	    (let ((regex (concat "%" (number-to-string match-index)))
> -		  (replace-with (match-string match-index dlink)))
> -	      (while (string-match regex cmd)
> -		(setq cmd (replace-match replace-with t t cmd))))
> -	    (setq match-index (+ match-index 1)))))
> -
> +      (setq cmd
> +            (org--open-file-format-spec
> +             cmd
> +             (cons
> +              (cons "s" (shell-quote-argument
> +                         (convert-standard-filename file)))
> +              (let ((ngroups (- (/ (length link-match-data) 2) 1)))
> +                (and (> ngroups 0)
> +                     (progn
> +                       (set-match-data link-match-data)
> +                       (mapcar (lambda (n)
> +                                 (cons (number-to-string n)
> +                                       (match-string-no-properties n dlink)))

Should not be numbered substitutions passed through shell-quote-argument 
as well? Besides just page numbers the link could be named internal 
anchor where more characters are allowed. It is the reason why in 
general I prefer bare exec to shell commands.

I am unsure concerning optional matches as

     "\\(?:\\.pdf\\)\\(?:::\\([0-9]+\\)\\)?\\'"

Maybe they should not be used at all in favor of separate entries with 
and without page number. Maybe nil should coalesce to empty string "". 
Certainly I am against "nil" string for a missed group. By the way, in 
the original format-spec, cdr is applied after the check whether assoc 
value is nil.

> +                               (number-sequence 1 ngroups))))))))
>         (save-window-excursion
>   	(message "Running %s...done" cmd)
>   	(start-process-shell-command cmd nil cmd)





^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: greedy substitution in org-open-file
  2021-02-13  4:38     ` Kyle Meyer
  2021-02-15 17:04       ` Maxim Nikulin
@ 2021-03-03 12:47       ` Maxim Nikulin
  2021-03-21 12:36       ` Maxim Nikulin
  2 siblings, 0 replies; 13+ messages in thread
From: Maxim Nikulin @ 2021-03-03 12:47 UTC (permalink / raw)
  To: emacs-orgmode

Discussion of the original patch:
https://orgmode.org/list/4B51D104.9090502@jboecker.de/T/#u
https://lists.gnu.org/archive/html/emacs-orgmode/2010-01/msg00450.html

https://orgmode.org/list/4BB9C078.9050202@jboecker.de/
> The patch may make it a little easier to break things by
> misconfiguring org-file-apps.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: greedy substitution in org-open-file
  2021-02-13  4:38     ` Kyle Meyer
  2021-02-15 17:04       ` Maxim Nikulin
  2021-03-03 12:47       ` Maxim Nikulin
@ 2021-03-21 12:36       ` Maxim Nikulin
  2022-08-27 17:20         ` [PATCH] org.el: Fix percent substitutions in `org-open-file' Max Nikulin
  2 siblings, 1 reply; 13+ messages in thread
From: Maxim Nikulin @ 2021-03-21 12:36 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 3465 bytes --]

On 13/02/2021 11:38, Kyle Meyer wrote:
>   
> +(defun org--open-file-format-spec (format specification)
> +  (with-temp-buffer
> +    (insert format)
> +    (goto-char (point-min))
> +    (while (search-forward "%" nil t)
> +      (cond ((eq (char-after) ?%)
> +             (delete-char 1))
> +            ((looking-at "[s0-9]")
> +             (replace-match
> +              (or (cdr (assoc (match-string 0) specification))
> +                  (error "Invalid format string"))
> +              'fixed-case 'literal)
> +             (delete-region (1- (match-beginning 0)) (match-beginning 0)))

Finally I managed to convince myself that delete-region does not change 
position in the buffer, so "%s" or "%1" in specification are not a 
problem. I am aware that this implementation is a simplified version of 
format-spec, but I am still unsure if jumping over a buffer is the best 
choice.

I am in doubts if strings or characters should be used in the spec:
'(("s" . "file.pdf")) vs. '((?s . "file.pdf")), but really it does not 
matter.

I have created some tests for this function, see the attachment. 
Actually I do not like such style of tests since first failure stops 
whole test and it is hard to get general impression to which degree the 
function under the test is broken, but I am not aware of a better way.
Recently I asked Ihor a similar question: 
https://orgmode.org/list/s324b0$74g$1@ciao.gmane.io

I know that functions called from one point are not in favor in org 
sources, but I do not mind to have additional helper function to add 
tests that all substitutions are properly escaped.

> +              (let ((ngroups (- (/ (length link-match-data) 2) 1)))
> +                (and (> ngroups 0)
> +                     (progn
> +                       (set-match-data link-match-data)
> +                       (mapcar (lambda (n)
> +                                 (cons (number-to-string n)
> +                                       (match-string-no-properties n dlink)))
> +                               (number-sequence 1 ngroups))))))))

Matter of taste: it seems that with (number-sequence 1 ngroups 1)) it is 
possible to avoid (> ngroups 0).

I have spent some time evaluating how to make errors more helpful to 
users. I am unsure if multiline message is acceptable to dump content of 
specification. For a while, a place in the format where error has 
happened (combined with a different approach to parse format string)

(defun org--open-file-format-spec (format specification)
   (apply #'concat
          (nreverse
           (let ((result nil)
                 (token-end 0))
             (while (string-match "%\\(.?\\)" format token-end)
               (let ((token-start (match-beginning 0))
                     (subst (match-string-no-properties 1 format)))
                 (push (substring format token-end token-start) result)
                 (push (if (equal subst "%")
                           "%"
                         (or (cdr (assoc subst specification))
                           (error "Unknown substitution: '%s<ERROR ->%s'"
                                    (substring format 0 token-start)
                                    (substring format token-start nil))))
                       result))
               (setq token-end (match-end 0)))
             (push (substring format token-end nil) result)))))

To my surprise neither ^ nor \\` in string-match regexp works if 
start-pos is not zero.

[-- Attachment #2: test-org--open-file-format-spec.patch --]
[-- Type: text/x-patch, Size: 3006 bytes --]

diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index 78cd29576..b6e42dc99 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -8236,6 +8236,72 @@ two
      (call-interactively #'org-paste-subtree)
      (buffer-string)))))
 
+(ert-deftest org-test/org--open-file-format-spec ()
+  "Test `org-open-file' helper `org--open-file-format-spec'."
+  (let ((def-spec '(("s" . "file.pdf") ("1" . "10") ("2" . "pattern"))))
+    (should (equal "zathura --page 10 --find pattern file.pdf"
+		   (org--open-file-format-spec
+		    "zathura --page %1 --find %2 %s" def-spec)))
+    (should (equal "no subst"
+		   (org--open-file-format-spec
+		    "no subst" def-spec)))
+    (should (equal "simple file.pdf"
+		   (org--open-file-format-spec
+		    "simple %s" def-spec)))
+    (should (equal "with --page 10 file.pdf"
+		   (org--open-file-format-spec
+		      "with --page %1 %s" def-spec)))
+    (should (equal "file.pdf at start"
+		     (org--open-file-format-spec
+		      "%s at start" def-spec)))
+    (should (equal "in the file.pdf middle"
+		     (org--open-file-format-spec
+		      "in the %s middle" def-spec)))
+    (should (equal "literal %"
+		   (org--open-file-format-spec
+		    "literal %%" def-spec)))
+    (should (equal "literal %s in the middle"
+		   (org--open-file-format-spec
+		    "literal %%s in the middle" def-spec)))
+    (should (equal "% literal at start"
+		     (org--open-file-format-spec
+		      "%% literal at start" def-spec)))
+    (should (equal "" (org--open-file-format-spec "" def-spec)))
+    (should (equal "adjucent 10file.pdf substitutions"
+		   (org--open-file-format-spec
+		    "adjucent %1%s substitutions" def-spec))))
+
+  (should (equal "many -f first -s second -t third -e eigth file.pdf"
+		 (org--open-file-format-spec
+		  "many -f %1 -s %2 -t %3 -e %8 %s"
+		  '(("1" . "first") ("2" . "second") ("3" . "third")
+		    ("4" . "fourth") ("5" . "firth") ("6" . "sixth")
+		    ("7" . "seventh") ("8" . "eigth") ("s" . "file.pdf")))))
+
+  ;; I am afraid to add recursive substitutions like ("%s" . "recursive %s").
+  (should (equal "no-recursion-with file-%1.pdf --page 10"
+		 (org--open-file-format-spec
+		  "no-recursion-with %s --page %1"
+		  '(("s" . "file-%1.pdf") ("1" . "10")))))
+  (should (equal "no-recursion-with --search printf-%s file.pdf"
+		 (org--open-file-format-spec
+		  "no-recursion-with --search %1 %s"
+		  '(("s" . "file.pdf") ("1" . "printf-%s")))))
+
+  (let* ((err
+	  (should-error (org--open-file-format-spec "invalid-end %" ())
+			:type 'error))
+	 (text (cadr err)))
+    (should (and (stringp text)
+		 (string-match-p "’invalid-end <ERROR ->%’" text))))
+  (let* ((err
+	  (should-error (org--open-file-format-spec "missed --subst %1 %s"
+						    '(("s" . "file.pdf")))
+		       :type 'error))
+	 (text (cadr err)))
+    (should (and (stringp text)
+		 (string-match-p "’missed --subst <ERROR ->%1 %s’" text)))))
+
 (provide 'test-org)
 
 ;;; test-org.el ends here

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] org.el: Fix percent substitutions in `org-open-file'
  2021-03-21 12:36       ` Maxim Nikulin
@ 2022-08-27 17:20         ` Max Nikulin
  2022-09-02 12:08           ` Ihor Radchenko
  0 siblings, 1 reply; 13+ messages in thread
From: Max Nikulin @ 2022-08-27 17:20 UTC (permalink / raw)
  To: emacs-orgmode; +Cc: Kyle Meyer

[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]

On 21/03/2021 19:36, Maxim Nikulin wrote:
> On 13/02/2021 11:38, Kyle Meyer wrote:
>> +(defun org--open-file-format-spec (format specification)
>> +  (with-temp-buffer
>> +    (insert format)
>> +    (goto-char (point-min))
>> +    (while (search-forward "%" nil t)
>> +      (cond ((eq (char-after) ?%)
>> +             (delete-char 1))
>> +            ((looking-at "[s0-9]")
>> +             (replace-match
>> +              (or (cdr (assoc (match-string 0) specification))
>> +                  (error "Invalid format string"))
>> +              'fixed-case 'literal)
>> +             (delete-region (1- (match-beginning 0)) (match-beginning 
>> 0)))

It's pity that this issue has not fix yet. I have tried to implement 
another helper function. It is hardly usable outside of `org-open-file' 
(perhaps it is still suitable to make mailcap commands), but separate 
function makes the code testable. I hope, the attached patch fixes the 
problems with multiple regexp groups and with percent characters in 
replacement string. Notice that "\%", not "%%" is used to escape percent 
in mailcap, anyway neither variant was supported before.

[-- Attachment #2: 0001-org.el-Fix-percent-substitutions-in-org-open-file.patch --]
[-- Type: text/x-patch, Size: 13173 bytes --]

From 2f2850dcdc0cba8cfc6c0e21fc893c4e664d575c Mon Sep 17 00:00:00 2001
From: Max Nikulin <manikulin@gmail.com>
Date: Sat, 27 Aug 2022 23:51:24 +0700
Subject: [PATCH] org.el: Fix percent substitutions in `org-open-file'

* lisp/org.el (org--open-file-format-command): New function with better
coverage of mailcap RFC 1524 syntax.  Do not replace percent character
in file name or link component, fix substitution of multiple regular
expression groups matched in the link target.
(org-open-file): Use `org--open-file-format-command' instead of inline
code.
* testing/lisp/test-org.el (org-test/org--open-file-format-command):
Tests for `org--open-file-format-command'.

The primary goal of moving code outside of `org-open-file' function is to
make it testable.

It should fix the following issues:
- Maxim Nikulin. greedy substitution in org-open-file.
  Wed, 20 Jan 2021 23:08:35 +0700.
  https://list.orgmode.org/ru9ki4$t5e$1@ciao.gmane.io
- Rodrigo Morales. Org mode links: Open a PDF file at a given page
  and highlight a given string. Tue, 02 Mar 2021 15:07:32 -0500.
  https://list.orgmode.org/87lfb5pbej.fsf@gmail.com
---
 lisp/org.el              |  95 +++++++++++++++++++++-------
 testing/lisp/test-org.el | 132 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 206 insertions(+), 21 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index 858123e67..e94ec02d9 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8012,6 +8012,78 @@ opened in Emacs."
    (when add-auto-mode
      (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist))))
 
+(defun org--open-file-format-command
+    (mailcap-command file link match-data)
+  "Format MAILCAP-COMMAND to launch viewer for FILE.
+
+Command may be an entry from the `org-file-apps' list or from
+mailcap file, for details see RFC 1524 A User Agent Configuration
+Mechanism For Multimedia Mail Format Information
+<https://tools.ietf.org/html/rfc1524>, man mailcap(5), and Info
+node `(emacs-mime) mailcap'.  Only a part of mailcap specification
+is supported.
+
+The following substitutions are interpolated in the MAILCAP-COMMAND
+string:
+- \"%s\" to FILE name passed through `convert-standard-file-name',
+  so it must be absolute path,
+- \"%1\" to \"%9\" groups from MATCH-DATA found in the LINK string
+  by the regular expression in the `org-file-apps' entry key
+  (performed by caller).
+
+Use backslash \"\\\" to quote percent \"%\" or any other character
+including backslash itself.
+
+In addition, each argument is passed through `shell-quote-argument',
+so quotes around substitutions should not be used.  For compliance
+with mailcap files single or double quotes around substitutions
+are stripped.
+
+For example, to specify particular location withing a PDF file,
+`org-file-apps' list may have the following entries (order is important):
+
+    ;; Page and search string,
+    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf::34::order of redirections>.
+    (\"\\\\.pdf::\\\\([0-9]+\\\\)::\\\\(.+\\\\)\\\\\\='\"
+        . \"okular --page %1 --find %2 %s\")
+    ;; Internal anchor and search string,
+    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf::Redirections::allocate a file>.
+    (\"\\\\.pdf::\\\\(.+\\\\)::\\\\(.+\\\\)\\\\\\='\"
+        . \"okular --find %2 file://%s\\\\\\\\#%1\")
+    ;; Page number, e.g. <file:///usr/share/doc/bash/bashref.pdf::34>.
+    (\"\\\\.pdf::\\\\([0-9]+\\\\)\\\\\\='\" . \"okular --page %1 %s\")
+    ;; Internal reference, e.g. <file:///usr/share/doc/bash/bashref.pdf::Redirections>.
+    (\"\\\\.pdf::\\\\(.+\\\\)\\\\\\='\" . \"okular file://%s\\\\\\\\#%1\")
+    ;; No location within the file, optionally followed by \"::\",
+    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf>.
+    (\"\\\\.pdf\\\\(?:::\\\\)?\\\\\\='\" . \"okular %s\")
+
+Side note: perhaps you would prefer to read the same bash manual used in the
+example as Info node `(bash) Redirections' (Org link: <info:bash#Redirections>)."
+  (let ((spec (list (cons ?s  (convert-standard-filename file))))
+        (ngroups (min 9 (- (/ (length match-data) 2) 1))))
+    (when (> ngroups 0)
+      (set-match-data match-data)
+      (dolist (i (number-sequence 1 ngroups))
+        (push (cons (+ ?0 i) (match-string-no-properties i link)) spec)))
+    (replace-regexp-in-string
+     "\\\\\\(.\\)\\|\\(['\"]\\)?%\\(?:\\(.\\)\\(\\2\\)?\\)?"
+     (lambda (fmt)
+       (let ((backslash (match-string-no-properties 1 fmt)))
+         (or backslash
+             (let* ((quot (match-string 2 fmt))
+                    (percent (alist-get (string-to-char
+                                         (or (match-string 3 fmt) ""))
+                                        spec))
+                    (value (and percent (shell-quote-argument percent))))
+               (unless value
+                 (error "Invalid format `%s'" fmt))
+               ;; Remove quotes around the file name - we use `shell-quote-argument'.
+               (if (and quot (string-equal quot (match-string 4 fmt)))
+                   value
+                 (concat quot value))))))
+     mailcap-command nil 'literal)))
+
 ;;;###autoload
 (defun org-open-file (path &optional in-emacs line search)
   "Open the file at PATH.
@@ -8109,27 +8181,8 @@ If the file does not exist, throw an error."
 	       (not org-open-non-existing-files))
       (user-error "No such file: %s" file))
     (cond
-     ((and (stringp cmd) (not (string-match "^\\s-*$" cmd)))
-      ;; Remove quotes around the file name - we'll use shell-quote-argument.
-      (while (string-match "['\"]%s['\"]" cmd)
-	(setq cmd (replace-match "%s" t t cmd)))
-      (setq cmd (replace-regexp-in-string
-		 "%s"
-		 (shell-quote-argument (convert-standard-filename file))
-		 cmd
-		 nil t))
-
-      ;; Replace "%1", "%2" etc. in command with group matches from regex
-      (save-match-data
-	(let ((match-index 1)
-	      (number-of-groups (- (/ (length link-match-data) 2) 1)))
-	  (set-match-data link-match-data)
-	  (while (<= match-index number-of-groups)
-	    (let ((regex (concat "%" (number-to-string match-index)))
-		  (replace-with (match-string match-index dlink)))
-	      (while (string-match regex cmd)
-		(setq cmd (replace-match replace-with t t cmd))))
-	    (setq match-index (+ match-index 1)))))
+     ((org-string-nw-p cmd)
+      (setq cmd (org--open-file-format-command cmd file dlink link-match-data))
 
       (save-window-excursion
 	(message "Running %s...done" cmd)
diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index b14cbeb26..afae24aca 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -8421,6 +8421,138 @@ two
      (call-interactively #'org-paste-subtree)
      (buffer-string)))))
 
+(ert-deftest test-org/org--open-file-format-command ()
+  "Test `org--open-file-format-command' helper for `org-open-file'."
+  (let ((system-type 'gnu/linux)) ; Fix behavior of `shell-quote-argument'.
+    ;; No additional groups in `org-file-apps' key.
+    (let ((file "/file.pdf")
+          (pattern "\\.pdf\\'"))
+      (should
+       (equal "simple /file.pdf"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "simple %s" file file (match-data)))))
+      (should
+       (equal "single-quotes /file.pdf"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "single-quotes '%s'" file file (match-data)))))
+      (should
+       (equal "double-quotes /file.pdf"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "double-quotes \"%s\"" file file (match-data)))))
+      (should
+       (equal "no subst"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "no subst" file file (match-data)))))
+      (should
+       (equal "% literal percent 100% %s"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "\\% literal percent 100\\% \\%s" file file (match-data)))))
+      (should
+       (equal "escape \"/file.pdf\" \\ more"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    ;; Second quote is not escaped.
+                    "escape \\\"%s\" \\\\ more" file file (match-data)))))
+      (should
+       (equal "/file.pdf file at start"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "%s file at start" file file (match-data))))))
+    ;; Anchors within target file.
+    (let ((file "/page-search.pdf")
+          (link "/page-search.pdf::10::some words")
+          (pattern "\\.pdf::\\([0-9]+\\)::\\(.*\\)\\'"))
+      (should
+       (equal "zathura --page 10 --find some\\ words /page-search.pdf"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "zathura --page '%1' --find %2 \"%s\"" file link (match-data)))))
+      ;; Unused %2.
+      (should
+       (equal "firefox file:///page-search.pdf\\#page=10"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "firefox file://%s\\\\#page=%1" file link (match-data)))))
+      (should
+       (equal "adjucent-subst /page-search.pdfsome\\ words10some\\ words"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "adjucent-subst %s%2'%1'%2" file link (match-data))))))
+    ;; No more than 9 substitutions are supported.
+    (let ((file "/many.pdf")
+          (link "/many.pdf::one:2:3:4:5:6:7:8:9:a:b:c")
+          (pattern (concat "\\.pdf:"
+                           (mapconcat (lambda (_) ":\\([^:]+\\)")
+                                      (number-sequence 1 12)
+                                      "")
+                           "\\'")))
+      (should
+       (equal "overflow /many.pdf::one:2:3:4:5:6:7:8:9:one0:one1:one2"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "overflow %s::%1:%2:%3:%4:%5:%6:%7:%8:%9:%10:%11:%12"
+                    file link (match-data))))))
+    ;; Percent character in link fields does not cause any problem.
+    (let ((file "/file-%2.pdf")
+          (link "/file-%2.pdf::anchor-%3::search %1")
+          (pattern "\\.pdf::\\([^:]+\\)::\\(.+\\)\\'"))
+      (should
+       (equal "percents --find search\\ \\%1 file:///file-\\%2.pdf\\#anchor-\\%3"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "percents --find %2 file://%s\\\\#%1"
+                    file link (match-data))))))
+    ;; Errors.
+    (let ((file "/error.pdf")
+          (pattern "\\.pdf\\'"))
+      (let* ((err (should-error
+                   (and (string-match pattern file)
+                        (org--open-file-format-command
+                         "trailing-percent %s %" file file (match-data)))
+                   :type 'error))
+             (err-text (cadr err)))
+        (should-not (unless (and (stringp err-text)
+                                 (string-match-p "\\`Invalid format .*%" err-text))
+                      err)))
+      ;; Mailcap escape for "%" is "\%", not "%%".
+      (let* ((err (should-error
+                   (and (string-match pattern file)
+                        (org--open-file-format-command
+                         "percent-percent %s%%" file file (match-data)))
+                   :type 'error))
+             (err-text (cadr err)))
+        (should-not (unless (and (stringp err-text)
+                                 (string-match-p "\\`Invalid format .*%%" err-text))
+                      err)))
+      ;; Mailcap allows "%t" for MIME type, but Org has no such information.
+      (let* ((err (should-error
+                   (and (string-match pattern file)
+                        (org--open-file-format-command
+                         "percent-t-unsupported --type '%t' %s" file file (match-data)))
+                   :type 'error))
+             (err-text (cadr err)))
+        (should-not (unless (and (stringp err-text)
+                                 (string-match-p "\\`Invalid format .*%t" err-text))
+                      err))))
+    ;; Optional regular expression groups have no point in `org-file-apps' patterns.
+    (let* ((file "/error.pdf")
+           (link "/error.pdf::1")
+           (pattern "\\.pdf::\\([^:]+\\)\\(?:::\\(.+\\)\\)?\\'")
+           (err (should-error
+                 (and (string-match pattern link)
+                      (org--open-file-format-command
+                       "no-such-match --search %2 %s" file link (match-data)))
+                 :type 'error))
+           (err-text (cadr err)))
+      (should-not (unless (and (stringp err-text)
+                               (string-match-p "\\`Invalid format.*%2" err-text))
+                    err)))))
+
 (provide 'test-org)
 
 ;;; test-org.el ends here
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] org.el: Fix percent substitutions in `org-open-file'
  2022-08-27 17:20         ` [PATCH] org.el: Fix percent substitutions in `org-open-file' Max Nikulin
@ 2022-09-02 12:08           ` Ihor Radchenko
  2022-09-02 15:41             ` Max Nikulin
  0 siblings, 1 reply; 13+ messages in thread
From: Ihor Radchenko @ 2022-09-02 12:08 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode, Kyle Meyer

Max Nikulin <manikulin@gmail.com> writes:

> It's pity that this issue has not fix yet. I have tried to implement 
> another helper function. It is hardly usable outside of `org-open-file' 
> (perhaps it is still suitable to make mailcap commands), but separate 
> function makes the code testable. I hope, the attached patch fixes the 
> problems with multiple regexp groups and with percent characters in 
> replacement string. Notice that "\%", not "%%" is used to escape percent 
> in mailcap, anyway neither variant was supported before.
> From 2f2850dcdc0cba8cfc6c0e21fc893c4e664d575c Mon Sep 17 00:00:00 2001
> From: Max Nikulin <manikulin@gmail.com>
> Date: Sat, 27 Aug 2022 23:51:24 +0700
> Subject: [PATCH] org.el: Fix percent substitutions in `org-open-file'

Thanks! LGTM, except one comment.

> +(defun org--open-file-format-command
> +    (mailcap-command file link match-data)
> +  "Format MAILCAP-COMMAND to launch viewer for FILE.
> +
> ...
> +For example, to specify particular location withing a PDF file,
> +`org-file-apps' list may have the following entries (order is important):
> +
> +    ;; Page and search string,
> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf::34::order of redirections>.
> +    (\"\\\\.pdf::\\\\([0-9]+\\\\)::\\\\(.+\\\\)\\\\\\='\"
> +        . \"okular --page %1 --find %2 %s\")
> +    ;; Internal anchor and search string,
> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf::Redirections::allocate a file>.
> +    (\"\\\\.pdf::\\\\(.+\\\\)::\\\\(.+\\\\)\\\\\\='\"
> +        . \"okular --find %2 file://%s\\\\\\\\#%1\")
> +    ;; Page number, e.g. <file:///usr/share/doc/bash/bashref.pdf::34>.
> +    (\"\\\\.pdf::\\\\([0-9]+\\\\)\\\\\\='\" . \"okular --page %1 %s\")
> +    ;; Internal reference, e.g. <file:///usr/share/doc/bash/bashref.pdf::Redirections>.
> +    (\"\\\\.pdf::\\\\(.+\\\\)\\\\\\='\" . \"okular file://%s\\\\\\\\#%1\")
> +    ;; No location within the file, optionally followed by \"::\",
> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf>.
> +    (\"\\\\.pdf\\\\(?:::\\\\)?\\\\\\='\" . \"okular %s\")

This is a nice set of examples, but it probably does not belong to this
docstring. I'd rather see this in `org-file-apps' docstring or even in
the manual.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] org.el: Fix percent substitutions in `org-open-file'
  2022-09-02 12:08           ` Ihor Radchenko
@ 2022-09-02 15:41             ` Max Nikulin
  2022-09-03  8:26               ` Ihor Radchenko
  0 siblings, 1 reply; 13+ messages in thread
From: Max Nikulin @ 2022-09-02 15:41 UTC (permalink / raw)
  To: emacs-orgmode

On 02/09/2022 19:08, Ihor Radchenko wrote:
> Max Nikulin writes:
> 
>> Subject: [PATCH] org.el: Fix percent substitutions in `org-open-file'
> 
>> +(defun org--open-file-format-command
>> +    (mailcap-command file link match-data)
>> +  "Format MAILCAP-COMMAND to launch viewer for FILE.
>> +
>> ...
>> +For example, to specify particular location withing a PDF file,
>> +`org-file-apps' list may have the following entries (order is important):
>> +
>> +    ;; Page and search string,
>> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf::34::order of redirections>.
>> +    (\"\\\\.pdf::\\\\([0-9]+\\\\)::\\\\(.+\\\\)\\\\\\='\"
>> +        . \"okular --page %1 --find %2 %s\")
>> +    ;; Internal anchor and search string,
>> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf::Redirections::allocate a file>.

In the meanwhile I have realized that it should be

<file:///usr/share/doc/bash/bashref.pdf::#Redirections::allocate a file>

To allow just search as

<file:///usr/share/doc/bash/bashref.pdf::allocate a file>

>> +    (\"\\\\.pdf::\\\\(.+\\\\)::\\\\(.+\\\\)\\\\\\='\"
>> +        . \"okular --find %2 file://%s\\\\\\\\#%1\")
>> +    ;; Page number, e.g. <file:///usr/share/doc/bash/bashref.pdf::34>.
>> +    (\"\\\\.pdf::\\\\([0-9]+\\\\)\\\\\\='\" . \"okular --page %1 %s\")
>> +    ;; Internal reference, e.g. <file:///usr/share/doc/bash/bashref.pdf::Redirections>.

And <file:///usr/share/doc/bash/bashref.pdf::#Redirections>
As a side effect it should dramatically reduce number of backslashes 
since hash symbol becomes a part of %1 substitution.

>> +    (\"\\\\.pdf::\\\\(.+\\\\)\\\\\\='\" . \"okular file://%s\\\\\\\\#%1\")
>> +    ;; No location within the file, optionally followed by \"::\",
>> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf>.
>> +    (\"\\\\.pdf\\\\(?:::\\\\)?\\\\\\='\" . \"okular %s\")
> 
> This is a nice set of examples, but it probably does not belong to this
> docstring. I'd rather see this in `org-file-apps' docstring or even in
> the manual.

I thought on this and I do not think it should be added to the manual. 
Instead a set of hooks should be defined for popular PDF viewers: 
evince, zathura, xpdf, firefox, chromium & Co, etc. Such hook injects a 
number of supported `org-file-apps' records and users may add suitable 
hook to e.g. (with-eval-after-load 'org (push ...)). It may be 
implemented as a dedicated package org-pdf-viewers.el. The only problems 
is that adding entries programmatically breaks easy customization 
interface for `org-file-apps'. Currently there is the same issue with 
`org-link-parameters' that is a defcustom variables with entries added 
by various org extensions.

I added the example with hope to better explain the purpose of this 
function.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] org.el: Fix percent substitutions in `org-open-file'
  2022-09-02 15:41             ` Max Nikulin
@ 2022-09-03  8:26               ` Ihor Radchenko
  2022-09-04 12:16                 ` [PATCH v2] " Max Nikulin
  0 siblings, 1 reply; 13+ messages in thread
From: Ihor Radchenko @ 2022-09-03  8:26 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>>> +    (\"\\\\.pdf::\\\\(.+\\\\)\\\\\\='\" . \"okular file://%s\\\\\\\\#%1\")
>>> +    ;; No location within the file, optionally followed by \"::\",
>>> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf>.
>>> +    (\"\\\\.pdf\\\\(?:::\\\\)?\\\\\\='\" . \"okular %s\")
>> 
>> This is a nice set of examples, but it probably does not belong to this
>> docstring. I'd rather see this in `org-file-apps' docstring or even in
>> the manual.
>
> I thought on this and I do not think it should be added to the manual. 
> Instead a set of hooks should be defined for popular PDF viewers: 
> evince, zathura, xpdf, firefox, chromium & Co, etc. Such hook injects a 
> number of supported `org-file-apps' records and users may add suitable 
> hook to e.g. (with-eval-after-load 'org (push ...)). It may be 
> implemented as a dedicated package org-pdf-viewers.el. The only problems 
> is that adding entries programmatically breaks easy customization 
> interface for `org-file-apps'. Currently there is the same issue with 
> `org-link-parameters' that is a defcustom variables with entries added 
> by various org extensions.

We may alter the :type specifier in `org-file-apps' to something like

:type `(repeat
         (choice
           ,org-file-apps-presets
           ...))

`org-file-apps-presets' will then contain :type specs for some common
file types and the apps. The variable can be easily populated by
third-party packages as well.

>       I added the example with hope to better explain the purpose of this 
>       function.

I do not think that it is needed in the `org--open-file-format-command'
docstring. If you wish, it would be better to add examples of the
arguments and return values of `org--open-file-format-command' instead
of showing `org-file-apps' examples.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] org.el: Fix percent substitutions in `org-open-file'
  2022-09-03  8:26               ` Ihor Radchenko
@ 2022-09-04 12:16                 ` Max Nikulin
  2022-09-05  5:46                   ` Ihor Radchenko
  0 siblings, 1 reply; 13+ messages in thread
From: Max Nikulin @ 2022-09-04 12:16 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 2269 bytes --]

On 03/09/2022 15:26, Ihor Radchenko wrote:
> Max Nikulin writes:
>> The only problems
>> is that adding entries programmatically breaks easy customization
>> interface for `org-file-apps'. Currently there is the same issue with
>> `org-link-parameters' that is a defcustom variables with entries added
>> by various org extensions.
> 
> We may alter the :type specifier in `org-file-apps' to something like
> 
> :type `(repeat
>           (choice
>             ,org-file-apps-presets
>             ...))
> 
> `org-file-apps-presets' will then contain :type specs for some common
> file types and the apps. The variable can be easily populated by
> third-party packages as well.

I mean another problem. A user adds (require 'some-package) that defines 
new link type. Later the user invokes easy customization interface to 
adjust a link unrelated to some-package. At this step
links from some-package are added to user's init file. Ideally it should 
not happen and after removing of (require 'some-package) there should be 
no link types from the package in the init file at all, including 
customization section.

 From my point of view entries added to `org-link-parameters', 
`org-file-apps', etc. by packages should not affect entries managed 
through customization.

>>        I added the example with hope to better explain the purpose of this
>>        function.
> 
> I do not think that it is needed in the `org--open-file-format-command'
> docstring. If you wish, it would be better to add examples of the
> arguments and return values of `org--open-file-format-command' instead
> of showing `org-file-apps' examples.

I have removed most parts of the example. In addition I fixed some 
issues with references in the docstring (man mailcap is not recognized 
as a link for some reason yet). I changed the function to consider 
trailing backslash as a format error.

See the following message for my new idea concerning providing 
configuration for PDF viewers. Single elisp function should be more 
robust than 6 shell commands and it is better to add code than instructions.

Max Nikulin. Re: Org mode links: Open a PDF file at a given page and 
highlight a given string. Sat, 3 Sep 2022 20:00:47 +0700. 
https://list.orgmode.org/tevj61$17d8$1@ciao.gmane.io

[-- Attachment #2: v2-0001-org.el-Fix-percent-substitutions-in-org-open-file.patch --]
[-- Type: text/x-patch, Size: 15057 bytes --]

From 5c7a38b1d61c44ee4414f01c94abc841bd203191 Mon Sep 17 00:00:00 2001
From: Max Nikulin <manikulin@gmail.com>
Date: Sat, 27 Aug 2022 23:51:24 +0700
Subject: [PATCH v2] org.el: Fix percent substitutions in `org-open-file'

* lisp/org.el (org--open-file-format-command): New function with better
coverage of mailcap RFC 1524 syntax.  Do not replace percent character
in file name or link component, fix substitution of multiple regular
expression groups matched in the link target.
(org-open-file): Use `org--open-file-format-command' instead of inline
code.
* testing/lisp/test-org.el (org-test/org--open-file-format-command):
Tests for `org--open-file-format-command'.

The primary goal of moving code outside of `org-open-file' function is to
make it testable.

It should fix the following issues:
- Maxim Nikulin. greedy substitution in org-open-file.
  Wed, 20 Jan 2021 23:08:35 +0700.
  https://list.orgmode.org/ru9ki4$t5e$1@ciao.gmane.io
- Rodrigo Morales. Org mode links: Open a PDF file at a given page
  and highlight a given string. Tue, 02 Mar 2021 15:07:32 -0500.
  https://list.orgmode.org/87lfb5pbej.fsf@gmail.com
---
 lisp/org.el              | 103 ++++++++++++++++++++-----
 testing/lisp/test-org.el | 160 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 242 insertions(+), 21 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index 6fafed8f1..9c440bd67 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -1299,6 +1299,17 @@ Possible values for the file identifier are:
 \"evince -p %1 %s\")
                      to open [[file:document.pdf::5]] with evince at page 5.
 
+                 Likely you will need more entries: without page
+                 number, with search pattern, with
+                 cross-reference anchor, some combination of
+                 options.  Consider simple pattern here and a
+                 lisp function to determine command line
+                 arguments instead.  Passing argument list to
+                 `call-process' or `make-process' directly allows
+                 to avoid treating some character in peculiar
+                 file names as shell specialls causing executing
+                 part of file name as a subcommand.
+
  `directory'   Matches a directory
  `remote'      Matches a remote file, accessible through tramp or efs.
                Remote files most likely should be visited through Emacs
@@ -8015,6 +8026,75 @@ opened in Emacs."
    (when add-auto-mode
      (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist))))
 
+(defun org--open-file-format-command
+    (mailcap-command file link match-data)
+  "Format MAILCAP-COMMAND to launch viewer for the FILE.
+
+MAILCAP-COMMAND may be an entry from the `org-file-apps' list or
+viewer field from mailcap file loaded to `mailcap-mime-data'. See
+\"RFC 1524. A User Agent Configuration Mechanism For Multimedia
+Mail Format Information\" (URL
+`https://www.rfc-editor.org/rfc/rfc1524.html') for details, man
+page `mailcap(5)' for brief summary, and Info node `(emacs-mime)
+mailcap' for specific related to Emacs.  Only a part of mailcap
+specification is supported.
+
+The following substitutions are interpolated in the
+MAILCAP-COMMAND string:
+
+- \"%s\" to FILE name passed through
+  `convert-standard-filename', so it must be absolute path,
+
+- \"%1\" to \"%9\" groups from MATCH-DATA found in the LINK
+  string by the regular expression in the key part of the `org-file-apps' entry.
+  (performed by caller).  Not recommended, consider a lisp
+  function instead of a shell command.  For example the following
+  link in an Org file
+
+       <file:///usr/share/doc/bash/bashref.pdf::#Redirections::allocate a file>
+
+   may be handled by an `org-file-apps' entry like
+
+       (\"\\\\.pdf\\\\(?:\\\\.[gx]z\\\\|\\\\.bz2\\\\)?::\\\\(#[^:]+\\\\)::\\\\(.+\\\\)\\\\\\='\"
+        . \"okular --find %2 %s%1\")
+
+Use backslash \"\\\" to quote percent \"%\" or any other character
+including backslash itself.
+
+In addition, each argument is passed through
+`shell-quote-argument', so quotes around substitutions should not
+be used.  For compliance with mailcap files shipped e.g. in
+Debian GNU/Linux, single or double quotes around substitutions
+are stripped. It deviates from mailcap specification that
+requires file name to be safe for shell and for the application."
+  (let ((spec (list (cons ?s  (convert-standard-filename file))))
+        (ngroups (min 9 (- (/ (length match-data) 2) 1))))
+    (when (> ngroups 0)
+      (set-match-data match-data)
+      (dolist (i (number-sequence 1 ngroups))
+        (push (cons (+ ?0 i) (match-string-no-properties i link)) spec)))
+    (replace-regexp-in-string
+     (rx (or (and "\\" (or (group anything) string-end))
+             (and (optional (group (any "'\"")))
+                  "%"
+                  (or (group anything) string-end)
+                  (optional (group (backref 2))))))
+     (lambda (fmt)
+       (let* ((backslash (match-string-no-properties 1 fmt))
+              (key (match-string 3 fmt))
+              (value (and key (alist-get (string-to-char key) spec))))
+         (cond
+          (backslash)
+          (value (let ((quot (match-string 2 fmt))
+                       (subst (shell-quote-argument value)))
+                   ;; Remove quotes around the file name - we use
+                   ;; `shell-quote-argument'.
+                   (if (match-string 4 fmt)
+                       subst
+                     (concat quot subst))))
+          (t (error "Invalid format `%s'" fmt)))))
+     mailcap-command nil 'literal)))
+
 ;;;###autoload
 (defun org-open-file (path &optional in-emacs line search)
   "Open the file at PATH.
@@ -8112,27 +8192,8 @@ If the file does not exist, throw an error."
 	       (not org-open-non-existing-files))
       (user-error "No such file: %s" file))
     (cond
-     ((and (stringp cmd) (not (string-match "^\\s-*$" cmd)))
-      ;; Remove quotes around the file name - we'll use shell-quote-argument.
-      (while (string-match "['\"]%s['\"]" cmd)
-	(setq cmd (replace-match "%s" t t cmd)))
-      (setq cmd (replace-regexp-in-string
-		 "%s"
-		 (shell-quote-argument (convert-standard-filename file))
-		 cmd
-		 nil t))
-
-      ;; Replace "%1", "%2" etc. in command with group matches from regex
-      (save-match-data
-	(let ((match-index 1)
-	      (number-of-groups (- (/ (length link-match-data) 2) 1)))
-	  (set-match-data link-match-data)
-	  (while (<= match-index number-of-groups)
-	    (let ((regex (concat "%" (number-to-string match-index)))
-		  (replace-with (match-string match-index dlink)))
-	      (while (string-match regex cmd)
-		(setq cmd (replace-match replace-with t t cmd))))
-	    (setq match-index (+ match-index 1)))))
+     ((org-string-nw-p cmd)
+      (setq cmd (org--open-file-format-command cmd file dlink link-match-data))
 
       (save-window-excursion
 	(message "Running %s...done" cmd)
diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index b14cbeb26..004e89732 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -8421,6 +8421,166 @@ two
      (call-interactively #'org-paste-subtree)
      (buffer-string)))))
 
+(ert-deftest test-org/org--open-file-format-command ()
+  "Test `org--open-file-format-command' helper for `org-open-file'."
+  (let ((system-type 'gnu/linux)) ; Fix behavior of `shell-quote-argument'.
+    ;; No additional groups in `org-file-apps' key.
+    (let ((file "/file.pdf")
+          (pattern "\\.pdf\\'"))
+      (should
+       (equal "simple /file.pdf"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "simple %s" file file (match-data)))))
+      (should
+       (equal "single-quotes /file.pdf"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "single-quotes '%s'" file file (match-data)))))
+      (should
+       (equal "double-quotes /file.pdf"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "double-quotes \"%s\"" file file (match-data)))))
+      (should
+       (equal "quotes 'mismatch \"/file.pdf'"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "quotes 'mismatch \"%s'" file file (match-data)))))
+      (should
+       (equal "no subst"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "no subst" file file (match-data)))))
+      (should
+       (equal "% literal percent 100% %s"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "\\% literal percent 100\\% \\%s" file file (match-data)))))
+      (should
+       (equal "escape \"/file.pdf\" \\ more"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    ;; Second quote is not escaped.
+                    "escape \\\"%s\" \\\\ more" file file (match-data)))))
+      (should
+       (equal "/file.pdf file at start"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "%s file at start" file file (match-data)))))
+      (should
+       (equal "backslash-newline\n/file.pdf"
+              (and (string-match pattern file)
+                   (org--open-file-format-command
+                    "backslash-newline\\\n%s" file file (match-data))))))
+    ;; Anchors within target file.
+    (let ((file "/page-search.pdf")
+          (link "/page-search.pdf::10::some words")
+          (pattern "\\.pdf::\\([0-9]+\\)::\\(.*\\)\\'"))
+      (should
+       (equal "zathura --page 10 --find some\\ words /page-search.pdf"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "zathura --page '%1' --find %2 \"%s\"" file link (match-data)))))
+      ;; Unused %2.
+      (should
+       (equal "firefox file:///page-search.pdf\\#page=10"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "firefox file://%s\\\\#page=%1" file link (match-data)))))
+      (should
+       (equal "adjucent-subst /page-search.pdfsome\\ words10some\\ words"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "adjucent-subst %s%2'%1'%2" file link (match-data))))))
+    ;; No more than 9 substitutions are supported.
+    (let ((file "/many.pdf")
+          (link "/many.pdf::one:2:3:4:5:6:7:8:9:a:b:c")
+          (pattern (concat "\\.pdf:"
+                           (mapconcat (lambda (_) ":\\([^:]+\\)")
+                                      (number-sequence 1 12)
+                                      "")
+                           "\\'")))
+      (should
+       (equal "overflow /many.pdf::one:2:3:4:5:6:7:8:9:one0:one1:one2"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "overflow %s::%1:%2:%3:%4:%5:%6:%7:%8:%9:%10:%11:%12"
+                    file link (match-data))))))
+    ;; Percent character in link fields does not cause any problem.
+    (let ((file "/file-%2.pdf")
+          (link "/file-%2.pdf::anchor-%3::search %1")
+          (pattern "\\.pdf::\\([^:]+\\)::\\(.+\\)\\'"))
+      (should
+       (equal "percents --find search\\ \\%1 file:///file-\\%2.pdf\\#anchor-\\%3"
+              (and (string-match pattern link)
+                   (org--open-file-format-command
+                    "percents --find %2 file://%s\\\\#%1"
+                    file link (match-data))))))
+    ;; Errors.
+    (let ((file "/error.pdf")
+          (pattern "\\.pdf\\'"))
+      (let* ((err (should-error
+                   (and (string-match pattern file)
+                        (org--open-file-format-command
+                         "trailing-percent %s %" file file (match-data)))
+                   :type 'error))
+             (err-text (cadr err)))
+        (should-not (unless (and (stringp err-text)
+                                 (string-match-p "\\`Invalid format .*%" err-text))
+                      err)))
+      (let* ((err (should-error
+                   (and (string-match pattern file)
+                        (org--open-file-format-command
+                         "trailing-backslash %s \\" file file (match-data)))
+                   :type 'error))
+             (err-text (cadr err)))
+        (should-not (unless (and (stringp err-text)
+                                 (string-match-p "\\`Invalid format .*\\\\" err-text))
+                      err)))
+      (let* ((err (should-error
+                   (and (string-match pattern file)
+                        (org--open-file-format-command
+                         "percent-newline %\n%s" file file (match-data)))
+                   :type 'error))
+             (err-text (cadr err)))
+        (should-not (unless (and (stringp err-text)
+                                 (string-match-p "\\`Invalid format .*%\n" err-text))
+                      err)))
+      ;; Mailcap escape for "%" is "\%", not "%%".
+      (let* ((err (should-error
+                   (and (string-match pattern file)
+                        (org--open-file-format-command
+                         "percent-percent %s%%" file file (match-data)))
+                   :type 'error))
+             (err-text (cadr err)))
+        (should-not (unless (and (stringp err-text)
+                                 (string-match-p "\\`Invalid format .*%%" err-text))
+                      err)))
+      ;; Mailcap allows "%t" for MIME type, but Org has no such information.
+      (let* ((err (should-error
+                   (and (string-match pattern file)
+                        (org--open-file-format-command
+                         "percent-t-unsupported --type '%t' %s" file file (match-data)))
+                   :type 'error))
+             (err-text (cadr err)))
+        (should-not (unless (and (stringp err-text)
+                                 (string-match-p "\\`Invalid format .*%t" err-text))
+                      err))))
+    ;; Optional regular expression groups have no point in `org-file-apps' patterns.
+    (let* ((file "/error.pdf")
+           (link "/error.pdf::1")
+           (pattern "\\.pdf::\\([^:]+\\)\\(?:::\\(.+\\)\\)?\\'")
+           (err (should-error
+                 (and (string-match pattern link)
+                      (org--open-file-format-command
+                       "no-such-match --search %2 %s" file link (match-data)))
+                 :type 'error))
+           (err-text (cadr err)))
+      (should-not (unless (and (stringp err-text)
+                               (string-match-p "\\`Invalid format.*%2" err-text))
+                    err)))))
+
 (provide 'test-org)
 
 ;;; test-org.el ends here
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] org.el: Fix percent substitutions in `org-open-file'
  2022-09-04 12:16                 ` [PATCH v2] " Max Nikulin
@ 2022-09-05  5:46                   ` Ihor Radchenko
  0 siblings, 0 replies; 13+ messages in thread
From: Ihor Radchenko @ 2022-09-05  5:46 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> I have removed most parts of the example. In addition I fixed some 
> issues with references in the docstring (man mailcap is not recognized 
> as a link for some reason yet). I changed the function to consider 
> trailing backslash as a format error.

Thanks!
Applied onto main with minor amendments to punctuation in the docstrings.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=ac2d0a249e5419fb52bc1e953e70c847a31d40de

> I mean another problem. A user adds (require 'some-package) that defines 
> new link type. Later the user invokes easy customization interface to 
> adjust a link unrelated to some-package. At this step
> links from some-package are added to user's init file. Ideally it should 
> not happen and after removing of (require 'some-package) there should be 
> no link types from the package in the init file at all, including 
> customization section.
>
>  From my point of view entries added to `org-link-parameters', 
> `org-file-apps', etc. by packages should not affect entries managed 
> through customization.

Well. The situation with third-party packages overriding/changing
defcustoms is not uncommon in Emacs. For example, display-buffer-alist
is altered by shackle.

I think that the usual approach to mitigate this problem (when it is a
problem) is adding an auxiliary variable supplementing defcustom. Just
like `org-entities' and `org-entities-user'. Note that we actually do
have such auxiliary variables for `org-file-apps':
`org-file-apps-macos', `org-file-apps-gnu', and
`org-file-apps-windowsnt'. They are currently declared as constants, but
we may change it - there is no strong reason to disallow altering those
OS-specific variables.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-09-05  5:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-20 16:08 greedy substitution in org-open-file Maxim Nikulin
2021-02-12  7:16 ` Kyle Meyer
2021-02-12 16:46   ` Maxim Nikulin
2021-02-13  4:38     ` Kyle Meyer
2021-02-15 17:04       ` Maxim Nikulin
2021-03-03 12:47       ` Maxim Nikulin
2021-03-21 12:36       ` Maxim Nikulin
2022-08-27 17:20         ` [PATCH] org.el: Fix percent substitutions in `org-open-file' Max Nikulin
2022-09-02 12:08           ` Ihor Radchenko
2022-09-02 15:41             ` Max Nikulin
2022-09-03  8:26               ` Ihor Radchenko
2022-09-04 12:16                 ` [PATCH v2] " Max Nikulin
2022-09-05  5:46                   ` Ihor Radchenko

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).