From: Max Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: Org mode links: Open a PDF file at a given page and highlight a given string
Date: Sat, 3 Sep 2022 20:00:47 +0700
Message-ID: <tevj61$17d8$1@ciao.gmane.io>
In-Reply-To: <87lfb5pbej.fsf@gmail.com>

On 03/03/2021 03:07, Rodrigo Morales wrote:
> 
> #+begin_src emacs-lisp :results silent
> (setq org-file-apps
>        '(("\\.pdf::\\([0-9]+\\)::\\([^:]+\\)\\'" . "zathura -P %1 -f %2 %s")))
> #+end_src

I am replying to a message from another thread that discusses a patch, 
but I think the following considerations fit better in this thread, 
which is about a combined specifier for a location within a PDF 
document. It is more convenient for me to test ideas using okular; I 
hope that adapting the code for zathura or another PDF viewer is 
trivial.

Ihor Radchenko. Re: [PATCH] org.el: Fix percent substitutions in 
`org-open-file' Fri, 02 Sep 2022 20:08:17 +0800. 
https://list.orgmode.org/87tu5qm11q.fsf@localhost
>> +    ;; Page and search string,
>> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf::34::order of redirections>.
>> +    (\"\\\\.pdf::\\\\([0-9]+\\\\)::\\\\(.+\\\\)\\\\\\='\"
>> +        . \"okular --page %1 --find %2 %s\")
>> +    ;; Internal anchor and search string,
>> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf::Redirections::allocate a file>.
>> +    (\"\\\\.pdf::\\\\(.+\\\\)::\\\\(.+\\\\)\\\\\\='\"
>> +        . \"okular --find %2 file://%s\\\\\\\\#%1\")
>> +    ;; Page number, e.g. <file:///usr/share/doc/bash/bashref.pdf::34>.
>> +    (\"\\\\.pdf::\\\\([0-9]+\\\\)\\\\\\='\" . \"okular --page %1 %s\")
>> +    ;; Internal reference, e.g. <file:///usr/share/doc/bash/bashref.pdf::Redirections>.
>> +    (\"\\\\.pdf::\\\\(.+\\\\)\\\\\\='\" . \"okular file://%s\\\\\\\\#%1\")
>> +    ;; No location within the file, optionally followed by \"::\",
>> +    ;; e.g. <file:///usr/share/doc/bash/bashref.pdf>.
>> +    (\"\\\\.pdf\\\\(?:::\\\\)?\\\\\\='\" . \"okular %s\")
> 
> This is a nice set of examples, but it probably does not belong to this
> docstring. I'd rather see this in `org-file-apps' docstring or even in
> the manual.

It is part of a docstring, so the number of backslashes is doubled.

First of all, I overlooked the possibility of distinguishing a text 
search "file:text.pdf::pattern" from a cross reference target within the 
document "file:text.pdf::#anchor". Secondly, I forgot that PDF viewers 
may support compressed files.

Currently I believe that instead of injecting up to 6 entries into 
`org-file-apps' for various combinations of page, anchor, and search 
pattern, it is better to add a single record with a function handler. 
Notice that such a handler is not affected by the bug with multiple 
regexp groups. An additional advantage is that no shell is involved, so 
peculiar file names cannot cause execution of arbitrary code when 
quoting and escaping are messed up.
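
A minimal sketch of the difference, for illustration only. The names 
`my-demo-shell-command', `my-demo-argv', and `my-demo-file' are 
hypothetical and not part of Org. The first helper builds a command 
string for a shell, where every argument has to be quoted; the second 
builds an argument list that can be passed to `call-process' verbatim.

```emacs-lisp
;; Hypothetical helpers, for illustration only.
(defun my-demo-shell-command (viewer file)
  "Return a shell command string opening FILE in VIEWER.
FILE must be quoted, otherwise the shell interprets special
characters in it."
  (concat viewer " " (shell-quote-argument file)))

(defun my-demo-argv (file)
  "Return a list of arguments for opening FILE without a shell.
Elements are passed to the program as is, so no quoting is needed."
  (list "--" file))

;; A made-up file name that is dangerous when pasted unquoted
;; into a shell command.
(defvar my-demo-file "/tmp/report; touch pwned.pdf")
```

With the argument list there is nothing to forget: the semicolon and 
the spaces stay part of a single file name argument.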

I think a set of functions for popular PDF viewers (evince, zathura, 
okular, xpdf, xpopple, firefox, chromium & Co., etc.) should be defined 
in some package, but I doubt whether it is suitable for Org core.

Proof of concept implementation.

Configuration:

;; Each element of `org-file-apps' is a single (REGEXP . HANDLER) cons.
(add-to-list
  'org-file-apps
  `("\\.pdf\\(?:\\.gz\\|\\.bz2\\|\\.xz\\)?\\(?:::.*\\)?\\'"
    . ,#'my-open-file-pdf-okular))
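
To see which links such an entry would catch, the regexp can be tried 
against a few link strings. This is a minimal sketch: 
`my-demo-pdf-link-re' and `my-demo-pdf-link-p' are hypothetical names, 
the sample paths are made up, and matching is case-sensitive here just 
to keep the check deterministic (Org's own matching may differ).

```emacs-lisp
(defvar my-demo-pdf-link-re
  "\\.pdf\\(?:\\.gz\\|\\.bz2\\|\\.xz\\)?\\(?:::.*\\)?\\'"
  "Same regexp as the key of the `org-file-apps' entry above.")

(defun my-demo-pdf-link-p (link)
  "Return t if LINK matches `my-demo-pdf-link-re', nil otherwise."
  (let ((case-fold-search nil))
    (and (string-match-p my-demo-pdf-link-re link) t)))

;; Try it with, e.g.:
;; (my-demo-pdf-link-p "/tmp/a.pdf.gz::12")
;; (my-demo-pdf-link-p "/tmp/a.doc")
```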

Helper functions:

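(defun my--parse-file-link-search (suffix-re link)
   "Parse PDF file LINK for page number, anchor, and search string.

SUFFIX-RE is a regexp matching the file name suffix, for example
the result of (rx \".pdf\").  Return a list (PAGE ANCHOR SEARCH),
any element of which may be nil, or nil if LINK is not a link to
some PDF file."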
   (let ((case-fold-search t)) ; Handle files having .PDF suffix as well
     (and (string-match
	  (concat suffix-re
		  (rx
		   (optional "::"
			     (or (group (+ digit))
				 (group "#" (+ (not (any ?:))))
				 (optional "#")))
		   (optional "::"
			     (optional (group (+ anything))))
		   string-end))
	  link)
	 (mapcar (lambda (i) (match-string i link)) '(1 2 3)))))

(defun my-launch-viewer (command arguments)
   "Launch external application COMMAND with ARGUMENTS.

No intermediate shell is involved, so there is no need to escape
arguments that might otherwise be treated as shell special
characters and run arbitrary commands.  The viewer process is
launched in a fire-and-forget manner like `browse-url-xdg-open',
so the application may keep running even after Emacs quits."
   (apply #'call-process command nil 0 nil arguments))

(defun my-open-file-pdf-okular (file link)
   "PDF files handler for usage as a command in `org-file-apps' alist.

Customize `org-file-apps' to add the following entry:

 \\='(\"\\\\.pdf\\\\(?:\\\\.gz\\\\|\\\\.bz2\\\\|\\\\.xz\\\\)?\\\\(?:::.*\\\\)?\\\\\\='\"
       . my-open-file-pdf-okular)

Open FILE at the location specified by LINK (page, internal
reference, search string).  Supported link search options (side
note: in the particular case of bash manual
<info:bash#Redirections> link may be used instead):

- Page number <file:///usr/share/doc/bash/bashref.pdf::34>.
- Page number and search text
   <file:///usr/share/doc/bash/bashref.pdf::34::order of redirections>.
- Cross reference anchor
   <file:///usr/share/doc/bash/bashref.pdf::#Redirections>.
- Cross reference anchor and search text
   <file:///usr/share/doc/bash/bashref.pdf::#Redirections::allocate a file>.
- Search text <file:///usr/share/doc/bash/bashref.pdf::allocate a file>.

Optionally the FILE may be compressed by gzip, bzip2, or xz."
   (pcase-let* ((pdf-re (rx ".pdf"
			   ;; .Z and .zip are not supported by okular
			   (optional (or ".gz" ".bz2" ".xz"))))
	       (`(,page ,ref ,find)
		(or (my--parse-file-link-search pdf-re link)
		    (error "Not a PDF file link: %S" link)))
	       (args (list "--"
		    (if (org-string-nw-p ref)
			(concat file ref)
		      file))))
       ;; The "--" argument protects against file names starting with
       ;; a dash that might be treated as options.  It is not strictly
       ;; necessary since `org-open-file' passes an absolute file name.
       (when find
	(push find args)
	(push "--find" args))
       (when page
	(push page args)
	(push "--page" args))
       (my-launch-viewer "okular" args)))
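
Adapting the handler to another viewer mostly means building a 
different argument list. Below is a minimal sketch for zathura. It 
assumes the -P (page) and -f (search string) options from the 
`org-file-apps' entry quoted at the top of this message, and that 
zathura accepts "--" as an end-of-options marker. 
`my--zathura-arguments' is a hypothetical helper. Cross reference 
anchors are simply dropped here, since no zathura option for them 
appears in this thread.

```emacs-lisp
(defun my--zathura-arguments (file page find)
  "Return a list of zathura command line arguments for FILE.
PAGE and FIND are strings or nil, like the elements returned by
`my--parse-file-link-search'."
  (append (and page (list "-P" page))   ; assumed: -P selects the page
          (and find (list "-f" find))   ; assumed: -f searches for text
          (list "--" file)))            ; assumed: "--" ends options
```

The result may be passed to the launcher defined above, e.g. 
(my-launch-viewer "zathura" (my--zathura-arguments file page find)).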


And some tests

(ert-deftest test-my/parse-file-link-search ()
   (let ((pdf-re (rx ".pdf"
		    ;; .Z and .zip are not supported by okular
		    (optional (or ".gz" ".bz2" ".xz")))))
     (should-not (my--parse-file-link-search pdf-re "/no-match.doc"))
     (should-not (my--parse-file-link-search pdf-re "/no-match.doc::#ref"))
     (should (equal
	     '(nil nil nil)
	     (my--parse-file-link-search pdf-re "/just-file.pdf")))
     (should (equal
	     '(nil nil nil)
	     (my--parse-file-link-search pdf-re "/just-file-upper-case.PDF")))
     (should (equal
	     '("21" nil nil)
	     (my--parse-file-link-search pdf-re "/page.pdf::21")))
     (should (equal
	     '(nil "#ref" nil)
	     (my--parse-file-link-search pdf-re "/anchor.pdf::#ref")))
     (should (equal
	     '(nil nil "some text")
	     (my--parse-file-link-search pdf-re "/search-string.pdf::some text")))
     (should (equal
	     '(nil nil "in gzipped file")
	     (my--parse-file-link-search
	      pdf-re
	      "/compressed-search-string.pdf.GZ::in gzipped file")))
     (should (equal
	     '("32" nil "page text")
	     (my--parse-file-link-search
	      pdf-re
	      "/page-search.pdf::32::page text")))
     (should (equal
	     '(nil "#ref" "anchor text")
	     (my--parse-file-link-search
	      pdf-re
	      "/anchor-search.pdf::#ref::anchor text")))
     (should (equal
	     '(nil nil "::")
	     (my--parse-file-link-search pdf-re "/search-quad.pdf::::::")))
     (should (equal
	     '(nil nil nil)
	     (my--parse-file-link-search pdf-re "/nothing-1.pdf::")))
     (should (equal
	     '(nil nil nil)
	     (my--parse-file-link-search pdf-re "/nothing-2.PDF::::")))
     (should (equal
	     '(nil nil nil)
	     (my--parse-file-link-search pdf-re "/empty-anchor-1.pdf::#")))
     (should (equal
	     '(nil nil nil)
	     (my--parse-file-link-search pdf-re "/empty-anchor-2.pdf::#::")))
     (should (equal
	     '(nil nil "empty anchor text")
	     (my--parse-file-link-search
	      pdf-re
	      "/empty-anchor-1.pdf::#::empty anchor text")))))




