emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
@ 2021-03-13  2:56 Ihor Radchenko
  2021-03-13  3:23 ` Kyle Meyer
  2021-03-24 12:51 ` Nicolas Goaziou
  0 siblings, 2 replies; 16+ messages in thread
From: Ihor Radchenko @ 2021-03-13  2:56 UTC (permalink / raw)
  To: emacs-orgmode


I have a https link like "https://doi.org/10.1016/0160-791x(79)90023-x".
If I put this link into an org file (using emacs -Q or Org mode master),
only "https://doi.org/10.1016/0160-791x(79)" part of the link is treated
as a link. I guess, it should not happen.

Best,
Ihor



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-13  2:56 Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)] Ihor Radchenko
@ 2021-03-13  3:23 ` Kyle Meyer
  2021-03-13  5:21   ` [PATCH] " Ihor Radchenko
  2021-03-24 12:51 ` Nicolas Goaziou
  1 sibling, 1 reply; 16+ messages in thread
From: Kyle Meyer @ 2021-03-13  3:23 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

Ihor Radchenko writes:

> I have a https link like "https://doi.org/10.1016/0160-791x(79)90023-x".
> If I put this link into an org file (using emacs -Q or Org mode master),
> only "https://doi.org/10.1016/0160-791x(79)" part of the link is treated
> as a link. I guess, it should not happen.

This case was discussed at
<https://orgmode.org/list/5faf0bd7-b114-9723-773e-7f3da16604a0@grinta.net/T/#u>.
There's mention of an improved regexp that might be worth trying.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-13  3:23 ` Kyle Meyer
@ 2021-03-13  5:21   ` Ihor Radchenko
  2021-03-13  5:24     ` Ihor Radchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2021-03-13  5:21 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 376 bytes --]

Kyle Meyer <kyle@kyleam.com> writes:

> This case was discussed at
> <https://orgmode.org/list/5faf0bd7-b114-9723-773e-7f3da16604a0@grinta.net/T/#u>.
> There's mention of an improved regexp that might be worth trying.

The improved regexp appears to work for me. I tried my best to transform
the regexp provided in the link to rx notation. The patch is attached.

Best,
Ihor


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Improve-org-link-plain-re.patch --]
[-- Type: text/x-diff, Size: 2368 bytes --]

From 79eaa3b5b71f2aba02d4f17b860f67a6a77255fc Mon Sep 17 00:00:00 2001
From: Ihor Radchenko <yantar92@gmail.com>
Date: Sat, 13 Mar 2021 13:16:06 +0800
Subject: [PATCH] Improve org-link-plain-re

* lisp/ol.el (org-link-make-regexps): Allow URLs with up to two levels
of nested brackets.  Now, URLs like [1] can be matched.  The new
regexp is taken from [2].

[1] https://doi.org/10.1016/0160-791x(79)90023-x
[2] https://daringfireball.net/2010/07/improved_regex_for_matching_urls
---
 lisp/ol.el | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/lisp/ol.el b/lisp/ol.el
index 8b9755b51..0e166de38 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -829,15 +829,27 @@ This should be called after the variable `org-link-parameters' has changed."
 	  (format "<%s:\\([^>\n]*\\(?:\n[ \t]*[^> \t\n][^>\n]*\\)*\\)>"
 		  types-re)
 	  org-link-plain-re
-	  (concat
-	   "\\<" types-re ":"
-	   "\\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)")
-	  ;;	 "\\([^]\t\n\r<>() ]+[^]\t\n\r<>,.;() ]\\)")
-	  org-link-bracket-re
-	  (rx (seq "[["
-		   ;; URI part: match group 1.
-		   (group
-		    (one-or-more
+	  (let ((non-space-bracket "[^][ \t\n()<>]+"))
+            ;; Heiristics for an URL link.  Source:
+            ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
+            (rx-to-string
+             `(seq (regexp "\\<")
+                   (regexp ,types-re)
+                   ":"
+                   (1+ (or (regex ,non-space-bracket)
+                           (seq "("
+                                (* (or (regex ,non-space-bracket)
+                                       (seq "("
+                                            (regex ,non-space-bracket)
+                                            ")")))
+                                ")")))
+                   (or (seq "("
+                            (* (or (regex ,non-space-bracket)
+                                   (seq "("
+                                        (regex ,non-space-bracket)
+                                        ")")))
+                            ")")
+                       (regexp "\\([^[:punct:] \t\n]\\|/\\)")))))
                      (or (not (any "[]\\"))
 			 (and "\\" (zero-or-more "\\\\") (any "[]"))
 			 (and (one-or-more "\\") (not (any "[]"))))))
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-13  5:21   ` [PATCH] " Ihor Radchenko
@ 2021-03-13  5:24     ` Ihor Radchenko
  2021-03-15 11:54       ` Maxim Nikulin
  0 siblings, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2021-03-13  5:24 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 68 bytes --]

Sorry, attached wrong version of the patch.

Here is the right one


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Improve-org-link-plain-re.patch --]
[-- Type: text/x-diff, Size: 3297 bytes --]

From 0acb961d916d4bfde505c9f5eb16d1e8851b6c8f Mon Sep 17 00:00:00 2001
From: Ihor Radchenko <yantar92@gmail.com>
Date: Sat, 13 Mar 2021 13:16:06 +0800
Subject: [PATCH] Improve org-link-plain-re

* lisp/ol.el (org-link-make-regexps): Allow URLs with up to two levels
of nested brackets.  Now, URLs like [1] can be matched.  The new
regexp is taken from [2].

[1] https://doi.org/10.1016/0160-791x(79)90023-x
[2] https://daringfireball.net/2010/07/improved_regex_for_matching_urls
---
 lisp/ol.el | 55 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 19 deletions(-)

diff --git a/lisp/ol.el b/lisp/ol.el
index 8b9755b51..7ce7a4798 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -829,26 +829,43 @@ This should be called after the variable `org-link-parameters' has changed."
 	  (format "<%s:\\([^>\n]*\\(?:\n[ \t]*[^> \t\n][^>\n]*\\)*\\)>"
 		  types-re)
 	  org-link-plain-re
-	  (concat
-	   "\\<" types-re ":"
-	   "\\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)")
-	  ;;	 "\\([^]\t\n\r<>() ]+[^]\t\n\r<>,.;() ]\\)")
-	  org-link-bracket-re
-	  (rx (seq "[["
-		   ;; URI part: match group 1.
-		   (group
-		    (one-or-more
+	  (let ((non-space-bracket "[^][ \t\n()<>]+"))
+            ;; Heiristics for an URL link.  Source:
+            ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
+            (rx-to-string
+             `(seq (regexp "\\<")
+                   (regexp ,types-re)
+                   ":"
+                   (1+ (or (regex ,non-space-bracket)
+                           (seq "("
+                                (* (or (regex ,non-space-bracket)
+                                       (seq "("
+                                            (regex ,non-space-bracket)
+                                            ")")))
+                                ")")))
+                   (or (seq "("
+                            (* (or (regex ,non-space-bracket)
+                                   (seq "("
+                                        (regex ,non-space-bracket)
+                                        ")")))
+                            ")")
+                       (regexp "\\([^[:punct:] \t\n]\\|/\\)")))))
+          org-link-bracket-re
+          (rx (seq "[["
+	           ;; URI part: match group 1.
+	           (group
+	            (one-or-more
                      (or (not (any "[]\\"))
-			 (and "\\" (zero-or-more "\\\\") (any "[]"))
-			 (and (one-or-more "\\") (not (any "[]"))))))
-		   "]"
-		   ;; Description (optional): match group 2.
-		   (opt "[" (group (+? anything)) "]")
-		   "]"))
-	  org-link-any-re
-	  (concat "\\(" org-link-bracket-re "\\)\\|\\("
-		  org-link-angle-re "\\)\\|\\("
-		  org-link-plain-re "\\)"))))
+		         (and "\\" (zero-or-more "\\\\") (any "[]"))
+		         (and (one-or-more "\\") (not (any "[]"))))))
+	           "]"
+	           ;; Description (optional): match group 2.
+	           (opt "[" (group (+? anything)) "]")
+	           "]"))
+          org-link-any-re
+          (concat "\\(" org-link-bracket-re "\\)\\|\\("
+	          org-link-angle-re "\\)\\|\\("
+	          org-link-plain-re "\\)"))))
 
 (defun org-link-complete-file (&optional arg)
   "Create a file link using completion."
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-13  5:24     ` Ihor Radchenko
@ 2021-03-15 11:54       ` Maxim Nikulin
  2021-03-16 12:35         ` Ihor Radchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Maxim Nikulin @ 2021-03-15 11:54 UTC (permalink / raw)
  To: emacs-orgmode

On 13/03/2021 12:24, Ihor Radchenko wrote:
> Here is the right one

I think, some unit tests would be great to avoid future regressions. 
Since it is heuristics, some cases will be always broken, so tests could 
clarify intentions.

I have not tested the regexp on real cases, so you have full rights to 
be skeptical concerning my comments. I prefer to have explicit URLs in 
my files.

> +	  (let ((non-space-bracket "[^][ \t\n()<>]+"))

I think, "+" is redundant here, it does not allow empty inner 
parenthesis and leads to nested construction "(x+)+" that could be 
simplified to "x+"

> +            ;; Heiristics for an URL link.  Source:

Typo: Heuristics

> +            ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
> +            (rx-to-string
> +             `(seq (regexp "\\<")

Isn't it an equivalent to "word-start"?

> +                   (regexp ,types-re)
> +                   ":"
> +                   (1+ (or (regex ,non-space-bracket)
> +                           (seq "("
> +                                (* (or (regex ,non-space-bracket)
> +                                       (seq "("
> +                                            (regex ,non-space-bracket)

Maybe "0+" to allow "(())"?

> +                                            ")")))
> +                                ")")))
> +                   (or (seq "("
> +                            (* (or (regex ,non-space-bracket)
> +                                   (seq "("
> +                                        (regex ,non-space-bracket)
> +                                        ")")))
> +                            ")")
> +                       (regexp "\\([^[:punct:] \t\n]\\|/\\)")))))

Is the group useful for any purpose? The construction is already inside 
an "or".

I have tried to compare current, your, and a little modified your 
regexps on several synthetic examples:

#+begin_src elisp
   (let*
       ((types-re (regexp-opt (org-link-types) t))
        (org-link-plain-re-ir
	(let ((non-space-bracket "[^][ \t\n()<>]+"))
	  ;; Heiristics for an URL link.  Source:
	  ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
	  (rx-to-string
	   `(seq (regexp "\\<")
		 (regexp ,types-re)
		 ":"
		 (1+ (or (regex ,non-space-bracket)
			 (seq "("
			      (* (or (regex ,non-space-bracket)
				     (seq "("
					  (regex ,non-space-bracket)
					  ")")))
			      ")")))
		 (or (seq "("
			  (* (or (regex ,non-space-bracket)
				 (seq "("
				      (regex ,non-space-bracket)
				      ")")))
			  ")")
		     (regexp "\\([^[:punct:] \t\n]\\|/\\)"))))))
        (org-link-plain-re-nm
	(let* ((non-space-bracket "[^][ \t\n()<>]")
	       (parenthesis
		`(seq "("
		      (0+ (or (regex ,non-space-bracket)
			      (seq "("
				   (0+ (regex ,non-space-bracket))
				   ")")))
		      ")")))
	  ;; Heuristics for an URL link inspired by
	  ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
	  (rx-to-string
	   `(seq word-start
		 (regexp ,types-re)
		 ":"
		 (1+ (or (regex ,non-space-bracket)
			 ,parenthesis))
		 (or (regexp "[^[:punct:] \t\n]")
                      ?/
		     ,parenthesis))))))
        (cons
	(list "src" "orig" "ir" "nm")
	(mapcar (lambda (s)
		  (cons
		   s
		   (mapcar (lambda (r)
			     (if (string-match r s) (match-string 0 s) ""))
			   (list org-link-plain-re org-link-plain-re-ir org-link-plain-re-nm))))
		(list
		 "The file:a link"
		 "The file:aa link"
		 "The file:a(b)c link"
		 "The file:a() link"
		 "The file:aa((a)) link"
		 "The file:aa(()) link"
		 "The file:///a link"
		 "The file:///a/, link"
		 "The http:// link"
		 "The (some file:ab) link"
		 "The file:aa) link"
		 "The file:aa( link"
))))
#+end_src

#+RESULTS:
| src                     | orig      | ir           | nm           |
| The file:a link         |           |              |              |
| The file:aa link        | file:aa   | file:aa      | file:aa      |
| The file:a(b)c link     | file:a(b) | file:a(b)c   | file:a(b)c   |
| The file:a() link       |           | file:a()     | file:a()     |
| The file:aa((a)) link   | file:aa   | file:aa((a)) | file:aa((a)) |
| The file:aa(()) link    | file:aa   | file:aa      | file:aa(())  |
| The file:/a link        | file:/a   | file:/a      | file:/a      |
| The file:/a/, link      | file:/a/  | file:/a/     | file:/a/     |
| The http:// link        | http://   | http://      | http://      |
| The (some file:ab) link | file:ab   | file:ab      | file:ab      |
| The file:aa) link       | file:aa   | file:aa      | file:aa      |
| The file:aa( link       | file:aa   | file:aa      | file:aa      |



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-15 11:54       ` Maxim Nikulin
@ 2021-03-16 12:35         ` Ihor Radchenko
  2021-03-17 14:59           ` Maxim Nikulin
  2021-03-19 12:10           ` Maxim Nikulin
  0 siblings, 2 replies; 16+ messages in thread
From: Ihor Radchenko @ 2021-03-16 12:35 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 690 bytes --]

Maxim Nikulin <manikulin@gmail.com> writes:

> I have tried to compare current, your, and a little modified your 
> regexps on several synthetic examples:

Thanks! Your version is indeed cleaner. I updated the patch. It uses
your version now with one group specification that I missed earlier. I
also added tests from your email, the post introducing regexp, and few
more examples I came up with.

I am testing the new regexp for a few days now. Because the regexp is
quite complex and because font-lock apparently fontifies even invisible
(folded) text, loading time on large org files with many links became
noticeably longer. Though it was 7.2Mb file with ~13k links in it.

Best,
Ihor


[-- Attachment #2: 0001-Improve-org-link-plain-re.patch --]
[-- Type: text/x-diff, Size: 9281 bytes --]

From 6eb2208a67745e3150024e7b72509115b97fcfa3 Mon Sep 17 00:00:00 2001
From: Ihor Radchenko <yantar92@gmail.com>
Date: Tue, 16 Mar 2021 20:20:32 +0800
Subject: [PATCH] Improve org-link-plain-re

(org-link-plain-re): Update docstring.  Now, the docstring explicitly
mentions that the regexp must contain groups for the link type and the
path.

* lisp/ol.el (org-link-make-regexps): Allow URLs with up to two levels
of nested brackets.  Now, URLs like [1] can be matched.  The new
regexp is based on [2].

[1] https://doi.org/10.1016/0160-791x(79)90023-x
[2] https://daringfireball.net/2010/07/improved_regex_for_matching_urls

* testing/lisp/test-ol.el: Add tests for plain links.
---
 lisp/ol.el              |  61 +++++++++++++------
 testing/lisp/test-ol.el | 132 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 173 insertions(+), 20 deletions(-)

diff --git a/lisp/ol.el b/lisp/ol.el
index b8bd7d234..d3a07a3ed 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -519,7 +519,10 @@ links more efficient."
   "Matches link with angular brackets, spaces are allowed.")
 
 (defvar org-link-plain-re nil
-  "Matches plain link, without spaces.")
+  "Matches plain link, without spaces.
+Group 1 must contain the link type (i.e. https).
+Group 2 must contain the link path (i.e. //example.com).
+Used by `org-element-link-parser'.")
 
 (defvar org-link-bracket-re nil
   "Matches a link in double brackets.")
@@ -807,26 +810,44 @@ This should be called after the variable `org-link-parameters' has changed."
 	  (format "<%s:\\([^>\n]*\\(?:\n[ \t]*[^> \t\n][^>\n]*\\)*\\)>"
 		  types-re)
 	  org-link-plain-re
-	  (concat
-	   "\\<" types-re ":"
-	   "\\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)")
-	  ;;	 "\\([^]\t\n\r<>() ]+[^]\t\n\r<>,.;() ]\\)")
-	  org-link-bracket-re
-	  (rx (seq "[["
-		   ;; URI part: match group 1.
-		   (group
-		    (one-or-more
+          (let* ((non-space-bracket "[^][ \t\n()<>]")
+	         (parenthesis
+		  `(seq "("
+		        (0+ (or (regex ,non-space-bracket)
+			        (seq "("
+				     (0+ (regex ,non-space-bracket))
+				     ")")))
+		        ")")))
+	    ;; Heuristics for an URL link inspired by
+	    ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
+	    (rx-to-string
+	     `(seq word-start
+                   ;; Link type: match group 1.
+		   (regexp ,types-re)
+		   ":"
+                   ;; Link path: match group 2.
+                   (group
+		    (1+ (or (regex ,non-space-bracket)
+			    ,parenthesis))
+		    (or (regexp "[^[:punct:] \t\n]")
+		        ?/
+		        ,parenthesis)))))
+          org-link-bracket-re
+          (rx (seq "[["
+	           ;; URI part: match group 1.
+	           (group
+	            (one-or-more
                      (or (not (any "[]\\"))
-			 (and "\\" (zero-or-more "\\\\") (any "[]"))
-			 (and (one-or-more "\\") (not (any "[]"))))))
-		   "]"
-		   ;; Description (optional): match group 2.
-		   (opt "[" (group (+? anything)) "]")
-		   "]"))
-	  org-link-any-re
-	  (concat "\\(" org-link-bracket-re "\\)\\|\\("
-		  org-link-angle-re "\\)\\|\\("
-		  org-link-plain-re "\\)"))))
+		         (and "\\" (zero-or-more "\\\\") (any "[]"))
+		         (and (one-or-more "\\") (not (any "[]"))))))
+	           "]"
+	           ;; Description (optional): match group 2.
+	           (opt "[" (group (+? anything)) "]")
+	           "]"))
+          org-link-any-re
+          (concat "\\(" org-link-bracket-re "\\)\\|\\("
+	          org-link-angle-re "\\)\\|\\("
+	          org-link-plain-re "\\)"))))
 
 (defun org-link-complete-file (&optional arg)
   "Create a file link using completion."
diff --git a/testing/lisp/test-ol.el b/testing/lisp/test-ol.el
index 5b7dc513b..e6208cd38 100644
--- a/testing/lisp/test-ol.el
+++ b/testing/lisp/test-ol.el
@@ -491,5 +491,137 @@
 	      (org-previous-link))
 	    (buffer-substring (point) (line-end-position))))))
 
+\f
+;;; Link regexps
+
+(ert-deftest test-ol/plain-link-re ()
+  "Test `org-link-plain-re'."
+  (should
+   (equal
+    '("https" "//example.com/qwe()")
+    (org-test-with-temp-text
+        "(Some text in parenthesis followed by link with brackets <point>https://example.com/qwe())"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("https" "//doi.org/10.1016/0160-791x(79)90023-x")
+    (org-test-with-temp-text
+        "<point>https://doi.org/10.1016/0160-791x(79)90023-x"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa")
+    (org-test-with-temp-text
+        "The <point>file:aa link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "a(b)c")
+    (org-test-with-temp-text
+        "The <point>file:a(b)c link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "a()")
+    (org-test-with-temp-text
+        "The <point>file:a() link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa((a))")
+    (org-test-with-temp-text
+        "The <point>file:aa((a)) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa(())")
+    (org-test-with-temp-text
+        "The <point>file:aa(()) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "/a")
+    (org-test-with-temp-text
+        "The <point>file:/a link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "/a/")
+    (org-test-with-temp-text
+        "The <point>file:/a/ link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//")
+    (org-test-with-temp-text
+        "The <point>http:// link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "ab")
+    (org-test-with-temp-text
+        "The (some <point>file:ab) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa")
+    (org-test-with-temp-text
+        "The <point>file:aa) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa")
+    (org-test-with-temp-text
+        "The <point>file:aa( link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/more_(than)_one_(parens)")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/more_(than)_one_(parens) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/blah_(wikipedia)#cite-1")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/blah_(wikipedia)#cite-1 link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/blah_(wikipedia)_blah#cite-1")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/blah_(wikipedia)_blah#cite-1 link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/unicode_(✪)_in_parens")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/unicode_(✪)_in_parens link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/(something)?after=parens")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/(something)?after=parens link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser)))))))
+
 (provide 'test-ol)
 ;;; test-ol.el ends here
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-16 12:35         ` Ihor Radchenko
@ 2021-03-17 14:59           ` Maxim Nikulin
  2021-03-19 15:07             ` Ihor Radchenko
  2021-03-19 12:10           ` Maxim Nikulin
  1 sibling, 1 reply; 16+ messages in thread
From: Maxim Nikulin @ 2021-03-17 14:59 UTC (permalink / raw)
  To: emacs-orgmode

On 16/03/2021 19:35, Ihor Radchenko wrote:
> 
> I am testing the new regexp for a few days now. Because the regexp is
> quite complex and because font-lock apparently fontifies even invisible
> (folded) text, loading time on large org files with many links became
> noticeably longer. Though it was 7.2Mb file with ~13k links in it.

I could not guess how to benchmark font-lock. I have tried to open file 
(to get everything loaded), kill the buffer,

emacsclient --eval '(progn (setq my-start (float-time)) (find-file 
"file.org") (- (float-time) my-start))'

but I see some changes in the buffer after 0.19 is reported (both with 
and without the patch). However I have not converted bracketed links 
into plain ones yet. I was going to try if some tricks could improve 
performance. E.g. I am curious if it will work noticeably faster when no 
nested parenthesis are allowed, but single ones may be at any position, 
not necessary at the end.

Are changes in white spaces below actually modified lines in your patch 
intended?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-16 12:35         ` Ihor Radchenko
  2021-03-17 14:59           ` Maxim Nikulin
@ 2021-03-19 12:10           ` Maxim Nikulin
  2021-03-19 15:40             ` Ihor Radchenko
  1 sibling, 1 reply; 16+ messages in thread
From: Maxim Nikulin @ 2021-03-19 12:10 UTC (permalink / raw)
  To: emacs-orgmode

On 16/03/2021 19:35, Ihor Radchenko wrote:
> +;;; Link regexps
> +
> +(ert-deftest test-ol/plain-link-re ()
> +  "Test `org-link-plain-re'."
> +  (should
> +   (equal
> +    '("https" "//example.com/qwe()")
> +    (org-test-with-temp-text
> +        "(Some text in parenthesis followed by link with brackets <point>https://example.com/qwe())"
> +      (list (org-element-property :type (org-element-link-parser))
> +            (org-element-property :path (org-element-link-parser))))))
> +  (should
> +   (equal
> +    '("https" "//doi.org/10.1016/0160-791x(79)90023-x")
> +    (org-test-with-temp-text
> +        "<point>https://doi.org/10.1016/0160-791x(79)90023-x"
> +      (list (org-element-property :type (org-element-link-parser))
> +            (org-element-property :path (org-element-link-parser))))))

To be clear, I do not ask for any changes. It is great to have some 
tests even in the current form. I just have never tried ert before, so I 
have some questions.

Am I right that just one failure will be reported if a change in the 
regexp causes problems with several test inputs? I am afraid, it is 
rather inconvenient since this tests are rather for quality of 
heuristics than exact requirements. To find proper balance of accuracy 
and speed/regexp complexity, it is better to have all failures at once. 
I am looking up for a feature like EXPECT_EQ (delayed failure) in 
addition to strict ASSERT_EQ in googletest or at least like 
TestCase.subTest in puthon unittest. For this particular case even 
parametrized tests are enough (googletest, pytest).

I have tried implement something similar to illustrate my expectations. 
I think, for experienced lisp programmers the code looks ugly

(defmacro test-ol/deftest-parametrized
     (prefix func &rest cases)
   (declare (indent 2))
   (cons
    #'progn
    (mapcar
     (lambda (case)
       (pcase-let ((`(,id ,docstring ,expect ,arguments) case))
	(let ((test (intern (concat prefix "--" id)))
	      (exp (if (and expect (listp expect)) `'(,@expect) expect))
	      (args (if (listp arguments) arguments (list arguments))))
	  `(ert-deftest ,test ()
	     ,docstring
	     (should (equal ,exp ,(cons func args)))))))
     cases)))

(defun test-ol/match-link-plain (text)
   (and (string-match org-link-plain-re text)
        (mapcar (lambda (i) (match-string-no-properties i text))
	       ;; Currently there is a useless additional group
	       ;; and I think it should cause failure.
	    ;; (number-sequence 1 (- (/ (length (match-data)) 2) 1) 1)
	       '(1 2))))

(test-ol/deftest-parametrized "test-ol/org-link-plain-re"
			      test-ol/match-link-plain
   ("sheme-only" "No match without host or path"
    nil "Secure https: connection")
   ("scheme-slashes" "Questionable case with no host or path"
    ("file" "///") "Use file:/// for local files")
   ("simple-link" "Simple link"
    ("https" "//orgmode.org/") "Link https://orgmode.org/ to Org site")
   ("false-match" "Failure example: unexpected match"
    nil "Valid file:///bin/bash link")
   ("failed-match" "Failure example: host-path is too short to match"
    ("https" "a") "file:a")
   ("a-typo" "Failure example: slashes missed in expectation"
    ("http" "www.gnu.org") "http://www.gnu.org/"))

(ert t)

In my opinion, test report is acceptable

FFF...

F test-ol/org-link-plain-re--a-typo
     Failure example: slashes missed in expectation
     (ert-test-failed
      ((should
        (equal
	'("http" "www.gnu.org")
	(test-ol/match-link-plain "http://www.gnu.org/")))
       :form
       (equal
        ("http" "www.gnu.org")
        ("http" "//www.gnu.org/"))
       :value nil :explanation
       (list-elt 1
		(arrays-of-different-length 11 14 "www.gnu.org" "//www.gnu.org/" 
first-mismatch-at 0))))

F test-ol/org-link-plain-re--failed-match
     Failure example: host-path is too short to match
     (ert-test-failed
      ((should
        (equal
	'("https" "a")
	(test-ol/match-link-plain "file:a")))
       :form
       (equal
        ("https" "a")
        nil)
       :value nil :explanation
       (different-types
        ("https" "a")
        nil)))

F test-ol/org-link-plain-re--false-match
     Failure example: unexpected match
     (ert-test-failed
      ((should
        (equal nil
	      (test-ol/match-link-plain "Valid file:///bin/bash link")))
       :form
       (equal nil
	     ("file" "///bin/bash"))
       :value nil :explanation
       (different-types nil
		       ("file" "///bin/bash"))))

I do not know if it is possible to implement "might" (in addition to 
"should") that using restart or some other technique will prevent 
immediate abandoning of the test but will mark whole test as failed at 
the end.

Actually I hope to get response that I am trying to reinvent a wheel and 
org or emacs has an established way to write tests feeding the same 
function with list of cases and to get all failures in a reasonably 
readable report.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-17 14:59           ` Maxim Nikulin
@ 2021-03-19 15:07             ` Ihor Radchenko
  2021-03-24 12:31               ` Maxim Nikulin
  0 siblings, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2021-03-19 15:07 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> I could not guess how to benchmark font-lock. I have tried to open file 
> (to get everything loaded), kill the buffer,

I usually just use profiler-start/open buffer/profiler-report. However,
there is also https://github.com/Lindydancer/font-lock-profiler for more
fine-grained benchmark. Also, you may instrument org-activate-links with
elp.el (built-in).

> but I see some changes in the buffer after 0.19 is reported (both with 
> and without the patch). However I have not converted bracketed links 
> into plain ones yet. I was going to try if some tricks could improve 
> performance. E.g. I am curious if it will work noticeably faster when no 
> nested parenthesis are allowed, but single ones may be at any position, 
> not necessary at the end.

I am currently looking into somewhat orthogonal approach. Instead of
tweaking individual regexps (there were similar issues with priority
regexp in the past), I am trying to use custom
font-lock-fontify-region-function. Once we avoid fontifying folded text,
startup time drops several times at least. I can see the difference
immediately. It comes at the cost though - the behaviour of some Org API
functions changes. They will not always return fontified text. AFAIK, at
least helm-org and helm-org-ql do reply on such fontification.

> Are changes in white spaces below actually modified lines in your patch 
> intended?

They are generated by aggressive-indent. Since org files mix indentation
styles I keep getting those whitespace changes in my patches. Forgot to
remove this time. Should I?

Best,
Ihor



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-19 12:10           ` Maxim Nikulin
@ 2021-03-19 15:40             ` Ihor Radchenko
  0 siblings, 0 replies; 16+ messages in thread
From: Ihor Radchenko @ 2021-03-19 15:40 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> To be clear, I do not ask for any changes. It is great to have some 
> tests even in the current form. I just have never tried ert before, so I 
> have some questions.

Note that I have no experience with tests since I never did programming
professionally. I wrote the tests in the patch simply by looking at
other examples in the file.

> Am I right that just one failure will be reported if a change in the 
> regexp causes problems with several test inputs? I am afraid, it is 
> rather inconvenient since this tests are rather for quality of 
> heuristics than exact requirements. To find proper balance of accuracy 
> and speed/regexp complexity, it is better to have all failures at once. 
> I am looking up for a feature like EXPECT_EQ (delayed failure) in 
> addition to strict ASSERT_EQ in googletest or at least like 
> TestCase.subTest in puthon unittest. For this particular case even 
> parametrized tests are enough (googletest, pytest).

I cannot find anything like delayed failure in the info node for ERT and
in the source code. There are should, should-not, should-error, and
skip-unless asserts.

> I have tried implement something similar to illustrate my expectations. 
> I think, for experienced lisp programmers the code looks ugly

Using a macro is certainly more efficient than my copy-paste in the
patch :) Though you may probably use more backquoting (see Backquote in
Elisp manual) when writing macros. That would look more readable.

> I do not know if it is possible to implement "might" (in addition to 
> "should") that using restart or some other technique will prevent 
> immediate abandoning of the test but will mark whole test as failed at 
> the end.

It is indeed possible and maybe even welcome in emacs-devel.

> Actually I hope to get response that I am trying to reinvent a wheel and 
> org or emacs has an established way to write tests feeding the same 
> function with list of cases

Well... Example from ERT info page:

     (ert-deftest ert-test-mismatch ()
       (should (eql (cl-mismatch "" "") nil))
       (should (eql (cl-mismatch "" "a") 0))
       (should (eql (cl-mismatch "a" "a") nil))
       (should (eql (cl-mismatch "ab" "a") 1))
       (should (eql (cl-mismatch "Aa" "aA") 0))
       (should (eql (cl-mismatch '(a b c) '(a b d)) 2)))

> ... and to get all failures in a reasonably readable report.

When running ert interactively, things should get a little more
manageable. See Running Tests Interactively node of ERT manual.

Hope it helps.

Best,
Ihor



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-19 15:07             ` Ihor Radchenko
@ 2021-03-24 12:31               ` Maxim Nikulin
  2021-03-24 14:11                 ` Ihor Radchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Maxim Nikulin @ 2021-03-24 12:31 UTC (permalink / raw)
  To: emacs-orgmode

On 19/03/2021 22:07, Ihor Radchenko wrote:
> Maxim Nikulin writes:
> 
>> I could not guess how to benchmark font-lock. I have tried to open file
>> (to get everything loaded), kill the buffer,
> 
> I usually just use profiler-start/open buffer/profiler-report. However,
> there is also https://github.com/Lindydancer/font-lock-profiler for more
> fine-grained benchmark. Also, you may instrument org-activate-links with
> elp.el (built-in).

I suspect I may not notice performance penalty due to some specific 
usage pattern (e.g. I rarely use links in heading titles) or due to 
absence of some customization. I have converted bracketed links to plain 
ones, so I have more than 4000 links in the test file. I expect that 
regexp affects loading of a file. As earlier, org-activate-links entry 
in profiler-report have less than 1%. I have tried elp. My measurements 
are not accurate due to I did not fix CPU regime to performance. I have 
seen times varied quite widely (several times) but often the numbers are 
the same with and without your patch. I am puzzled a bit by number of calls.

(progn
   (require 'elp)
   (setq elp-function-list (list #'org-activate-links))
   (elp-instrument-list nil)
   (dolist (i (number-sequence 1 10))
     (message "iter %d" i)
     (find-file "plain.org")
     (sit-for 3)
     (kill-buffer "plain.org")
     (sit-for 1))
   (elp-results))

Example for #+STARTUP: overview:

org-activate-links  560         0.028971085   5.173...e-05

For content number of calls is 410, without special settings (all) 120,
let me remind that it is for 10 find-file invocations. Another example

org-activate-links  410         0.1384633219  0.0003377154

I see such variations in both cases with and without the patch, but 
these numbers are negligible in my opinion.

I decided to stop experiments since I could not reproduce decrease in 
performance. Thank you for the font-lock-profiler link however.

So I have no reason to be against more complicated regexp.

>> Are changes in white spaces below actually modified lines in your patch
>> intended?
> 
> They are generated by aggressive-indent. Since org files mix indentation
> styles I keep getting those whitespace changes in my patches. Forgot to
> remove this time. Should I?

In my opinion, combining changes related to white spaces and meaningful 
modifications makes commits less clear, especially when reading email. 
However the following recommendation has certainly more weight:

https://orgmode.org/list/87zh2hosex.fsf@bzg.fr/ From: Bastien
> Also, the convention in Emacs is to avoid whitespaces-only commits,
> you need to fix whitespaces within other non-whitespaces changes in
> a commit.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-13  2:56 Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)] Ihor Radchenko
  2021-03-13  3:23 ` Kyle Meyer
@ 2021-03-24 12:51 ` Nicolas Goaziou
  2021-03-24 13:10   ` Ihor Radchenko
  1 sibling, 1 reply; 16+ messages in thread
From: Nicolas Goaziou @ 2021-03-24 12:51 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

Hello,

Ihor Radchenko <yantar92@gmail.com> writes:

> I have a https link like "https://doi.org/10.1016/0160-791x(79)90023-x".
> If I put this link into an org file (using emacs -Q or Org mode master),
> only "https://doi.org/10.1016/0160-791x(79)" part of the link is treated
> as a link. I guess, it should not happen.

Actually, this is (was?) intentional. By forbidding parenthesis in
a plain URL, you allow one to type, e.g., (https://orgmode.org), which
is, IMO, a more frequent need than having to deal with parenthesis in
the URL.

Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-24 12:51 ` Nicolas Goaziou
@ 2021-03-24 13:10   ` Ihor Radchenko
  2021-03-24 14:13     ` Ihor Radchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2021-03-24 13:10 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
> Actually, this is (was?) intentional. By forbidding parenthesis in
> a plain URL, you allow one to type, e.g., (https://orgmode.org), which
> is, IMO, a more frequent need than having to deal with parenthesis in
> the URL.

The patch correctly recognises situations like this.
https://orgmode.org is recognised correctly without parenthesis.

I guess this example may be another addition to the tests.

Best,
Ihor


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-24 12:31               ` Maxim Nikulin
@ 2021-03-24 14:11                 ` Ihor Radchenko
  0 siblings, 0 replies; 16+ messages in thread
From: Ihor Radchenko @ 2021-03-24 14:11 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> Example for #+STARTUP: overview:
>
> org-activate-links  560         0.028971085   5.173...e-05
>
> For content number of calls is 410, without special settings (all) 120,
> let me remind that it is for 10 find-file invocations. Another example
>
> org-activate-links  410         0.1384633219  0.0003377154

I repeated your benchmark on my largest working file (~14k links):

;; with patch
;; org-activate-links  400         0.0259791009  6.494...e-05
;; org-activate-links  400         0.0114822140  2.870...e-05
;; org-activate-links  400         0.0255080609  6.377...e-05
;; without patch
;; org-activate-links  400         0.0297167870  7.429...e-05
;; org-activate-links  400         0.0149334709  3.733...e-05
;; org-activate-links  400         0.0105385180  2.634...e-05

There is not much difference indeed. I guess, there was something about
my config and external packages.

> I see such variations in both cases with and without the patch, but 
> these numbers are negligible in my opinion.

Your benchmark is measuring is jit-lock - there will be no reliable
result as jit-lock is timer-based.

I did more reliable version as well:

(progn
  (require 'elp)
  (require 'org-element)
  (setq elp-function-list (list #'org-activate-links))
  (elp-instrument-list nil)
  (dolist (i (number-sequence 1 10))
    (message "iter %d" i)
    (find-file "~/Org/notes.org")
    (font-lock-ensure) ;; Force fontification in all the buffer
    (sit-for 1)
    (kill-buffer "notes.org")
    (sit-for 1))
  (elp-results))

Results are not different though (time-per-single-call):

;; with patch + font-lock-ensure
;; org-activate-links  163290      9.720667509   5.953...e-05
;; org-activate-links  163290      9.8090518640  6.007...e-05
;; without patch + font-lock-ensure
;; org-activate-links  163290      9.9175657860  6.073...e-05
;; org-activate-links  163290      10.073281878  6.168...e-05

This latter case was what was happening with my config. Some package was
causing full buffer fontification.

> In my opinion, combining changes related to white spaces and meaningful 
> modifications makes commits less clear, especially when reading email. 
> However the following recommendation has certainly more weight:
>
> https://orgmode.org/list/87zh2hosex.fsf@bzg.fr/ From: Bastien
>> Also, the convention in Emacs is to avoid whitespaces-only commits,
>> you need to fix whitespaces within other non-whitespaces changes in
>> a commit.

Actually, I feel confused now. I remember that message from Bastien, but
now I cannot recall what is considered "fix whitespaces". Do we use
tab-convention or space-convention?

I think I will better clear the whitespace staff in the patch before I
understand the whitespace policy more clearly. At least, the patch will
be more readable.

Best,
Ihor


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-24 13:10   ` Ihor Radchenko
@ 2021-03-24 14:13     ` Ihor Radchenko
  2021-05-15  8:34       ` Bastien
  0 siblings, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2021-03-24 14:13 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 561 bytes --]

Updated version of the patch.

Ihor Radchenko <yantar92@gmail.com> writes:

> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>> Actually, this is (was?) intentional. By forbidding parenthesis in
>> a plain URL, you allow one to type, e.g., (https://orgmode.org), which
>> is, IMO, a more frequent need than having to deal with parenthesis in
>> the URL.
>
> The patch correctly recognises situations like this.
> https://orgmode.org is recognised correctly without parenthesis.
>
> I guess this example may be another addition to the tests.
>
> Best,
> Ihor


[-- Attachment #2: 0001-Improve-org-link-plain-re.patch --]
[-- Type: text/x-diff, Size: 6531 bytes --]

From 08efc990a578c925d42315c45e0b9b76536b92af Mon Sep 17 00:00:00 2001
From: Ihor Radchenko <yantar92@gmail.com>
Date: Wed, 24 Mar 2021 21:27:24 +0800
Subject: [PATCH] Improve org-link-plain-re

(org-link-plain-re): Update docstring.  Now, the docstring explicitly
mentions that the regexp must contain groups for the link type and the
path.

* lisp/ol.el (org-link-make-regexps): Allow URLs with up to two levels
of nested brackets.  Now, URLs like [1] can be matched.  The new
regexp is based on [2].

* testing/lisp/test-ol.el: Add tests for the plain link regexp

[1] https://doi.org/10.1016/0160-791x(79)90023-x
[2] https://daringfireball.net/2010/07/improved_regex_for_matching_urls
---
 lisp/ol.el              |  41 +++++++++++----
 testing/lisp/test-ol.el | 110 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+), 10 deletions(-)

diff --git a/lisp/ol.el b/lisp/ol.el
index b8bd7d234..550c0cff6 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -519,7 +519,10 @@ links more efficient."
   "Matches link with angular brackets, spaces are allowed.")
 
 (defvar org-link-plain-re nil
-  "Matches plain link, without spaces.")
+  "Matches plain link, without spaces.
+Group 1 must contain the link type (i.e. https).
+Group 2 must contain the link path (i.e. //example.com).
+Used by `org-element-link-parser'.")
 
 (defvar org-link-bracket-re nil
   "Matches a link in double brackets.")
@@ -807,15 +810,33 @@ This should be called after the variable `org-link-parameters' has changed."
 	  (format "<%s:\\([^>\n]*\\(?:\n[ \t]*[^> \t\n][^>\n]*\\)*\\)>"
 		  types-re)
 	  org-link-plain-re
-	  (concat
-	   "\\<" types-re ":"
-	   "\\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)")
-	  ;;	 "\\([^]\t\n\r<>() ]+[^]\t\n\r<>,.;() ]\\)")
-	  org-link-bracket-re
-	  (rx (seq "[["
-		   ;; URI part: match group 1.
-		   (group
-		    (one-or-more
+          (let* ((non-space-bracket "[^][ \t\n()<>]")
+	         (parenthesis
+		  `(seq "("
+		        (0+ (or (regex ,non-space-bracket)
+			        (seq "("
+				     (0+ (regex ,non-space-bracket))
+				     ")")))
+		        ")")))
+	    ;; Heuristics for an URL link inspired by
+	    ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
+	    (rx-to-string
+	     `(seq word-start
+                   ;; Link type: match group 1.
+		   (regexp ,types-re)
+		   ":"
+                   ;; Link path: match group 2.
+                   (group
+		    (1+ (or (regex ,non-space-bracket)
+			    ,parenthesis))
+		    (or (regexp "[^[:punct:] \t\n]")
+		        ?/
+		        ,parenthesis)))))
+          org-link-bracket-re
+          (rx (seq "[["
+	           ;; URI part: match group 1.
+	           (group
+	            (one-or-more
                      (or (not (any "[]\\"))
 			 (and "\\" (zero-or-more "\\\\") (any "[]"))
 			 (and (one-or-more "\\") (not (any "[]"))))))
diff --git a/testing/lisp/test-ol.el b/testing/lisp/test-ol.el
index 5b7dc513b..ddcc570b3 100644
--- a/testing/lisp/test-ol.el
+++ b/testing/lisp/test-ol.el
@@ -491,5 +491,115 @@
 	      (org-previous-link))
 	    (buffer-substring (point) (line-end-position))))))
 
+\f
+;;; Link regexps
+
+
+(defmacro test-ol-parse-link-in-text (text)
+  "Return list of :type and :path of link parsed in TEXT.
+\"<point>\" string must be at the beginning of the link to be parsed."
+  (declare (indent 1))
+  `(org-test-with-temp-text ,text
+     (list (org-element-property :type (org-element-link-parser))
+           (org-element-property :path (org-element-link-parser)))))
+
+(ert-deftest test-ol/plain-link-re ()
+  "Test `org-link-plain-re'."
+  (should
+   (equal
+    '("https" "//example.com")
+    (test-ol-parse-link-in-text
+        "(<point>https://example.com)")))
+  (should
+   (equal
+    '("https" "//example.com/qwe()")
+    (test-ol-parse-link-in-text
+        "(Some text <point>https://example.com/qwe())")))
+  (should
+   (equal
+    '("https" "//doi.org/10.1016/0160-791x(79)90023-x")
+    (test-ol-parse-link-in-text
+        "<point>https://doi.org/10.1016/0160-791x(79)90023-x")))
+  (should
+   (equal
+    '("file" "aa")
+    (test-ol-parse-link-in-text
+        "The <point>file:aa link")))
+  (should
+   (equal
+    '("file" "a(b)c")
+    (test-ol-parse-link-in-text
+        "The <point>file:a(b)c link")))
+  (should
+   (equal
+    '("file" "a()")
+    (test-ol-parse-link-in-text
+        "The <point>file:a() link")))
+  (should
+   (equal
+    '("file" "aa((a))")
+    (test-ol-parse-link-in-text
+        "The <point>file:aa((a)) link")))
+  (should
+   (equal
+    '("file" "aa(())")
+    (test-ol-parse-link-in-text
+        "The <point>file:aa(()) link")))
+  (should
+   (equal
+    '("file" "/a")
+    (test-ol-parse-link-in-text
+        "The <point>file:/a link")))
+  (should
+   (equal
+    '("file" "/a/")
+    (test-ol-parse-link-in-text
+        "The <point>file:/a/ link")))
+  (should
+   (equal
+    '("http" "//")
+    (test-ol-parse-link-in-text
+        "The <point>http:// link")))
+  (should
+   (equal
+    '("file" "ab")
+    (test-ol-parse-link-in-text
+        "The (some <point>file:ab) link")))
+  (should
+   (equal
+    '("file" "aa")
+    (test-ol-parse-link-in-text
+        "The <point>file:aa) link")))
+  (should
+   (equal
+    '("file" "aa")
+    (test-ol-parse-link-in-text
+        "The <point>file:aa( link")))
+  (should
+   (equal
+    '("http" "//foo.com/more_(than)_one_(parens)")
+    (test-ol-parse-link-in-text
+        "The <point>http://foo.com/more_(than)_one_(parens) link")))
+  (should
+   (equal
+    '("http" "//foo.com/blah_(wikipedia)#cite-1")
+    (test-ol-parse-link-in-text
+        "The <point>http://foo.com/blah_(wikipedia)#cite-1 link")))
+  (should
+   (equal
+    '("http" "//foo.com/blah_(wikipedia)_blah#cite-1")
+    (test-ol-parse-link-in-text
+        "The <point>http://foo.com/blah_(wikipedia)_blah#cite-1 link")))
+  (should
+   (equal
+    '("http" "//foo.com/unicode_(✪)_in_parens")
+    (test-ol-parse-link-in-text
+        "The <point>http://foo.com/unicode_(✪)_in_parens link")))
+  (should
+   (equal
+    '("http" "//foo.com/(something)?after=parens")
+    (test-ol-parse-link-in-text
+        "The <point>http://foo.com/(something)?after=parens link"))))
+
 (provide 'test-ol)
 ;;; test-ol.el ends here
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
  2021-03-24 14:13     ` Ihor Radchenko
@ 2021-05-15  8:34       ` Bastien
  0 siblings, 0 replies; 16+ messages in thread
From: Bastien @ 2021-05-15  8:34 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode, Nicolas Goaziou

Hi Ihor,

Ihor Radchenko <yantar92@gmail.com> writes:

> Updated version of the patch.

Applied in maint as f313b8184 with an modified commit message, thanks.

-- 
 Bastien


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-05-15  8:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-13  2:56 Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)] Ihor Radchenko
2021-03-13  3:23 ` Kyle Meyer
2021-03-13  5:21   ` [PATCH] " Ihor Radchenko
2021-03-13  5:24     ` Ihor Radchenko
2021-03-15 11:54       ` Maxim Nikulin
2021-03-16 12:35         ` Ihor Radchenko
2021-03-17 14:59           ` Maxim Nikulin
2021-03-19 15:07             ` Ihor Radchenko
2021-03-24 12:31               ` Maxim Nikulin
2021-03-24 14:11                 ` Ihor Radchenko
2021-03-19 12:10           ` Maxim Nikulin
2021-03-19 15:40             ` Ihor Radchenko
2021-03-24 12:51 ` Nicolas Goaziou
2021-03-24 13:10   ` Ihor Radchenko
2021-03-24 14:13     ` Ihor Radchenko
2021-05-15  8:34       ` Bastien

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).