emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: Maxim Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
Date: Mon, 15 Mar 2021 18:54:18 +0700	[thread overview]
Message-ID: <s2nhtb$17g4$1@ciao.gmane.io> (raw)
In-Reply-To: <87h7lfk4mz.fsf@localhost>

On 13/03/2021 12:24, Ihor Radchenko wrote:
> Here is the right one

I think, some unit tests would be great to avoid future regressions. 
Since it is heuristics, some cases will be always broken, so tests could 
clarify intentions.

I have not tested the regexp on real cases, so you have full rights to 
be skeptical concerning my comments. I prefer to have explicit URLs in 
my files.

> +	  (let ((non-space-bracket "[^][ \t\n()<>]+"))

I think, "+" is redundant here, it does not allow empty inner 
parenthesis and leads to nested construction "(x+)+" that could be 
simplified to "x+"

> +            ;; Heiristics for an URL link.  Source:

Typo: Heuristics

> +            ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
> +            (rx-to-string
> +             `(seq (regexp "\\<")

Isn't it an equivalent to "word-start"?

> +                   (regexp ,types-re)
> +                   ":"
> +                   (1+ (or (regex ,non-space-bracket)
> +                           (seq "("
> +                                (* (or (regex ,non-space-bracket)
> +                                       (seq "("
> +                                            (regex ,non-space-bracket)

Maybe "0+" to allow "(())"?

> +                                            ")")))
> +                                ")")))
> +                   (or (seq "("
> +                            (* (or (regex ,non-space-bracket)
> +                                   (seq "("
> +                                        (regex ,non-space-bracket)
> +                                        ")")))
> +                            ")")
> +                       (regexp "\\([^[:punct:] \t\n]\\|/\\)")))))

Is the group useful for any purpose? The construction is already inside 
an "or".

I have tried to compare current, your, and a little modified your 
regexps on several synthetic examples:

#+begin_src elisp
   (let*
       ((types-re (regexp-opt (org-link-types) t))
        (org-link-plain-re-ir
	(let ((non-space-bracket "[^][ \t\n()<>]+"))
	  ;; Heiristics for an URL link.  Source:
	  ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
	  (rx-to-string
	   `(seq (regexp "\\<")
		 (regexp ,types-re)
		 ":"
		 (1+ (or (regex ,non-space-bracket)
			 (seq "("
			      (* (or (regex ,non-space-bracket)
				     (seq "("
					  (regex ,non-space-bracket)
					  ")")))
			      ")")))
		 (or (seq "("
			  (* (or (regex ,non-space-bracket)
				 (seq "("
				      (regex ,non-space-bracket)
				      ")")))
			  ")")
		     (regexp "\\([^[:punct:] \t\n]\\|/\\)"))))))
        (org-link-plain-re-nm
	(let* ((non-space-bracket "[^][ \t\n()<>]")
	       (parenthesis
		`(seq "("
		      (0+ (or (regex ,non-space-bracket)
			      (seq "("
				   (0+ (regex ,non-space-bracket))
				   ")")))
		      ")")))
	  ;; Heuristics for an URL link inspired by
	  ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
	  (rx-to-string
	   `(seq word-start
		 (regexp ,types-re)
		 ":"
		 (1+ (or (regex ,non-space-bracket)
			 ,parenthesis))
		 (or (regexp "[^[:punct:] \t\n]")
                      ?/
		     ,parenthesis))))))
        (cons
	(list "src" "orig" "ir" "nm")
	(mapcar (lambda (s)
		  (cons
		   s
		   (mapcar (lambda (r)
			     (if (string-match r s) (match-string 0 s) ""))
			   (list org-link-plain-re org-link-plain-re-ir org-link-plain-re-nm))))
		(list
		 "The file:a link"
		 "The file:aa link"
		 "The file:a(b)c link"
		 "The file:a() link"
		 "The file:aa((a)) link"
		 "The file:aa(()) link"
		 "The file:///a link"
		 "The file:///a/, link"
		 "The http:// link"
		 "The (some file:ab) link"
		 "The file:aa) link"
		 "The file:aa( link"
))))
#+end_src

#+RESULTS:
| src                     | orig      | ir           | nm           |
| The file:a link         |           |              |              |
| The file:aa link        | file:aa   | file:aa      | file:aa      |
| The file:a(b)c link     | file:a(b) | file:a(b)c   | file:a(b)c   |
| The file:a() link       |           | file:a()     | file:a()     |
| The file:aa((a)) link   | file:aa   | file:aa((a)) | file:aa((a)) |
| The file:aa(()) link    | file:aa   | file:aa      | file:aa(())  |
| The file:/a link        | file:/a   | file:/a      | file:/a      |
| The file:/a/, link      | file:/a/  | file:/a/     | file:/a/     |
| The http:// link        | http://   | http://      | http://      |
| The (some file:ab) link | file:ab   | file:ab      | file:ab      |
| The file:aa) link       | file:aa   | file:aa      | file:aa      |
| The file:aa( link       | file:aa   | file:aa      | file:aa      |



  reply	other threads:[~2021-03-15 11:57 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-13  2:56 Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)] Ihor Radchenko
2021-03-13  3:23 ` Kyle Meyer
2021-03-13  5:21   ` [PATCH] " Ihor Radchenko
2021-03-13  5:24     ` Ihor Radchenko
2021-03-15 11:54       ` Maxim Nikulin [this message]
2021-03-16 12:35         ` Ihor Radchenko
2021-03-17 14:59           ` Maxim Nikulin
2021-03-19 15:07             ` Ihor Radchenko
2021-03-24 12:31               ` Maxim Nikulin
2021-03-24 14:11                 ` Ihor Radchenko
2021-03-19 12:10           ` Maxim Nikulin
2021-03-19 15:40             ` Ihor Radchenko
2021-03-24 12:51 ` Nicolas Goaziou
2021-03-24 13:10   ` Ihor Radchenko
2021-03-24 14:13     ` Ihor Radchenko
2021-05-15  8:34       ` Bastien

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='s2nhtb$17g4$1@ciao.gmane.io' \
    --to=manikulin@gmail.com \
    --cc=emacs-orgmode@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).