From: Maxim Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
Date: Mon, 15 Mar 2021 18:54:18 +0700 [thread overview]
Message-ID: <s2nhtb$17g4$1@ciao.gmane.io> (raw)
In-Reply-To: <87h7lfk4mz.fsf@localhost>
On 13/03/2021 12:24, Ihor Radchenko wrote:
> Here is the right one
I think, some unit tests would be great to avoid future regressions.
Since it is heuristics, some cases will be always broken, so tests could
clarify intentions.
I have not tested the regexp on real cases, so you have full rights to
be skeptical concerning my comments. I prefer to have explicit URLs in
my files.
> + (let ((non-space-bracket "[^][ \t\n()<>]+"))
I think, "+" is redundant here, it does not allow empty inner
parenthesis and leads to nested construction "(x+)+" that could be
simplified to "x+"
> + ;; Heiristics for an URL link. Source:
Typo: Heuristics
> + ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
> + (rx-to-string
> + `(seq (regexp "\\<")
Isn't it an equivalent to "word-start"?
> + (regexp ,types-re)
> + ":"
> + (1+ (or (regex ,non-space-bracket)
> + (seq "("
> + (* (or (regex ,non-space-bracket)
> + (seq "("
> + (regex ,non-space-bracket)
Maybe "0+" to allow "(())"?
> + ")")))
> + ")")))
> + (or (seq "("
> + (* (or (regex ,non-space-bracket)
> + (seq "("
> + (regex ,non-space-bracket)
> + ")")))
> + ")")
> + (regexp "\\([^[:punct:] \t\n]\\|/\\)")))))
Is the group useful for any purpose? The construction is already inside
an "or".
I have tried to compare current, your, and a little modified your
regexps on several synthetic examples:
#+begin_src elisp
(let*
((types-re (regexp-opt (org-link-types) t))
(org-link-plain-re-ir
(let ((non-space-bracket "[^][ \t\n()<>]+"))
;; Heiristics for an URL link. Source:
;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
(rx-to-string
`(seq (regexp "\\<")
(regexp ,types-re)
":"
(1+ (or (regex ,non-space-bracket)
(seq "("
(* (or (regex ,non-space-bracket)
(seq "("
(regex ,non-space-bracket)
")")))
")")))
(or (seq "("
(* (or (regex ,non-space-bracket)
(seq "("
(regex ,non-space-bracket)
")")))
")")
(regexp "\\([^[:punct:] \t\n]\\|/\\)"))))))
(org-link-plain-re-nm
(let* ((non-space-bracket "[^][ \t\n()<>]")
(parenthesis
`(seq "("
(0+ (or (regex ,non-space-bracket)
(seq "("
(0+ (regex ,non-space-bracket))
")")))
")")))
;; Heuristics for an URL link inspired by
;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
(rx-to-string
`(seq word-start
(regexp ,types-re)
":"
(1+ (or (regex ,non-space-bracket)
,parenthesis))
(or (regexp "[^[:punct:] \t\n]")
?/
,parenthesis))))))
(cons
(list "src" "orig" "ir" "nm")
(mapcar (lambda (s)
(cons
s
(mapcar (lambda (r)
(if (string-match r s) (match-string 0 s) ""))
(list org-link-plain-re org-link-plain-re-ir org-link-plain-re-nm))))
(list
"The file:a link"
"The file:aa link"
"The file:a(b)c link"
"The file:a() link"
"The file:aa((a)) link"
"The file:aa(()) link"
"The file:///a link"
"The file:///a/, link"
"The http:// link"
"The (some file:ab) link"
"The file:aa) link"
"The file:aa( link"
))))
#+end_src
#+RESULTS:
| src | orig | ir | nm |
| The file:a link | | | |
| The file:aa link | file:aa | file:aa | file:aa |
| The file:a(b)c link | file:a(b) | file:a(b)c | file:a(b)c |
| The file:a() link | | file:a() | file:a() |
| The file:aa((a)) link | file:aa | file:aa((a)) | file:aa((a)) |
| The file:aa(()) link | file:aa | file:aa | file:aa(()) |
| The file:/a link | file:/a | file:/a | file:/a |
| The file:/a/, link | file:/a/ | file:/a/ | file:/a/ |
| The http:// link | http:// | http:// | http:// |
| The (some file:ab) link | file:ab | file:ab | file:ab |
| The file:aa) link | file:aa | file:aa | file:aa |
| The file:aa( link | file:aa | file:aa | file:aa |
next prev parent reply other threads:[~2021-03-15 11:57 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-13 2:56 Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)] Ihor Radchenko
2021-03-13 3:23 ` Kyle Meyer
2021-03-13 5:21 ` [PATCH] " Ihor Radchenko
2021-03-13 5:24 ` Ihor Radchenko
2021-03-15 11:54 ` Maxim Nikulin [this message]
2021-03-16 12:35 ` Ihor Radchenko
2021-03-17 14:59 ` Maxim Nikulin
2021-03-19 15:07 ` Ihor Radchenko
2021-03-24 12:31 ` Maxim Nikulin
2021-03-24 14:11 ` Ihor Radchenko
2021-03-19 12:10 ` Maxim Nikulin
2021-03-19 15:40 ` Ihor Radchenko
2021-03-24 12:51 ` Nicolas Goaziou
2021-03-24 13:10 ` Ihor Radchenko
2021-03-24 14:13 ` Ihor Radchenko
2021-05-15 8:34 ` Bastien
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.orgmode.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='s2nhtb$17g4$1@ciao.gmane.io' \
--to=manikulin@gmail.com \
--cc=emacs-orgmode@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).