emacs-orgmode@gnu.org archives
* Bug in identification of links?
From: Daniele Nicolodi @ 2020-06-11 22:00 UTC
  To: emacs-orgmode

Hello,

org-mode fails to recognize https://doi.org/10.1016/0370-1573(89)90087-2
as a valid URL: it cuts the link off right after the closing
parenthesis. I don't understand why, because if the ) character were not
allowed in URLs I would expect the link to end before it, not after it.
I haven't looked for the code responsible, so I don't know what exactly
is going on. Does anyone have an idea?

Thank you.

Cheers,
Dan



* Re: Bug in identification of links?
From: Kyle Meyer @ 2020-06-12  1:19 UTC
  To: Daniele Nicolodi; +Cc: emacs-orgmode

Daniele Nicolodi writes:

> org-mode fails to recognize https://doi.org/10.1016/0370-1573(89)90087-2
> as a valid URL: it cuts the link off right after the closing
> parenthesis. I don't understand why, because if the ) character were not
> allowed in URLs I would expect the link to end before it, not after it.
> I haven't looked for the code responsible, so I don't know what exactly
> is going on. Does anyone have an idea?

The link is matched by org-link-plain-re, which is created by
org-link-make-regexps.  The relevant part looks like this:

  \\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)
                         -----------------

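The underlined bit is what matches "(89)".  Because the pattern allows
that parenthesized group only at the very end of the link, the match
stops right after the closing parenthesis and "90087-2" is left out.
This subpattern appeared, without the underscore, in facedba05 (Use John
Gruber's regular expression for URL's, 2009-12-09).  The commit message
links to an article [0] that has this to say about the parentheses
matching: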

    It attempts to be particularly clever with regard to parentheses,
    which, in my experience, only ever seem to occur in the wild in
    Wikipedia URLs, and which many URL matching patterns seem to
    botch. The pattern looks for a single pair of balanced parentheses
    within the URL, which is how it correctly omits the trailing
    parenthesis in the following line:

        (Something like http://foo.com/blah_blah)

That article also has an update recommending an improved variant.  I
haven't tested it, but it looks like it would handle your case.
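
To make the current behavior easier to see outside Emacs, here is a
minimal sketch in Python.  It is only a rough translation of the
fragment quoted above, not the regexp Org actually builds: the
"https?:" prefix is an assumed stand-in for the link-type part, \w
stands in for [[:word:]0-9_], and ASCII punctuation from the string
module only approximates Emacs's [:punct:] class.  The name
match_plain_link is made up for this example.

```python
import re
import string

# ASCII punctuation, escaped for use inside a character class.
# Assumption: this only approximates Emacs's [:punct:] class.
PUNCT = re.escape(string.punctuation)

# Rough translation of the quoted fragment. The "https?:" prefix is an
# assumed stand-in for the link-type part of the real regexp.
PLAIN_LINK = re.compile(
    r"https?:"
    r"[^\]\[ \t\n()<>]+"          # run of characters with no brackets or parens
    r"(?:"
    r"\(\w+\)"                    # one parenthesized word, e.g. "(89)"
    r"|[^" + PUNCT + r" \t\n]"    # or one non-punctuation character
    r"|/"                         # or a slash
    r")"                          # nothing may follow this final piece
)


def match_plain_link(text):
    """Return the first plain link found in text, or None."""
    m = PLAIN_LINK.search(text)
    return m.group(0) if m else None


print(match_plain_link("https://doi.org/10.1016/0370-1573(89)90087-2"))
print(match_plain_link("(Something like http://foo.com/blah_blah)"))
```

The first call returns the DOI link cut off right after "(89)", since
the pattern has nothing left to match once the parenthesized group is
consumed.  The second call returns the URL without the trailing
parenthesis, which is the case the article describes.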

This issue has been around a long time and is minor in that there will
always be cases that fool the regexp and these can be handled by
enclosing the text with <...> or [[...]].  Still, in my view it'd be
worth taking a look at tweaking the regexp after the release of v9.4.


[0] https://daringfireball.net/2009/11/liberal_regex_for_matching_urls
    Related thread on mailing list:
    https://orgmode.org/list/loom.20091130T200527-783@post.gmane.org/


