emacs-orgmode@gnu.org archives
From: Kyle Meyer <kyle@kyleam.com>
To: Daniele Nicolodi <daniele@grinta.net>
Cc: emacs-orgmode@gnu.org
Subject: Re: Bug in identification of links?
Date: Fri, 12 Jun 2020 01:19:01 +0000
Message-ID: <874krhqdx6.fsf@kyleam.com>
In-Reply-To: <5faf0bd7-b114-9723-773e-7f3da16604a0@grinta.net>

Daniele Nicolodi writes:

> org-mode fails to recognize https://doi.org/10.1016/0370-1573(89)90087-2
> as a valid URL, it breaks it after the closing parenthesis ). I don't
> understand why this is the case as I would imagine that if the )
> character is not allowed in URLs the link would be broken before it and
> not after. I haven't tried to find the code responsible for this, thus I
> don't know what exactly is going on. Does anyone have an idea?

The link is matched by org-link-plain-re, which is created by
org-link-make-regexps.  The relevant part looks like this:

  \\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)
                         -----------------

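The underlined bit is what is matching "(89)".  Since the pattern allows
only a single parenthesized group, and only at the very end of the
match, the link stops right after the closing parenthesis and the
trailing "90087-2" is left out.  This subpattern appeared, without the
underscore, in facedba05 (Use John Gruber's regular expression for
URL's, 2009-12-09).  The commit message links to an article [0] that has
this to say about the parentheses matching: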

    It attempts to be particularly clever with regard to parentheses,
    which, in my experience, only ever seem to occur in the wild in
    Wikipedia URLs, and which many URL matching patterns seem to
    botch. The pattern looks for a single pair of balanced parentheses
    within the URL, which is how it correctly omits the trailing
    parenthesis in the following line:

        (Something like http://foo.com/blah_blah)
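
To make the behavior of the subpattern above easier to poke at outside
Emacs, here is a rough Python translation.  It is only a sketch under a
few assumptions: it handles just http/https rather than Org's full list
of link types, \w stands in for [[:word:]0-9_], ASCII string.punctuation
stands in for [:punct:], and the names PLAIN_LINK and plain_link are made
up for illustration.

```python
import re
import string

# Rough Python translation of the quoted part of org-link-plain-re.
# Assumptions: only http/https are considered, \w stands in for Emacs's
# [[:word:]0-9_], and ASCII string.punctuation stands in for [:punct:].
PUNCT = re.escape(string.punctuation)
PLAIN_LINK = re.compile(
    r"\bhttps?:"
    r"([^\]\[ \t\n()<>]+"           # body: no brackets, whitespace, parens, or <>
    r"(?:\(\w+\)"                   # then either ONE trailing "(word)" group ...
    r"|[^" + PUNCT + r" \t\n]|/))"  # ... or one final non-punctuation char or "/"
)


def plain_link(text):
    """Return the first plain link found in text, or None."""
    m = PLAIN_LINK.search(text)
    return m.group(0) if m else None


# The match ends right after "(89)", dropping the rest of the DOI.
print(plain_link("https://doi.org/10.1016/0370-1573(89)90087-2"))
# → https://doi.org/10.1016/0370-1573(89)

# The case from the article: the trailing parenthesis is left out.
print(plain_link("(Something like http://foo.com/blah_blah)"))
# → http://foo.com/blah_blah
```

Both results follow from the same rule: a parenthesized group is
accepted only as the last thing in the link.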

That article also has an update recommending an improved variant.  I
haven't tested it, but it looks like it would handle your case.

This issue has been around for a long time and is minor: there will
always be cases that fool the regexp, and they can be handled by
enclosing the link in <...> or [[...]].  Still, in my view it'd be
worth taking a look at tweaking the regexp after the release of v9.4.
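
As a quick illustration of why the delimited forms sidestep the problem,
here is a small Python sketch.  The two patterns are simplified stand-ins
that I am assuming for the example, not Org's actual angle and bracket
link regexps (those cover more link types, multi-line links, and
escaping), and the names are hypothetical.  The point is only that the
link ends at an explicit closing delimiter, so parentheses inside the URL
are not special.

```python
import re

# Simplified stand-ins (assumptions, not Org's real regexps) for the two
# delimited link forms: <URL> and [[URL]] or [[URL][description]].
ANGLE_LINK = re.compile(r"<(https?:[^>\n]*)>")
BRACKET_LINK = re.compile(r"\[\[(https?:[^\]\n]*)\](?:\[[^\]\n]*\])?\]")


def delimited_link(text):
    """Return the URL of the first <...> or [[...]] link in text, or None."""
    m = ANGLE_LINK.search(text) or BRACKET_LINK.search(text)
    return m.group(1) if m else None


doi = "https://doi.org/10.1016/0370-1573(89)90087-2"
# In both forms the whole URL is captured, parentheses included.
print(delimited_link(f"See <{doi}> for details."))
print(delimited_link(f"See [[{doi}]] for details."))
```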


[0] https://daringfireball.net/2009/11/liberal_regex_for_matching_urls
    Related thread on mailing list:
    https://orgmode.org/list/loom.20091130T200527-783@post.gmane.org/


Thread overview: 2+ messages
2020-06-11 22:00 Bug in identification of links? Daniele Nicolodi
2020-06-12  1:19 ` Kyle Meyer [this message]
