From: Kyle Meyer
To: Daniele Nicolodi
Subject: Re: Bug in identification of links?
In-Reply-To: <5faf0bd7-b114-9723-773e-7f3da16604a0@grinta.net>
References: <5faf0bd7-b114-9723-773e-7f3da16604a0@grinta.net>
Date: Fri, 12 Jun 2020 01:19:01 +0000
Message-ID: <874krhqdx6.fsf@kyleam.com>
List-Id: "General discussions about Org-mode."
Cc: emacs-orgmode@gnu.org

Daniele Nicolodi writes:

> org-mode fails to recognize https://doi.org/10.1016/0370-1573(89)90087-2
> as a valid URL, it breaks it after the closing parenthesis ). I don't
> understand why this is the case as I would imagine that if the )
> character is not allowed in URLs the link would be broken before it and
> not after. I haven't tried to find the code responsible for this, thus I
> don't know what exactly is going on. Does anyone have an idea?

The link is matched by org-link-plain-re, which is created by
org-link-make-regexps.  The relevant part looks like this:

    \\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)
                           -----------------

The underlined bit is what is matching "(89)".  This subpattern
appeared, without the underscore, in facedba05 (Use John Gruber's
regular expression for URL's, 2009-12-09).  The commit message links to
an article [0] that has this to say about the parentheses matching:

  It attempts to be particularly clever with regard to parentheses,
  which, in my experience, only ever seem to occur in the wild in
  Wikipedia URLs, and which many URL matching patterns seem to botch.
  The pattern looks for a single pair of balanced parentheses within
  the URL, which is how it correctly omits the trailing parenthesis in
  the following line:

  (Something like http://foo.com/blah_blah)

That article also has an update recommending to use an improved variant.
Untested, but it seems like it'd handle your case.

This issue has been around a long time and is minor in that there will
always be cases that fool the regexp and these can be handled by
enclosing the text with <...> or [[...]].  Still, in my view it'd be
worth taking a look at tweaking the regexp after the release of v9.4.

[0] https://daringfireball.net/2009/11/liberal_regex_for_matching_urls

    Related thread on mailing list:
    https://orgmode.org/list/loom.20091130T200527-783@post.gmane.org/
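
To make the behavior described above easier to poke at outside Emacs,
here is a rough Python sketch.  It is only an approximation of the
quoted fragment, not Org's actual code: "https?:" is a stand-in for
Org's list of link types, \w stands in for [[:word:]0-9_], and [^\W_]
stands in for [^[:punct:] \t\n].  TWEAKED_RE is a hypothetical
variation written for illustration; it is neither the improved pattern
from the article nor anything proposed for Org.

```python
import re

# Approximate translation of the quoted fragment of org-link-plain-re.
# Assumptions: "https?:" replaces Org's link-type prefix, \w replaces
# [[:word:]0-9_], and [^\W_] replaces [^[:punct:] \t\n].
CURRENT_RE = re.compile(
    r"\bhttps?:"
    r"([^\]\[ \t\n()<>]+"   # run without brackets, parens, <>, whitespace
    r"(?:\(\w+\)"           # then ONE parenthesized group, e.g. "(89)" ...
    r"|(?:[^\W_]|/)))"      # ... or one alphanumeric character or "/"
)

# Hypothetical tweak (an illustration only): allow parenthesized groups
# anywhere in the middle, keeping the same rule for the final character.
TWEAKED_RE = re.compile(
    r"\bhttps?:"
    r"((?:[^\]\[ \t\n()<>]|\(\w+\))*"
    r"(?:\(\w+\)|[^\W_]|/))"
)


def first_link(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


doi = "https://doi.org/10.1016/0370-1573(89)90087-2"
wrapped = "(Something like http://foo.com/blah_blah)"

print(first_link(CURRENT_RE, doi))      # match ends right after "(89)"
print(first_link(CURRENT_RE, wrapped))  # trailing ")" is left out
print(first_link(TWEAKED_RE, doi))
print(first_link(TWEAKED_RE, wrapped))
```

With CURRENT_RE, the first run stops at "(" because parentheses are
excluded from it, the parenthesized alternative then consumes "(89)",
and nothing in the pattern allows the match to continue.  That is why
the link is cut after the ")" rather than before it.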