To: emacs-orgmode@gnu.org
From: Maxim Nikulin
Subject: Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
Date: Mon, 15 Mar 2021 18:54:18 +0700
References: <87pn03g3rr.fsf@localhost> <87sg4zu47p.fsf@kyleam.com> <87k0qbk4rq.fsf@localhost> <87h7lfk4mz.fsf@localhost>
In-Reply-To: <87h7lfk4mz.fsf@localhost>
List-Id: "General discussions about Org-mode."

On 13/03/2021 12:24, Ihor Radchenko wrote:
> Here is the right one

I think some unit tests would be great to avoid future regressions. Since these are heuristics, some cases will always be broken, so tests could clarify intentions.

I have not tested the regexp on real cases, so you have every right to be skeptical of my comments. I prefer to have explicit URLs in my files.

> +    (let ((non-space-bracket "[^][ \t\n()<>]+"))

I think "+" is redundant here: it does not allow empty inner parentheses and leads to the nested construction "(x+)+", which could be simplified to "x+".

> +      ;; Heiristics for an URL link.  Source:

Typo: Heuristics

> +      ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
> +      (rx-to-string
> +       `(seq (regexp "\\<")

Isn't it equivalent to "word-start"?
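The question can be checked directly. A small sketch, relying only on `rx' as shipped with Emacs; I have not compared the output across Emacs versions, so treat it as an illustration rather than a proof:

#+begin_src elisp
(require 'rx)

;; Compare the regexp string produced by the symbolic form with the
;; one produced by embedding the literal "\\<" via (regexp ...).
(list (rx word-start)
      (rx (regexp "\\<"))
      (string= (rx word-start) (rx (regexp "\\<"))))
#+end_src

If the last element is t, the two spellings are interchangeable inside the `seq' form.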
> +         (regexp ,types-re)
> +         ":"
> +         (1+ (or (regex ,non-space-bracket)
> +                 (seq "("
> +                      (* (or (regex ,non-space-bracket)
> +                             (seq "("
> +                                  (regex ,non-space-bracket)

Maybe "0+" to allow "(())"?

> +                                  ")")))
> +                      ")")))
> +         (or (seq "("
> +                  (* (or (regex ,non-space-bracket)
> +                         (seq "("
> +                              (regex ,non-space-bracket)
> +                              ")")))
> +                  ")")
> +             (regexp "\\([^[:punct:] \t\n]\\|/\\)")))))

Is the group useful for any purpose? The construction is already inside an "or".

I have tried to compare the current regexp ("orig"), yours ("ir"), and a slightly modified version of yours ("nm") on several synthetic examples:

#+begin_src elisp
(let* ((types-re (regexp-opt (org-link-types) t))
       ;; "ir": the regexp from the patch, unchanged.
       (org-link-plain-re-ir
        (let ((non-space-bracket "[^][ \t\n()<>]+"))
          ;; Heuristics for an URL link.  Source:
          ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
          (rx-to-string
           `(seq (regexp "\\<")
                 (regexp ,types-re)
                 ":"
                 (1+ (or (regex ,non-space-bracket)
                         (seq "("
                              (* (or (regex ,non-space-bracket)
                                     (seq "("
                                          (regex ,non-space-bracket)
                                          ")")))
                              ")")))
                 (or (seq "("
                          (* (or (regex ,non-space-bracket)
                                 (seq "("
                                      (regex ,non-space-bracket)
                                      ")")))
                          ")")
                     (regexp "\\([^[:punct:] \t\n]\\|/\\)"))))))
       ;; "nm": single-character class, shared parenthesis form,
       ;; empty inner parentheses allowed, no extra group at the end.
       (org-link-plain-re-nm
        (let* ((non-space-bracket "[^][ \t\n()<>]")
               (parenthesis
                `(seq "("
                      (0+ (or (regex ,non-space-bracket)
                              (seq "("
                                   (0+ (regex ,non-space-bracket))
                                   ")")))
                      ")")))
          ;; Heuristics for an URL link inspired by
          ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
          (rx-to-string
           `(seq word-start
                 (regexp ,types-re)
                 ":"
                 (1+ (or (regex ,non-space-bracket)
                         ,parenthesis))
                 (or (regexp "[^[:punct:] \t\n]")
                     ?/
                     ,parenthesis))))))
  ;; Build a table: one row per sample, one column per regexp.
  (cons (list "src" "orig" "ir" "nm")
        (mapcar (lambda (s)
                  (cons s
                        (mapcar (lambda (r)
                                  (if (string-match r s)
                                      (match-string 0 s)
                                    ""))
                                (list org-link-plain-re
                                      org-link-plain-re-ir
                                      org-link-plain-re-nm))))
                (list
                 "The file:a link"
                 "The file:aa link"
                 "The file:a(b)c link"
                 "The file:a() link"
                 "The file:aa((a)) link"
                 "The file:aa(()) link"
                 "The file:///a link"
                 "The file:///a/, link"
                 "The http:// link"
                 "The (some file:ab) link"
                 "The file:aa) link"
                 "The file:aa( link"))))
#+end_src

#+RESULTS:
| src                     | orig       | ir           | nm           |
| The file:a link         |            |              |              |
| The file:aa link        | file:aa    | file:aa      | file:aa      |
| The file:a(b)c link     | file:a(b)  | file:a(b)c   | file:a(b)c   |
| The file:a() link       |            | file:a()     | file:a()     |
| The file:aa((a)) link   | file:aa    | file:aa((a)) | file:aa((a)) |
| The file:aa(()) link    | file:aa    | file:aa      | file:aa(())  |
| The file:///a link      | file:///a  | file:///a    | file:///a    |
| The file:///a/, link    | file:///a/ | file:///a/   | file:///a/   |
| The http:// link        | http://    | http://      | http://      |
| The (some file:ab) link | file:ab    | file:ab      | file:ab      |
| The file:aa) link       | file:aa    | file:aa      | file:aa      |
| The file:aa( link       | file:aa    | file:aa      | file:aa      |
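
The comparison above could serve as a starting point for the unit tests mentioned at the beginning. A minimal sketch with ERT, under a few assumptions: a fixed list of link types ("file", "http", "https") stands in for (org-link-types) so that the file loads without Org, the names `my/link-plain-re-nm' and `my/link-plain-match' are hypothetical, and the expected values are copied from the "nm" column rather than from any agreed specification:

#+begin_src elisp
(require 'rx)
(require 'ert)

;; Assumption: a fixed set of link types replaces (org-link-types).
(defconst my/link-plain-re-nm
  (let* ((types-re (regexp-opt '("file" "http" "https") t))
         (non-space-bracket "[^][ \t\n()<>]")
         (parenthesis
          `(seq "("
                (0+ (or (regex ,non-space-bracket)
                        (seq "(" (0+ (regex ,non-space-bracket)) ")")))
                ")")))
    (rx-to-string
     `(seq word-start
           (regexp ,types-re)
           ":"
           (1+ (or (regex ,non-space-bracket) ,parenthesis))
           (or (regexp "[^[:punct:] \t\n]") ?/ ,parenthesis))))
  "Hypothetical copy of the \"nm\" regexp from the comparison.")

(defun my/link-plain-match (string)
  "Return the plain link found in STRING, or nil if there is none."
  (when (string-match my/link-plain-re-nm string)
    (match-string 0 string)))

(ert-deftest my/link-plain-re-nm-test ()
  "Expected matches taken from the \"nm\" column of the table."
  ;; A single character after the colon is not enough for a link.
  (should-not (my/link-plain-match "The file:a link"))
  ;; Balanced parentheses, including empty and nested ones, are kept.
  (should (equal "file:a(b)c" (my/link-plain-match "The file:a(b)c link")))
  (should (equal "file:a()" (my/link-plain-match "The file:a() link")))
  (should (equal "file:aa(())" (my/link-plain-match "The file:aa(()) link")))
  ;; Trailing punctuation and unbalanced parentheses are dropped.
  (should (equal "file:///a/" (my/link-plain-match "The file:///a/, link")))
  (should (equal "file:ab" (my/link-plain-match "The (some file:ab) link")))
  (should (equal "file:aa" (my/link-plain-match "The file:aa) link")))
  (should (equal "file:aa" (my/link-plain-match "The file:aa( link"))))
#+end_src

The test can be run interactively with M-x ert, or in batch mode by loading the file and calling `ert-run-tests-batch-and-exit'. Each `should' form records one intention, so a later change to the heuristics that breaks a case fails visibly instead of silently.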