From: Ihor Radchenko
To: Maxim Nikulin, emacs-orgmode@gnu.org
Subject: Re: [PATCH] Re: Bug: Plain https links with brackets are not recognised [9.4.4 (release_9.4.4-625-g763c7a @ /home/yantar92/.emacs.d/straight/build/org/)]
References: <87pn03g3rr.fsf@localhost> <87sg4zu47p.fsf@kyleam.com> <87k0qbk4rq.fsf@localhost> <87h7lfk4mz.fsf@localhost>
Date: Tue, 16 Mar 2021 20:35:16 +0800
Message-ID: <8735wvuvi3.fsf@localhost>
List-Id: "General discussions about Org-mode."
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Maxim Nikulin writes:

> I have tried to compare current, your, and a little modified your
> regexps on several synthetic examples:

Thanks! Your version is indeed cleaner. I updated the patch. It now
uses your version, with one group specification that I missed earlier.
I also added tests from your email, from the post introducing the
regexp, and a few more examples I came up with.

I have been testing the new regexp for a few days now. Because the
regexp is quite complex and because font-lock apparently fontifies even
invisible (folded) text, loading time on large Org files with many
links became noticeably longer. That said, it was a 7.2 MB file with
~13k links in it.

Best,
Ihor

--=-=-=
Content-Type: text/x-diff; charset=utf-8
Content-Disposition: inline; filename=0001-Improve-org-link-plain-re.patch

From 6eb2208a67745e3150024e7b72509115b97fcfa3 Mon Sep 17 00:00:00 2001
From: Ihor Radchenko
Date: Tue, 16 Mar 2021 20:20:32 +0800
Subject: [PATCH] Improve org-link-plain-re

(org-link-plain-re): Update docstring.  Now, the docstring explicitly
mentions that the regexp must contain groups for the link type and the
path.

* lisp/ol.el (org-link-make-regexps): Allow URLs with up to two levels
of nested brackets.  Now, URLs like [1] can be matched.  The new
regexp is based on [2].

[1] https://doi.org/10.1016/0160-791x(79)90023-x
[2] https://daringfireball.net/2010/07/improved_regex_for_matching_urls

* testing/lisp/test-ol.el: Add tests for plain links.
---
 lisp/ol.el              |  61 +++++++++++++------
 testing/lisp/test-ol.el | 132 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 173 insertions(+), 20 deletions(-)

diff --git a/lisp/ol.el b/lisp/ol.el
index b8bd7d234..d3a07a3ed 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -519,7 +519,10 @@ links more efficient."
   "Matches link with angular brackets, spaces are allowed.")
 
 (defvar org-link-plain-re nil
-  "Matches plain link, without spaces.")
+  "Matches plain link, without spaces.
+Group 1 must contain the link type (i.e. https).
+Group 2 must contain the link path (i.e. //example.com).
+Used by `org-element-link-parser'.")
 
 (defvar org-link-bracket-re nil
   "Matches a link in double brackets.")
@@ -807,26 +810,44 @@ This should be called after the variable `org-link-parameters' has changed."
           (format "<%s:\\([^>\n]*\\(?:\n[ \t]*[^> \t\n][^>\n]*\\)*\\)>"
                   types-re)
           org-link-plain-re
-          (concat
-           "\\<" types-re ":"
-           "\\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)")
-          ;;      "\\([^]\t\n\r<>() ]+[^]\t\n\r<>,.;() ]\\)")
-          org-link-bracket-re
-          (rx (seq "[["
-                   ;; URI part: match group 1.
-                   (group
-                    (one-or-more
+          (let* ((non-space-bracket "[^][ \t\n()<>]")
+                 (parenthesis
+                  `(seq "("
+                        (0+ (or (regex ,non-space-bracket)
+                                (seq "("
+                                     (0+ (regex ,non-space-bracket))
+                                     ")")))
+                        ")")))
+            ;; Heuristics for an URL link inspired by
+            ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls
+            (rx-to-string
+             `(seq word-start
+                   ;; Link type: match group 1.
+                   (regexp ,types-re)
+                   ":"
+                   ;; Link path: match group 2.
+                   (group
+                    (1+ (or (regex ,non-space-bracket)
+                            ,parenthesis))
+                    (or (regexp "[^[:punct:] \t\n]")
+                        ?/
+                        ,parenthesis)))))
+          org-link-bracket-re
+          (rx (seq "[["
+                   ;; URI part: match group 1.
+                   (group
+                    (one-or-more
                      (or (not (any "[]\\"))
-                         (and "\\" (zero-or-more "\\\\") (any "[]"))
-                         (and (one-or-more "\\") (not (any "[]"))))))
-                   "]"
-                   ;; Description (optional): match group 2.
-                   (opt "[" (group (+? anything)) "]")
-                   "]"))
-          org-link-any-re
-          (concat "\\(" org-link-bracket-re "\\)\\|\\("
-                  org-link-angle-re "\\)\\|\\("
-                  org-link-plain-re "\\)"))))
+                         (and "\\" (zero-or-more "\\\\") (any "[]"))
+                         (and (one-or-more "\\") (not (any "[]"))))))
+                   "]"
+                   ;; Description (optional): match group 2.
+                   (opt "[" (group (+? anything)) "]")
+                   "]"))
+          org-link-any-re
+          (concat "\\(" org-link-bracket-re "\\)\\|\\("
+                  org-link-angle-re "\\)\\|\\("
+                  org-link-plain-re "\\)"))))
 
 (defun org-link-complete-file (&optional arg)
   "Create a file link using completion."
diff --git a/testing/lisp/test-ol.el b/testing/lisp/test-ol.el
index 5b7dc513b..e6208cd38 100644
--- a/testing/lisp/test-ol.el
+++ b/testing/lisp/test-ol.el
@@ -491,5 +491,137 @@
               (org-previous-link))
             (buffer-substring (point) (line-end-position))))))
 
+
+;;; Link regexps
+
+(ert-deftest test-ol/plain-link-re ()
+  "Test `org-link-plain-re'."
+  (should
+   (equal
+    '("https" "//example.com/qwe()")
+    (org-test-with-temp-text
+        "(Some text in parenthesis followed by link with brackets <point>https://example.com/qwe())"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("https" "//doi.org/10.1016/0160-791x(79)90023-x")
+    (org-test-with-temp-text
+        "https://doi.org/10.1016/0160-791x(79)90023-x"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa")
+    (org-test-with-temp-text
+        "The <point>file:aa link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "a(b)c")
+    (org-test-with-temp-text
+        "The <point>file:a(b)c link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "a()")
+    (org-test-with-temp-text
+        "The <point>file:a() link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa((a))")
+    (org-test-with-temp-text
+        "The <point>file:aa((a)) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa(())")
+    (org-test-with-temp-text
+        "The <point>file:aa(()) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "/a")
+    (org-test-with-temp-text
+        "The <point>file:/a link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "/a/")
+    (org-test-with-temp-text
+        "The <point>file:/a/ link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//")
+    (org-test-with-temp-text
+        "The <point>http:// link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "ab")
+    (org-test-with-temp-text
+        "The (some <point>file:ab) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa")
+    (org-test-with-temp-text
+        "The <point>file:aa) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("file" "aa")
+    (org-test-with-temp-text
+        "The <point>file:aa( link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/more_(than)_one_(parens)")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/more_(than)_one_(parens) link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/blah_(wikipedia)#cite-1")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/blah_(wikipedia)#cite-1 link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/blah_(wikipedia)_blah#cite-1")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/blah_(wikipedia)_blah#cite-1 link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/unicode_(✪)_in_parens")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/unicode_(✪)_in_parens link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser))))))
+  (should
+   (equal
+    '("http" "//foo.com/(something)?after=parens")
+    (org-test-with-temp-text
+        "The <point>http://foo.com/(something)?after=parens link"
+      (list (org-element-property :type (org-element-link-parser))
+            (org-element-property :path (org-element-link-parser)))))))
+
 (provide 'test-ol)
 ;;; test-ol.el ends here
-- 
2.26.2

--=-=-=--
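
The nested-bracket heuristic from the patch can also be tried outside Org. The following is only a minimal sketch: it copies the `rx' form from the patch into a standalone function, and the names `my/link-types-re', `my/plain-link-regexp' and `my/match-plain-link' are hypothetical helpers that are not part of Org. The hard-coded type list is an assumption standing in for the `types-re' that Org derives from `org-link-parameters'.

```elisp
(require 'rx)

;; Assumption: a hand-written stand-in for Org's `types-re'.  Like the
;; real one, it is wrapped in a capturing group, which becomes group 1.
(defvar my/link-types-re "\\(https\\|http\\|file\\)")

(defun my/plain-link-regexp (types-re)
  "Return a plain-link regexp for TYPES-RE, mirroring the patch."
  (let* ((non-space-bracket "[^][ \t\n()<>]")
         ;; One pair of parentheses that may contain at most one
         ;; further nested pair, i.e. two levels in total.
         (parenthesis
          `(seq "("
                (0+ (or (regex ,non-space-bracket)
                        (seq "(" (0+ (regex ,non-space-bracket)) ")")))
                ")")))
    (rx-to-string
     `(seq word-start
           (regexp ,types-re)            ; link type: group 1
           ":"
           (group                        ; link path: group 2
            (1+ (or (regex ,non-space-bracket) ,parenthesis))
            ;; The path must not end in bare punctuation.
            (or (regexp "[^[:punct:] \t\n]") ?/ ,parenthesis))))))

(defun my/match-plain-link (text)
  "Return (TYPE PATH) for the first plain link in TEXT, or nil."
  (when (string-match (my/plain-link-regexp my/link-types-re) text)
    (list (match-string 1 text) (match-string 2 text))))

(my/match-plain-link "https://doi.org/10.1016/0160-791x(79)90023-x")
;; => ("https" "//doi.org/10.1016/0160-791x(79)90023-x")
(my/match-plain-link "The file:aa((a)) link")
;; => ("file" "aa((a))")
(my/match-plain-link "file:a(((b)))")
;; => nil, because three levels of nesting are not covered
```

The first two calls reproduce expectations from the tests in the patch. The third shows the stated limit of two nesting levels. Since `string-match' consults the current syntax table for `word-start', results for non-ASCII text may depend on the buffer in which the sketch is evaluated.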