To: emacs-orgmode@gnu.org
From: Max Nikulin
Subject: Re: [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /home/mwillcock/.emacs.d/elpa/org-9.7.3/)]
Date: Thu, 20 Jun 2024 19:15:58 +0700
References: <87sexh9ddv.fsf@ice9.digital> <87le37k4c8.fsf@localhost> <875xu86fq1.fsf@localhost>
In-Reply-To: <875xu86fq1.fsf@localhost>

On 16/06/2024 22:59, Ihor Radchenko wrote:
> Max Nikulin writes:
>>
>> I suspect, it worked prior to v9.5. Without a unit test it may be
>> accidentally broken again.
>
> No, it did not work.
> If you can, please do not make such assertions without testing.

I am sorry, I had no intention to offend you. I missed that the removed
line with the explicit list of punctuation characters was commented out.
I have tried the regexp used before (a part of v6.34)

facedba05 2009-12-09 15:13:50 +0100 Carsten Dominik: Use John Gruber's
regular expression for URL's

and it seems a trailing dash was allowed.

>>> +: https://domain/test-
>>
>> example.org, example.net, example.com are domains reserved for usage in
>> examples:
>>
>
> And so?

http://example.org/dash- may be a bit better for docs.
(For IPv6 addresses the difference should be more noticeable, but I do
not remember what range is reserved for usage in examples there.)

>> I have realized that some Org regexps use [:punct:] *regexp class* and
>> others *syntax class*, see latex math regexp. I am in doubts if the
>> discrepancy is intentional.
>
> It is not intentional, but using syntax classes can sometimes be
> fragile.

Do you mean that the result depends on the current buffer?

I do not have a strong opinion on which variant should be used. What I do
not like is that in the case of $n$-th the character after the second "$"
is tested against a syntax class, while a regexp class is used for links.
This subtle difference is almost certainly ignored in alternative
implementations of the parser. However, I am not sure what characters
besides dash and apostrophe are affected and whether it depends on
locale.

>> 09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re:
>> Improve regexp heuristics
[...]
>> (link http://example.org/a
>
> It is heuristics. We cannot be 100% right. So, it is what it is.

From my point of view it is at least close to a regression. I do not
have any argument against http://example.org/a, but the regexp should
not match whole "http://example.org/a

>> Nowadays it is likely better to inspect
>> autolinking code for GitHub/GitLab or widely used python packages.
>
> If you have concrete proposals, please share them.

Not yet. I consider inspecting mozilla's code as a kind of negative
result from the point of view of usefulness for Org. Expanding the test
suite by gathering examples of failed heuristics from bug reports
requires enough reports.
https://wpt.live/url/resources/urltestdata.json
(https://github.com/web-platform-tests/wpt) is too specific for browsers
and HTML/JS.

>> I would consider [:space:] or \s-.
>
> Do you mean "[^[:punct:][:space:]\t\n]"?

I believe it might be an improvement ([:space:] includes \t).
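
The difference between the [:punct:] regexp class and the \s. syntax
class discussed above can be checked directly. The following is only a
minimal sketch: `demo-char-classes' is a hypothetical helper, not part
of Org, and the choice of the standard and text-mode syntax tables is an
assumption made for illustration. According to the Emacs Lisp manual,
[:space:] is also defined through whitespace syntax, so it may depend on
the current syntax table in the same way as \s-.

```elisp
;; Hypothetical helper for illustration only; it is not part of Org.
(defun demo-char-classes (char syntax-table)
  "Return an alist telling which classes match CHAR under SYNTAX-TABLE.
CHAR is a one-character string.  Syntax classes such as \"\\\\s.\" are
looked up in SYNTAX-TABLE, which is made current for the check."
  (with-syntax-table syntax-table
    (list (cons "[[:punct:]]" (and (string-match-p "[[:punct:]]" char) t))
          (cons "\\s."        (and (string-match-p "\\s." char) t))
          (cons "[[:space:]]" (and (string-match-p "[[:space:]]" char) t))
          (cons "\\s-"        (and (string-match-p "\\s-" char) t)))))

(require 'text-mode)                    ; defines `text-mode-syntax-table'

;; Print one report line per character and syntax table.
(dolist (char '("-" "'" "\t" "\n"))
  (dolist (table (list (standard-syntax-table) text-mode-syntax-table))
    (message "%S %S" char (demo-char-classes char table))))
```

Evaluating the same forms with another syntax table (for example the
one of the buffer where the link is parsed) should show whether the
result depends on the current buffer for a given character.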