From: Max Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /home/mwillcock/.emacs.d/elpa/org-9.7.3/)]
Date: Sun, 16 Jun 2024 22:43:39 +0700
Message-ID: <v4n17e$gnr$1@ciao.gmane.io>
In-Reply-To: <87le37k4c8.fsf@localhost>

On 14/06/2024 21:04, Ihor Radchenko wrote:
> Morgan Willcock writes:
> 
>> i.e. Inserting "https://domain/test-" into the buffer will create a
>> clickable link for "https://domain/test".
>>
> I improved the heuristics we use to detect plain links.
> Fixed, on main.
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=73da6beb5

> +++ b/etc/ORG-NEWS
[...]
> +*** Trailing =-= is now allowed in plain links

After a look into

7dcb1afb6 2021-03-24 21:27:24 +0800 Ihor Radchenko: Improve 
org-link-plain-re

I suspect it worked prior to v9.5. Without a unit test it may be 
accidentally broken again.

> +: https://domain/test-

example.org, example.net, and example.com are domains reserved for use 
in examples: 
<https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml>

>                     (or (regexp "[^[:punct:] \t\n]")

I have realized that some Org regexps use the [:punct:] *regexp class* 
while others use the punctuation *syntax class*; see the LaTeX math 
regexp. I am not sure whether the discrepancy is intentional.

I have noticed that after the following change

09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re: 
Improve regexp heuristics

the input

     (link http://example.org/a<b)

is exported as

     <p>
     (link <a 
href="http://example.org/a%3Cb)">http://example.org/a%3Cb)</a></p>

I expect that ")" should not be parsed as part of the link. Balanced 
brackets are tricky with regexps (and it is not possible to match 
arbitrarily nested ones).
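
One heuristic that avoids bracket matching at the regexp level is to match 
greedily first and then count parentheses inside the candidate, dropping 
unmatched closing ones from its end. The GitHub Flavored Markdown spec 
describes a rule of this kind for extended autolinks. Below is a minimal 
Python sketch of the idea. It is not Org code, and the function name is 
made up for illustration:

```python
def trim_unbalanced_closing_parens(candidate: str) -> str:
    """Drop trailing ')' characters that have no matching '(' in candidate."""
    opening = candidate.count("(")
    closing = candidate.count(")")
    # Only trailing parentheses are removed; ones inside the URL are kept.
    while candidate.endswith(")") and closing > opening:
        candidate = candidate[:-1]
        closing -= 1
    return candidate


print(trim_unbalanced_closing_parens("http://example.org/a<b)"))
print(trim_unbalanced_closing_parens("http://example.org/wiki/Foo_(bar)"))
```

With this post-processing step the first candidate loses its ")" while the 
second one, whose parentheses are balanced, is left intact.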

Perhaps "[^[:punct:] \t\n]" is too narrow in what it treats as a space. 
It breaks the recommended workaround with a zero width space:

(org-export-string-as
  "http://example.org\N{ZERO WIDTH SPACE}[fn::footnote]" 'html 'body)
"<p>
<a 
href=\"http://example.org​[fn::footnote]\">http://example.org​[fn::footnote]</a></p>
"

Actually, some kind of no-break space would be better in such cases:

(org-export-string-as
  "http://example.org\N{NO-BREAK SPACE}[fn::footnote]" 'html 'body)
"<p>
<a 
href=\"http://example.org [fn::footnote]\">http://example.org [fn::footnote]</a></p>
"

I would consider [:space:] or \s-.
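
For illustration only, here is how the two characters look from the 
Unicode side, using Python rather than Emacs. The helper name is 
hypothetical. Note the assumption: Emacs resolves [:space:] and \s- 
through the syntax table, so its answers may differ from Python's "\s"; 
the sketch only shows that a literal " \t\n" set misses both characters 
and that a class-based test need not treat them alike.

```python
import re
import unicodedata

ZWSP = "\N{ZERO WIDTH SPACE}"
NBSP = "\N{NO-BREAK SPACE}"


def describe(char: str) -> tuple:
    """Return (Unicode category, in literal set?, matched by Python's \\s?)."""
    return (
        unicodedata.category(char),
        char in " \t\n",
        re.fullmatch(r"\s", char) is not None,
    )


for name, char in (("ZWSP", ZWSP), ("NBSP", NBSP)):
    print(name, describe(char))
```

In Python the zero width space is a format character (category Cf) and is 
not matched by "\s", while the no-break space (category Zs) is. So it is 
worth checking what the Emacs classes actually do with each character 
before relying on them.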

As to the original bug report, while reading it, I noticed that 
Thunderbird includes the dash in the recognized link for

   "https://domain/test-"

I decided to look into its implementation and, to my surprise, found: 
``punctation chars and "-" at the end are stipped off.'' I realized that 
double quotes, along with angle brackets, are treated as a recommended 
way to mark URLs in plain text. Without such delimiters Thunderbird does 
not consider a trailing dash part of the link, e.g. for 
http://example.org/t- This might be an attempt to reserve the 
possibility of reassembling URLs wrapped across several lines with added 
hyphenation marks, but that has not been implemented (RFC 2396 
appendix E warns about accidentally added hyphens).

https://www.bucksch.org/1/projects/mozilla/16507/
https://searchfox.org/mozilla-central/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#line-243
mozTXTToHTMLConv::FindURLEnd
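
The behaviour described above (explicit delimiters are trusted, while 
free text gets its trailing punctuation and "-" stripped) can be sketched 
in a few lines of Python. This is not Thunderbird's code: the regexps, 
the TRAILING character set, and the function name are assumptions made 
up for this illustration.

```python
import re

# Characters assumed to belong to the surrounding text when they end a
# free-text URL (an assumption of this sketch, not Thunderbird's list).
TRAILING = ".,;:!?-"

PATTERN = re.compile(
    r'"(?P<quoted>https?://[^"\s]+)"'   # URL wrapped in double quotes
    r"|<(?P<angle>https?://[^>\s]+)>"   # URL wrapped in angle brackets
    r"|(?P<free>https?://\S+)"          # undelimited URL in running text
)


def find_urls(text: str) -> list:
    """Return URLs in text, trusting explicit delimiters over heuristics."""
    urls = []
    for match in PATTERN.finditer(text):
        if match.group("free") is not None:
            # No delimiters: strip characters that likely end the sentence.
            urls.append(match.group("free").rstrip(TRAILING))
        else:
            # Delimited: keep the URL exactly as written.
            urls.append(match.group("quoted") or match.group("angle"))
    return urls


print(find_urls('see "https://example.org/test-" and http://example.org/t- here'))
```

Here the quoted URL keeps its trailing dash, while the undelimited one 
loses it.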

The implementation is tricky; I have not noticed anything that could be 
reused to improve the heuristics for Org. Nowadays it is probably better 
to inspect the autolinking code of GitHub/GitLab or of widely used 
Python packages.




Thread overview: 6+ messages
2024-06-13 13:32 [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /home/mwillcock/.emacs.d/elpa/org-9.7.3/)] Morgan Willcock
2024-06-14 14:04 ` Ihor Radchenko
2024-06-16 15:43   ` Max Nikulin [this message]
2024-06-16 15:59     ` Ihor Radchenko
2024-06-20 12:15       ` Max Nikulin
2024-06-22 13:41         ` Ihor Radchenko
