From: Maxim Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: URLs with brackets not recognised
Date: Thu, 13 May 2021 23:30:56 +0700
Message-ID: <s7jk82$q2n$1@ciao.gmane.io>
In-Reply-To: <m2k0o3u36t.fsf@me.com>
On 13/05/2021 03:06, Rudolf Adamkovič wrote:
> Maxim Nikulin writes:
>
>> I do not think it is a bug. Plain text links detection is a kind of
>> heuristics. It will be always possible to win competition with regexp.
>> Consider it as a limitation requiring some hints from an intelligent
>> user.
>
> I disagree.
Me too. I disagree with most of the statements in this thread, even with
some arguments meant to support my opinion. The exception is Ihor's
message. I hope a more liberal regexp will not interfere with parsing of
other constructs.
Actually I think you do not realize that detecting URLs in arbitrary
text is tricky. Maybe you have not noticed the corner cases before.
False positives may be even more annoying. At least in the past, "smart"
detection of smileys and emoji in Skype turned code snippets into an
unreadable mess of "glasses of wine" and other "funny" stuff.
> URLs are well-specified. Per RFC 3986,
It describes an isolated URI, assuming some protocol that determines
where the URI string begins and ends. It is impossible to unambiguously
extract URLs from text written in human languages. Tom pointed out that
some character sequences in URLs can interfere with Org markup.
> the characters
> allowed in a URL are [A-Za-z0-9\-._~!$&'()*+,;=:@\/?].
1. Surrounding text may use the same characters. I do not think you
would be happy if you got
- <https://orgmode.org/,>
- <https://orgmode.org/worg/org-faq.html)>
from
"(see https://orgmode.org/, https://orgmode.org/worg/org-faq.html)"
just because the "," and ")" characters are allowed in URIs. There are
only heuristics that work more or less acceptably in common cases.
Various implementations have their strengths and weaknesses.
2. Allowed characters are specified at the protocol level. Fortunately,
user interfaces accept most Unicode characters.
Certainly the following URLs are more portable and reliable:
https://el.wikipedia.org/wiki/%CE%9B%CE%AC%CE%BC%CE%B4%CE%B1
https://ja.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC
https://ru.wikipedia.org/wiki/%D0%A1%D1%82%D0%BE%D0%BB%D0%BB%D0%BC%D0%B0%D0%BD,_%D0%A0%D0%B8%D1%87%D0%B0%D1%80%D0%B4_%D0%9C%D1%8D%D1%82%D1%82%D1%8C%D1%8E#%D0%9A%D1%80%D0%B0%D1%82%D0%BA%D0%B0%D1%8F_%D0%B1%D0%B8%D0%BE%D0%B3%D1%80%D0%B0%D1%84%D0%B8%D1%8F
However, the Unicode variants are more informative and readable for humans:
https://el.wikipedia.org/wiki/Λάμδα
https://ja.wikipedia.org/wiki/日本
https://ru.wikipedia.org/wiki/Столлман,_Ричард_Мэттью#Краткая_биография
The same applies to domain names. An extreme case:
https://xn--i-7iq.ws/ - https://i❤️.ws/
Even space characters can be used in the query part. Modern applications
are able to convert them to "+" or "%20" when communicating with HTTP
servers.
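
To make the difference between the readable and the portable form concrete, here is a small sketch in Python (chosen only for illustration; Org itself is written in elisp). It uses the standard urllib.parse functions; the variable names are mine, not something from the thread.

```python
from urllib.parse import quote, quote_plus, unquote

# Readable form, as a user would type or paste it.
readable = "https://ja.wikipedia.org/wiki/日本"

# Portable form: non-ASCII characters are UTF-8 encoded and percent-escaped.
# ":" and "/" are declared safe so the scheme and path separators survive.
portable = quote(readable, safe=":/")
print(portable)  # https://ja.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC

# Decoding restores the readable form.
print(unquote(portable) == readable)  # True

# A space in a query value may be sent as "%20" or as "+".
query = "org mode links"
print(quote(query))       # org%20mode%20links
print(quote_plus(query))  # org+mode+links
```

Both forms name the same resource; which one ends up in a document is a matter of readability, not of correctness.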
> Org mode should
> implement proper URL detection, not asking its users "to give it some
> hints" and using "a kind of heuristics".
Some tools detect www.google.com as a valid URL, others (including Org)
do not. Heuristics can evolve over time. The Org renderer on GitHub can
differ from the original elisp code. Explicit markup is a way to avoid
problems.
A more complicated regexp is harder to maintain. (Explaining to users
that technologies have limitations is a kind of maintenance cost as
well.) A long regexp will have a performance penalty and can still be
fooled.
Example of a link that causes problems even with brackets:
https://lists.gnu.org/archive/html/emacs-orgmode/2020-12/msg00706.html
https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#metricsV2:graph=~(view~'timeSeries~stacked~false~metrics~(~(~'CWAgent~'backup_time~'host~'desktop~'metric_type~'timing))~region~'us-east-1);query=~'*7bCWAgent*2chost*2cmetric_type*7d
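
The trade-off can be sketched in a few lines of Python. This is not Org's implementation, only an illustration under my own assumptions: NAIVE_URL takes the character class quoted above (plus "%" and "#"), and trim_candidate is a hypothetical heuristic that drops trailing punctuation and unbalanced closing parentheses.

```python
import re

# Character class quoted in the thread, extended with "%" and "#"
# so that percent-escapes and fragments are not cut off.
NAIVE_URL = re.compile(r"https?://[A-Za-z0-9\-._~!$&'()*+,;=:@/?%#]+")


def find_naive(text):
    """Return every match of the permissive regexp, with no post-processing."""
    return NAIVE_URL.findall(text)


def trim_candidate(url):
    """Hypothetical heuristic: strip trailing punctuation and unbalanced ')'."""
    while url:
        last = url[-1]
        if last in ".,;:!?'":
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            url = url[:-1]
        else:
            break
    return url


def find_trimmed(text):
    """Apply the trimming heuristic to each naive match."""
    return [trim_candidate(u) for u in find_naive(text)]


text = "(see https://orgmode.org/, https://orgmode.org/worg/org-faq.html)"
print(find_naive(text))    # trailing "," and ")" are swallowed
print(find_trimmed(text))  # the two intended URLs
```

The trimming step handles this sentence, but it is still a guess: a URL that really ends with "." or with an unbalanced ")" would be truncated, which is exactly the kind of case where explicit brackets are the reliable answer.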
On 12/05/2021 23:44, Colin Baxter wrote:
> It might be worthwhile to issue an warning each time a url is written in
> an org file without enclosing brackets < > or [[ ]].
Simple links work well. I am afraid that detecting whether a particular
link is a corner case that needs brackets may require more complicated
logic than the regexp that detects links.
On 13/05/2021 09:21, Tim Cross wrote:
> As this is defined and documented behaviour,
My impression is that the nuances of plain text link recognition are not
documented. Even unit tests exist only in the proposed patch. Actually
I do not think that such details are necessary in the manual.
Fontification provides feedback. As soon as problems are noticed,
explicit markup can be added.
On 13/05/2021 05:23, Tom Gillespie wrote:
> A quick fix is to percent encode the troublesome characters
org-lint does not like percent encoding in links. It is a legacy of the
period when an *extra* pass of percent encoding was used to escape square
brackets and spaces. The current recommendation is to escape only
brackets and backslashes, leaving spaces as is (however,
org-fill-paragraph believes that it has every right to do something with
spaces).
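
As an illustration of that recommendation, here is a Python sketch of the escaping rule as I read it in the Org manual: every square bracket gets a preceding backslash, and backslashes are doubled only when they immediately precede a bracket or end the link. The function names are hypothetical, and the authoritative behaviour is whatever Org's own elisp does.

```python
import re


def escape_link_target(target):
    """Hypothetical helper: backslash-escape a target for use inside [[...]]."""
    def repl(match):
        slashes, bracket = match.group(1), match.group(2)
        # Double the backslashes; prefix the bracket (if any) with one more.
        return slashes * 2 + ("\\" + bracket if bracket else "")

    # A run of backslashes followed by a bracket or by the end of the string.
    return re.sub(r"(\\*)(\[|\]|$)", repl, target)


def bracket_link(target):
    """Wrap an escaped target in double square brackets."""
    return "[[" + escape_link_target(target) + "]]"


# Spaces are left alone, brackets are escaped.
print(bracket_link("https://example.org/a[1] b"))
```

Note that no percent encoding is involved: the target stays readable, and only the characters that would confuse the bracket syntax are touched.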
Personally I do not see why adding angle or double square brackets is a
problem. When approaching the limits, it is better to stay on the safe
side. The particular case that started this topic can be solved, but more
complicated URLs will arise. Just admit that preparing documents
requires some collaboration and assistance from users to make their
intentions more explicit.