emacs-orgmode@gnu.org archives
* URLs with brackets not recognised
From: Colin Baxter @ 2021-05-12  7:32 UTC
  To: emacs-orgmode

Hello,

If a URL link in an org file contains a bracket then the link is not
fully recognised and may be misdirected. For example,

1. emacs -Q <RET>
2. C-x C-f test.org <RET>
3. Insert the URL

https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)

4. Click on the link.
5. Link is not directed correctly.


Best wishes,

Colin.




* Re: URLs with brackets not recognised
From: Nick Savage @ 2021-05-12 10:38 UTC
  To: emacs-orgmode

I can confirm this as a bug.

On 5/12/21 3:32 AM, Colin Baxter wrote:
> Hello,
>
> If a URL link in an org file contains a bracket then the link is not
> fully recognised and may be misdirected. For example,
>
> 1. emacs -Q <RET>
> 2. C-x C-f test.org <RET>
> 3. Insert the URL
>
> https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)
>
> 4. Click on the link.
> 5. Link is not directed correctly.
>
>
> Best wishes,
>
> Colin.
>
>



* Re: URLs with brackets not recognised
From: Maxim Nikulin @ 2021-05-12 11:58 UTC
  To: emacs-orgmode

On 12/05/2021 17:38, Nick Savage wrote:
> I can confirm this as a bug.

> On 5/12/21 3:32 AM, Colin Baxter wrote:
>> https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia) 

I do not think it is a bug. Plain text links detection is a kind of 
heuristics. It will be always possible to win competition with regexp. 
Consider it as a limitation requiring some hints from an intelligent user.

Do the following links have the same problem?

- <https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)>
- [[https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)]]

Support for parentheses exists, but it is not perfect. Punctuation often
should be outside of the URL.
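
To illustrate why parenthesis support is hard to get perfect, here is a
minimal sketch in Python. It is not Org's actual regexp (that lives in
Elisp); the pattern URL_RE, the TRAILING_PUNCTUATION set and the function
name find_plain_urls are assumptions made up for this example. The
heuristic accepts balanced parentheses one level deep and treats trailing
punctuation as part of the sentence rather than the URL.

```python
import re

# Hypothetical pattern, not Org's regexp: after the scheme, accept runs of
# characters that are neither whitespace nor brackets, interleaved with
# balanced "(...)" groups exactly one level deep.
URL_RE = re.compile(r"https?://(?:[^\s()<>]+|\([^\s()<>]*\))+")

# Punctuation assumed to belong to the surrounding sentence when it ends a match.
TRAILING_PUNCTUATION = ".,;:!?"


def find_plain_urls(text):
    """Return URL candidates found in text, without trailing punctuation."""
    return [m.group(0).rstrip(TRAILING_PUNCTUATION) for m in URL_RE.finditer(text)]


wiki = ("https://en.wikipedia.org/wiki/"
        "Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)")
print(find_plain_urls("See " + wiki + "."))
```

With this heuristic the Wikipedia URL is matched in full and the final
"." is left out. A URL with nested parentheses would still be cut short,
so explicit <> or [[]] markup remains the reliable option.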




* Re: URLs with brackets not recognised
From: Colin Baxter @ 2021-05-12 13:32 UTC
  To: Maxim Nikulin; +Cc: emacs-orgmode

>>>>> Maxim Nikulin <manikulin@gmail.com> writes:

    > On 12/05/2021 17:38, Nick Savage wrote:
    >> I can confirm this as a bug.

    >> On 5/12/21 3:32 AM, Colin Baxter wrote:
    >>> https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)

    > I do not think it is a bug. Plain text links detection is a kind
    > of heuristics. It will be always possible to win competition with
    > regexp. Consider it as a limitation requiring some hints from an
    > intelligent user.

    > Do the following links have the same problem?

    > - <https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)>
    > - [[https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)]]

    > Support for parentheses exists, but it is not perfect. Punctuation
    > often should be outside of the URL.

Point taken.

On the other hand, there has been a series of posts on emacs-devel asking
how Emacs could be made more user-friendly. So perhaps it would be useful
not to require a user to think about how to enter a URL.

Best wishes,



* Re: URLs with brackets not recognised
From: Colin Baxter @ 2021-05-12 16:44 UTC
  To: emacs-orgmode


It might be worthwhile to issue a warning each time a URL is written in
an org file without enclosing brackets < > or [[ ]].




* Re: URLs with brackets not recognised
From: Rudolf Adamkovič @ 2021-05-12 20:06 UTC
  To: emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> I do not think it is a bug. Plain text links detection is a kind of
> heuristics. It will be always possible to win competition with regexp. 
> Consider it as a limitation requiring some hints from an intelligent
> user.

I disagree. URLs are well-specified. Per RFC 3986, the characters
allowed in a URL are [A-Za-z0-9\-._~!$&'()*+,;=:@\/?]. Org mode should
implement proper URL detection, not asking its users "to give it some
hints" and using "a kind of heuristics". A string either is a valid URL
per the relevant RFCs or it is not.

-- Rudy





* Re: URLs with brackets not recognised
From: Tom Gillespie @ 2021-05-12 22:23 UTC
  To: emacs-orgmode

A quick fix is to percent encode the troublesome characters, but the
underlying issue is in org-link-any-re, which is defined in
org-link-make-regexps and is what Org uses to find the next link.
Some improvements might be possible for some of the edge cases there,
but a complete solution for bare URLs is not possible due to conflicts
with native Org syntax.
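
The quick fix can be sketched in a few lines of Python. This is only an
illustration under the assumption that the parentheses are the sole
troublesome characters; the helper name encode_parens is made up for the
example and is not part of Org.

```python
def encode_parens(url):
    """Percent-encode only "(" and ")" in an otherwise valid URL."""
    # "(" is 0x28 and ")" is 0x29 in ASCII, hence %28 and %29.
    return url.replace("(", "%28").replace(")", "%29")


url = ("https://en.wikipedia.org/wiki/"
       "Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)")
print(encode_parens(url))
# https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_%28Savannah,_Georgia%29
```

Only the two parentheses are rewritten here, so an already encoded URL is
not encoded a second time.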

Org doesn't handle these cases well because in some cases Org's own
syntax takes priority over URL syntax at the moment. Adding bare URL
syntax as part of Org syntax is something that could be considered.
However, I would suggest against that because it will taint any org
parser in the future by forcing it to implement full url parsing at
arbitrary positions in paragraphs, which adds a lot of complexity. I
also suggest against it because org already has clear ways to
demarcate links using <> and [[]] which are guaranteed to behave
correctly even in cases where org syntax will always take priority.
For example with
https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_[[Savannah,_Georgia]].

> It might be worthwhile to issue a warning each time a URL is written in
> an org file without enclosing brackets < > or [[ ]].

Unfortunately warning on links without < > or [[ ]] will generate
countless annoying false positives for anyone who doesn't hit this
edge case. Maybe a separate function could be added to org lint that
would not run all the time?



* Re: URLs with brackets not recognised
From: Ihor Radchenko @ 2021-05-13  1:25 UTC
  To: Colin Baxter; +Cc: emacs-orgmode

Colin Baxter <m43cap@yandex.com> writes:

> Hello,
>
> If a URL link in an org file contains a bracket then the link is not
> fully recognised and may be misdirected. For example,

This is a known bug [1]. You just need to wait for the patch to be
merged.

[1] https://orgmode.org/list/87v99g4p3m.fsf@localhost/



* Re: URLs with brackets not recognised
From: Tim Cross @ 2021-05-13  2:21 UTC
  To: emacs-orgmode


Rudolf Adamkovič <salutis@me.com> writes:

> Maxim Nikulin <manikulin@gmail.com> writes:
>
>> I do not think it is a bug. Plain text links detection is a kind of
>> heuristics. It will be always possible to win competition with regexp. 
>> Consider it as a limitation requiring some hints from an intelligent
>> user.
>
> I disagree. URLs are well-specified. Per RFC 3986, the characters
> allowed in a URL are [A-Za-z0-9\-._~!$&'()*+,;=:@\/?]. Org mode should
> implement proper URL detection, not asking its users "to give it some
> hints" and using "a kind of heuristics". A string either is a valid URL
> per the relevant RFCs or it is not.
>

Limitations with plain text links are documented in the manual, with an
explanation of why you need to use the org link insertion commands to
create a valid link which escapes the problematic characters.

As this is defined and documented behaviour, I don't see how it can be
considered a bug. You might consider it a frustrating or even
unnecessary limitation, but not a bug.

I'm sure a patch which improves org handling of plain urls would be
considered. However, previous attempts at such enhancements have either
resulted in significant performance impact or unexpected and unwanted
side effects. In short, this is a non-trivial problem to solve. As such
URLs are rare in plain text links and as you can have those links using
org link syntax, it would be very hard to justify a patch which may have
an adverse performance impact for all users.

This change could be considered a feature enhancement, but it is not a
bug.

-- 
Tim Cross



* Re: URLs with brackets not recognised
From: Colin Baxter @ 2021-05-13  5:50 UTC
  To: Ihor Radchenko; +Cc: emacs-orgmode

>>>>> Ihor Radchenko <yantar92@gmail.com> writes:

    > Colin Baxter <m43cap@yandex.com> writes:
    >> Hello,
    >> 
    >> If a URL link in an org file contains a bracket then the link is
    >> not fully recognised and may be misdirected. For example,

    > This is a known bug [1]. You just need to wait for the patch to be
    > merged.

    > [1] https://orgmode.org/list/87v99g4p3m.fsf@localhost/


Thank you. I did check the bug list first, but obviously I missed that.

Best wishes,



* Re: URLs with brackets not recognised
From: Maxim Nikulin @ 2021-05-13 16:30 UTC
  To: emacs-orgmode

On 13/05/2021 03:06, Rudolf Adamkovič wrote:
> Maxim Nikulin writes:
> 
>> I do not think it is a bug. Plain text links detection is a kind of
>> heuristics. It will be always possible to win competition with regexp.
>> Consider it as a limitation requiring some hints from an intelligent
>> user.
> 
> I disagree.

Me too. I disagree with most of the statements in this thread, even with
some arguments meant to support my opinion. The exception is Ihor's
message. I hope a more liberal regexp will not interfere with parsing of
other constructs.

Actually, I think you do not realize that detecting URLs in arbitrary
text is tricky. Maybe you have not noticed corner cases before.

False positives may be even more annoying. At least in the past, "smart"
detection of smileys and emoji in Skype transformed code snippets into
an unreadable mess of "glasses of wine" and other "funny" stuff.

> URLs are well-specified. Per RFC 3986,

It describes an isolated URI, assuming some protocol that determines
where the URI string begins and ends. It is impossible to unambiguously
extract URLs from text written in human languages. Tom pointed out that
some character sequences in URLs can interfere with Org markup.

> the characters
> allowed in a URL are [A-Za-z0-9\-._~!$&'()*+,;=:@\/?].

1. Surrounding text may use the same characters. I do not think you
would be happy if you got

- <https://orgmode.org/,>
- <https://orgmode.org/worg/org-faq.html)>

from

"(see https://orgmode.org/, https://orgmode.org/worg/org-faq.html)"

just because the "," and ")" characters are allowed in URIs. There are
only heuristics that work more or less acceptably in common cases.
Various implementations have their strengths and weaknesses.
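
The effect is easy to reproduce. Below is a minimal sketch in Python,
assuming a detector that does nothing more than match the longest run of
characters from the class quoted above. The names URI_CHARS, NAIVE_URL_RE
and naive_urls are made up for this example, and "%" and "#" are added to
the class as an assumption so that percent-encoded paths and fragments
would match as well.

```python
import re

# The quoted character class, plus "%" and "#" (assumed additions).
URI_CHARS = r"A-Za-z0-9\-._~!$&'()*+,;=:@/?%#"
NAIVE_URL_RE = re.compile(r"https?://[" + URI_CHARS + r"]+")


def naive_urls(text):
    """Return every maximal run of URI characters that follows a scheme."""
    return NAIVE_URL_RE.findall(text)


print(naive_urls("(see https://orgmode.org/, https://orgmode.org/worg/org-faq.html)"))
# ['https://orgmode.org/,', 'https://orgmode.org/worg/org-faq.html)']
```

Both matches are runs of characters that are valid in a URI, yet neither
is the link the writer meant.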

2. Allowed characters are specified at the protocol level. Fortunately,
most Unicode characters are allowed in the user interface.

Certainly the following URLs are more portable and reliable:
https://el.wikipedia.org/wiki/%CE%9B%CE%AC%CE%BC%CE%B4%CE%B1
https://ja.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC
https://ru.wikipedia.org/wiki/%D0%A1%D1%82%D0%BE%D0%BB%D0%BB%D0%BC%D0%B0%D0%BD,_%D0%A0%D0%B8%D1%87%D0%B0%D1%80%D0%B4_%D0%9C%D1%8D%D1%82%D1%82%D1%8C%D1%8E#%D0%9A%D1%80%D0%B0%D1%82%D0%BA%D0%B0%D1%8F_%D0%B1%D0%B8%D0%BE%D0%B3%D1%80%D0%B0%D1%84%D0%B8%D1%8F
However, the Unicode variants are more informative and readable for humans:
https://el.wikipedia.org/wiki/Λάμδα
https://ja.wikipedia.org/wiki/日本
https://ru.wikipedia.org/wiki/Столлман,_Ричард_Мэттью#Краткая_биография
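
The two spellings are interchangeable. As a small sketch using Python's
standard urllib.parse module (the variable names are only illustrative),
the percent-encoded form decodes to the readable one and encodes back to
the same bytes:

```python
from urllib.parse import quote, unquote

encoded = "https://el.wikipedia.org/wiki/%CE%9B%CE%AC%CE%BC%CE%B4%CE%B1"

# unquote() decodes percent-escapes as UTF-8 by default.
readable = unquote(encoded)
print(readable)  # https://el.wikipedia.org/wiki/Λάμδα

# quote() re-encodes non-ASCII characters; ":" and "/" are kept as they are.
round_trip = quote(readable, safe=":/")
print(round_trip == encoded)  # True
```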

The same applies to domain names. An extreme case:
https://xn--i-7iq.ws/ - https://i❤️.ws/

Even space characters can be used in the query part. Modern applications
are able to convert them to "+" or to "%20" for communication with HTTP
servers.
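
Both conventions are available in Python's standard urllib.parse module,
shown here as a short sketch; the query string is an arbitrary example.

```python
from urllib.parse import quote, quote_plus

query = "org mode links"

as_percent = quote(query)    # spaces become "%20"
as_plus = quote_plus(query)  # spaces become "+", the HTML form convention

print(as_percent)  # org%20mode%20links
print(as_plus)     # org+mode+links
```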

> Org mode should
> implement proper URL detection, not asking its users "to give it some
> hints" and using "a kind of heuristics".

Some tools detect www.google.com as a valid URL, others (including Org)
do not. Heuristics can evolve over time. The Org renderer on GitHub can
differ from the original Elisp code. Explicit markup is a way to avoid
problems.

A more complicated regexp is harder to maintain. (Explaining to users
that technologies have limitations is a kind of maintenance cost as
well.) A long regexp will have a performance penalty and can still be
fooled.

Example of a link that causes problems even with brackets:
https://lists.gnu.org/archive/html/emacs-orgmode/2020-12/msg00706.html
https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#metricsV2:graph=~(view~'timeSeries~stacked~false~metrics~(~(~'CWAgent~'backup_time~'host~'desktop~'metric_type~'timing))~region~'us-east-1);query=~'*7bCWAgent*2chost*2cmetric_type*7d

On 12/05/2021 23:44, Colin Baxter wrote:
> It might be worthwhile to issue a warning each time a URL is written in
> an org file without enclosing brackets < > or [[ ]].

Simple links work well. I am afraid that detecting whether a
particular link is a corner case that needs brackets may require more
complicated logic than the regexp detecting links.

On 13/05/2021 09:21, Tim Cross wrote:
 > As this is defined and documented behaviour,

My impression is that the nuances of plain text link recognition are
not documented. Even unit tests exist only in the proposed patch.
Actually, I do not think that such details are necessary in the manual.
Fontification provides feedback. As soon as problems are noticed,
explicit marks can be added.

On 13/05/2021 05:23, Tom Gillespie wrote:
> A quick fix is to percent encode the troublesome characters

org-lint does not like percent encoding in links. This is a legacy of
the period when an *extra* pass of percent encoding was used to escape
square brackets and spaces. The current recommendation is to escape only
brackets and backslashes, leaving spaces as they are (however,
org-fill-paragraph believes that it has full rights to do something with
spaces).

Personally, I do not see why adding angle or double square brackets is a
problem. When approaching limits, it is better to stay on the safe
side. The particular case that started this topic can be solved, but
more complicated URLs will arise. Just admit that preparing documents
requires some collaboration and assistance from users to make intentions
more explicit.




* Re: URLs with brackets not recognised
From: Bastien @ 2021-05-15  9:06 UTC
  To: Colin Baxter; +Cc: emacs-orgmode

Hi Colin,

Colin Baxter <m43cap@yandex.com> writes:

> https://en.wikipedia.org/wiki/Cathedral_Basilica_of_St._John_the_Baptist_(Savannah,_Georgia)

I agree this is annoying.

Patch welcome, with the caveat that fixing this is more difficult than
it seems, as the thread suggests.

-- 
 Bastien



* Re: URLs with brackets not recognised
From: Ihor Radchenko @ 2021-05-15  9:29 UTC
  To: Bastien; +Cc: Colin Baxter, emacs-orgmode

Bastien <bzg@gnu.org> writes:
> Patch welcome, with the caveat that fixing this is more difficult than
> it seems, as the thread suggests.

I have provided the patch in [1]. Are there any issues with it?

[1] https://orgmode.org/list/87v99g4p3m.fsf@localhost/

Best,
Ihor



* Re: URLs with brackets not recognised
From: Ihor Radchenko @ 2021-05-15  9:30 UTC
  To: Bastien; +Cc: Colin Baxter, emacs-orgmode

Oops. Just realised that you already applied it. Sorry for the noise.

Ihor Radchenko <yantar92@gmail.com> writes:

> Bastien <bzg@gnu.org> writes:
>> Patch welcome, with the caveat that fixing this is more difficult than
>> it seems, as the thread suggests.
>
> I have provided the patch in [1]. Are there any issues with it?
>
> [1] https://orgmode.org/list/87v99g4p3m.fsf@localhost/
>
> Best,
> Ihor

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg



* Re: URLs with brackets not recognised
From: Bastien @ 2021-05-15  9:47 UTC
  To: Ihor Radchenko; +Cc: Colin Baxter, emacs-orgmode

Ihor Radchenko <yantar92@gmail.com> writes:

> Oops. Just realised that you already applied it. Sorry for the
> noise.

My bad, I thought your patch was fixing only part of the OP issue,
I see now it fixes all of it.  Thanks!

-- 
 Bastien



Thread overview: 15+ messages
2021-05-12  7:32 URLs with brackets not recognised Colin Baxter
2021-05-12 10:38 ` Nick Savage
2021-05-12 11:58   ` Maxim Nikulin
2021-05-12 13:32     ` Colin Baxter
2021-05-12 16:44       ` Colin Baxter
2021-05-12 20:06     ` Rudolf Adamkovič
2021-05-12 22:23       ` Tom Gillespie
2021-05-13  2:21       ` Tim Cross
2021-05-13 16:30       ` Maxim Nikulin
2021-05-13  1:25 ` Ihor Radchenko
2021-05-13  5:50   ` Colin Baxter
2021-05-15  9:06 ` Bastien
2021-05-15  9:29   ` Ihor Radchenko
2021-05-15  9:30     ` Ihor Radchenko
2021-05-15  9:47       ` Bastien
