emacs-orgmode@gnu.org archives
From: Ihor Radchenko <yantar92@posteo.net>
To: Max Nikulin <manikulin@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /home/mwillcock/.emacs.d/elpa/org-9.7.3/)]
Date: Sat, 22 Jun 2024 13:41:15 +0000
Message-ID: <874j9lhz7o.fsf@localhost>
In-Reply-To: <v516i1$kuv$1@ciao.gmane.io>

Max Nikulin <manikulin@gmail.com> writes:

>> If you can, please do not make such assertions without testing.
>
> I am sorry, I had no intention to offend you. I missed that the removed
> line with the explicit list of punctuation characters was commented out. I
> have tried the regexp used before (a part of v6.34)
>
>      facedba05 2009-12-09 15:13:50 +0100 Carsten Dominik: Use John
>      Gruber's regular expression for URL's
>
> and it seems a trailing dash was allowed.

Hmm. That's a really long time ago, earlier than the Org versions bundled
with the Emacs releases that various distros ship. My reading of "prior to
v9.5" was more like "not too far before v9.5" (and I tested everything
down to the Org mode bundled with Emacs 26).

>>>> +: https://domain/test-
>>>
>>> example.org, example.net, example.com are domains reserved for usage in
>>> examples:
>>> <https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml>
>> 
>> And so?
>
> http://example.org/dash- may be a bit better for docs. (For IPv6 
> addresses the difference should be more noticeable, but I do not 
> remember what range is reserved for usage in examples there.)

I see. I would not mind installing a patch, if you submit it.

>>> I have realized that some Org regexps use the [:punct:] *regexp class* and
>>> others a *syntax class*, see the latex math regexp. I am not sure whether
>>> the discrepancy is intentional.
>> 
>> It is not intentional, but using syntax classes can sometimes be
>> fragile.
>
> Do you mean that the result depends on the current buffer? I do not have
> a strong opinion on which variant should be used.

Not the current buffer. The current syntax table, inherited from
outline-mode. That syntax table is customized by some users, which leads
to the Org parser behaving unexpectedly in some scenarios.

Also, there is the 'syntax-table text property, and I have managed to break
the Org parser in the past by trying to apply the 'syntax-table property to
code blocks in Org mode (I was trying to solve the `forward-sexp' bug people
frequently report).

So, we should generally avoid using syntax tables, so that Org syntax
becomes independent of user customizations in that area. Or, at least,
we should not introduce more syntax class uses when possible.
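
A minimal sketch of the difference described above, assuming a temporary
buffer with its own syntax table. The helper name `my/dash-matches-p' is
hypothetical and is not part of Org. It checks whether a regexp matches "-"
after the dash has been given a particular syntax class. For an ASCII
character such as the dash, the [:punct:] regexp class is expected not to
depend on the syntax table, while the \s. syntax class does.

```elisp
(defun my/dash-matches-p (regexp dash-syntax)
  "Return t if REGEXP matches \"-\" when the dash has DASH-SYNTAX.
DASH-SYNTAX is a syntax descriptor such as \".\" (punctuation) or
\"w\" (word constituent).  Hypothetical helper for illustration only."
  (with-temp-buffer
    ;; Fresh table inheriting from the standard syntax table, so the
    ;; experiment does not touch any real mode's table.
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?- dash-syntax table)
      (set-syntax-table table)
      ;; `string-match-p' consults the current buffer's syntax table.
      (and (string-match-p regexp "-") t))))

(list (my/dash-matches-p "\\s." ".")          ; syntax class, dash is punctuation
      (my/dash-matches-p "\\s." "w")          ; syntax class, dash is a word char
      (my/dash-matches-p "[[:punct:]]" ".")   ; regexp class, dash is punctuation
      (my/dash-matches-p "[[:punct:]]" "w"))  ; regexp class, dash is a word char
```

Only the second call should be affected by the customization. That is the
kind of user-dependent result that a regexp built on syntax classes can
produce.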

> ... What I do not like is that in the
> case of $n$-th the character after the second "$" is tested against a
> syntax class, while a regexp class is used for links. This subtle
> difference is almost certainly ignored in alternative implementations of
> the parser. However I am not sure what characters besides dash and
> apostrophe are affected and whether it depends on locale.

These kinds of inconsistencies should be resolved eventually. We should
rely not on the locale but on UTF syntax classes, and document that in the
org-syntax document.

>>> 09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re:
>>> Improve regexp heuristics
> [...]
>>>       (link http://example.org/a<b)
> [...]
>> It is a heuristic. We cannot be 100% right. So, it is what it is.
>
> From my point of view it is at least close to a regression. I do not
> have any argument against http://example.org/a<b>, but the regexp should
> not match the whole "http://example.org/a<b)"

No bug reports, so your point is rather theoretical.

I do not mind improving the regexp, of course, but I am afraid that we
will need PEG or `org-element--parse-paired-brackets' to match paired
brackets accurately. And that would be a breaking change: we would have to
drop the regexp variable.

>>> I would consider [:space:] or \s-.
>> 
>> Do you mean "[^[:punct:][:space:]\t\n]"?
>
> I believe it might be an improvement ([:space:] includes \t).

https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=6cada29c0
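
For reference, a minimal sketch of how the character classes discussed
above behave. It is not taken from the Org sources, and the function name
`my/space-class-demo' is hypothetical. It assumes a fresh syntax table that
inherits from the standard one, because in Emacs regexps [:space:] matches
characters with whitespace syntax and therefore consults the current syntax
table.

```elisp
(defun my/space-class-demo ()
  "Try the character classes discussed above against sample characters.
Hypothetical helper for illustration only."
  (with-temp-buffer
    ;; Assumption: a fresh table inheriting from the standard syntax
    ;; table, so user customizations do not affect the result.
    (set-syntax-table (make-syntax-table))
    (list
     ;; Does [:space:] cover TAB?  A match returns the match position.
     (string-match-p "[[:space:]]" "\t")
     ;; A letter is neither punctuation nor whitespace.
     (string-match-p "[^[:punct:][:space:]]" "a")
     ;; A dash is punctuation, so the negated class rejects it.
     (string-match-p "[^[:punct:][:space:]]" "-"))))

(my/space-class-demo)
```

With the standard syntax table, TAB has whitespace syntax, so listing \t
next to [:space:] in the bracket expression is redundant.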

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


Thread overview: 6+ messages
2024-06-13 13:32 [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /home/mwillcock/.emacs.d/elpa/org-9.7.3/)] Morgan Willcock
2024-06-14 14:04 ` Ihor Radchenko
2024-06-16 15:43   ` Max Nikulin
2024-06-16 15:59     ` Ihor Radchenko
2024-06-20 12:15       ` Max Nikulin
2024-06-22 13:41         ` Ihor Radchenko [this message]
