emacs-orgmode@gnu.org archives
* [BUG] Exporting italic link with bang inside to html fails to parse the link [9.5.2 (N/A @ /gnu/store/89yvbijwnvsbpa5h33mvbgh1gy9w30n2-emacs-org-9.5.2/share/emacs/site-lisp/org-9.5.2/)]
@ 2022-04-05 10:18 Dr. Arne Babenhauserheide
  2022-04-30  9:37 ` Ihor Radchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Dr. Arne Babenhauserheide @ 2022-04-05 10:18 UTC (permalink / raw)
  To: emacs-orgmode


To reproduce:

- create an org-file with the following content:
/Foo [[https://taz.de/!5843294/][link with a bang]]/
- M-x org-html-export-to-html

Expected: The HTML-file contains an italic link named "link with a bang".

Actual: The HTML-file contains a broken link with only the domain:
<i>Foo [[<a href="https://taz.de">https://taz.de</a></i>!5843294/][link with a bang]]/</p>

Best wishes,
Arne

Emacs  : GNU Emacs 27.2 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.30, cairo version 1.16.0)
Package: Org mode version 9.5.2 (N/A @ /gnu/store/89yvbijwnvsbpa5h33mvbgh1gy9w30n2-emacs-org-9.5.2/share/emacs/site-lisp/org-9.5.2/)
-- 
To be unpolitical
means to be political
without noticing it.
draketo.de


* Re: [BUG] Exporting italic link with bang inside to html fails to parse the link [9.5.2 (N/A @ /gnu/store/89yvbijwnvsbpa5h33mvbgh1gy9w30n2-emacs-org-9.5.2/share/emacs/site-lisp/org-9.5.2/)]
  2022-04-05 10:18 [BUG] Exporting italic link with bang inside to html fails to parse the link [9.5.2 (N/A @ /gnu/store/89yvbijwnvsbpa5h33mvbgh1gy9w30n2-emacs-org-9.5.2/share/emacs/site-lisp/org-9.5.2/)] Dr. Arne Babenhauserheide
@ 2022-04-30  9:37 ` Ihor Radchenko
  2022-04-30 11:47   ` Max Nikulin
  0 siblings, 1 reply; 7+ messages in thread
From: Ihor Radchenko @ 2022-04-30  9:37 UTC (permalink / raw)
  To: Dr. Arne Babenhauserheide; +Cc: emacs-orgmode

"Dr. Arne Babenhauserheide" <arne_bab@web.de> writes:

> To reproduce:
>
> - create an org-file with the following content:
> /Foo [[https://taz.de/!5843294/][link with a bang]]/
> - M-x org-html-export-to-html
>
> Expected: The HTML-file contains an italic link named "link with a bang".
>
> Actual: The HTML-file contains a broken link with only the domain:
> <i>Foo [[<a href="https://taz.de">https://taz.de</a></i>!5843294/][link with a bang]]/</p>

Confirmed.
But with a caveat.

Despite intuition, your example can be treated in two ways:
1. <begin italic>/Foo [[https://taz.de<end italic>/!5843294/][link with a bang]]/
2. <begin italic>/Foo <begin link>[[...]]<end link>/<end italic>

Org mode always chooses the first case because it prioritises markup
that starts earliest and ends earliest.
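
The first reading can be reproduced with a small model. The Python
sketch below is not Org's parser. It assumes a simplified rule: an
italic span opens at "/" followed by a non-space character and closes
at the first "/" that is preceded by a non-space character and followed
by whitespace, a punctuation character, or the end of the text. The
names POST, ITALIC and first_italic are hypothetical.

```python
import re

# Simplified stand-in for Org's emphasis rule (an assumption for illustration,
# not the regexp Org actually uses): the closing "/" is the FIRST one that is
# preceded by a non-space character and followed by whitespace, one of a few
# punctuation characters, or the end of the text.
POST = r"""[\s.,:!?;'")}\[-]"""
ITALIC = re.compile(r"/(\S(?:.*?\S)?)/(?=" + POST + r"|$)")


def first_italic(text):
    """Return the body of the first italic span in TEXT, or None."""
    match = ITALIC.search(text)
    return match.group(1) if match else None


report = "/Foo [[https://taz.de/!5843294/][link with a bang]]/"
print(first_italic(report))  # Foo [[https://taz.de
```

Under this rule the "/" before "!" is the first valid closing marker,
so the span ends inside the link target, as in the exported HTML above.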

To keep Org mode from treating the internal /! as the end of italics, you can
insert a zero-width space before "/": <zero width space>/!
So, what you see is not exactly a bug, but non-intuitive behaviour of
Org. (I do not like it, but we have reasons why Org parser behaves this
way).

On the other hand, the example link could be inserted using
org-insert-link.

If one does the following:
1. emacs -Q /tmp/test.org
2. Type "/Begin italic "
3. C-c C-l https://taz.de/!5843294/ <RET> <test> <RET>
4. The inserted text is not a link because the problematic /! is not
   fixed automatically.

I consider the above to be at least a bug in org-insert-link.

Best,
Ihor




* Re: [BUG] Exporting italic link with bang inside to html fails to parse the link [9.5.2 (N/A @ /gnu/store/89yvbijwnvsbpa5h33mvbgh1gy9w30n2-emacs-org-9.5.2/share/emacs/site-lisp/org-9.5.2/)]
  2022-04-30  9:37 ` Ihor Radchenko
@ 2022-04-30 11:47   ` Max Nikulin
  2022-04-30 12:34     ` Ihor Radchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Max Nikulin @ 2022-04-30 11:47 UTC (permalink / raw)
  To: Ihor Radchenko, Dr. Arne Babenhauserheide; +Cc: emacs-orgmode

On 30/04/2022 16:37, Ihor Radchenko wrote:
> "Dr. Arne Babenhauserheide" <arne_bab@web.de> writes:
> 
>> To reproduce:
>>
>> - create an org-file with the following content:
>> /Foo [[https://taz.de/!5843294/][link with a bang]]/
>> - M-x org-html-export-to-html
>>
>> Expected: The HTML-file contains an italic link named "link with a bang".
>>
>> Actual: The HTML-file contains a broken link with only the domain:
>> <i>Foo [[<a href="https://taz.de">https://taz.de</a></i>!5843294/][link with a bang]]/</p>
> 
> Confirmed.

Nicolas clearly expressed that it is a feature of the Org parser though.

Moreover, this is a duplicate of another item already tracked on 
updates.orgmode.org:

2021-09-03  5:17 Dr. Arne Babenhauserheide Bug: PDF Export of Link fails
https://list.orgmode.org/87pmtqp79s.fsf@web.de/T/#u

The following markup should be used instead:

     /Foo/ [[https://taz.de/!5843294/][/link with a bang/]]

> To keep Org mode from treating the internal /! as the end of italics, you can
> insert a zero-width space before "/": <zero width space>/!

Unfortunately, it requires an additional export filter to remove zero
width spaces.
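
The transformation such a filter would perform is small. A minimal
Python sketch of the text operation only, assuming every U+200B should
be dropped; the function name is hypothetical, and the hook that would
call it during export (an Emacs Lisp export filter in Org) is not shown.

```python
ZERO_WIDTH_SPACE = "\u200b"


def strip_zero_width_spaces(exported):
    """Drop every U+200B from text that is about to be written out."""
    return exported.replace(ZERO_WIDTH_SPACE, "")


# A link target escaped with a zero width space before "/!".
escaped = "https://taz.de" + ZERO_WIDTH_SPACE + "/!5843294/"
print(strip_zero_width_spaces(escaped))  # https://taz.de/!5843294/
```

This removes all zero width spaces without exception, so it offers no
way to keep one that was placed on purpose.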

> On the other hand, the example link could be inserted using
> org-insert-link.
> 
> If one does the following:
> 1. emacs -Q /tmp/test.org
> 2. Type "/Begin italic "
> 3. C-c C-l https://taz.de/!5843294/ <RET> <test> <RET>
> 4. The inserted text is not a link because the problematic /! is not
>     fixed automatically.
> 
> I consider the above to be at least a bug in org-insert-link.

Timothy suggested fixing `org-insert-link' somehow in that thread.

P.S. Actually, I like the behavior of pandoc:

    printf '%s' '/Foo [[https://taz.de/!5843294/][link with a bang]]/' |
        pandoc -f org -t html

    <p><em>Foo <a href="https://taz.de/!5843294/">link with a bang</a></em></p>

Juan Manuel Macías to emacs-orgmode. Pandoc and nested emphases. Fri, 18
Feb 2022 00:47:18 +0000. https://list.orgmode.org/87sfshgfvt.fsf@posteo.net



* Re: [BUG] Exporting italic link with bang inside to html fails to parse the link [9.5.2 (N/A @ /gnu/store/89yvbijwnvsbpa5h33mvbgh1gy9w30n2-emacs-org-9.5.2/share/emacs/site-lisp/org-9.5.2/)]
  2022-04-30 11:47   ` Max Nikulin
@ 2022-04-30 12:34     ` Ihor Radchenko
  2022-04-30 14:25       ` Max Nikulin
  0 siblings, 1 reply; 7+ messages in thread
From: Ihor Radchenko @ 2022-04-30 12:34 UTC (permalink / raw)
  To: Max Nikulin; +Cc: Dr. Arne Babenhauserheide, emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>> Confirmed.
>
> Nicolas clearly expressed that it is a feature of the Org parser though.
>
> Moreover, this is a duplicate of another item already tracked on 
> updates.orgmode.org:
>
> 2021-09-03  5:17 Dr. Arne Babenhauserheide Bug: PDF Export of Link fails
> https://list.orgmode.org/87pmtqp79s.fsf@web.de/T/#u
>
> The following markup should be used instead:
>
>      /Foo/ [[https://taz.de/!5843294/][/link with a bang/]]
> ...
>> I consider the above to be at least a bug in org-insert-link.
>
> Timothy suggested fixing `org-insert-link' somehow in that thread.

Yeah. I recall a number of bug reports related to this behaviour.
Though I wanted to focus on org-insert-link here.
We can expect users to change the markup if they type a problematic link
manually, but not when specialised functions like org-insert-link are
used. In this scenario, org-insert-link should take care about not
messing up the existing markup.

>> To keep Org mode from treating the internal /! as the end of italics, you can
>> insert a zero-width space before "/": <zero width space>/!
>
> Unfortunately, it requires an additional export filter to remove zero
> width spaces.

Yeah. Right. It should even be an easy patch, which would be welcome :)

> P.S. Actually I like behavior of pandoc
>
>     printf '%s' '/Foo [[https://taz.de/!5843294/][link with a bang]]/' |
>         pandoc -f org -t html
>
>     <p><em>Foo <a href="https://taz.de/!5843294/">link with a bang</a></em></p>

I also like such behaviour, but it would require multi-pass parsing or
parser tree branching. Nicolas opposed it.

Best,
Ihor



* Re: [BUG] Exporting italic link with bang inside to html fails to parse the link [9.5.2 (N/A @ /gnu/store/89yvbijwnvsbpa5h33mvbgh1gy9w30n2-emacs-org-9.5.2/share/emacs/site-lisp/org-9.5.2/)]
  2022-04-30 12:34     ` Ihor Radchenko
@ 2022-04-30 14:25       ` Max Nikulin
  2022-05-01  3:27         ` Ihor Radchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Max Nikulin @ 2022-04-30 14:25 UTC (permalink / raw)
  To: emacs-orgmode

On 30/04/2022 19:34, Ihor Radchenko wrote:
> Max Nikulin writes:
> 
>> 2021-09-03  5:17 Dr. Arne Babenhauserheide Bug: PDF Export of Link fails
>> https://list.orgmode.org/87pmtqp79s.fsf@web.de/T/#u
>>
>> Timothy suggested fixing `org-insert-link' somehow in that thread.
> 
> Yeah. I recall a number of bug reports related to this behaviour.
> Though I wanted to focus on org-insert-link here.

Then the older bug may be cancelled as a duplicate.

> We can expect users to change the markup if they type a problematic link
> manually, but not when specialised functions like org-insert-link are
> used. In this scenario, org-insert-link should take care about not
> messing up the existing markup.
> 
>>> To keep Org mode from treating the internal /! as the end of italics, you can
>>> insert a zero-width space before "/": <zero width space>/!
>>
>> Unfortunately, it requires an additional export filter to remove zero
>> width spaces.
> 
> Yeah. Right. It should even be an easy patch, which would be welcome :)

I meant a custom user filter. I consider zero width spaces as the last 
resort. Nicolas considered making zero width spaces an official part of 
syntax stripped during export and a way to preserve some of them. In the 
case of links I still prefer breaking emphasis at the link borders. 
`org-insert-link' may check after inserting the markup if it is parsed 
as a link and add more markers if necessary. Unfortunately it is not 
always possible. In the following case

     /inter[[https://orgmode.org/?oops=1][word]]link/

additional markers would not work (unless augmented by zero width 
spaces, but at least they will be outside of link target)

     /inter/[[https://orgmode.org/?oops=1][/word/]]/link/

However, it is mostly a decent workaround, since links are usually
surrounded by spaces. At one point I was surprised that emphasis
markers are not recognized at the borders of export snippets and that
they are active only at one side of links.
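
The rewrite described above, breaking emphasis at the link borders, can
be sketched as a plain string transformation. This Python sketch is not
`org-insert-link'; it assumes the whole input is a single italic span
containing one bracket link with a description, and it does not check
whether the result is parsed as intended. The names LINK, wrap and
break_italic_at_link are hypothetical.

```python
import re

# Matches an Org bracket link with a description: [[target][description]].
# Simplification: escaped brackets inside the link are not handled.
LINK = re.compile(r"\[\[([^\]]+)\]\[([^\]]+)\]\]")


def wrap(fragment):
    """Put "/" markers around FRAGMENT, keeping outer whitespace outside."""
    core = fragment.strip()
    if not core:
        return fragment
    lead = fragment[: len(fragment) - len(fragment.lstrip())]
    trail = fragment[len(fragment.rstrip()):]
    return f"{lead}/{core}/{trail}"


def break_italic_at_link(text):
    """Rewrite "/before [[target][desc]] after/" to stop emphasis at the link."""
    if len(text) < 3 or not (text.startswith("/") and text.endswith("/")):
        return text
    inner = text[1:-1]
    match = LINK.search(inner)
    if match is None:
        return text
    target, description = match.groups()
    before, after = inner[: match.start()], inner[match.end():]
    return f"{wrap(before)}[[{target}][/{description}/]]{wrap(after)}"


print(break_italic_at_link("/Foo [[https://taz.de/!5843294/][link with a bang]]/"))
# /Foo/ [[https://taz.de/!5843294/][/link with a bang/]]
```

For the intra-word example the same function produces
/inter/[[https://orgmode.org/?oops=1][/word/]]/link/, which is the form
that does not work without zero width spaces.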

I am afraid that zero width spaces in link targets may confuse users,
since in most cases URLs (e.g. http:) may be pasted into external
applications as is.




* Re: [BUG] Exporting italic link with bang inside to html fails to parse the link [9.5.2 (N/A @ /gnu/store/89yvbijwnvsbpa5h33mvbgh1gy9w30n2-emacs-org-9.5.2/share/emacs/site-lisp/org-9.5.2/)]
  2022-04-30 14:25       ` Max Nikulin
@ 2022-05-01  3:27         ` Ihor Radchenko
  2022-05-02 12:32           ` Max Nikulin
  0 siblings, 1 reply; 7+ messages in thread
From: Ihor Radchenko @ 2022-05-01  3:27 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>>> 2021-09-03  5:17 Dr. Arne Babenhauserheide Bug: PDF Export of Link fails
>>> https://list.orgmode.org/87pmtqp79s.fsf@web.de/T/#u
>>>
>>> Timothy suggested fixing `org-insert-link' somehow in that thread.
>> 
>> Yeah. I recall a number of bug reports related to this behaviour.
>> Though I wanted to focus on org-insert-link here.
>
> Then the older bug may be cancelled as a duplicate.

Not sure. Even a fix to org-insert-link would not solve the problem with
unexpected export if the link is typed in manually. So, I'd rather keep
both the reports for the time being.

Or someone may go through all the related bugs and create a single giant
discussion to avoid scattering things around. In my notes, I have at
least 6 discussions related to edge cases of Org markup.

>>> Unfortunately, it requires an additional export filter to remove zero
>>> width spaces.
>> 
>> Yeah. Right. It should even be an easy patch, which would be welcome :)
>
> I meant a custom user filter. I consider zero width spaces as the last 
> resort. Nicolas considered making zero width spaces an official part of 
> syntax stripped during export and a way to preserve some of them.

I think it is already kind of official. At least, we directly suggest
using zero width spaces in
https://orgmode.org/manual/Escape-Character.html#Escape-Character

The other thing is that ox.el does not do anything about zero width
spaces.

> In the case of links I still prefer breaking emphasis at the link
> borders. `org-insert-link' may check after inserting the markup if it
> is parsed as a link and add more markers if necessary.
> ...
>      /inter/[[https://orgmode.org/?oops=1][/word/]]/link/

I do not like this idea. It is fine when inserting a link into existing
emphasis, but what if an emphasis is applied around link later? We would
also need to update org-emphasize and still have an issue because many
users simply type the emphasis markers manually.

> I am afraid that zero width spaces in link targets may confuse users,
> since in most cases URLs (e.g. http:) may be pasted into external
> applications as is.

We already escape '\', '[', and ']' in links. Zero width spaces will not
make things much different. Of course, org-link-escape and
org-link-unescape will need to be updated.

Note that even copying URLs directly can be worked around using
filter-buffer-substring-function.

Best,
Ihor




* Re: [BUG] Exporting italic link with bang inside to html fails to parse the link [9.5.2 (N/A @ /gnu/store/89yvbijwnvsbpa5h33mvbgh1gy9w30n2-emacs-org-9.5.2/share/emacs/site-lisp/org-9.5.2/)]
  2022-05-01  3:27         ` Ihor Radchenko
@ 2022-05-02 12:32           ` Max Nikulin
  0 siblings, 0 replies; 7+ messages in thread
From: Max Nikulin @ 2022-05-02 12:32 UTC (permalink / raw)
  To: emacs-orgmode

On 01/05/2022 10:27, Ihor Radchenko wrote:
> Max Nikulin writes:
> 
>>>> 2021-09-03  5:17 Dr. Arne Babenhauserheide Bug: PDF Export of Link fails
>>>> https://list.orgmode.org/87pmtqp79s.fsf@web.de/T/#u
>>
>> Then the older bug may be cancelled as a duplicate.
> 
> Not sure. Even a fix to org-insert-link would not solve the problem with
> unexpected export if the link is typed in manually. So, I'd rather keep
> both the reports for the time being.

I will not insist any further. My point was: the same reporter, the same
case of punctuation after a slash in the link target, the same idea to
make `org-insert-link' smarter.

> Or someone may go through all the related bugs and create a single giant
> discussion to avoid scattering things around. In my notes, I have at
> least 6 discussions related to edge cases of Org markup.

I have some notes as well. Though I think it should be either a FAQ
entry or a separate document describing limitations of the parser (and
a test data set for the parser).

> I think it is already kind of official. At least, we directly suggest
> using zero width spaces in
> https://orgmode.org/manual/Escape-Character.html#Escape-Character

Things are more complicated. Without a filter (which is not mentioned
there) zero width spaces may cause undesired line breaks (allowing a
line break is the primary purpose of the zero width space).
Fortunately, pdfLaTeX ignores them.

Tom Gillespie (On zero width spaces and Org syntax. Fri, 3 Dec 2021
20:04:28 -0800.
https://list.orgmode.org/CA+G3_PM4cxHa8bU+3QG541UiOauLNAQFZQu-+UKczx3itOeTHg@mail.gmail.com)
suggested the word joiner U+2060, but this character is not a space for
regular expressions. I experimented a bit, but I cannot provide a
summary yet; my notes are at an early draft stage. The "Escape
Character" section should be expanded to discuss more use cases.

>> In the case of links I still prefer breaking emphasis at the link
>> borders. `org-insert-link' may check after inserting the markup if it
>> is parsed as a link and add more markers if necessary.
>> ...
>>       /inter/[[https://orgmode.org/?oops=1][/word/]]/link/
> 
> I do not like this idea. It is fine when inserting a link into existing
> emphasis, but what if an emphasis is applied around link later? We would
> also need to update org-emphasize and still have an issue because many
> users simply type the emphasis markers manually.

Emphasis around other inline objects can easily be broken anyway. Try to
make the whole string bold:

     begin =middle* verbatim= end

It may be useful to add a checker to `org-lint' that issues warnings for 
confusing link targets.
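
As a rough illustration of what such a checker could look for, here is
a Python sketch (not `org-lint' code). It assumes a heuristic of its
own: a link target is "confusing" when it contains an emphasis marker
character that is preceded by a non-space character and followed by
whitespace or punctuation, so that it could be taken for a closing
marker. The names LINK, RISKY and confusing_targets are hypothetical.

```python
import re

# Bracket link with an optional description; group 1 is the target.
# Simplification: escaped brackets inside the link are not handled.
LINK = re.compile(r"\[\[([^\]]+)\](?:\[[^\]]+\])?\]")

# Assumed heuristic, not Org's real rule: a marker character preceded by a
# non-space character and followed by whitespace or punctuation.
RISKY = re.compile(r"""(?<=\S)[*/_=~+](?=[\s.,:!?;'")}\[-])""")


def confusing_targets(text):
    """Return the link targets in TEXT that contain a possible closing marker."""
    return [target for target in LINK.findall(text) if RISKY.search(target)]


sample = (
    "[[https://taz.de/!5843294/][link with a bang]] and "
    "[[https://orgmode.org/manual/Feedback.html][the manual]]"
)
print(confusing_targets(sample))  # ['https://taz.de/!5843294/']
```

The heuristic flags targets regardless of whether they currently sit
inside emphasis, since markers may be added around the link later.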

I believe that zero width space does not belong to "plain text markup"
since it is invisible (at least by default). I see that printable ASCII
characters are already in use, but I still think that U+200B should be
used as rarely as possible.

You are aware of my opinion now and I do not need more. You are free to 
ignore it since I can not offer anything better.




