emacs-orgmode@gnu.org archives
* [BUG] External unicode links without a description in ox-html
@ 2016-07-20 14:44 Arun Isaac
  2016-07-20 20:04 ` Nicolas Goaziou
  0 siblings, 1 reply; 6+ messages in thread
From: Arun Isaac @ 2016-07-20 14:44 UTC
  To: org mode



For external links without a description, `org-html-link' reuses the
`path' as the inner text of the exported <a> element. However, the
`path' variable contains the URL-encoded version of the `raw-path'. It
would be more appropriate to put the actual unescaped Unicode URL in the
inner text of the <a> element.
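
A minimal sketch of the requested behaviour, written in Python rather than
Emacs Lisp and not taken from ox-html: the percent-encoded form goes into
the href attribute, while the readable Unicode form is used as the link
text. The function name `anchor_for' and the example.org URL are
assumptions made for illustration only.

```python
from html import escape
from urllib.parse import quote, unquote


def anchor_for(raw_url: str) -> str:
    """Return an <a> element for a link that has no description.

    The href carries the percent-encoded URL; the inner text shows the
    unescaped Unicode URL so that a reader sees what was typed.
    """
    # Keep ":" and "/" literal; non-ASCII characters become %XX (UTF-8).
    href = quote(raw_url, safe=":/")
    # Decode again for display and escape HTML-special characters.
    text = escape(unquote(href))
    return f'<a href="{escape(href)}">{text}</a>'


print(anchor_for("https://example.org/café"))
```

The point is only that the href and the visible text are derived
separately, so the encoded form never leaks into what the reader sees.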

Regards,
Arun Isaac.


* Re: [BUG] External unicode links without a description in ox-html
  2016-07-20 14:44 [BUG] External unicode links without a description in ox-html Arun Isaac
@ 2016-07-20 20:04 ` Nicolas Goaziou
  2016-07-23 15:34   ` Michael Brand
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Goaziou @ 2016-07-20 20:04 UTC
  To: Arun Isaac; +Cc: org mode

Hello,

Arun Isaac <arunisaac@systemreboot.net> writes:

> For external links without a description, `org-html-link' reuses the
> `path' as the inner text of the exported <a> element. However, the
> `path' variable contains the URL-encoded version of the `raw-path'. It
> would be more appropriate to put the actual unescaped Unicode URL in the
> inner text of the <a> element.

Done. Thanks.

Regards,

-- 
Nicolas Goaziou


* Re: [BUG] External unicode links without a description in ox-html
  2016-07-20 20:04 ` Nicolas Goaziou
@ 2016-07-23 15:34   ` Michael Brand
  2016-07-25 12:52     ` Nicolas Goaziou
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Brand @ 2016-07-23 15:34 UTC
  To: org mode

Hi Nicolas

There seems to be a related issue with an inconsistency between HTML
and other export formats in using org-link-unescape for the link
_destination_ part: With the Org file

1) https://duckduckgo.com/?q=Org+mode+%252B+Worg
2) https://duckduckgo.com/?q=Org+mode+%2B+Worg

org-open-at-point on link 1) opens a web browser with the search field
filled with "Org mode + Worg" as expected by me. The same happens when
using link 1) of the HTML export. But when exporting to PDF (via
LaTeX), ODT or ASCII (browse-url-at-point) I have to use link 2) to
get the same result. I think one should be able to consistently use
link 1) for all export formats.

Michael


* Re: [BUG] External unicode links without a description in ox-html
  2016-07-23 15:34   ` Michael Brand
@ 2016-07-25 12:52     ` Nicolas Goaziou
  2016-07-25 20:33       ` Michael Brand
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Goaziou @ 2016-07-25 12:52 UTC
  To: Michael Brand; +Cc: org mode

Hello,

Michael Brand <michael.ch.brand@gmail.com> writes:

> There seems to be a related issue with an inconsistency between HTML
> and other export formats in using org-link-unescape for the link
> _destination_ part: With the Org file
>
> 1) https://duckduckgo.com/?q=Org+mode+%252B+Worg
> 2) https://duckduckgo.com/?q=Org+mode+%2B+Worg
>
> org-open-at-point on link 1) opens a web browser with the search field
> filled with "Org mode + Worg" as expected by me.

This looks like an error to me. 

If I type https://duckduckgo.com/?q=Org+mode+%252B+Worg in my browser,
I get

  "Org mode %2B Worg"

as the search string. It should be the same when opening the link from
an Org document. These URIs are /not/ equivalent.
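
The difference can be checked outside Org. The sketch below uses Python's
standard query-string parser as a stand-in for what a web server does with
the query; that substitution is an assumption, and `search_string' is a
hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def search_string(url: str) -> str:
    """Decode the `q' parameter once, as a server normally would."""
    query = urlsplit(url).query
    return parse_qs(query)["q"][0]


link_1 = "https://duckduckgo.com/?q=Org+mode+%252B+Worg"
link_2 = "https://duckduckgo.com/?q=Org+mode+%2B+Worg"

print(search_string(link_1))  # Org mode %2B Worg
print(search_string(link_2))  # Org mode + Worg
```

A single decoding pass turns %25 into "%", so link 1 carries the literal
text "%2B" while link 2 carries "+".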

> The same happens when using link 1) of the HTML export. But when
> exporting to PDF (via LaTeX), ODT or ASCII (browse-url-at-point)
> I have to use link 2) to get the same result. I think one should be
> able to consistently use link 1) for all export formats.

It looks as if we're trying to paper over an Org problem here, which is the
redundant link escaping that happens when calling `org-insert-link' (C-c
C-l).

AFAICT, there are two reasons for Org to escape a link: when the link
contains either "]]" or multiple consecutive spaces. The former
obviously breaks Org link syntax. The latter doesn't survive a call to
`fill-paragraph'.

Alas, Org handles it the wrong way, by using a mechanism that cannot be
properly undone; you cannot possibly know how many times the desired URI
has been encoded, if at all. Moreover, this mechanism isn't user
friendly, i.e., you cannot reasonably ask a user to encode a URI on the
fly when jotting down notes.
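
To illustrate why the encoding cannot be undone reliably, here is a small
Python sketch (not Org code; `encode_n' is a hypothetical helper). The
same stored string can result from encoding "+" once or from not encoding
the literal text "%2B" at all, so the number of decoding passes cannot be
inferred from the string alone.

```python
from urllib.parse import quote, unquote


def encode_n(text: str, times: int) -> str:
    """Percent-encode TEXT the given number of times."""
    for _ in range(times):
        text = quote(text, safe="")
    return text


plus_once = encode_n("+", 1)       # "%2B"
plus_twice = encode_n("+", 2)      # "%252B"
literal = encode_n("%2B", 0)       # "%2B", never encoded

# Two different inputs end up as the same stored string.
print(plus_once == literal)        # True
# One decoding pass only removes one layer of encoding.
print(unquote(plus_twice))         # %2B
```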

I can see two ways out:

1. Do not escape anything. 

   This prevents any link with a description from containing either "]]" or
   multiple spaces, but these requirements are so uncommon we probably
   shouldn't bother.

2. Use a different internal escape mechanism.

   By providing our own simple escape mechanism, e.g., \]\], we can
   solve the issues raised above.
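
As a rough illustration of option 2, the following Python sketch shows one
possible backslash-based scheme. It is an assumption about what such a
mechanism could look like, not a description of anything Org implements;
both function names are hypothetical, and it only covers brackets, not
multiple spaces.

```python
import re


def escape_link_part(text: str) -> str:
    """Put a backslash in front of every backslash and square bracket."""
    return re.sub(r"([\\\[\]])", r"\\\1", text)


def unescape_link_part(text: str) -> str:
    """Remove the backslash in front of any escaped character."""
    return re.sub(r"\\(.)", r"\1", text, flags=re.DOTALL)


raw = "https://duckduckgo.com/?q=[dest]]dest"
stored = escape_link_part(raw)
print(stored)                              # brackets are now \[ and \]
print(unescape_link_part(stored) == raw)   # True
```

Because the backslash itself is escaped, one unescaping pass always
restores the original text, and the stored form never contains a bare
"]]" or "][". Percent signs in the URI are left untouched.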

In any case, Org should not create something like

  https://duckduckgo.com/?q=Org+mode+%252B+Worg

if the real URI is

  https://duckduckgo.com/?q=Org+mode+%2B+Worg

WDYT?


Regards,

-- 
Nicolas Goaziou


* Re: [BUG] External unicode links without a description in ox-html
  2016-07-25 12:52     ` Nicolas Goaziou
@ 2016-07-25 20:33       ` Michael Brand
  2016-07-26 19:05         ` Nicolas Goaziou
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Brand @ 2016-07-25 20:33 UTC
  To: org mode

Hi Nicolas

Your suggestions go quite far and are convincing; I hope I understand
them correctly. If so, it is just a matter of thinking in terms of "[[",
"][" and "]]" instead of the single brackets that I got used to with the
current escaping and unescaping in Org.

On Mon, Jul 25, 2016 at 2:52 PM, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:

> Michael Brand <michael.ch.brand@gmail.com> writes:

>> There seems to be a related issue with an inconsistency between HTML
>> and other export formats in using org-link-unescape for the link
>> _destination_ part: With the Org file
>>
>> 1) https://duckduckgo.com/?q=Org+mode+%252B+Worg
>> 2) https://duckduckgo.com/?q=Org+mode+%2B+Worg
>>
>> org-open-at-point on link 1) opens a web browser with the search field
>> filled with "Org mode + Worg" as expected by me.
>
> This looks like an error to me.
>
> If I type https://duckduckgo.com/?q=Org+mode+%252B+Worg in my browser,
> I get
>
>   "Org mode %2B Worg"
>
> as the search string. It should be the same when opening the link from
> an Org document. These URIs are /not/ equivalent.
>
>> The same happens when using link 1) of the HTML export. But when
>> exporting to PDF (via LaTeX), ODT or ASCII (browse-url-at-point)
>> I have to use link 2) to get the same result. I think one should be
>> able to consistently use link 1) for all export formats.
>
> It looks as if we're trying to paper over an Org problem here, which is the
> redundant link escaping that happens when calling `org-insert-link' (C-c
> C-l).
>
> AFAICT, there are two reasons for Org to escape a link: when the link
> contains either "]]" or multiple consecutive spaces. The former
> obviously breaks Org link syntax. The latter doesn't survive a call to
> `fill-paragraph'.
>
> Alas, Org handles it the wrong way, by using a mechanism that cannot be
> properly undone; you cannot possibly know how many times the desired URI
> has been encoded, if at all. Moreover, this mechanism isn't user
> friendly, i.e., you cannot reasonably ask a user to encode a URI on the
> fly when jotting down notes.

I agree.

> I can see two ways out:
>
> 1. Do not escape anything.
>
>    This prevents any link with a description from containing either "]]" or

... a single bracket at the border or a link destination part to
contain "][" or "]]" or a single bracket at the border or ...

>    multiple spaces, but these requirements are so uncommon we probably
>    shouldn't bother.

I have never had such links, so this does not bother me. If I am right,
these could even be tweaked manually with %20, %5B and %5D to make them
work.

I can't speak for everyone, but I would happily adapt the escaped links
among my existing Org links accordingly if such a change happens in Org.

> 2. Use a different internal escape mechanism.
>
>    By providing our own simple escape mechanism, e.g., \]\], we can
>    solve the issues raised above.

In my opinion not necessary. Can be added later if really needed
anyway.

> In any case, Org should not create something like
>
>   https://duckduckgo.com/?q=Org+mode+%252B+Worg
>
> if the real URI is
>
>   https://duckduckgo.com/?q=Org+mode+%2B+Worg
>
> WDYT?

I agree.

Do I understand correctly that not escaping and unescaping would allow

:   https://duckduckgo.com/?q=[dest]dest
: [[https://duckduckgo.com/?q=[dest]dest]]
: [[https://duckduckgo.com/?q=[dest]dest][desc[desc]desc]]

etc. and even the same with link abbreviations instead of http(s)?

Michael


* Re: [BUG] External unicode links without a description in ox-html
  2016-07-25 20:33       ` Michael Brand
@ 2016-07-26 19:05         ` Nicolas Goaziou
  0 siblings, 0 replies; 6+ messages in thread
From: Nicolas Goaziou @ 2016-07-26 19:05 UTC
  To: Michael Brand; +Cc: org mode

Hello,

Michael Brand <michael.ch.brand@gmail.com> writes:

>>    This prevents any link with a description from containing either "]]" or
>
> ... a single bracket at the border or a link destination part to
> contain "][" or "]]" or a single bracket at the border or ...

Correct.

>>    multiple spaces, but these requirements are so uncommon we probably
>>    shouldn't bother.
>
> I have never had such links, so this does not bother me. If I am right,
> these could even be tweaked manually with %20, %5B and %5D to make them
> work.

Not really, because Org would no longer un-escape the URI. This could
work for links opened with a browser, but not for other types (e.g.,
a target).

> Do I understand correctly that not escaping and unescaping would allow
>
> :   https://duckduckgo.com/?q=[dest]dest

This one is already possible, isn't it? Also, I suggest changing
bracket links, not plain links.

> : [[https://duckduckgo.com/?q=[dest]dest]]
> : [[https://duckduckgo.com/?q=[dest]dest][desc[desc]desc]]

Yes, those would become valid. The following regexp could be used as
a replacement for `org-bracket-link-regexp'

  "\\[\\[[^]+?\\(\\]\\[[^]]\\)?\\]\\]"

or, with groups,

  "\\[\\[\\([^]+?\\)\\(?:\\]\\[\\([^]+?\\)\\)?\\]\\]"

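For readers who want to experiment with the idea, here is a Python
analogue of a non-greedy bracket-link match. It is a sketch under stated
assumptions, not a transcription of the Emacs regexps in this message:
the pattern, the constant `BRACKET_LINK' and the helper
`parse_bracket_link' are all hypothetical.

```python
import re

# "[[" DEST, optionally "][" DESC, then "]]". Both parts are matched
# non-greedily and may contain single brackets.
BRACKET_LINK = re.compile(r"\[\[(.+?)(?:\]\[(.+?))?\]\]", re.DOTALL)


def parse_bracket_link(text: str):
    """Return (destination, description) or None if TEXT has no link."""
    match = BRACKET_LINK.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_bracket_link("[[https://duckduckgo.com/?q=[dest]dest]]"))
print(parse_bracket_link(
    "[[https://duckduckgo.com/?q=[dest]dest][desc[desc]desc]]"))
```

With this pattern the destination ends at the first "][" or "]]", which
is why those sequences could not appear inside it without some form of
escaping.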

Regards,

-- 
Nicolas Goaziou
