emacs-orgmode@gnu.org archives
* [bug] Org link dialog escapes URL spaces incorrectly
From: Jeff Horn @ 2011-11-02  2:02 UTC
  To: Org-mode ml

Org-mode version 7.7 (release_7.7.404.ga17c.dirty)
GNU Emacs 24.0.50.3 (i386-apple-darwin9.8.0, NS apple-appkit-949.54)
of 2011-08-10 on braeburn.aquamacs.org - Aquamacs Distribution 3.xdev

Inserting a link through the link dialog doesn't escape URLs with
spaces properly. Where a space is '%20', org will insert the link as
'%2520'. I'm not certain of URL escape codes, but could org be trying
to escape the % sign? Perhaps a missing slash in a regexp somewhere?

1) Use =C-c C-l= to open the link dialog. Paste a link such as the following.

http://www.dartmouth.edu/~dirwin/Did%20France%20Cause%20the%20Great%20Depression.pdf

2) Use =C-c C-o= to open the link. Be surprised by a 404. Inspect the URL.

,----[ Actual ]
| - [ ] [[http://www.dartmouth.edu/~dirwin/Did%2520France%2520Cause%2520the%2520Great%2520Depression.pdf][Link Description]]
`----

,----[ Expected ]
| - [ ] [[http://www.dartmouth.edu/~dirwin/Did%20France%20Cause%20the%20Great%20Depression.pdf][Link Description]]
`----
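The guess about the % sign can be checked with a short sketch. It uses Python's urllib.parse.quote as a stand-in for a generic percent-escaping step (an assumption: this is not Org's org-link-escape, whose set of escaped characters differs). Escaping a URL that is already percent-escaped encodes the % itself as %25, which turns every %20 into %2520.

```python
from urllib.parse import quote

# The link exactly as pasted at the C-c C-l prompt; it is already escaped.
pasted = ("http://www.dartmouth.edu/~dirwin/"
          "Did%20France%20Cause%20the%20Great%20Depression.pdf")

# Escaping it again encodes each literal '%' as '%25', so every '%20'
# becomes '%2520'. ':' and '/' are kept literal as URL delimiters.
escaped_twice = quote(pasted, safe=":/")
print(escaped_twice)
# http://www.dartmouth.edu/~dirwin/Did%2520France%2520Cause%2520the%2520Great%2520Depression.pdf
```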

--
Jeffrey Horn
http://www.failuretorefrain.com/jeff/


* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: Jeff Horn @ 2011-11-04 15:59 UTC
  To: Org-mode ml

I just pulled the latest org-mode. The problem persists for me, even
though it was reported as fixed in a previous thread. Can anyone
replicate with the latest org?

Org-mode version 7.7 (release_7.7.513.g2a5877)
GNU Emacs 24.0.50.3 (i386-apple-darwin9.8.0, NS apple-appkit-949.54)
of 2011-08-10 on braeburn.aquamacs.org - Aquamacs Distribution 3.xdev



* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: Nick Dokos @ 2011-11-04 16:14 UTC
  To: Jeff Horn; +Cc: nicholas.dokos, Org-mode ml

Jeff Horn <jrhorn424@gmail.com> wrote:

> I just pulled the latest org-mode. The problem persists for me, even
> though it was reported as fixed in a previous thread. Can anyone
> replicate with the latest org?
> 

Yes - just pulled the same version as below - I see the same thing you
see.

Nick

> Org-mode version 7.7 (release_7.7.513.g2a5877)
> GNU Emacs 24.0.50.3 (i386-apple-darwin9.8.0, NS apple-appkit-949.54)
> of 2011-08-10 on braeburn.aquamacs.org - Aquamacs Distribution 3.xdev


* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: Nick Dokos @ 2011-11-04 16:33 UTC
  To: Jeff Horn; +Cc: nicholas.dokos, Org-mode ml

Jeff Horn <jrhorn424@gmail.com> wrote:

> I just pulled the latest org-mode. The problem persists for me, even
> though it was reported as fixed in a previous thread. Can anyone
> replicate with the latest org?
> 

The problem is in org-insert-link: in one case, when we edit the link at point,
the link is unescaped:

,----
|      ...
|      ((org-in-regexp org-bracket-link-regexp 1)
|       ;; We do have a link at point, and we are going to edit it.
|       (setq remove (list (match-beginning 0) (match-end 0)))
|       (setq desc (if (match-end 3) (org-match-string-no-properties 3)))
|       (setq link (read-string "Link: "
| 			      (org-link-unescape
| 			       (org-match-string-no-properties 1)))))
`----

but in the other case, when we just paste the link into the minibuffer,
it is not: check from the (unwind-protect ...) on line 9088 of org.el
to the end of the function. The link that's read from the minibuffer
is passed untouched (well, at least not unescaped) to org-make-link-string on
the very last line of the function, and apparently the latter re-escapes everything.
Try replacing the call

    (org-make-link-string link desc)

on the last line of org-insert-link with

    (org-make-link-string (org-link-unescape link) desc)

I think that'll fix it.
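A rough end-to-end sketch of why unescaping the minibuffer input should help. Assumptions: Python's quote/unquote stand in for org-link-escape/org-link-unescape, and insert_link/open_link are hypothetical helpers, not Org functions. Opening a link is modeled as unescaping the stored text and then escaping it again for the browser.

```python
from urllib.parse import quote, unquote

SAFE = ":/"  # keep scheme and path delimiters literal (an assumption)

def insert_link(user_input: str, unescape_input: bool) -> str:
    """Hypothetical model of org-insert-link: optionally unescape the
    minibuffer input (the proposed fix), then escape it for the buffer."""
    link = unquote(user_input) if unescape_input else user_input
    return quote(link, safe=SAFE)

def open_link(buffer_text: str) -> str:
    """Hypothetical model of opening: unescape the stored text, then
    escape it again before handing it to the browser."""
    return quote(unquote(buffer_text), safe=SAFE)

pasted = "http://example.tld/Did%20France.pdf"

current = open_link(insert_link(pasted, unescape_input=False))
proposed = open_link(insert_link(pasted, unescape_input=True))
print(current)   # http://example.tld/Did%2520France.pdf
print(proposed)  # http://example.tld/Did%20France.pdf
```

In this model only the proposed variant hands the browser the URL that was pasted.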

Nick


* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: Nick Dokos @ 2011-11-04 16:52 UTC
  Cc: Jeff Horn, Org-mode ml, nicholas.dokos

Nick Dokos <nicholas.dokos@hp.com> wrote:

> The problem is in org-insert-link: in one case, when we edit the link at point,
> the link is unescaped:
> 
> ,----
> |      ...
> |      ((org-in-regexp org-bracket-link-regexp 1)
> |       ;; We do have a link at point, and we are going to edit it.
> |       (setq remove (list (match-beginning 0) (match-end 0)))
> |       (setq desc (if (match-end 3) (org-match-string-no-properties 3)))
> |       (setq link (read-string "Link: "
> | 			      (org-link-unescape
> | 			       (org-match-string-no-properties 1)))))
> `----
> 
> but in the other case, when we just paste the link into the minibuffer,
> it is not: check from the (unwind-protect ...) on line 9088 of org.el
> to the end of the function. The link that's read from the minibuffer
> is passed untouched (well, at least not unescaped) to org-make-link-string on
> the very last line of the function, and apparently the latter re-escapes everything.
> Try replacing the call
> 
>     (org-make-link-string link desc)
> 
> on the last line of org-insert-link with
> 
>     (org-make-link-string (org-link-unescape link) desc)
> 
> I think that'll fix it.
> 

It probably does, but that's probably not the best place to do it: it might be
better to do it in the (setq link on line 9090 or thereabouts. Otherwise, in
the *other* case (editing the link at point), we'll end up unescaping twice:
probably not a problem, since unescaping should be idempotent (in contrast to
escaping ;-) ) but why do it twice?

Nick


* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: Nick Dokos @ 2011-11-04 18:19 UTC
  To: Brian Wightman; +Cc: nicholas.dokos, emacs-orgmode

Brian Wightman <MidLifeXis@wightmanfam.org> wrote:

> On Fri, Nov 4, 2011 at 11:52 AM, Nick Dokos <nicholas.dokos@hp.com> wrote:
> > probably not a problem, since unescaping should be idempotent (in contrast to
> > escaping ;-) ) but why do it twice?
> 
> I am not sure I am reading this correctly - I am reading this as
> unescaping a url-encoded string twice should give the same results as
> unescaping the same url-encoded string once.  If that is what you are
> saying, I would disagree.  Taking a string similar to the one provided
> originally:
> 
> Did%2520France%2520Cause%2520the%2520Great%2520Depression.pdf
> 
> would be unescaped as
> 
> Did%20France%20Cause%20the%20Great%20Depression.pdf
> 
> and a second unescape would generate
> 
> Did France Cause the Great Depression.pdf
> 
> Hopefully, I am just reading your statement wrong :-)
> 

Nope - you read it right and it (the statement, not the reading)
is wrong.
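Brian's counterexample can be reproduced with a short sketch, assuming Python's urllib.parse.unquote as a stand-in for org-link-unescape: unescaping is not idempotent, because each pass can expose new %XX sequences.

```python
from urllib.parse import unquote

doubly = "Did%2520France%2520Cause%2520the%2520Great%2520Depression.pdf"

once = unquote(doubly)   # each '%25' decodes to '%', leaving '%20'
twice = unquote(once)    # now each '%20' decodes to a space

print(once)   # Did%20France%20Cause%20the%20Great%20Depression.pdf
print(twice)  # Did France Cause the Great Depression.pdf
```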

But I'm not sure what the implications to the code are: do we
have to unescape repeatedly until there are no more % signs
in there? Then the final call to org-make-link-string would
do one escaping pass and that would be that?

My URL fu is weak. Any help would be appreciated.

Thanks,
Nick


 


* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: Nick Dokos @ 2011-11-04 18:25 UTC
  Cc: Jeff Horn, Org-mode ml, nicholas.dokos

Nick Dokos <nicholas.dokos@hp.com> wrote:

> It probably does, but that's probably not the best place to do it: it might be
> better to do it in the (setq link on line 9090 or thereabouts. Otherwise, in
> the *other* case (editing the link at point), we'll end up unescaping twice:
> probably not a problem, since unescaping should be idempotent (in contrast to
> escaping ;-) ) but why do it twice?
> 

Brian Wightman pointed out to me that the idempotent part of the
statement above is definitely wrong (d'oh). The original URL that Jeff
Horn posted, when unescaped once, would be completely free of % signs.
But if the second (doubly-escaped) form is pasted into the minibuffer,
then unescaping once would not be enough. So I presume the thing to do
is to take the URL and unescape it repeatedly until it loses all
escapes, and then escape it *once* before inserting it in the org
buffer.

Sounds icky, kludgy, dirty. The question is: 1) is it a solution?
and 2) is there a better one?

Nick


* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: David Maus @ 2011-11-05 14:04 UTC
  To: nicholas.dokos; +Cc: Jeff Horn, Org-mode ml


At Fri, 04 Nov 2011 14:25:42 -0400,
Nick Dokos wrote:
>
> Nick Dokos <nicholas.dokos@hp.com> wrote:
>
> > It probably does, but that's probably not the best place to do it: it might be
> > better to do it in the (setq link on line 9090 or thereabouts. Otherwise, in
> > the *other* case (editing the link at point), we'll end up unescaping twice:
> > probably not a problem, since unescaping should be idempotent (in contrast to
> > escaping ;-) ) but why do it twice?
> >
>
> Brian Wightman pointed out to me that the idempotent part of the
> statement above is definitely wrong (d'oh). The original URL that Jeff
> Horn posted, when unescaped once, would be completely free of % signs.
> But if the second (doubly-escaped) form is pasted into the minibuffer,
> then unescaping once would not be enough. So I presume the thing to do
> is to take the URL and unescape it repeatedly until it loses all
> escapes, and then escape it *once* before inserting it in the org
> buffer.
>
> Sounds icky, kludgy, dirty. The question is: 1) is it a solution?
> and 2) is there a better one?

No, this wouldn't be a solution. Consider a link with the sequence
%2525: unescaping until no more escapes (or rather "escapes") remain
will produce a single `%', not %25. Either escape once, or not at all.
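The %2525 case can be sketched as follows, again with Python's quote/unquote standing in for Org's functions (an assumption). The helper unescape_until_fixpoint is hypothetical and models the "unescape until no escapes remain, then escape once" proposal; the link is a made-up example.

```python
from urllib.parse import quote, unquote

def unescape_until_fixpoint(s: str) -> str:
    """Unescape repeatedly until the string stops changing. This terminates
    because every change replaces a three-character %XX with one character."""
    while True:
        t = unquote(s)
        if t == s:
            return s
        s = t

pasted = "report%2525x.pdf"  # hypothetical link whose single unescape is "report%25x.pdf"

collapsed = unescape_until_fixpoint(pasted)  # "report%x.pdf" ('%x.' is not a valid triplet)
escaped_once = quote(collapsed, safe="")      # "report%25x.pdf"
```

The escaped result no longer equals the pasted link: one level of escaping has been lost.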

What roughly happens is this:

1. The user enters a link via `org-insert-link'
2. Org escapes the link and writes it to the buffer
3. The user opens the link with `org-open-at-point'
4. Org reads the link from the buffer and unescapes it
5. The link gets escaped and passed to the consuming application (e.g. a browser)

For steps 2 and 4 it is guaranteed that

(string= link (org-link-unescape (org-link-escape link)))

Thus, the problem is not in 2 or 4, but in 1 or 5.
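The guarantee for steps 2 and 4 can be illustrated with a pair of functions. This sketch assumes Python's quote/unquote as stand-ins for org-link-escape/org-link-unescape; Org's escape table is different, but the round-trip property stated above is assumed to hold for it as well.

```python
from urllib.parse import quote, unquote

def escape(link: str) -> str:
    # safe="" escapes every character outside the unreserved set, '%' included
    return quote(link, safe="")

def unescape(text: str) -> str:
    return unquote(text)

samples = [
    "http://example.tld/foo%40bar",
    "Did France Cause the Great Depression.pdf",
    "Menge_(Mathematik)",
    "[[brackets]] and 100% literal",
]

# Escape-then-unescape always restores the original string...
for link in samples:
    assert unescape(escape(link)) == link

# ...but escaping is not idempotent: a second pass re-encodes the '%'.
assert escape(escape(" ")) == "%2520"
```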

Step 5 assumes that a link entered by the user in step 1 was an
unescaped link and thus needs escaping before it is passed to the
consuming application. If you enter a link in step 1 that is already
escaped, this assumption fails and you'll end up with a double-escaped
link being passed to the consumer.

In other words, the question is: How to decide whether an arbitrary
URL is percent-escaped or not?

Now here's the problem: You can't. Is

"http://example.tld/foo%40bar"

already escaped or not? You can't tell for sure. It depends on the
application you copied the link from.[1]

What we could do in step 5 is... guess. If the (unescaped) link
produced by step 4 does contain characters that need escaping, we
escape the link. Otherwise we don't.
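The guess could look roughly like this. It is a sketch only: maybe_escape is a hypothetical helper, the allowed set is the RFC 3986 unreserved and reserved characters plus %, and Python's quote stands in for Org's escaping.

```python
import string
from urllib.parse import quote

# Characters allowed literally in a URI by RFC 3986 (unreserved + reserved),
# plus '%' so that existing %XX triplets are left untouched.
URI_SAFE = "-._~:/?#[]@!$&'()*+,;=%"
URI_CHARS = frozenset(string.ascii_letters + string.digits + URI_SAFE)

def maybe_escape(link: str) -> str:
    """Escape only if the link contains characters that cannot appear in a
    URI as-is (spaces, non-ASCII, ...); otherwise assume it is already escaped."""
    if all(c in URI_CHARS for c in link):
        return link
    return quote(link, safe=URI_SAFE)
```

It stays a guess: "foo%40bar" is passed through unchanged, which is right only if the % was meant as an escape.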

Not quite sure about the impact of such a change.

Best,
 -- David

[1] Even worse: It may even depend on /how/ or /where/ you copied the
link. E.g. the link to a wikipedia page about set theory is copied as

http://de.wikipedia.org/wiki/Menge_%28Mathematik%29

if C-c'ed from the address bar but copied as

http://de.wikipedia.org/wiki/Menge_(Mathematik)

if C-c'ed via "Copy link to clipboard" at another page (Iceweasel
3.6.23).
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de


* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: Nick Dokos @ 2011-11-05 15:38 UTC
  To: David Maus; +Cc: Jeff Horn, Org-mode ml, nicholas.dokos

What a mess - thanks for the clear explanation. Be that as it may, the
treatment of links in org-insert-link is inconsistent: in one case, the
link is unescaped once, in the other not at all. So I presume that a
single unescaping is desirable (it would e.g. deal with both Wikipedia
links correctly). But there will be corner cases where this will fail
and some manual editing of the link will be necessary, so a warning
should also be added to the documentation, if it's not there already.
Given my other experience with the calendar, I should check the manual
first: a cursory look shows no such warning, so maybe a footnote where
org-insert-link is described would be a good idea.
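The claim about the two Wikipedia links from David's footnote can be checked with a small sketch, assuming Python's unquote as a stand-in for a single org-link-unescape pass: one unescaping maps both copies to the same string.

```python
from urllib.parse import unquote

from_address_bar = "http://de.wikipedia.org/wiki/Menge_%28Mathematik%29"
from_copy_link = "http://de.wikipedia.org/wiki/Menge_(Mathematik)"

# One unescaping pass maps both copies to the same string.
normalized = {unquote(from_address_bar), unquote(from_copy_link)}
print(normalized)  # {'http://de.wikipedia.org/wiki/Menge_(Mathematik)'}
```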

Thoughts?
Nick


* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: David Maus @ 2011-11-06 14:48 UTC
  To: nicholas.dokos; +Cc: David Maus, Jeff Horn, Org-mode ml


At Sat, 05 Nov 2011 11:38:56 -0400,
Nick Dokos wrote:
> What a mess - thanks for the clear explanation. Be that as it may, the
> treatment of links in org-insert-link is inconsistent: in one case, the
> link is unescaped once, in the other not at all.

Hm. I still don't see an inconsistency here: `org-insert-link' reads a
link, `org-open-at-point' calls the target application with exactly
this link but applies percent escaping.

Maybe we could drop this last escaping and push the responsibility of
handling non-conformant URLs to the target application. After all: if
the user enters an unescaped link and the target application can't
handle it, it's the user's fault.

> So I presume that a single unescaping is desirable (it would
> e.g. deal with both wikipedia links correctly). But there will be
> corner cases where this will fail and some manual editing of the
> link will be necessary,

If you mean unescaping the user input in step 1 (`org-insert-link'):
this could work, too. Besides the edge cases, it requires some care with
stored links (via `org-store-link') -- the percent character is a
valid character for certain link types (e.g. message ids).
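A sketch of the care this would need: unescape user input once, but leave link types whose targets may contain a literal % alone. Everything here is an assumption for illustration: normalize_input and RAW_LINK_TYPES are hypothetical, the choice of "gnus" as such a type is only an example, and the message id is made up.

```python
from urllib.parse import unquote

# Hypothetical set of link types whose targets may contain a literal '%'
# (e.g. message ids); the real list would have to come from Org.
RAW_LINK_TYPES = {"gnus"}

def normalize_input(link: str) -> str:
    """Unescape user input once, except for link types where '%' is data."""
    link_type, sep, _ = link.partition(":")
    if sep and link_type in RAW_LINK_TYPES:
        return link
    return unquote(link)

msg_link = "gnus:nnml:mail.misc#<ab%41cd@host.example>"  # hypothetical message id
```

Without the exemption, unquote(msg_link) would silently turn "%41" into "A" and break the message id.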

Personally, I'm starting to like the idea of just dropping the escaping
before calling the target application.

Best,
  -- David

* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: Nick Dokos @ 2011-11-06 22:29 UTC
  To: David Maus; +Cc: Jeff Horn, Org-mode ml, nicholas.dokos

David Maus <dmaus@ictsoc.de> wrote:

> > What a mess - thanks for the clear explanation. Be that as it may, the
> > treatment of links in org-insert-link is inconsistent: in one case, the
> > link is unescaped once, in the other not at all.
> 
> Hm. I still don't see an inconsistency here: `org-insert-link' reads a
> link, `org-open-at-point' calls the target application with exactly
> this link but applies percent escaping.
> 

No, I mean the handling in org-insert-link itself:

line 9048 says

    (setq link (org-extract-attributes
                (org-link-unescape (org-match-string-no-properties 1))))

but further down, on line 9114, the link is not unescaped:

    (setq link
          (let ((org-completion-use-ido nil)
                (org-completion-use-iswitchb nil))
            (org-completing-read
             "Link: "
             (append
              (mapcar (lambda (x) (list (concat x ":")))
                      all-prefixes)
              (mapcar 'car org-stored-links))
             nil nil nil
             'tmphist
             (car (car org-stored-links)))))

The last line of org-insert-link then takes the value of link (unescaped in
one case but not the other), passes it to org-make-link-string and inserts
the result.

That's what I thought was inconsistent.

Nick


* Re: [bug] Org link dialog escapes URL spaces incorrectly
From: David Maus @ 2011-11-13  9:12 UTC
  To: nicholas.dokos; +Cc: David Maus, Jeff Horn, Org-mode ml


First, sorry for the late response; some pressing family matters kept
me busy last week.

At Sun, 06 Nov 2011 17:29:06 -0500,
Nick Dokos wrote:
> No, I mean the handling in org-insert-link itself:
>
> line 9048 says
>
>      	    (setq link (org-extract-attributes
> 			(org-link-unescape (org-match-string-no-properties 1))))

This is step 4: Org reads a bracket link from the buffer and these
links are expected to be escaped (step 2). The unescape restores the
original link.

> but further down, on line 9114 the link is not unescaped:
>
> 	    (setq link
> 		  (let ((org-completion-use-ido nil)
> 			(org-completion-use-iswitchb nil))
> 		    (org-completing-read
> 		     "Link: "
> 		     (append
> 		      (mapcar (lambda (x) (list (concat x ":")))
> 			      all-prefixes)
> 		      (mapcar 'car org-stored-links))
> 		     nil nil nil
> 		     'tmphist
> 		     (car (car org-stored-links)))))

Here we read the user's input and provide completion for stored links
and registered link types. There's no need to unescape the link; we
assume that whatever the user enters here is the link in its normal
form.

In both cases the minibuffer contains a link in its normal form. In
the first case, obtaining the normal form requires unescaping because
whatever the user enters in the minibuffer is escaped once before it
is written to the buffer.
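
A minimal Python sketch of this round trip (steps 2 and 4), assuming a hypothetical escape list of `%`, space and square brackets; it is not Org's actual `org-link-escape`, and the helper names are made up for illustration. Escaping once on write and unescaping once on read hands back exactly what the user typed, even when that input already contains `%20`:

```python
import re

# Hypothetical set of characters escaped when a link is written into a
# [[...]] bracket link; an assumption, not Org's real escape list.
ESCAPED = "% []"


def link_escape(link: str) -> str:
    """Step 2: percent-escape special characters before writing to the buffer."""
    return "".join("%%%02X" % ord(c) if c in ESCAPED else c for c in link)


def link_unescape(text: str) -> str:
    """Step 4: undo exactly one level of escaping when reading from the buffer."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


typed = "http://www.dartmouth.edu/~dirwin/Did%20France.pdf"
stored = link_escape(typed)        # what ends up inside [[...]]
restored = link_unescape(stored)   # what the minibuffer shows when editing

print(stored)
print(restored == typed)
```

Under this assumed escape list, the `%2520` visible inside `[[...]]` in the original bug report is consistent with the storage step alone; it is harmless as long as reading the link undoes it exactly once.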

To make things a little clearer, let's look at the spec
(RFC 3986):

#+begin_quote
2.4.  When to Encode or Decode

   Under normal circumstances, the only time when octets within a URI
   are percent-encoded is during the process of producing the URI from
   its component parts.  This is when an implementation determines which
   of the reserved characters are to be used as subcomponent delimiters
   and which can be safely used as data.  Once produced, a URI is always
   in its percent-encoded form.
#+end_quote
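
To illustrate the RFC's point, here is a Python sketch using the standard library's `urllib.parse.quote` (a stand-in, not anything Org calls): the file name is encoded once while the URI is assembled from its parts, and encoding the finished URI a second time turns every `%20` into `%2520`, which is the shape of the broken link in the original report.

```python
from urllib.parse import quote

# Assemble a URI from its components: the raw file name is data, so it is
# percent-encoded exactly once, at production time.
base = "http://www.dartmouth.edu/~dirwin/"
segment = "Did France Cause the Great Depression.pdf"
uri = base + quote(segment)
print(uri)

# The finished URI is already percent-encoded. Encoding it again treats "%"
# as data, so each "%20" becomes "%2520".
double_encoded = quote(uri, safe=":/~")
print(double_encoded)
```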

In other words: steps 2 and 4 (escaping and unescaping when a link is
written to or read from the buffer) have nothing to do with
percent-encoding of URIs. Org just happens to use the percent-escaping
algorithm to store a link in the buffer and encode special chars
(i.e. brackets).

After reading the specs I'm quite confident that dropping the
percent-escaping of HTTP links in `org-open-at-point' is the right
thing to do: If you paste a URI from the browser or enter it manually
Org does not *produce* the URI and therefore is not responsible for
proper escaping or unescaping.
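
A sketch of the proposed change, again in Python with `urllib.parse` standing in for Org's own escaping functions; `url_for_browser` and its `escape_before_open` flag are hypothetical names modeling what `org-open-at-point` hands to the browser. With the extra escaping the browser receives the double-encoded URL; without it, the link is passed on exactly as the user pasted it:

```python
from urllib.parse import quote, unquote


def url_for_browser(bracket_text: str, escape_before_open: bool) -> str:
    """Hypothetical model of opening an http link read from the buffer."""
    # Step 4: undo the buffer escaping exactly once.
    link = unquote(bracket_text)
    if escape_before_open:
        # Behaviour under discussion: percent-escape again before opening.
        link = quote(link, safe=":/~")
    return link


stored = "http://www.dartmouth.edu/~dirwin/Did%2520France.pdf"
print(url_for_browser(stored, escape_before_open=True))   # double-encoded
print(url_for_browser(stored, escape_before_open=False))  # as pasted
```

The trade-off raised earlier in the thread remains: links that were entered without proper percent-encoding and relied on the extra escaping would no longer be fixed up on the way out.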

I think we should remove the percent-escaping in `org-open-at-point'
after 7.8 is released and see if this works out without breaking too
many links.

Best,
  -- David
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de

[-- Attachment #2: Type: application/pgp-signature, Size: 230 bytes --]

* Re: [bug] Org link dialog escapes URL spaces incorrectly
  2011-11-13  9:12                 ` David Maus
@ 2011-11-13 17:32                   ` Nick Dokos
  2011-12-15  1:30                     ` Jeff Horn
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Dokos @ 2011-11-13 17:32 UTC
  To: David Maus; +Cc: Jeff Horn, Org-mode ml, nicholas.dokos

David Maus <dmaus@ictsoc.de> wrote:

> First, sorry for the late response; some pressing family matters kept
> me busy last week.
> 
> At Sun, 06 Nov 2011 17:29:06 -0500,
> Nick Dokos wrote:
> > No, I mean the handling in org-insert-link itself:
> >
> > line 9048 says
> >
> >      	    (setq link (org-extract-attributes
> > 			(org-link-unescape (org-match-string-no-properties 1))))
> 
> This is step 4: Org reads a bracket link from the buffer and these
> links are expected to be escaped (step 2). The unescape restores the
> original link.
> 
> > but further down, on line 9114 the link is not unescaped:
> >
> > 	    (setq link
> > 		  (let ((org-completion-use-ido nil)
> > 			(org-completion-use-iswitchb nil))
> > 		    (org-completing-read
> > 		     "Link: "
> > 		     (append
> > 		      (mapcar (lambda (x) (list (concat x ":")))
> > 			      all-prefixes)
> > 		      (mapcar 'car org-stored-links))
> > 		     nil nil nil
> > 		     'tmphist
> > 		     (car (car org-stored-links)))))
> Here we read the user's input and provide completion for stored links
> and registered link types. There's no need to unescape the link; we
> assume that whatever the user enters here is the link in its normal
> form.
> 
> In both cases the minibuffer contains a link in its normal form. In
> the first case, obtaining the normal form requires unescaping because
> whatever the user enters in the minibuffer is escaped once before it
> is written to the buffer.
> 
> To make things a little clearer, let's look at the spec
> (RFC 3986):
> 
> #+begin_quote
> 2.4.  When to Encode or Decode
> 
>    Under normal circumstances, the only time when octets within a URI
>    are percent-encoded is during the process of producing the URI from
>    its component parts.  This is when an implementation determines which
>    of the reserved characters are to be used as subcomponent delimiters
>    and which can be safely used as data.  Once produced, a URI is always
>    in its percent-encoded form.
> #+end_quote
> 
> In other words: steps 2 and 4 (escaping and unescaping when a link is
> written to or read from the buffer) have nothing to do with
> percent-encoding of URIs. Org just happens to use the percent-escaping
> algorithm to store a link in the buffer and encode special chars
> (i.e. brackets).
> 

OK, thanks for checking and for the info.

> After reading the specs I'm quite confident that dropping the
> percent-escaping of HTTP links in `org-open-at-point' is the right
> thing to do: If you paste a URI from the browser or enter it manually
> Org does not *produce* the URI and therefore is not responsible for
> proper escaping or unescaping.
> 
> I think we should remove the percent-escaping in `org-open-at-point'
> after 7.8 is released and see if this works out without breaking too
> many links.
> 

Good enough for me. I don't remember whether you supplied a patch for this
(sorry, no time to check atm) but if you did, maybe the OP can test it and
make sure that it does what he expected (or maybe he's done that already -
I can't remember and I'm completely discombobulated right now, even more than
usual).

Thanks,
Nick

* Re: [bug] Org link dialog escapes URL spaces incorrectly
  2011-11-13 17:32                   ` Nick Dokos
@ 2011-12-15  1:30                     ` Jeff Horn
  2011-12-18 17:13                       ` David Maus
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Horn @ 2011-12-15  1:30 UTC
  To: nicholas.dokos; +Cc: David Maus, Org-mode ml

On Sun, Nov 13, 2011 at 12:32, Nick Dokos <nicholas.dokos@hp.com> wrote:
> Good enough for me. I don't remember whether you supplied a patch for this
> (sorry, no time to check atm) but if you did, maybe the OP can test it and
> make sure that it does what he expected (or maybe he's done that already -
> I can't remember and I'm completely discombobulated right now, even more than
> usual).

I was sorry to hear about your family emergency. I hope you are feeling better.

I just tried with the latest source, and the problem still exists for me.

Org-mode version 7.8.02 (release_7.8.02.3.gfb8c)
GNU Emacs 24.0.90.3 (i386-apple-darwin9.8.0, NS apple-appkit-949.54)
of 2011-12-14 on braeburn.aquamacs.org - Aquamacs Distribution 3.xdev

-- 
Jeffrey Horn
http://www.failuretorefrain.com/jeff/

* Re: [bug] Org link dialog escapes URL spaces incorrectly
  2011-12-15  1:30                     ` Jeff Horn
@ 2011-12-18 17:13                       ` David Maus
  0 siblings, 0 replies; 15+ messages in thread
From: David Maus @ 2011-12-18 17:13 UTC
  To: Jeff Horn; +Cc: David Maus, nicholas.dokos, Org-mode ml

[-- Attachment #1: Type: text/plain, Size: 1124 bytes --]

Hi Jeff,

At Wed, 14 Dec 2011 20:30:06 -0500,
Jeff Horn wrote:
>
> On Sun, Nov 13, 2011 at 12:32, Nick Dokos <nicholas.dokos@hp.com> wrote:
> > Good enough for me. I don't remember whether you supplied a patch for this
> > (sorry, no time to check atm) but if you did, maybe the OP can test it and
> > make sure that it does what he expected (or maybe he's done that already -
> > I can't remember and I'm completely discombobulated right now, even more than
> > usual).
>
> I was sorry to hear about your family emergency. I hope you are feeling better.
>
> I just tried with the latest source, and the problem still exists for me.
>
> Org-mode version 7.8.02 (release_7.8.02.3.gfb8c)
> GNU Emacs 24.0.90.3 (i386-apple-darwin9.8.0, NS apple-appkit-949.54)
> of 2011-12-14 on braeburn.aquamacs.org - Aquamacs Distribution 3.xdev

Yes, the problem still persists. I was waiting for 7.8 to be
released. I'll send a patch for the issue via the mailing list this
evening, and I hope this change won't break too many old links.

Best,
  -- David
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de

[-- Attachment #2: Type: application/pgp-signature, Size: 230 bytes --]

end of thread, other threads:[~2011-12-18 17:13 UTC | newest]

Thread overview: 15+ messages
2011-11-02  2:02 [bug] Org link dialog escapes URL spaces incorrectly Jeff Horn
2011-11-04 15:59 ` Jeff Horn
2011-11-04 16:14   ` Nick Dokos
2011-11-04 16:33   ` Nick Dokos
2011-11-04 16:52     ` Nick Dokos
     [not found]       ` <CALbR623mDyhC=LRAQBW9-vJxhsFWR0L+xFOKnE3eq5Wn92y04g@mail.gmail.com>
2011-11-04 18:19         ` Nick Dokos
2011-11-04 18:25       ` Nick Dokos
2011-11-05 14:04         ` David Maus
2011-11-05 15:38           ` Nick Dokos
2011-11-06 14:48             ` David Maus
2011-11-06 22:29               ` Nick Dokos
2011-11-13  9:12                 ` David Maus
2011-11-13 17:32                   ` Nick Dokos
2011-12-15  1:30                     ` Jeff Horn
2011-12-18 17:13                       ` David Maus

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
