From: Nick Dokos <nicholas.dokos@hp.com>
To: David Maus <dmaus@ictsoc.de>
Cc: Jeff Horn <jrhorn424@gmail.com>,
Org-mode ml <emacs-orgmode@gnu.org>,
nicholas.dokos@hp.com
Subject: Re: [bug] Org link dialog escapes URL spaces incorrectly
Date: Sat, 05 Nov 2011 11:38:56 -0400
Message-ID: <23704.1320507536@alphaville.dokosmarshall.org>
In-Reply-To: Message from David Maus <dmaus@ictsoc.de> of "Sat, 05 Nov 2011 15:04:32 BST." <87mxcazmvj.wl%dmaus@ictsoc.de>
David Maus <dmaus@ictsoc.de> wrote:
> At Fri, 04 Nov 2011 14:25:42 -0400,
> Nick Dokos wrote:
> >
> > Nick Dokos <nicholas.dokos@hp.com> wrote:
> >
> > > It probably does, but that's probably not the best place to do it: it might be
> > > better to do it in the (setq link on line 9090 or thereabouts. Otherwise, in
> > > the *other* case (editing the link at point), we'll end up unescaping twice:
> > > probably not a problem, since unescaping should be idempotent (in contrast to
> > > escaping ;-) ) but why do it twice?
> > >
> >
> > Brian Wightman pointed out to me that the idempotent part of the
> > statement above is definitely wrong (d'oh). The original URL that Jeff
> > Horn posted, when unescaped once, would be completely free of % signs.
> > But if the second (doubly-escaped) form is pasted into the minibuffer,
> > then unescaping once would not be enough. So I presume the thing to do
> > is to take the URL and unescape it repeatedly until it loses all
> > escapes, and then escape it *once* before inserting it in the org
> > buffer.
> >
> > Sounds icky, kludgy, dirty. The question is: 1) is it a solution?
> > and 2) is there a better one?
>
> No, this wouldn't be a solution. Consider a link with the sequence
> %2525 -- Unescape until no more escapes (or rather "escapes") will
> produce a single `%', not %25. Either escape once, or not at all.
>
> What roughly happens is this:
>
> 1. The user enters a link via `org-insert-link'
> 2. Org escapes the link and writes it to the buffer
> 3. The user opens the link with `org-open-at-point'
> 4. Org reads the link from the buffer and unescapes it
> 5. The link gets escaped and passed to the consuming application (i.e. browser)
>
> For steps 2 and 4 it is guaranteed that
>
> (string= link (org-link-unescape (org-link-escape link)))
>
> Thus, the problem is not in 2 or 4, but in 1 or 5.
>
> Step 5 assumes that a link entered by the user in step 1 was an
> unescaped link and thus needs escaping before it is passed to the
> consuming application. If you enter a link in step 1 that already is
> escaped, this assumption fails and you'll end up with a double-escaped
> link that is passed to the consumer.
>
> In other words, the question is: How to decide whether an arbitrary
> URL is percent-escaped or not?
>
> Now here's the problem: You can't. Is
>
> "http://example.tld/foo%40bar"
>
> already escaped or not? You can't tell for sure. It depends on the
> application you copied the link from.[1]
>
> What we could do in step 5 is... guess. If the (unescaped) link
> produced by step 4 does contain characters that need escaping, we
> escape the link. Otherwise we don't.
>
> Not quite sure about the impact of such a change.
>
> Best,
> -- David
>
> [1] Even worse: It may even depend on /how/ or /where/ you copied the
> link. E.g. the link to a wikipedia page about set theory is copied as
>
> http://de.wikipedia.org/wiki/Menge_%28Mathematik%29
>
> if C-c'ed from the address bar but copied as
>
> http://de.wikipedia.org/wiki/Menge_(Mathematik)
>
> if C-c'ed via "Copy link to clipboard" at another page (Iceweasel
> 3.6.23).

What a mess - thanks for the clear explanation. Be that as it may, the
treatment of links in org-insert-link is inconsistent: in one case the
link is unescaped once, in the other not at all. So I presume that a
single unescaping is desirable (it would, e.g., deal with both wikipedia
links correctly).

But there will be corner cases where this fails and some manual editing
of the link is necessary, so in addition a warning should be added to
the documentation, if it is not there already. After my other experience
with the calendar, I checked the manual first: a cursory look shows no
such warning, so maybe a footnote where org-insert-link is described
would be a good idea.
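
To make the cases above concrete, here is a rough sketch in Python. It
uses urllib.parse.quote and unquote as stand-ins for org-link-escape and
org-link-unescape; the helper names and the SAFE character set are
assumptions made for illustration, not Org's actual escape table. It
shows that a single unescape maps both wikipedia forms to the same
string, that escaping twice yields the double-escaped form, and that
"unescape until nothing changes" destroys a %2525 sequence.

```python
from urllib.parse import quote, unquote

# Characters left untouched when escaping. This set is an assumption
# for the sketch; it is not the table Org uses.
SAFE = ":/()"


def escape(link: str) -> str:
    """Percent-escape a link once (stand-in for org-link-escape)."""
    return quote(link, safe=SAFE)


def unescape(link: str) -> str:
    """Percent-unescape a link once (stand-in for org-link-unescape)."""
    return unquote(link)


def unescape_fully(link: str) -> str:
    """Unescape repeatedly until the string stops changing.

    This is the rejected idea: it cannot tell a literal "%25" from an
    escape, so it also eats escapes that were meant to stay.
    """
    previous = None
    while previous != link:
        previous, link = link, unescape(link)
    return link


if __name__ == "__main__":
    from_address_bar = "http://de.wikipedia.org/wiki/Menge_%28Mathematik%29"
    from_copy_link = "http://de.wikipedia.org/wiki/Menge_(Mathematik)"

    # A single unescape brings both copies to the same form.
    print(unescape(from_address_bar) == unescape(from_copy_link))

    # Escaping an already escaped link produces the double-escaped form.
    print(escape(escape("http://example.tld/a b")))

    # Repeated unescaping collapses %2525 all the way down to a bare %.
    print(unescape_fully("http://example.tld/100%2525"))
```

The ambiguous case stays ambiguous: a single unescape turns
"http://example.tld/foo%40bar" into ".../foo@bar", which is wrong if the
"%40" was meant literally. That is the kind of corner case the footnote
would have to warn about.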
Thoughts?
Nick