From: Nick Dokos
Subject: Re: [bug] Org link dialog escapes URL spaces incorrectly
Date: Sat, 05 Nov 2011 11:38:56 -0400
Message-ID: <23704.1320507536@alphaville.dokosmarshall.org>
References: <23807.1320424380@alphaville.dokosmarshall.org>
 <24425.1320425559@alphaville.dokosmarshall.org>
 <26969.1320431142@alphaville.dokosmarshall.org>
 <87mxcazmvj.wl%dmaus@ictsoc.de>
Reply-To: nicholas.dokos@hp.com
In-Reply-To: Message from David Maus of "Sat, 05 Nov 2011 15:04:32 BST." <87mxcazmvj.wl%dmaus@ictsoc.de>
List-Id: "General discussions about Org-mode."
To: David Maus
Cc: Jeff Horn, Org-mode ml, nicholas.dokos@hp.com

David Maus wrote:

> At Fri, 04 Nov 2011 14:25:42 -0400,
> Nick Dokos wrote:
> >
> > Nick Dokos wrote:
> >
> > > It probably does, but that's probably not the best place to do it: it might be
> > > better to do it in the (setq link on line 9090 or thereabouts. Otherwise, in
> > > the *other* case (editing the link at point), we'll end up unescaping twice:
> > > probably not a problem, since unescaping should be idempotent (in contrast to
> > > escaping ;-) ) but why do it twice?
> > >
> >
> > Brian Wightman pointed out to me that the idempotent part of the
> > statement above is definitely wrong (d'oh). The original URL that Jeff
> > Horn posted, when unescaped once, would be completely free of % signs.
> > But if the second (doubly-escaped) form is pasted into the minibuffer,
> > then unescaping once would not be enough. So I presume the thing to do
> > is to take the URL and unescape it repeatedly until it loses all
> > escapes, and then escape it *once* before inserting it in the org
> > buffer.
> >
> > Sounds icky, kludgy, dirty. The question is: 1) is it a solution?
> > and 2) is there a better one?
>
> No, this wouldn't be a solution. Consider a link with the sequence
> %2525 -- unescaping until no more escapes (or rather "escapes") remain
> will produce a single `%', not %25. Either escape once, or not at all.
>
> What roughly happens is this:
>
> 1. The user enters a link via `org-insert-link'
> 2. Org escapes the link and writes it to the buffer
> 3. The user opens the link with `org-open-at-point'
> 4. Org reads the link from the buffer and unescapes it
> 5. The link gets escaped and passed to the consuming application (i.e. browser)
>
> For steps 2 and 4 it is guaranteed that
>
>   (string= link (org-link-unescape (org-link-escape link)))
>
> Thus, the problem is not in 2 or 4, but in 1 or 5.
>
> Step 5 assumes that a link entered by the user in step 1 was an
> unescaped link and thus needs escaping before it is passed to the
> consuming application. If you enter a link in step 1 that already is
> escaped, this assumption fails and you'll end up with a double-escaped
> link that is passed to the consumer.
>
> In other words, the question is: How to decide whether an arbitrary
> URL is percent-escaped or not?
>
> Now here's the problem: You can't. Is
>
>   "http://example.tld/foo%40bar"
>
> already escaped or not? You can't tell for sure. It depends on the
> application you copied the link from.[1]
>
> What we could do in step 5 is... guess.
> If the (unescaped) link
> produced by step 4 does contain characters that need escaping, we
> escape the link. Otherwise we don't.
>
> Not quite sure about the impact of such a change.
>
> Best,
>   -- David
>
> [1] Even worse: It may even depend on /how/ or /where/ you copied the
> link. E.g. the link to a Wikipedia page about set theory is copied as
>
>   http://de.wikipedia.org/wiki/Menge_%28Mathematik%29
>
> if C-c'ed from the address bar but copied as
>
>   http://de.wikipedia.org/wiki/Menge_(Mathematik)
>
> if C-c'ed via "Copy link to clipboard" at another page (Iceweasel
> 3.6.23).

What a mess - thanks for the clear explanation.

Be that as it may, the treatment of links in org-insert-link is
inconsistent: in one case the link is unescaped once, in the other not
at all. So I presume that a single unescaping is desirable (it would,
e.g., deal with both Wikipedia links correctly).

But there will be corner cases where this fails and some manual editing
of the link is necessary, so a warning should also be added to the
documentation, if it is not there already. After my other experience
with the calendar, I checked the manual: a cursory look shows no such
warning, so maybe a footnote where org-insert-link is described would
be a good idea.

Thoughts?

Nick
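
For illustration only, the points made in this thread can be reproduced
in a short Python sketch. It is not Org's code: `urllib.parse.quote` and
`urllib.parse.unquote` merely stand in for `org-link-escape` and
`org-link-unescape`, which escape a different character set. The helper
names `unescape_until_stable` and `escape_for_consumer` are hypothetical,
and treating `%` as a character that does not "need escaping" in the
step 5 guess is an assumption about what such a heuristic would look like.

```python
from urllib.parse import quote, unquote

SAFE = ":/"  # assumption: characters left alone by the stand-in escaper


def escape(link: str) -> str:
    """Stand-in for org-link-escape: percent-encode once."""
    return quote(link, safe=SAFE)


def unescape(link: str) -> str:
    """Stand-in for org-link-unescape: percent-decode once."""
    return unquote(link)


def unescape_until_stable(link: str) -> str:
    """The rejected idea: unescape repeatedly until nothing changes."""
    previous = None
    while link != previous:
        previous, link = link, unescape(link)
    return link


def escape_for_consumer(link: str) -> str:
    """The step 5 guess: escape only if something other than '%'
    would need escaping; otherwise pass the link through unchanged."""
    if quote(link, safe=SAFE + "%") != link:
        return escape(link)
    return link


# Steps 2 and 4: a single escape/unescape pair round-trips.
link = "http://example.tld/foo bar%25"
assert unescape(escape(link)) == link

# Unescaping is not idempotent: a doubly-escaped space needs two passes.
assert unescape("a%2520b") == "a%20b"
assert unescape(unescape("a%2520b")) == "a b"

# Unescaping until stable turns %2525 into a bare '%', not '%25'.
assert unescape_until_stable("x%2525y") == "x%y"

# With the guess, both copied forms of the Wikipedia link come out the same.
escaped_form = "http://de.wikipedia.org/wiki/Menge_%28Mathematik%29"
plain_form = "http://de.wikipedia.org/wiki/Menge_(Mathematik)"
assert escape_for_consumer(escaped_form) == escaped_form
assert escape_for_consumer(plain_form) == escaped_form
```

The guess still fails for a link that mixes already-escaped sequences
with raw characters such as spaces: the whole link gets escaped and the
existing `%` signs are doubled, which is the kind of corner case that
would need manual editing.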