From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Maus
Subject: Re: Encoding Problem in export?
Date: Thu, 25 Jul 2013 06:05:24 +0200
Message-ID: <87y58vp9mj.wl%dmaus@ictsoc.de>
References: <87bo5s27ey.fsf@sachwertpartner.de> <877ggg7suh.fsf@gmail.com> <51EF32F4.9030309@gmx.de> <87txjk5s2q.fsf@gmail.com> <87a9lcfg9g.fsf@gmail.com> <877ggg5i5q.fsf@gmail.com>
In-Reply-To: <877ggg5i5q.fsf@gmail.com>
Mime-Version: 1.0 (generated by SEMI-EPG 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
List-Id: "General discussions about Org-mode."
To: Nicolas Goaziou
Cc: Nick Dokos, emacs-orgmode@gnu.org, David Maus

Hi Nicolas, Hi Nick,

At Wed, 24 Jul 2013 13:09:05 +0200,
Nicolas Goaziou wrote:
>
> Hello,
>
> Nick Dokos writes:
>
> > Maybe the thing to do is to delete '=' from org-link-escape-chars and
> > see what problems arise.
>
> AFAICT, `url-encode-url' is subtler than that. It encodes characters
> whenever they are really forbidden, which is not the case of
> `org-link-escape'. Hence my initial question: do we need to reinvent the
> wheel?
>
> > But I did find that '%' was originally in org-link-escape-chars and
> > David Maus hardcoded it (commit 139cc1d4), so that it is *always*
> > escaped.
>
> I Cc David Maus in case he has time to enlighten us about his choice.
>

IIRC org-link-escape is not used to create URLs but to escape
characters in a link that would otherwise conflict with Orgmode syntax
(e.g. square brackets). Org applies percent-escaping to a link before
it is stored in the buffer and applies unescaping when it reads the
link back.

The percent sign is hardcoded because, if org-link-escape/unescape is
used in this way, we must make sure that the identity of a link is
preserved. If we did *not* escape the percent sign, an original link
with percent-encoded characters would be read back wrongly, i.e. with
the percent-escaped characters unescaped. This broke links.

E.g. consider a redirector link to the target URL
`http://target.example.org?id=33&format=html':

,----
| http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml
`----

If we don't escape the percent sign but apply unescaping when, say,
the user opens the link, we would get:

,----
| http://redirect.example.org?url=http://target.example.org?id=33&format=html
`----

And voila: the `format' parameter is turned into a query parameter of
redirect.example.org, not target.example.org.

The spec (RFC 3986) has the following to say about escaping:

,----
| Because the percent ("%") character serves as the indicator for
| percent-encoded octets, it must be percent-encoded as "%25" for that
| octet to be used as data within a URI. Implementations must not
| percent-encode or decode the same string more than once, as decoding
| an already decoded string might lead to misinterpreting a percent
| data octet as the beginning of a percent-encoding, or vice versa in
| the case of percent-encoding an already percent-encoded string.
`----

There is, of course, the nasty thing that we don't know whether the
link in a buffer went through org-link-escape or not. E.g. if you
paste

,----
| [[http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml]]
`----

into the buffer you'll get a broken link because org-link-open assumes
the link to have been escaped by Org.

The bottom line: Org creates links programmatically (org-store-link)
and needs a mechanism to protect conflicting characters. It chose
percent-escaping, and in order to preserve the identity of a link Org
has to escape the escape character.

Hope that helps!

Best,
  -- David
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de
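
P.S. The round trip described above can be sketched in a few lines of
Python. This is only an illustration of the argument, not Org's actual
code: `escape' and `unescape' are hypothetical stand-ins for
org-link-escape and org-link-unescape, and the set of conflicting
characters (square brackets and space) is an assumption made for the
example.

```python
from urllib.parse import unquote

# Assumed set of characters that clash with link syntax (illustrative only).
CONFLICTING = "[] "


def escape(link, escape_percent=True):
    """Percent-escape conflicting characters; optionally escape '%' itself."""
    table = CONFLICTING + ("%" if escape_percent else "")
    return "".join("%{:02X}".format(ord(c)) if c in table else c for c in link)


def unescape(stored):
    """Decode percent-escapes exactly once, as done when a link is read back."""
    return unquote(stored)


original = ("http://redirect.example.org?url="
            "http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml")

# With '%' escaped, the stored form decodes back to the original link.
with_percent = unescape(escape(original, escape_percent=True))

# Without escaping '%', unescaping also decodes the link's own escapes.
without_percent = unescape(escape(original, escape_percent=False))

# A link pasted by hand was never escaped, yet it is unescaped on open.
pasted = unescape(original)

print(with_percent == original)
print(without_percent)
print(pasted == original)
```

Only the variant that escapes the percent sign returns the link
unchanged. The other two cases yield the redirector link with the
`format' parameter moved to the outer URL, which is the breakage
described above.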