From: Nicolas Goaziou
To: David Maus
Cc: Nick Dokos, emacs-orgmode@gnu.org
Subject: Re: Encoding Problem in export?
Date: Thu, 25 Jul 2013 23:46:34 +0200

Hello,

David Maus writes:

> IIRC org-link-escape is not used to create URLs but to escape
> characters in a link that would otherwise conflict with Orgmode syntax
> (e.g. square brackets).
> Org applies percent escaping to a link before
> it is stored in the buffer and applies unescaping when it reads a link
> back.
>
> The percent sign is hardcoded because if org-link-escape/unescape is
> used in this way we must make sure that the identity of a link is
> preserved. If we would *not* escape the percent sign, then an original
> link with percent encoded characters would be read back wrongly,
> i.e. with the percent escaped characters unescaped.

[...]

> There is, of course, the nasty thing that we don't know if the link in
> a buffer went through org-link-escape or not. E.g. if you paste
>
> ,----
> | [[http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml]]
> `----
>
> into the buffer you'll get a broken link because org-link-open assumes
> the link to be escaped by org.
>
> The bottom-line: Org creates links programmatically (org-store-link)
> and needs a mechanism to protect conflicting characters. It chose
> percent-escaping and in order to preserve the identity of a link Org
> has to escape the escape-character.
>
> Hope that helps!

It does. I think we are chasing two hares at once, and that's why we
have failed so far.

There are two URI transformations involved. One is mandatory (escaping
square brackets in the URI), and the other one is optional (normalizing
the URI for consumption by external processes). The former must be
bi-directional, as escaping brackets must be transparent to the user
(e.g., when editing a link with `org-insert-link'). The latter needn't
be, and can happen on the fly, just before the URI is sent to whatever
needs it (e.g., a browser).

Therefore, I suggest using three functions:

- `org-link-escape' will first %-escape "%" characters, and then "["
  and "]" characters. `org-link-unescape' will reverse the operation.
  These functions cannot break a link, encoded or not. They are applied
  when a link is created programmatically and read back for user
  editing.

- `org-link-encode'[1] will %-escape every forbidden character in the
  URI. It doesn't need any "reverse" function. It will be called when
  opening a link, or parsing it.
  I think it shouldn't escape "%" characters, though, so that it can be
  applied to both encoded and plain strings. Since it isn't perfect (it
  doesn't parse the URI), it should also be very conservative (i.e.,
  allow more characters, such as "=" or "&") and stay out of the way.

WDYT?

Regards,

[1] `url-encode-url' was introduced in Emacs 24.3. It is too young to
be used in mainstream code, even though it does a better job than
`org-link-escape'. We will benefit from it when Emacs 25 is out (i.e.,
when Emacs 23 support is dropped).

-- 
Nicolas Goaziou
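The three-function split proposed in this message can be sketched roughly as follows. This is a minimal illustration in Python, not Org's Emacs Lisp code: `link_escape`, `link_unescape` and `link_encode` are hypothetical stand-ins for `org-link-escape`, `org-link-unescape` and the proposed `org-link-encode`. The set of "forbidden" characters used by `link_encode` is an assumption: it escapes only characters that RFC 3986 never allows in a URI (spaces, controls, non-ASCII and a few punctuation marks) and, as suggested, never touches "%".

```python
import re
import string

# Hypothetical stand-ins for `org-link-escape' / `org-link-unescape'.
# Escaping "%" together with the brackets keeps the round trip lossless:
# in the escaped output, every "%" starts one of the three tokens below.
_ESCAPE_RE = re.compile(r"[%\[\]]")
_UNESCAPE_RE = re.compile(r"%(25|5B|5D)")
_UNESCAPE_MAP = {"25": "%", "5B": "[", "5D": "]"}


def link_escape(link: str) -> str:
    """%-escape "%", "[" and "]" in a single pass."""
    return _ESCAPE_RE.sub(lambda m: "%%%02X" % ord(m.group()), link)


def link_unescape(link: str) -> str:
    """Exact inverse of link_escape: decode only %25, %5B and %5D."""
    return _UNESCAPE_RE.sub(lambda m: _UNESCAPE_MAP[m.group(1)], link)


# Hypothetical stand-in for the proposed `org-link-encode'.
# Assumption: "forbidden" means any character RFC 3986 never allows in a
# URI. Unreserved and reserved characters pass through unchanged, and so
# does "%", so already-encoded input is left alone (the function is
# idempotent).
_ALLOWED = frozenset(
    string.ascii_letters
    + string.digits
    + "-._~"         # unreserved punctuation
    + ":/?#[]@"      # gen-delims
    + "!$&'()*+,;="  # sub-delims
    + "%"            # never touched, per the proposal
)


def link_encode(uri: str) -> str:
    """Conservatively %-encode characters that cannot appear in a URI."""
    out = []
    for ch in uri:
        if ch in _ALLOWED:
            out.append(ch)
        else:
            # Non-ASCII characters are encoded as their UTF-8 bytes.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    raw = "http://example.org/café menu?tag=[org]"
    stored = link_escape(raw)  # what would be written into the buffer
    print(stored)  # http://example.org/café menu?tag=%5Borg%5D
    sent = link_encode(link_unescape(stored))  # what a browser would get
    print(sent)  # http://example.org/caf%C3%A9%20menu?tag=[org]
```

Because `link_unescape` only decodes `%25`, `%5B` and `%5D`, a pre-encoded link such as the redirect URL quoted earlier passes through it unchanged; a pasted link that itself contains `%25` would still be altered, which is the ambiguity David points out.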