From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Maus
Subject: Re: Encoding Problem in export?
Date: Fri, 26 Jul 2013 06:03:41 +0200
Message-ID: <87r4emdl2a.wl%dmaus@ictsoc.de>
References: <87bo5s27ey.fsf@sachwertpartner.de> <877ggg7suh.fsf@gmail.com>
 <51EF32F4.9030309@gmx.de> <87txjk5s2q.fsf@gmail.com>
 <87a9lcfg9g.fsf@gmail.com> <877ggg5i5q.fsf@gmail.com>
 <87y58vp9mj.wl%dmaus@ictsoc.de> <87li4u48jp.fsf@gmail.com>
In-Reply-To: <87li4u48jp.fsf@gmail.com>
Mime-Version: 1.0 (generated by SEMI-EPG 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
List-Id: "General discussions about Org-mode."
To: Nicolas Goaziou
Cc: David Maus , emacs-orgmode@gnu.org, Nick Dokos

At Thu, 25 Jul 2013 23:46:34 +0200,
Nicolas Goaziou wrote:
>
> Hello,
>
> David Maus writes:
>
> > The bottom line: Org creates links programmatically (org-store-link)
> > and needs a mechanism to protect conflicting characters. It chose
> > percent-escaping, and in order to preserve the identity of a link Org
> > has to escape the escape character.
> >
> > Hope that helps!
>
> It does.
>
> I think we are hunting two hares and that's why we are failing so far.
>
> There are two URI transformations involved. One is mandatory (escape
> square brackets in URI), and the other one is optional (normalize URI
> for external processes consumption). The former must be bi-directional,
> as escaping brackets must be transparent to the user (e.g., when editing
> a link with `org-insert-link'). The latter needn't and can happen on the
> fly, just before the URI is sent to whatever needs it (e.g., a browser).
>
> Therefore, I suggest using three functions:
>
>   - `org-link-escape' will first %-escape "%" characters, and then "["
>     and "]" characters. `org-link-unescape' will reverse the operation.
>
>     These functions cannot break a link, encoded or not. They are applied
>     when a link is created programmatically and read back for user
>     editing.

It's not just square brackets, but also non-ASCII characters. Consider
a link that contains UTF-8 encoded characters and is inserted into an
Org buffer encoded in ISO-8859-1.

Oh, and: ASCII control characters. Think of a link description with
newlines.

Obviously, changing the algorithm of org-link-escape/unescape also
creates a backward-compatibility issue.

>
>   - `org-link-encode'[1] will %-escape every forbidden character in the
>     URI. It doesn't need any "reverse" function. It will be called when
>     opening a link, or parsing it.
>
>     I think it shouldn't escape "%" characters, though, so that it can
>     be applied on both encoded and plain strings. Since it isn't perfect
>     (it doesn't parse URI), it should also be very conservative (i.e.
>     allow more characters such as "=" or "&") and not get in the way.

You would have to select the list of forbidden characters based on the
link protocol. The assumption underlying the current implementation is
to delegate dealing with forbidden characters to the consuming
application. Thus I would limit this to known URI protocols, i.e. http:
and https:.

Best,
  -- David

>
> WDYT?
>
>
> Regards,
>
> [1] `url-encode-url' was introduced in Emacs 24.3.
>     It is too young to be used mainstream, even though it does a better
>     job than `org-link-escape'. We will benefit from it when Emacs 25 is
>     out (i.e. when Emacs 23 support is dropped).
>
> --
> Nicolas Goaziou

--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de
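
To make the two transformations discussed in this message concrete, here is
a rough sketch in Python rather than Emacs Lisp. It is not Org's code: the
names link_escape, link_unescape and link_encode are hypothetical stand-ins
for the proposed `org-link-escape', `org-link-unescape' and
`org-link-encode', and the SAFE character set is an assumption chosen to be
conservative. The restriction to http: and https: follows the suggestion
made in the reply. The sketch deliberately leaves out non-ASCII and control
characters in the reversible pair, which is exactly the gap pointed out
above.

```python
from urllib.parse import quote

# Assumption: characters left untouched by the one-way encoder. "%" is
# included so that already-encoded URIs pass through unchanged.
SAFE = "%:/?#[]@!$&'()*+,;="


def link_escape(text: str) -> str:
    """Reversible step: escape "%" first, then "[" and "]"."""
    text = text.replace("%", "%25")
    return text.replace("[", "%5B").replace("]", "%5D")


def link_unescape(text: str) -> str:
    """Inverse of link_escape: restore brackets first, then "%"."""
    text = text.replace("%5B", "[").replace("%5D", "]")
    return text.replace("%25", "%")


def link_encode(uri: str) -> str:
    """One-way step: percent-encode forbidden characters, never "%".

    Only http: and https: URIs are touched; anything else is handed to
    the consuming application unchanged.
    """
    if not uri.lower().startswith(("http:", "https:")):
        return uri
    return quote(uri, safe=SAFE)


if __name__ == "__main__":
    stored = link_escape("http://example.org/a[1]?q=50%")
    print(stored)
    print(link_unescape(stored))
    print(link_encode("http://example.org/a b?q=\u00e9&r=%20"))
```

Escaping "%" before the brackets is what makes the pair reversible: after
link_escape, every "%" in the string was produced by the escaper itself, so
link_unescape cannot mistake user text such as "%5B" for an escaped bracket.
Keeping "%" in SAFE makes link_encode idempotent, so it can be applied to
both plain and already-encoded strings.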