From: Nicolas Goaziou
Subject: Re: Encoding Problem in export?
Date: Fri, 26 Jul 2013 12:20:37 +0200
Message-ID: <87d2q54o7e.fsf@gmail.com>
In-Reply-To: <87r4emdl2a.wl%dmaus@ictsoc.de> (David Maus's message of "Fri, 26 Jul 2013 06:03:41 +0200")
To: David Maus
Cc: Nick Dokos, emacs-orgmode@gnu.org

David Maus writes:

Thanks for your answer.

It seems I got confused with the current state of URI-encoding. Please scratch my previous suggestion and let's start over.

> The assumption underlying the current implementation is
> to delegate dealing with forbidden characters to the consuming
> application.

I agree with this assumption, even though I think some URI-fixing (à la `url-encode-url') would be nice too. But that's not the topic here. Also, the current implementation doesn't totally follow this assumption (e.g. `org-link-escape-chars-browser').

Alas, there is a serious flaw in the current implementation. As you said:

> There is, of course, the nasty thing that we don't know if the link in
> a buffer went through org-link-escape or not. E.g. if you paste
>
> ,----
> | [[http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml]]
> `----
>
> into the buffer you'll get a broken link because org-link-open assumes
> the link to be escaped by org.

There is, indeed, no easy way to know if a link went through `org-link-escape', so we cannot unescape it properly in every situation. We could use text properties on escaped links, but that seems awkward.

I think there is a simpler solution: we never "unescape" links, which means that escaping must be kept to a minimum. For example, we could only replace "[" and "]" with, respectively, "%5B" and "%5D", and newlines with spaces. It doesn't cripple the link's readability very much, and it is safe, as "[", "]" and "\n" are always forbidden in a URI anyway. This operation is clearly idempotent.

Replacing non-ASCII characters would make the link unreadable to a human. Also, we don't prevent encoding mismatch (e.g., from UTF-8 to ISO-8859-1) when yanking regular text in an Org buffer, so there's no particular reason to do it for links.

When sending the URL to the consuming application, there will be problems, according to the assumption at the beginning of this message. But that is to be expected.


Regards,

-- 
Nicolas Goaziou
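
PS: To make the minimal escaping concrete, here is a rough sketch in Python rather than Emacs Lisp. It is only an illustration of the idea, not Org's implementation, and the name `minimal_escape' is hypothetical. It replaces "[" and "]" with "%5B" and "%5D" and newlines with spaces, and never touches "%", which is why applying it twice gives the same result as applying it once and why an already percent-encoded link pasted into the buffer passes through unchanged.

```python
def minimal_escape(link: str) -> str:
    """Escape only the characters that would break [[...]] link syntax.

    Hypothetical helper for illustration. "%" is deliberately left alone,
    so existing percent-escapes are never double-encoded and there is
    nothing to "unescape" later.
    """
    return (
        link.replace("[", "%5B")
        .replace("]", "%5D")
        .replace("\n", " ")
    )


if __name__ == "__main__":
    raw = "http://example.org/list[1]\nsecond line"
    once = minimal_escape(raw)
    twice = minimal_escape(once)
    print(once)
    # A second pass finds no "[", "]" or newline left to replace.
    print(once == twice)
```

Since nothing is ever unescaped, the link text stored in the buffer is exactly what gets handed to the consuming application.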