From: Sebastian Rose
Subject: Re: [bug] org-link-escape and (wrong-type-argument stringp nil)
Date: Sun, 26 Sep 2010 23:23:22 +0200
Message-ID: <87lj6oi4hh.fsf@gmx.de>
In-Reply-To: <87bp7kicuh.wl%dmaus@ictsoc.de> (David Maus's message of "Sun, 26 Sep 2010 20:22:46 +0200")
References: <87tylkwpq0.fsf@mundaneum.com> <87mxrc1bwj.wl%dmaus@ictsoc.de>
 <87sk14rz3t.fsf@gmx.de> <87sk128cuk.wl%dmaus@ictsoc.de>
 <87fwx1yhw5.fsf@gmx.de> <87aan8xpzy.wl%dmaus@ictsoc.de>
 <87d3s48c8c.fsf@gmx.de> <87bp7kicuh.wl%dmaus@ictsoc.de>
List-Id: "General discussions about Org-mode."
To: David Maus
Cc: Sébastien Vauban, emacs-orgmode@gnu.org

>>The binary representation of 127 is 0111 1111 and a valid ASCII char. DEL
>>actually (sh$ man ascii)
>
> Right, and that's why it is encoded: No control characters in a URI.

Great! :)

> The final algorithm for the shiny new unicode aware percent encoding
> function would be:
>
>  - percent encode all characters in TABLE
>  - percent encode all characters below 32 and above 126
>  - encode the char in utf-8
>  - percent escape all bytes of the encoded char
>
> The remaining problem is keeping backward compatibility. There are Org
> files out there where "á" is encoded as "%E1" and not "%C3%A1". The
> percent decoding function should be able to recognize these old
> escapes and return the right value.
>
> It looks like this could be done by changing the behavior of
> `org-protocol-unhex-string'. Currently it returns the empty string
> for "%E1" because it does not represent a valid utf-8 encoded unicode
> char. Maybe we could say: If the percent encoded sequence does not
> form a valid char, use the old method (extended ASCII?) to decode the
> sequences.

Well, yes. The function _should_ return something if the end of the
string is reached or something other than a `%' is found.

I'll have to find out where the function has to look up the correct
char. 167 will be a different character in different encodings.

This will not handle cases like `Größe' though.

Are there cases where strings are encoded the way you showed above, and
decoded using `org-unhex-string'?

  Sebastian
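
P.S. To make the discussion concrete, here is a rough sketch of the encoding steps quoted above together with the proposed decoding fallback. It is written in Python rather than Emacs Lisp purely for illustration and is not the Org code. The names `percent_encode`, `percent_decode` and `RESERVED` are made up for this example, `RESERVED` is only a stand-in for TABLE, and Latin-1 as the "old method" is an assumption (as noted, the right legacy charset still has to be found out).

```python
import re

# Hypothetical stand-in for TABLE; the real table lives in Org.
RESERVED = set(" %[]")

def percent_encode(text, table=RESERVED):
    """Escape chars in TABLE and chars below 32 or above 126 as UTF-8 bytes."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in table or code < 32 or code > 126:
            # Encode the char in UTF-8, then escape every resulting byte.
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)

# A run of one or more consecutive %XX escapes.
_ESCAPES = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def percent_decode(text, legacy_charset="latin-1"):
    """Decode %XX runs as UTF-8; fall back to LEGACY_CHARSET if invalid."""
    def decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: old files used a single-byte charset (here Latin-1).
            return raw.decode(legacy_charset)
    # Anything that is not a complete %XX escape is returned unchanged.
    return _ESCAPES.sub(decode_run, text)

print(percent_encode("á"))          # new-style escape
print(percent_decode("%C3%A1"))     # new-style input
print(percent_decode("%E1"))        # old-style input, via the fallback
```

Two caveats with this sketch: the fallback is applied to a whole run of escapes, so a run mixing old and new escapes is decoded entirely with the legacy charset, and a legacy run that happens to be valid UTF-8 is silently read as UTF-8. Byte 167 also only means something once the legacy charset is known.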