From: David Maus
Subject: Re: [bug] org-link-escape and (wrong-type-argument stringp nil)
Date: Sun, 26 Sep 2010 20:22:46 +0200
Message-ID: <87bp7kicuh.wl%dmaus@ictsoc.de>
In-Reply-To: <87d3s48c8c.fsf@gmx.de>
To: Sebastian Rose
Cc: David Maus, emacs-orgmode@gnu.org, Sébastien Vauban

Sebastian Rose wrote:
>David Maus writes:
>>> sh$ man utf-8
>>
>> Thanks! I finally get a grip on one of my personal nightmares.

>It's not that bad, is it?
:D Even better: It makes sense ;)

>> The attached patch is the first step in this direction: It modifies
>> the algorithm of `org-link-escape', now iterating over the input
>> string with `mapconcat' and escaping all characters in the escape
>> table or are between 127 and 255.

>Between 128 (1000 0000) and 255 ??

>The binary representation of 127 is 0111 1111 and valid ascii char. DEL
>actually (sh$ man ascii)

Right, and that's why it is encoded: no control characters in a URI.

The final algorithm for the shiny new Unicode-aware percent-encoding
function would be:

- percent-encode all characters in TABLE
- percent-encode all characters below 32 and above 126
- to percent-encode a char, encode it in UTF-8
- then percent-escape every byte of the encoded char

The remaining problem is keeping backward compatibility. There are Org
files out there where "á" is encoded as "%E1" and not "%C3%A1". The
percent-decoding function should be able to recognize these old escapes
and return the right value.

It looks like this could be done by changing the behavior of
`org-protocol-unhex-string'. Currently it returns the empty string for
"%E1" because that byte on its own is not a valid UTF-8 encoded Unicode
char. Maybe we could say: if the percent-encoded sequence does not form
a valid char, use the old method (extended ASCII?) to decode the
sequence.

Sadly (or luckily?) chances are good that I will be somewhat offline
for the next two weeks. I think implementing this Unicode-aware
escaping function is the way to go, but it requires some careful
checking of its consequences for old Org files.

Best,
 -- David

-- 
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de
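The encoding rule proposed in the message can be illustrated with a minimal Python sketch (Org itself is Emacs Lisp, so this is not Org code). The name `percent_encode` and the contents of `ESCAPE_TABLE` are hypothetical stand-ins for the TABLE mentioned above, not Org's actual escape table.

```python
# Hypothetical stand-in for TABLE in the message; Org's real table differs.
ESCAPE_TABLE = frozenset(" []%")


def percent_encode(text: str, table: frozenset = ESCAPE_TABLE) -> str:
    """Percent-encode TEXT following the rule proposed in the message.

    A character is escaped if it is in TABLE, below 32, or above 126.
    Escaped characters are encoded in UTF-8 and every byte becomes %XX.
    """
    parts = []
    for ch in text:
        code = ord(ch)
        if ch in table or code < 32 or code > 126:
            # One %XX triplet per UTF-8 byte, e.g. "á" -> C3 A1 -> "%C3%A1".
            parts.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
        else:
            parts.append(ch)
    return "".join(parts)


print(percent_encode("á"))      # %C3%A1
print(percent_encode("a b"))    # a%20b
print(percent_encode("\x7f"))   # %7F (DEL is above 126)
```

Compared with the patch quoted above, which escaped a character in the 127-255 range as a single byte such as "%E1", this form emits one escape per UTF-8 byte, so characters above 255 can be represented as well.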
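The backward-compatible decoding suggested for `org-protocol-unhex-string' can be sketched the same way. This is a minimal sketch, assuming the old escapes were Latin-1 code points (one reading of the "extended ASCII?" above); `percent_decode` and its helpers are hypothetical names. Each run of %XX escapes is turned into bytes, valid UTF-8 sequences are decoded as such, and any byte that does not start a valid sequence is decoded on its own as Latin-1.

```python
import re

# A run of one or more %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _utf8_length(lead: int) -> int:
    """Length of the UTF-8 sequence introduced by LEAD, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def _decode_bytes(raw: bytes) -> str:
    out = []
    i = 0
    while i < len(raw):
        n = _utf8_length(raw[i])
        if n:
            try:
                # Python's codec rejects truncated, overlong and surrogate forms.
                out.append(raw[i:i + n].decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        # Not valid UTF-8 here: treat the byte as an old-style Latin-1 escape.
        out.append(chr(raw[i]))
        i += 1
    return "".join(out)


def percent_decode(text: str) -> str:
    def replace(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        return _decode_bytes(raw)
    return _ESCAPE_RUN.sub(replace, text)


print(percent_decode("%C3%A1"))      # á (new UTF-8 escape)
print(percent_decode("%E1"))         # á (old single-byte escape)
print(percent_decode("%C3%A1%E1"))   # áá (mixed)
```

The heuristic is ambiguous in one direction: old Latin-1 text whose bytes happen to form valid UTF-8 (for example "Ã¡", escaped the old way as "%C3%A1") would now decode as "á". That is the kind of consequence for old Org files that needs checking.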