From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Maus
Subject: Re: [bug] org-link-escape and (wrong-type-argument stringp nil)
Date: Thu, 23 Sep 2010 20:40:33 +0200
Message-ID: <87aan8xpzy.wl%dmaus@ictsoc.de>
References: <87tylkwpq0.fsf@mundaneum.com> <87mxrc1bwj.wl%dmaus@ictsoc.de>
 <87sk14rz3t.fsf@gmx.de> <87sk128cuk.wl%dmaus@ictsoc.de>
 <87fwx1yhw5.fsf@gmx.de>
Mime-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: multipart/mixed; boundary="===============1360829712=="
Return-path:
Received: from [140.186.70.92] (port=59042 helo=eggs.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.43) id 1Oyqio-0001rM-7A
 for emacs-orgmode@gnu.org; Thu, 23 Sep 2010 14:40:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
 (envelope-from ) id 1Oyqim-00043w-Gj
 for emacs-orgmode@gnu.org; Thu, 23 Sep 2010 14:40:42 -0400
Received: from mysql1.xlhost.de ([213.202.242.106]:33105)
 by eggs.gnu.org with esmtp (Exim 4.69)
 (envelope-from ) id 1Oyqim-00043R-8H
 for emacs-orgmode@gnu.org; Thu, 23 Sep 2010 14:40:40 -0400
In-Reply-To: <87fwx1yhw5.fsf@gmx.de>
List-Id: "General discussions about Org-mode."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
To: Sebastian Rose
Cc: David Maus , emacs-orgmode@gnu.org,
 =?UTF-8?B?U8OpYmFzdGllbg==?= Vauban

--===============1360829712==
Content-Type: multipart/signed;
 boundary="pgp-sign-Multipart_Thu_Sep_23_20:40:32_2010-1";
 micalg=pgp-sha256; protocol="application/pgp-signature"
Content-Transfer-Encoding: 7bit

--pgp-sign-Multipart_Thu_Sep_23_20:40:32_2010-1
Content-Type: multipart/mixed;
 boundary="Multipart_Thu_Sep_23_20:40:32_2010-1"

--Multipart_Thu_Sep_23_20:40:32_2010-1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Sebastian Rose wrote:
>David Maus writes:
>> Sebastian Rose wrote:
>>>Is there a reason for this distinction between multibyte and unibyte?
>>>I favour the "shotgun-approach" if not. It's bullet-proof.
>>
>>>The JavaScript function `encodeURIComponent()' encodes the German Umlaut
>>>`ü' as `%C3%BC' regardless of the source's actual encoding. That's why
>>>I wrote the two functions `org-protocol-unhex-string' and
>>>`org-protocol-unhex-compound' (see org-protocol.el).
>>
>> Ah, yes. From my understanding of the RFC, %C3%BC is a valid
>> representation of the "ü" character.
>>
>> I do not yet fully understand
>> how to unescape such a representation. E.g. is %C3%BC a hexencoded
>> multibyte char or a succession of two singlebyte chars?

>It's a hexencoded multibyte char.

>JavaScript implementations seem to turn non-ascii singlebyte chars
>into multibyte chars first, then encode the result.
>This means if a page is iso-8859-1 encoded (singlebyte `ü'),
>JavaScript will recode the `ü'. It's funny, but that's what I found
>when writing org-protocol.el

>`org-protocol-unhex-string' and `org-protocol-unhex-compound' decode
>such a representation.

>The trick is in the utf-8 encoding itself.
>If a byte starts with a 1, it is part of a multibyte sequence. In the
>first byte of such a sequence, the number of leading `1's denotes the
>number of bytes used for one character. On a GNU/Linux system try

>  sh$ man utf-8

Thanks! I finally get a grip on one of my personal nightmares.

The attached patch is the first step in this direction: It modifies
the algorithm of `org-link-escape', which now iterates over the input
string with `mapconcat' and escapes every character that is in the
escape table or lies between 127 and 255. I'll try to figure out the
escaping/unescaping of multibyte characters next.

Sent as a patch because of its possible side effects: The new
algorithm ignores the cdr of each cons in the escape table, so things
will break if anything uses this function for something other than
percent escaping.

Best,
  -- David
-- 
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de

--Multipart_Thu_Sep_23_20:40:32_2010-1
Content-Type: text/plain; type=patch; charset=US-ASCII
Content-Disposition: attachment;
 filename="0001-New-algorithm-for-percent-escaping.patch"
Content-Transfer-Encoding: base64

RnJvbSA4MjA5Y2I4MzFkMGQzODdkMDNiMTA0MTYyMzVkMjkxMGE3NGY4MGY3IE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBEYXZpZCBNYXVzIDxkbWF1c0BpY3Rzb2MuZGU+CkRhdGU6IFRo
dSwgMjMgU2VwIDIwMTAgMjA6MzA6MTMgKzAyMDAKU3ViamVjdDogW1BBVENIXSBOZXcgYWxnb3Jp
dGhtIGZvciBwZXJjZW50IGVzY2FwaW5nCgoqIG9yZy5lbCAob3JnLWxpbmstZXNjYXBlKTogTmV3
IGFsZ29yaXRobSBmb3IgcGVyY2VudCBlc2NhcGluZy4KCkludGVyYXRlIG92ZXIgVEVYVCBhbmQg
cmVwbGFjZSBjaGFycyB0aGF0IGFyZSBpbiBUQUJMRSBvciBhcmUKbm9uLUFTQ0lJIHNpbmdsZSBi
eXRlIGNoYXJhY3RlcnMuICBNdWx0aWJ5dGUgY2hhcmFjdGVycyBhcmUgbGVmdAp1bnRvdWNoZWQu
Ci0tLQogbGlzcC9vcmcuZWwgfCAgIDE2ICsrKysrLS0tLS0tLS0tLS0KIDEgZmlsZXMgY2hhbmdl
ZCwgNSBpbnNlcnRpb25zKCspLCAxMSBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9saXNwL29y
Zy5lbCBiL2xpc3Avb3JnLmVsCmluZGV4IGQ3YWEzZDIuLjJjM2YxYjcgMTAwNjQ0Ci0tLSBhL2xp
c3Avb3JnLmVsCisrKyBiL2xpc3Avb3JnLmVsCkBAIC04NDkxLDE3ICs4NDkxLDExIEBAIFRoaXMg
aXMgdGhlIGxpc3QgdGhhdCBpcyB1c2VkIGJlZm9yZSBoYW5kaW5nIG92ZXIgdG8gdGhlIGJyb3dz
ZXIuIikKICAgKGlmIChhbmQgb3JnLXVybC1lbmNvZGluZy11c2UtdXJsLWhleGlmeSAobm90IHRh
YmxlKSkKICAgICAgICh1cmwtaGV4aWZ5LXN0cmluZyB0ZXh0KQogICAgIChzZXRxIHRhYmxlIChv
ciB0YWJsZSBvcmctbGluay1lc2NhcGUtY2hhcnMpKQotICAgICh3aGVuIHRleHQKLSAgICAgIChs
ZXQgKChyZSAobWFwY29uY2F0IChsYW1iZGEgKHgpIChyZWdleHAtcXVvdGUKLQkJCQkJKGNoYXIt
dG8tc3RyaW5nIChjYXIgeCkpKSkKLQkJCSAgIHRhYmxlICJcXHwiKSkpCi0JKHdoaWxlIChzdHJp
bmctbWF0Y2ggcmUgdGV4dCkKLQkgIChzZXRxIHRleHQKLQkJKHJlcGxhY2UtbWF0Y2gKLQkJIChj
ZHIgKGFzc29jIChzdHJpbmctdG8tY2hhciAobWF0Y2gtc3RyaW5nIDAgdGV4dCkpCi0JCQkgICAg
IHRhYmxlKSkKLQkgICAgICAgdCB0IHRleHQpKSkKLQl0ZXh0KSkpKQorICAgIChtYXBjb25jYXQg
KGxhbWJkYSAoYykKKwkJIChpZiAob3IgKGFzc29jIGMgdGFibGUpCisJCQkgKGFuZCAoPiBjIDEy
NikgKDwgYyAyNTUpKSkKKwkJICAgICAoZm9ybWF0ICIlJSVYIiBjKQorCQkgICAoY2hhci10by1z
dHJpbmcgYykpKSB0ZXh0ICIiKSkpCiAKIChkZWZ1biBvcmctbGluay11bmVzY2FwZSAodGV4dCAm
b3B0aW9uYWwgdGFibGUpCiAgICJSZXZlcnNlIHRoZSBhY3Rpb24gb2YgYG9yZy1saW5rLWVzY2Fw
ZScuIgotLSAKMS43LjEKCg==

--Multipart_Thu_Sep_23_20:40:32_2010-1--

--pgp-sign-Multipart_Thu_Sep_23_20:40:32_2010-1
Content-Type: application/pgp-signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iF4EABEIAAYFAkybnyAACgkQma24O1pEeOaDaQD7BF4zmfTsLINEM1ngtlZ1+i82
teRuMyqhCfSboFTsBh8BAJQoRtCOb/ktyYX4N73F1ypQiTEGnkbrlG00y6oDa78p
=s262
-----END PGP SIGNATURE-----

--pgp-sign-Multipart_Thu_Sep_23_20:40:32_2010-1--

--===============1360829712==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Emacs-orgmode mailing list
Please use `Reply All' to send replies to the list.
Emacs-orgmode@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-orgmode

--===============1360829712==--
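
The byte-wise reading of `%C3%BC' discussed in the message can be
sketched in Emacs Lisp as follows. This is only a minimal illustration
under stated assumptions, not the attached patch and not the code in
org-protocol.el: the names `my/percent-escape' and
`my/percent-unescape' are hypothetical, and escaping `%' plus every
byte outside 33..126 is a choice made for the example. The idea is to
encode the text as UTF-8 first and then escape single bytes, so a
multibyte character such as `ü' yields one %XX group per byte.

```elisp
;; Hypothetical helpers for illustration only; neither is part of Org.

(defun my/percent-escape (text &optional extra-chars)
  "Percent-escape TEXT byte by byte after encoding it as UTF-8.
Escape `%', every byte outside 33..126, and any ASCII character
listed in EXTRA-CHARS (an assumption of this sketch)."
  (mapconcat
   (lambda (byte)
     (if (or (< byte 33) (> byte 126)
             (eq byte ?%)
             (memq byte extra-chars))
         ;; Always emit two hex digits per byte.
         (format "%%%02X" byte)
       (char-to-string byte)))
   ;; A unibyte string: iterating over it yields byte values 0..255.
   (encode-coding-string text 'utf-8)
   ""))

(defun my/percent-unescape (text)
  "Turn each %XX group in TEXT into a byte, then decode as UTF-8."
  (let ((raw (encode-coding-string text 'utf-8))
        (bytes '())
        (i 0))
    (while (< i (length raw))
      (let ((byte (aref raw i)))
        (if (and (eq byte ?%)
                 (<= (+ i 3) (length raw))
                 (string-match-p "\\`[[:xdigit:]]\\{2\\}\\'"
                                 (substring raw (1+ i) (+ i 3))))
            ;; A well-formed escape: collect the byte it stands for.
            (progn
              (push (string-to-number (substring raw (1+ i) (+ i 3)) 16)
                    bytes)
              (setq i (+ i 3)))
          ;; Anything else is copied through unchanged.
          (push byte bytes)
          (setq i (1+ i)))))
    ;; Reassemble the bytes and let the UTF-8 decoder join them
    ;; into characters.
    (decode-coding-string (apply #'unibyte-string (nreverse bytes))
                          'utf-8)))
```

With these definitions, (my/percent-escape "ü") evaluates to "%C3%BC"
and (my/percent-unescape "%C3%BC") gives back "ü". The sketch pads
each escape to two hex digits with "%02X", whereas the attached patch
formats with "%X".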