From: David Maus
Subject: Re: [bug] org-link-escape and (wrong-type-argument stringp nil)
Date: Thu, 04 Nov 2010 21:35:53 +0100
Message-ID: <877hgs25ee.wl%dmaus@ictsoc.de>
In-Reply-To: <8739swi0f0.fsf@gmx.de>
To: Sebastian Rose
Cc: David Maus, carsten.dominik@gmail.com, emacs-orgmode@gnu.org, Sébastien Vauban

Okay, back to link escaping.

What this is about: the current implementation of percent-escaping URIs uses a whitelist approach. It only percent-escapes characters that are in `org-link-escape-chars' or in a user-supplied list. This is a problem because using the function requires knowing every character that could occur in a URI. URIs are limited to plain ASCII, so a call to the function must list literally every possible character and its escape sequence to get a properly percent-escaped string.

To solve this problem, the function is changed to percent-escape every character that is an ASCII control character or not an ASCII character at all (code < 32 or > 126). The unescaping function is changed accordingly to handle percent-encoded multibyte Unicode characters.

1/ I did some testing with the newly proposed `org-link-escape' and the modified `org-protocol-unhex-string': create a random string of ASCII and multibyte Unicode characters taken randomly from (ucs-names), escape and unescape it, and compare the result with the original string. It works perfectly. Testing randomly created strings against the old escaping of non-ASCII strings is still on my list.

2/ Of course, a user could still have created a sequence of old escapes that the new unescaping function will interpret wrongly. I am not sure how likely this is, but in theory it could happen. Personally, I think we should take the risk of breaking people's links in this way.

3/ I strongly suggest changing the syntax of `org-link-escape-chars'. Currently it is a list of conses with the character in the car and the replacement string in the cdr. Using such a table for escaping is easy (assq char table), but in the unescaping process it might get tricky. Moreover, if the function does percent escaping, the escape sequence is already determined by the character being replaced. The new syntax would simply be a list of characters to escape in addition to the rule mentioned above (< 32 or > 126).
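The escaping rule described above and the round-trip test from 1/ can be sketched as follows. This is a Python illustration, not the Emacs Lisp in org.el or org-protocol.el. The names `link_escape`, `link_unescape`, `EXTRA_CHARS` and `random_string` are hypothetical. The sketch assumes that a multibyte character is escaped as its UTF-8 bytes. It also puts `%` itself in the extra list so the round trip is lossless; that is an assumption of this sketch, not part of the proposal.

```python
import random

# Hypothetical stand-in for the proposed list-only `org-link-escape-chars`:
# characters escaped *in addition to* the fixed rule in needs_escape().
# '%' is included (an assumption) so that a literal "%41" in the input
# survives a round trip instead of being unescaped to "A".
EXTRA_CHARS = (" ", "[", "]", "%")

HEX_DIGITS = set("0123456789abcdefABCDEF")


def needs_escape(ch, extra):
    """Fixed rule: ASCII control characters and anything outside ASCII
    (code point < 32 or > 126), plus the caller's extra characters."""
    code = ord(ch)
    return code < 32 or code > 126 or ch in extra


def link_escape(text, extra=EXTRA_CHARS):
    """Percent-escape TEXT; a multibyte character becomes one %XX per UTF-8 byte."""
    out = []
    for ch in text:
        if needs_escape(ch, extra):
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def link_unescape(text):
    """Turn every valid %XX back into a byte, then decode the bytes as UTF-8."""
    buf = bytearray()
    i = 0
    while i < len(text):
        pair = text[i + 1:i + 3]
        if text[i] == "%" and len(pair) == 2 and set(pair) <= HEX_DIGITS:
            buf.append(int(pair, 16))
            i += 3
        else:
            # Anything else is copied through unchanged (as its UTF-8 bytes).
            buf.extend(text[i].encode("utf-8"))
            i += 1
    # Raises UnicodeDecodeError if the escapes do not form valid UTF-8.
    return buf.decode("utf-8")


def random_string(length, rng):
    """Random mix of control, printable ASCII and multibyte characters
    (a stand-in for picking characters from Emacs' (ucs-names))."""
    pools = [(0x00, 0x1F), (0x20, 0x7E), (0xA0, 0xD7FF), (0x1F300, 0x1F5FF)]
    chars = []
    for _ in range(length):
        lo, hi = rng.choice(pools)
        chars.append(chr(rng.randint(lo, hi)))
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        original = random_string(40, rng)
        escaped = link_escape(original)
        # The escaped form is plain printable ASCII ...
        assert all(32 <= ord(c) <= 126 for c in escaped)
        # ... and unescaping it gives back the original string.
        assert link_unescape(escaped) == original
    print(link_escape("Sébastien [x]"))  # S%C3%A9bastien%20%5Bx%5D
```

Here the extra characters play the role of the list-only `org-link-escape-chars`. The escape sequence never has to be looked up in a table because it follows from the character itself. Suppose existing links contain old escapes that are not valid UTF-8, such as a lone `%E9`. In that case `link_unescape` raises `UnicodeDecodeError`, which is one concrete form the problem in 2/ could take.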
Changing the syntax this way would break compatibility with functions that have used org-link-escape/unescape for something other than percent escaping (e.g. replacing ] by %FF rather than %5D). But this, again, is bearable: although the docstring talks about escaping characters that are problematic, the only standardized way to do such escaping is percent escaping.

4/ If everyone agrees that breaking backward compatibility in the case mentioned above (or did I forget one?) is bearable, I would go ahead and make the necessary changes:

   1. Use the new algorithm in `org-link-escape'.
   2. Modify the syntax of `org-link-escape-chars'.
   3. Issue a warning if someone calls `org-link-escape' with a table in the old syntax.
   4. Move the unescaping functions from org-protocol.el to org.el and rename them.
   5. Declare `org-protocol-unhex-string' and `org-protocol-unhex-compound' obsolete (make-obsolete).
   6. Send a message to the list announcing these changes.
   7. Wait a few months and then remove the obsolete functions.

Best,
  -- David
-- 
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de