From: David Maus
Subject: Re: %20 in file://... URL
Date: Wed, 24 Nov 2010 21:57:39 +0100
Message-ID: <87fwuq783g.wl%dmaus@ictsoc.de>
In-Reply-To: <80ipzofw6j.fsf@gmail.com>
To: Vincent Belaïche
Cc: Org mode

At Tue, 23 Nov 2010 06:25:40 +0100,
Vincent Belaïche wrote:
>
> >From: David Maus
> >Subject: Re: [Orgmode] %20 in file://... URL
> >Date: Mon, 22 Nov 2010 19:16:09 +0100
> >User-agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (Gojō) APEL/10.8 Emacs/23.2 (i486-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
> >
> >At Mon, 22 Nov 2010 16:46:44 +0100,
> >Vincent Belaïche wrote:
> >> I see, so I understand that you will someday modify a function creating
> >> links in order to implement character escaping. I can give a hand if
> >> tell me the function name.
> >
> >To be exact: Org already escapes some characters (C-h v
> >org-link-escape-chars RET) and the colon is a candidate for beeing on
> >the list.
>
> What does "already" exactly means ?

It means that Org performs percent escaping in some cases, but there
are currently three problems:

1. The percent escaping/unescaping functions are not Unicode aware;

2. The percent escaping/unescaping functions require the user to state
   explicitly which characters should be escaped;

3. There is no clear rule in place for when to escape/unescape -- that
   is the problem you've hit.

The solution for the 3rd problem is not to modify `org-open-file' but
to implement the rule that says:

 - If a link is written in an Org file, then everything after the
   scheme (type) is percent escaped. IIRC this is already done for
   characters that would break the parser (square brackets) but some
   chars are missing (the percent sign, obviously).

 - If a link is read from an Org file and passed to another function,
   the part after the scheme is unescaped.

These two rules would cover the problem you face: Although the link is
not created by Org, it is unescaped before it is opened, so %3A would
expand to ":".

>
> Ok, you mean that some version of org already does the job, but not the
> org that is on the official Git depo ?
>

Yes, kind of. It's a personal working copy of the official repository,
and when I am finished I either merge it into the official one or ask
someone to review the changes and "pull" my version into the official
repo. Think of it as the bleeding bleeding edge -- highly unstable,
not guaranteed to work at all.

>
> The following is just comments on the code, most of it is a matter of
> taste, which you may well disagree with.
>
> 1. In the org.el file in the link which you provided I found also these
>    functions org-entry-protect-space & org-entry-restore-space which
>    does also some escaping, why not use a unique function

Good point.

>
> 2. In the function org-link-escape, there is a lambda expression
>
>       (lambda (sequence)
>         (format "%%%.2X" sequence))
>
>    The argument name should be sequence-element rather than sequence.

Ditto. Changed it.

>
> 3. In org-link-unescape, there are 3 substringing-or-concatenations, but
>    you could make it simpler by a single replace-match and using a
>    start-position in the string-match. That would look like this
>    (*not tested*):
>
>    (defun org-link-unescape (str)
>      "Unhex hexified unicode strings as returned from the JavaScript function
>    encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ö'."
>      (setq str (or str ""))
>      (let ((case-fold-search t)
>            (pos 0))
>        (while (string-match "\\(%[0-9a-f][0-9a-f]\\)+" str pos)
>          (setq pos (+ pos (/ (- (match-end 0) (match-beginning 0))
>                              3))
>                str (replace-match
>                     (org-link-unescape-compound (upcase (match-string 0 str); hex
>                                                         ))
>                     t t str))))
>      str))
>
>    My feeling that the kind of code above is slightly simpler in
>    execution as there is only one string manipulation at each
>    iteration instead of two, and also easier to maintain as is has
>    fewer use cases (i.e. it does not really matter if the escaped
>    sequence is at the end of string or not). You also avoid some
>    intermediate variables like `replacement' as the use of
>    replace-match make it self explanatory that the result of
>    org-link-unescape-compound is a replacement.

Agreed, refactoring the unescape functions is on the list.

>
> 3. in org-link-unescape-compound,
>
>       (remove "" (split-string hex "%"))
>
>    can be replaced by (cdr (split-string hex "%")) because there is
>    always only one empty string in the sequence and it is in the 1st
>    place.

Agreed.

>
> 4. in org-link-unescape-compound, you could have made fewer comparison
>    by replacing code
>
>         (shift
>          (if (= 0 eat) ;; new byte
>              (if (>= val 252) 6
>                (if (>= val 248) 5
>                  (if (>= val 240) 4
>                    (if (>= val 224) 3
>                      (if (>= val 192) 2 0)))))
>            6))
>         (xor
>          (if (= 0 eat) ;; new byte
>              (if (>= val 252) 252
>                (if (>= val 248) 248
>                  (if (>= val 240) 240
>                    (if (>= val 224) 224
>                      (if (>= val 192) 192 0)))))
>            128)))
>
>    by (*not tested*):
>
>         (shift-xor
>          (if (= 0 eat) ;; new byte
>              (if (>= val 252) '(6 . 252)
>                (if (>= val 248) '(5 . 248)
>                  (if (>= val 240) '(4 . 240)
>                    (if (>= val 224) '(3 . 224)
>                      (if (>= val 192) '(2 . 192) '(0. 0))))))
>            '(6 . 128)))
>         (shift (car shift-xor))
>         (xor (cdr shift-xor))
>
>    the code above looks more concise to me, depending on val it may also
>    run faster.

Okay, I have to look at this suggestion.

>
> hoping that the above helps.

Definitely.

Last but not least: On this mailing list you should normally Cc:
answers to the original poster -- some are not subscribed to the list
at all, some (like me) read the list in a different account than their
main mail account and miss answers etc.

Best and thanks,
  -- David
-- 
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de
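
For illustration, the two rules described in the message (percent-escape
everything after the scheme when a link is written, unescape it when the
link is read and handed on) could be sketched roughly as follows. This is
a minimal sketch under stated assumptions, not the code from the working
copy mentioned above: the names `my/link-escape' and `my/link-unescape'
are hypothetical, and the generic `url-hexify-string' and
`url-unhex-string' helpers from Emacs's url-util library stand in for
Org's own escaping functions. Note that `url-hexify-string' also escapes
characters such as "/", which is more than the rule strictly needs.

```elisp
;; Sketch only: `my/link-escape' and `my/link-unescape' are hypothetical
;; helpers for illustration, not functions defined by Org.
(require 'url-util)

(defconst my/link-scheme-regexp "\\`\\([a-zA-Z][a-zA-Z0-9+.-]*\\):"
  "Regexp matching the scheme (type) prefix of a link, e.g. \"file:\".")

(defun my/link-escape (link)
  "Rule 1: percent-escape everything after the scheme of LINK.
A LINK without a scheme is returned unchanged."
  (if (string-match my/link-scheme-regexp link)
      (concat (match-string 0 link)
              ;; `url-hexify-string' encodes the text as UTF-8 first,
              ;; so non-ASCII characters become multi-byte %XX runs.
              (url-hexify-string (substring link (match-end 0))))
    link))

(defun my/link-unescape (link)
  "Rule 2: unescape everything after the scheme of LINK.
A LINK without a scheme is returned unchanged."
  (if (string-match my/link-scheme-regexp link)
      (concat (match-string 0 link)
              ;; `url-unhex-string' yields raw bytes; decode them as
              ;; UTF-8 to get characters back.
              (decode-coding-string
               (url-unhex-string (substring link (match-end 0)))
               'utf-8))
    link))

;; A link that was not created by Org is still unescaped before use:
(my/link-unescape "file:c%3A/My%20Docs/a.txt")  ; => "file:c:/My Docs/a.txt"
```

Splitting at the scheme keeps the link type itself untouched, so whatever
dispatches on "file:", "http:" and so on would see the same prefix before
and after escaping.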