From: Vincent Belaïche
Subject: Re: %20 in file://... URL
Date: Tue, 23 Nov 2010 06:25:40 +0100
Message-ID: <80ipzofw6j.fsf@gmail.com>
To: Org mode
Cc: Vincent Belaïche

>From: David Maus
>Subject: Re: [Orgmode] %20 in file://... URL
>Date: Mon, 22 Nov 2010 19:16:09 +0100
>User-agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (Gojō) APEL/10.8 Emacs/23.2 (i486-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
>
>At Mon, 22 Nov 2010 16:46:44 +0100,
>Vincent Belaïche wrote:
>> I see, so I understand that you will someday modify a function creating
>> links in order to implement character escaping. I can give a hand if
>> you tell me the function name.
>
>To be exact: Org already escapes some characters (C-h v
>org-link-escape-chars RET) and the colon is a candidate for being on
>the list.

What exactly does "already" mean? I pushed the colon '(?: . "%3A") onto
the org-link-escape-chars list and tried a link like this:

  [[file://localhost/c%3A/msys/1.0/temp/jay.html][link]]

I get this message:

  "if: No such file: //localhost/c%3A/msys/1.0/temp/jay.html"

Evaluating the full org.el from the link you gave does not help either,
because I get a message that org-complete cannot be loaded.

>The functions responsible for escaping/unescaping are `org-link-escape'
>and `org-link-unescape' and the new implementations of these functions
>can be found in
>
>https://github.com/dmj/dmj-org-mode/tree/feature/org-percent-escaping
>

OK, you mean that some version of Org already does the job, but not the
one in the official Git repository?

>The task at hand: Anticipate the consequences of the new implementation.
>I.e. what will happen to links created with the old algorithm.
>

I have no idea of the consequences. I can be a beta tester, but for the
time being this code does not work with the kind of link I use.

>Patches, ideas, and comments on the modifications are welcome.
>

The following are just comments on the code. Most of them are a matter
of taste, and you may well disagree.

1. In the org.el file at the link you provided I also found the
   functions org-entry-protect-space and org-entry-restore-space, which
   also do some escaping. Why not use a single function?

2. In the function org-link-escape, there is a lambda expression

     (lambda (sequence) (format "%%%.2X" sequence))

   The argument name should be sequence-element rather than sequence.

3. In org-link-unescape, there are three substring-or-concatenation
   operations, but you could make it simpler with a single replace-match
   and a start position in the string-match. That would look like this
   (*not tested*):

(defun org-link-unescape (str)
  "Unhex hexified unicode strings as returned from the JavaScript function
encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ö'."
  (setq str (or str ""))
  (let ((case-fold-search t)
        (pos 0))
    (while (string-match "\\(%[0-9a-f][0-9a-f]\\)+" str pos)
      (setq pos (- (match-end 0) (length str)) ; minus the length of the unscanned tail
            str (replace-match
                 ;; org-link-unescape-compound calls split-string, which
                 ;; changes the match data, so protect it
                 (save-match-data
                   (org-link-unescape-compound
                    (upcase (match-string 0 str)))) ; hex
                 t t str)
            ;; resume just after the decoded text, so it is never rescanned
            pos (+ pos (length str)))))
  str)

   My feeling is that this kind of code is slightly simpler in
   execution, as there is only one string manipulation per iteration
   instead of two, and also easier to maintain, as it has fewer cases to
   handle (i.e. it does not really matter whether the escaped sequence
   is at the end of the string or not). You also avoid intermediate
   variables like `replacement', since the use of replace-match makes it
   self-explanatory that the result of org-link-unescape-compound is a
   replacement.

4. In org-link-unescape-compound,

     (remove "" (split-string hex "%"))

   can be replaced by

     (cdr (split-string hex "%"))

   because there is always exactly one empty string in the list, and it
   is in first place.

5. In org-link-unescape-compound, you could make fewer comparisons by
   replacing the code

     (shift (if (= 0 eat) ;; new byte
                (if (>= val 252) 6
                  (if (>= val 248) 5
                    (if (>= val 240) 4
                      (if (>= val 224) 3
                        (if (>= val 192) 2 0)))))
              6))
     (xor (if (= 0 eat) ;; new byte
              (if (>= val 252) 252
                (if (>= val 248) 248
                  (if (>= val 240) 240
                    (if (>= val 224) 224
                      (if (>= val 192) 192 0)))))
            128)))

   by (*not tested*):

     (shift-xor (if (= 0 eat) ;; new byte
                    (if (>= val 252) '(6 . 252)
                      (if (>= val 248) '(5 . 248)
                        (if (>= val 240) '(4 . 240)
                          (if (>= val 224) '(3 . 224)
                            (if (>= val 192) '(2 . 192) '(0 . 0))))))
                  '(6 . 128)))
     (shift (car shift-xor))
     (xor (cdr shift-xor))

   The code above looks more concise to me; depending on val it may
   also run faster.

Hoping that the above helps.

>
>Best,
>  -- David
>--

BR,
Vincent.

[...]
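
P.S. For anyone who wants to try the single-pass idea from point 3 without the feature branch, here is a minimal self-contained sketch. The name my-percent-unescape is hypothetical and not part of Org. It also swaps in the built-ins unibyte-string and decode-coding-string where the branch uses org-link-unescape-compound, and it assumes the escaped bytes are UTF-8. It only illustrates scanning with a start position and the cdr/split-string shortcut from point 4; it is not a proposed replacement.

```elisp
;; Hypothetical helper for illustration; it is not part of Org.
(defun my-percent-unescape (str)
  "Return STR with every run of %XX escapes decoded as UTF-8."
  (let ((case-fold-search t)            ; accept %c3 as well as %C3
        (pos 0))
    (while (string-match "\\(?:%[0-9a-f][0-9a-f]\\)+" str pos)
      (let* ((start (match-beginning 0))
             (decoded
              ;; split-string clobbers the match data that replace-match
              ;; needs below, hence save-match-data.
              (save-match-data
                (decode-coding-string
                 (apply #'unibyte-string
                        ;; "%C3%B6" splits into ("" "C3" "B6");
                        ;; cdr drops the leading empty string.
                        (mapcar (lambda (hex) (string-to-number hex 16))
                                (cdr (split-string (match-string 0 str) "%"))))
                 'utf-8))))
        (setq str (replace-match decoded t t str)
              ;; Resume after the decoded text so it is never rescanned.
              pos (+ start (length decoded)))))
    str))
```

With this definition, (my-percent-unescape "c%3A/msys/1.0/temp/jay.html") returns "c:/msys/1.0/temp/jay.html". Because the search resumes after the decoded text, "%2541" becomes "%41" and is not decoded a second time.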