From: David Maus <dmaus@ictsoc.de>
To: "Vincent Belaïche" <vincent.b.1@hotmail.fr>
Cc: Org mode <emacs-orgmode@gnu.org>
Subject: Re: %20 in file://... URL
Date: Wed, 24 Nov 2010 21:57:39 +0100
Message-ID: <87fwuq783g.wl%dmaus@ictsoc.de>
In-Reply-To: <80ipzofw6j.fsf@gmail.com>
At Tue, 23 Nov 2010 06:25:40 +0100,
Vincent Belaïche wrote:
>
> >From: David Maus
> >Subject: Re: [Orgmode] %20 in file://... URL
> >Date: Mon, 22 Nov 2010 19:16:09 +0100
> >User-agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (Gojō) APEL/10.8 Emacs/23.2 (i486-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
> >
> >At Mon, 22 Nov 2010 16:46:44 +0100,
> >Vincent Belaïche wrote:
> > I see, so I understand that you will someday modify a function creating
> > links in order to implement character escaping. I can give a hand if you
> > tell me the function name.
> >
> >To be exact: Org already escapes some characters (C-h v
> >org-link-escape-chars RET) and the colon is a candidate for being on
> >the list.
>
> What exactly does "already" mean?
It means that Org performs percent escaping in some cases, but there
are currently three problems:
1. The percent escaping/unescaping functions are not Unicode-aware;
2. The percent escaping/unescaping functions require the user to
explicitly specify which characters should be escaped;
3. There is no clear rule for when to escape/unescape; this is
the problem you've hit.
The solution for the third problem is not to modify `org-open-file' but
to implement a rule that says:
- If a link is written to an Org file, everything after the
scheme (type) is percent escaped. IIRC this is already done for
characters that would break the parser (square brackets), but some
characters are missing (most obviously the percent sign).
- If a link is read from an Org file and passed to another function,
the part after the scheme is unescaped.
These two rules would cover the problem you face: although the link is
not created by Org, it is unescaped before it is opened, so %3A would
expand to ":".
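A minimal sketch of the "escape on write" half of that rule, assuming UTF-8 and a fixed set of characters that stay unescaped (the RFC 3986 unreserved characters plus "/"; that set is an assumption, not Org's org-link-escape-chars). The names are hypothetical, not part of Org. Encoding to UTF-8 first addresses problem 1, and the fixed set addresses problem 2:

```elisp
;; Hypothetical helpers (not part of Org), sketching "escape on write".
;; Assumption: unreserved = RFC 3986 unreserved characters plus "/".
(defconst my-link-unreserved
  (append "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~/"
          nil)
  "Characters (as integers) that `my-link-escape' leaves alone.")

(defun my-link-escape (path)
  "Percent-escape PATH, the part of a link after the scheme.
PATH is encoded as UTF-8 first, so each byte of a non-ASCII character
becomes its own %XX sequence; every other byte outside
`my-link-unreserved', including \"%\" and \":\", is escaped too."
  (mapconcat (lambda (byte)
               (if (memq byte my-link-unreserved)
                   (string byte)              ; keep unreserved ASCII as-is
                 (format "%%%02X" byte)))     ; e.g. 58 -> "%3A"
             ;; A unibyte string: mapconcat sees bytes 0..255.
             (encode-coding-string path 'utf-8)
             ""))
```

For example, "/tmp/a b:ö" becomes "/tmp/a%20b%3A%C3%B6", and a literal "%" becomes "%25", so a later unescape restores the original path exactly.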
>
> OK, you mean that some version of Org already does the job, but not the
> one in the official Git repo?
>
Yes, kind of. It's a personal working copy of the official repository,
and when I am finished I either merge it into the official one or ask
someone to review the changes and "pull" my version into the official
repo. Think of it as the bleeding bleeding edge: highly unstable,
not guaranteed to work at all.
>
> The following are just comments on the code; most of them are a matter
> of taste, and you may well disagree with them.
>
> 1. In the org.el file at the link you provided, I also found the
> functions org-entry-protect-space and org-entry-restore-space, which
> also do some escaping. Why not use a single function?
Good point.
>
> 2. In the function org-link-escape, there is a lambda expression
>
> (lambda (sequence)
>   (format "%%%.2X" sequence))
>
> The argument name should be sequence-element rather than sequence.
Ditto. Changed it.
>
> 3. In org-link-unescape, there are three substring-or-concatenation
> steps, but you could make it simpler with a single replace-match and a
> start position in string-match. That would look like this (*not tested*):
>
> (defun org-link-unescape (str)
>   "Unhex hexified unicode strings as returned from the JavaScript function
> encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ö'."
>   (setq str (or str ""))
>   (let ((case-fold-search t)
>         (pos 0))
>     (while (string-match "\\(%[0-9a-f][0-9a-f]\\)+" str pos)
>       ;; The distance from the end of the match to the end of STR is
>       ;; not changed by the replacement, so it gives the resume point.
>       (let ((tail (- (length str) (match-end 0))))
>         (setq str (replace-match
>                    (save-match-data
>                      (org-link-unescape-compound
>                       (upcase (match-string 0 str)))) ; hex
>                    t t str)
>               pos (- (length str) tail))))
>     str))
>
> My feeling is that code like the above is slightly simpler in
> execution, as there is only one string manipulation per
> iteration instead of two, and also easier to maintain, as it has
> fewer cases to handle (i.e. it does not matter whether the escaped
> sequence is at the end of the string or not). You also avoid
> intermediate variables like `replacement', as the use of
> replace-match makes it self-explanatory that the result of
> org-link-unescape-compound is a replacement.
Agreed, refactoring the unescape functions is on the list.
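For comparison, a self-contained sketch of the single-replace-match approach. The names are hypothetical, and it decodes each run of %XX bytes with decode-coding-string instead of Org's org-link-unescape-compound, assuming the escaped bytes are UTF-8. Two details matter: split-string modifies the match data, so the decoding is wrapped in save-match-data; and the next search resumes at the match start plus the length of the decoded text, because a run of N escaped bytes usually decodes to fewer than N characters:

```elisp
;; Hypothetical helpers (not part of Org); escaped bytes assumed UTF-8.
(defun my-link-unescape-run (run)
  "Decode RUN, a string such as \"%C3%B6\", as UTF-8 bytes."
  (decode-coding-string
   (apply #'unibyte-string
          ;; RUN starts with "%", so the split yields exactly one empty
          ;; string, in first position; cdr drops it.
          (mapcar (lambda (hex) (string-to-number hex 16))
                  (cdr (split-string run "%"))))
   'utf-8))

(defun my-link-unescape (str)
  "Replace each run of %XX sequences in STR by its UTF-8 decoding."
  (let ((str (or str ""))
        (case-fold-search t)
        (pos 0))
    (while (string-match "\\(?:%[0-9a-f][0-9a-f]\\)+" str pos)
      (let* ((start (match-beginning 0))
             ;; split-string modifies the match data; keep it intact
             ;; for the replace-match call below.
             (decoded (save-match-data
                        (my-link-unescape-run (match-string 0 str)))))
        (setq str (replace-match decoded t t str)
              ;; Resume after the decoded text so it is never unescaped
              ;; a second time ("%2541" gives "%41", not "A").
              pos (+ start (length decoded)))))
    str))
```

With this, (my-link-unescape "a%C3%B6b") returns "aöb" and (my-link-unescape "%2541") returns "%41".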
>
> 4. In org-link-unescape-compound,
>
> (remove "" (split-string hex "%"))
>
> can be replaced by (cdr (split-string hex "%")) because the list
> always contains exactly one empty string, and it is in first
> place.
Agreed.
>
> 5. In org-link-unescape-compound, you could make fewer comparisons
> by replacing the code
>
> (shift
>  (if (= 0 eat) ;; new byte
>      (if (>= val 252) 6
>        (if (>= val 248) 5
>          (if (>= val 240) 4
>            (if (>= val 224) 3
>              (if (>= val 192) 2 0)))))
>    6))
> (xor
>  (if (= 0 eat) ;; new byte
>      (if (>= val 252) 252
>        (if (>= val 248) 248
>          (if (>= val 240) 240
>            (if (>= val 224) 224
>              (if (>= val 192) 192 0)))))
>    128)))
>
> by (*not tested*):
>
> (shift-xor
>  (if (= 0 eat) ;; new byte
>      (if (>= val 252) '(6 . 252)
>        (if (>= val 248) '(5 . 248)
>          (if (>= val 240) '(4 . 240)
>            (if (>= val 224) '(3 . 224)
>              (if (>= val 192) '(2 . 192) '(0 . 0))))))
>    '(6 . 128)))
> (shift (car shift-xor))
> (xor (cdr shift-xor))
>
>
> The code above looks more concise to me; depending on val, it may also
> run faster.
Okay, I have to look at this suggestion.
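The cons-cell table can also be flattened into a single cond. A sketch with a hypothetical helper name (not Org's code), returning the same (SHIFT . XOR) pairs as the nested ifs:

```elisp
;; Hypothetical helper: the (SHIFT . XOR) table as one flat cond.
(defun my-utf8-shift-xor (val eat)
  "Return (SHIFT . XOR) for byte VAL of a UTF-8 sequence.
EAT is the number of continuation bytes still expected; when it is
non-zero, VAL is a continuation byte (10xxxxxx)."
  (cond ((/= eat 0)   '(6 . 128))   ; continuation byte
        ((>= val 252) '(6 . 252))
        ((>= val 248) '(5 . 248))
        ((>= val 240) '(4 . 240))
        ((>= val 224) '(3 . 224))
        ((>= val 192) '(2 . 192))   ; lead byte of a 2-byte sequence
        (t            '(0 . 0))))   ; ASCII byte, nothing to strip
```

For the lead byte of "ö" (#xC3), (logxor #xC3 (cdr (my-utf8-shift-xor #xC3 0))) gives 3, its payload bits.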
>
> Hoping that the above helps.
Definitely.
Last but not least: on this mailing list you should normally Cc: answers
to the original poster. Some are not subscribed to the list at all;
others (like me) read the list from a different account than their main
one and may miss answers.
Best and thanks,
-- David
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de