From: David Maus <dmaus@ictsoc.de>
To: Nicolas Goaziou <n.goaziou@gmail.com>
Cc: Nick Dokos <ndokos@gmail.com>,
	emacs-orgmode@gnu.org, David Maus <dmaus@ictsoc.de>
Subject: Re: Encoding Problem in export?
Date: Thu, 25 Jul 2013 06:05:24 +0200
Message-ID: <87y58vp9mj.wl%dmaus@ictsoc.de>
In-Reply-To: <877ggg5i5q.fsf@gmail.com>

Hi Nicolas,
Hi Nick,

At Wed, 24 Jul 2013 13:09:05 +0200,
Nicolas Goaziou wrote:
> 
> Hello,
> 
> Nick Dokos <ndokos@gmail.com> writes:
> 
> > Maybe the thing to do is to delete '=' from org-link-escape-chars and
> > see what problems arise.
> 
> AFAICT, `url-encode-url' is subtler than that. It encodes characters
> whenever they are really forbidden, which is not the case of
> `org-link-escape'. Hence my initial question: do we need to reinvent the
> wheel?
> 
> > But I did find that '%' was originally in org-link-escape-chars and
> > David Maus hardcoded it (commit 139cc1d4), so that it is *always*
> > escaped.
> 
> I Cc David Maus in case he has time to enlighten us about his choice.
>

IIRC org-link-escape is not used to create URLs but to escape
characters in a link that would otherwise conflict with Orgmode syntax
(e.g. square brackets). Org applies percent escaping to a link before
it is stored in the buffer and applies unescaping when it reads a link
back.

The percent sign is hardcoded because if org-link-escape/unescape is
used in this way we must make sure that the identity of a link is
preserved. If we did *not* escape the percent sign, then an original
link with percent-encoded characters would be read back wrongly,
i.e. with its percent-encoded characters unescaped.

This broke links.

E.g. consider a redirector link to the target URL
`http://target.example.org?id=33&format=html':

,----
| http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml
`----

If we don't escape the percent sign but apply unescaping when, say,
the user opens the link we would get:

,----
| http://redirect.example.org?url=http://target.example.org?id=33&format=html
`----

And voila: The `format' parameter is turned into a query parameter of
redirect.example.org, not target.example.org.
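To make the effect concrete, here is a small sketch in Python (not
Org's own code) that parses the query string of the link before and
after a stray unescaping pass. It relies only on the standard
`urllib.parse' module; the helper name `query_params' is made up for
this illustration.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# The redirector link from the example above, exactly as the user stored it.
LINK = ("http://redirect.example.org?url="
        "http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml")


def query_params(url):
    """Return the query parameters as redirect.example.org would parse them."""
    return parse_qs(urlsplit(url).query)


# Link left alone: the redirector sees a single `url' parameter.
intact = query_params(LINK)
# {'url': ['http://target.example.org?id=33&format=html']}

# Link unescaped once too often: `format' now belongs to the redirector.
damaged = query_params(unquote(LINK))
# {'url': ['http://target.example.org?id=33'], 'format': ['html']}

print(intact)
print(damaged)
```

The second result shows that the target URL has lost its `format'
parameter, which is the breakage described above.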

The spec (RFC 3986) has the following to say about escaping:

,----
|    Because the percent ("%") character serves as the indicator for
|    percent-encoded octets, it must be percent-encoded as "%25" for that
|    octet to be used as data within a URI.  Implementations must not
|    percent-encode or decode the same string more than once, as decoding
|    an already decoded string might lead to misinterpreting a percent
|    data octet as the beginning of a percent-encoding, or vice versa in
|    the case of percent-encoding an already percent-encoded string.
`----
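The hazard the RFC describes can be reproduced in a few lines. This is
only an illustration using Python's standard `urllib.parse'; the sample
string "100%41" is an invented piece of data that happens to contain a
percent sign followed by two hex digits.

```python
from urllib.parse import quote, unquote

# Invented sample data: a literal percent sign followed by "41".
data = "100%41"

# Encoding once turns the data percent sign into "%25".
encoded = quote(data, safe="")      # "100%2541"

# Decoding once restores the data.
once = unquote(encoded)             # "100%41"

# Decoding a second time misreads the data "%41" as an escape for "A".
twice = unquote(once)               # "100A"

# Encoding twice is just as harmful: one decode no longer restores the data.
double_encoded = quote(encoded, safe="")   # "100%252541"

print(encoded, once, twice, double_encoded)
```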

There is, of course, the nasty thing that we don't know if the link in
a buffer went through org-link-escape or not. E.g. if you paste

,----
| [[http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml]]
`----

into the buffer you'll get a broken link, because org-link-open assumes
that the link was escaped by Org.

The bottom line: Org creates links programmatically (org-store-link)
and needs a mechanism to protect conflicting characters. It chose
percent-escaping, and in order to preserve the identity of a link Org
has to escape the escape character.
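A minimal sketch of that round trip, written in Python rather than
Emacs Lisp and not taken from Org's sources. The functions
`escape_link' and `unescape_link' and the character set in
`CONFLICTING' are assumptions made for the illustration; they only
stand in for org-link-escape and org-link-unescape.

```python
from urllib.parse import unquote

# Assumption: characters that would clash with the [[...]] link syntax.
CONFLICTING = "[] "


def escape_link(link, escape_percent=True):
    """Percent-escape the conflicting ASCII characters in LINK.

    With escape_percent=True the escape character itself is escaped too.
    """
    table = set(CONFLICTING)
    if escape_percent:
        table.add("%")
    return "".join("%{:02X}".format(ord(c)) if c in table else c
                   for c in link)


def unescape_link(text):
    """Undo escape_link when the link is read back from the buffer."""
    return unquote(text)


link = ("http://redirect.example.org?url="
        "http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml")

# Escaping "%" as well: the stored form differs, the round trip is exact.
stored = escape_link(link)
round_trip_ok = unescape_link(stored) == link            # True

# Leaving "%" alone: nothing is escaped on the way in, but the user's
# own %3A, %2F, ... are decoded on the way out.
stored_naive = escape_link(link, escape_percent=False)
round_trip_naive = unescape_link(stored_naive) == link   # False

# Brackets and spaces are protected in both variants.
bracketed = escape_link("file:notes [draft].org")
# "file:notes%20%5Bdraft%5D.org"

print(round_trip_ok, round_trip_naive, bracketed)
```

The only difference between the two variants is whether "%" is in the
escape table, and that alone decides whether the link survives being
stored and read back.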

Hope that helps!

Best,
  -- David
-- 
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de

