emacs-orgmode@gnu.org archives
From: David Maus <dmaus@ictsoc.de>
To: Nicolas Goaziou <n.goaziou@gmail.com>
Cc: David Maus <dmaus@ictsoc.de>,
	emacs-orgmode@gnu.org, Nick Dokos <ndokos@gmail.com>
Subject: Re: Encoding Problem in export?
Date: Sat, 27 Jul 2013 09:23:20 +0200
Message-ID: <87k3kc1n6f.wl%dmaus@ictsoc.de>
In-Reply-To: <87d2q54o7e.fsf@gmail.com>

At Fri, 26 Jul 2013 12:20:37 +0200,
Nicolas Goaziou wrote:
> 
> David Maus <dmaus@ictsoc.de> writes:
> 
> Thanks for your answer. It seems I got confused with the current state
> of URI-encoding. Please scratch my previous suggestion and let's start
> over.

The more I think about it the more I grow certain that it is NOT about
URI encoding but protecting a string. Unless we parse the URI and know
the protocol we cannot tell if square brackets are allowed or not.

> 
> Alas, there is a serious flaw in the current implementation. As you
> said:
> 
> > There is, of course, the nasty thing that we don't know if the link in
> > a buffer went through org-link-escape or not. E.g. if you paste
> >
> >  ,----
> > | [[http://redirect.example.org?url=http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml]]
> >  `----
> >
> > into the buffer you'll get a broken link because org-link-open assumes
> > the link to be escaped by org.
> 
> There is, indeed, no easy way to know if a link went through
> `org-link-escape', so we cannot unescape it properly in every situation.
> We could use text properties on escaped links, but that seems awkward.
> 
> I think there is a simpler solution: we never "unescape" links,
> which means that escaping must be at its minimum. For example, we
> could only replace "[" and "]" with, respectively, "%5B" and "%5D"
> and newlines with spaces. It doesn't cripple link's readability very
> much, and is safe as "[", "]" and "\n" are always forbidden in URI
> anyway.
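
To make the proposal concrete, here is a minimal sketch in Python. It
is not Org's actual code, and `minimal_escape' is a hypothetical name
chosen for illustration. It applies only the three replacements
proposed above and, for contrast, shows what a blanket unescape does
to the pasted link from the example:

```python
from urllib.parse import unquote

# Hypothetical helper: protect only the characters named in the
# proposal ("[", "]" and newline), nothing else.
_PROTECT = str.maketrans({"[": "%5B", "]": "%5D", "\n": " "})


def minimal_escape(link: str) -> str:
    """Replace "[" and "]" with "%5B" and "%5D", and newlines with spaces."""
    return link.translate(_PROTECT)


pasted = ("http://redirect.example.org?url="
          "http%3A%2F%2Ftarget.example.org%3Fid%3D33%26format%3Dhtml")

# A link without brackets or newlines passes through unchanged.
print(minimal_escape(pasted) == pasted)

# Unconditional unescaping rewrites the already-encoded query value.
print(unquote(pasted))

# Brackets and newlines are the only characters that get touched.
print(minimal_escape("http://example.org/a[1]\nb"))
```

Note that the sketch cannot tell a "%5B" it produced from one that was
already in the input, which is the ambiguity described in the quoted
text.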

`[' and `]' are not forbidden per se, they belong to the set of
reserved characters (see RFC 3986, 2.2.).

"characters in the reserved set are protected from normalization and
are therefore safe to be used by scheme-specific and producer-specific
algorithms for delimiting data subcomponents within a URI."
(RFC 3986, p. 12)

Moreover, they are explicitly required in the host part to denote an
IPv6 address literal (RFC 3986, 3.2.2).

If I am not mistaken, this is a valid http-URI with an XPointer
fragment pointing to the third `p' element in a locally hosted file:

http://[::1]/foo.xml#xpointer(//p[3])
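
As a rough illustration of the two roles the brackets play here, this
is how one generic parser (Python's urllib.parse, used purely as an
example) splits that string. It only shows where the components fall;
the parser is lenient and this is no proof that the URI is valid:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://[::1]/foo.xml#xpointer(//p[3])")

print(parts.netloc)    # brackets delimit the IPv6 literal
print(parts.hostname)  # the address itself, without the brackets
print(parts.path)
print(parts.fragment)  # brackets here belong to the XPointer expression
```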

,----[ http://www.w3.org/TR/xptr-framework/#escaping
| IRI references can be converted to URI references for consumption by
| URI resolvers. The disallowed characters in URI references include all
| non-ASCII characters, plus the excluded characters listed in Section
| 2.4 of [RFC 2396], except for the number sign (#) and percent sign (%)
| and the square bracket characters re-allowed in [RFC 2732]. 
`----

> When sending the URL to the consuming application, there will be problems, according
> to the assumption at the beginning of this message. But that is to be
> expected.

If we escape but don't unescape there are *other* problems: Depending
on the protocol, an escaped square bracket and an unescaped square
bracket can have different meanings. The assumption I mentioned refers
to unescaped characters. A consuming application knows the protocol
and can infer the characters that need to be escaped.
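
A small sketch of that difference, again using Python's urllib.parse
only as a stand-in for some consuming application: once the brackets
of the IPv6 literal are percent-escaped, the host component is no
longer read as the same address.

```python
from urllib.parse import urlsplit

plain = urlsplit("http://[::1]/foo.xml")
escaped = urlsplit("http://%5B::1%5D/foo.xml")

print(plain.hostname)   # parsed as an IPv6 literal
print(escaped.netloc)   # kept as an opaque, percent-encoded string

# The two spellings do not yield the same host.
print(plain.hostname == escaped.hostname)
```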

> Replacing non-ascii characters would make the link unreadable to a
> human. Also, we don't prevent encoding mismatch (e.g., from UTF-8 to
> ISO-8859-1) when yanking regular text in an Org buffer, so there's
> no particular reason to do it for links.

ACK. It's not about creating URIs but protecting strings, thus the
rules for percent escaping don't have to be applied.

Best,
  -- David

-- 
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de
