emacs-orgmode@gnu.org archives
From: David Maus <dmaus@ictsoc.de>
To: nicholas.dokos@hp.com
Cc: David Maus <dmaus@ictsoc.de>, Jeff Horn <jrhorn424@gmail.com>,
	Org-mode ml <emacs-orgmode@gnu.org>
Subject: Re: [bug] Org link dialog escapes URL spaces incorrectly
Date: Sun, 06 Nov 2011 15:48:44 +0100
Message-ID: <87hb2hcnn7.wl%dmaus@ictsoc.de>
In-Reply-To: <23704.1320507536@alphaville.dokosmarshall.org>

At Sat, 05 Nov 2011 11:38:56 -0400,
Nick Dokos wrote:
>
> David Maus <dmaus@ictsoc.de> wrote:
>
> > At Fri, 04 Nov 2011 14:25:42 -0400,
> > Nick Dokos wrote:
> > >
> > > Nick Dokos <nicholas.dokos@hp.com> wrote:
> > >
> > > > It probably does, but that's probably not the best place to do it: it might be
> > > > better to do it in the (setq link on line 9090 or thereabouts. Otherwise, in
> > > > the *other* case (editing the link at point), we'll end up unescaping twice:
> > > > probably not a problem, since unescaping should be idempotent (in contrast to
> > > > escaping ;-) ) but why do it twice?
> > > >
> > >
> > > Brian Wightman pointed out to me that the idempotent part of the
> > > statement above is definitely wrong (d'oh). The original URL that Jeff
> > > Horn posted, when unescaped once, would be completely free of % signs.
> > > But if the second (doubly-escaped) form is pasted into the minibuffer,
> > > then unescaping once would not be enough. So I presume the thing to do
> > > is to take the URL and unescape it repeatedly until it loses all
> > > escapes, and then escape it *once* before inserting it in the org
> > > buffer.
> > >
> > > Sounds icky, kludgy, dirty. The question is: 1) is it a solution?
> > > and 2) is there a better one?
> >
> > No, this wouldn't be a solution. Consider a link with the sequence
> > %2525 -- Unescape until no more escapes (or rather "escapes") will
> > produce a single `%', not %25. Either escape once, or not at all.
> >
> > What roughly happens is this:
> >
> > 1. The user enters a link via `org-insert-link'
> > 2. Org escapes the link and writes it to the buffer
> > 3. The user opens the link with `org-open-at-point'
> > 4. Org reads the link from the buffer and unescapes it
> > 5. The link gets escaped and passed to the consuming application (i.e. browser)
> >
> > For steps 2 and 4 it is guaranteed that
> >
> > (string= link (org-link-unescape (org-link-escape link)))
> >
> > Thus, the problem is not in 2 or 4, but in 1 or 5.
> >
> > Step 5 assumes that a link entered by the user in step 1 was an
> > unescaped link and thus needs escaping before it is passed to the
> > consuming application. If you enter a link in step 1 that already is
> > escaped, this assumption fails and you'll end up with a double-escaped
> > link that is passed to the consumer.
> >
> > In other words, the question is: How to decide whether an arbitrary
> > URL is percent-escaped or not?
> >
> > Now here's the problem: You can't. Is
> >
> > "http://example.tld/foo%40bar"
> >
> > already escaped or not? You can't tell for sure. It depends on the
> > application you copied the link from.[1]
> >
> > What we could do in step 5 is... guess. If the (unescaped) link
> > produced by step 4 does contain characters that need escaping, we
> > escape the link. Otherwise we don't.
> >
> > Not quite sure about the impact of such a change.
> >
> > Best,
> >  -- David
> >
> > [1] Even worse: It may even depend on /how/ or /where/ you copied the
> > link. E.g. the link to a wikipedia page about set theory is copied as
> >
> > http://de.wikipedia.org/wiki/Menge_%28Mathematik%29
> >
> > if C-c'ed from the address bar but copied as
> >
> > http://de.wikipedia.org/wiki/Menge_(Mathematik)
> >
> > if C-c'ed via "Copy link to clipboard" at another page (Iceweasel
> > 3.6.23).
>
> What a mess - thanks for the clear explanation. Be that as it may, the
> treatment of links in org-insert-link is inconsistent: in one case, the
> link is unescaped once, in the other not at all.

Hm. I still don't see an inconsistency here: `org-insert-link' reads a
link, `org-open-at-point' calls the target application with exactly
this link but applies percent escaping.
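
A rough sketch of that flow, written in Python so it can be run
standalone. `escape' and `unescape' are hypothetical stand-ins for
`org-link-escape' and `org-link-unescape', built on urllib.parse; they
are an assumption for illustration and do not reproduce Org's actual
escaping table. The function names `insert_link' and `open_at_point'
merely mirror the Org commands.

```python
from urllib.parse import quote, unquote


def escape(link: str) -> str:
    """Hypothetical stand-in for org-link-escape: percent-encode, '%' included."""
    return quote(link, safe=":/")


def unescape(link: str) -> str:
    """Hypothetical stand-in for org-link-unescape."""
    return unquote(link)


def insert_link(user_input: str) -> str:
    """Steps 1-2: escape what the user typed and write it to the buffer."""
    return escape(user_input)


def open_at_point(buffer_text: str) -> str:
    """Steps 3-5: unescape the buffer text, then escape again for the browser."""
    return escape(unescape(buffer_text))


plain = "http://example.tld/foo bar"
already_escaped = "http://example.tld/foo%20bar"

# An unescaped input reaches the browser escaped exactly once.
print(open_at_point(insert_link(plain)))
# An input that was already escaped reaches the browser double-escaped.
print(open_at_point(insert_link(already_escaped)))
```

With these stand-ins the first call prints
http://example.tld/foo%20bar and the second prints
http://example.tld/foo%2520bar, which is the double escaping reported
in this thread.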

Maybe we could drop this last escaping and push the responsibility of
handling non-conformant URLs to the target application. After all: If
the user enters an unescaped link and the target application can't
handle it, it's the user's fault.
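
To illustrate what dropping the last escaping would mean, here is a
minimal Python sketch under the same assumption as before: `escape'
and `unescape' are hypothetical stand-ins for `org-link-escape' and
`org-link-unescape' (not Org's real implementation), and
`open_at_point_no_final_escape' is an invented name for the proposed
behaviour.

```python
from urllib.parse import quote, unquote


def escape(link: str) -> str:
    """Hypothetical stand-in for org-link-escape: percent-encode, '%' included."""
    return quote(link, safe=":/")


def unescape(link: str) -> str:
    """Hypothetical stand-in for org-link-unescape."""
    return unquote(link)


def open_at_point_no_final_escape(buffer_text: str) -> str:
    """Proposed variant: unescape the buffer text and hand it over as is."""
    return unescape(buffer_text)


typed_links = [
    "http://de.wikipedia.org/wiki/Menge_%28Mathematik%29",  # already escaped
    "http://de.wikipedia.org/wiki/Menge_(Mathematik)",      # not escaped
    "http://example.tld/foo bar",                           # not escaped, has a space
]

for typed in typed_links:
    handed_over = open_at_point_no_final_escape(escape(typed))
    # The target application receives exactly what the user typed.
    print(handed_over == typed)
```

All three comparisons print True: the round trip through the buffer
is transparent, so whether the third link (with its literal space)
works is left entirely to the target application.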

> So I presume that a single unescaping is desirable (it would
> e.g. deal with both wikipedia links correctly). But there will be
> corner cases where this will fail and some manual editing of the
> link will be necessary,

If you mean unescaping the user input in step 1 (`org-insert-link'):
This could work, too. Besides the edge cases, it requires some care with
stored links (via `org-store-link') -- the percent character is a
valid character for certain link types (e.g. message ids).
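
A small Python sketch of this alternative and its edge cases. It
assumes that "unescape the user input" means one pass of ordinary
percent-decoding (urllib.parse.unquote stands in for
`org-link-unescape'); `normalize_input' is a hypothetical name, and
the message id below is made up for the example.

```python
from urllib.parse import unquote


def normalize_input(user_input: str) -> str:
    """Hypothetical step-1 hook: unescape what the user typed exactly once."""
    return unquote(user_input)


# Both ways of copying the Wikipedia link end up as the same string.
from_address_bar = "http://de.wikipedia.org/wiki/Menge_%28Mathematik%29"
from_context_menu = "http://de.wikipedia.org/wiki/Menge_(Mathematik)"
print(normalize_input(from_address_bar) == normalize_input(from_context_menu))

# Edge case 1: unescaping is not idempotent, so the number of passes matters.
once = normalize_input("http://example.tld/100%2525")
twice = normalize_input(once)
print(once)   # ends in %25
print(twice)  # ends in a bare %

# Edge case 2: a (made-up) message id in which '%' is a literal character.
message_id = "87abc.wl%41lice@example.tld"
print(normalize_input(message_id))  # the id is silently altered
```

The last line prints 87abc.wlAlice@example.tld, i.e. a different
message id, which is why stored links of such types would have to
bypass the unescaping.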

Personally I start to like the idea of just dropping the escaping
before calling the target application.

Best,
  -- David
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de
