emacs-orgmode@gnu.org archives
From: "Mark E. Shoulson" <mark@shoulson.com>
To: org-mode mailing list <emacs-orgmode@gnu.org>
Subject: Re: Spaces in bare URLs?
Date: Wed, 18 Mar 2020 16:25:15 -0400
Message-ID: <b918e406-d5d5-8386-fefb-15cafcb65edf@kli.org>
In-Reply-To: <875zf2gfqh.fsf@nicolasgoaziou.fr>

On 3/18/20 5:43 AM, Nicolas Goaziou wrote:
> Hello,
>
> "Mark E. Shoulson" <mark@shoulson.com> writes:
>
>> So... what is one supposed to do about spaces in URLs?
>> When they're in [[link format]], with or without a description, it's
>> no problem, but org-mode has a long tradition of support for "bare"
>> URLs too.  We're used to being able to type a URL or other link format
>> and have it work, right?  And that doesn't seem (to me) to be a thing
>> that we'd want to abandon.
>>
>> In org-mode 9.1.9, I can type "info:elisp#Syntactic%20Font%20Lock"
>> and it'd work.  (Maybe not the greatest example, since %-encoding is
>> seen more with http-based URIs, but still).  The percent-encoding is
>> well-established and reliable
> Unfortunately, that wasn't reliable. As it is not idempotent, you can
> never know how many times you need to decode an URL before sending it.

Well, any form of escaping is pretty much by definition not idempotent.
That's the whole point of escaping: you have something you can't say, so
you make some magical character that changes the meaning of nearby
characters, so you can describe it using characters you *can* say.  And
the price you pay is that now you can no longer say your magical
character plain; you have to use another form of escaping to express it
(usually the same form as the others).  It's like how it's impossible to
compress *every* file to make it smaller, and some even have to get
bigger.  The pigeonhole principle shows _why_ it isn't possible, and
escaping shows (one way) _how_ it isn't: say you use high-ASCII bytes to
represent common strings or something.  How do you represent those bytes
when they're really in the text?  You have to escape them... which makes
your file *larger*.
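
To make the non-idempotence concrete, here is a small sketch.  Python
and its urllib.parse module are only my choice for illustration; Org
itself is Emacs Lisp, and nothing here is Org's actual code.  The
helper name decode_n is made up.

```python
from urllib.parse import quote, unquote

original = "Syntactic Font Lock"

once = quote(original)   # spaces become %20
twice = quote(once)      # encoding again escapes the "%" itself as %25

def decode_n(text, n):
    """Hypothetical helper: apply percent-decoding n times in a row."""
    for _ in range(n):
        text = unquote(text)
    return text

# The receiver has to know how many layers were applied: one decode
# of `twice` gives back `once`, and only a second decode gives back
# `original`.
print(once)
print(twice)
print(decode_n(twice, 1) == once, decode_n(twice, 2) == original)
```

The same string means different things depending on how many times
you decode it, which is exactly the ambiguity Nicolas describes.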

> The thing is URL encoding is not for human consumption, i.e., we
> shouldn't have to deal with it.
This is a good point.  While on one hand it makes sense to be able to
type URLs that contain spaces without typing literal spaces, it is sort
of ridiculous to expect users to feel "natural" about typing "%20"
instead.  (I think this is why form encoding, i.e.
application/x-www-form-urlencoded, also lets you write a space as "+",
to make things easier for this most common of characters... but that
weird exception has caused all kinds of hassles in code from that day
to this; I know from my own experience.)
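
A quick sketch of the kind of hassle I mean, again in Python's
urllib.parse purely for illustration (the sample strings are made up).
Whether "+" means a space depends on which decoder you happen to call:

```python
from urllib.parse import quote_plus, unquote, unquote_plus

text = "a+b c"

# Form-style encoding: the space becomes "+", so the literal "+"
# has to be escaped as %2B to stay distinguishable.
encoded = quote_plus(text)

# The matching decoder restores the original text...
round_trip = unquote_plus(encoded)

# ...but the plain percent-decoder leaves "+" alone, so the space is
# never restored.
plain_decoded = unquote(encoded)

# And a literal, unescaped "+" fed to the form-style decoder silently
# turns into a space.
mangled = unquote_plus("a+b")

print(encoded, round_trip, plain_decoded, mangled, sep=" | ")
```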
>> and you can *count* on it when nothing else works, because you can
>> always fall back on plain ascii.
> Current backslash escaping is also well established, and as much
> ASCII-like as anyone would expect.

Really?  As ASCII-like as I could expect?  What if my URL is 
https://he.wikipedia.com/שלום_עליכם ?  If I am in some backward 
environment (still all too common) where all I can rely on is ASCII, I 
can percent-encode the UTF-8 representation and it will work.  Can we 
count on being able to backslash-quote things clear down to ASCII?  I 
don't see a way in the docs I've seen.
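
For the record, this is the fallback I have in mind, sketched with
Python's urllib.parse (an illustration only, not anything Org does):
percent-encoding writes out the UTF-8 bytes of each non-ASCII character
as %XX, so the result is pure ASCII and decodes back losslessly.

```python
from urllib.parse import quote, unquote

title = "שלום_עליכם"

# quote() encodes the text as UTF-8 by default and writes every
# non-ASCII byte as %XX; the underscore is left as it is.
encoded = quote(title)

print(encoded)
print(encoded.isascii())           # safe for an ASCII-only environment
print(unquote(encoded) == title)   # and nothing is lost on the way back
```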

>> But that won't work in org-mode 9.3.6.  Nor will
>> "info:elisp#Syntactic Font Lock" or "info:elisp#Syntactic\ Font\ Lock"
>> or any other variant I've tried, short of putting it inside [[]]s or
>> <>s (in other words, no longer using a bare URL).
> True, but that's a minor annoyance.
>
> You apparently prefer to encode a URL manually, replacing each space
> with %20 (and other characters with more baroque escape sequences),
> rather than adding <...> (or [[...]]) around it and be done with it.
> Perhaps this one was the bad idea, after all?

Yes, using <>s works, as does [[]].  And yes, I do have to concede that
claiming it should be "natural" for a user to hand-escape things with
%20s is sort of ridiculous.  Having to reprocess all old org-files for
such a common notation still seems like more trouble than it was worth,
but then you didn't ask me (and you were QUITE RIGHT not to do so!).  I
guess a converter script should also enclose bare URLs in <>, at least
if they contain spaces or other whitespace.
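
A very rough sketch of what such a converter might do, in Python.
Everything here is an assumption on my part: the LINK_TYPES list, the
regular expression, and the function name are invented for
illustration, and this is not an existing Org tool.  It only handles
%20, and whether a given link type wants a literal space or the %20
kept is something to check against your Org version.

```python
import re

# Hypothetical list of link types to look for; adjust to taste.
LINK_TYPES = ("http", "https", "info", "file")

# A bare link: a known type, a colon, then non-space characters.
# The lookbehind skips links that directly follow "<" or "[" (already
# bracketed) or a word character (part of some longer word).
BARE_LINK = re.compile(
    r"(?<![<\[\w])(?:%s):[^\s<>\[\]]+" % "|".join(LINK_TYPES)
)

def wrap_encoded_spaces(line):
    """Turn bare links containing %20 into <...> links with real spaces."""
    def fix(match):
        link = match.group(0)
        if "%20" not in link:
            return link  # nothing to convert, leave the bare link alone
        return "<" + link.replace("%20", " ") + ">"
    return BARE_LINK.sub(fix, line)

print(wrap_encoded_spaces("See info:elisp#Syntactic%20Font%20Lock for details"))
```

Note that trailing punctuation right after a bare link would get
swallowed into the brackets, so a real script would need more care
than this.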

Still don't know about org-protocol and store-link, because I'm lazy.  
Right now, at least some of the emacsen I'm working with still use 
org-9.1.9, so I haven't converted anything.

~mark
