From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Mark E. Shoulson"
Subject: Re: Spaces in bare URLs?
Date: Wed, 18 Mar 2020 16:25:15 -0400
References: <78f598f3-44e1-63aa-751a-49c2f7208fe7@kli.org> <875zf2gfqh.fsf@nicolasgoaziou.fr>
In-Reply-To: <875zf2gfqh.fsf@nicolasgoaziou.fr>
List-Id: "General discussions about Org-mode."
To: org-mode mailing list

On 3/18/20 5:43 AM, Nicolas Goaziou wrote:
> Hello,
>
> "Mark E. Shoulson" writes:
>
>> So... what is one supposed to do about spaces in URLs?
>> When they're in [[link format]], with or without a description, it's
>> no problem, but org-mode has a long tradition of support for "bare"
>> URLs too. We're used to being able to type a URL or other link format
>> and have it work, right?
>> And that doesn't seem (to me) to be a thing
>> that we'd want to abandon.
>>
>> In org-mode 9.1.9, I can type "info:elisp#Syntactic%20Font%20Lock" and
>> it'd work. (Maybe not the greatest example, since %-encoding is seen
>> more with http-based URIs, but still). The
>> percent-encoding is well-established and reliable
> Unfortunately, that wasn't reliable. As it is not idempotent, you can
> never know how many times you need to decode an URL before sending it.

Well, any form of escaping is pretty much by definition not idempotent.
That's the whole point of escaping: you have something you can't say, so
you make some magical character that changes the meaning of nearby
characters so you can describe it in characters you can say. And the
price you pay is that now you can no longer say your magical character
plain, you have to use another form of escaping to express it (usually
the same form as the others). It's like how it's impossible to compress
*every* file to make it smaller and some even have to get bigger. The
pigeonhole principle shows _why_ it isn't possible, and escaping shows
(one way) _how_ it isn't: say you use high-ascii bytes to represent
common strings or something. How do you represent them when they're
really in the text? You have to escape them... which makes your file
*larger*.

> The thing is URL encoding is not for human consumption, i.e., we
> shouldn't have to deal with it.

This is a good point. While on one hand it makes sense to be able to
type URLs that have spaces in them without typing literal spaces, it is
sort of ridiculous to expect users to feel "natural" about typing "%20"
instead. (I think this is why the specs, at least for form-encoded
query strings, say that you can also escape a space by using the "+"
character, in order to make it easier for this most-common of
characters...
but that weird exception has caused all kinds of hassles in code from
that day to this; I know from my own experience.)

>> and you can *count* on it when nothing else works, because you can
>> always fall back on plain ascii.
> Current backslash escaping is also well established, and as much
> ASCII-like as anyone would expect.

Really? As ASCII-like as I could expect? What if my URL is
https://he.wikipedia.com/שלום_עליכם ? If I am in some backward
environment (still all too common) where all I can rely on is ASCII, I
can percent-encode the UTF-8 representation and it will work. Can we
count on being able to backslash-quote things clear down to ASCII? I
don't see a way in the docs I've seen.

>> But that won't work in org-mode 9.3.6. Nor will
>> "info:elisp#Syntactic Font Lock" or "info:elisp#Syntactic\ Font\ Lock"
>> or any other variant I've tried, short of putting it inside [[]]s or
>> <>s (in other words, no longer using a bare URL).
> True, but that's a minor annoyance.
>
> You apparently prefer to encode a URL manually, replacing each space
> with %20 (and other characters with more baroque escape sequences),
> rather than adding <...> (or [[...]]) around it and be done with it.
> Perhaps this one was the bad idea, after all?

Yes, using <>s works, as does [[]]. And yes, I do have to concede that
claiming it should be "natural" for a user to hand-escape things with
%20s is sort of ridiculous. Having to reprocess all old org-files for
such a common notation still seems like more trouble than it was worth,
but then you didn't ask me (and you were QUITE RIGHT not to do so!) I
guess a converter-script should also enclose bare URLs in <>, at least
if they have spaces or other whitespace.

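For illustration, here is a rough sketch of what such a converter pass
might look like, written in Python rather than Emacs Lisp. Everything
in it is an assumption for the sake of the example, not anything Org
ships: the function name wrap_bare_links, the DECODE_SCHEMES list
(limited here to "info" and "file", since http links usually need to
stay percent-encoded), and the regular expression used to guess where a
bare link starts and ends.

```python
import re
from urllib.parse import unquote

# Hypothetical list of link types whose percent-escapes get decoded.
DECODE_SCHEMES = ("info", "file")

# A "bare" link here means: a listed scheme, a colon, then a run of
# non-whitespace characters, not directly preceded by "<", "[" or a
# word character (so already-wrapped links are left alone).
BARE_LINK = re.compile(
    r"(?<![<\[\w])(?P<link>(?:%s):[^\s<>\[\]]+)"
    % "|".join(re.escape(s) for s in DECODE_SCHEMES)
)


def wrap_bare_links(text):
    """Decode %-escapes in bare links and wrap them in <...> when the
    decoded form contains whitespace. Other links are returned as-is."""

    def fix(match):
        link = match.group("link")
        decoded = unquote(link)
        has_space = any(ch.isspace() for ch in decoded)
        # Angle brackets or newlines would break the <...> syntax,
        # so such links are left untouched.
        unsafe = any(ch in "<>\n" for ch in decoded)
        if has_space and not unsafe:
            return "<" + decoded + ">"
        return link

    return BARE_LINK.sub(fix, text)


if __name__ == "__main__":
    print(wrap_bare_links("See info:elisp#Syntactic%20Font%20Lock now"))
```

This decodes exactly once, so a link that was escaped twice would still
come out wrong, which is the ambiguity Nicolas describes. It also treats
trailing punctuation as part of the link, so a real converter would need
a more careful notion of where a bare link ends.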
Still don't know about org-protocol and store-link, because I'm lazy.
Right now, at least some of the emacsen I'm working with still use
org-9.1.9, so I haven't converted anything.

~mark