From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neil Jerram
Subject: Re: [RFC] Fixing link encoding once and for all
Date: Sun, 24 Feb 2019 23:04:27 +0000
References: <87tvguyohn.fsf@nicolasgoaziou.fr>
In-Reply-To: <87tvguyohn.fsf@nicolasgoaziou.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
List-Id: "General discussions about Org-mode."
To: Org Mode List

I'm not sure how much freedom you have here, but I think it would be
both clearer - by avoiding confusion with URL-escaping - and easier to
type, to use an entirely different form of escaping in the Org syntax;
probably just this:

\[ and \] to include a square bracket in a link
\\ to include a backslash

Regards,
    Neil

On Sun, 24 Feb 2019 at 01:18, Nicolas Goaziou wrote:
>
> Hello,
>
> Recently[1], issues about link escaping have resurfaced. I'd like to
> solve this once and for all.
>
> As a reminder, the initial issue is that bracket links, i.e., "[[path]]"
> or "[[path][description]]", cannot contain square brackets, for obvious
> reasons. Therefore, they need to be escaped somehow.
> For some historical reason, the "somehow" settled, for the path
> part[2], on URL encoding. Therefore [ and ] in a link must appear as,
> respectively, "%5B" and "%5D". Of course, the initial link could
> already contain any of these strings, so percent signs also need to be
> escaped, as "%25". Finally, consecutive spaces are not handled very
> gracefully by the `fill-paragraph' function, so it is also useful, but
> not mandatory, to be able to escape white spaces, with "%20". It can
> sadly be confusing when Org encoding is applied on top of an already
> encoded URI.
>
> To sum it up, `org-link-escape', by default, URL encodes only square
> brackets, percent signs and white spaces. Note that, however,
> `org-link-unescape' is not its reciprocal function, despite its
> docstring. It URL decodes every percent encoded combination.
>
> Anyway, square brackets in a bracket link almost look like a solved
> problem. Alas, if some links are inserted by helper functions, such as
> `org-insert-link', others could have been typed right into the buffer.
> Therefore, there is usually no way to know if a link is already
> Org-encoded or not. Consequently, there is usually no way to know when
> a link needs to be Org-decoded. This is the root of all evil, or at
> least, all bugs encountered so far. Some links end up being encoded or
> decoded one time too many.
>
> To solve this, we must assume that every bracket link is properly
> Org-encoded in a buffer. In other words, when typing, or yanking,
> a bracket link right into a buffer, users are required to use %5B, %5D,
> and %25 in the path part of the link, if necessary. I understand it will
> bite some users, but using `org-insert-link' would mitigate the pain. It
> is also limited to square brackets, which, I assume, is not the type of
> link you usually yank.
>
> With that assumption, the parser can safely Org-decode links
> appropriately, and store paths in their decoded form.
> Consumers, like export back-ends, need not call `org-link-unescape'
> anymore. In fact, the only situation where `org-link-unescape' is
> still needed is when extracting the path part of a bracket link from
> the buffer, e.g., through regexp matching.
>
> Of course, the manual should mention this assumption, if we agree on it.
>
> Thoughts?
>
> Regards,
>
> Footnotes:
>
> [1] E.g.,
> or .
>
> [2] There is no clear mechanism for the description part.
> `org-insert-link' will replace square brackets with curly ones. We could
> also use entities, but none of them appears as a square bracket. Anyway,
> I'll ignore this issue for the time being.
>
> --
> Nicolas Goaziou
>
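
To make the two escaping schemes in this thread concrete, here is a
minimal Python sketch. It is not Org's Emacs Lisp implementation: the
function names are hypothetical, and the behaviour is only what the
messages above describe (percent-encoding of brackets, percent signs and
white space with a decode-everything counterpart, versus backslash
escaping of brackets and backslashes). Leaving a backslash before any
other character untouched is an assumption of this sketch.

```python
from urllib.parse import unquote

# Characters the quoted message says are percent-encoded by default.
ORG_STYLE_ESCAPES = {"%": "%25", "[": "%5B", "]": "%5D", " ": "%20"}


def org_style_escape(path: str) -> str:
    """Percent-encode only brackets, percent signs and spaces (hypothetical helper)."""
    return "".join(ORG_STYLE_ESCAPES.get(ch, ch) for ch in path)


def org_style_unescape(path: str) -> str:
    """Decode every percent-encoded sequence, as the quoted message describes."""
    return unquote(path)


def backslash_escape(path: str) -> str:
    """Prefix each square bracket or backslash with a backslash (the proposed scheme)."""
    out = []
    for ch in path:
        if ch in "\\[]":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def backslash_unescape(text: str) -> str:
    """Undo backslash_escape; a backslash before any other character stays literal."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in "\\[]":
            out.append(text[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    typed_by_hand = "http://example.com/a%20b"  # already a valid URI, never Org-encoded
    # Decoding it anyway changes the link: this is the "one time too many" case.
    print(org_style_unescape(typed_by_hand))
    # Backslash escaping leaves percent sequences alone.
    print(backslash_escape("http://example.com/a%20b[1]"))
```

In this sketch, the percent-based pair round-trips text that was really
encoded, but it also rewrites a hand-typed URI that merely contains
"%20". The backslash-based pair never touches percent sequences, so an
already URL-encoded link passes through unchanged.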