From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nicolas Goaziou
Subject: Re: [RFC] Fixing link encoding once and for all
Date: Tue, 05 Mar 2019 01:23:24 +0100
Message-ID: <87ef7m2mpv.fsf@nicolasgoaziou.fr>
References: <87tvguyohn.fsf@nicolasgoaziou.fr> <87sgw9cxr8.fsf@nicolasgoaziou.fr> <87lg1znh9t.fsf@nicolasgoaziou.fr>
Mime-Version: 1.0
Content-Type: text/plain
In-Reply-To: (Neil Jerram's message of "Mon, 4 Mar 2019 23:16:07 +0000")
List-Id: "General discussions about Org-mode."
To: Neil Jerram
Cc: Org Mode List

Hello,

Neil Jerram writes:

> On Fri, 1 Mar 2019 at 08:14, Nicolas Goaziou wrote:
>> The regexp for bracket links could be, in its simple (!) form:
>>
>>   \[\[\(.*?[^\\]\(?:\\\)*\)\]\(?:\[\([^\000]+?\)\]\)?\]
>
> [then a bit later]
>> Small update, in its string form now:
>>
>>   "\\[\\[\\([^\000]*?[^\\]\\(\\\\\\\\\\)*\\)\\]\\(?:\\[\\([^\000]+?\\)\\]\\)?\\]"
>
> Is [^\000] the only (or best) way of saying "any character, including
> newlines"?

There is also "\(.\|\n\)", or "[[:ascii:][:nonascii:]]".

> Could there be actual NUL characters in the document?

Good question. I used [^\000] out of habit. You are right, "\(.\|\n\)"
is more robust.
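The difference between these spellings can be checked directly. Below is a minimal sketch, assuming two made-up sample strings (one containing a newline, one containing a NUL character) and a hypothetical helper name; it is only meant to illustrate the point above, not to be part of Org.

```elisp
(defun my/any-char-demo ()
  "Compare ways of spelling \"any character\" (hypothetical helper)."
  (let ((two-lines "foo\nbar")      ; made-up sample with a newline
        (with-nul "foo\000bar"))    ; made-up sample with a NUL character
    (list
     ;; "." stops at the newline, so the anchored match fails.
     (string-match-p "\\`.+\\'" two-lines)
     ;; "[^\000]" crosses the newline...
     (string-match-p "\\`[^\000]+\\'" two-lines)
     ;; ...but stops at a NUL character.
     (string-match-p "\\`[^\000]+\\'" with-nul)
     ;; "\\(?:.\\|\n\\)" accepts both strings in full.
     (string-match-p "\\`\\(?:.\\|\n\\)+\\'" two-lines)
     (string-match-p "\\`\\(?:.\\|\n\\)+\\'" with-nul))))

(my/any-char-demo)
;; ⇒ (nil 0 nil 0 0)
```

Each entry is the start position of the match, or nil when the whole string could not be matched.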
So, the new challenger is:

  "\\[\\[\\(\\(?:.\\|\n\\)*?[^\\]\\(\\\\\\\\\\)*\\)\\]\\(?:\\[\\(\\(?:.\\|\n\\)+?\\)\\]\\)?\\]"

Beautiful.

The commented rx equivalent would be:

  (seq "["
       ;; URI part: match group 1.
       "["
       (group (*? anything)
              ;; Allow an even number of backslashes before the closing
              ;; bracket.
              (not (any "\\"))
              (zero-or-more (group "\\\\")))
       "]"
       ;; Description (optional): match group 2.
       (opt "[" (group (+? anything)) "]")
       "]")

>   \(             # begin group 3
>   ?              # don't understand
>   :\[            # literal :[

[...]

> but there's at least a ? that I don't understand, and I'm afraid I'm
> not seeing how it's useful.

\(?: ... \) is a shy group.

> If you think it works, I'm happy to defer to your judgement on that!
> Although I suggested the idea, I don't know Org nearly well enough to
> be sure that I haven't missed problems;

We are solving the problem with a regexp. What bad things could
happen? ;)

Regards,

-- 
Nicolas Goaziou
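For anyone who wants to try the candidate regexp, here is a minimal sketch that applies the string form from this message, copied verbatim, to a couple of made-up links. The variable and function names are hypothetical and exist only for this illustration.

```elisp
;; The string form from this message, copied verbatim.
(defconst my/bracket-link-re
  "\\[\\[\\(\\(?:.\\|\n\\)*?[^\\]\\(\\\\\\\\\\)*\\)\\]\\(?:\\[\\(\\(?:.\\|\n\\)+?\\)\\]\\)?\\]"
  "Candidate bracket link regexp (hypothetical variable name).")

(defun my/parse-bracket-link (string)
  "Return (PATH . DESCRIPTION) for the first bracket link in STRING, or nil.
Hypothetical helper, for illustration only."
  (save-match-data
    (when (string-match my/bracket-link-re string)
      ;; The backslash pairs are a capturing group (group 2) in the
      ;; pattern as written, so the description is read from group 3.
      (cons (match-string 1 string)
            (match-string 3 string)))))

(my/parse-bracket-link "See [[https://orgmode.org][Org website]].")
;; ⇒ ("https://orgmode.org" . "Org website")

(my/parse-bracket-link "[[file:notes.org]]")
;; ⇒ ("file:notes.org")   ; the description is nil
```

A shy group, \(?: ... \), groups without capturing, so it does not take a group number. If the inner backslash group were made shy as well, the description would move from group 3 to group 2.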