From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kaushal Modi
Subject: Re: Recent regression in Org master with parsing of verbatim code with spaces
Date: Thu, 12 Oct 2017 15:29:17 +0000
References: <877ew076lb.fsf@nicolasgoaziou.fr>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="94eb2c146d5cb4a44b055b5b340e"
List-Id: "General discussions about Org-mode."
To: Nicolas Goaziou
Cc: emacs-org list

--94eb2c146d5cb4a44b055b5b340e
Content-Type: text/plain; charset="UTF-8"

On Thu, Oct 12, 2017 at 11:20 AM Kaushal Modi <kaushal.modi@gmail.com> wrote:

> On Thu, Oct 12, 2017 at 11:01 AM Nicolas Goaziou <mail@nicolasgoaziou.fr>
> wrote:
>
>> Why do you consider it to be a regression? IIRC, the change allowed to
>> use zero-width space as a way to escape markup.
>

Also, I spent a few hours trying to understand why

(setq org-emphasis-regexp-components
      '("-[:space:]('\"{"            ;prematch allowed
        "-[:space:].,:!?;'\")}\\["   ;postmatch allowed
        " \t\r\n"                    ;border *not* allowed
        "."                          ;body
        20))                         ;newlines

seems to resolve this issue. I only changed the "border not allowed" regexp from "[:space:]" to " \t\r\n".

I spent a few hours with edebug and message-based debugging, but couldn't figure it out.

If I try this sandbox method to debug:

;; \u200b below is the zero-width space before the closing ~.
(dolist (str '("=title = \"foo\"=\n~title = \"foo\"\u200b~"))
  (message "String: %s" str)
  (let* ((working-verbatim-re "\\([-[:space:]('\"{]\\|^\\)\\(\\(?3:[=~]\\)\\([^ \t\r\n]\\|[^ \t\r\n].*?\\(?:\n.*?\\)\\{0,1\\}[^ \t\r\n]\\)\\3\\)\\([-[:space:].,:!?;'\")}\\[]\\|$\\)")
         (bad-verbatim-re "\\([-[:space:]('\"{]\\|^\\)\\(\\([=~]\\)\\([^[:space:]]\\|[^[:space:]].*?\\(?:\n.*?\\)\\{0,1\\}[^[:space:]]\\)\\3\\)\\([-[:space:].,:!?;'\")}\\[]\\|$\\)")
         (using-working-verbatim-re (string-match-p working-verbatim-re str))
         (using-bad-verbatim-re (string-match-p bad-verbatim-re str))
         (match? (equal using-working-verbatim-re using-bad-verbatim-re)))
    (message ".. working match %S" using-working-verbatim-re)
    (message ".. bad match %S" using-bad-verbatim-re)
    (message ".. match? %S" match?)))

Both the good and the "bad" regexps match that test string.

So I don't know exactly how this commit is breaking this, but I know for sure that reverting org-emphasis-regexp-components to the earlier value fixes things.

-- 
Kaushal Modi
--94eb2c146d5cb4a44b055b5b340e--
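
A note on the sandbox above: one possible reason it cannot tell the two regexps apart is that `string-match-p` returns only the start position of the first match, and the test string appears to hold both samples, so a match on the `=title = "foo"=` part alone would make both regexps report success. Below is a minimal sketch, under that assumption, that tests each sample as its own string and reports the matched text. `my/compare-emphasis-regexps` is a hypothetical helper, not part of Org, and the two short regexps in the usage example are simplified stand-ins for `working-verbatim-re` and `bad-verbatim-re`, not the real ones. Whether `[:space:]` matches U+200B depends on the syntax table in effect, so no result is claimed for that case.

```elisp
;; Hypothetical helper (not part of Org): try every regexp against every
;; sample string separately and record the matched text, or nil.
(defun my/compare-emphasis-regexps (regexps samples)
  "Return a list of (SAMPLE NAME MATCHED-TEXT), one per pair.
REGEXPS is an alist of (NAME . REGEXP); SAMPLES is a list of strings.
MATCHED-TEXT is nil when REGEXP does not match SAMPLE."
  (let (results)
    (dolist (sample samples)
      (dolist (entry regexps)
        (let ((matched (save-match-data
                         (when (string-match (cdr entry) sample)
                           (match-string 0 sample)))))
          (message "%S | %s => %S" sample (car entry) matched)
          (push (list sample (car entry) matched) results))))
    (nreverse results)))

;; Usage: one sample per string, so a match on the first sample cannot
;; hide a failure on the second.  Substitute the full regexps here.
(my/compare-emphasis-regexps
 '((border-ascii . "[^ \t\r\n]~")     ; stand-in for working-verbatim-re
   (border-space . "[^[:space:]]~"))  ; stand-in for bad-verbatim-re
 '("=title = \"foo\"="
   "~title = \"foo\"\u200b~"))
```

Printing the matched text instead of the match position also shows which part of the string each regexp actually consumed.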