From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp0 ([2001:41d0:2:4a6f::]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) by ms0.migadu.com with LMTPS id AKlWKIgkgGAMXQEAgWs5BA (envelope-from ) for ; Wed, 21 Apr 2021 15:11:36 +0200 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp0 with LMTPS id qPEKJIgkgGAYdAAA1q6Kng (envelope-from ) for ; Wed, 21 Apr 2021 13:11:36 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id D7D281555B for ; Wed, 21 Apr 2021 15:11:35 +0200 (CEST) Received: from localhost ([::1]:55998 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lZCdj-0000FQ-1U for larch@yhetil.org; Wed, 21 Apr 2021 09:11:35 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:59808) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lZCdC-0000FK-L9 for emacs-orgmode@gnu.org; Wed, 21 Apr 2021 09:11:02 -0400 Received: from ciao.gmane.io ([116.202.254.214]:41156) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lZCdA-0001gp-HB for emacs-orgmode@gnu.org; Wed, 21 Apr 2021 09:11:02 -0400 Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from ) id 1lZCd6-0004aL-SX for emacs-orgmode@gnu.org; Wed, 21 Apr 2021 15:10:56 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: emacs-orgmode@gnu.org From: Maxim Nikulin Subject: Re: [Patch] to correctly sort the items with emphasis marks in a list Date: Wed, 21 Apr 2021 20:10:50 +0700 Message-ID: References: <87a6qg1rjx.fsf@posteo.net> <874kgft7n1.fsf@nicolasgoaziou.fr> <87blanxb1z.fsf@posteo.net> <87lf9fa3ak.fsf@posteo.net> <87sg3n8f33.fsf@posteo.net> <87tuo2d670.fsf@nicolasgoaziou.fr> <87r1j6b6ku.fsf@nicolasgoaziou.fr> <87a6pt9hyd.fsf@nicolasgoaziou.fr> <874kg0ae0k.fsf@nicolasgoaziou.fr> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------981C0082F282FD4BD9889DB2" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 In-Reply-To: <874kg0ae0k.fsf@nicolasgoaziou.fr> Content-Language: en-US Received-SPF: pass client-ip=116.202.254.214; envelope-from=geo-emacs-orgmode@m.gmane-mx.org; helo=ciao.gmane.io X-Spam_score_int: 28 X-Spam_score: 2.8 X-Spam_bar: ++ X-Spam_report: (2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_ADSP_CUSTOM_MED=0.001, FORGED_GMAIL_RCVD=1, FORGED_MUA_MOZILLA=2.309, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, NICE_REPLY_A=-0.001, NML_ADSP_CUSTOM_MED=0.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-orgmode@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org Sender: "Emacs-orgmode" X-Migadu-Flow: FLOW_IN ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1619010696; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references:list-id:list-help:list-unsubscribe: list-subscribe:list-post; bh=rJrpvXUwibDw6DROiMeEEjWPwKmsYhHGxmIeIv9dGak=; b=KrBZY0215BrkrUAzH+jelYntwf37HWp/zxTaz/VaKFnWPpx9obCTPvM+zcKfDlLbWmdiK5 qKhblEHpz2vDBvUxnuTvV9/ZhQR0rC9i6MZexwjukprMPMZKNnOrb2s3QhmTVoMeiV5uHt QsLbpQVQmogQuCQw8J9+MFYnkkp5yaKF6/dN4QBDrjAbnWjMPJDbjkNp5rlQ0Q7aQoNrsU qFX9S5203ik7/Cd6wD6zm2MP+D45Cm2Yvc37bOvOsVeLI6jDRDajUE7kuzkj0kcGJ/xEe4 TsggguUTCpQ/bORNYWH0BbB4l57lvV1cLsy7h3oG0dcqiqvcZRW1bYRoR4a/aw== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1619010696; a=rsa-sha256; cv=none; b=hcexeaomM7zKqMw2jApkQQy+vY9lkU6kZ4X+OM56M9qshKYkkYHgvkEBj8iKQf0LCniJE3 +/hOBDRJC+nCX5pQsY5N4PStI09TzWHOamn1CCJAV7mXXEep4hXuPFQ7uvsJMI0wye/Lk4 f76wjX7PdnmxTjtxAgPZ3nUrSinHLoytiE5THse6IbG8WAWo1ORsm/yle9qLYh1Gk2UDPq 54piE2xQHbD9hYHumCuAfb2tOPE376QwPLevYGyo+OMlgj0sfKUazWGGbYfRYgKeY7auv1 /q4tHKr5VrpxNbXg6ZAfBXR37CBLivkLJC/0ERcN5y2bJONDon4G7dv8CnQXyQ== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Spam-Score: -1.84 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Queue-Id: D7D281555B X-Spam-Score: -1.84 X-Migadu-Scanner: scn0.migadu.com X-TUID: dCGQw38W91GW This is a multi-part message in MIME format. --------------981C0082F282FD4BD9889DB2 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit On 21/04/2021 03:37, Nicolas Goaziou wrote: > Maxim Nikulin writes: > >> (let ((s (org-sort-remove-invisible >> "A wrapping [[https://orgmode.org?a=b&c=d#e][link]] emphasis/" >> >> I expect "A wrapping link emphasis". > > Yet, your expectations are wrong. There is no link in the text above. > Emphasized text starts at "/wrapping" and ends before "/?". > > Granted, this is a situation where the Org syntax is not very intuitive. > Anyway, the new function is more accurate. I think, new variant should be committed even it does not fix Juan's case. He have not confirmed the fix yet. > Maybe we should require a space after punctuation following emphasized > text. I don't know. This is orthogonal to the current discussion. I still believe in my expectation, however I admit such limitation of parser. At first I have not recognized that the issue may be similar to https://orgmode.org/list/CAH+Wm4-_XHUZKFTf=ZtbfnCPvQWkbEoeGs8EpYm+8SPmu8LHFg@mail.gmail.com/ Anyway for my example workaround is to add more markers before, inside, and after the link. Maybe I will look closer at Tom's parser if it solves ambiguity in the same way. >> In the meanwhile I have tried >> >> (benchmark-run 1 (org-sort-list t ?a)) >> >> in a file (1100 lines) obtained using > > I don't think performance is really an issue. Of course, the suggested > function is clearly slower than the current one. It is OK since difference is not really huge, especially taking into account that new variant was not compiled. Do you still have problem with locale dependency of added tests? I can not guess what could be its source and expect that test should work reliably. Disregard "/3" in the subject of the patches. Third change is your code. --------------981C0082F282FD4BD9889DB2 Content-Type: text/x-patch; charset=UTF-8; name="0001-More-tests-for-org-sort-list.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-More-tests-for-org-sort-list.patch" >From c2c46d133e80ffa2323618ac88a1902e63ba1efc Mon Sep 17 00:00:00 2001 From: Max Nikulin Date: Mon, 19 Apr 2021 19:06:36 +0700 Subject: [PATCH 1/3] More tests for `org-sort-list' * lisp/org.el (org-verbatim-re): Add to the docstring a statement concerning subgroups meaning. * testing/lisp/test-org-list.el (test-org-list/sort): Add cases with text emphasis. Stress in comments that it is significant whether locale-specific collation rules ignores spaces. * testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add tests for `org-sort-list' helper. --- lisp/org.el | 3 ++- testing/lisp/test-org-list.el | 34 +++++++++++++++++++++++++++++++++- testing/lisp/test-org.el | 25 +++++++++++++++++++++++++ 3 files changed, 60 insertions(+), 2 deletions(-) diff --git a/lisp/org.el b/lisp/org.el index c2738100f..3d4f5553a 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -3685,7 +3685,8 @@ After a match, the match groups contain these elements: 5 The character after the match, empty at the end of a line") (defvar org-verbatim-re nil - "Regular expression for matching verbatim text.") + "Regular expression for matching verbatim text. +Groups 1-5 have the same meaning as in `org-emph-re' pattern.") (defvar org-emphasis-regexp-components) ; defined just below (defvar org-emphasis-alist) ; defined just below diff --git a/testing/lisp/test-org-list.el b/testing/lisp/test-org-list.el index 3689a172f..cc7914c8d 100644 --- a/testing/lisp/test-org-list.el +++ b/testing/lisp/test-org-list.el @@ -1225,7 +1225,39 @@ b. Item 2" (equal "- b\n- a\n- C\n" (org-test-with-temp-text "- b\n- C\n- a\n" (org-sort-list t ?A) - (buffer-string)))))) + (buffer-string)))) + ;; Emphasis in the beginning. + (should + (equal "- a\n- /z/\n" + (org-test-with-temp-text "- /z/\n- a\n" + (org-sort-list t ?a) + (buffer-string)))) + (should + (equal "- *a*\n- z\n" + (org-test-with-temp-text "- z\n- *a*\n" + (org-sort-list t ?a) + (buffer-string)))) + ;; Emphasis of second word. + ;; Locales with significant spaces (C, es_ES, pl_PL) + ;; are more sensitive to problems with sort. + (should + (equal "- a b\n- a /z/\n" + (org-test-with-temp-text "- a /z/\n- a b\n" + (org-sort-list t ?a) + (buffer-string)))) + (should + (equal "- a *b*\n- a z\n" + (org-test-with-temp-text "- a z\n- a *b*\n" + (org-sort-list t ?a) + (buffer-string)))) + ;; Space role in sorting. + ;; Test would fail for locales with ignored space, e.g. en_US, it works + ;; in C and currently rare locales having significant space (es_ES, pl_PL) + (should + (equal "- Time stamp\n- Timer\n" + (org-test-with-temp-text "- Timer\n- Time stamp\n" + (org-sort-list t ?a) + (buffer-string)))))) ;; Sort numerically. (should (equal "- 1\n- 2\n- 10\n" diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el index f6fb4b3ca..3f74b3f35 100644 --- a/testing/lisp/test-org.el +++ b/testing/lisp/test-org.el @@ -8262,4 +8262,29 @@ two (provide 'test-org) +(ert-deftest test-org/org-sort-remove-invisible () + "Empasis markers are stripped by `org-sort-remove-invisible'." + (dolist (case '(("Test *bold* text" . "Test bold text") + ("Test /italic/ text" . "Test italic text") + ("Test _underlined_ text" . "Test underlined text") + ("Test +strike through+ text" . "Test strike through text") + ("Test =verbatim= text" . "Test verbatim text") + ("Test ~code~ text" . "Test code text") + ("*Beginning* of a line" . "Beginning of a line") + ("End =of line=" . "End of line") + ("Braces (*around*)" . "Braces (around)") + ("Multiple *emphasis* options /in the/ same *line*" . + "Multiple emphasis options in the same line") + ("/Adjucent/ *emphasis*" . "Adjucent emphasis") + ("Verbatim =with *emphasis* example=" . + "Verbatim with *emphasis* example") + ("Emphasis [[http://orgmode.org/][inside /a link/]]" . + "Emphasis inside a link") + ("*Wrap [[https:/orgmode.org][link]] emphasis*" . + "Wrap link emphasis") + ("A =verbatim [[https://orgmode.org/][link]] example=" . + "A verbatim [[https://orgmode.org/][link]] example"))) + (let ((test-string (car case)) + (expectation (cdr case))) + (should (equal expectation (org-sort-remove-invisible test-string)))))) ;;; test-org.el ends here -- 2.25.1 --------------981C0082F282FD4BD9889DB2 Content-Type: text/x-patch; charset=UTF-8; name="0002-testing-lisp-test-org.el-Non-obvious-cases-for-org-s.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0002-testing-lisp-test-org.el-Non-obvious-cases-for-org-s.pa"; filename*1="tch" >From e7a326fff98fdc56b91818af526d398d8387bda4 Mon Sep 17 00:00:00 2001 From: Max Nikulin Date: Wed, 21 Apr 2021 19:22:18 +0700 Subject: [PATCH 2/3] testing/lisp/test-org.el: Non-obvious cases for org-sort-remove-invisible testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add test cases illustrating corner cases in parser behavior. Fontification in Emacs buffer is not consistent with result of `org-sort-remove-invisible` for 2 of 3 added examples. The intention is to make author of changes in parser aware of consequences, however it is unlikely that users rely on such nuances. Update expectations if change is considered as improvement. --- testing/lisp/test-org.el | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el index 3f74b3f35..4f987f509 100644 --- a/testing/lisp/test-org.el +++ b/testing/lisp/test-org.el @@ -8283,7 +8283,18 @@ two ("*Wrap [[https:/orgmode.org][link]] emphasis*" . "Wrap link emphasis") ("A =verbatim [[https://orgmode.org/][link]] example=" . - "A verbatim [[https://orgmode.org/][link]] example"))) + "A verbatim [[https://orgmode.org/][link]] example") + ;; A bit strange cases to catch changes of behavior. + ;; It may be reasonable to update expectation in the case of failure. + ("/Overlapping [[http://orgmode.org][structure/ with link]]" . + ;; not "Overlapping structure with link" + "Overlapping [[http://orgmode.org][structure with link]]") + ("Another [[http://orgmode.org][overlapping *structure]] with* link" . + ;; not "Another overlapping structure with link" + "Another overlapping *structure with* link") + ("A /wrapping [[https://orgmode.org/?bot=true][link]] emphasis/" . + ;; not "A wrapping link emphasis", lost "/" before "?" instead + "A wrapping [[https://orgmode.org?bot=true][link]] emphasis/"))) (let ((test-string (car case)) (expectation (cdr case))) (should (equal expectation (org-sort-remove-invisible test-string)))))) -- 2.25.1 --------------981C0082F282FD4BD9889DB2--