From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:8:6d80::]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) by ms0.migadu.com with LMTPS id ECy+N8p5fWDTYwEAgWs5BA (envelope-from ) for ; Mon, 19 Apr 2021 14:38:34 +0200 Received: from aspmx1.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id GK88M8p5fWAcdAAAB5/wlQ (envelope-from ) for ; Mon, 19 Apr 2021 12:38:34 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id B461632622 for ; Mon, 19 Apr 2021 14:38:33 +0200 (CEST) Received: from localhost ([::1]:40264 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lYTAe-0001iB-Fs for larch@yhetil.org; Mon, 19 Apr 2021 08:38:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:33914) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lYT7A-0007XF-Vg for emacs-orgmode@gnu.org; Mon, 19 Apr 2021 08:34:57 -0400 Received: from ciao.gmane.io ([116.202.254.214]:58680) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lYT78-0004Yp-Ra for emacs-orgmode@gnu.org; Mon, 19 Apr 2021 08:34:56 -0400 Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from ) id 1lYT72-0007FE-Dr for emacs-orgmode@gnu.org; Mon, 19 Apr 2021 14:34:48 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: emacs-orgmode@gnu.org From: Maxim Nikulin Subject: Re: [Patch] to correctly sort the items with emphasis marks in a list Date: Mon, 19 Apr 2021 19:34:42 +0700 Message-ID: References: <87a6qg1rjx.fsf@posteo.net> <874kgft7n1.fsf@nicolasgoaziou.fr> <87blanxb1z.fsf@posteo.net> <87lf9fa3ak.fsf@posteo.net> <87sg3n8f33.fsf@posteo.net> <87tuo2d670.fsf@nicolasgoaziou.fr> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------F21B5822FEADDDB8F2428066" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 In-Reply-To: <87tuo2d670.fsf@nicolasgoaziou.fr> Content-Language: en-US Received-SPF: pass client-ip=116.202.254.214; envelope-from=geo-emacs-orgmode@m.gmane-mx.org; helo=ciao.gmane.io X-Spam_score_int: 28 X-Spam_score: 2.8 X-Spam_bar: ++ X-Spam_report: (2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_ADSP_CUSTOM_MED=0.001, FORGED_GMAIL_RCVD=1, FORGED_MUA_MOZILLA=2.309, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, NICE_REPLY_A=-0.001, NML_ADSP_CUSTOM_MED=0.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-orgmode@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org Sender: "Emacs-orgmode" X-Migadu-Flow: FLOW_IN ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1618835914; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references:list-id:list-help:list-unsubscribe: list-subscribe:list-post; bh=2Vw4GK9iOO11ftrxiIW37GzxpWJ1UqdMRumJN88UZVE=; b=Au3yqUbP8sfoqFztpyZbgG+uqm4hBXMKBooz8cHxnspTzyymEgx87FyvqHENGhSbXFjzXS s0kREW+XmSRV33KdTxj3hVp8YUoZi0/zmBqlFsD8D5i51Wc9II0O61JtqoNVu6g4ChGmRW 7LD5QWQUQjW+qkelP8V5MDTk/2DxHMGdvEvmjKv6pwC46D0pRBKYXBrMXy1pcMd9UeBfUV /iBfsL/Ab02qrga9fCKcEyx2DUYqC0w5bu/mCpcVwwQ68Y6P6dkzo/+Y6Y2YG5kyvsea/p kCKMI3cgmQMsP1ESeWwjLLyUJVaZOBIdk2nzJkgrzkWSdD3Uy3nxlFdaVuhQTQ== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1618835914; a=rsa-sha256; cv=none; b=D800IOh2N+X4OcQ3dkzfg3nHTYJHx7Svv4qCK7cA9cum2wYWlN8U5gCPV/MfnnFEFcEnJv HHnPXzHmeHnYiOik6mt1ZIBvkGWJRiiiCEwb+b9ORfcKlQRo7pFYhFJCfI0VtZa7oC1afJ omaQgv4Uf5zoGE2/ZO/I1h0s0D9taDLlrppMV3s+FPzqJSTjieaG8F0cXBqO09mUGvbw2B FAslbQEE96UxlO9i49s+eGd6EASJKchNYHkUICbFiZP7L1Q2D5mP5XYI/L1ZWTLnk9Ktg+ vAABTHYhC9bxkK2qvmATOBGJ8y/MyMWz+0G4hagdiW+Dj/B5oQibYnIX2Hx8wg== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Spam-Score: -1.84 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Migadu-Queue-Id: B461632622 X-Spam-Score: -1.84 X-Migadu-Scanner: scn0.migadu.com X-TUID: Y0WjLivixeJM This is a multi-part message in MIME format. --------------F21B5822FEADDDB8F2428066 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit On 19/04/2021 15:33, Nicolas Goaziou wrote: > > Could you try the following instead? > > --8<---------------cut here---------------start------------->8--- > (defun org-sort-remove-invisible (s) > "Remove invisible part of links and emphasis markers from string S." > (let ((remove-markers > (lambda (m) Just a curiosity, what is style guide recommendation: let - lambda or cl-labels? > (concat (match-string 1 m) > (match-string 4 m) > (match-string 5 m))))) > (remove-text-properties 0 (length s) org-rm-props s) > (replace-regexp-in-string > org-verbatim-re remove-markers > (replace-regexp-in-string org-emph-re remove-markers (org-link-display-format s) t tt) Single "t" is at the end, isn't it? > t t))) I think, the patch is an improvement. There is still a minor shortcoming that ordering of emphasized and non-emphasized words is undefined - /a/ - *a* - a Let's leave it aside since it requires multilevel comparison similar to collation in locales and more strict definition which part of string is compared at each level: - /a/ :: A - /a/ :: Z - a :: A - a :: Z In my opinion, a more severe limitation comes from sequential regexp-based approach. Consider stripping markers from 1. "a =b *c* d= e" 2. "*b* /i/" First case could be solved by splitting the input string by verbatim regexp at first and by applying emphasis substitution only to non-verbatim parts. However I am rather shy to experiment with regexps definition to avoid including chars before and after emphasis markers. It would be great to get role of particular characters from more reliable parser (or at least from text properties affected by fontification). I have some tests for `org-sort-remove-invisible', see the attachment. Actually I do not like such style of tests, I strongly prefer to see all failures at once, but I have not found if such technique is used anywhere in org tests. --------------F21B5822FEADDDB8F2428066 Content-Type: text/x-patch; charset=UTF-8; name="0001-More-tests-for-org-sort-list.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-More-tests-for-org-sort-list.patch" >From 6719386aa5a95394a4cb98ce28b889f545620bd9 Mon Sep 17 00:00:00 2001 From: Max Nikulin Date: Mon, 19 Apr 2021 19:06:36 +0700 Subject: [PATCH] More tests for `org-sort-list' * lisp/org.el (org-verbatim-re): Add to the docstring a statement concerning subgroups meaning. * testing/lisp/test-org-list.el (test-org-list/sort): Add cases with text emphasis. Stress in comments that it is significant whether locale-specific collation rules ignores spaces. * testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add tests for `org-sort-list' helper. Add a test with expected failures to make apparent limitation of simple regexp-based approach in `org-sort-remove-invisible' function. --- lisp/org.el | 3 ++- testing/lisp/test-org-list.el | 34 +++++++++++++++++++++++++++++++++- testing/lisp/test-org.el | 27 +++++++++++++++++++++++++++ 3 files changed, 62 insertions(+), 2 deletions(-) diff --git a/lisp/org.el b/lisp/org.el index c2738100f..3d4f5553a 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -3685,7 +3685,8 @@ After a match, the match groups contain these elements: 5 The character after the match, empty at the end of a line") (defvar org-verbatim-re nil - "Regular expression for matching verbatim text.") + "Regular expression for matching verbatim text. +Groups 1-5 have the same meaning as in `org-emph-re' pattern.") (defvar org-emphasis-regexp-components) ; defined just below (defvar org-emphasis-alist) ; defined just below diff --git a/testing/lisp/test-org-list.el b/testing/lisp/test-org-list.el index 3689a172f..cc7914c8d 100644 --- a/testing/lisp/test-org-list.el +++ b/testing/lisp/test-org-list.el @@ -1225,7 +1225,39 @@ b. Item 2" (equal "- b\n- a\n- C\n" (org-test-with-temp-text "- b\n- C\n- a\n" (org-sort-list t ?A) - (buffer-string)))))) + (buffer-string)))) + ;; Emphasis in the beginning. + (should + (equal "- a\n- /z/\n" + (org-test-with-temp-text "- /z/\n- a\n" + (org-sort-list t ?a) + (buffer-string)))) + (should + (equal "- *a*\n- z\n" + (org-test-with-temp-text "- z\n- *a*\n" + (org-sort-list t ?a) + (buffer-string)))) + ;; Emphasis of second word. + ;; Locales with significant spaces (C, es_ES, pl_PL) + ;; are more sensitive to problems with sort. + (should + (equal "- a b\n- a /z/\n" + (org-test-with-temp-text "- a /z/\n- a b\n" + (org-sort-list t ?a) + (buffer-string)))) + (should + (equal "- a *b*\n- a z\n" + (org-test-with-temp-text "- a z\n- a *b*\n" + (org-sort-list t ?a) + (buffer-string)))) + ;; Space role in sorting. + ;; Test would fail for locales with ignored space, e.g. en_US, it works + ;; in C and currently rare locales having significant space (es_ES, pl_PL) + (should + (equal "- Time stamp\n- Timer\n" + (org-test-with-temp-text "- Timer\n- Time stamp\n" + (org-sort-list t ?a) + (buffer-string)))))) ;; Sort numerically. (should (equal "- 1\n- 2\n- 10\n" diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el index f6fb4b3ca..307b30265 100644 --- a/testing/lisp/test-org.el +++ b/testing/lisp/test-org.el @@ -8262,4 +8262,31 @@ two (provide 'test-org) +(ert-deftest test-org/org-sort-remove-invisible () + "Empasis markers are stripped by `org-sort-remove-invisible'." + (dolist (case '(("Test *bold* text" . "Test bold text") + ("Test /italic/ text" . "Test italic text") + ("Test _underlined_ text" . "Test underlined text") + ("Test +strike through+ text" . "Test strike through text") + ("Test =verbatim= text" . "Test verbatim text") + ("Test ~code~ text" . "Test code text") + ("*Beginning* of a line" . "Beginning of a line") + ("End =of line=" . "End of line") + ("Braces (*around*)" . "Braces (around)") + ("Multiple *emphasis* options /in the/ same *line*" . + "Multiple emphasis options in the same line"))) + (let ((test-string (car case)) + (expectation (cdr case))) + (should (equal expectation (org-sort-remove-invisible test-string)))))) + +(ert-deftest test-org/org-sort-remove-invisible-failures () + "Examples of `org-sort-remove-invisible` failures." + :expected-result :failed + (dolist (case '(("Lost =in *verbatim* text= fragment" . + "Lost in *verbatim* text fragment") + ("Adjucent emphasis: *Overlapped* /regexps/" . + "Adjucent emphasis: Ovelapped regexps"))) + (let ((test-string (car case)) + (expectation (cdr case))) + (should (equal expectation (org-sort-remove-invisible test-string)))))) ;;; test-org.el ends here -- 2.25.1 --------------F21B5822FEADDDB8F2428066--