emacs-orgmode@gnu.org archives
From: Ihor Radchenko <yantar92@posteo.net>
To: Max Nikulin <manikulin@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS)
Date: Sun, 05 May 2024 11:59:51 +0000
Message-ID: <87fruwtqo8.fsf@localhost>
In-Reply-To: <87jzleptcs.fsf@localhost>

[-- Attachment #1: Type: text/plain, Size: 769 bytes --]

Ihor Radchenko <yantar92@posteo.net> writes:

>> The only additional consideration is that compare function should be 
>> configurable. If a user access same files from Linux and macOS then it 
>> may be really annoying to get different order of entries in agenda. For 
>> most of Linux users it is better to use more smart 
>> `string-collate-lessp'. Some care is required to sort entries obtained 
>> from multiple buffers in predictable environment (locale, case 
>> conversion table).
>
> I agree. We can introduce a new customization -
> `org-string-sort-function' that will be used across Org mode to sort
> user text.

See the attached tentative patch.
I added a customization, made everything in Org obey it, and provided
some default options for MacOS users.
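
For illustration, assuming the patch is applied, the new option could
be set from an init file roughly as follows.  Only names defined in
the patch (`org-sort-function', `org-sort-function-downcase') and the
built-in `string-lessp' are used.  Which value suits a given setup is
a judgment call, not something the patch prescribes.

```elisp
;; Sketch only: requires an Org version with the attached patch applied.

;; Locale-independent comparison by character codes, so the order does
;; not depend on the operating system's collation rules:
(setq org-sort-function #'string-lessp)

;; Alternative offered by the patch for macOS users: compare the
;; downcased strings instead.
;; (setq org-sort-function #'org-sort-function-downcase)
```

The same choices are available through M-x customize-variable, since
the option is declared with `defcustom'.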


[-- Attachment #2: 0001-org-New-Org-wide-custom-option-org-sort-function.patch --]
[-- Type: text/x-patch, Size: 13446 bytes --]

From dbc3929d8c7a26da3bf31fb52a651da68d1f733b Mon Sep 17 00:00:00 2001
Message-ID: <dbc3929d8c7a26da3bf31fb52a651da68d1f733b.1714910323.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Sun, 5 May 2024 14:37:52 +0300
Subject: [PATCH] org: New Org-wide custom option `org-sort-function'

* lisp/org-macs.el (org-sort-function): New customization defining how
Org mode should sort headlines, table lines, agenda lines, etc.
(org-string<):
(org-string<=):
(org-string>=):
(org-string>): Use the new customization.
(org-string<>): Add docstring.
(org-sort-function-downcase): New helper function to help users on
MacOS where `string-collate-lessp' is not reliable.
* lisp/oc-basic.el (org-cite-basic--field-less-p):
* lisp/org-agenda.el (org-cmp-category):
(org-cmp-alpha):
* lisp/org-list.el (org-sort-list):
* lisp/org-mouse.el (org-mouse-list-options-menu):
* lisp/org-table.el (org-table-sort-lines):
* lisp/org.el (org-tags-sort-function):
(org-sort-entries):
* lisp/ox-publish.el (org-publish-sitemap): Honor the new
customization.
* lisp/org-mouse.el (org-mouse-tag-menu):
(org-mouse-popup-global-menu):
* lisp/org-agenda.el (org-cmp-tag): Honor `org-tags-sort-function',
falling back to `org-string<' if not set.
* etc/ORG-NEWS (New option controlling how Org mode sorts things
~org-sort-function~): Announce the change.

This change aims to standardize the way Org mode performs sorting of
user data.  In particular, it addresses issues with oddities of string
collation rules on MacOS and tricky language environments like
Turkish.

Link: https://orgmode.org/list/87jzleptcs.fsf@localhost
---
 etc/ORG-NEWS       | 20 ++++++++++++++
 lisp/oc-basic.el   |  2 +-
 lisp/org-agenda.el | 12 ++++-----
 lisp/org-list.el   |  2 +-
 lisp/org-macs.el   | 66 +++++++++++++++++++++++++++++++++++++---------
 lisp/org-mouse.el  | 13 +++++----
 lisp/org-table.el  |  4 +--
 lisp/org.el        |  6 ++---
 lisp/ox-publish.el |  9 +++----
 9 files changed, 98 insertions(+), 36 deletions(-)

diff --git a/etc/ORG-NEWS b/etc/ORG-NEWS
index 3c597db40..af88febb1 100644
--- a/etc/ORG-NEWS
+++ b/etc/ORG-NEWS
@@ -710,6 +710,26 @@ any more.  Run ~org-ctags-enable~ to setup hooks and advices:
 #+end_src
 
 ** New and changed options
+*** New option controlling how Org mode sorts things ~org-sort-function~
+
+Sorting of agenda items, tables, menus, headlines, etc. can now be
+controlled using a new custom option ~org-sort-function~.
+
+By default, Org mode sorts things according to the operating system
+language.  However, language sorting rules may or may not produce good
+results depending on the use case.  For example, multi-language
+documents may be sorted weirdly when the sorting rules of the system
+language are applied to text written in a different language.  Also,
+some operating systems (e.g. MacOS) do not provide accurate string
+sorting rules.
+
+Org mode provides 4 possible values for ~org-sort-function~:
+1. (default) Sort using system language rules.
+2. Sort using dumb string comparison. It is the most reliable option.
+3. Sort case-insensitively, making use of UTF case conversion.  This
+   may work better for mixed-language documents and on MacOS.
+4. Custom function, if the above does not fit the needs.
+
 *** =ob-latex= now uses a new option ~org-babel-latex-process-alist~ to generate png output
 
 Previously, =ob-latex= used ~org-preview-latex-default-process~ from
diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
index 8959bb065..6e3142fa1 100644
--- a/lisp/oc-basic.el
+++ b/lisp/oc-basic.el
@@ -680,7 +680,7 @@ (defun org-cite-basic--field-less-p (field info)
 INFO is the export state, as a property list."
   (and field
        (lambda (a b)
-         (string-collate-lessp
+         (org-string<
           (org-cite-basic--get-field field a info 'raw)
           (org-cite-basic--get-field field b info 'raw)
           nil t))))
diff --git a/lisp/org-agenda.el b/lisp/org-agenda.el
index 93c6acef2..05d2f94c0 100644
--- a/lisp/org-agenda.el
+++ b/lisp/org-agenda.el
@@ -7489,8 +7489,8 @@ (defsubst org-cmp-category (a b)
   "Compare the string values of categories of strings A and B."
   (let ((ca (or (get-text-property (1- (length a)) 'org-category a) ""))
 	(cb (or (get-text-property (1- (length b)) 'org-category b) "")))
-    (cond ((string-lessp ca cb) -1)
-	  ((string-lessp cb ca) +1))))
+    (cond ((org-string< ca cb) -1)
+	  ((org-string< cb ca) +1))))
 
 (defsubst org-cmp-todo-state (a b)
   "Compare the todo states of strings A and B."
@@ -7536,8 +7536,8 @@ (defsubst org-cmp-alpha (a b)
     (cond ((not (or ta tb)) nil)
 	  ((not ta) +1)
 	  ((not tb) -1)
-	  ((string-lessp ta tb) -1)
-	  ((string-lessp tb ta) +1))))
+	  ((org-string< ta tb) -1)
+	  ((org-string< tb ta) +1))))
 
 (defsubst org-cmp-tag (a b)
   "Compare the string values of the first tags of A and B."
@@ -7546,8 +7546,8 @@ (defsubst org-cmp-tag (a b)
     (cond ((not (or ta tb)) nil)
 	  ((not ta) +1)
 	  ((not tb) -1)
-	  ((string-lessp ta tb) -1)
-	  ((string-lessp tb ta) +1))))
+	  ((funcall (or org-tags-sort-function #'org-string<) ta tb) -1)
+	  ((funcall (or org-tags-sort-function #'org-string<) tb ta) +1))))
 
 (defsubst org-cmp-time (a b)
   "Compare the time-of-day values of strings A and B."
diff --git a/lisp/org-list.el b/lisp/org-list.el
index fca3758c8..d7559d2a7 100644
--- a/lisp/org-list.el
+++ b/lisp/org-list.el
@@ -2979,7 +2979,7 @@ (defun org-sort-list
 		   (error "Missing key extractor"))))
 	 (sort-func
 	  (cond
-	   ((= dcst ?a) #'string-collate-lessp)
+	   ((= dcst ?a) #'org-string<)
 	   ((= dcst ?f)
 	    (or compare-func
 		(and interactive?
diff --git a/lisp/org-macs.el b/lisp/org-macs.el
index 1254ddb54..c3bef66cd 100644
--- a/lisp/org-macs.el
+++ b/lisp/org-macs.el
@@ -113,7 +113,6 @@ (declare-function org-fold-save-outline-visibility "org-fold" (use-markers &rest
 (declare-function org-fold-next-visibility-change "org-fold" (&optional pos limit ignore-hidden-p previous-p))
 (declare-function org-fold-core-with-forced-fontification "org-fold" (&rest body))
 (declare-function org-fold-folded-p "org-fold" (&optional pos limit ignore-hidden-p previous-p))
-(declare-function string-collate-lessp "org-compat" (s1 s2 &optional locale ignore-case))
 (declare-function org-time-convert-to-list "org-compat" (time))
 (declare-function org-buffer-text-pixel-width "org-compat" ())
 
@@ -982,20 +981,63 @@ (defun org-uuidgen-p (s)
 \f
 ;;; String manipulation
 
-(defun org-string< (a b)
-  (string-collate-lessp a b))
-
-(defun org-string<= (a b)
-  (or (string= a b) (string-collate-lessp a b)))
-
-(defun org-string>= (a b)
-  (not (string-collate-lessp a b)))
-
-(defun org-string> (a b)
+(defcustom org-sort-function #'string-collate-lessp
+  "Function used to compare strings when sorting.
+This function affects how Org mode sorts headlines, agenda items,
+table lines, etc.
+
+The function must accept either 2 or 4 arguments: strings to compare
+and, optionally, LOCALE and IGNORE-CASE - locale name and flag to make
+comparison case-insensitive.
+
+The default value uses sorting rules according to OS language.  Users
+who want to make sorting language-independent may customize the value
+to `string-lessp'.
+
+Note that some string sorting rules are known to be inaccurate on
+MacOS.  See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275.
+MacOS users may customize the value to `org-sort-function-downcase'."
+  :group 'org
+  :package-version '(Org . "9.7")
+  :type '(choice
+          (const :tag "According to OS language" string-collate-lessp)
+          (const :tag "Using string comparison" string-lessp)
+          (const :tag "Case-insensitive string comparison" org-sort-function-downcase)
+          (function :tag "Custom function")))
+
+(defun org-sort-function-downcase (a b &optional _ _)
+  "Return non-nil when downcased string A < downcased string B."
+  (string-lessp (downcase a) (downcase b)))
+
+(defun org-string< (a b &optional locale ignore-case)
+  "Return non-nil when string A < string B.
+LOCALE is the locale name.  IGNORE-CASE, when non-nil, makes comparison
+ignore case."
+  (if (= 4 (cdr (func-arity org-sort-function)))
+      (funcall org-sort-function a b locale ignore-case)
+    (funcall org-sort-function a b)))
+
+(defun org-string<= (a b &optional locale ignore-case)
+  "Return non-nil when string A <= string B.
+LOCALE is the locale name.  IGNORE-CASE, when non-nil, makes comparison
+ignore case."
+  (or (string= a b) (org-string< a b locale ignore-case)))
+
+(defun org-string>= (a b &optional locale ignore-case)
+  "Return non-nil when string A >= string B.
+LOCALE is the locale name.  IGNORE-CASE, when non-nil, makes comparison
+ignore case."
+  (not (org-string< a b locale ignore-case)))
+
+(defun org-string> (a b &optional locale ignore-case)
+  "Return non-nil when string A > string B.
+LOCALE is the locale name.  IGNORE-CASE, when non-nil, makes comparison
+ignore case."
   (and (not (string= a b))
-       (not (string-collate-lessp a b))))
+       (not (org-string< a b locale ignore-case))))
 
 (defun org-string<> (a b)
+  "Return non-nil when string A and string B are not equal."
   (not (string= a b)))
 
 (defsubst org-trim (s &optional keep-lead)
diff --git a/lisp/org-mouse.el b/lisp/org-mouse.el
index 2904bad1f..0b1ddaa6e 100644
--- a/lisp/org-mouse.el
+++ b/lisp/org-mouse.el
@@ -426,13 +426,14 @@ (defun org-mouse-tag-menu ()		;todo
   (append
    (let ((tags (org-get-tags nil t)))
      (org-mouse-keyword-menu
-      (sort (mapcar #'car (org-get-buffer-tags)) #'string-lessp)
+      (sort (mapcar #'car (org-get-buffer-tags))
+            (or org-tags-sort-function #'org-string<))
       (lambda (tag)
 	(org-mouse-set-tags
 	 (sort (if (member tag tags)
 		   (delete tag tags)
 		 (cons tag tags))
-	       #'string-lessp)))
+	       (or org-tags-sort-function #'org-string<))))
       (lambda (tag) (member tag tags))
       ))
    '("--"
@@ -473,7 +474,7 @@ (defun org-mouse-list-options-menu (alloptions &optional function)
 				    (sort (if (member ',name ',options)
 					      (delete ',name ',options)
 					    (cons ',name ',options))
-					  'string-lessp)
+					  #'org-string<)
 				    " ")
 			 nil nil nil 1)
 			(when (functionp ',function) (funcall ',function)))
@@ -502,7 +503,8 @@ (defun org-mouse-popup-global-menu ()
      ["Check TODOs" org-show-todo-tree t]
      ("Check Tags"
       ,@(org-mouse-keyword-menu
-	 (sort (mapcar #'car (org-get-buffer-tags)) #'string-lessp)
+	 (sort (mapcar #'car (org-get-buffer-tags))
+               (or org-tags-sort-function #'org-string<))
          (lambda (tag) (org-tags-sparse-tree nil tag)))
       "--"
       ["Custom Tag ..." org-tags-sparse-tree t])
@@ -512,7 +514,8 @@ (defun org-mouse-popup-global-menu ()
      ["Display TODO List" org-todo-list t]
      ("Display Tags"
       ,@(org-mouse-keyword-menu
-	 (sort (mapcar #'car (org-get-buffer-tags)) #'string-lessp)
+	 (sort (mapcar #'car (org-get-buffer-tags))
+               (or org-tags-sort-function #'org-string<))
          (lambda (tag) (org-tags-view nil tag)))
       "--"
       ["Custom Tag ..." org-tags-view t])
diff --git a/lisp/org-table.el b/lisp/org-table.el
index 0c2dc27ed..45fe4d0fa 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -4637,8 +4637,8 @@ (defun org-table-sort-lines
 	     (predicate
 	      (cl-case sorting-type
 		((?n ?N ?t ?T) #'<)
-		((?a ?A) (if with-case #'string-collate-lessp
-			   (lambda (s1 s2) (string-collate-lessp s1 s2 nil t))))
+		((?a ?A) (if with-case #'org-string<
+			   (lambda (s1 s2) (org-string< s1 s2 nil t))))
 		((?f ?F)
 		 (or compare-func
 		     (and interactive?
diff --git a/lisp/org.el b/lisp/org.el
index 20879685c..f9a9332aa 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -2944,8 +2944,8 @@ (defcustom org-tags-sort-function nil
   :group 'org-tags
   :type '(choice
 	  (const :tag "No sorting" nil)
-	  (const :tag "Alphabetical" string-collate-lessp)
-	  (const :tag "Reverse alphabetical" org-string-collate-greaterp)
+	  (const :tag "Alphabetical" org-string<)
+	  (const :tag "Reverse alphabetical" org-string>)
 	  (function :tag "Custom function" nil)))
 
 (defvar org-tags-history nil
@@ -7955,7 +7955,7 @@ (defun org-sort-entries
 	     (t (error "Invalid sorting type `%c'" sorting-type))))
 	  nil
 	  (cond
-	   ((= dcst ?a) 'string-collate-lessp)
+	   ((= dcst ?a) #'org-string<)
 	   ((= dcst ?f)
 	    (or compare-func
 		(and interactive?
diff --git a/lisp/ox-publish.el b/lisp/ox-publish.el
index 3e526b813..1b623ce9f 100644
--- a/lisp/ox-publish.el
+++ b/lisp/ox-publish.el
@@ -794,17 +794,14 @@ (defun org-publish-sitemap (project &optional sitemap-filename)
 			      (concat (file-name-directory b)
 				      (org-publish-find-title b project))
 			    b)))
-		   (setq retval
-			 (if ignore-case
-			     (not (string-lessp (upcase B) (upcase A)))
-			   (not (string-lessp B A))))))
+		   (setq retval (org-string<= A B nil ignore-case))))
 		((or `anti-chronologically `chronologically)
 		 (let* ((adate (org-publish-find-date a project))
 			(bdate (org-publish-find-date b project)))
 		   (setq retval
 			 (not (if (eq sort-files 'chronologically)
-				  (time-less-p bdate adate)
-				(time-less-p adate bdate))))))
+				(time-less-p bdate adate)
+			      (time-less-p adate bdate))))))
 		(`nil nil)
 		(_ (user-error "Invalid sort value %s" sort-files)))
 	      ;; Directory-wise wins:
-- 
2.45.0
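
The arity-based dispatch in `org-string<' above can be tried outside
Org with a minimal sketch like the one below.  The `my/' names are
hypothetical stand-ins, not part of Org or of the patch.  One
deliberate difference: the patch compares with `=', while this sketch
uses `eql', because `func-arity' may report the symbol `many' as the
maximum and `=' only accepts numbers.  That substitution is an
assumption of the sketch, not something the patch does.

```elisp
;;; -*- lexical-binding: t; -*-

;; Hypothetical stand-in for the patch's `org-sort-function'.
(defvar my/sort-function #'string-collate-lessp
  "Comparison function accepting either 2 or 4 arguments.")

;; Hypothetical stand-in for the patch's `org-string<'.
(defun my/string< (a b &optional locale ignore-case)
  "Return non-nil when A sorts before B per `my/sort-function'."
  ;; `func-arity' returns (MIN . MAX).  Pass LOCALE and IGNORE-CASE
  ;; only when the function accepts exactly up to 4 arguments.
  (if (eql 4 (cdr (func-arity my/sort-function)))
      (funcall my/sort-function a b locale ignore-case)
    (funcall my/sort-function a b)))

;; A 2-argument function: the extra arguments are silently dropped,
;; so the comparison stays case-sensitive ("B" is code 66, "a" is 97).
(let ((my/sort-function #'string-lessp))
  (my/string< "B" "a" nil t))

;; A 4-argument function that ignores LOCALE and IGNORE-CASE and
;; always compares downcased strings, like `org-sort-function-downcase'.
(let ((my/sort-function
       (lambda (a b &optional _locale _ignore-case)
         (string-lessp (downcase a) (downcase b)))))
  (my/string< "B" "a"))
```

A consequence worth noting: with a 2-argument value such as
`string-lessp', callers that request case-insensitive sorting (for
example the table sorting code) get a case-sensitive result.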


[-- Attachment #3: Type: text/plain, Size: 224 bytes --]


-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

Thread overview: 27+ messages
2022-10-06 20:15 test-org-table/sort-lines: Failing test on macOS Rudolf Adamkovič
2022-10-07 12:04 ` Max Nikulin
2022-10-08  5:25   ` Ihor Radchenko
2022-10-08 14:27     ` Max Nikulin
2022-10-09  3:59       ` Ihor Radchenko
2022-10-09 15:38         ` Rudolf Adamkovič
2022-10-09 16:53           ` Max Nikulin
2022-10-10 22:25             ` Rudolf Adamkovič
2022-10-12 16:09               ` Max Nikulin
2022-11-15  4:10                 ` Ihor Radchenko
2022-11-20  4:18                   ` Ihor Radchenko
2022-11-20  8:00                     ` Max Nikulin
2022-11-21  3:15                       ` Ihor Radchenko
2022-11-21 16:48                         ` Max Nikulin
2022-11-22  1:14                           ` Ihor Radchenko
2022-11-22 16:01                             ` Max Nikulin
2022-11-23 10:37                               ` Ihor Radchenko
2022-11-23 15:27                                 ` Max Nikulin
2022-11-23 17:01                                   ` Max Nikulin
2022-11-26  2:05                                   ` Ihor Radchenko
2022-11-29 16:40                                     ` Max Nikulin
2024-04-03 11:40                                       ` [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS) Ihor Radchenko
2024-05-05 11:59                                         ` Ihor Radchenko [this message]
2024-05-07 11:06                                           ` [DISCUSSION] Sorting strings in Org mode vs. system locale Max Nikulin
2024-05-07 13:09                                             ` Ihor Radchenko
2024-05-07 16:47                                               ` Max Nikulin
2024-05-11  9:38                                                 ` Ihor Radchenko
