* test-org-table/sort-lines: Failing test on macOS
@ 2022-10-06 20:15 Rudolf Adamkovič
2022-10-07 12:04 ` Max Nikulin
0 siblings, 1 reply; 31+ messages in thread
From: Rudolf Adamkovič @ 2022-10-06 20:15 UTC
To: emacs-orgmode
Howdy, howdy!
I see the test failure below on macOS.
Test test-org-table/sort-lines condition:
(ert-test-failed
((should
(equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
(org-test-with-temp-text "| <point>a | x |\n| c | 3 |\n| B | 4 |\n"
... ...)))
:form
(equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
#("| B | 4 |\n| a | x |\n| c | 3 |\n" 0 9
(face org-table)
10 19
(face org-table)
20 29
(face org-table)))
:value nil :explanation
(array-elt 2
(different-atoms
(97 "#x61" "?a")
(66 "#x42" "?B")))))
FAILED 796/952 test-org-table/sort-lines (0.003410 sec)
at ../lisp/test-org-table.el:1880
The isolated part of the test file that fails:
(let ((original-string-collate-lessp (symbol-function 'string-collate-lessp)))
  ;; Force the "C" locale, but pass the caller's IGNORE-CASE through.
  (cl-letf (((symbol-function 'string-collate-lessp)
             (lambda (s1 s2 &optional _locale ignore-case)
               (funcall original-string-collate-lessp
                        s1 s2 "C" ignore-case))))
    (should
     (equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
            (org-test-with-temp-text "| <point>a | x |\n| c | 3 |\n| B | 4 |\n"
              (org-table-sort-lines nil ?a)
              (buffer-string))))))
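For comparison outside Emacs, the same three rows can be fed to sort(1) in the "C" locale. This is only a sketch: plain `sort` compares bytes case-sensitively, while POSIX `sort -f` folds lowercase to uppercase before comparing, which is roughly what the test expects from `(org-table-sort-lines nil ?a)`.

```shell
#!/bin/sh
# The three table rows used by test-org-table/sort-lines.
rows='| a | x |\n| c | 3 |\n| B | 4 |\n'

# Case-sensitive byte order in the "C" locale: "B" (0x42) sorts before "a" (0x61).
# This is the order the failing test actually produced on macOS.
printf "$rows" | LC_ALL=C sort

echo '---'

# Case-folded comparison (-f maps lowercase to uppercase before comparing).
# This is the order the test expects: a, B, c.
printf "$rows" | LC_ALL=C sort -f
```

The first group lists the B row first; the second lists a, B, c.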
If I understand correctly, "a" should be less than "B" under the "C" locale
when ignoring case (nil), right? Yet I get the following:
(string-collate-lessp "a" "B" "C" nil) ; => nil
[FYI: If I replace nil with t, the procedure returns nil too.]
Tested on Emacs 29 (adaa2fc90e) and Org 9.5.5 (580f28614).
Rudy
--
"It is no paradox to say that in our most theoretical moods we may be
nearest to our most practical applications."
-- Alfred North Whitehead, 1861-1947
Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
* Re: test-org-table/sort-lines: Failing test on macOS
2022-10-06 20:15 test-org-table/sort-lines: Failing test on macOS Rudolf Adamkovič
@ 2022-10-07 12:04 ` Max Nikulin
2022-10-08 5:25 ` Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2022-10-07 12:04 UTC
To: emacs-orgmode
On 07/10/2022 03:15, Rudolf Adamkovič wrote:
>
> If I understand, "a" should be less than "B" when under "C" locale when
> ignoring case (nil) , right? Yet, I get the following:
>
> (string-collate-lessp "a" "B" "C" nil) ; => nil
When case is not ignored (the 4th argument is nil), the locale's collation
rules are used. In the "C" locale that is plain byte order, where "B"
(0x42) comes before "a" (0x61), so you get the expected result.
$ printf 'a\nB\n' | LC_COLLATE=C sort
B
a
$ printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
a
B
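The "C" ordering above is just byte order. A small shell helper (hypothetical name `char_code`) shows the values; POSIX printf prints the numeric value of the character that follows a leading quote. The numbers are the same 97 and 66 that appear in the ERT explanation above.

```shell
#!/bin/sh
# Print the numeric code of a single character. POSIX printf treats an
# argument starting with a quote as "the value of the following character".
char_code() {
  printf '%d\n' "'$1"
}

char_code a   # 97 (0x61)
char_code B   # 66 (0x42), smaller, so "B" sorts first in the "C" locale
```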
> [FYI: If I replace nil with t, the procedure returns nil too.]
>
> Tested on Emacs 29 (adaa2fc90e) and Org 9.5.5 (580f28614).
Strange. With Emacs 26 on Linux I get
(string-collate-lessp "a" "B" "C" t)
t
If libc is sane (assuming that sort is linked against the same libc),
that is, if each of the following commands sorts correctly,
printf 'a\nb\n' | LC_COLLATE=C sort
printf 'b\na\n' | LC_COLLATE=C sort
printf 'A\nB\n' | LC_COLLATE=C sort
printf 'B\nA\n' | LC_COLLATE=C sort
printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
then you might be facing an Emacs bug.
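The eight checks above can be automated. The sketch below uses hypothetical names (`check_pair`, `status`) and clears LC_ALL so that it cannot override LC_COLLATE. If C.UTF-8 is not installed, sort typically falls back to the "C" locale, so those checks should still pass.

```shell
#!/bin/sh
# Sort a two-line input under the given collation locale and compare the
# result (newlines turned into spaces) with the expected ascending order.
check_pair() {
  loc=$1 input=$2 expected=$3
  # Clear LC_ALL so that it cannot override LC_COLLATE for this sort.
  actual=$(printf "$input" | LC_ALL= LC_COLLATE="$loc" sort | tr '\n' ' ')
  if [ "$actual" = "$expected" ]; then
    echo "ok   $loc: $actual"
  else
    echo "FAIL $loc: got '$actual', expected '$expected'"
    return 1
  fi
}

status=0
for loc in C C.UTF-8; do
  check_pair "$loc" 'a\nb\n' 'a b ' || status=1
  check_pair "$loc" 'b\na\n' 'a b ' || status=1
  check_pair "$loc" 'A\nB\n' 'A B ' || status=1
  check_pair "$loc" 'B\nA\n' 'A B ' || status=1
done

if [ "$status" -eq 0 ]; then
  echo "sort looks sane, so an Emacs bug is possible"
else
  echo "sort itself misbehaves; check libc and locale definitions"
fi
```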
P.S. An example of a subtle sorting issue: significant spaces were added to
the collation rules of some locales, such as es_ES & Co. and pl_PL:
Maxim Nikulin. Re: [Patch] to correctly sort the items with emphasis
marks in a list. Fri, 16 Apr 2021 21:59:51 +0700.
https://list.orgmode.org/s5c8p9$97n$1@ciao.gmane.io
* Re: test-org-table/sort-lines: Failing test on macOS
2022-10-07 12:04 ` Max Nikulin
@ 2022-10-08 5:25 ` Ihor Radchenko
2022-10-08 14:27 ` Max Nikulin
0 siblings, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2022-10-08 5:25 UTC
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> On 07/10/2022 03:15, Rudolf Adamkovič wrote:
>>
>> If I understand, "a" should be less than "B" when under "C" locale when
>> ignoring case (nil) , right? Yet, I get the following:
>>
>> (string-collate-lessp "a" "B" "C" nil) ; => nil
>
> When case is not ignored (4th argument is nil) locale-dependent
> collation rules are used, so you get the expected result.
>
> $ printf 'a\nB\n' | LC_COLLATE=C sort
> B
> a
> $ printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
> a
> B
Should we then modify the test to set the locale explicitly?
>> [FYI: If I replace nil with t, the procedure returns nil too.]
>>
>> Tested on Emacs 29 (adaa2fc90e) and Org 9.5.5 (580f28614).
>
> Strange. Emacs-26, Linux
>
> (string-collate-lessp "a" "B" "C" t)
> t
>
> If libc is sane (assuming that sort is linked to the same libc)
>
> printf 'a\nb\n' | LC_COLLATE=C sort
> printf 'b\na\n' | LC_COLLATE=C sort
> printf 'A\nB\n' | LC_COLLATE=C sort
> printf 'B\nA\n' | LC_COLLATE=C sort
> printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
> printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
> printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
> printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
>
> then you might face an Emacs bug.
IDK if it is related, but there was a recent (now fixed) bug:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55787
Note that Rudolf is using Emacs 29.
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: test-org-table/sort-lines: Failing test on macOS
2022-10-08 5:25 ` Ihor Radchenko
@ 2022-10-08 14:27 ` Max Nikulin
2022-10-09 3:59 ` Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2022-10-08 14:27 UTC
To: emacs-orgmode
On 08/10/2022 12:25, Ihor Radchenko wrote:
> Max Nikulin writes:
>>
>> When case is not ignored (4th argument is nil) locale-dependent
>> collation rules are used, so you get the expected result.
>>
>> $ printf 'a\nB\n' | LC_COLLATE=C sort
>> B
>> a
>> $ printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
>> a
>> B
>
> Should we then modify the test to set locale explicitly?
Rudolf cited the context of this test and "C" locale is explicitly used
there.
> IDK if it is related, but there was a recent (fixed) bug in
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55787
I have not looked for the commits to check whether only version sort is affected.
> Note that Rudolf is using Emacs 29.
and macOS, so the libc and locale versions may differ as well.
* Re: test-org-table/sort-lines: Failing test on macOS
2022-10-08 14:27 ` Max Nikulin
@ 2022-10-09 3:59 ` Ihor Radchenko
2022-10-09 15:38 ` Rudolf Adamkovič
0 siblings, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2022-10-09 3:59 UTC
To: Max Nikulin, Rudolf Adamkovič; +Cc: emacs-orgmode
[I am adding Rudolf's email back to CC just in case]
Max Nikulin <manikulin@gmail.com> writes:
>> Should we then modify the test to set locale explicitly?
>
> Rudolf cited the context of this test and "C" locale is explicitly used
> there.
Oops. Missed it. Thanks for the clarification.
>> Note that Rudolf is using Emacs 29.
>
> and macOS, so libc and locales version may be different as well.
[Max, correct me if my further suggestion is wrong.]
Rudolf, can you (1) try sort in a terminal to confirm that the "C" locale
behaves as expected on macOS, and (2) if sort works fine, consider
reporting an Emacs bug?
> If libc is sane (assuming that sort is linked to the same libc)
>
> printf 'a\nb\n' | LC_COLLATE=C sort
> printf 'b\na\n' | LC_COLLATE=C sort
> printf 'A\nB\n' | LC_COLLATE=C sort
> printf 'B\nA\n' | LC_COLLATE=C sort
> printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
> printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
> printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
> printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
>
> then you might face an Emacs bug.
* Re: test-org-table/sort-lines: Failing test on macOS
2022-10-09 3:59 ` Ihor Radchenko
@ 2022-10-09 15:38 ` Rudolf Adamkovič
2022-10-09 16:53 ` Max Nikulin
0 siblings, 1 reply; 31+ messages in thread
From: Rudolf Adamkovič @ 2022-10-09 15:38 UTC
To: Ihor Radchenko, Max Nikulin; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@gmail.com> writes:
> Rudolf, can you (1) try sort in terminal to confirm that "C" locale
> behaves as expected in MacOS; (2) If sort works fine, you may consider
> reporting Emacs bug.
For the two examples given by Max on Linux, I get on macOS:
printf 'a\nB\n' | LC_COLLATE=C sort
B
a
printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
B
a
For the other examples mentioned, I get on macOS:
printf 'a\nb\n' | LC_COLLATE=C sort
a
b
printf 'b\na\n' | LC_COLLATE=C sort
a
b
printf 'A\nB\n' | LC_COLLATE=C sort
A
B
printf 'B\nA\n' | LC_COLLATE=C sort
A
B
printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
a
b
printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
a
b
printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
A
B
printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
A
B
Rudy
--
"Chop your own wood and it will warm you twice."
-- Henry Ford; Francis Kinloch, 1819; Henry David Thoreau, 1854
* Re: test-org-table/sort-lines: Failing test on macOS
2022-10-09 15:38 ` Rudolf Adamkovič
@ 2022-10-09 16:53 ` Max Nikulin
2022-10-10 22:25 ` Rudolf Adamkovič
0 siblings, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2022-10-09 16:53 UTC
To: emacs-orgmode
On 09/10/2022 22:38, Rudolf Adamkovič wrote:
>
> For the two examples given by Max on Linux, I get on macOS:
>
> printf 'a\nB\n' | LC_COLLATE=C sort
> B
> a
This is the expected behavior.
> printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
> B
> a
This one is not consistent with what I see on Linux with glibc.
printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
a
B
Perhaps you do not have the en_US locale generated. On my system:
locale -a | grep en_US
en_US.utf8
At least sort uses the same "C" locale definition as expected by Org
tests. Either Emacs is linked with another libc or there is a bug in Emacs.
> printf 'a\nb\n' | LC_COLLATE=C sort
> a
> b
Sanity test passed for sort. You may try the same set of pairs with
`string-collate-lessp'.
I am curious whether the "POSIX" locale behaves like "C" and "C.UTF-8"
in your case:
(string-collate-lessp "a" "B" "POSIX" nil)
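POSIX specifies that the "POSIX" locale is the same as "C", so at the sort(1) level the two should agree. A small sketch (hypothetical helper `compare_locales`) sorts one input under several collation locales and prints one line per locale, so differences are easy to spot.

```shell
#!/bin/sh
# Sort the same input under each given collation locale and print the result
# on one line per locale. LC_ALL is cleared so it cannot override LC_COLLATE.
compare_locales() {
  input=$1; shift
  for loc in "$@"; do
    printf '%-12s %s\n' "$loc" \
      "$(printf "$input" | LC_ALL= LC_COLLATE="$loc" sort | tr '\n' ' ')"
  done
}

# Both lines should list B before a.
compare_locales 'a\nB\n' C POSIX
```

Adding en_US.UTF-8 to the argument list would, per this thread, be expected to show a before B with glibc, while Rudolf reports B before a on macOS.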
* Re: test-org-table/sort-lines: Failing test on macOS
2022-10-09 16:53 ` Max Nikulin
@ 2022-10-10 22:25 ` Rudolf Adamkovič
2022-10-12 16:09 ` Max Nikulin
0 siblings, 1 reply; 31+ messages in thread
From: Rudolf Adamkovič @ 2022-10-10 22:25 UTC
To: Max Nikulin, emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> This one is not consistent with what I see on Linux with glibc.
Yeah, I noticed. :)
> Perhaps you do not have en_US locale generated
>
> locale -a | grep en_US
> en_US.utf8
$ locale -a | grep en_US
en_US.US-ASCII
en_US.UTF-8
en_US
en_US.ISO8859-15
en_US.ISO8859-1
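The two `locale -a` listings in this exchange spell the same locale differently: glibc shows `en_US.utf8`, macOS shows `en_US.UTF-8`. When checking for a locale from a script, a normalized comparison avoids false negatives. The helper names below are hypothetical, and the normalization (lowercase, drop hyphens) is an assumption that covers these two spellings, not a general locale-name parser.

```shell
#!/bin/sh
# Normalize a locale name so "en_US.UTF-8" (macOS) and "en_US.utf8" (glibc)
# compare equal: lowercase it and remove hyphens.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeed if `locale -a` lists a locale whose normalized name matches.
has_locale() {
  want=$(normalize_locale "$1")
  locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g' |
    grep -Fqx -- "$want"
}
```

For example, `has_locale en_US.UTF-8 && echo available` should succeed on both kinds of systems if the locale is installed.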
> Sanity test passed for sort. You may try the same set of pairs with
> `string-collate-lessp'.
(string-collate-lessp "a" "b" "C" t) ; t
(string-collate-lessp "b" "a" "C" t) ; nil
(string-collate-lessp "A" "B" "C" t) ; t
(string-collate-lessp "B" "A" "C" t) ; nil
(string-collate-lessp "a" "b" "C" t) ; t
(string-collate-lessp "b" "a" "C" t) ; nil
(string-collate-lessp "A" "B" "C" t) ; t
(string-collate-lessp "B" "A" "C" t) ; nil
(string-collate-lessp "a" "b" "C" nil) ; t
(string-collate-lessp "b" "a" "C" nil) ; nil
(string-collate-lessp "A" "B" "C" nil) ; t
(string-collate-lessp "B" "A" "C" nil) ; nil
(string-collate-lessp "a" "b" "C" nil) ; t
(string-collate-lessp "b" "a" "C" nil) ; nil
(string-collate-lessp "A" "B" "C" nil) ; t
(string-collate-lessp "B" "A" "C" nil) ; nil
> I am curious if "POSIX" locale works similar to "C" and "C.UTF-8" in
> your case (string-collate-lessp "a" "B" "POSIX" nil).
(string-collate-lessp "a" "B" "POSIX" nil) ; nil
Rudy
--
"'Contrariwise,' continued Tweedledee, 'if it was so, it might be; and
if it were so, it would be; but as it isn't, it ain't. That's logic.'"
-- Lewis Carroll, Through the Looking Glass, 1871/1872
* Re: test-org-table/sort-lines: Failing test on macOS
2022-10-10 22:25 ` Rudolf Adamkovič
@ 2022-10-12 16:09 ` Max Nikulin
2022-11-15 4:10 ` Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2022-10-12 16:09 UTC
To: emacs-orgmode
On 11/10/2022 05:25, Rudolf Adamkovič wrote:
> (string-collate-lessp "a" "b" "C" t) ; t
..
> (string-collate-lessp "a" "b" "C" nil) ; t
..
So basic sanity tests passed.
> (string-collate-lessp "a" "B" "C" nil) ; => nil
> (string-collate-lessp "a" "B" "POSIX" nil) ; nil
are the expected behavior as well. What I do not like is
> (string-collate-lessp "a" "B" "C" t) ; => nil
Actually you wrote
> [FYI: If I replace nil with t, the procedure returns nil too.]
From my point of view it is a reason to file an Emacs bug because I get
(string-collate-lessp "a" "B" "C" t) ; => t
Case folding seems to work strangely for comparison, because when the
case is the same, "a" < "b" as expected:
> (string-collate-lessp "a" "b" "C" t) ; t
> (string-collate-lessp "A" "B" "C" t) ; t
> (string-collate-lessp "a" "b" "C" nil) ; t
> (string-collate-lessp "A" "B" "C" nil) ; t
Could it be that the IGNORE-CASE argument is ignored in your case? I
believe it is improbable that the C locale is not generated (which would
leave the case-folding rules undefined), but you can check:
locale -a | grep C
Another strange result of yours is
> $ locale -a | grep en_US
> en_US.US-ASCII
> en_US.UTF-8
..
so the en_US locale is defined, but its collation rules differ from glibc's:
> printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
> B
> a
I have no idea whether sort and Emacs use the same libc and the same locale
definitions; I do not know how this is organized on macOS.
* Re: test-org-table/sort-lines: Failing test on macOS
2022-10-12 16:09 ` Max Nikulin
@ 2022-11-15 4:10 ` Ihor Radchenko
2022-11-20 4:18 ` Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2022-11-15 4:10 UTC
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> > (string-collate-lessp "a" "B" "C" t) ; => nil
> Actually you wrote
>> [FYI: If I replace nil with t, the procedure returns nil too.]
> From my point of view it is a reason to file an Emacs bug because I get
>
> (string-collate-lessp "a" "B" "C" t) ; => t
I submitted the bug report to Emacs.
See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-15 4:10 ` Ihor Radchenko
@ 2022-11-20 4:18 ` Ihor Radchenko
2022-11-20 8:00 ` Max Nikulin
0 siblings, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2022-11-20 4:18 UTC
To: Max Nikulin; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> Max Nikulin <manikulin@gmail.com> writes:
>
>> > (string-collate-lessp "a" "B" "C" t) ; => nil
>> Actually you wrote
>>> [FYI: If I replace nil with t, the procedure returns nil too.]
>> From my point of view it is a reason to file an Emacs bug because I get
>>
>> (string-collate-lessp "a" "B" "C" t) ; => t
>
> I submitted the bug report to Emacs.
> See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275
According to the discussion on debbugs, it looks like we can use
`compare-strings' instead. It will be independent of the system locale
and always follow Unicode rules.
However, I am not sure if ignoring the locale is something we really want.
WDYT?
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-20 4:18 ` Ihor Radchenko
@ 2022-11-20 8:00 ` Max Nikulin
2022-11-21 3:15 ` Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2022-11-20 8:00 UTC
To: emacs-orgmode
On 20/11/2022 11:18, Ihor Radchenko wrote:
>> Max Nikulin writes:
>>> From my point of view it is a reason to file an Emacs bug because I get
>>>
>>> (string-collate-lessp "a" "B" "C" t) ; => t
>>
>> See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275
>
> According to the discussion on debbugs, it looks like we can use
> `compare-strings' instead. It will be independent of the system locale
> and always follow Unicode rules.
>
> However, I am not sure if ignoring locale is something we really want.
> WDYT?
I think we should keep `string-collate-lessp' in the
`org-table-sort-lines' implementation. Users expect sorting according
to their locales. However, it would be better to add a warning to the
`org-table-sort-lines' docstring and to the manual that caseless sorting
depends on its implementation in libc, so currently it does not work
with clang/llvm, e.g. on macOS.
Concerning the test, I would split the current test case into 2 parts
depending on the WITH-CASE argument, check whether caseless collation is
available, and skip the related test otherwise.
As to the thread linked to the bug report
https://lists.gnu.org/archive/html/emacs-devel/2022-07/msg00940.html
"case-insensitive string comparison." Tue, 19 Jul 2022 13:27:50 -0400,
there is a link
https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
unrelated to the issue, but comments and answers there describe a lot of
pitfalls and explain why string comparison ignoring case is not trivial.
(It is a Sisyphean task in some sense, I like the comment on 3 sigmas.)
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-20 8:00 ` Max Nikulin
@ 2022-11-21 3:15 ` Ihor Radchenko
2022-11-21 16:48 ` Max Nikulin
0 siblings, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2022-11-21 3:15 UTC
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
>> However, I am not sure if ignoring locale is something we really want.
>> WDYT?
>
> I think we should keep `string-collate-lessp' in the
> `org-table-sort-lines' implementation. Users expect sorting accordingly
> to their locales. However it is better to add a warning to
> `org-table-sort-lines' docstring and to the manual that caseless sort
> depends on its implementation in libc, so currently it does not work in
> clang/llvm and so e.g. on MacOS.
Sounds reasonable.
Note that `org-table-sort-lines' is not the only function using
`string-collate-lessp'. The full list of functions potentially affected
by libc sorting is:
1. Bibliography order in `org-cite-basic-export-bibliography'
(via org-cite-basic--sort-keys -> org-cite-basic--field-less-p)
2. `org-sort-list'
3. `org-table-sort-lines'
4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
"Alphabetical" or "Reverse alphabetical".
5. `org-sort-entries'
6. Agenda sorting, when alphabetical sorting is involved
7. `org-map-entries'
I am not 100% sure where we should add the information to
docstring/manual and where we should not.
> Concerning the test, I would split the current testcase into 2 parts
> depending on WITH-CASE argument, check if caseless collation is
> available and skip the related test otherwise.
How can we check the availability?
> As to the thread linked to the bug report
> https://lists.gnu.org/archive/html/emacs-devel/2022-07/msg00940.html
> "case-insensitive string comparison." Tue, 19 Jul 2022 13:27:50 -0400,
> there is a link
> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
> unrelated to the issue, but comments and answers there describe a lot of
> pitfalls and explain why string comparison ignoring case is not trivial.
> (It is a Sisyphean task in some sense, I like the comment on 3 sigmas.)
Indeed. Also, see https://nullprogram.com/blog/2014/06/13/. However,
what we are concerned about here is consistency, not the pitfalls per
se.
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-21 3:15 ` Ihor Radchenko
@ 2022-11-21 16:48 ` Max Nikulin
2022-11-22 1:14 ` Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2022-11-21 16:48 UTC
To: emacs-orgmode
On 21/11/2022 10:15, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>>> However, I am not sure if ignoring locale is something we really want.
>>> WDYT?
>>
>> I think we should keep `string-collate-lessp' in the
>> `org-table-sort-lines' implementation. Users expect sorting accordingly
>> to their locales. However it is better to add a warning to
>> `org-table-sort-lines' docstring and to the manual that caseless sort
>> depends on its implementation in libc, so currently it does not work in
>> clang/llvm and so e.g. on MacOS.
>
> Sounds reasonable.
>
> Note that not only `org-table-sort-lines' is using
> `string-collate-lessp'. The full list of functions potentially affected
> by libc sorting is:
>
> 1. Bibliography order in `org-cite-basic-export-bibliography'
> (via org-cite-basic--sort-keys -> org-cite-basic--field-less-p)
> 3. `org-table-sort-lines'
Confirmed.
> 2. `org-sort-list'
> 5. `org-sort-entries'
`downcase' is used instead of proper case folding, so this is a potential issue.
> 4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
> "Alphabetical" or "Reverse alphabetical".
The IGNORE-CASE argument is not used; perhaps `downcase' is hidden somewhere in the code.
> 6. Agenda sorting, when alphabetical sorting is involved
`string-lessp' and `downcase' are used, so even more severe locale-related
issues might be expected.
> 7. `org-map-entries'
Unsure which predicate is used.
> I am not 100% sure where we should add the information to
> docstring/manual and where we should not.
If footnotes in the manual had fixed labels, I would suggest
referencing the same footnote in the manual and in the docstrings.
Perhaps a new subsection should be added to info "(org) Miscellaneous"
and "see info node ..." should be added to all involved docstrings.
>> Concerning the test, I would split the current testcase into 2 parts
>> depending on WITH-CASE argument, check if caseless collation is
>> available and skip the related test otherwise.
>
> How can we check the availability?
(string-collate-lessp "a" "B" "C" t)
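In ERT, this probe would presumably guard the caseless half of the split test via `skip-unless`. The same probe-then-skip pattern can be sketched at the sort(1) level. The hypothetical probe below asks a related but different question, whether a locale's collation puts "a" before "B", which is where glibc's en_US.UTF-8 and the macOS results in this thread disagree; it illustrates the pattern and does not test `string-collate-lessp` itself.

```shell
#!/bin/sh
# Probe: does the collation of locale $1 put "a" before "B" (interleaving
# upper and lower case)? In "C"/"POSIX", byte order puts "B" first, so no.
collation_interleaves_case() {
  first=$(printf 'B\na\n' | LC_ALL= LC_COLLATE="$1" sort | head -n 1)
  [ "$first" = a ]
}

# Run a check only when the probe succeeds; otherwise report it as skipped,
# in the spirit of ERT's `skip-unless'.
run_if_supported() {
  loc=$1 name=$2
  if collation_interleaves_case "$loc"; then
    echo "RUN  $name ($loc)"
  else
    echo "SKIP $name ($loc): collation does not interleave case"
  fi
}

run_if_supported C sort-lines-caseless
```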
> Indeed. Also, see https://nullprogram.com/blog/2014/06/13/. However,
> what we are concerned about here is consistency. Not the pitfalls per
> se.
Achieving consistency across Org code requires additional effort.
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-21 16:48 ` Max Nikulin
@ 2022-11-22 1:14 ` Ihor Radchenko
2022-11-22 16:01 ` Max Nikulin
2025-01-06 18:22 ` test-org-table/sort-lines: Failing test on macOS Michael Brand
0 siblings, 2 replies; 31+ messages in thread
From: Ihor Radchenko @ 2022-11-22 1:14 UTC
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
>> 2. `org-sort-list'
>> 5. `org-sort-entries'
> `downcase' is used, not proper case folding, so a potential issue
`downcase' is used to determine user input about sorting type.
Not for sorting itself.
>> 4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
>> "Alphabetical" or "Reverse alphabetical".
>
> IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.
I feel like we are slightly miscommunicating here.
I mostly tried to list the uses of libc-sensitive sorting, not
specifically the cases where we try to ignore case.
The problem is not limited to case-insensitive comparisons. Some systems
may fail to implement specific locales, and thus sorting may degrade to
simple string-lessp.
No `downcase' is hidden anywhere there.
>> 6. Agenda sorting, when alphabetical sorting is involved
>
> `string-lessp' and `downcase' so even more severe locale-related issues
> might be expected.
Could you please elaborate?
>> 7. `org-map-entries'
>
> Unsure which predicate is used.
It is a similar scenario to the agenda. `org-map-entries' uses
`org-make-tags-matcher', which calls `org-op-to-function' when the user
wants to select property values via a </<=/>/>= criterion.
`org-op-to-function' calls `org-string<' or similar, which, in turn, uses
`string-collate-lessp' with a nil IGNORE-CASE argument.
>> I am not 100% sure where we should add the information to
>> docstring/manual and where we should not.
>
> If footnotes in the manual had fixed labels then I would suggest
> reference the same footnote in the manual and in the docstrings.
> Perhaps, a new subsection should be added to info "(org) Miscellaneous"
> and "see info node ..." should be added to all involved docstrings.
Sounds reasonable.
>>> Concerning the test, I would split the current testcase into 2 parts
>>> depending on WITH-CASE argument, check if caseless collation is
>>> available and skip the related test otherwise.
>>
>> How can we check the availability?
>
> (string-collate-lessp "a" "B" "C" t)
Thanks!
>> Indeed. Also, see https://nullprogram.com/blog/2014/06/13/. However,
>> what we are concerned about here is consistency. Not the pitfalls per
>> se.
>
> Achieving consistency across Org code requires additional efforts.
Well, just using `string-lessp' would make things very consistent,
easily and with no effort.
The question, though, is what the right thing to do for users is while
also keeping consistency.
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-22 1:14 ` Ihor Radchenko
@ 2022-11-22 16:01 ` Max Nikulin
2022-11-23 10:37 ` Ihor Radchenko
2025-01-06 18:22 ` test-org-table/sort-lines: Failing test on macOS Michael Brand
1 sibling, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2022-11-22 16:01 UTC
To: emacs-orgmode
On 22/11/2022 08:14, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>>> 2. `org-sort-list'
>>> 5. `org-sort-entries'
>> `downcase' is used, not proper case folding, so a potential issue
>
> `downcase' is used to determine user input about sorting type.
> Not for sorting itself.
See the case-func variable. Its initialization depends on the IGNORE-CASE
argument. Strings to sort are passed either through `identity' or
through `downcase'.
>>> 4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
>>> "Alphabetical" or "Reverse alphabetical".
>>
>> IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.
>
> I feel like we are slightly miscommunicating here.
> I mostly tried to list the uses of libc-sensitive sorting. Not
> specifically cases when we try to ignore the case.
>
> The problem is not limited to case-sensitive comparisons. Some systems
> may fail to implement specific locales and thus sorting may downgrade to
> simple string-lessp.
When case folding is not involved, I consider `string-lessp' a
graceful degradation. Although locale rules are not applied, strings are
still mostly sorted. Exceptions exist, but the order is usually reasonable.
Completely disregarding the IGNORE-CASE argument of `string-collate-lessp'
on macOS (which is not a heavily stripped embedded OS) was a bad surprise
for me.
>>> 6. Agenda sorting, when alphabetical sorting is involved
>>
>> `string-lessp' and `downcase' so even more severe locale-related issues
>> might be expected.
>
> Could you please elaborate?
I admit that `downcase' may be an acceptable workaround, since
`string-collate-lessp' may not honor IGNORE-CASE, but I believe that,
when available, `string-collate-lessp' should be the preferred option
for sorting.
>> Achieving consistency across Org code requires additional efforts.
>
> Well. Just using `string-lessp' would make things very consistent.
> Easily and with no efforts.
Hoping that clang will get better Unicode support, I would move in
the opposite direction, namely toward wider use of `string-collate-lessp'.
Just using `string-lessp' means no case-insensitive sorting even where it
is available now.
I have an idea for a compatibility wrapper around `string-collate-lessp'
with special treatment of case-insensitive comparison and broken libc
implementations: apply `downcase' before passing the arguments to
`string-lessp'. It should provide consistency, the best user experience
when locales work properly, and graceful degradation otherwise. I hope
it is acceptable for Org even though such a trick is undesirable for
Emacs for performance reasons.
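The fallback path described here (lowercase first, then compare in plain byte order) can be illustrated outside Elisp as a decorate/sort/undecorate pipeline. The function name `sort_lines` and its yes/no argument are hypothetical: the lowercased key plays the role of `downcase` (or of `identity` when case matters, as with the case-func variable mentioned above), and `LC_ALL=C sort` plays the role of `string-lessp`. It assumes input lines contain no tab characters; awk's `tolower` under LC_ALL=C folds ASCII letters only, which is weaker than Emacs `downcase`.

```shell
#!/bin/sh
# Sort stdin in byte order ("C" locale). With "yes", compare lowercased
# copies of the lines (like `downcase'); otherwise compare them as-is
# (like `identity'). Assumes the lines contain no tab characters.
sort_lines() {
  ignore_case=$1
  tab=$(printf '\t')
  # Decorate: prefix each line with its sort key and a tab.
  LC_ALL=C awk -v fold="$ignore_case" 'BEGIN { OFS = "\t" }
    { key = (fold == "yes") ? tolower($0) : $0; print key, $0 }' |
    # Sort on the key field only, in byte order.
    LC_ALL=C sort -t "$tab" -k 1,1 |
    # Undecorate: drop the key, keep the original line.
    cut -f 2-
}

printf '| a | x |\n| c | 3 |\n| B | 4 |\n' | sort_lines yes
```

With `yes` the rows come out as a, B, c, the order the Org test expects; with `no` the B row comes first.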
However I am afraid of compatibility shims after
d3a9c424b 2022-08-16 17:15:27 +0800 Ihor Radchenko: org-encode-time:
Refactor into top-level `defmacro'
P.S. I am not motivated enough to build Emacs on Linux using clang to
check whether locale information will be available. I am almost sure that
some locale information is available on macOS, e.g. at least strcasecmp,
even if the full CLDR cannot be easily accessed from C. I do not have a Mac
to check the state of affairs. For Objective-C there is e.g.
comareCaseIndependent.
I do not like that Emacs relies on locale support (and timezone support as
well) in libc. It becomes a problem as soon as more than one locale has to
be used simultaneously. I agree that there are enough complications:
sometimes the locale depends on the document (e.g. #+LANGUAGE:), and
sometimes a specific locale is even restricted to a part of a document. It
is tricky to handle such cases, but the current limitations are too strict
(and the defective `string-collate-lessp' on macOS is an example).
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-22 16:01 ` Max Nikulin
@ 2022-11-23 10:37 ` Ihor Radchenko
2022-11-23 15:27 ` Max Nikulin
0 siblings, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2022-11-23 10:37 UTC
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> On 22/11/2022 08:14, Ihor Radchenko wrote:
>> Max Nikulin writes:
>>
>>>> 2. `org-sort-list'
>>>> 5. `org-sort-entries'
>>> `downcase' is used, not proper case folding, so a potential issue
>>
>> `downcase' is used to determine user input about sorting type.
>> Not for sorting itself.
>
> See case-func variable. Its initialization depends on the IGNORE-CASE
> argument. Strings to sort are passed either through `identity' or
> through `downcase'.
Thanks for the pointer.
Now, I am getting more confused though.
Do we even need to use `string-collate-lessp' then?
Eli even argued that `string-collate-lessp' is strictly worse compared
to a more predictable approach. See
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275#40
Do you remember any cases when users actually demanded locale-specific
sorting?
>>> IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.
>>
>> I feel like we are slightly miscommunicating here.
>> I mostly tried to list the uses of libc-sensitive sorting. Not
>> specifically cases when we try to ignore the case.
>>
>> The problem is not limited to case-sensitive comparisons. Some systems
>> may fail to implement specific locales and thus sorting may downgrade to
>> simple string-lessp.
>
> When case folding is not involved, I consider `string-lessp' as a
> graceful degradation. Despite locale rules are not applied, strings are
> mostly sorted. Exceptions exist, but usually order is reasonable.
>
> Completely disregarding IGNORE-CASE argument of `string-collate-lessp'
> on MacOS (that is not a heavily stripped embedded OS) is a bad surprise
> for me.
It was a surprise for me as well. It should be at least a bit clearer
now that I have updated the docstring of `string-collate-lessp'.
However, I feel a bit lost about what to do on the Org side.
We can put a disclaimer in the manual and all that, but it still feels
too complex.
>>>> 6. Agenda sorting, when alphabetical sorting is involved
>>>
>>> `string-lessp' and `downcase' so even more severe locale-related issues
>>> might be expected.
>>
>> Could you please elaborate?
>
> I admit that `downcase' may be an acceptable workaround since
> `string-collate-lessp' may not work IGNORE-CASE, but I believe, when
> available, `string-collate-lessp' should be the preferred option for
> sorting.
As I pointed out above, Eli has the opposite opinion.
I feel that my understanding of the topic is not sufficient to judge.
Maybe we should ask users? (But who is even aware of these things
happening under the hood?)
> I have an idea of a compatibility wrapper for `string-collate-lessp'
> with special treatment of ignoring case and bad libc implementation.
> Apply `downcase' before passing arguments to `string-lessp'. It should
> provide consistency, best user experience when locales works properly,
> and graceful degradation otherwise. I hope, it is acceptable for Org
> even though such trick is undesired for Emacs due to performance reasons.
The macro idea sounds reasonable, though I am still unsure which direction
we need to go.
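The compatibility wrapper described above (fall back to `downcase' plus `string-lessp' when libc ignores IGNORE-CASE) could look roughly like the following. This is a minimal sketch, not Org code: all `my-' names are hypothetical, and it assumes that in the "C" locale "a" sorts before "B" only when case is folded. The probe runs lazily at run time, so nothing is decided at compile or load time.

```elisp
;;; -*- lexical-binding: t -*-
;;; Sketch only: every `my-' name here is hypothetical, not Org API.

(defvar my-collate-ignore-case-works 'unknown
  "Cached result of probing whether IGNORE-CASE is honored.")

(defun my-collate-ignore-case-works-p ()
  "Return non-nil if `string-collate-lessp' honors IGNORE-CASE.
The probe runs once, lazily, at run time."
  (when (eq my-collate-ignore-case-works 'unknown)
    ;; In the "C" locale, "a" sorts before "B" only when case is folded.
    (setq my-collate-ignore-case-works
          (and (string-collate-lessp "a" "B" "C" t) t)))
  my-collate-ignore-case-works)

(defun my-string-collate-lessp-compat (a b &optional locale ignore-case)
  "Like `string-collate-lessp', but degrade gracefully.
When IGNORE-CASE is non-nil and libc disregards it, compare the
downcased strings with `string-lessp' instead."
  (if (and ignore-case (not (my-collate-ignore-case-works-p)))
      (string-lessp (downcase a) (downcase b))
    (string-collate-lessp a b locale ignore-case)))
```

On a system with working collation the wrapper is a thin pass-through; elsewhere, case-insensitive sorting still groups "a" before "B" at the cost of locale-specific rules.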
> However I am afraid of compatibility shims after
>
> d3a9c424b 2022-08-16 17:15:27 +0800 Ihor Radchenko: org-encode-time:
> Refactor into top-level `defmacro'
What do you refer to?
> I do not like that Emacs relies on locale support (and timezone as well)
> in libc. It becomes a problem as soon as more than one locale should be
> used in simultaneously. I agree that there are enough complications and
> sometimes locale depends on the document (e.g. #+LANGUAGE:), sometimes
> specific locale even restricted to a part of a document. It is tricky to
> handle such cases, but current limitations are too strict (and defective
> `string-collate-lessp' on MacOS is an example).
The question is what can be done and, more importantly, how much effort
it will take to implement and maintain an alternative.
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-23 10:37 ` Ihor Radchenko
@ 2022-11-23 15:27 ` Max Nikulin
2022-11-23 17:01 ` Max Nikulin
2022-11-26 2:05 ` Ihor Radchenko
0 siblings, 2 replies; 31+ messages in thread
From: Max Nikulin @ 2022-11-23 15:27 UTC (permalink / raw)
To: emacs-orgmode
On 23/11/2022 17:37, Ihor Radchenko wrote:
> Max Nikulin writes:
>>
>> Strings to sort are passed either through `identity' or
>> through `downcase'.
>
> Thanks for the pointer.
> Now, I am getting more confused though.
> Do we even need to use `string-collate-lessp' then?
I think we do, because the sorted result is presented to humans.
(setq lst '("semana" "señor" "sepia"))
(sort (copy-sequence lst) #'string-lessp) ; => ("semana" "sepia" "señor")
(sort (copy-sequence lst) #'string-collate-lessp) ; => ("semana" "señor" "sepia")
> Eli even argued that `string-collate-lessp' is strictly worse compared
> to more predictable approach. See
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275#40
In this particular case Eli may assume that e.g. a list is an Elisp
structure, not a kind of text formatting. In general, I am quite
pessimistic concerning the quality of locale support in Emacs, while Eli
may have a rather different point of view.
> Do you remember any cases when users actually demanded locale-specific
> sorting?
I think users too often face poor locale support in various
applications, so they are not surprised when they see incorrect results.
In some sense such results are consistent (erroneous in the same way).
Formatting of numbers in Emacs is the extreme case of consistency. For
the sake of reliably reading/writing numbers from/to files or the network,
it is impossible to present a number according to the current locale.
An exception is en_US, which has some dedicated code in calc.
I believe it is silly to adhere to the lowest common denominator and not
use `string-collate-lessp' just because it is unavailable in some environments.
> However, I feel a bit lost about what to do on Org side.
> We can put a disclaimer in the manual and all that, but it still feels
> too complex.
My current suggestion is to provide a fallback to `downcase' in the code
and to explain in the manual that runtime environments (OSes) are not
equal and the quality of locale support varies. Emacs heavily depends on
libc in this area.
>> However I am afraid of compatibility shims after
>>
>> d3a9c424b 2022-08-16 17:15:27 +0800 Ihor Radchenko: org-encode-time:
>> Refactor into top-level `defmacro'
>
> What do you refer to?
The implementation must be chosen at compile (or load) time. Due to some
issues with native compilation, this does not work. For string comparison,
the runtime performance penalty may be higher than for timestamp processing.
> The question is what can be done and, more importantly, how much effort
> will it take to implement and maintain an alternative.
The effort is significant; however, browsers, for example, have their own
implementation of Unicode-related stuff. There is the ICU library, but Eli
is against it because Emacs already has a partial implementation of
Unicode, and using ICU would mean duplicating the character database.
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-23 15:27 ` Max Nikulin
@ 2022-11-23 17:01 ` Max Nikulin
2022-11-26 2:05 ` Ihor Radchenko
1 sibling, 0 replies; 31+ messages in thread
From: Max Nikulin @ 2022-11-23 17:01 UTC (permalink / raw)
To: emacs-orgmode
On 23/11/2022 22:27, Max Nikulin wrote:
>
> (setq lst '("semana" "señor" "sepia"))
> (sort lst #'string-lessp) ; => ("semana" "sepia" "señor")
> (sort lst #'string-collate-lessp) ; => ("semana" "señor" "sepia")
>
> On 23/11/2022 17:37, Ihor Radchenko wrote:
>> Eli even argued that `string-collate-lessp' is strictly worse compared
>> to more predictable approach. See
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275#40
I think Eli is afraid of the following sort of inconsistency:
(string-collate-lessp "z" "ö" "de_DE.UTF-8") ; => nil
(string-collate-lessp "z" "ö" "sv_SE.UTF-8") ; => t
Mixed language example: U+0049 LATIN CAPITAL LETTER I vs. U+0406
CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
(sort '("Івана" "Ivan" "Термін" "Вони")
      (lambda (a b) (string-collate-lessp a b "uk_UA.UTF-8")))
;; => ("Вони" "Івана" "Термін" "Ivan")
(sort '("Івана" "Ivan" "Термін" "Вони")
      (lambda (a b) (string-collate-lessp a b "en_US.UTF-8")))
;; => ("Ivan" "Вони" "Івана" "Термін")
I suppose users should get the order native to their language, even
though others may get a different order.
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-23 15:27 ` Max Nikulin
2022-11-23 17:01 ` Max Nikulin
@ 2022-11-26 2:05 ` Ihor Radchenko
2022-11-29 16:40 ` Max Nikulin
1 sibling, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2022-11-26 2:05 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
>> However, I feel a bit lost about what to do on Org side.
>> We can put a disclaimer in the manual and all that, but it still feels
>> too complex.
>
> My current suggestion is to provide a fallback to `downcase' in the code
> and to explain in the manual that runtime environments (OSes) are not
> equal and quality of locale support varies. Emacs heavily depends on
> libc in this area.
This sounds like something that should be addressed in Emacs upstream.
I suggested changing the `string-collate-lessp' fallback behaviour to use
`downcase' when IGNORE-CASE is non-nil. See my last message in
bug#59275.
* Re: test-org-table/sort-lines: Failing test on macOS
2022-11-26 2:05 ` Ihor Radchenko
@ 2022-11-29 16:40 ` Max Nikulin
2024-04-03 11:40 ` [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS) Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2022-11-29 16:40 UTC (permalink / raw)
To: emacs-orgmode
On 26/11/2022 09:05, Ihor Radchenko wrote:
> Max Nikulin writes:
>
> This sounds like something to be adapted to Emacs upstream.
> I suggested to change `string-collate-lessp' fallback behaviour to use
> `downcase' when IGNORE-CASE is non-nil. See my last message in
> bug#59275.
I do not share Eli's "all or nothing" position. I prefer graceful
degradation and the best result achievable with reasonable effort.
However, whether the reason is performance or correctness, both
arguments are against modifying `string-collate-lessp'. I still think that
Org will benefit from a compatibility wrapper with `downcase'.
The only additional consideration is that the compare function should be
configurable. If a user accesses the same files from Linux and macOS, then
it may be really annoying to get a different order of entries in the
agenda. For most Linux users it is better to use the smarter
`string-collate-lessp'. Some care is required to sort entries obtained
from multiple buffers in a predictable environment (locale, case
conversion table).
* [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS)
2022-11-29 16:40 ` Max Nikulin
@ 2024-04-03 11:40 ` Ihor Radchenko
2024-05-05 11:59 ` Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2024-04-03 11:40 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
>> This sounds like something to be adapted to Emacs upstream.
>> I suggested to change `string-collate-lessp' fallback behaviour to use
>> `downcase' when IGNORE-CASE is non-nil. See my last message in
>> bug#59275.
>
> I do not share Eli's position "all or nothing". I prefer graceful
> degradation and best result achievable with reasonable efforts.
> However either the reason is performance or correctness, both variants
> are against modification of `string-collate-lessp'. I still think that
> Org will benefit from a compatibility wrapper with `downcase'.
Unless we have user complaints with real-world use cases, I am leaning
towards keeping things consistent with Emacs, including the Emacs-wide
fallback for `string-collate-lessp'. This will make our life easier.
Maintaining an Org-specific fallback (1) will cost maintenance time; (2)
may confuse users accustomed to the global Emacs behaviour; (3) has no
clear benefit beyond our theoretical discussion.
> The only additional consideration is that compare function should be
> configurable. If a user access same files from Linux and macOS then it
> may be really annoying to get different order of entries in agenda. For
> most of Linux users it is better to use more smart
> `string-collate-lessp'. Some care is required to sort entries obtained
> from multiple buffers in predictable environment (locale, case
> conversion table).
I agree. We can introduce a new customization,
`org-string-sort-function', that will be used across Org mode to sort
user text.
It would be even better to allow a smart sort function that depends on
the document's #+LANGUAGE, but I do not see an easy way to implement such
a feature: `string-collate-lessp' does accept a LOCALE argument, but I
have no idea how to map #+LANGUAGE to a locale deterministically.
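To illustrate why the #+LANGUAGE to locale mapping is awkward, here is a naive sketch based on a hand-maintained table. The alist and function names are hypothetical. A language code alone does not pin down a locale ("de" could be de_DE, de_AT or de_CH), locale names differ between operating systems, and passing a locale that is not installed may make `string-collate-lessp' signal an error. Unknown languages map to nil, which means "use the current locale".

```elisp
;;; -*- lexical-binding: t -*-
;;; Sketch only: the alist and both functions are hypothetical.

(defvar my-language-locale-alist
  '(("de" . "de_DE.UTF-8")
    ("sv" . "sv_SE.UTF-8")
    ("uk" . "uk_UA.UTF-8"))
  "Hand-maintained mapping from #+LANGUAGE codes to locale names.")

(defun my-language-to-locale (language)
  "Return a locale name for LANGUAGE, or nil for the current locale."
  (cdr (assoc language my-language-locale-alist)))

(defun my-sort-for-language (strings language)
  "Return a sorted copy of STRINGS, collated for LANGUAGE.
STRINGS itself is not modified."
  (let ((locale (my-language-to-locale language)))
    (sort (copy-sequence strings)
          (lambda (a b) (string-collate-lessp a b locale)))))
```

In an Org buffer the language could be read with something like `(cadr (assoc "LANGUAGE" (org-collect-keywords '("LANGUAGE"))))`, but the table itself remains guesswork.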
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS)
2024-04-03 11:40 ` [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS) Ihor Radchenko
@ 2024-05-05 11:59 ` Ihor Radchenko
2024-05-07 11:06 ` [DISCUSSION] Sorting strings in Org mode vs. system locale Max Nikulin
0 siblings, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2024-05-05 11:59 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
[-- Attachment #1: Type: text/plain, Size: 769 bytes --]
Ihor Radchenko <yantar92@posteo.net> writes:
>> The only additional consideration is that compare function should be
>> configurable. If a user access same files from Linux and macOS then it
>> may be really annoying to get different order of entries in agenda. For
>> most of Linux users it is better to use more smart
>> `string-collate-lessp'. Some care is required to sort entries obtained
>> from multiple buffers in predictable environment (locale, case
>> conversion table).
>
> I agree. We can introduce a new customization -
> `org-string-sort-function' that will be used across Org mode to sort
> user text.
See the attached tentative patch.
I added a customization, made everything in Org obey it, and provided
some default options for MacOS users.
[-- Attachment #2: 0001-org-New-Org-wide-custom-option-org-sort-function.patch --]
[-- Type: text/x-patch, Size: 13446 bytes --]
From dbc3929d8c7a26da3bf31fb52a651da68d1f733b Mon Sep 17 00:00:00 2001
Message-ID: <dbc3929d8c7a26da3bf31fb52a651da68d1f733b.1714910323.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Sun, 5 May 2024 14:37:52 +0300
Subject: [PATCH] org: New Org-wide custom option `org-sort-function'
* lisp/org-macs.el (org-sort-function): New customization defining how
Org mode should sort headlines, table lines, agenda lines, etc.
(org-string<):
(org-string<=):
(org-string>=):
(org-string>): Use the new customization.
(org-string<>): Add docstring.
(org-sort-function-downcase): New helper function to help users on
MacOS where `string-collate-lessp' is not reliable.
* lisp/oc-basic.el (org-cite-basic--field-less-p):
* lisp/org-agenda.el (org-cmp-category):
(org-cmp-alpha):
* lisp/org-list.el (org-sort-list):
* lisp/org-mouse.el (org-mouse-list-options-menu):
* lisp/org-table.el (org-table-sort-lines):
* lisp/org.el (org-tags-sort-function):
(org-sort-entries):
* lisp/ox-publish.el (org-publish-sitemap): Honor the new
customization.
* lisp/org-mouse.el (org-mouse-tag-menu):
(org-mouse-popup-global-menu):
* lisp/org-agenda.el (org-cmp-tag): Honor `org-tags-sort-function',
falling back to `org-string<' if not set.
* etc/ORG-NEWS (New option controlling how Org mode sorts things
~org-sort-function~): Announce the change.
This change aims to standardize the way Org mode performs sorting of
user data. In particular, it addresses issues with oddities of string
collation rules on MacOS and tricky language environments like
Turkish.
Link: https://orgmode.org/list/87jzleptcs.fsf@localhost
---
etc/ORG-NEWS | 20 ++++++++++++++
lisp/oc-basic.el | 2 +-
lisp/org-agenda.el | 12 ++++-----
lisp/org-list.el | 2 +-
lisp/org-macs.el | 66 +++++++++++++++++++++++++++++++++++++---------
lisp/org-mouse.el | 13 +++++----
lisp/org-table.el | 4 +--
lisp/org.el | 6 ++---
lisp/ox-publish.el | 9 +++----
9 files changed, 98 insertions(+), 36 deletions(-)
diff --git a/etc/ORG-NEWS b/etc/ORG-NEWS
index 3c597db40..af88febb1 100644
--- a/etc/ORG-NEWS
+++ b/etc/ORG-NEWS
@@ -710,6 +710,26 @@ any more. Run ~org-ctags-enable~ to setup hooks and advices:
#+end_src
** New and changed options
+*** New option controlling how Org mode sorts things ~org-sort-function~
+
+Sorting of agenda items, tables, menus, headlines, etc. can now be
+controlled using a new custom option ~org-sort-function~.
+
+By default, Org mode sorts things according to the operating system
+language. However, language sorting rules may or may not produce good
+results depending on the use case. For example, multi-language
+documents may be sorted weirdly when sorting rules for the system language
+are applied to text written in a different language. Also, some
+operating systems (e.g. MacOS) do not provide accurate string
+sorting rules.
+
+Org mode provides 4 possible values for ~org-sort-function~:
+1. (default) Sort using system language rules.
+2. Sort using dumb string comparison. It is the most reliable option.
+3. Sort case-insensitively, making use of UTF case conversion. This
+ may work better for mixed-language documents and on MacOS.
+4. Custom function, if the above does not fit the needs.
+
*** =ob-latex= now uses a new option ~org-babel-latex-process-alist~ to generate png output
Previously, =ob-latex= used ~org-preview-latex-default-process~ from
diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
index 8959bb065..6e3142fa1 100644
--- a/lisp/oc-basic.el
+++ b/lisp/oc-basic.el
@@ -680,7 +680,7 @@ (defun org-cite-basic--field-less-p (field info)
INFO is the export state, as a property list."
(and field
(lambda (a b)
- (string-collate-lessp
+ (org-string<
(org-cite-basic--get-field field a info 'raw)
(org-cite-basic--get-field field b info 'raw)
nil t))))
diff --git a/lisp/org-agenda.el b/lisp/org-agenda.el
index 93c6acef2..05d2f94c0 100644
--- a/lisp/org-agenda.el
+++ b/lisp/org-agenda.el
@@ -7489,8 +7489,8 @@ (defsubst org-cmp-category (a b)
"Compare the string values of categories of strings A and B."
(let ((ca (or (get-text-property (1- (length a)) 'org-category a) ""))
(cb (or (get-text-property (1- (length b)) 'org-category b) "")))
- (cond ((string-lessp ca cb) -1)
- ((string-lessp cb ca) +1))))
+ (cond ((org-string< ca cb) -1)
+ ((org-string< cb ca) +1))))
(defsubst org-cmp-todo-state (a b)
"Compare the todo states of strings A and B."
@@ -7536,8 +7536,8 @@ (defsubst org-cmp-alpha (a b)
(cond ((not (or ta tb)) nil)
((not ta) +1)
((not tb) -1)
- ((string-lessp ta tb) -1)
- ((string-lessp tb ta) +1))))
+ ((org-string< ta tb) -1)
+ ((org-string< tb ta) +1))))
(defsubst org-cmp-tag (a b)
"Compare the string values of the first tags of A and B."
@@ -7546,8 +7546,8 @@ (defsubst org-cmp-tag (a b)
(cond ((not (or ta tb)) nil)
((not ta) +1)
((not tb) -1)
- ((string-lessp ta tb) -1)
- ((string-lessp tb ta) +1))))
+ ((funcall (or org-tags-sort-function #'org-string<) ta tb) -1)
+ ((funcall (or org-tags-sort-function #'org-string<) tb ta) +1))))
(defsubst org-cmp-time (a b)
"Compare the time-of-day values of strings A and B."
diff --git a/lisp/org-list.el b/lisp/org-list.el
index fca3758c8..d7559d2a7 100644
--- a/lisp/org-list.el
+++ b/lisp/org-list.el
@@ -2979,7 +2979,7 @@ (defun org-sort-list
(error "Missing key extractor"))))
(sort-func
(cond
- ((= dcst ?a) #'string-collate-lessp)
+ ((= dcst ?a) #'org-string<)
((= dcst ?f)
(or compare-func
(and interactive?
diff --git a/lisp/org-macs.el b/lisp/org-macs.el
index 1254ddb54..c3bef66cd 100644
--- a/lisp/org-macs.el
+++ b/lisp/org-macs.el
@@ -113,7 +113,6 @@ (declare-function org-fold-save-outline-visibility "org-fold" (use-markers &rest
(declare-function org-fold-next-visibility-change "org-fold" (&optional pos limit ignore-hidden-p previous-p))
(declare-function org-fold-core-with-forced-fontification "org-fold" (&rest body))
(declare-function org-fold-folded-p "org-fold" (&optional pos limit ignore-hidden-p previous-p))
-(declare-function string-collate-lessp "org-compat" (s1 s2 &optional locale ignore-case))
(declare-function org-time-convert-to-list "org-compat" (time))
(declare-function org-buffer-text-pixel-width "org-compat" ())
@@ -982,20 +981,63 @@ (defun org-uuidgen-p (s)
\f
;;; String manipulation
-(defun org-string< (a b)
- (string-collate-lessp a b))
-
-(defun org-string<= (a b)
- (or (string= a b) (string-collate-lessp a b)))
-
-(defun org-string>= (a b)
- (not (string-collate-lessp a b)))
-
-(defun org-string> (a b)
+(defcustom org-sort-function #'string-collate-lessp
+ "Function used to compare strings when sorting.
+This function affects how Org mode sorts headlines, agenda items,
+table lines, etc.
+
+The function must accept either 2 or 4 arguments: strings to compare
+and, optionally, LOCALE and IGNORE-CASE - locale name and flag to make
+comparison case-insensitive.
+
+The default value uses sorting rules according to the OS language. Users
+who want to make sorting language-independent may customize the value
+to `string-lessp'.
+
+Note that some string sorting rules are known to be inaccurate on
+MacOS. See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275.
+MacOS users may customize the value to `org-sort-function-downcase'."
+ :group 'org
+ :package-version '(Org . "9.7")
+ :type '(choice
+ (const :tag "According to OS language" string-collate-lessp)
+ (const :tag "Using string comparison" string-lessp)
+ (const :tag "Case-insensitive string comparison" org-sort-function-downcase)
+ (function :tag "Custom function")))
+
+(defun org-sort-function-downcase (a b &optional _ _)
+ "Return non-nil when downcased string A < string B. Ignore case."
+ (string-lessp (downcase a) (downcase b)))
+
+(defun org-string< (a b &optional locale ignore-case)
+ "Return non-nil when string A < string B.
+LOCALE is the locale name. IGNORE-CASE, when non-nil, makes comparison
+ignore case."
+ (if (= 4 (cdr (func-arity org-sort-function)))
+ (funcall org-sort-function a b locale ignore-case)
+ (funcall org-sort-function a b)))
+
+(defun org-string<= (a b &optional locale ignore-case)
+ "Return non-nil when string A <= string B.
+LOCALE is the locale name. IGNORE-CASE, when non-nil, makes comparison
+ignore case."
+ (or (string= a b) (org-string< a b locale ignore-case)))
+
+(defun org-string>= (a b &optional locale ignore-case)
+ "Return non-nil when string A >= string B.
+LOCALE is the locale name. IGNORE-CASE, when non-nil, makes comparison
+ignore case."
+ (not (org-string< a b locale ignore-case)))
+
+(defun org-string> (a b &optional locale ignore-case)
+ "Return non-nil when string A > string B.
+LOCALE is the locale name. IGNORE-CASE, when non-nil, makes comparison
+ignore case."
(and (not (string= a b))
- (not (string-collate-lessp a b))))
+ (not (org-string< a b locale ignore-case))))
(defun org-string<> (a b)
+ "Return non-nil when string A and string B are not equal."
(not (string= a b)))
(defsubst org-trim (s &optional keep-lead)
diff --git a/lisp/org-mouse.el b/lisp/org-mouse.el
index 2904bad1f..0b1ddaa6e 100644
--- a/lisp/org-mouse.el
+++ b/lisp/org-mouse.el
@@ -426,13 +426,14 @@ (defun org-mouse-tag-menu () ;todo
(append
(let ((tags (org-get-tags nil t)))
(org-mouse-keyword-menu
- (sort (mapcar #'car (org-get-buffer-tags)) #'string-lessp)
+ (sort (mapcar #'car (org-get-buffer-tags))
+ (or org-tags-sort-function #'org-string<))
(lambda (tag)
(org-mouse-set-tags
(sort (if (member tag tags)
(delete tag tags)
(cons tag tags))
- #'string-lessp)))
+ (or org-tags-sort-function #'org-string<))))
(lambda (tag) (member tag tags))
))
'("--"
@@ -473,7 +474,7 @@ (defun org-mouse-list-options-menu (alloptions &optional function)
(sort (if (member ',name ',options)
(delete ',name ',options)
(cons ',name ',options))
- 'string-lessp)
+ #'org-string<)
" ")
nil nil nil 1)
(when (functionp ',function) (funcall ',function)))
@@ -502,7 +503,8 @@ (defun org-mouse-popup-global-menu ()
["Check TODOs" org-show-todo-tree t]
("Check Tags"
,@(org-mouse-keyword-menu
- (sort (mapcar #'car (org-get-buffer-tags)) #'string-lessp)
+ (sort (mapcar #'car (org-get-buffer-tags))
+ (or org-tags-sort-function #'org-string<))
(lambda (tag) (org-tags-sparse-tree nil tag)))
"--"
["Custom Tag ..." org-tags-sparse-tree t])
@@ -512,7 +514,8 @@ (defun org-mouse-popup-global-menu ()
["Display TODO List" org-todo-list t]
("Display Tags"
,@(org-mouse-keyword-menu
- (sort (mapcar #'car (org-get-buffer-tags)) #'string-lessp)
+ (sort (mapcar #'car (org-get-buffer-tags))
+ (or org-tags-sort-function #'org-string<))
(lambda (tag) (org-tags-view nil tag)))
"--"
["Custom Tag ..." org-tags-view t])
diff --git a/lisp/org-table.el b/lisp/org-table.el
index 0c2dc27ed..45fe4d0fa 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -4637,8 +4637,8 @@ (defun org-table-sort-lines
(predicate
(cl-case sorting-type
((?n ?N ?t ?T) #'<)
- ((?a ?A) (if with-case #'string-collate-lessp
- (lambda (s1 s2) (string-collate-lessp s1 s2 nil t))))
+ ((?a ?A) (if with-case #'org-string<
+ (lambda (s1 s2) (org-string< s1 s2 nil t))))
((?f ?F)
(or compare-func
(and interactive?
diff --git a/lisp/org.el b/lisp/org.el
index 20879685c..f9a9332aa 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -2944,8 +2944,8 @@ (defcustom org-tags-sort-function nil
:group 'org-tags
:type '(choice
(const :tag "No sorting" nil)
- (const :tag "Alphabetical" string-collate-lessp)
- (const :tag "Reverse alphabetical" org-string-collate-greaterp)
+ (const :tag "Alphabetical" org-string<)
+ (const :tag "Reverse alphabetical" org-string>)
(function :tag "Custom function" nil)))
(defvar org-tags-history nil
@@ -7955,7 +7955,7 @@ (defun org-sort-entries
(t (error "Invalid sorting type `%c'" sorting-type))))
nil
(cond
- ((= dcst ?a) 'string-collate-lessp)
+ ((= dcst ?a) #'org-string<)
((= dcst ?f)
(or compare-func
(and interactive?
diff --git a/lisp/ox-publish.el b/lisp/ox-publish.el
index 3e526b813..1b623ce9f 100644
--- a/lisp/ox-publish.el
+++ b/lisp/ox-publish.el
@@ -794,17 +794,14 @@ (defun org-publish-sitemap (project &optional sitemap-filename)
(concat (file-name-directory b)
(org-publish-find-title b project))
b)))
- (setq retval
- (if ignore-case
- (not (string-lessp (upcase B) (upcase A)))
- (not (string-lessp B A))))))
+ (setq retval (org-string<= A B nil ignore-case))))
((or `anti-chronologically `chronologically)
(let* ((adate (org-publish-find-date a project))
(bdate (org-publish-find-date b project)))
(setq retval
(not (if (eq sort-files 'chronologically)
- (time-less-p bdate adate)
- (time-less-p adate bdate))))))
+ (time-less-p bdate adate)
+ (time-less-p adate bdate))))))
(`nil nil)
(_ (user-error "Invalid sort value %s" sort-files)))
;; Directory-wise wins:
--
2.45.0
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale
2024-05-05 11:59 ` Ihor Radchenko
@ 2024-05-07 11:06 ` Max Nikulin
2024-05-07 13:09 ` Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2024-05-07 11:06 UTC (permalink / raw)
To: emacs-orgmode
On 05/05/2024 18:59, Ihor Radchenko wrote:
> Ihor Radchenko writes:
>
>>> If a user access same files from Linux and macOS then it
>>> may be really annoying to get different order of entries in agenda. For
>>> most of Linux users it is better to use more smart
>>> `string-collate-lessp'. Some care is required to sort entries obtained
>>> from multiple buffers in predictable environment (locale, case
>>> conversion table).
>>
>> I agree. We can introduce a new customization -
>> `org-string-sort-function' that will be used across Org mode to sort
>> user text.
>
> See the attached tentative patch.
> I added a customization, made everything in Org obey it, and provided
> some default options for MacOS users.
Contrary to Eli, I still think that there are enough locales where
completely disregarding IGNORE-CASE is worse than falling back to
`downcase' when IGNORE-CASE is t. Perhaps some kind of normalization
(NFD?) may improve results further.
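I consider the following as a kind of graceful degradation:
(defun org-sort-function-fallback-downcase
    (a b &optional locale ignore-case)
  ;; Argument names must be lower case: Elisp symbols are case-sensitive.
  (if ignore-case
      (string-collate-lessp (downcase a) (downcase b) locale ignore-case)
    (string-collate-lessp a b locale ignore-case)))
(defcustom org-sort-function
  ;; Use the native comparator only if it honors IGNORE-CASE.
  (if (string-collate-lessp "a" "B" "C" t)
      #'string-collate-lessp
    #'org-sort-function-fallback-downcase)
  "Function used to compare strings when sorting."
  :group 'org
  :type 'function)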
I would consider a setter function for `org-sort-function' to avoid
branches based on `func-arity' in `org-string<'.
I see little point in the purely downcasing comparator
`org-sort-function-downcase'.
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale
2024-05-07 11:06 ` [DISCUSSION] Sorting strings in Org mode vs. system locale Max Nikulin
@ 2024-05-07 13:09 ` Ihor Radchenko
2024-05-07 16:47 ` Max Nikulin
0 siblings, 1 reply; 31+ messages in thread
From: Ihor Radchenko @ 2024-05-07 13:09 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> I consider the following as a kind of graceful degradation
>
> (defun org-sort-function-fallback-downcase
> (a b &optional LOCALE IGNORE-CASE)
> (if ignore-case
> (string-collate-lessp (downcase a) (downcase b) locale ignore-case)
> (string-collate-lessp a b locale ignore-case)))
It is indeed better than `org-sort-function-downcase'.
> (defcustom org-sort-function
> (if (string-collate-lessp "a" "B" "C" t)
> #'string-collate-lessp
> #'org-sort-function-fallback-downcase))
No. Let's be consistent with Emacs here.
> I would consider a setter function for `org-sort-function' to avoid
> branches based of `func-arity' in `org-string<'.
A setter is not reliable when `setq' is used, so I prefer the arity check.
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale
2024-05-07 13:09 ` Ihor Radchenko
@ 2024-05-07 16:47 ` Max Nikulin
2024-05-11 9:38 ` Ihor Radchenko
0 siblings, 1 reply; 31+ messages in thread
From: Max Nikulin @ 2024-05-07 16:47 UTC (permalink / raw)
To: emacs-orgmode
On 07/05/2024 20:09, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>> I consider the following as a kind of graceful degradation
>>
>> (defun org-sort-function-fallback-downcase
>> (a b &optional LOCALE IGNORE-CASE)
>> (if ignore-case
>> (string-collate-lessp (downcase a) (downcase b) locale ignore-case)
>> (string-collate-lessp a b locale ignore-case)))
>
> It is indeed better than `org-sort-function-downcase'.
`compare-strings' with upcase conversion under the hood may be an
alternative.
>> I would consider a setter function for `org-sort-function' to avoid
>> branches based of `func-arity' in `org-string<'.
>
> Setter is not reliable when setq is used, so I prefer arity check.
It bothers me as well. Another idea is to require 2 optional arguments,
and thus wrappers for 2-argument functions. My expectation is that an
extra function call may still be cheaper.
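The "require 4 arguments, wrap 2-argument functions" idea might be sketched as follows. The helper name is hypothetical. It returns FN unchanged when FN already accepts LOCALE and IGNORE-CASE, and otherwise returns a closure that drops them, so a caller could always pass 4 arguments without a per-call `func-arity' branch. The closure needs lexical binding, hence the cookie on the first line.

```elisp
;;; -*- lexical-binding: t -*-
;;; Sketch only: `my-adapt-sort-function' is a hypothetical name.

(defun my-adapt-sort-function (fn)
  "Return a comparator accepting (A B &optional LOCALE IGNORE-CASE).
If FN already accepts 4 arguments, return it unchanged; otherwise
wrap it so that LOCALE and IGNORE-CASE are silently dropped."
  (let ((max-args (cdr (func-arity fn))))
    (if (or (eq max-args 'many) (>= max-args 4))
        fn
      ;; FN takes only A and B: ignore the extra arguments.
      (lambda (a b &optional _locale _ignore-case)
        (funcall fn a b)))))
```

For example, `(funcall (my-adapt-sort-function #'string-lessp) "a" "b" nil t)` returns t. Where the wrapping happens (once by the user, or on each call) decides whether the arity check is paid once or every time.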
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale
From: Ihor Radchenko @ 2024-05-11 9:38 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
>>
>>> I consider the following as a kind of graceful degradation
>>>
>>> (defun org-sort-function-fallback-downcase
>>> (a b &optional locale ignore-case)
>>> (if ignore-case
>>> (string-collate-lessp (downcase a) (downcase b) locale ignore-case)
>>> (string-collate-lessp a b locale ignore-case)))
>>
>> It is indeed better than `org-sort-function-downcase'.
>
> `compare-strings' with upcase conversion under the hood may be an
> alternative.
Applied onto main.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=413192698
I replaced the two fallback variants (#'string< and a custom downcase
function) with a single fallback that uses `compare-strings'.
Closed.
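For illustration only, a fallback predicate built on `compare-strings' might look like the following; this is a sketch with a hypothetical name, not the code from the commit above. `compare-strings' upcases both strings when its IGNORE-CASE argument is non-nil, and LOCALE is simply ignored:

```emacs-lisp
(defun my-org-sort-fallback (a b &optional _locale ignore-case)
  "Return non-nil if string A sorts before string B.
LOCALE is ignored.  When IGNORE-CASE is non-nil, `compare-strings'
converts characters to upper case before comparing them."
  ;; `compare-strings' returns t when the strings are equal, a negative
  ;; integer when A is less than B, and a positive integer otherwise.
  (let ((result (compare-strings a nil nil b nil nil ignore-case)))
    (and (integerp result) (< result 0))))
```

One consequence of upcasing rather than downcasing: ASCII punctuation that sits between "Z" and "a", such as "_", sorts after the letters when case is ignored here, whereas a downcase-based fallback would put it before them.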
* Re: test-org-table/sort-lines: Failing test on macOS
From: Michael Brand @ 2025-01-06 18:22 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode
Hi all
On Tue, Nov 22, 2022 at 2:14 AM Ihor Radchenko <yantar92@posteo.net> wrote:
>
> Max Nikulin <manikulin@gmail.com> writes:
>
> >>> Concerning the test, I would split the current testcase into 2 parts
> >>> depending on WITH-CASE argument, check if caseless collation is
> >>> available and skip the related test otherwise.
> >>
> >> How can we check the availability?
> >
> > (string-collate-lessp "a" "B" "C" t)
>
> Thanks!
For this specific subject, I cannot see a follow-up in this thread or
an implementation on the bugfix or main branches. The OP's complaint
about the failing test is still valid on bugfix (on main, ~make test~
aborted earlier with "void-variable (file)" on "(delete-file file)",
quite early when loading testing/lisp/test-ob-tangle.el).
As far as I understand, and I agree, some Org functionality wants to
stick with ~string-collate-lessp~. Would it be welcome if I finished
my local patch, which would use
#+begin_src emacs-lisp
(defun org-test-string-collate-lessp-ignore-case-supported-p
(&optional locale)
"Return non-nil if `string-collate-lessp' can ignore case for LOCALE.
According to the docstring of `string-collate-lessp', ignoring case
is not implemented for some locales on some operating systems
(actually depending on the libc used by Emacs).  E.g., on macOS
`string-collate-lessp' does not ignore case for the C locale.

See also https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275 and
https://list.orgmode.org/orgmode/m2ilkwso8r.fsf@me.com"
(let ((ignore-case t))
(string-collate-lessp "a" "B" locale ignore-case)))
#+end_src
not to skip the test, as suggested in the quote at the beginning of
this message, but to bifurcate the expected result of the ERT test
and thereby fix it for, e.g., macOS on bugfix?
* Re: test-org-table/sort-lines: Failing test on macOS
From: Ihor Radchenko @ 2025-01-06 18:40 UTC (permalink / raw)
To: Michael Brand; +Cc: Max Nikulin, emacs-orgmode
Michael Brand <michael.ch.brand@gmail.com> writes:
> For this specific subject, I cannot see a follow-up in this thread or
> an implementation on the bugfix or main branches. The OP's complaint
> about the failing test is still valid on bugfix (on main, ~make test~
> aborted earlier with "void-variable (file)" on "(delete-file file)",
> quite early when loading testing/lisp/test-ob-tangle.el).
There was a transient problem on main. Now, make test should run.
This thread concluded with
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=413192698
(on main)
> As far as I understand, and I agree, some Org functionality wants to
> stick with ~string-collate-lessp~. Would it be welcome if I finished
> my local patch, which would use
>
> #+begin_src emacs-lisp
> (defun org-test-string-collate-lessp-ignore-case-supported-p
> (&optional locale)
> "Return non-nil if `string-collate-lessp' can ignore case for LOCALE.
> According to the docstring of `string-collate-lessp', ignoring case
> is not implemented for some locales on some operating systems
> (actually depending on the libc used by Emacs).  E.g., on macOS
> `string-collate-lessp' does not ignore case for the C locale.
>
> See also https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275 and
> https://list.orgmode.org/orgmode/m2ilkwso8r.fsf@me.com"
> (let ((ignore-case t))
> (string-collate-lessp "a" "B" locale ignore-case)))
> #+end_src
>
> not to skip the test, as suggested in the quote at the beginning of
> this message, but to bifurcate the expected result of the ERT test
> and thereby fix it for, e.g., macOS on bugfix?
AFAIU, this is more complex. We cannot have a simple test for all
possible locales. Or did you find one? My conclusion from the bug#59275
discussion is that simple probes like
(string-collate-lessp "a" "B" locale t)
will not work.
Especially for non-US locales.
--
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: test-org-table/sort-lines: Failing test on macOS
From: Michael Brand @ 2025-01-06 20:49 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode
On Mon, Jan 6, 2025 at 7:38 PM Ihor Radchenko <yantar92@posteo.net> wrote:
> There was a transient problem on main. Now, make test should run.
Now I see: it was fixed a few minutes before my post, thank you. This
reveals that the test failure discussed here is the same on main as on
bugfix.
In the following, I use the term "bifurcation" as shorthand for: using
~(string-collate-lessp "a" "B" nil t)~ to tell whether the ignore-case
parameter has any effect on ~string-collate-lessp~, and then choosing
the right expected result (the currently implemented case-insensitive
one or a to-be-added case-sensitive one) for a test involving
~string-collate-lessp~.
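To make the idea concrete, here is a self-contained sketch of a bifurcated ERT test. It does not use Org's test harness, all `my-' names are hypothetical, and it assumes that when IGNORE-CASE has no effect in the C locale, strings compare by code point (so "B", 66, sorts before "a", 97):

```emacs-lisp
(require 'ert)

(defun my-collate-ignore-case-supported-p (&optional locale)
  "Return non-nil if IGNORE-CASE affects `string-collate-lessp' in LOCALE.
Hypothetical helper modeled on the probe proposed in this thread."
  (string-collate-lessp "a" "B" locale t))

(defun my-sort-case-insensitively-in-c (strings)
  "Return a sorted copy of STRINGS, collated in the C locale, ignoring case."
  (sort (copy-sequence strings)
        (lambda (x y) (string-collate-lessp x y "C" t))))

(ert-deftest my-bifurcated-sort-test ()
  "Choose the expected order according to whether case is ignored."
  (should (equal (my-sort-case-insensitively-in-c '("c" "a" "B"))
                 (if (my-collate-ignore-case-supported-p "C")
                     '("a" "B" "c")      ; case ignored: a < B < c
                   '("B" "a" "c")))))    ; code points: B (66) < a (97) < c (99)
```

Under that assumption the test should pass both where case is ignored and where it is not, because the expected value follows the probe instead of being fixed in advance.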
> AFAIU, this is more complex.
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275 looks complicated
to me too, but the very narrow sub-subject of bifurcation seems quite
simple to me, so I don't agree.
> We cannot have a simple test for all possible locales.
Why would bifurcation not do it?
> Or did you find one?
To my understanding, bifurcation is one.
> My conclusion from the bug#59275 discussion is that simple probes like
> (string-collate-lessp "a" "B" locale t)
> will not work.
I have read a large part of bug#59275 but missed what you mean by
"will not work". It probably does not work for this and that, but for
the purpose of bifurcation alone I expect
~(string-collate-lessp "a" "B" nil t)~ to be sufficient.
> Especially for non-US locales.
I expect bifurcation to work with _any_ existing or future locale.
~(org-test-string-collate-lessp-ignore-case-supported-p)~ simply
measures whether the ignore-case parameter of ~string-collate-lessp~
has an effect in the locale of the current scope, for any locale,
libc, etc., regardless of how complicated their impact on this matter
is. Although it is not necessary for bifurcation alone, the locale in
the scope of
#+begin_src emacs-lisp
;; Sort alphabetically. Enforce the C locale for consistent results.
(let ((original-string-collate-lessp (symbol-function 'string-collate-lessp)))
(cl-letf (((symbol-function 'string-collate-lessp)
(lambda (s1 s2 &optional _locale ignore-case)
(funcall original-string-collate-lessp
s1 s2 "C" ignore-case))))
#+end_src
in test-org-table/sort-lines is even clearly defined, and a very
well-known and simple one.
I don't have any concern about bifurcation. I would even go further
and suspect that my patch not only fixes the tests but also (as a side
effect, not as my goal) makes the current "Enforce the C locale for
consistent results" obsolete. My patch _neutralizes_ the ignore case
behavior of ~string-collate-lessp~. In contrast, the current
implementation tries to _pin down_ at least the ignore-case behavior
of ~string-collate-lessp~, which according to its docstring is not
possible in some cases.
* Re: test-org-table/sort-lines: Failing test on macOS
From: Michael Brand @ 2025-01-08 16:49 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode
On Mon, Jan 6, 2025 at 9:49 PM Michael Brand <michael.ch.brand@gmail.com> wrote:
> My patch _neutralizes_ the ignore case behavior of ~string-collate-lessp~.
The attached patches implement this suggestion and apply to both
bugfix and main. Tested with Emacs 29.4 from Homebrew on macOS. Please
review; if someone can also test the final version on Linux and
Windows, I suggest applying it on bugfix.
[-- Attachment #2: 0001-Align-test-with-other-tests.patch --]
[-- Type: application/x-patch, Size: 1508 bytes --]
[-- Attachment #3: 0002-Fix-test-for-when-ignore-case-is-not-supported.patch --]
[-- Type: application/x-patch, Size: 2980 bytes --]
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git