sort: gnu coreutils compatibility (sort float.sh) #9839
sylvestre merged 11 commits into uutils:main
Could you please add a test to make sure we don't regress in the future?
Did you run any benchmarks on this, btw?
Add locale-aware handling for decimal separators (comma vs. period) in general numeric sorting mode. This allows proper sorting of numeric fields in locales where comma is the standard decimal separator, such as many European countries.

Changes include:
- Enabling the "i18n-decimal" feature in uucore dependencies
- Introducing locale_decimal_pt() to retrieve the locale-specific decimal separator
- Updating get_leading_gen() and general_bd_parse() to accept and use the decimal point parameter
- Converting input to standard period notation for parsing when locale uses comma

This fixes sorting inaccuracies for international users with comma-based decimal notation.
…g Option::then
Refactored the `general_bd_parse` function to replace manual mutable vector initialization and conditional assignment with `Option::then` and `as_deref`, making the code more concise and idiomatic while maintaining the same logic for normalizing decimal points.
Add `effective_decimal_pt` function to intelligently select the decimal point based on input content, ensuring correct sorting when locale uses comma but input uses period as decimal separator. This fixes parsing issues in general numeric mode by prioritizing input-based detection over strict locale adherence.
Updated various crates in fuzz/Cargo.lock: windows-sys to 0.61.2, bumpalo to 3.19.1, cc to 1.2.50, crc to 3.3.0, crc-fast to 1.9.0, icu_properties and data to 2.1.2, jiff to 0.2.17. Added fixed_decimal, icu_decimal, and icu_decimal_data for improved functionality and compatibility.
…ntains method
Use input.contains() instead of input.iter().any() for checking presence of comma and dot bytes, improving readability and reducing boilerplate.
Split the uucore dependency features into multiple lines to improve code formatting and maintain consistency with other multi-line dependency declarations.
Add test to verify that sort's general numeric (-g) option correctly handles decimal separators based on locale settings, ensuring proper ordering of numbers like "1,9" vs "1,10" in French locale (comma as separator).
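The normalization step described in these commits can be sketched as follows. This is a minimal illustration, not the code from the PR: `normalize_decimal`, `parse_field`, and `compare_fields` are hypothetical names, `f64` stands in for the BigDecimal type used by general numeric sort, and the field is assumed to have already been trimmed to its numeric prefix under the locale separator (the job the commits assign to get_leading_gen()).

```rust
use std::cmp::Ordering;

/// Hypothetical helper: rewrite the locale decimal separator to '.' so the
/// standard float parser can read the field. Returns None when the locale
/// already uses '.', so no allocation happens in that case.
fn normalize_decimal(field: &str, decimal_pt: char) -> Option<String> {
    (decimal_pt != '.').then(|| field.replace(decimal_pt, "."))
}

/// Parse an already-trimmed numeric field (assumption: it contains no
/// characters other than sign, digits, exponent, and the locale separator).
fn parse_field(field: &str, decimal_pt: char) -> Option<f64> {
    let normalized = normalize_decimal(field, decimal_pt);
    // as_deref turns Option<String> into Option<&str>; fall back to the input.
    normalized.as_deref().unwrap_or(field).parse::<f64>().ok()
}

/// Compare two fields numerically. Unparseable fields (None) sort before
/// numbers, which is a simplification made for this sketch.
fn compare_fields(a: &str, b: &str, decimal_pt: char) -> Ordering {
    parse_field(a, decimal_pt)
        .partial_cmp(&parse_field(b, decimal_pt))
        .unwrap_or(Ordering::Equal)
}
```

With a comma separator, `compare_fields("1,10", "1,9", ',')` orders "1,10" (1.1) before "1,9" (1.9), which is the ordering the new test checks for the French locale.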
d958bc3 to 8da07b8
src/uu/sort/src/sort.rs (outdated)
}
}

fn effective_decimal_pt(input: &[u8], locale_decimal: u8) -> u8 {
Why is this function used? Shouldn't it always just default to the locale?
src/uu/sort/src/sort.rs (outdated)
let locale_decimal = locale_decimal_pt();
let decimal_pt = effective_decimal_pt(initial_selection, locale_decimal);
I think this can just be:
let decimal_pt = locale_decimal_pt();
An example of where this causes issues is:
echo -e "1.9\n1.1" | LC_ALL=fr_FR.utf8 /usr/bin/sort -g --stable
# GNU output: 1.9, 1.1 (both = 1, stable)
# PR output: 1.1, 1.9 (WRONG - treats as floats)
echo -e "2,5\n1.9\n1,1" | LC_ALL=fr_FR.utf8 /usr/bin/sort -g
# GNU output: 1.9, 1,1, 2,5 (values: 1, 1.1, 2.5)
# PR output: 1,1, 1.9, 2,5 (WRONG - inconsistent decimal handling per line)
echo -e "1.5e10\n2" | LC_ALL=fr_FR.utf8 /usr/bin/sort -g
# GNU output: 1.5e10, 2 (1 < 2)
# PR output: 2, 1.5e10 (WRONG - treats as 15000000000 > 2)
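The GNU results above follow from accepting only the locale separator as the radix character, so a "." ends the number in a comma locale. The sketch below reproduces that behaviour under stated assumptions: `leading_number_len`, `parse_general`, and `sort_general` are hypothetical names, `f64` replaces the real numeric type, and the scanner is a simplified stand-in for what GNU sort and the PR actually do (no infinities, NaN, or hex floats).

```rust
use std::cmp::Ordering;

/// Hypothetical helper: length of the leading numeric prefix of `s`,
/// accepting only `decimal_pt` as the radix character.
fn leading_number_len(s: &[u8], decimal_pt: u8) -> usize {
    let mut i = 0;
    // Optional sign.
    if i < s.len() && (s[i] == b'+' || s[i] == b'-') {
        i += 1;
    }
    let int_start = i;
    while i < s.len() && s[i].is_ascii_digit() {
        i += 1;
    }
    let mut digits = i - int_start;
    // Fractional part, only after the locale separator.
    if i < s.len() && s[i] == decimal_pt {
        let mut j = i + 1;
        while j < s.len() && s[j].is_ascii_digit() {
            j += 1;
        }
        let frac = j - (i + 1);
        if digits + frac > 0 {
            digits += frac;
            i = j;
        }
    }
    if digits == 0 {
        return 0; // no number at all
    }
    // Optional exponent, consumed only if it has at least one digit.
    if i < s.len() && (s[i] == b'e' || s[i] == b'E') {
        let mut j = i + 1;
        if j < s.len() && (s[j] == b'+' || s[j] == b'-') {
            j += 1;
        }
        let exp_start = j;
        while j < s.len() && s[j].is_ascii_digit() {
            j += 1;
        }
        if j > exp_start {
            i = j;
        }
    }
    i
}

/// Parse the leading number of a line; None means "not a number".
fn parse_general(line: &str, decimal_pt: u8) -> Option<f64> {
    let bytes = line.trim_start().as_bytes();
    let len = leading_number_len(bytes, decimal_pt);
    // The prefix is ASCII, so mapping bytes to chars is lossless.
    let normalized: String = bytes[..len]
        .iter()
        .map(|&b| if b == decimal_pt { '.' } else { b as char })
        .collect();
    normalized.parse::<f64>().ok()
}

/// Stable sort by parsed value, always using the locale separator.
fn sort_general<'a>(lines: &[&'a str], decimal_pt: u8) -> Vec<&'a str> {
    let mut out = lines.to_vec();
    out.sort_by(|a, b| {
        parse_general(a, decimal_pt)
            .partial_cmp(&parse_general(b, decimal_pt))
            .unwrap_or(Ordering::Equal)
    });
    out
}
```

Calling `sort_general(&["2,5", "1.9", "1,1"], b',')` gives the order 1.9, 1,1, 2,5 because "1.9" is read as 1, matching the GNU output quoted above. Choosing the separator per line from the input instead would read "1.9" as 1.9 and produce the inconsistent ordering the review flags.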
It would be good to add these cases to the GNU test suite; I will follow up on that.
src/uu/sort/src/sort.rs (outdated)
// Parse this number as BigDecimal, as this is the requirement for general numeric sorting.
Selection::AsBigDecimal(general_bd_parse(&range_str[get_leading_gen(range_str)]))
let locale_decimal = locale_decimal_pt();
let decimal_pt = effective_decimal_pt(range_str, locale_decimal);
The comment above applies here as well: you can remove that helper function entirely.
…e_decimal_pt
Remove the effective_decimal_pt function and update its usages in Line and FieldSelector to directly call locale_decimal_pt, simplifying the code without altering functionality. This refactor eliminates unnecessary logic for determining decimal points in general numeric sort mode.
ping?
A regression is occurring, so I will investigate.
Merging this PR will not alter performance
* feat(sort): support international decimal separators in numeric sorting
We have made corrections to ensure the GNU coreutils tests pass.
sort-float.sh