StringComparison.OrdinalIgnoreCase compares "¡a" and "¡B" incorrectly #71018

@bgrainger

Description

In net5.0 and net48 on Windows, string.Compare("¡a", "¡B", StringComparison.OrdinalIgnoreCase) returns a value < 0 (specifically -1).

But in net6.0 and net7.0, that expression returns a value > 0 (specifically, 31).

The net5.0 result fulfills the meaning of StringComparison.OrdinalIgnoreCase; the net6.0 result does not.

Setting $env:DOTNET_SYSTEM_GLOBALIZATION_USENLS='true' restores the net5.0 behaviour, which indicates that this may be related to #30960?

(Even if this is the actual result returned by ICU for the comparison--and I'm not sure if that's true or not--it doesn't match this programmer's expectations for what StringComparison.OrdinalIgnoreCase means.)

Reproduction Steps

Program.cs:

using System;

// prints -1 for prefix <= U+007F, 31 for prefix >= U+0080
string prefix = "\u00A1";
Console.WriteLine(string.Compare($"{prefix}a", $"{prefix}B", StringComparison.OrdinalIgnoreCase));

Compare.csproj:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net48;net5.0;net6.0;net7.0</TargetFrameworks>
    <LangVersion>10.0</LangVersion>
  </PropertyGroup>

</Project>

Expected behavior

Per https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#ordinal-string-operations:

Case-insensitive ordinal comparisons are the next most conservative approach. These comparisons ignore most casing; for example, "windows" matches "Windows". When dealing with ASCII characters, this policy is equivalent to StringComparison.Ordinal, except that it ignores the usual ASCII casing. Therefore, any character in [A, Z] (\u0041-\u005A) matches the corresponding character in [a,z] (\u0061-\u007A). Casing outside the ASCII range uses the invariant culture's tables.

Thus, ¡ is expected to compare equal in both strings; a is then compared to B, and because a case-insensitive ordinal comparison treats a as A (U+0041), which precedes B (U+0042), the first string sorts first. That is, "¡a" sorts before "¡B" using a case-insensitive ordinal comparison.
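To make that expectation concrete, here is a minimal sketch of the comparison as I read the documentation. OrdinalIgnoreCaseSketch and its Compare method are hypothetical names for illustration, not runtime APIs, and this is not claimed to be how the runtime implements it. It assumes each UTF-16 code unit is upper-cased with the invariant culture and then compared by numeric value.

```csharp
using System;

static class OrdinalIgnoreCaseSketch
{
    // Hypothetical helper (not a runtime API): compares two strings one UTF-16
    // code unit at a time after upper-casing each with the invariant culture.
    public static int Compare(string x, string y)
    {
        int length = Math.Min(x.Length, y.Length);
        for (int i = 0; i < length; i++)
        {
            char cx = char.ToUpperInvariant(x[i]);
            char cy = char.ToUpperInvariant(y[i]);
            if (cx != cy)
            {
                // First mismatch decides the result; only the sign is meaningful.
                return cx - cy;
            }
        }

        // All shared positions matched, so the shorter string sorts first.
        return x.Length - y.Length;
    }
}
```

With this sketch, Compare("\u00A1a", "\u00A1B") yields 'A' - 'B' = 65 - 66 = -1, the same value net5.0 reports. For what it's worth, the 31 reported by net6.0 equals 'a' - 'B' = 97 - 66, which would be consistent with the characters after the non-ASCII prefix being compared case-sensitively, though I have not confirmed that this is what happens.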

Actual behavior

"¡a" sorts after "¡B" using a case-insensitive ordinal comparison.

Regression?

Yes. This worked correctly in net48 and net5.0 on Windows and Linux; I have not tested net5.0 and earlier on macOS.

Known Workarounds

Use StringComparison.InvariantCultureIgnoreCase.
Set the DOTNET_SYSTEM_GLOBALIZATION_USENLS environment variable to true.
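A sketch of how the first workaround might look in code, plus one alternative that is not from this report and is only an assumption on my part: upper-casing both strings with the invariant culture and then comparing them ordinally. CompareWorkarounds and its method names are hypothetical. Note that InvariantCultureIgnoreCase is a linguistic comparison, so it may order other inputs differently from an ordinal comparison, and the upper-casing variant allocates two temporary strings per call.

```csharp
using System;

static class CompareWorkarounds
{
    // Workaround listed above: culture-invariant linguistic comparison.
    public static int InvariantIgnoreCase(string x, string y) =>
        string.Compare(x, y, StringComparison.InvariantCultureIgnoreCase);

    // Assumed alternative (not from the report): fold case explicitly,
    // then compare ordinally. Only the sign of the result is meaningful.
    public static int UpperThenOrdinal(string x, string y) =>
        string.CompareOrdinal(x.ToUpperInvariant(), y.ToUpperInvariant());
}
```

Callers should rely only on the sign of either result, for example via Math.Sign, rather than on a specific magnitude.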

Configuration

SDKs: 6.0.301; 7.0.100-preview.5.22307.18
Windows 10 19044.1766 x64

Other information

No response
