Add ReadOnlySpan string-like StartsWith and EndsWith API with globalization support#16223
Conversation
| uint dwFindNLSStringFlags, | ||
| string lpStringSource, | ||
| ReadOnlySpan<char> lpStringSource, | ||
| int startSource, |
There was a problem hiding this comment.
The int index+length arguments are not needed now that you are passing in Span.
| SR.ArgumentNull_String); | ||
| } | ||
|
|
||
| return IsPrefix(source.AsReadOnlySpan(), prefix.AsReadOnlySpan(), options); |
There was a problem hiding this comment.
IsPrefix method is public virtual. What happens if somebody overrides it?
There was a problem hiding this comment.
I don't follow. It was a public virtual method before so whoever overrides it has to implement it just as they would before. Are you asking if we should make the Span based API public as well?
internal virtual bool IsPrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options)There was a problem hiding this comment.
I believe Jan is highlighting that if someone is currently overriding it, and we switch code to use the new span-based IsPrefix method instead of the string-based IsPrefix method, we'll be bypassing the existing overrides.
There was a problem hiding this comment.
The span-based IsPrefix is an internal implementation detail of the string IsPrefix. How will the existing string IsPrefix overrides be affected by this change? Essentially, I just extracted the code from the string method into an internal span based method.
There was a problem hiding this comment.
How will the existing string IsPrefix overrides be affected by this change?
The new ones won't respect any existing overrides already in place.
A similar situation has arisen in streams in the past. Consider MemoryStream. Imagine you'd derived a MaxSizeMemoryStream : MemoryStream, where you overrode Write(byte[], int, int) to track how much data has been written and to throw an exception from Write if too much data is written. You've overridden all possible Write methods. Now in the next release we add a new virtual Write(Span) method, and code starts using it with your type. There's now a way to Write to this stream that doesn't respect the overrides and wishes of your type, such that more than the max can be written.
I don't know if it's something that needs to be addressed here; I'm just calling out a potential issue.
There was a problem hiding this comment.
Ah I see! Thanks for the explanation.
I understand that if we add new public span overloads and there are implicit conversions (like array to span), then existing overrides won't be called. However, public virtual bool IsPrefix(string source, string prefix, CompareOptions options) doesn't call any virtual methods. And unless someone explicitly casts the string to Span, the following method public virtual bool IsPrefix(string source, string prefix) will not call the internal IsPrefix(ROS...) method.
Given that no public surface area is changing, and the public virtual methods that have new span overloads are not calling other virtual methods, I am not seeing the potential issues.
There was a problem hiding this comment.
CompareInfo does not have a public instance constructor, so it is de-facto sealed. So this is not a problem.
| return false; | ||
| } | ||
|
|
||
| return (CompareInfo.CompareOrdinalIgnoreCase(this, 0, value.Length, value, 0, value.Length) == 0); |
There was a problem hiding this comment.
Nit: You can change this to CompareOrdinalIgnoreCase(this, value) so that it calls the span version directly.
There was a problem hiding this comment.
Making this change was affecting performance of the other cases, especially Ordinal. The implicit conversion seems to increase the switch/method size too much. Therefore, I have reverted this change.
| Debug.Assert(!prefix.IsEmpty); | ||
| Debug.Assert((options & (CompareOptions.Ordinal | CompareOptions.OrdinalIgnoreCase)) == 0); | ||
|
|
||
| if (_isAsciiEqualityOrdinal && CanUseAsciiOrdinalForOptions(options) && source.IsFastSort() && prefix.IsFastSort()) |
There was a problem hiding this comment.
How should I deal with the fact that this method relies on string.IsFastSort() which doesn't exist on Span?
There was a problem hiding this comment.
Refactor the implementation so that it is not needed. It will be likely faster without it if done properly, even for the string case: https://github.com/dotnet/coreclr/issues/14320
| /// <summary> | ||
| /// Determines whether the beginning of the span matches the specified value when compared using the specified comparison option. | ||
| /// </summary> | ||
| public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) |
There was a problem hiding this comment.
comparison -> comparisonType to match the naming used everywhere else.
| /// <summary> | ||
| /// Determines whether the beginning of the span matches the specified value when compared using the specified comparison option. | ||
| /// </summary> | ||
| public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) |
There was a problem hiding this comment.
Some of the other extensions methods in this file (e.g. AsBytes) uses source instead of span for the this parameter name. It'd be nice if the naming was consistent.
There was a problem hiding this comment.
The APIs where this is source have another span parameter as destination. The other ones don't. That is the reason for the inconsistency. If we feel strongly about the parameter renaming, I can do it in all methods.
| /// </summary> | ||
| public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) | ||
| { | ||
| if ((uint)(comparison - StringComparison.CurrentCulture) > (StringComparison.OrdinalIgnoreCase - StringComparison.CurrentCulture)) |
There was a problem hiding this comment.
Any downside to moving this inside the length check below? e.g.
if (span.Length < value.Length)
{
if ((uint)(comparison - StringComparison.CurrentCulture) > (StringComparison.OrdinalIgnoreCase - StringComparison.CurrentCulture))
{
throw new ArgumentException(SR.NotSupported_StringComparison, nameof(comparison));
}
return false;
}There was a problem hiding this comment.
I wanted to maintain the code shape (borrowed from string). But I see your point. Since the switch statement already has a default which throws, we don't need to check this ahead of time for all cases.
There was a problem hiding this comment.
With the latest change, this feedback is no longer applicable.
There was a problem hiding this comment.
There's now an if (value.Length == 0) check. Move it inside that to avoid the unnecessary check in most cases?
| throw new NotImplementedException(); | ||
|
|
||
| case StringComparison.OrdinalIgnoreCase: | ||
| return CompareInfo.CompareOrdinalIgnoreCase(span, value) == 0; |
There was a problem hiding this comment.
This isn't correct, is it? Shouldn't it be span.Slice(0, value.Length) rather than span?
Does this behave the same as with strings when value.Length == 0?
There was a problem hiding this comment.
Steve is correct. this comparison is wrong. you should slice the span with the value length.
for value.Length === 0 this check should be performed early for all type of comparisons and we should return true at that time.
In reply to: 166247652 [](ancestors = 166247652)
| case StringComparison.Ordinal: | ||
| // TODO: https://github.com/dotnet/corefx/issues/25182 | ||
| // return span.StartsWith(value); | ||
| throw new NotImplementedException(); |
There was a problem hiding this comment.
Rather than throwing, can you put a basic implementation in place now and then follow-up subsequently with whatever changes/improvements you want? It should only be a few lines.
There was a problem hiding this comment.
right the implementation should be straight forward
In reply to: 166248055 [](ancestors = 166248055)
| return IsSuffix(source.AsReadOnlySpan(), suffix.AsReadOnlySpan(), options); | ||
| } | ||
|
|
||
| internal virtual bool IsSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options) |
There was a problem hiding this comment.
As it's internal, if we don't override it anywhere, it doesn't need to be virtual.
| case StringComparison.Ordinal: | ||
| // TODO: https://github.com/dotnet/corefx/issues/25182 | ||
| // return span.EndsWith(value); | ||
| throw new NotImplementedException(); |
There was a problem hiding this comment.
Same comment/question as above.
| if (span.Length < value.Length) | ||
| { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Is this length check valid for all cultures? cc: @tarekgh #Closed
There was a problem hiding this comment.
| throw new ArgumentException(SR.NotSupported_StringComparison, nameof(comparison)); | ||
| } | ||
|
|
||
| if (span.Length < value.Length) |
There was a problem hiding this comment.
if (span.Length < value.Length [](start = 12, length = 30)
same comment as StartsWith one
| throw new NotImplementedException(); | ||
|
|
||
| case StringComparison.OrdinalIgnoreCase: | ||
| return (CompareInfo.CompareOrdinalIgnoreCase(span.Slice(span.Length - value.Length), value) == 0); |
There was a problem hiding this comment.
e(span.Length - value.Length), value) = [](start = 74, length = 39)
would you handle the case when value.Length == 0?
| /// <summary> | ||
| /// Determines whether the beginning of the span matches the specified value when compared using the specified comparison option. | ||
| /// </summary> | ||
| public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) |
There was a problem hiding this comment.
l StartsWith(th [](start = 25, length = 15)
All exposed new APIs has to handle the InvariantMode cases and should perform the operations as ordinal/ignorecasing at that time
There was a problem hiding this comment.
Can you please provide an example of what you mean? We already have StringComparison.InvariantCultureIgnoreCase. What other mode needs to be handled?
There was a problem hiding this comment.
We added a global flag to .NET Core 2.0 that effectively says to ignore culture, so that even if code uses StringComparison.CurrentCulture, it'll end up using Ordinal, e.g.
coreclr/src/mscorlib/shared/System/Globalization/CompareInfo.cs
Lines 339 to 347 in 38cf930
This is something someone sets when they don't care about cultures and either don't have the needed dependencies (e.g. ICU) or don't want to carry such dependencies due to size constraints.
There was a problem hiding this comment.
ignore this one. I am seeing we are calling CompareInfo which take care of the InvariantMode.
To let you know what I mean, you may look https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Globalization/CompareInfo.cs#L648
There was a problem hiding this comment.
We added a global flag to .NET Core 2.0 that effectively says to ignore culture
Good to know. Ty.
ignore this one.
OK.
|
test Windows_NT x64_arm64_altjit Checked corefx_baseline |
|
I have to address this failure: |
|
I ran the new corefx tests (from related PR) against these changes, locally, and the tests pass. Is there any other feedback that needs to be addressed? |
|
@dotnet-bot test Alpine.3.6 x64 Debug Build @dotnet-bot test Windows_NT arm Cross Checked corefx_baseline Build and Test |
|
When you get a chance, can you point me at the version of the code you wanted to write but found was slower? No need to hold up this PR for that though.... |
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/String.Comparison.cs#L1062 public Boolean StartsWith(String value, StringComparison comparisonType)
{
if ((Object)value == null)
{
throw new ArgumentNullException(nameof(value));
}
if (comparisonType < StringComparison.CurrentCulture || comparisonType > StringComparison.OrdinalIgnoreCase)
{
throw new ArgumentException(SR.NotSupported_StringComparison, nameof(comparisonType));
}
if ((Object)this == (Object)value)
{
return true;
}
if (value.Length == 0)
{
return true;
}
return this.AsReadOnlySpan().StartsWith(value.AsReadOnlySpan(), comparisonType);
} |
|
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test |
|
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test Most of the corefx_baseline test failures are unrelated (https://github.com/dotnet/coreclr/issues/16272): Existing issue: https://github.com/dotnet/corefx/issues/23435 This one occurs due to the change here since we no longer check the out-of-range StringComparison ahead of time (to throw ArgumentException) and the default within the switch throws NotSupportedException instead. Should we consolidate and always throw ArgumentException since that is what the tests seem to expect: |
|
@dotnet-bot test Ubuntu arm64 Cross Debug Innerloop Build |
|
@dotnet-bot test Windows_NT arm Cross Checked corefx_baseline Build and Test |
|
cc @saurabh500, @corivera - do you guys know why we are seeing test failures - |
|
@dotnet-bot test Ubuntu x64 Checked corefx_baseline |
|
Unrelated to this PR (https://github.com/dotnet/coreclr/issues/16331): |
|
Since the CI failures are unrelated to the changes here (and are being addressed outside the PR), I am going to go ahead and merge this. |
| } | ||
|
|
||
| // See https://github.com/dotnet/coreclr/blob/master/src/utilcode/util_nodependencies.cpp#L970 | ||
| private static readonly bool[] HighCharTable = new bool[0x80] |


Part of https://github.com/dotnet/corefx/issues/21395#issuecomment-359906138
TODO:
Add the allocating counterpart for slow span in corefx andadd tests there.Related PR: dotnet/corefx#26880
cc @jkotas, @stephentoub, @KrzysztofCwalina, @tarekgh, @JeremyKuhne