Add ROSpan StartsWith and EndsWith string-like APIs with StringComparison#26880
Conversation
| /// <param name="span">The source span.</param> | ||
| /// <param name="value">The sequence to compare to the end of the source span.</param> | ||
| /// <param name="comparison">One of the enumeration values that determines how the <paramref name="span"/> and <paramref name="value"/> are compared.</param> | ||
| public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) |
There was a problem hiding this comment.
comparison -> comparisonType to match the naming used everywhere else.
| public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison) | ||
| { | ||
| string sourceString = span.ToString(); | ||
| string valueString = value.ToString(); |
There was a problem hiding this comment.
Ouch; this is really unfortunate. We should consider only exposing these for core.
If we keep these, we should at least do a reasonable implementation for Ordinal and OrdinalIgnoreCase, and presumably we could do so for Invariant as well? I agree the CurrentCulture ones will be problematic.
There was a problem hiding this comment.
Invariant would be as the current culture (if the string is not fully ASCII).
I think there is opportunity to optimize for the ordinal and ordinal ignore casing but I think we already saying the portable implementation would be slower than the core one and whatever we do will not be able to match the core performance (i guess).
not exposing these would be an issue for anyone want to write a library that work on net core and down level too. I prefer keeping it and document it as it is slower than the core one and we can try optimize here as much we can.
In reply to: 166251624 [](ancestors = 166251624)
There was a problem hiding this comment.
I can file an issue to optimize this method for the ordinal cases. We do have the generic StartsWith method that take a T, which does not take SringComparison that can be used for one of the two scenarios.
FYI (from API review):
// Those need access to globalization APIs. We'll also expose them from
// the .NET Framework OOB (slow span). They will try to extract the string
// from the underlying span (because slow span stores it) -- or -- allocate
// a new string. This avoids bifurcating the API surface.There was a problem hiding this comment.
They will try to extract the string from the underlying span (because slow span stores it)
Can this change do this then?
There was a problem hiding this comment.
I haven't been able to figure out how to do that. The underlying data is stored as a Pinnable<char> with byte offset, and the globalization APIs being utilized work on strings (not char*), at least publically.
There was a problem hiding this comment.
It looks like we're using Unsafe.As to cast the object to a Pinnable. Can you just do a type check on it and cast it back? e.g. object o = _pinnable; if (o is string s) { ... }? @jkotas or @atsushikan could comment on the safety of all this.
There was a problem hiding this comment.
If that is doable, I can definitely update our ToString implementation and take advantage of that as well.
There was a problem hiding this comment.
Is there a way to do something similar for the FastSpan ToString which has a ByReference field?
There was a problem hiding this comment.
No, there is not a reasonable way to do that for fast span.
| { | ||
| if (typeof(T) == typeof(char)) | ||
| { | ||
| if (_pinnable is string str) |
There was a problem hiding this comment.
This is not enough. You have to also check that the offset and length are for the whole string (ie that the Span is not pointing at substring).
There was a problem hiding this comment.
Would adding offset and length checks be sufficient here?
if (obj is string str && _byteOffset == 0 && _length == str.Length)There was a problem hiding this comment.
You need to use StringAdjustment for the offset.
| { | ||
| if (typeof(T) == typeof(char)) | ||
| { | ||
| if (_pinnable is string str) |
There was a problem hiding this comment.
Note that the optimizing compilers can figure out here that Pinnable<T> can never be a string, and turn this condition into false.
I think you may want to pass it through Unsafe.As here to strip the type information here. It is not guaranteed to work, but it is increasing the changes that it will.
There was a problem hiding this comment.
We may want to have a slow-span specific test for this so that we will know when it does not work.
There was a problem hiding this comment.
Note that the optimizing compilers can figure out here that Pinnable can never be a string
Even the C# compiler will... that's why in my example I stored it into an object first, otherwise the C# compiler just emit the cast as a hard-coded false. Though as you say an even better compiler could see through the object cast as well.
| { | ||
| if (typeof(T) == typeof(char)) | ||
| { | ||
| if (_pinnable is string str) |
There was a problem hiding this comment.
Can this ever be true for Span or should I remove this check?
|
For slow span, if we get a read/write Span from Memory, we have Pinnable<T> != null and hence ToString() can return the underlying string. But, if we construct a span from void* and length, Pinnable<T> == null and then ToString() returns a new string. Are we fine with this discrepancy? string original = TestHelpers.BuildString(10, 42);
ReadOnlyMemory<char> readOnlyMemory = original.AsReadOnlyMemory();
Memory<char> memory = MemoryMarshal.AsMemory(readOnlyMemory);
fixed (char* pOriginal = original)
fixed (char* pString1 = memory.Span.ToString())
{
Assert.Equal((int)pOriginal, (int)pString1); // passes the test
}
fixed (char* pOriginal = original)
{
Span<char> span = new Span<char>(pOriginal, original.Length);
fixed (char* pString1 = span.ToString())
{
Assert.Equal((int)pOriginal, (int)pString1); // fails the test
}
} |
| string subString2 = span.Slice(0, 2).ToString(); | ||
| string subString3 = span.Slice(1, 2).ToString(); | ||
|
|
||
| Assert.Equal("RDDNEGSNET", returnedString); |
There was a problem hiding this comment.
nit: shouldn't these use original instead of the hardcoded literal
|
@dotnet-bot test this please |
1 similar comment
|
@dotnet-bot test this please |
| if (typeof(T) == typeof(char)) | ||
| { | ||
| object obj = Unsafe.As<Pinnable<T>, object>(ref Unsafe.AsRef(_pinnable)); | ||
| if (obj is string str && _byteOffset == MemoryExtensions.StringAdjustment && _length == str.Length) |
There was a problem hiding this comment.
As in #27170, how about doing the _byteOffset check before doing any of the casting / type checking?
| { | ||
| Assert.Equal((int)pOriginal, (int)pString1); | ||
| Assert.Equal((int)pOriginal, (int)pString2); | ||
| } |
There was a problem hiding this comment.
Why not just use Assert.Same?
| Assert.NotEqual((int)pSubString1, (int)pSubString2); | ||
| Assert.NotEqual((int)pSubString1, (int)pSubString3); | ||
| Assert.NotEqual((int)pSubString2, (int)pSubString3); | ||
| } |
| // This test is only relevant for portable span | ||
| #if FEATURE_PORTABLE_SPAN | ||
| [Fact] | ||
| #endif |
There was a problem hiding this comment.
Is this the convention used in this project? Elsewhere in corefx we've used SkipOnTargetFramework, e.g. https://github.com/dotnet/corefx/pull/27170/files#diff-f8d6f156088e92893813570d9608526eR63
There was a problem hiding this comment.
I am not sure if NetFramework is the only TargetFrameworkMoniker where we would run this test (uapaot in particular).
Essentially, we want to run on !netcoreapp and !uap, based on when IsPartialFacadeAssembly is false:
There was a problem hiding this comment.
But given this, I will change it.
For reference, the framework monikers that are supported: https://github.com/dotnet/buildtools/blob/master/src/xunit.netcore.extensions/TargetFrameworkMonikers.cs#L23-L26
| { | ||
| if (typeof(T) == typeof(char)) | ||
| { | ||
| object obj = Unsafe.As<Pinnable<T>, object>(ref Unsafe.AsRef(_pinnable)); |
There was a problem hiding this comment.
Can this just be:
object obj = Unsafe.As<object>(_pinnable);? Or does doing this via the refs help further?
There was a problem hiding this comment.
Nit: (Also, a comment indicating why this is used rather than just if (_pinnable is string str ...) would be helpful.)
There was a problem hiding this comment.
I will merge the changes from your PR into here which will include the comments, test updates, etc.
Thanks!
There was a problem hiding this comment.
Cool, thanks. Didn't realize you'd already done the initial work (I remember we'd talked about it, but then couldn't find it in the repo 😄)
| if (obj is string str && _byteOffset == MemoryExtensions.StringAdjustment && _length == str.Length) | ||
| { | ||
| return str; | ||
| } |
There was a problem hiding this comment.
Is this change needed? When will a Span<char> (rather than a ReadOnlySpan<char>) wrap a string?
There was a problem hiding this comment.
See code sample here: #26880 (comment)
I asked the same question but since we support ReadOnlyMemory -> Memory conversion, it is possible (based on the comments here - https://github.com/dotnet/corefx/blob/master/src/Common/src/CoreLib/System/Memory.cs#L195).
There was a problem hiding this comment.
I personally don't think it's worthwhile to add this check to Span<char>. It's such a corner case to optimize for, and at the expense of adding a type check to other Span<char>.ToString() calls. And in general folks can't depend on ToString producing the same string from multiple calls, so I don't consider the discrepancy a real issue.
There was a problem hiding this comment.
I don't have a strong opinion on this. I am fine with removing this check for Span. @KrzysztofCwalina, @jkotas - any objections?
|
@dotnet-bot test Linux x64 Release Build |
|
@dotnet-bot test this please |
…ison (dotnet/corefx#26880) * Add ROSpan StartsWith and EndsWith string-like APIs with StringComparison * We have separate implementations of slow and fast span. * Remove unused using directive. * Address PR feedback and add tests. * Add test files * Get the string from the underlying span to avoid allocation when possible. * Update ToString and add tests * Update Span ToString and add tests * Address PR feedback and disable ToString test for fast span * Borrow additional tests from the existing StringTests test bed. * Fix comment grammar * No StringComparison results in generic StartsWith being called which does ordinal comparison * Address feedback related to ToString and tests * Fix tests for culture specific cases, for Unix. * Fix typo in variable names * Respond to recent change AsReadOnlySpan -> AsSpan * Fix typo in test. Commit migrated from dotnet/corefx@42d5370
Part of https://github.com/dotnet/corefx/issues/21395#issuecomment-359906138
Depends on: dotnet/coreclr#16223 and dotnet/coreclr#16418
TODO:
Add tests.cc @jkotas, @stephentoub, @KrzysztofCwalina, @tarekgh, @JeremyKuhne