Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Add ROSpan StartsWith and EndsWith string-like APIs with StringComparison#26880

Merged
ahsonkhan merged 20 commits into
dotnet:masterfrom
ahsonkhan:AddStartsWith
Feb 22, 2018
Merged

Add ROSpan StartsWith and EndsWith string-like APIs with StringComparison#26880
ahsonkhan merged 20 commits into
dotnet:masterfrom
ahsonkhan:AddStartsWith

Conversation

@ahsonkhan
Copy link
Copy Markdown

@ahsonkhan ahsonkhan commented Feb 6, 2018

/// <param name="span">The source span.</param>
/// <param name="value">The sequence to compare to the end of the source span.</param>
/// <param name="comparison">One of the enumeration values that determines how the <paramref name="span"/> and <paramref name="value"/> are compared.</param>
public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comparison -> comparisonType to match the naming used everywhere else.

public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison)
{
string sourceString = span.ToString();
string valueString = value.ToString();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch; this is really unfortunate. We should consider only exposing these for core.

If we keep these, we should at least do a reasonable implementation for Ordinal and OrdinalIgnoreCase, and presumably we could do so for Invariant as well? I agree the CurrentCulture ones will be problematic.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invariant would be as the current culture (if the string is not fully ASCII).

I think there is opportunity to optimize for the ordinal and ordinal ignore casing but I think we already saying the portable implementation would be slower than the core one and whatever we do will not be able to match the core performance (i guess).

not exposing these would be an issue for anyone want to write a library that work on net core and down level too. I prefer keeping it and document it as it is slower than the core one and we can try optimize here as much we can.


In reply to: 166251624 [](ancestors = 166251624)

Copy link
Copy Markdown
Author

@ahsonkhan ahsonkhan Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can file an issue to optimize this method for the ordinal cases. We do have the generic StartsWith method that take a T, which does not take SringComparison that can be used for one of the two scenarios.

FYI (from API review):

    // Those need access to globalization APIs. We'll also expose them from
    // the .NET Framework OOB (slow span). They will try to extract the string
    // from the underlying span (because slow span stores it) -- or -- allocate
    // a new string. This avoids bifurcating the API surface.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They will try to extract the string from the underlying span (because slow span stores it)

Can this change do this then?

Copy link
Copy Markdown
Author

@ahsonkhan ahsonkhan Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been able to figure out how to do that. The underlying data is stored as a Pinnable<char> with byte offset, and the globalization APIs being utilized work on strings (not char*), at least publically.

Copy link
Copy Markdown
Member

@stephentoub stephentoub Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're using Unsafe.As to cast the object to a Pinnable. Can you just do a type check on it and cast it back? e.g. object o = _pinnable; if (o is string s) { ... }? @jkotas or @atsushikan could comment on the safety of all this.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is doable, I can definitely update our ToString implementation and take advantage of that as well.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to do something similar for the FastSpan ToString which has a ByReference field?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is not a reasonable way to do that for fast span.

{
if (typeof(T) == typeof(char))
{
if (_pinnable is string str)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not enough. You have to also check that the offset and length are for the whole string (ie that the Span is not pointing at substring).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would adding offset and length checks be sufficient here?

if (obj is string str && _byteOffset == 0 && _length == str.Length)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use StringAdjustment for the offset.

{
if (typeof(T) == typeof(char))
{
if (_pinnable is string str)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the optimizing compilers can figure out here that Pinnable<T> can never be a string, and turn this condition into false.

I think you may want to pass it through Unsafe.As here to strip the type information here. It is not guaranteed to work, but it is increasing the changes that it will.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to have a slow-span specific test for this so that we will know when it does not work.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the optimizing compilers can figure out here that Pinnable can never be a string

Even the C# compiler will... that's why in my example I stored it into an object first, otherwise the C# compiler just emit the cast as a hard-coded false. Though as you say an even better compiler could see through the object cast as well.

{
if (typeof(T) == typeof(char))
{
if (_pinnable is string str)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this ever be true for Span or should I remove this check?

@ahsonkhan
Copy link
Copy Markdown
Author

ahsonkhan commented Feb 7, 2018

For slow span, if we get a read/write Span from Memory, we have Pinnable<T> != null and hence ToString() can return the underlying string.

But, if we construct a span from void* and length, Pinnable<T> == null and then ToString() returns a new string.

Are we fine with this discrepancy?

string original = TestHelpers.BuildString(10, 42);
ReadOnlyMemory<char> readOnlyMemory = original.AsReadOnlyMemory();
Memory<char> memory = MemoryMarshal.AsMemory(readOnlyMemory);

fixed (char* pOriginal = original)
fixed (char* pString1 = memory.Span.ToString())
{
    Assert.Equal((int)pOriginal, (int)pString1); // passes the test
}

fixed (char* pOriginal = original)
{
    Span<char> span = new Span<char>(pOriginal, original.Length);
    fixed (char* pString1 = span.ToString())
    {
        Assert.Equal((int)pOriginal, (int)pString1); // fails the test
    }
}

string subString2 = span.Slice(0, 2).ToString();
string subString3 = span.Slice(1, 2).ToString();

Assert.Equal("RDDNEGSNET", returnedString);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shouldn't these use original instead of the hardcoded literal

@ahsonkhan
Copy link
Copy Markdown
Author

@dotnet-bot test this please

1 similar comment
@ahsonkhan
Copy link
Copy Markdown
Author

@dotnet-bot test this please

if (typeof(T) == typeof(char))
{
object obj = Unsafe.As<Pinnable<T>, object>(ref Unsafe.AsRef(_pinnable));
if (obj is string str && _byteOffset == MemoryExtensions.StringAdjustment && _length == str.Length)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in #27170, how about doing the _byteOffset check before doing any of the casting / type checking?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

{
Assert.Equal((int)pOriginal, (int)pString1);
Assert.Equal((int)pOriginal, (int)pString2);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use Assert.Same?

Assert.NotEqual((int)pSubString1, (int)pSubString2);
Assert.NotEqual((int)pSubString1, (int)pSubString3);
Assert.NotEqual((int)pSubString2, (int)pSubString3);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And Assert.NotSame?

// This test is only relevant for portable span
#if FEATURE_PORTABLE_SPAN
[Fact]
#endif
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the convention used in this project? Elsewhere in corefx we've used SkipOnTargetFramework, e.g. https://github.com/dotnet/corefx/pull/27170/files#diff-f8d6f156088e92893813570d9608526eR63

Copy link
Copy Markdown
Author

@ahsonkhan ahsonkhan Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if NetFramework is the only TargetFrameworkMoniker where we would run this test (uapaot in particular).

Essentially, we want to run on !netcoreapp and !uap, based on when IsPartialFacadeAssembly is false:

<DefineConstants Condition="'$(IsPartialFacadeAssembly)' != 'true'">$(DefineConstants);FEATURE_PORTABLE_SPAN</DefineConstants>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Copy Markdown
Author

@ahsonkhan ahsonkhan Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But given this, I will change it.

For reference, the framework monikers that are supported: https://github.com/dotnet/buildtools/blob/master/src/xunit.netcore.extensions/TargetFrameworkMonikers.cs#L23-L26

{
if (typeof(T) == typeof(char))
{
object obj = Unsafe.As<Pinnable<T>, object>(ref Unsafe.AsRef(_pinnable));
Copy link
Copy Markdown
Member

@stephentoub stephentoub Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be:

object obj = Unsafe.As<object>(_pinnable);

? Or does doing this via the refs help further?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: (Also, a comment indicating why this is used rather than just if (_pinnable is string str ...) would be helpful.)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge the changes from your PR into here which will include the comments, test updates, etc.

Thanks!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks. Didn't realize you'd already done the initial work (I remember we'd talked about it, but then couldn't find it in the repo 😄)

if (obj is string str && _byteOffset == MemoryExtensions.StringAdjustment && _length == str.Length)
{
return str;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed? When will a Span<char> (rather than a ReadOnlySpan<char>) wrap a string?

Copy link
Copy Markdown
Author

@ahsonkhan ahsonkhan Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See code sample here: #26880 (comment)

I asked the same question but since we support ReadOnlyMemory -> Memory conversion, it is possible (based on the comments here - https://github.com/dotnet/corefx/blob/master/src/Common/src/CoreLib/System/Memory.cs#L195).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't think it's worthwhile to add this check to Span<char>. It's such a corner case to optimize for, and at the expense of adding a type check to other Span<char>.ToString() calls. And in general folks can't depend on ToString producing the same string from multiple calls, so I don't consider the discrepancy a real issue.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this. I am fine with removing this check for Span. @KrzysztofCwalina, @jkotas - any objections?

@ahsonkhan
Copy link
Copy Markdown
Author

@dotnet-bot test Linux x64 Release Build

@ahsonkhan
Copy link
Copy Markdown
Author

@dotnet-bot test this please

@ahsonkhan ahsonkhan removed the blocked Issue/PR is blocked on something - see comments label Feb 21, 2018
@ahsonkhan ahsonkhan merged commit 42d5370 into dotnet:master Feb 22, 2018
@ahsonkhan ahsonkhan deleted the AddStartsWith branch February 22, 2018 01:09
@karelz karelz added this to the 2.1.0 milestone Mar 10, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…ison (dotnet/corefx#26880)

* Add ROSpan StartsWith and EndsWith string-like APIs with StringComparison

* We have separate implementations of slow and fast span.

* Remove unused using directive.

* Address PR feedback and add tests.

* Add test files

* Get the string from the underlying span to avoid allocation when possible.

* Update ToString and add tests

* Update Span ToString and add tests

* Address PR feedback and disable ToString test for fast span

* Borrow additional tests from the existing StringTests test bed.

* Fix comment grammar

* No StringComparison results in generic StartsWith being called which does ordinal comparison

* Address feedback related to ToString and tests

* Fix tests for culture specific cases, for Unix.

* Fix typo in variable names

* Respond to recent change AsReadOnlySpan -> AsSpan

* Fix typo in test.


Commit migrated from dotnet/corefx@42d5370
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants