Improve MdUtf8String by benaadams · Pull Request #21720 · dotnet/coreclr

benaadams · 2018-12-30T22:57:26Z

The MdUtf8String methods get called in a multiplicative call chain.

Contributes to dotnet/corefx#34283 (comment)

benaadams · 2018-12-30T22:58:39Z

-                    pItr++;
-                }
-            }
+            const int MaxStringLength = 1024;


Previously the length was unbounded, but SpanHelpers.IndexOf requires a length to search within. I have chosen 1024; is that too small?

Is the actual length stored in the metadata, so that the overload taking a length could be called, or is it just a null-terminated UTF-8 string? /cc @GrabYourPitchforks

The risk is that IndexOf('\0') may access memory that it is not supposed to access when it is given an arbitrarily sized buffer. It should not happen today given what the implementation looks like, but it is not a good idea to start spreading subtle assumptions like these throughout the codebase.

For now, I would just call StubHelpers.strlen or some other proper strlen variant here (there are several). Consider refactoring wcslen/strlen to use a fully managed (vectorized, etc.) implementation in a separate change.
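To illustrate the concern, here is a minimal sketch of a strlen-style scan that stops at the first 0 byte and so never reads beyond the terminator, unlike a search over an assumed fixed-size window. The class and method names are hypothetical and are not taken from the PR; only Encoding.UTF8.GetString(byte*, int) is a real framework call. It assumes compilation with unsafe code enabled.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not the code from this PR.
public static class Utf8NullTerminated
{
    // Byte-by-byte scan: reads up to and including the first 0 byte, never past it.
    public static unsafe int Strlen(byte* p)
    {
        int length = 0;
        while (p[length] != 0)
        {
            length++;
        }
        return length;
    }

    // Decodes the null-terminated UTF-8 string that starts at p.
    public static unsafe string Decode(byte* p)
    {
        return Encoding.UTF8.GetString(p, Strlen(p));
    }

    // Demonstration wrapper: the caller guarantees the array is non-empty
    // and contains a 0 byte.
    public static unsafe string DecodeFromArray(byte[] buffer)
    {
        fixed (byte* p = buffer)
        {
            return Decode(p);
        }
    }
}
```

For example, `Utf8NullTerminated.DecodeFromArray(new byte[] { 0x68, 0x69, 0x00, 0x7A })` decodes only the two bytes before the terminator. A vectorized variant would need its own argument for why wider reads stay inside valid memory, which is the assumption the comment above asks to keep in one place.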

To keep it isolated to one place and easier to refactor/identify? Makes sense.

The comment in the unmanaged code that I removed for the string compare is:

// Important: the string in pSsz isn't null terminated so the length must be used
// when performing operations on the string.

So it does feel like an uncomfortable coupling; though I suppose that comment is for the other .ctor of MdUtf8String, which takes a length rather than looking for a null terminator?

Consider refactoring of wcslen/strlen to use fully managed (vectorized, etc.) implementation in separate change.

#21729

benaadams · 2018-12-31T00:27:46Z

@dotnet-bot test Ubuntu x64 Checked Innerloop Build and Test

jkotas · 2018-12-31T06:34:04Z

            if ((s.m_StringHeapByteLength == m_StringHeapByteLength) && (m_StringHeapByteLength != 0))
            {
-                return EqualsCaseSensitive(s.m_pStringHeap, m_pStringHeap, m_StringHeapByteLength);
+                isEqual = SpanHelpers.SequenceEqual<byte>(ref Unsafe.AsRef<byte>(s.m_pStringHeap), ref Unsafe.AsRef<byte>(m_pStringHeap), m_StringHeapByteLength) ? true : false;


This does not need to use Unsafe.AsRef. A regular cast to byte* is enough. Maybe change m_pStringHeap to byte*; it may result in fewer casts.

What's the syntax here? Do I need to ref the dereference, e.g. ref *m_pStringHeap? It doesn't seem happy using m_pStringHeap directly:

Error CS1620 Argument 2 must be passed with the 'ref' keyword

Adding just ref complains about its type (after complaining a readonly value can't be passed by ref)

Error CS1503 Argument 2: cannot convert from 'ref byte*' to 'ref byte'

Using ref on the dereference seems OK, if a bit weird.
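A minimal sketch of the syntax in question: dereferencing a pointer yields a byte variable, so `ref *(byte*)p` is a valid `ref byte` argument, whereas `ref p` has type `ref byte*`. Here BytesEqual is a hypothetical stand-in for an internal helper with `ref byte` parameters, and the `void*` parameters are an assumption about the field type. It assumes compilation with unsafe code enabled.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical names for illustration; not the code from this PR.
public static class RefFromPointer
{
    // Stand-in for a helper that takes its inputs as `ref byte`.
    static bool BytesEqual(ref byte first, ref byte second, int length)
    {
        for (int i = 0; i < length; i++)
        {
            if (Unsafe.Add(ref first, i) != Unsafe.Add(ref second, i))
            {
                return false;
            }
        }
        return true;
    }

    public static unsafe bool Compare(void* a, void* b, int length)
    {
        if (length == 0)
        {
            return true;
        }
        // (byte*)a is a pointer; *(byte*)a is a byte variable and can be passed by ref.
        return BytesEqual(ref *(byte*)a, ref *(byte*)b, length);
    }

    // Demonstration wrapper that pins two arrays and compares their contents.
    public static unsafe bool CompareArrays(byte[] x, byte[] y)
    {
        if (x.Length != y.Length)
        {
            return false;
        }
        fixed (byte* px = x, py = y)
        {
            return Compare(px, py, x.Length);
        }
    }
}
```

If the field were declared as `byte*` rather than `void*`, the casts would disappear and the call would read `ref *a, ref *b`.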

jkotas · 2018-12-31T15:17:05Z

Thank you!

* Move MdUtf8String::EqualsCaseSensitive to managed code
* Move MdUtf8String.ToString to safe code
* Use Encoding.UTF8.GetString

Commit migrated from dotnet/coreclr@e52aaee

benaadams added 3 commits December 30, 2018 22:49

Move MdUtf8String::EqualsCaseSensitive to managed code

80b0d74

Vectorize MdUtf8String::GetUtf8StringByteLength

405e253

Move MdUtf8String.ToString to safe code

4b5ac1a

benaadams commented Dec 30, 2018

Tidy up SpanAction

1a83511

jkotas reviewed Dec 31, 2018

benaadams added 2 commits December 31, 2018 12:52

Use Encoding.UTF8.GetString

ae7424b

Feedback

7a54747

jkotas reviewed Dec 31, 2018

Comment thread src/System.Private.CoreLib/src/System/RtType.cs Outdated

benaadams force-pushed the MdUtf8String branch from 63d118e to bab96db Compare December 31, 2018 13:34

Tweaks

bab96db

jkotas approved these changes Dec 31, 2018

jkotas merged commit e52aaee into dotnet:master Dec 31, 2018

benaadams deleted the MdUtf8String branch December 31, 2018 15:42

benaadams mentioned this pull request Dec 31, 2018

strlen to managed code and vectorize #21729

Merged

jkotas merged 7 commits into dotnet:master from benaadams:MdUtf8String
