-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Fix GetIndexOfFirstNonAsciiByte_Vector not taken on ARM64 #90527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-text-encoding Issue DetailsFixes #89924
|
| if (!Vector512.IsHardwareAccelerated && | ||
| !Vector256.IsHardwareAccelerated && | ||
| (Sse2.IsSupported || AdvSimd.IsSupported)) | ||
| if (!Vector512.IsHardwareAccelerated && !Vector256.IsHardwareAccelerated && Sse2.IsSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this, then doesn't this change leave a bunch of dead code in GetIndexOfFirstNonAsciiByte_Intrinsified that should be cleaned up? It has multiple blocks dedicated to AdvSimd which would never be reached. At which point the method should also be renamed to be SSE-specific.
But I'd also explicitly asked about this:
#88532 (comment)
and was told the preferring of the intrinsics path was by design. Maybe it was only measured on x64 and not on Arm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this, then doesn't this change leave a bunch of dead code in GetIndexOfFirstNonAsciiByte_Intrinsified that should be cleaned up? It has multiple blocks dedicated to AdvSimd which would never be reached. At which point the method should also be renamed to be SSE-specific.
Let me fix that.
The _Vector path is estimated to be ~40% faster on osx-arm64, It is likely the switch was not measured on ARM64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _Vector path is estimated to be ~40% faster on osx-arm64, It is likely the switch was not measured on ARM64.
@neon-sunset can you clarify please? .NET 7 was using the _Intrinsified path for Arm64: https://github.com/dotnet/runtime/blob/v7.0.10/src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs#L81-L91
The PR that introduced the _Vector path (#88532) didn't change _Intrinsified at all (outside using the centralized Vector128.Size constant) and didn't change which Arm64 was calling.
This then wouldn't seem like a regression, but rather simply a case where the new _Vector code is faster for Arm64. If that's the case, could you share your own numbers showing the .NET 7 vs .NET 8 difference (without this PR) and the difference for _Vector vs _Intrinsified for .NET 8 (this PR vs without this PR)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@neon-sunset can you clarify please? .NET 7 was using the _Intrinsified path for Arm64: https://github.com/dotnet/runtime/blob/v7.0.10/src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs#L81-L91
You are correct, my mistake.
This then wouldn't seem like a regression, but rather simply a case where the new _Vector code is faster for Arm64. If that's the case, could you share your own numbers showing the .NET 7 vs .NET 8 difference (without this PR) and the difference for _Vector vs _Intrinsified for .NET 8 (this PR vs without this PR)?
This is the plan, I'm just waiting for the Release build to finish to post detailed numbers 😄
9985c3e to
a00b989
Compare
|
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
Fixes #89924
CC @EgorBo @anthonycanino