Vectorize SkipWhiteSpace and ConsumeIntegerDigits in System.Text.Json #124647
@copilot please resume work on this
Use SearchValues<byte> + IndexOfAnyExcept for whitespace skipping and IndexOfAnyExceptInRange for digit scanning, with scalar fallbacks for non-.NETCoreApp targets. Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
Implemented vectorization for two hot-path scalar loops in
All changes are behind Analysis of remaining STJ surface area: Most other hot paths are already vectorized — |
i = 0;
data = _buffer;
#if NET
nonDigitOffset = data.IndexOfAnyExceptInRange((byte)'0', (byte)'9');
@copilot this pattern seems a bit repetitive. Is there a chance we could extract this into a helper method (marked as aggressive inlining) without sacrificing performance gains?
Extracted the repeated pattern into a JsonHelpers.SkipDigits helper method marked [MethodImpl(MethodImplOptions.AggressiveInlining)]. All 3 call sites now use the shared helper, eliminating the duplicated #if NET blocks. Commit 3cfb166.
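For readers without the diff at hand, a helper of this kind might look roughly like the sketch below. This is not the code from the commit: the class name DigitScanner, the method name CountLeadingDigits, and the return convention (number of leading digits) are assumptions made for illustration. The sketch also guards on NET8_0_OR_GREATER rather than NET, because IndexOfAnyExceptInRange is only available from .NET 8 when compiled outside the runtime repository.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for the PR's JsonHelpers.SkipDigits; the names and the
// signature here are assumptions, not taken from the commit.
internal static class DigitScanner
{
    // Returns the number of leading ASCII digits ('0'..'9') in data.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int CountLeadingDigits(ReadOnlySpan<byte> data)
    {
#if NET8_0_OR_GREATER
        // IndexOfAnyExceptInRange returns the index of the first byte outside
        // ['0', '9'], or -1 when every byte is a digit.
        int offset = data.IndexOfAnyExceptInRange((byte)'0', (byte)'9');
        return offset < 0 ? data.Length : offset;
#else
        // Scalar fallback for targets that lack the span helper.
        int i = 0;
        while (i < data.Length && (uint)(data[i] - '0') <= 9)
        {
            i++;
        }
        return i;
#endif
    }
}
```

Mapping the -1 "not found" result to data.Length inside the helper keeps each call site to a single expression, which is the duplication the review comment asked to remove.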
@EgorBot -amd -intel -arm --filter "System.Text.Json"
Pull request overview
This PR improves System.Text.Json parsing performance by vectorizing two hot-path scanning loops in Utf8JsonReader, using SIMD-accelerated span APIs on modern .NET targets while preserving scalar fallbacks for other TFMs.
Changes:
- Vectorized Utf8JsonReader.SkipWhiteSpace() using SearchValues<byte> + IndexOfAnyExcept, and updated newline/column tracking.
- Vectorized ConsumeIntegerDigits() using IndexOfAnyExceptInRange('0','9') in the single-span reader.
- Applied the same digit-scanning vectorization to ConsumeIntegerDigitsMultiSegment() for multi-segment inputs.
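The whitespace change can be pictured with a minimal sketch like the following. It is an illustration under stated assumptions, not the PR's code: WhiteSpaceScanner, Skip, and the ref-parameter shape are hypothetical, and the block assumes .NET 8 or later (SearchValues<T>, the span Count extension, and UTF-8 string literals).

```csharp
using System;
using System.Buffers;

// Hypothetical illustration; none of these names come from the PR.
internal static class WhiteSpaceScanner
{
    // JSON insignificant whitespace: space, tab, line feed, carriage return.
    private static readonly SearchValues<byte> s_whiteSpace =
        SearchValues.Create(" \t\n\r"u8);

    // Returns how many leading bytes of data are whitespace and updates the
    // line and column counters to match.
    public static int Skip(ReadOnlySpan<byte> data, ref long lineNumber, ref long bytePositionInLine)
    {
        // Index of the first non-whitespace byte, or -1 if the span is all whitespace.
        int end = data.IndexOfAnyExcept(s_whiteSpace);
        if (end < 0)
        {
            end = data.Length;
        }

        ReadOnlySpan<byte> skipped = data.Slice(0, end);
        int newLines = skipped.Count((byte)'\n');
        if (newLines > 0)
        {
            lineNumber += newLines;
            // The column restarts after the last line feed in the skipped run.
            bytePositionInLine = end - (skipped.LastIndexOf((byte)'\n') + 1);
        }
        else
        {
            bytePositionInLine += end;
        }

        return end;
    }
}
```

Note that this shape makes up to three passes over the skipped bytes (IndexOfAnyExcept, Count, LastIndexOf), whereas a scalar loop makes one. For the short whitespace runs typical of JSON, that extra work and the call overhead may outweigh any SIMD benefit.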
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.cs | Adds SIMD-based whitespace skipping and digit scanning guarded by #if NET. |
| src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.MultiSegment.cs | Adds SIMD-based digit scanning to the multi-segment integer parsing loop under #if NET. |
Extracts the repeated IndexOfAnyExceptInRange pattern into a shared JsonHelpers.SkipDigits helper method marked AggressiveInlining, eliminating the duplicated #if NET blocks across 3 call sites. Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
@EgorBot -amd -intel -arm --filter "System.Text.JsonPerf_Reader"
Seeing as the benchmarks show only a limited number of wins and otherwise mostly major regressions, I'd be inclined to just close this one. cc @stephentoub
I'd note that the PR title is a bit misleading here. There is no actual vectorization contained in this PR; it was rather an attempt to simplify and use the existing span and other APIs instead of hand-rolled loops that do the same. The fact that the existing span APIs happen to be vectorized is largely an implementation detail. It's possible (and I'd expect likely) that a more direct vectorization approach may even still be beneficial as compared to
See my reply in the issue #28937 (comment). This PR does nothing to implement simdjson, so now I'm confused how the benchmark result here can ever be used for justifying the simdjson feature request.
Description
Vectorizes hot-path scalar loops in Utf8JsonReader using SIMD-accelerated .NET APIs, addressing the request to identify and implement concrete vectorization opportunities in the System.Text.Json codebase.

Codebase Analysis
A thorough scan of the System.Text.Json codebase found that most hot paths are already vectorized:
- IndexOfQuoteOrAnyControlOrBackSlash: uses SearchValues<byte> (NET8+) and Vector<T> (netstandard)
- NeedsEscaping(ReadOnlySpan): delegates to JavaScriptEncoder, which uses SearchValues<byte> + IndexOfAnyExcept
- JsonReaderHelper: uses SearchValues<char> for path/escape character detection

Two remaining scalar loops were identified and vectorized:

Changes Made
- SkipWhiteSpace() in Utf8JsonReader.cs: This method is called after every token and was scanning whitespace byte-by-byte. Replaced with SearchValues<byte> + IndexOfAnyExcept for bulk whitespace skipping, then Count and LastIndexOf for newline tracking.
- ConsumeIntegerDigits() in Utf8JsonReader.cs: Replaced the digit-by-digit scanning loop with IndexOfAnyExceptInRange((byte)'0', (byte)'9') for vectorized digit range matching.
- ConsumeIntegerDigitsMultiSegment() in Utf8JsonReader.MultiSegment.cs: Same vectorization applied to both inner digit-scanning loops in the multi-segment variant.
- JsonHelpers.SkipDigits() helper: Extracted the repeated IndexOfAnyExceptInRange digit-scanning pattern into a shared AggressiveInlining helper method in JsonHelpers.cs, eliminating duplicated #if NET blocks across all 3 digit-scanning call sites.

All changes are guarded behind #if NET with the original scalar implementations preserved as #else fallbacks for netstandard/NetFramework targets.

Testing