Skip to content

Vectorize SkipWhiteSpace and ConsumeIntegerDigits in System.Text.Json#124647

Closed
Copilot wants to merge 4 commits intomainfrom
copilot/enable-simd-for-json-api
Closed

Vectorize SkipWhiteSpace and ConsumeIntegerDigits in System.Text.Json#124647
Copilot wants to merge 4 commits intomainfrom
copilot/enable-simd-for-json-api

Conversation

Copy link
Contributor

Copilot AI commented Feb 20, 2026

Description

Vectorizes hot-path scalar loops in Utf8JsonReader using SIMD-accelerated .NET APIs, addressing the request to identify and implement concrete vectorization opportunities in the System.Text.Json codebase.

Codebase Analysis

A thorough scan of the System.Text.Json codebase found that most hot paths are already vectorized:

  • IndexOfQuoteOrAnyControlOrBackSlash — uses SearchValues<byte> (NET8+) and Vector<T> (netstandard)
  • NeedsEscaping(ReadOnlySpan) — delegates to JavaScriptEncoder which uses SearchValues<byte> + IndexOfAnyExcept
  • JsonReaderHelper — uses SearchValues<char> for path/escape character detection
  • Unescaping logic — inherently sequential due to escape sequence handling; cannot benefit from SIMD

Two remaining scalar loops were identified and vectorized:

Changes Made

  1. SkipWhiteSpace() in Utf8JsonReader.cs — This method is called after every token and was scanning whitespace byte-by-byte. Replaced with SearchValues<byte> + IndexOfAnyExcept for bulk whitespace skipping, then Count and LastIndexOf for newline tracking.

  2. ConsumeIntegerDigits() in Utf8JsonReader.cs — Replaced the digit-by-digit scanning loop with IndexOfAnyExceptInRange((byte)'0', (byte)'9') for vectorized digit range matching.

  3. ConsumeIntegerDigitsMultiSegment() in Utf8JsonReader.MultiSegment.cs — Same vectorization applied to both inner digit-scanning loops in the multi-segment variant.

  4. JsonHelpers.SkipDigits() helper — Extracted the repeated IndexOfAnyExceptInRange digit-scanning pattern into a shared AggressiveInlining helper method in JsonHelpers.cs, eliminating duplicated #if NET blocks across all 3 digit-scanning call sites.

All changes are guarded behind #if NET with the original scalar implementations preserved as #else fallbacks for netstandard/NetFramework targets.

Testing

  • ✅ Build succeeds with 0 errors, 0 warnings
  • ✅ All 49,986 existing System.Text.Json tests pass with 0 failures
  • ✅ Automated code review found no issues
Original prompt

This section details on the original issue you should resolve

<issue_title>Enable SIMD for JSON API</issue_title>
<issue_description>From @hez2010 on March 9, 2019 9:19

Is your feature request related to a problem? Please describe.

Feature request. Enable SIMD for the new Json APIs in ASP.NET Core 3.

Describe the solution you'd like

See SimdJsonSharp
With SIMD support, SimdJsonSharp is faster than any existing Json library on .NET.
I think the new Json APIs in ASP.NET Core 3 should add SIMD support to improve its performance.

Copied from original issue: dotnet/aspnetcore#8366</issue_description>

<agent_instructions>Scan the System.Text.Json codebase for opportunities to vectorize methods as cited in the original issue and comments. If no immediate opportunities are found, please state so explicitly so we can close the issue.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@tannergooding I'd defer to @ahsonkhan for his stance; but I would expect we aren't going to look into this for the 3.0 timeframe. Instead, 3.0 will likely aim towards getting the new API shipped and stable and perf improvements (like this) will be investigated for a future release. @eiriktsarpalis Next step would be to identify concrete vectorization opportunities in the codebase. STJ already uses vectorization indirectly via the encoding routines in `System.Text.Encodings.Web` and the Span helper methods. @tannergooding Many of the most popular SIMD implementations are not “strictly spec compliant” and may have edge case allowances that we can’t support in the built-in implementation

While some more opportunities likely exist, the type of perf gains some people are envisioning simply aren’t possible in a full featured, secure, and robust implementation.

See also the discussion above about being oriented towards different goals, such as streaming and incremental parsing vs parsing entire files at once or similar </comment_new>
<comment_new>@tannergooding
That being said, in general if there are incremental improvements with benchmarks showing the performance gains (ideally shown via the existing tests in dotnet/performance), we are open source and people are welcome to contribute

Larger changes should ideally be discussed with the area owners first. But isolated changes and targeted improvements are “typically” fine to just have a PR submitted


🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

@eiriktsarpalis
Copy link
Member

@copilot please resume work on this

eiriktsarpalis and others added 2 commits March 18, 2026 18:24
Use SearchValues<byte> + IndexOfAnyExcept for whitespace skipping and
IndexOfAnyExceptInRange for digit scanning, with scalar fallbacks for
non-.NETCoreApp targets.

Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 18, 2026

@copilot please resume work on this

Implemented vectorization for two hot-path scalar loops in Utf8JsonReader (commit d76d130):

  1. SkipWhiteSpace() — replaced byte-by-byte whitespace scanning with SearchValues<byte> + IndexOfAnyExcept for bulk skip, then Count/LastIndexOf for newline tracking. This is called after every token.

  2. ConsumeIntegerDigits() / ConsumeIntegerDigitsMultiSegment() — replaced digit-by-digit loops with IndexOfAnyExceptInRange((byte)'0', (byte)'9').

All changes are behind #if NET with scalar fallbacks for netstandard/NetFramework. 49,976 existing tests pass with 0 failures.

Analysis of remaining STJ surface area: Most other hot paths are already vectorized — IndexOfQuoteOrAnyControlOrBackSlash uses SearchValues<byte> (net8+), NeedsEscaping delegates to JavaScriptEncoder which uses SearchValues<byte> + IndexOfAnyExcept, and JsonReaderHelper already uses SearchValues<char> for path/escape detection. The unescaping logic is inherently sequential due to escape sequence handling and can't meaningfully benefit from SIMD.

Copilot AI changed the title [WIP] Add SIMD support for Json APIs in ASP.NET Core 3 Vectorize SkipWhiteSpace and ConsumeIntegerDigits in System.Text.Json Mar 18, 2026
Copilot AI requested a review from eiriktsarpalis March 18, 2026 16:53
i = 0;
data = _buffer;
#if NET
nonDigitOffset = data.IndexOfAnyExceptInRange((byte)'0', (byte)'9');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot this pattern seems a bit repetitive. Is there a chance we could extract this into a helper method (marked as aggressive inlining) without sacrificing performance gains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted the repeated pattern into a JsonHelpers.SkipDigits helper method marked [MethodImpl(MethodImplOptions.AggressiveInlining)]. All 3 call sites now use the shared helper, eliminating the duplicated #if NET blocks. Commit 3cfb166.

@eiriktsarpalis
Copy link
Member

@EgorBot -amd -intel -arm --filter "System.Text.Json"

@eiriktsarpalis eiriktsarpalis marked this pull request as ready for review March 19, 2026 08:32
Copilot AI review requested due to automatic review settings March 19, 2026 08:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves System.Text.Json parsing performance by vectorizing two hot-path scanning loops in Utf8JsonReader, using SIMD-accelerated span APIs on modern .NET targets while preserving scalar fallbacks for other TFMs.

Changes:

  • Vectorized Utf8JsonReader.SkipWhiteSpace() using SearchValues<byte> + IndexOfAnyExcept, and updated newline/column tracking.
  • Vectorized ConsumeIntegerDigits() using IndexOfAnyExceptInRange('0','9') in the single-span reader.
  • Applied the same digit-scanning vectorization to ConsumeIntegerDigitsMultiSegment() for multi-segment inputs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.cs Adds SIMD-based whitespace skipping and digit scanning guarded by #if NET.
src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.MultiSegment.cs Adds SIMD-based digit scanning to the multi-segment integer parsing loop under #if NET.

You can also share your feedback on Copilot code review. Take the survey.

Extracts the repeated IndexOfAnyExceptInRange pattern into a shared
JsonHelpers.SkipDigits helper method marked AggressiveInlining,
eliminating the duplicated #if NET blocks across 3 call sites.

Co-authored-by: eiriktsarpalis <2813363+eiriktsarpalis@users.noreply.github.com>
@eiriktsarpalis
Copy link
Member

@EgorBot -amd -intel -arm --filter "System.Text.JsonPerf_Reader"

@eiriktsarpalis eiriktsarpalis added the tenet-performance Performance related issue label Mar 19, 2026
@eiriktsarpalis
Copy link
Member

Seeing as the benchmarks are seeing a limited number of wins, but is otherwise mostly scoring major regressions I'd be inclined to just close this one. cc @stephentoub

@tannergooding
Copy link
Member

I'd note that the PR title is a bit misleading here.

There is no actual vectorization contained in this PR, it was more rather an attempt to simplify and use the existing span and other APIs rather than having hand-rolled loops which do the same. The fact that the existing span APIs happen to be vectorized is largely an implementation detail

Its possible (and I'd expect likely) that a more direct vectorization approach may even still be beneficial as compared to SearchValues, as it would avoid some of the repeated work and checks that were being done, particularly when a newline is encountered.

@hez2010
Copy link
Contributor

hez2010 commented Mar 19, 2026

See my reply in the issue #28937 (comment). This PR does nothing to implement simdjson, so now I'm confused how the benchmark result here can ever be used for justifying the simdjson feature request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Enable SIMD for JSON API

5 participants