
simdjson_decode_from_input: In case of memory stream, do not allocate…#67

Merged
JakubOnderka merged 1 commit into master from read-optim
Apr 12, 2026

Conversation

Owner

@JakubOnderka JakubOnderka commented Apr 12, 2026

… new buffer

Summary by CodeRabbit

  • Bug Fixes
    • Faster handling of in-memory HTTP request bodies for JSON decoding, avoiding unnecessary rewinds and copies to reduce latency.
    • Improved error handling for JSON parse failures, surfacing JSON-related errors immediately.
    • Preserves previous behavior and fallback parsing when the fast-path does not apply.


coderabbitai bot commented Apr 12, 2026

📝 Walkthrough

The PR adds a PHP >= 8.2 fast-path to PHP_FUNCTION(simdjson_decode_from_input): when SG(request_info).request_body exists and its abstract contains a memory-backed innerstream, the code reads the buffered zend_string and decodes directly (reusing or creating a parser). Non-memory streams use the existing rewind-and-copy-to-memory path.
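
The branching described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `BodyStream`, `StreamKind`, `Counters`, `parse` and `decode_from_input` below are hypothetical stand-ins, not the PHP stream API or the extension's real functions. It shows the one point of the change: a memory-backed body is parsed in place, while any other body is copied first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-ins; these are not the real php_stream types.
enum class StreamKind { Memory, Other };

struct BodyStream {
    StreamKind kind;
    std::string buffer;  // bytes of the request body
};

struct Counters {
    int copies = 0;  // how many times the body was copied before parsing
};

// Placeholder for the real parser: reports how many bytes it was handed.
std::size_t parse(std::string_view json) { return json.size(); }

std::size_t decode_from_input(const BodyStream& body, Counters& counters) {
    if (body.kind == StreamKind::Memory) {
        // Fast path: the whole body already sits in one buffer, so it is
        // parsed in place through a non-owning view.
        return parse(std::string_view(body.buffer));
    }
    // Fallback: materialize a separate in-memory copy, then parse that.
    std::string copy(body.buffer);
    ++counters.copies;
    return parse(copy);
}

// Usage example: a memory-backed body is decoded without any copy.
void demo_fast_path() {
    Counters counters;
    BodyStream in_memory{StreamKind::Memory, "{\"a\":1}"};
    decode_from_input(in_memory, counters);
    assert(counters.copies == 0);
}
```

In the real extension the fast path reads the stream's existing zend_string rather than a `std::string`, but the trade-off is the same: one fewer allocation and copy per request when the body is already in memory.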

Changes

| Cohort / File(s) | Summary |
|---|---|
| Stream buffer optimization: php_simdjson.cpp | Adds a PHP >= 8.2 guarded fast-path that treats body->abstract as a temp struct with php_stream *innerstream. If innerstream is memory-backed, it obtains the zend_string via php_stream_memory_get_buffer(...), attempts simdjson_simple_decode first and returns early on success, and otherwise parses with a reused or newly created parser, throwing JsonException on parse error. Falls back to the previous rewind-and-copy-to-memory logic in all other cases. |

Sequence Diagram(s)

sequenceDiagram
    participant Request as SG.request_info
    participant Body as request_body (php_stream)
    participant Inner as innerstream (memory)
    participant Parser as simdjson parser
    participant Fallback as rewind+copy path

    Request->>Body: request_body present?
    alt abstract contains innerstream && memory-backed
        Body->>Inner: php_stream_memory_get_buffer()
        Inner->>Parser: simdjson_simple_decode(buffer) (reuse or new)
        Parser-->>Request: decoded value OR throw JsonException
    else
        Body->>Fallback: rewind and copy to memory
        Fallback->>Parser: parse copied memory
        Parser-->>Request: decoded value OR throw JsonException
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Poem

🐰 I nibble bytes from buffers near,
I skip the copy and give a cheer.
A parser reused, a hop, a spin,
From memory sweet the parse comes in.
Hooray — no extra drag, just grin! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: optimizing simdjson_decode_from_input to avoid buffer allocation for memory streams. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
php_simdjson.cpp (1)

372-383: Missing simdjson_simple_decode optimization in memory-backed path.

The fallback non-memory path eventually calls simdjson_simple_decode() (lines 437-440) to fast-path common JSON values like {}, [], true, false without invoking the parser. The new memory-backed path skips this optimization, which could regress performance for simple JSON payloads.

Suggested fix
         if (php_stream_is(ts->innerstream, PHP_STREAM_IS_MEMORY)) {
             // whole body is in memory, so we can just read stream buffer without allocating new buffer
             zend_string *membuf = php_stream_memory_get_buffer(ts->innerstream);
+            if (simdjson_simple_decode(ZSTR_VAL(membuf), ZSTR_LEN(membuf), return_value, associative)) {
+                return;
+            }
             if (SIMDJSON_SHOULD_REUSE_PARSER(ZSTR_LEN(membuf))) {
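
To make the idea behind this nitpick concrete, here is a minimal sketch of a "simple value" fast path placed in front of a full parser. Everything in it is an assumption for illustration: `try_simple_decode`, `full_parse`, `Value` and `Decoded` are hypothetical names, and the real simdjson_simple_decode may recognize a different set of inputs and takes different arguments.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

enum class Value { EmptyObject, EmptyArray, True, False, Parsed };

struct Decoded {
    Value value;
    bool used_parser;  // true when the full parser had to run
};

// Hypothetical helper (not the extension's simdjson_simple_decode):
// recognizes a few trivial documents by exact comparison.
std::optional<Value> try_simple_decode(std::string_view json) {
    if (json == "{}") return Value::EmptyObject;
    if (json == "[]") return Value::EmptyArray;
    if (json == "true") return Value::True;
    if (json == "false") return Value::False;
    return std::nullopt;  // not a trivial document
}

// Stand-in for the full parser.
Decoded full_parse(std::string_view) { return {Value::Parsed, true}; }

Decoded decode(std::string_view json) {
    // Try the cheap check first; only fall through to the parser on a miss.
    if (std::optional<Value> simple = try_simple_decode(json)) {
        return {*simple, false};
    }
    return full_parse(json);
}

// Usage example: "[]" never reaches the parser, "[1]" does.
void demo_simple_values() {
    assert(!decode("[]").used_parser);
    assert(decode("[1]").used_parser);
}
```

The point of calling the cheap check in both the memory-backed branch and the fallback branch is that trivial payloads then cost the same on either path.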
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 372 - 383, The memory-backed branch skips the
fast-path simdjson_simple_decode optimization; modify the block guarded by
SIMDJSON_SHOULD_REUSE_PARSER (and the else branch that creates a new parser) to
first call simdjson_simple_decode(membuf, return_value, associative, depth) and
if it indicates a handled/simple value return immediately, otherwise fall back
to php_simdjson_parse with simdjson_get_reused_parser() (or the newly created
simdjson_php_parser) and keep the existing error handling
(php_simdjson_throw_jsonexception + RETURN_THROWS) and parser free
(php_simdjson_free_parser) logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@php_simdjson.cpp`:
- Around line 362-369: The code unsafely casts body->abstract to
simdjson_php_stream_temp_data* without verifying the stream type; before
accessing ts or ts->innerstream, call php_stream_is(body, PHP_STREAM_IS_MEMORY)
(or the appropriate temp-stream predicate) to ensure body is a temp stream when
handling the "already processed" SG(request_info).request_body path, and only
then cast to simdjson_php_stream_temp_data*; otherwise fall back to the existing
creation logic (php_stream_temp_create_ex) or handle non-temp streams safely to
avoid undefined behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8c4c7bd8-3ac0-4c04-9ee9-706c6221e6ae

📥 Commits

Reviewing files that changed from the base of the PR and between 08f6dce and 1481b10.

📒 Files selected for processing (1)
  • php_simdjson.cpp


@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
php_simdjson.cpp (1)

367-370: ⚠️ Potential issue | 🔴 Critical

Guard the temp-stream cast before touching body->abstract.

This fast path dereferences body->abstract before proving that SG(request_info).request_body is actually a temp stream. In release builds the ZEND_ASSERTs disappear, so a non-temp request body can turn this into UB instead of falling back to the rewind/copy path.

🛠️ Proposed fix
-        simdjson_php_stream_temp_data *ts = (simdjson_php_stream_temp_data*)body->abstract;
-        ZEND_ASSERT(ts != NULL);
-        ZEND_ASSERT(ts->innerstream != NULL);
-        if (php_stream_is(ts->innerstream, PHP_STREAM_IS_MEMORY)) {
+        if (php_stream_is(body, PHP_STREAM_IS_TEMP) && body->abstract != NULL) {
+            simdjson_php_stream_temp_data *ts = (simdjson_php_stream_temp_data*)body->abstract;
+            if (ts->innerstream != NULL && php_stream_is(ts->innerstream, PHP_STREAM_IS_MEMORY)) {
-        }
+            }
+        }
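
The pattern behind the proposed fix (prove the stream's type at runtime before reinterpreting its opaque pointer) can be shown in isolation. The sketch below is a standalone model, not PHP's stream layer: `Stream`, `StreamOps`, `TempData`, `stream_is` and `memory_inner_or_null` are hypothetical, and it assumes the type check works by comparing an ops-table pointer, which is how php_stream_is is understood to behave.

```cpp
#include <cassert>

// Hypothetical model: each stream kind is identified by its ops table.
struct StreamOps { const char* label; };
static const StreamOps kTempOps{"TEMP"};
static const StreamOps kMemoryOps{"MEMORY"};
static const StreamOps kStdioOps{"STDIO"};

struct Stream {
    const StreamOps* ops;
    void* abstract;  // meaning depends entirely on ops
};

// Layout that only TEMP streams store behind `abstract`.
struct TempData {
    Stream* innerstream;
};

bool stream_is(const Stream* s, const StreamOps* ops) {
    return s != nullptr && s->ops == ops;
}

// Returns the inner memory stream when `body` is a temp stream wrapping
// one, or nullptr so the caller can take the rewind-and-copy fallback.
Stream* memory_inner_or_null(Stream* body) {
    // Runtime guard: without it, a non-temp stream's `abstract` would be
    // read as TempData, which is undefined behavior.
    if (!stream_is(body, &kTempOps) || body->abstract == nullptr) {
        return nullptr;
    }
    auto* ts = static_cast<TempData*>(body->abstract);
    assert(ts != nullptr);  // debug-only sanity check; gone under NDEBUG
    if (ts->innerstream == nullptr ||
        !stream_is(ts->innerstream, &kMemoryOps)) {
        return nullptr;
    }
    return ts->innerstream;
}
```

The `assert` stays as a debug aid, but correctness no longer depends on it, which matters because assertions are compiled out of release builds.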

Run this to confirm the current block dereferences body->abstract without a temp-stream guard first. Expected result: the snippet shows body->abstract access, and the search finds no php_stream_is(body, PHP_STREAM_IS_TEMP) in this branch.

#!/bin/bash
set -euo pipefail

sed -n '352,389p' php_simdjson.cpp
echo
rg -n -C2 'body->abstract|php_stream_is\(body,\s*PHP_STREAM_IS_TEMP\)' php_simdjson.cpp
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 367 - 370, The code currently casts
body->abstract to simdjson_php_stream_temp_data* (ts) and asserts before proving
the request body is a temp stream, which is UB in release builds; change the
logic to first check whether SG(request_info).request_body (the variable body)
is a temp stream using php_stream_is(body, PHP_STREAM_IS_TEMP) and only then
read body->abstract and assign to simdjson_php_stream_temp_data *ts, followed by
the existing php_stream_is(ts->innerstream, PHP_STREAM_IS_MEMORY) check; ensure
any early-fallback (rewind/copy) path runs when the body is not a temp stream so
that dereferencing body->abstract never happens unless php_stream_is confirms a
temp stream.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ffa6667e-70a3-476a-a771-2ea0a6fa4706

📥 Commits

Reviewing files that changed from the base of the PR and between 1481b10 and 329fbe7.

📒 Files selected for processing (1)
  • php_simdjson.cpp


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@php_simdjson.cpp`:
- Around line 367-371: The code currently casts body->abstract to
simdjson_php_stream_temp_data* guarded only by ZEND_ASSERT, which is a no-op in
release builds; change this to perform a runtime check using php_stream_is(body,
PHP_STREAM_IS_TEMP) before doing the cast and accessing ts and ts->innerstream
(keep the existing ZEND_ASSERT calls as debug-time sanity checks). In practice:
wrap the cast and the subsequent if (php_stream_is(ts->innerstream,
PHP_STREAM_IS_MEMORY)) block inside an if (php_stream_is(body,
PHP_STREAM_IS_TEMP)) { ... } so the optimization path only runs when body is
actually a temp stream, and keep the existing NULL checks for ts and
ts->innerstream inside that branch.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d51b603d-d3d8-4cf1-a1df-3f7462040962

📥 Commits

Reviewing files that changed from the base of the PR and between 329fbe7 and 7e70d57.

📒 Files selected for processing (1)
  • php_simdjson.cpp

@JakubOnderka JakubOnderka merged commit aa1c0ef into master Apr 12, 2026
57 checks passed