simdjson_decode_from_input: In case of memory stream, do not allocate…#67
simdjson_decode_from_input: In case of memory stream, do not allocate…#67JakubOnderka merged 1 commit intomasterfrom
Conversation
📝 WalkthroughWalkthroughThe PR adds a PHP >= 8.2 fast-path to Changes
Sequence Diagram(s)sequenceDiagram
participant Request as SG.request_info
participant Body as request_body (php_stream)
participant Inner as innerstream (memory)
participant Parser as simdjson parser
participant Fallback as rewind+copy path
Request->>Body: request_body present?
alt abstract contains innerstream && memory-backed
Body->>Inner: php_stream_memory_get_buffer()
Inner->>Parser: simdjson_simple_decode(buffer) (reuse or new)
Parser-->>Request: decoded value OR throw JsonException
else
Body->>Fallback: rewind and copy to memory
Fallback->>Parser: parse copied memory
Parser-->>Request: decoded value OR throw JsonException
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
1481b10 to
329fbe7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
php_simdjson.cpp (1)
372-383: Missingsimdjson_simple_decodeoptimization in memory-backed path.The fallback non-memory path eventually calls
simdjson_simple_decode()(lines 437-440) to fast-path common JSON values like{},[],true,falsewithout invoking the parser. The new memory-backed path skips this optimization, which could regress performance for simple JSON payloads.Suggested fix
if (php_stream_is(ts->innerstream, PHP_STREAM_IS_MEMORY)) { // whole body is in memory, so we can just read stream buffer without allocating new buffer zend_string *membuf = php_stream_memory_get_buffer(ts->innerstream); + if (simdjson_simple_decode(ZSTR_VAL(membuf), ZSTR_LEN(membuf), return_value, associative)) { + return; + } if (SIMDJSON_SHOULD_REUSE_PARSER(ZSTR_LEN(membuf))) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@php_simdjson.cpp` around lines 372 - 383, The memory-backed branch skips the fast-path simdjson_simple_decode optimization; modify the block guarded by SIMDJSON_SHOULD_REUSE_PARSER (and the else branch that creates a new parser) to first call simdjson_simple_decode(membuf, return_value, associative, depth) and if it indicates a handled/simple value return immediately, otherwise fall back to php_simdjson_parse with simdjson_get_reused_parser() (or the newly created simdjson_php_parser) and keep the existing error handling (php_simdjson_throw_jsonexception + RETURN_THROWS) and parser free (php_simdjson_free_parser) logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@php_simdjson.cpp`:
- Around line 362-369: The code unsafely casts body->abstract to
simdjson_php_stream_temp_data* without verifying the stream type; before
accessing ts or ts->innerstream, call php_stream_is(body, PHP_STREAM_IS_MEMORY)
(or the appropriate temp-stream predicate) to ensure body is a temp stream when
handling the "already processed" SG(request_info).request_body path, and only
then cast to simdjson_php_stream_temp_data*; otherwise fall back to the existing
creation logic (php_stream_temp_create_ex) or handle non-temp streams safely to
avoid undefined behavior.
---
Nitpick comments:
In `@php_simdjson.cpp`:
- Around line 372-383: The memory-backed branch skips the fast-path
simdjson_simple_decode optimization; modify the block guarded by
SIMDJSON_SHOULD_REUSE_PARSER (and the else branch that creates a new parser) to
first call simdjson_simple_decode(membuf, return_value, associative, depth) and
if it indicates a handled/simple value return immediately, otherwise fall back
to php_simdjson_parse with simdjson_get_reused_parser() (or the newly created
simdjson_php_parser) and keep the existing error handling
(php_simdjson_throw_jsonexception + RETURN_THROWS) and parser free
(php_simdjson_free_parser) logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
There was a problem hiding this comment.
♻️ Duplicate comments (1)
php_simdjson.cpp (1)
367-370:⚠️ Potential issue | 🔴 CriticalGuard the temp-stream cast before touching
body->abstract.This fast path dereferences
body->abstractbefore proving thatSG(request_info).request_bodyis actually a temp stream. In release builds theZEND_ASSERTs disappear, so a non-temp request body can turn this into UB instead of falling back to the rewind/copy path.🛠️ Proposed fix
- simdjson_php_stream_temp_data *ts = (simdjson_php_stream_temp_data*)body->abstract; - ZEND_ASSERT(ts != NULL); - ZEND_ASSERT(ts->innerstream != NULL); - if (php_stream_is(ts->innerstream, PHP_STREAM_IS_MEMORY)) { + if (php_stream_is(body, PHP_STREAM_IS_TEMP) && body->abstract != NULL) { + simdjson_php_stream_temp_data *ts = (simdjson_php_stream_temp_data*)body->abstract; + if (ts->innerstream != NULL && php_stream_is(ts->innerstream, PHP_STREAM_IS_MEMORY)) {- } + } + }Run this to confirm the current block dereferences
body->abstractwithout a temp-stream guard first. Expected result: the snippet showsbody->abstractaccess, and the search finds nophp_stream_is(body, PHP_STREAM_IS_TEMP)in this branch.#!/bin/bash set -euo pipefail sed -n '352,389p' php_simdjson.cpp echo rg -n -C2 'body->abstract|php_stream_is\(body,\s*PHP_STREAM_IS_TEMP\)' php_simdjson.cpp🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@php_simdjson.cpp` around lines 367 - 370, The code currently casts body->abstract to simdjson_php_stream_temp_data* (ts) and asserts before proving the request body is a temp stream, which is UB in release builds; change the logic to first check whether SG(request_info).request_body (the variable body) is a temp stream using php_stream_is(body, PHP_STREAM_IS_TEMP) and only then read body->abstract and assign to simdjson_php_stream_temp_data *ts, followed by the existing php_stream_is(ts->innerstream, PHP_STREAM_IS_MEMORY) check; ensure any early-fallback (rewind/copy) path runs when the body is not a temp stream so that dereferencing body->abstract never happens unless php_stream_is confirms a temp stream.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@php_simdjson.cpp`:
- Around line 367-370: The code currently casts body->abstract to
simdjson_php_stream_temp_data* (ts) and asserts before proving the request body
is a temp stream, which is UB in release builds; change the logic to first check
whether SG(request_info).request_body (the variable body) is a temp stream using
php_stream_is(body, PHP_STREAM_IS_TEMP) and only then read body->abstract and
assign to simdjson_php_stream_temp_data *ts, followed by the existing
php_stream_is(ts->innerstream, PHP_STREAM_IS_MEMORY) check; ensure any
early-fallback (rewind/copy) path runs when the body is not a temp stream so
that dereferencing body->abstract never happens unless php_stream_is confirms a
temp stream.
329fbe7 to
7e70d57
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@php_simdjson.cpp`:
- Around line 367-371: The code currently casts body->abstract to
simdjson_php_stream_temp_data* guarded only by ZEND_ASSERT, which is a no-op in
release builds; change this to perform a runtime check using php_stream_is(body,
PHP_STREAM_IS_TEMP) before doing the cast and accessing ts and ts->innerstream
(keep the existing ZEND_ASSERT calls as debug-time sanity checks). In practice:
wrap the cast and the subsequent if (php_stream_is(ts->innerstream,
PHP_STREAM_IS_MEMORY)) block inside an if (php_stream_is(body,
PHP_STREAM_IS_TEMP)) { ... } so the optimization path only runs when body is
actually a temp stream, and keep the existing NULL checks for ts and
ts->innerstream inside that branch.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
… new buffer
Summary by CodeRabbit