New method simdjson_decode_from_input#66
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Changes
Sequence DiagramsequenceDiagram
participant Client as HTTP Client
participant SAPI as PHP SAPI
participant Func as simdjson_decode_from_input
participant Buffer as Request Buffer / php_stream
participant Parser as SimdJSON Parser
participant Result as Return Value
Client->>SAPI: Send POST/PUT with JSON body
SAPI->>Func: Call simdjson_decode_from_input()
Func->>Func: Validate depth
Func->>SAPI: Inspect SG(request_info).request_body and content_length
alt request_body already buffered
Func->>Buffer: Rewind & copy into SIMDJSON-padded memory
else
Func->>Func: Validate content_length (empty, negative, >0xFFFFFFFF, > post_max_size)
Func->>Buffer: Allocate content_length + SIMDJSON_PADDING
Func->>Buffer: Read php://input into padded buffer
Func->>Func: Verify read size and optionally write to SG(request_info).request_body
end
Func->>Parser: Try fast-path simple decode
alt fast-path succeeds
Parser->>Result: Return decoded value
else
Parser->>Parser: Parse buffer (reuse singleton parser if eligible)
Parser->>Result: Return value or throw JSON exception
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
07ece24 to
8766e72
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
php_simdjson.cpp (1)
368-371: Missing padding initialization for valgrind consistency.The buffer allocated at line 368 does not initialize the padding bytes to zero. Compare this to
simdjson_stream_copy_to_mem<true>(lines 165-168) which zeroes padding inZEND_DEBUGbuilds to avoid valgrind warnings about uninitialized memory reads.🛠️ Suggested fix for debug build consistency
// allocate buffer for whole request with padding json = (char*) emalloc(SG(request_info).content_length + simdjson::SIMDJSON_PADDING); // read data from sapi, read one byte more so we can check if provided data exceeds content-length len = sapi_read_post_block(json, SG(request_info).content_length + 1); + +#ifdef ZEND_DEBUG + // Set padding to zero to make valgrind happy + if (len <= (size_t)SG(request_info).content_length) { + memset(json + len, 0, simdjson::SIMDJSON_PADDING); + } +#endif🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@php_simdjson.cpp` around lines 368 - 371, The allocated buffer 'json' (from emalloc with size SG(request_info).content_length + simdjson::SIMDJSON_PADDING) does not zero the padding bytes, causing valgrind warnings; modify the same allocation site so that in ZEND_DEBUG builds the trailing simdjson::SIMDJSON_PADDING bytes are explicitly set to 0 (similar to simdjson_stream_copy_to_mem<true>), e.g., after calling sapi_read_post_block(store length in 'len') write zeros into json + SG(request_info).content_length up to SIMDJSON_PADDING when ZEND_DEBUG is enabled; reference the variables/functions json, len, emalloc, sapi_read_post_block, and simdjson::SIMDJSON_PADDING to locate and implement this change.simdjson.stub.php (2)
276-285: Add a description to the docblock.The
simdjson_decode_from_inputfunction is missing a summary description in its docblock. Other similar decoder functions likesimdjson_decode_from_stream(line 262) include a description explaining what the function does, e.g., "Takes a JSON encoded string and converts it into a PHP variable."Consider adding a description such as:
+/** + * Decodes JSON from the request body (php://input). + * * `@param` bool $associative When true, JSON objects will be returned as associative arrays.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@simdjson.stub.php` around lines 276 - 285, The docblock for simdjson_decode_from_input is missing a one-line summary; add a brief description above the `@param` tags (similar to simdjson_decode_from_stream) such as "Takes a JSON encoded string from input and converts it into a PHP variable." Ensure it sits before the parameter descriptions and keeps existing `@return` and `@throws` intact so the docblock format and tags for simdjson_decode_from_input remain consistent with other decoder functions.
227-231: Inconsistent constant naming pattern.The exported constant
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDSuses a different naming convention than other PHP-specific error constants. For example:
SIMDJSON_PHP_ERR_KEY_COUNT_NOT_COUNTABLE→ exported asSIMDJSON_ERR_KEY_COUNT_NOT_COUNTABLE(line 221)SIMDJSON_PHP_ERR_INVALID_PHP_PROPERTY→ exported asSIMDJSON_ERR_INVALID_PROPERTY(line 226)Consider renaming to
SIMDJSON_ERR_INPUT_SIZE_EXCEEDSfor consistency with the existing pattern, or document why this constant intentionally uses a different prefix.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@simdjson.stub.php` around lines 227 - 231, The exported constant SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS uses a different prefix than other PHP-specific error constants; rename the declaration SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS to SIMDJSON_ERR_INPUT_SIZE_EXCEEDS (and update its `@cvalue` if needed) to match the existing pattern, and search/replace any references to SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS in the codebase (or alternatively add a short comment documenting an intentional prefix difference if you must keep the original name).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@php_simdjson.cpp`:
- Around line 364-380: The code allocates based on
SG(request_info).content_length without handling the case where content_length
is -1 (unknown), which can cause emalloc overflow; update the block around the
emalloc/sapi_read_post_block calls to check if SG(request_info).content_length <
0 and, if so, avoid allocation/reading and throw a simdjson_decoder_exception_ce
(use the same error path pattern as the existing length check -
zend_throw_exception_ex with an appropriate message and RETURN_THROWS()),
referencing SG(request_info).content_length, emalloc, sapi_read_post_block and
simdjson_decoder_exception_ce so the negative/unknown Content-Length is rejected
safely.
In `@tests/input/post_empty_disable_reading.phpt`:
- Line 2: The TEST header title in tests/input/post_empty_disable_reading.phpt
is not unique ("simdjson_decode_from_input empty POST") and should be changed to
a distinct name; update the `--TEST--` section in this file to a unique
descriptive title (e.g., include "disable_reading" or the filename) so its
failures are unambiguous when compared to other tests that reuse the same title.
---
Nitpick comments:
In `@php_simdjson.cpp`:
- Around line 368-371: The allocated buffer 'json' (from emalloc with size
SG(request_info).content_length + simdjson::SIMDJSON_PADDING) does not zero the
padding bytes, causing valgrind warnings; modify the same allocation site so
that in ZEND_DEBUG builds the trailing simdjson::SIMDJSON_PADDING bytes are
explicitly set to 0 (similar to simdjson_stream_copy_to_mem<true>), e.g., after
calling sapi_read_post_block(store length in 'len') write zeros into json +
SG(request_info).content_length up to SIMDJSON_PADDING when ZEND_DEBUG is
enabled; reference the variables/functions json, len, emalloc,
sapi_read_post_block, and simdjson::SIMDJSON_PADDING to locate and implement
this change.
In `@simdjson.stub.php`:
- Around line 276-285: The docblock for simdjson_decode_from_input is missing a
one-line summary; add a brief description above the `@param` tags (similar to
simdjson_decode_from_stream) such as "Takes a JSON encoded string from input and
converts it into a PHP variable." Ensure it sits before the parameter
descriptions and keeps existing `@return` and `@throws` intact so the docblock
format and tags for simdjson_decode_from_input remain consistent with other
decoder functions.
- Around line 227-231: The exported constant SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS
uses a different prefix than other PHP-specific error constants; rename the
declaration SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS to
SIMDJSON_ERR_INPUT_SIZE_EXCEEDS (and update its `@cvalue` if needed) to match the
existing pattern, and search/replace any references to
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS in the codebase (or alternatively add a
short comment documenting an intentional prefix difference if you must keep the
original name).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c5e03444-3d62-4dde-889a-7ca09d33750a
📒 Files selected for processing (12)
php_simdjson.cppphp_simdjson.hsimdjson.stub.phpsimdjson_arginfo.htests/input/post.phpttests/input/post_bigger.phpttests/input/post_disable_reading.phpttests/input/post_disable_reading_read_whole.phpttests/input/post_empty.phpttests/input/post_empty_disable_reading.phpttests/input/post_max_size.phpttests/input/put.phpt
| @@ -0,0 +1,15 @@ | |||
| --TEST-- | |||
| simdjson_decode_from_input empty POST | |||
There was a problem hiding this comment.
Use a unique --TEST-- title for this file.
simdjson_decode_from_input empty POST is reused in another test file, which makes failure output ambiguous.
Suggested update
-simdjson_decode_from_input empty POST
+simdjson_decode_from_input empty POST with disabled post reading📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| simdjson_decode_from_input empty POST | |
| simdjson_decode_from_input empty POST with disabled post reading |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/input/post_empty_disable_reading.phpt` at line 2, The TEST header title
in tests/input/post_empty_disable_reading.phpt is not unique
("simdjson_decode_from_input empty POST") and should be changed to a distinct
name; update the `--TEST--` section in this file to a unique descriptive title
(e.g., include "disable_reading" or the filename) so its failures are
unambiguous when compared to other tests that reuse the same title.
964e913 to
3cb5f06
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@php_simdjson.cpp`:
- Around line 374-377: The allocation and read use
SG(request_info).content_length + simdjson::SIMDJSON_PADDING without overflow
checks and call sapi_read_post_block once (in php_simdjson.cpp where json is
emalloc(...) and len = sapi_read_post_block(...)), which can overflow for
attacker-controlled content_length and may return partial reads; fix by
validating and bounding content_length before any arithmetic (e.g., ensure
content_length <= SIZE_MAX - simdjson::SIMDJSON_PADDING and also enforce a sane
max like post_max_size or a hard upper bound when SG(post_max_size) <= 0),
allocate using the safely checked size, then replace the single
sapi_read_post_block call with a loop that repeatedly calls sapi_read_post_block
until the requested content_length bytes are read or an error/EOF occurs,
updating an offset and accumulating len, and handle/read error returns and
truncated bodies robustly (free on error and return appropriate failure).
In `@simdjson.stub.php`:
- Around line 281-283: The PHPDoc for simdjson_decode_from_input incorrectly
mentions a `$json` parameter; update the docblock for simdjson_decode_from_input
to replace the stale “$json” wording with the correct source (e.g., “input
stream” or “request input”) and ensure the `@throws` description still documents
SimdJsonDecoderException (for invalid JSON, oversized input >4GB, or
out-of-range numbers) and ValueError for invalid $depth; locate the docblock
near the simdjson_decode_from_input declaration and adjust the text to reference
reading from input instead of a $json parameter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 994416de-4364-48f2-967a-b35a001bf48a
📒 Files selected for processing (13)
.github/workflows/integration.ymlphp_simdjson.cppphp_simdjson.hsimdjson.stub.phpsimdjson_arginfo.htests/input/post.phpttests/input/post_bigger.phpttests/input/post_disable_reading.phpttests/input/post_disable_reading_read_whole.phpttests/input/post_empty.phpttests/input/post_empty_disable_reading.phpttests/input/post_max_size.phpttests/input/put.phpt
✅ Files skipped from review due to trivial changes (10)
- tests/input/post_disable_reading.phpt
- tests/input/post.phpt
- php_simdjson.h
- tests/input/post_disable_reading_read_whole.phpt
- .github/workflows/integration.yml
- tests/input/put.phpt
- tests/input/post_empty_disable_reading.phpt
- tests/input/post_max_size.phpt
- tests/input/post_bigger.phpt
- tests/input/post_empty.phpt
| json = (char*) emalloc(SG(request_info).content_length + simdjson::SIMDJSON_PADDING); | ||
|
|
||
| // read data from sapi, read one byte more so we can check if provided data exceeds content-length | ||
| len = sapi_read_post_block(json, SG(request_info).content_length + 1); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Extension code around Line 374-377:"
sed -n '350,390p' php_simdjson.cpp
echo
echo "PHP core behavior for sapi_read_post_block (partial-read semantics):"
curl -sL https://raw.githubusercontent.com/php/php-src/master/main/SAPI.c \
| rg -n "SAPI_API size_t sapi_read_post_block|read_bytes < buflen|sapi_read_standard_form_data" -A8 -B3Repository: JakubOnderka/simdjson_php
Length of output: 3110
🏁 Script executed:
#!/bin/bash
echo "=== Finding type definitions for content_length ==="
rg -n "content_length" php_simdjson.cpp | head -20
echo
echo "=== Looking for SIMDJSON_PADDING definition ==="
rg -n "SIMDJSON_PADDING" php_simdjson.cpp
echo
echo "=== Checking complete function context (lines 350-400) ==="
sed -n '350,400p' php_simdjson.cpp
echo
echo "=== Checking zend_long type definition ==="
rg -n "typedef.*zend_long|#define.*zend_long" | head -5Repository: JakubOnderka/simdjson_php
Length of output: 3639
🏁 Script executed:
#!/bin/bash
echo "=== Checking post_max_size usage and limits ==="
grep -n "post_max_size" php_simdjson.cpp
echo
echo "=== Looking for any loop/retry logic for sapi_read_post_block ==="
rg -B5 -A10 "sapi_read_post_block" php_simdjson.cpp
echo
echo "=== Checking if request_body stream reading exists ==="
rg -n "php_stream.*body|request_body" php_simdjson.cpp | head -20Repository: JakubOnderka/simdjson_php
Length of output: 1592
🏁 Script executed:
#!/bin/bash
echo "=== Checking includes and simdjson headers ==="
head -30 php_simdjson.cpp
echo
echo "=== Looking for any simdjson constant definitions ==="
rg -n "SIMDJSON_PADDING|SIMDJSON_.*64|SIMDJSON_.*32" --type cpp | head -20
echo
echo "=== Searching for where SIMDJSON_PADDING might be imported from ==="
rg -n "#include.*simdjson" php_simdjson.cppRepository: JakubOnderka/simdjson_php
Length of output: 3402
🌐 Web query:
simdjson SIMDJSON_PADDING constant value
💡 Result:
Web search failed: Server error: no LLM provider could handle the message
🏁 Script executed:
#!/bin/bash
echo "=== Searching for SIMDJSON_PADDING definition in src/ ==="
rg -n "SIMDJSON_PADDING\s*=" src/ | head -10
echo
echo "=== Checking src/simdjson.h header ==="
rg -B3 -A3 "SIMDJSON_PADDING" src/simdjson.h | head -30
echo
echo "=== Searching in all .h files for SIMDJSON_PADDING ==="
find . -name "*.h" -exec grep -l "SIMDJSON_PADDING" {} \; 2>/dev/null
echo
echo "=== Checking how line 161 handles overflow before erealloc ==="
sed -n '155,175p' php_simdjson.cppRepository: JakubOnderka/simdjson_php
Length of output: 2072
Harden size arithmetic and fully drain request body reads.
Line 374 and line 377 add to content_length without overflow guards. Although a post_max_size check exists at line 362, it only applies when SG(post_max_size) > 0; when disabled, an attacker-controlled content_length near SIZE_MAX can cause integer overflow in the emalloc and sapi_read_post_block calls.
Additionally, sapi_read_post_block (PHP core SAPI.c lines 232–252) can return fewer bytes than requested. A single call at line 377 means the JSON body may be truncated if the underlying SAPI implementation returns partial data, leading to data loss.
🛠️ Suggested fix sketch
+ const zend_long content_length = SG(request_info).content_length;
+ if (UNEXPECTED((size_t)content_length > SIZE_MAX - simdjson::SIMDJSON_PADDING - 1)) {
+ zend_throw_exception_ex(simdjson_decoder_exception_ce, SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS,
+ "POST Content-Length is too large");
+ RETURN_THROWS();
+ }
+
- json = (char*) emalloc(SG(request_info).content_length + simdjson::SIMDJSON_PADDING);
+ json = (char*) emalloc((size_t)content_length + simdjson::SIMDJSON_PADDING);
- len = sapi_read_post_block(json, SG(request_info).content_length + 1);
+ len = 0;
+ size_t target = (size_t)content_length + 1; // +1 sentinel byte for overflow detection
+ while (len < target) {
+ size_t n = sapi_read_post_block(json + len, target - len);
+ if (n == 0) {
+ break;
+ }
+ len += n;
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@php_simdjson.cpp` around lines 374 - 377, The allocation and read use
SG(request_info).content_length + simdjson::SIMDJSON_PADDING without overflow
checks and call sapi_read_post_block once (in php_simdjson.cpp where json is
emalloc(...) and len = sapi_read_post_block(...)), which can overflow for
attacker-controlled content_length and may return partial reads; fix by
validating and bounding content_length before any arithmetic (e.g., ensure
content_length <= SIZE_MAX - simdjson::SIMDJSON_PADDING and also enforce a sane
max like post_max_size or a hard upper bound when SG(post_max_size) <= 0),
allocate using the safely checked size, then replace the single
sapi_read_post_block call with a loop that repeatedly calls sapi_read_post_block
until the requested content_length bytes are read or an error/EOF occurs,
updating an offset and accumulating len, and handle/read error returns and
truncated bodies robustly (free on error and return appropriate failure).
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
simdjson.stub.php (1)
281-282:⚠️ Potential issue | 🟡 MinorFix the stale
$jsonwording in the input decoder docs.
simdjson_decode_from_input()has no$jsonparameter, so this@throwstext is misleading.✏️ Suggested doc fix
- * `@throws` SimdJsonDecoderException for invalid JSON - * (or $json over 4GB long, or out of range integer/float) + * `@throws` SimdJsonDecoderException for invalid JSON input body + * (or input body over 4GB long, or out of range integer/float)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@simdjson.stub.php` around lines 281 - 282, The docblock for simdjson_decode_from_input incorrectly mentions a non-existent $json parameter in the `@throws` text; update the `@throws` description for SimdJsonDecoderException in simdjson_decode_from_input to refer to the input source (e.g., "for invalid JSON (or input over 4GB long, or out of range integer/float)") or otherwise remove the `$json` variable name so the exception message accurately describes errors from the input decoder.php_simdjson.cpp (1)
374-383:⚠️ Potential issue | 🟠 MajorDo not treat a single
sapi_read_post_block()call as a full-body read.Line 375 assumes one
sapi_read_post_block()call drains the request body. That API can return a short read, so this path can parse truncated JSON or accept a valid prefix whenlen < content_length. Keep reading until you hitcontent_length + 1or EOF, and reject short reads explicitly.Verify against PHP core's implementation of
sapi_read_post_block(). Expected result: supported php-src branches show that the function returns the bytes from a single read and marks end-of-input whenread_bytes < buflen, so callers that need the full body must loop.#!/bin/bash set -euo pipefail echo "Current call site in php_simdjson.cpp:" sed -n '371,383p' php_simdjson.cpp echo for branch in PHP-8.4 PHP-8.3 PHP-8.2 PHP-8.1 PHP-8.0; do echo "=== php-src ${branch}: sapi_read_post_block ===" curl -fsSL "https://raw.githubusercontent.com/php/php-src/${branch}/main/SAPI.c" | awk ' /SAPI_API size_t sapi_read_post_block/ {capture=1} capture {print} capture && /^}/ {exit} ' echo done🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@php_simdjson.cpp` around lines 374 - 383, The code currently assumes a single sapi_read_post_block() call reads the full body; change the logic in php_simdjson.cpp to loop calling sapi_read_post_block(json + total_len, remaining_buf_len) until either you have read SG(request_info).content_length + 1 bytes or the read returns fewer bytes than requested (EOF/short read). Accumulate into the buffer (track total_len), reallocate if necessary, and after the loop explicitly reject short reads by throwing the same simdjson_decoder_exception_ce when total_len < SG(request_info).content_length; preserve the existing "exceeds" check when total_len > SG(request_info).content_length. Ensure to efree the json buffer on all error/throw paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@php_simdjson.cpp`:
- Around line 341-349: The cached-body branch currently bypasses the
post_max_size guard; before calling php_stream_rewind and
simdjson_stream_copy_to_mem<true> check SG(read_post_bytes) (or the cached body
length available on SG(request_info).request_body) against the configured
post_max_size and if it exceeds, call
php_simdjson_throw_jsonexception(SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) and
RETURN_THROWS(); otherwise proceed to rewind and copy as before so the same
size-check behavior as the non-cached path (lines using SG(read_post_bytes) and
simdjson_stream_copy_to_mem<true>) is preserved.
In `@simdjson.stub.php`:
- Around line 227-231: The stub exposes a constant with the internal prefix
(SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) which leaks internals and is inconsistent
with other public decoder error constants; rename the exported stub constant to
SIMDJSON_ERR_INPUT_SIZE_EXCEEDS while keeping the cvalue mapping to the internal
symbol (i.e., keep `@cvalue` SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) so the runtime
value is unchanged, update the docblock/constant name in simdjson.stub.php
accordingly, and ensure no other code references the old stub name.
---
Duplicate comments:
In `@php_simdjson.cpp`:
- Around line 374-383: The code currently assumes a single
sapi_read_post_block() call reads the full body; change the logic in
php_simdjson.cpp to loop calling sapi_read_post_block(json + total_len,
remaining_buf_len) until either you have read SG(request_info).content_length +
1 bytes or the read returns fewer bytes than requested (EOF/short read).
Accumulate into the buffer (track total_len), reallocate if necessary, and after
the loop explicitly reject short reads by throwing the same
simdjson_decoder_exception_ce when total_len < SG(request_info).content_length;
preserve the existing "exceeds" check when total_len >
SG(request_info).content_length. Ensure to efree the json buffer on all
error/throw paths.
In `@simdjson.stub.php`:
- Around line 281-282: The docblock for simdjson_decode_from_input incorrectly
mentions a non-existent $json parameter in the `@throws` text; update the `@throws`
description for SimdJsonDecoderException in simdjson_decode_from_input to refer
to the input source (e.g., "for invalid JSON (or input over 4GB long, or out of
range integer/float)") or otherwise remove the `$json` variable name so the
exception message accurately describes errors from the input decoder.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 80b55389-d71f-4bce-9f71-5b11c5f8725a
📒 Files selected for processing (13)
.github/workflows/integration.ymlphp_simdjson.cppphp_simdjson.hsimdjson.stub.phpsimdjson_arginfo.htests/input/post.phpttests/input/post_bigger.phpttests/input/post_disable_reading.phpttests/input/post_disable_reading_read_whole.phpttests/input/post_empty.phpttests/input/post_empty_disable_reading.phpttests/input/post_max_size.phpttests/input/put.phpt
✅ Files skipped from review due to trivial changes (10)
- tests/input/post.phpt
- tests/input/post_empty.phpt
- tests/input/post_disable_reading.phpt
- php_simdjson.h
- tests/input/put.phpt
- tests/input/post_empty_disable_reading.phpt
- .github/workflows/integration.yml
- tests/input/post_disable_reading_read_whole.phpt
- tests/input/post_bigger.phpt
- tests/input/post_max_size.phpt
304c5ab to
7833cf7
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (4)
php_simdjson.cpp (2)
341-349:⚠️ Potential issue | 🟠 MajorReapply the size limit when
request_bodyis already populated.The cached-body branch skips the
SG(post_max_size)check from the fresh-read path, so a pre-buffered request above the configured limit can still be copied and parsed. This should reject oversize input beforesimdjson_stream_copy_to_mem<true>()the same way the uncached branch does.🛡️ Suggested guard
if ((body = SG(request_info).request_body)) { // request body was already processed + if (SG(post_max_size) > 0 && SG(read_post_bytes) > (size_t) SG(post_max_size)) { + php_simdjson_throw_jsonexception(SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS); + RETURN_THROWS(); + } + // rewind to start of the request php_stream_rewind(body);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@php_simdjson.cpp` around lines 341 - 349, The cached-body branch using SG(request_info).request_body skips the post size guard; before calling simdjson_stream_copy_to_mem<true>(...) re-check SG(post_max_size) against SG(read_post_bytes) and reject oversized input the same way the fresh-read path does (invoke php_simdjson_throw_jsonexception or RETURN_THROWS when the size exceeds SG(post_max_size)). Ensure you perform this check in the same block that handles the cached body (around the code referencing SG(request_info).request_body and simdjson_stream_copy_to_mem<true>) so pre-buffered requests cannot bypass the limit.
375-390:⚠️ Potential issue | 🔴 CriticalDrain
sapi_read_post_block()until the declared body is fully read.A single
sapi_read_post_block()call is not guaranteed to return the full request body. If it short-reads, this function writes only the prefix intorequest_bodyand parses a truncated document, which can turn a valid body into a spurious decode error or silently drop trailing bytes.🛠️ Fix sketch
- len = sapi_read_post_block(json, SG(request_info).content_length + 1); + const size_t expected = (size_t) SG(request_info).content_length; + while (len < expected + 1) { + const size_t read = sapi_read_post_block(json + len, (expected + 1) - len); + if (read == 0) { + break; + } + len += read; + } + + if (UNEXPECTED(len < expected)) { + efree(json); + zend_throw_exception_ex(simdjson_decoder_exception_ce, simdjson::IO_ERROR, + "Failed to read the full request body" + ); + RETURN_THROWS(); + }Use this read-only check against official php-src to confirm the API contract. The first grep should show
sapi_read_post_block()returning a single block per call; the second should show core callers wrapping it in a loop when they need the full body.#!/bin/bash set -euo pipefail tmp="$(mktemp -d)" trap 'rm -rf "$tmp"' EXIT curl -fsSL https://raw.githubusercontent.com/php/php-src/master/main/SAPI.c -o "$tmp/SAPI.c" curl -fsSL https://raw.githubusercontent.com/php/php-src/master/main/php_variables.c -o "$tmp/php_variables.c" echo "=== sapi_read_post_block implementation ===" rg -n -C4 'SAPI_API size_t sapi_read_post_block|read_bytes < buflen|return read_bytes' "$tmp/SAPI.c" echo echo "=== core callers that use sapi_read_post_block ===" rg -n -C4 'sapi_read_post_block\(' "$tmp/php_variables.c"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@php_simdjson.cpp` around lines 375 - 390, The current single-call to sapi_read_post_block may short-read; change the read logic in php_simdjson.cpp so you repeatedly call sapi_read_post_block into json+bytes_read until you have attempted to read SG(request_info).content_length + 1 bytes or the call returns 0 (EOF). Track a total bytes_read variable (instead of overwriting len), pass the remaining buffer size to each sapi_read_post_block call, and after the loop check if bytes_read > SG(request_info).content_length to efree(json) and zend_throw_exception_ex(simdjson_decoder_exception_ce, SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS, ...) as before; then use php_stream_write(body, json, bytes_read) (and compare bytes_written to bytes_read) so request_body receives the entire payload. Ensure you still allocate json with SG(request_info).content_length + simdjson::SIMDJSON_PADDING and handle zero/EOF and error returns from sapi_read_post_block properly.simdjson.stub.php (2)
276-285:⚠️ Potential issue | 🟡 MinorDocument this overload as reading from request input.
The new docblock still reads like the string-based decoder. Generated docs/IDE hints should say this reads the current request body, and the
@throwstext should reference the input body rather than a generic JSON string.✏️ Suggested doc update
/** + * Reads JSON from the current request body and converts it into a PHP variable. + * Similar to json_decode() + * * `@param` bool $associative When true, JSON objects will be returned as associative arrays. * When false, JSON objects will be returned as objects. * `@param` int $depth the maximum nesting depth of the structure being decoded. * `@return` array|stdClass|string|float|int|bool|null - * `@throws` SimdJsonDecoderException for invalid JSON - * (or JSON over 4GB long, or out of range integer/float) + * `@throws` SimdJsonDecoderException for invalid JSON in the request body + * (or request body over 4GB long, or out of range integer/float) * `@throws` ValueError for invalid $depth */🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@simdjson.stub.php` around lines 276 - 285, Update the docblock for simdjson_decode_from_input to state that it reads and decodes the current request/input body (not a JSON string parameter), clarify the associative and depth parameter meanings remain the same, and adjust the `@throws` descriptions to reference failures decoding the request input/body (e.g., invalid JSON in request body, body over 4GB, or out of range integer/float) and ValueError for invalid $depth so IDE/docs accurately reflect this overload's behavior.
227-231:⚠️ Potential issue | 🟠 MajorKeep the new input-size error under the public
SIMDJSON_ERR_*API.Every other decoder error constant exported here uses
SIMDJSON_ERR_*. ExposingSIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDSleaks the internal prefix into userland and makes this new public symbol inconsistent from day one.🔧 Suggested rename
/** * `@var` int * `@cvalue` SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS */ -const SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS = UNKNOWN; +const SIMDJSON_ERR_INPUT_SIZE_EXCEEDS = UNKNOWN;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@simdjson.stub.php` around lines 227 - 231, The new public error constant uses the internal-prefixed name SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS; rename it to SIMDJSON_ERR_INPUT_SIZE_EXCEEDS so it matches the existing public API (replace the constant name and the `@cvalue` macro reference from SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS to SIMDJSON_ERR_INPUT_SIZE_EXCEEDS in the declaration) and update any other occurrences in this file that reference the old symbol to use SIMDJSON_ERR_INPUT_SIZE_EXCEEDS.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@php_simdjson.cpp`:
- Around line 341-349: The cached-body branch using
SG(request_info).request_body skips the post size guard; before calling
simdjson_stream_copy_to_mem<true>(...) re-check SG(post_max_size) against
SG(read_post_bytes) and reject oversized input the same way the fresh-read path
does (invoke php_simdjson_throw_jsonexception or RETURN_THROWS when the size
exceeds SG(post_max_size)). Ensure you perform this check in the same block that
handles the cached body (around the code referencing
SG(request_info).request_body and simdjson_stream_copy_to_mem<true>) so
pre-buffered requests cannot bypass the limit.
- Around line 375-390: The current single-call to sapi_read_post_block may
short-read; change the read logic in php_simdjson.cpp so you repeatedly call
sapi_read_post_block into json+bytes_read until you have attempted to read
SG(request_info).content_length + 1 bytes or the call returns 0 (EOF). Track a
total bytes_read variable (instead of overwriting len), pass the remaining
buffer size to each sapi_read_post_block call, and after the loop check if
bytes_read > SG(request_info).content_length to efree(json) and
zend_throw_exception_ex(simdjson_decoder_exception_ce,
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS, ...) as before; then use
php_stream_write(body, json, bytes_read) (and compare bytes_written to
bytes_read) so request_body receives the entire payload. Ensure you still
allocate json with SG(request_info).content_length + simdjson::SIMDJSON_PADDING
and handle zero/EOF and error returns from sapi_read_post_block properly.
In `@simdjson.stub.php`:
- Around line 276-285: Update the docblock for simdjson_decode_from_input to
state that it reads and decodes the current request/input body (not a JSON
string parameter), clarify the associative and depth parameter meanings remain
the same, and adjust the `@throws` descriptions to reference failures decoding the
request input/body (e.g., invalid JSON in request body, body over 4GB, or out of
range integer/float) and ValueError for invalid $depth so IDE/docs accurately
reflect this overload's behavior.
- Around line 227-231: The new public error constant uses the internal-prefixed
name SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS; rename it to
SIMDJSON_ERR_INPUT_SIZE_EXCEEDS so it matches the existing public API (replace
the constant name and the `@cvalue` macro reference from
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS to SIMDJSON_ERR_INPUT_SIZE_EXCEEDS in the
declaration) and update any other occurrences in this file that reference the
old symbol to use SIMDJSON_ERR_INPUT_SIZE_EXCEEDS.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 30555d31-6618-401e-a805-e05d8f1491e2
📒 Files selected for processing (13)
.github/workflows/integration.ymlphp_simdjson.cppphp_simdjson.hsimdjson.stub.phpsimdjson_arginfo.htests/input/post.phpttests/input/post_bigger.phpttests/input/post_disable_reading.phpttests/input/post_disable_reading_read_whole.phpttests/input/post_empty.phpttests/input/post_empty_disable_reading.phpttests/input/post_max_size.phpttests/input/put.phpt
✅ Files skipped from review due to trivial changes (10)
- tests/input/post_empty.phpt
- tests/input/post_empty_disable_reading.phpt
- tests/input/post_disable_reading.phpt
- php_simdjson.h
- .github/workflows/integration.yml
- tests/input/put.phpt
- tests/input/post.phpt
- tests/input/post_max_size.phpt
- tests/input/post_disable_reading_read_whole.phpt
- tests/input/post_bigger.phpt
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
php_simdjson.cpp (2)
341-349:⚠️ Potential issue | 🟠 MajorReapply the size guard when reusing
SG(request_info).request_body.Once
SG(request_info).request_bodyis already populated, Lines 341-349 skip thepost_max_sizerejection used on the direct-read path. That makes limit enforcement depend on whether some earlier code path filled the cache first.🔧 Suggested fix
if ((body = SG(request_info).request_body)) { // request body was already processed + if (SG(post_max_size) > 0 && SG(read_post_bytes) > (size_t) SG(post_max_size)) { + zend_throw_exception_ex(simdjson_decoder_exception_ce, SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS, + "Request body exceeds the configured post_max_size" + ); + RETURN_THROWS(); + } + // rewind to start of the request php_stream_rewind(body);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@php_simdjson.cpp` around lines 341 - 349, When reusing SG(request_info).request_body the post_max_size guard is skipped; reapply the same size check used on the direct-read path before calling simdjson_stream_copy_to_mem<true>. Specifically, when SG(request_info).request_body is non-NULL, compare SG(read_post_bytes) (or the body length you will read) against the configured post_max_size and return/throw the same error path if it exceeds the limit, then proceed to php_stream_rewind and simdjson_stream_copy_to_mem<true> only after the size guard passes.
375-390:⚠️ Potential issue | 🟠 MajorDrain
sapi_read_post_block()before parsing.A single
sapi_read_post_block()call at Line 378 can return fewer bytes than requested, and PHP core's own form-data reader handles that by looping until it sees a short read. Stopping after one call here can parse and cache a truncated body instead of the full request. (raw.githubusercontent.com)Run this to inspect the upstream implementation that drives the short-read behavior:
#!/bin/bash set -euo pipefail tmp="$(mktemp)" trap 'rm -f "$tmp"' EXIT curl -sL https://raw.githubusercontent.com/php/php-src/master/main/SAPI.c > "$tmp" echo "=== sapi_read_post_block ===" rg -n -A12 -B2 'SAPI_API size_t sapi_read_post_block' "$tmp" echo echo "=== sapi_read_standard_form_data ===" rg -n -A25 -B2 'SAPI_API SAPI_POST_READER_FUNC\(sapi_read_standard_form_data\)' "$tmp"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@php_simdjson.cpp` around lines 375 - 390, The code currently calls sapi_read_post_block() once and may get a short read; update the logic around json/len to loop and drain the POST body until either the requested bytes are read or a short read occurs: allocate the buffer as before (json), then repeatedly call sapi_read_post_block(target_ptr, remaining + 1) (or just remaining) appending each chunk (advance target_ptr, subtract from remaining, add to total_read) until total_read == SG(request_info).content_length or a read returns 0 or less than requested; after each chunk check if total_read > SG(request_info).content_length to trigger the existing size-exceeds error (using simdjson_decoder_exception_ce and SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS), and finally use php_stream_write(body, json, total_read) and compare against total_read. Reference symbols: sapi_read_post_block, json, len (use total_read/remaining), SG(request_info).content_length, php_stream_write, simdjson_decoder_exception_ce.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@php_simdjson.cpp`:
- Around line 354-366: The error messages currently hard-code "POST" and should
be method-neutral; update the exception strings used where
SG(request_info).content_length is validated: replace "POST Content-Length is
required" in the zend_throw_exception call and the "POST Content-Length of ..."
format string in zend_throw_exception_ex with a method-neutral phrase such as
"request body Content-Length is required" and "request body Content-Length of
... exceeds the limit of ...", preserving the same format specifiers and error
codes (e.g., SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) and leaving the surrounding
logic around SG(request_info).content_length and SG(post_max_size unchanged.
---
Duplicate comments:
In `@php_simdjson.cpp`:
- Around line 341-349: When reusing SG(request_info).request_body the
post_max_size guard is skipped; reapply the same size check used on the
direct-read path before calling simdjson_stream_copy_to_mem<true>. Specifically,
when SG(request_info).request_body is non-NULL, compare SG(read_post_bytes) (or
the body length you will read) against the configured post_max_size and
return/throw the same error path if it exceeds the limit, then proceed to
php_stream_rewind and simdjson_stream_copy_to_mem<true> only after the size
guard passes.
- Around line 375-390: The code currently calls sapi_read_post_block() once and
may get a short read; update the logic around json/len to loop and drain the
POST body until either the requested bytes are read or a short read occurs:
allocate the buffer as before (json), then repeatedly call
sapi_read_post_block(target_ptr, remaining + 1) (or just remaining) appending
each chunk (advance target_ptr, subtract from remaining, add to total_read)
until total_read == SG(request_info).content_length or a read returns 0 or less
than requested; after each chunk check if total_read >
SG(request_info).content_length to trigger the existing size-exceeds error
(using simdjson_decoder_exception_ce and SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS),
and finally use php_stream_write(body, json, total_read) and compare against
total_read. Reference symbols: sapi_read_post_block, json, len (use
total_read/remaining), SG(request_info).content_length, php_stream_write,
simdjson_decoder_exception_ce.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a17c9d30-1138-41a5-81e3-220b16eb8763
📒 Files selected for processing (13)
.github/workflows/integration.ymlphp_simdjson.cppphp_simdjson.hsimdjson.stub.phpsimdjson_arginfo.htests/input/post.phpttests/input/post_bigger.phpttests/input/post_disable_reading.phpttests/input/post_disable_reading_read_whole.phpttests/input/post_empty.phpttests/input/post_empty_disable_reading.phpttests/input/post_max_size.phpttests/input/put.phpt
✅ Files skipped from review due to trivial changes (8)
- tests/input/post.phpt
- tests/input/post_bigger.phpt
- tests/input/post_empty_disable_reading.phpt
- php_simdjson.h
- tests/input/put.phpt
- tests/input/post_disable_reading_read_whole.phpt
- tests/input/post_max_size.phpt
- tests/input/post_empty.phpt
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/input/post_disable_reading.phpt
- .github/workflows/integration.yml
php_simdjson.cpp
Outdated
| } else if (SG(request_info).content_length < 0) { | ||
| // I am not sure if content-length can be negative, so for now just throw exception | ||
| zend_throw_exception(simdjson_decoder_exception_ce, "POST Content-Length is required", SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS); | ||
| RETURN_THROWS(); | ||
| } else if (SG(request_info).content_length > 0xFFFFFFFF) { | ||
| php_simdjson_throw_jsonexception(simdjson::CAPACITY); | ||
| RETURN_THROWS(); | ||
| } | ||
|
|
||
| if (SG(post_max_size) > 0 && SG(request_info).content_length > SG(post_max_size)) { | ||
| zend_throw_exception_ex(simdjson_decoder_exception_ce, SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS, | ||
| "POST Content-Length of " ZEND_LONG_FMT " bytes exceeds the limit of " ZEND_LONG_FMT " bytes", | ||
| SG(request_info).content_length, SG(post_max_size) |
There was a problem hiding this comment.
Use request-body wording in these exceptions.
Both messages hard-code POST, but this entry point is request-body based. Method-neutral wording keeps failures accurate for every caller.
✏️ Suggested wording
- zend_throw_exception(simdjson_decoder_exception_ce, "POST Content-Length is required", SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS);
+ zend_throw_exception(simdjson_decoder_exception_ce, "Content-Length is required for request body decoding", SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS);
...
- "POST Content-Length of " ZEND_LONG_FMT " bytes exceeds the limit of " ZEND_LONG_FMT " bytes",
+ "Request body Content-Length of " ZEND_LONG_FMT " bytes exceeds the limit of " ZEND_LONG_FMT " bytes",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@php_simdjson.cpp` around lines 354 - 366, The error messages currently
hard-code "POST" and should be method-neutral; update the exception strings used
where SG(request_info).content_length is validated: replace "POST Content-Length
is required" in the zend_throw_exception call and the "POST Content-Length of
..." format string in zend_throw_exception_ex with a method-neutral phrase such
as "request body Content-Length is required" and "request body Content-Length of
... exceeds the limit of ...", preserving the same format specifiers and error
codes (e.g., SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) and leaving the surrounding
logic around SG(request_info).content_length and SG(post_max_size unchanged.
7833cf7 to
6003300
Compare
Summary by CodeRabbit
New Features
Tests
Chores