Skip to content

New method simdjson_decode_from_input#66

Merged
JakubOnderka merged 1 commit intomasterfrom
read-optim
Apr 12, 2026
Merged

New method simdjson_decode_from_input#66
JakubOnderka merged 1 commit intomasterfrom
read-optim

Conversation

@JakubOnderka
Copy link
Copy Markdown
Owner

@JakubOnderka JakubOnderka commented Apr 10, 2026

Summary by CodeRabbit

  • New Features

    • Added simdjson_decode_from_input(bool $associative = false, int $depth = 512): mixed to decode request-body JSON; validates depth and surfaces errors for empty/oversized input.
    • Added SIMDJSON_ERR_INPUT_SIZE_EXCEEDS constant for oversized request bodies.
  • Tests

    • Added PHPT tests for empty input, large payloads, disabled post-reading, php://input behavior, PUT requests, and post_max_size handling.
  • Chores

    • CI test runner adjusted for quieter test output.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds simdjson_decode_from_input, a PHP function that reads SAPI request bodies (POST/PUT), validates depth and request-size limits, buffers input into a SIMDJSON-padded memory region, attempts a fast decode then falls back to parser-based parsing (with optional parser reuse), and throws on input/parse errors.

Changes

Cohort / File(s) Summary
Core implementation
php_simdjson.cpp, php_simdjson.h
Add simdjson_decode_from_input implementing SAPI request-body reading (buffered/unbuffered), content-length / post_max_size validation, allocation of content_length + SIMDJSON_PADDING, read into padded buffer, fast-path simdjson_simple_decode, fallback php_simdjson_parse_buffer with optional parser reuse; add error macro SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS.
Public API / arginfo
simdjson.stub.php, simdjson_arginfo.h
Export new function simdjson_decode_from_input(bool $associative = false, int $depth = 512): mixed, add arginfo and register new persistent constant SIMDJSON_ERR_INPUT_SIZE_EXCEEDS / SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS.
Tests (PHPT)
tests/input/post.phpt, tests/input/post_bigger.phpt, tests/input/post_disable_reading.phpt, tests/input/post_disable_reading_read_whole.phpt, tests/input/post_empty.phpt, tests/input/post_empty_disable_reading.phpt, tests/input/post_max_size.phpt, tests/input/put.phpt
Add PHPT tests covering POST/PUT decoding, large payload reading, disabled post-data-reading behavior, preserving php://input reads, empty-body error path, and post_max_size enforcement.
CI workflow
.github/workflows/integration.yml
Pass -q to TESTS in integration workflow make test invocations (CI invocation change only).

Sequence Diagram

sequenceDiagram
    participant Client as HTTP Client
    participant SAPI as PHP SAPI
    participant Func as simdjson_decode_from_input
    participant Buffer as Request Buffer / php_stream
    participant Parser as SimdJSON Parser
    participant Result as Return Value

    Client->>SAPI: Send POST/PUT with JSON body
    SAPI->>Func: Call simdjson_decode_from_input()
    Func->>Func: Validate depth
    Func->>SAPI: Inspect SG(request_info).request_body and content_length
    alt request_body already buffered
        Func->>Buffer: Rewind & copy into SIMDJSON-padded memory
    else
        Func->>Func: Validate content_length (empty, negative, >0xFFFFFFFF, > post_max_size)
        Func->>Buffer: Allocate content_length + SIMDJSON_PADDING
        Func->>Buffer: Read php://input into padded buffer
        Func->>Func: Verify read size and optionally write to SG(request_info).request_body
    end
    Func->>Parser: Try fast-path simple decode
    alt fast-path succeeds
        Parser->>Result: Return decoded value
    else
        Parser->>Parser: Parse buffer (reuse singleton parser if eligible)
        Parser->>Result: Return value or throw JSON exception
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I nibbled php://input with care,
padded each byte in a cozy lair,
tried a quick hop—fast decode did sing,
if not, I parsed with my parser ring,
hop! JSON served, hooray everywhere. 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the primary change: introduction of a new public method/function simdjson_decode_from_input to the extension.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch read-optim

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
php_simdjson.cpp (1)

368-371: Missing padding initialization for valgrind consistency.

The buffer allocated at line 368 does not initialize the padding bytes to zero. Compare this to simdjson_stream_copy_to_mem<true> (lines 165-168) which zeroes padding in ZEND_DEBUG builds to avoid valgrind warnings about uninitialized memory reads.

🛠️ Suggested fix for debug build consistency
         // allocate buffer for whole request with padding
         json = (char*) emalloc(SG(request_info).content_length + simdjson::SIMDJSON_PADDING);

         // read data from sapi, read one byte more so we can check if provided data exceeds content-length
         len = sapi_read_post_block(json, SG(request_info).content_length + 1);
+
+#ifdef ZEND_DEBUG
+        // Set padding to zero to make valgrind happy
+        if (len <= (size_t)SG(request_info).content_length) {
+            memset(json + len, 0, simdjson::SIMDJSON_PADDING);
+        }
+#endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 368 - 371, The allocated buffer 'json' (from
emalloc with size SG(request_info).content_length + simdjson::SIMDJSON_PADDING)
does not zero the padding bytes, causing valgrind warnings; modify the same
allocation site so that in ZEND_DEBUG builds the trailing
simdjson::SIMDJSON_PADDING bytes are explicitly set to 0 (similar to
simdjson_stream_copy_to_mem<true>), e.g., after calling
sapi_read_post_block(store length in 'len') write zeros into json +
SG(request_info).content_length up to SIMDJSON_PADDING when ZEND_DEBUG is
enabled; reference the variables/functions json, len, emalloc,
sapi_read_post_block, and simdjson::SIMDJSON_PADDING to locate and implement
this change.
simdjson.stub.php (2)

276-285: Add a description to the docblock.

The simdjson_decode_from_input function is missing a summary description in its docblock. Other similar decoder functions like simdjson_decode_from_stream (line 262) include a description explaining what the function does, e.g., "Takes a JSON encoded string and converts it into a PHP variable."

Consider adding a description such as:

+/**
+ * Decodes JSON from the request body (php://input).
+ *
  * `@param` bool $associative When true, JSON objects will be returned as associative arrays.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@simdjson.stub.php` around lines 276 - 285, The docblock for
simdjson_decode_from_input is missing a one-line summary; add a brief
description above the `@param` tags (similar to simdjson_decode_from_stream) such
as "Takes a JSON encoded string from input and converts it into a PHP variable."
Ensure it sits before the parameter descriptions and keeps existing `@return` and
`@throws` intact so the docblock format and tags for simdjson_decode_from_input
remain consistent with other decoder functions.

227-231: Inconsistent constant naming pattern.

The exported constant SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS uses a different naming convention than other PHP-specific error constants. For example:

  • SIMDJSON_PHP_ERR_KEY_COUNT_NOT_COUNTABLE → exported as SIMDJSON_ERR_KEY_COUNT_NOT_COUNTABLE (line 221)
  • SIMDJSON_PHP_ERR_INVALID_PHP_PROPERTY → exported as SIMDJSON_ERR_INVALID_PROPERTY (line 226)

Consider renaming to SIMDJSON_ERR_INPUT_SIZE_EXCEEDS for consistency with the existing pattern, or document why this constant intentionally uses a different prefix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@simdjson.stub.php` around lines 227 - 231, The exported constant
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS uses a different prefix than other
PHP-specific error constants; rename the declaration
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS to SIMDJSON_ERR_INPUT_SIZE_EXCEEDS (and
update its `@cvalue` if needed) to match the existing pattern, and search/replace
any references to SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS in the codebase (or
alternatively add a short comment documenting an intentional prefix difference
if you must keep the original name).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@php_simdjson.cpp`:
- Around line 364-380: The code allocates based on
SG(request_info).content_length without handling the case where content_length
is -1 (unknown), which can cause emalloc overflow; update the block around the
emalloc/sapi_read_post_block calls to check if SG(request_info).content_length <
0 and, if so, avoid allocation/reading and throw a simdjson_decoder_exception_ce
(use the same error path pattern as the existing length check -
zend_throw_exception_ex with an appropriate message and RETURN_THROWS()),
referencing SG(request_info).content_length, emalloc, sapi_read_post_block and
simdjson_decoder_exception_ce so the negative/unknown Content-Length is rejected
safely.

In `@tests/input/post_empty_disable_reading.phpt`:
- Line 2: The TEST header title in tests/input/post_empty_disable_reading.phpt
is not unique ("simdjson_decode_from_input empty POST") and should be changed to
a distinct name; update the `--TEST--` section in this file to a unique
descriptive title (e.g., include "disable_reading" or the filename) so its
failures are unambiguous when compared to other tests that reuse the same title.

---

Nitpick comments:
In `@php_simdjson.cpp`:
- Around line 368-371: The allocated buffer 'json' (from emalloc with size
SG(request_info).content_length + simdjson::SIMDJSON_PADDING) does not zero the
padding bytes, causing valgrind warnings; modify the same allocation site so
that in ZEND_DEBUG builds the trailing simdjson::SIMDJSON_PADDING bytes are
explicitly set to 0 (similar to simdjson_stream_copy_to_mem<true>), e.g., after
calling sapi_read_post_block(store length in 'len') write zeros into json +
SG(request_info).content_length up to SIMDJSON_PADDING when ZEND_DEBUG is
enabled; reference the variables/functions json, len, emalloc,
sapi_read_post_block, and simdjson::SIMDJSON_PADDING to locate and implement
this change.

In `@simdjson.stub.php`:
- Around line 276-285: The docblock for simdjson_decode_from_input is missing a
one-line summary; add a brief description above the `@param` tags (similar to
simdjson_decode_from_stream) such as "Takes a JSON encoded string from input and
converts it into a PHP variable." Ensure it sits before the parameter
descriptions and keeps existing `@return` and `@throws` intact so the docblock
format and tags for simdjson_decode_from_input remain consistent with other
decoder functions.
- Around line 227-231: The exported constant SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS
uses a different prefix than other PHP-specific error constants; rename the
declaration SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS to
SIMDJSON_ERR_INPUT_SIZE_EXCEEDS (and update its `@cvalue` if needed) to match the
existing pattern, and search/replace any references to
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS in the codebase (or alternatively add a
short comment documenting an intentional prefix difference if you must keep the
original name).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c5e03444-3d62-4dde-889a-7ca09d33750a

📥 Commits

Reviewing files that changed from the base of the PR and between 330066e and 8766e72.

📒 Files selected for processing (12)
  • php_simdjson.cpp
  • php_simdjson.h
  • simdjson.stub.php
  • simdjson_arginfo.h
  • tests/input/post.phpt
  • tests/input/post_bigger.phpt
  • tests/input/post_disable_reading.phpt
  • tests/input/post_disable_reading_read_whole.phpt
  • tests/input/post_empty.phpt
  • tests/input/post_empty_disable_reading.phpt
  • tests/input/post_max_size.phpt
  • tests/input/put.phpt

@@ -0,0 +1,15 @@
--TEST--
simdjson_decode_from_input empty POST
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use a unique --TEST-- title for this file.
simdjson_decode_from_input empty POST is reused in another test file, which makes failure output ambiguous.

Suggested update
-simdjson_decode_from_input empty POST
+simdjson_decode_from_input empty POST with disabled post reading
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
simdjson_decode_from_input empty POST
simdjson_decode_from_input empty POST with disabled post reading
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/input/post_empty_disable_reading.phpt` at line 2, The TEST header title
in tests/input/post_empty_disable_reading.phpt is not unique
("simdjson_decode_from_input empty POST") and should be changed to a distinct
name; update the `--TEST--` section in this file to a unique descriptive title
(e.g., include "disable_reading" or the filename) so its failures are
unambiguous when compared to other tests that reuse the same title.

@JakubOnderka JakubOnderka force-pushed the read-optim branch 3 times, most recently from 964e913 to 3cb5f06 Compare April 10, 2026 12:59
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@php_simdjson.cpp`:
- Around line 374-377: The allocation and read use
SG(request_info).content_length + simdjson::SIMDJSON_PADDING without overflow
checks and call sapi_read_post_block once (in php_simdjson.cpp where json is
emalloc(...) and len = sapi_read_post_block(...)), which can overflow for
attacker-controlled content_length and may return partial reads; fix by
validating and bounding content_length before any arithmetic (e.g., ensure
content_length <= SIZE_MAX - simdjson::SIMDJSON_PADDING and also enforce a sane
max like post_max_size or a hard upper bound when SG(post_max_size) <= 0),
allocate using the safely checked size, then replace the single
sapi_read_post_block call with a loop that repeatedly calls sapi_read_post_block
until the requested content_length bytes are read or an error/EOF occurs,
updating an offset and accumulating len, and handle/read error returns and
truncated bodies robustly (free on error and return appropriate failure).

In `@simdjson.stub.php`:
- Around line 281-283: The PHPDoc for simdjson_decode_from_input incorrectly
mentions a `$json` parameter; update the docblock for simdjson_decode_from_input
to replace the stale “$json” wording with the correct source (e.g., “input
stream” or “request input”) and ensure the `@throws` description still documents
SimdJsonDecoderException (for invalid JSON, oversized input >4GB, or
out-of-range numbers) and ValueError for invalid $depth; locate the docblock
near the simdjson_decode_from_input declaration and adjust the text to reference
reading from input instead of a $json parameter.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 994416de-4364-48f2-967a-b35a001bf48a

📥 Commits

Reviewing files that changed from the base of the PR and between 8766e72 and 964e913.

📒 Files selected for processing (13)
  • .github/workflows/integration.yml
  • php_simdjson.cpp
  • php_simdjson.h
  • simdjson.stub.php
  • simdjson_arginfo.h
  • tests/input/post.phpt
  • tests/input/post_bigger.phpt
  • tests/input/post_disable_reading.phpt
  • tests/input/post_disable_reading_read_whole.phpt
  • tests/input/post_empty.phpt
  • tests/input/post_empty_disable_reading.phpt
  • tests/input/post_max_size.phpt
  • tests/input/put.phpt
✅ Files skipped from review due to trivial changes (10)
  • tests/input/post_disable_reading.phpt
  • tests/input/post.phpt
  • php_simdjson.h
  • tests/input/post_disable_reading_read_whole.phpt
  • .github/workflows/integration.yml
  • tests/input/put.phpt
  • tests/input/post_empty_disable_reading.phpt
  • tests/input/post_max_size.phpt
  • tests/input/post_bigger.phpt
  • tests/input/post_empty.phpt

Comment on lines +374 to +377
json = (char*) emalloc(SG(request_info).content_length + simdjson::SIMDJSON_PADDING);

// read data from sapi, read one byte more so we can check if provided data exceeds content-length
len = sapi_read_post_block(json, SG(request_info).content_length + 1);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Extension code around Line 374-377:"
sed -n '350,390p' php_simdjson.cpp

echo
echo "PHP core behavior for sapi_read_post_block (partial-read semantics):"
curl -sL https://raw.githubusercontent.com/php/php-src/master/main/SAPI.c \
  | rg -n "SAPI_API size_t sapi_read_post_block|read_bytes < buflen|sapi_read_standard_form_data" -A8 -B3

Repository: JakubOnderka/simdjson_php

Length of output: 3110


🏁 Script executed:

#!/bin/bash

echo "=== Finding type definitions for content_length ==="
rg -n "content_length" php_simdjson.cpp | head -20

echo
echo "=== Looking for SIMDJSON_PADDING definition ==="
rg -n "SIMDJSON_PADDING" php_simdjson.cpp

echo
echo "=== Checking complete function context (lines 350-400) ==="
sed -n '350,400p' php_simdjson.cpp

echo
echo "=== Checking zend_long type definition ==="
rg -n "typedef.*zend_long|#define.*zend_long" | head -5

Repository: JakubOnderka/simdjson_php

Length of output: 3639


🏁 Script executed:

#!/bin/bash

echo "=== Checking post_max_size usage and limits ==="
grep -n "post_max_size" php_simdjson.cpp

echo
echo "=== Looking for any loop/retry logic for sapi_read_post_block ==="
rg -B5 -A10 "sapi_read_post_block" php_simdjson.cpp

echo
echo "=== Checking if request_body stream reading exists ==="
rg -n "php_stream.*body|request_body" php_simdjson.cpp | head -20

Repository: JakubOnderka/simdjson_php

Length of output: 1592


🏁 Script executed:

#!/bin/bash

echo "=== Checking includes and simdjson headers ==="
head -30 php_simdjson.cpp

echo
echo "=== Looking for any simdjson constant definitions ==="
rg -n "SIMDJSON_PADDING|SIMDJSON_.*64|SIMDJSON_.*32" --type cpp | head -20

echo
echo "=== Searching for where SIMDJSON_PADDING might be imported from ==="
rg -n "#include.*simdjson" php_simdjson.cpp

Repository: JakubOnderka/simdjson_php

Length of output: 3402


🌐 Web query:

simdjson SIMDJSON_PADDING constant value

💡 Result:

Web search failed: Server error: no LLM provider could handle the message


🏁 Script executed:

#!/bin/bash

echo "=== Searching for SIMDJSON_PADDING definition in src/ ==="
rg -n "SIMDJSON_PADDING\s*=" src/ | head -10

echo
echo "=== Checking src/simdjson.h header ==="
rg -B3 -A3 "SIMDJSON_PADDING" src/simdjson.h | head -30

echo
echo "=== Searching in all .h files for SIMDJSON_PADDING ==="
find . -name "*.h" -exec grep -l "SIMDJSON_PADDING" {} \; 2>/dev/null

echo
echo "=== Checking how line 161 handles overflow before erealloc ==="
sed -n '155,175p' php_simdjson.cpp

Repository: JakubOnderka/simdjson_php

Length of output: 2072


Harden size arithmetic and fully drain request body reads.

Line 374 and line 377 add to content_length without overflow guards. Although a post_max_size check exists at line 362, it only applies when SG(post_max_size) > 0; when disabled, an attacker-controlled content_length near SIZE_MAX can cause integer overflow in the emalloc and sapi_read_post_block calls.

Additionally, sapi_read_post_block (PHP core SAPI.c lines 232–252) can return fewer bytes than requested. A single call at line 377 means the JSON body may be truncated if the underlying SAPI implementation returns partial data, leading to data loss.

🛠️ Suggested fix sketch
+        const zend_long content_length = SG(request_info).content_length;
+        if (UNEXPECTED((size_t)content_length > SIZE_MAX - simdjson::SIMDJSON_PADDING - 1)) {
+            zend_throw_exception_ex(simdjson_decoder_exception_ce, SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS,
+                "POST Content-Length is too large");
+            RETURN_THROWS();
+        }
+
-        json = (char*) emalloc(SG(request_info).content_length + simdjson::SIMDJSON_PADDING);
+        json = (char*) emalloc((size_t)content_length + simdjson::SIMDJSON_PADDING);

-        len = sapi_read_post_block(json, SG(request_info).content_length + 1);
+        len = 0;
+        size_t target = (size_t)content_length + 1; // +1 sentinel byte for overflow detection
+        while (len < target) {
+            size_t n = sapi_read_post_block(json + len, target - len);
+            if (n == 0) {
+                break;
+            }
+            len += n;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 374 - 377, The allocation and read use
SG(request_info).content_length + simdjson::SIMDJSON_PADDING without overflow
checks and call sapi_read_post_block once (in php_simdjson.cpp where json is
emalloc(...) and len = sapi_read_post_block(...)), which can overflow for
attacker-controlled content_length and may return partial reads; fix by
validating and bounding content_length before any arithmetic (e.g., ensure
content_length <= SIZE_MAX - simdjson::SIMDJSON_PADDING and also enforce a sane
max like post_max_size or a hard upper bound when SG(post_max_size) <= 0),
allocate using the safely checked size, then replace the single
sapi_read_post_block call with a loop that repeatedly calls sapi_read_post_block
until the requested content_length bytes are read or an error/EOF occurs,
updating an offset and accumulating len, and handle/read error returns and
truncated bodies robustly (free on error and return appropriate failure).

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
simdjson.stub.php (1)

281-282: ⚠️ Potential issue | 🟡 Minor

Fix the stale $json wording in the input decoder docs.

simdjson_decode_from_input() has no $json parameter, so this @throws text is misleading.

✏️ Suggested doc fix
- * `@throws` SimdJsonDecoderException for invalid JSON
- *                           (or $json over 4GB long, or out of range integer/float)
+ * `@throws` SimdJsonDecoderException for invalid JSON input body
+ *                           (or input body over 4GB long, or out of range integer/float)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@simdjson.stub.php` around lines 281 - 282, The docblock for
simdjson_decode_from_input incorrectly mentions a non-existent $json parameter
in the `@throws` text; update the `@throws` description for SimdJsonDecoderException
in simdjson_decode_from_input to refer to the input source (e.g., "for invalid
JSON (or input over 4GB long, or out of range integer/float)") or otherwise
remove the `$json` variable name so the exception message accurately describes
errors from the input decoder.
php_simdjson.cpp (1)

374-383: ⚠️ Potential issue | 🟠 Major

Do not treat a single sapi_read_post_block() call as a full-body read.

Line 375 assumes one sapi_read_post_block() call drains the request body. That API can return a short read, so this path can parse truncated JSON or accept a valid prefix when len < content_length. Keep reading until you hit content_length + 1 or EOF, and reject short reads explicitly.

Verify against PHP core's implementation of sapi_read_post_block(). Expected result: supported php-src branches show that the function returns the bytes from a single read and marks end-of-input when read_bytes < buflen, so callers that need the full body must loop.

#!/bin/bash
set -euo pipefail

echo "Current call site in php_simdjson.cpp:"
sed -n '371,383p' php_simdjson.cpp

echo
for branch in PHP-8.4 PHP-8.3 PHP-8.2 PHP-8.1 PHP-8.0; do
  echo "=== php-src ${branch}: sapi_read_post_block ==="
  curl -fsSL "https://raw.githubusercontent.com/php/php-src/${branch}/main/SAPI.c" | awk '
    /SAPI_API size_t sapi_read_post_block/ {capture=1}
    capture {print}
    capture && /^}/ {exit}
  '
  echo
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 374 - 383, The code currently assumes a single
sapi_read_post_block() call reads the full body; change the logic in
php_simdjson.cpp to loop calling sapi_read_post_block(json + total_len,
remaining_buf_len) until either you have read SG(request_info).content_length +
1 bytes or the read returns fewer bytes than requested (EOF/short read).
Accumulate into the buffer (track total_len), reallocate if necessary, and after
the loop explicitly reject short reads by throwing the same
simdjson_decoder_exception_ce when total_len < SG(request_info).content_length;
preserve the existing "exceeds" check when total_len >
SG(request_info).content_length. Ensure to efree the json buffer on all
error/throw paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@php_simdjson.cpp`:
- Around line 341-349: The cached-body branch currently bypasses the
post_max_size guard; before calling php_stream_rewind and
simdjson_stream_copy_to_mem<true> check SG(read_post_bytes) (or the cached body
length available on SG(request_info).request_body) against the configured
post_max_size and if it exceeds, call
php_simdjson_throw_jsonexception(SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) and
RETURN_THROWS(); otherwise proceed to rewind and copy as before so the same
size-check behavior as the non-cached path (lines using SG(read_post_bytes) and
simdjson_stream_copy_to_mem<true>) is preserved.

In `@simdjson.stub.php`:
- Around line 227-231: The stub exposes a constant with the internal prefix
(SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) which leaks internals and is inconsistent
with other public decoder error constants; rename the exported stub constant to
SIMDJSON_ERR_INPUT_SIZE_EXCEEDS while keeping the cvalue mapping to the internal
symbol (i.e., keep `@cvalue` SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) so the runtime
value is unchanged, update the docblock/constant name in simdjson.stub.php
accordingly, and ensure no other code references the old stub name.

---

Duplicate comments:
In `@php_simdjson.cpp`:
- Around line 374-383: The code currently assumes a single
sapi_read_post_block() call reads the full body; change the logic in
php_simdjson.cpp to loop calling sapi_read_post_block(json + total_len,
remaining_buf_len) until either you have read SG(request_info).content_length +
1 bytes or the read returns fewer bytes than requested (EOF/short read).
Accumulate into the buffer (track total_len), reallocate if necessary, and after
the loop explicitly reject short reads by throwing the same
simdjson_decoder_exception_ce when total_len < SG(request_info).content_length;
preserve the existing "exceeds" check when total_len >
SG(request_info).content_length. Ensure to efree the json buffer on all
error/throw paths.

In `@simdjson.stub.php`:
- Around line 281-282: The docblock for simdjson_decode_from_input incorrectly
mentions a non-existent $json parameter in the `@throws` text; update the `@throws`
description for SimdJsonDecoderException in simdjson_decode_from_input to refer
to the input source (e.g., "for invalid JSON (or input over 4GB long, or out of
range integer/float)") or otherwise remove the `$json` variable name so the
exception message accurately describes errors from the input decoder.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 80b55389-d71f-4bce-9f71-5b11c5f8725a

📥 Commits

Reviewing files that changed from the base of the PR and between 964e913 and 3cb5f06.

📒 Files selected for processing (13)
  • .github/workflows/integration.yml
  • php_simdjson.cpp
  • php_simdjson.h
  • simdjson.stub.php
  • simdjson_arginfo.h
  • tests/input/post.phpt
  • tests/input/post_bigger.phpt
  • tests/input/post_disable_reading.phpt
  • tests/input/post_disable_reading_read_whole.phpt
  • tests/input/post_empty.phpt
  • tests/input/post_empty_disable_reading.phpt
  • tests/input/post_max_size.phpt
  • tests/input/put.phpt
✅ Files skipped from review due to trivial changes (10)
  • tests/input/post.phpt
  • tests/input/post_empty.phpt
  • tests/input/post_disable_reading.phpt
  • php_simdjson.h
  • tests/input/put.phpt
  • tests/input/post_empty_disable_reading.phpt
  • .github/workflows/integration.yml
  • tests/input/post_disable_reading_read_whole.phpt
  • tests/input/post_bigger.phpt
  • tests/input/post_max_size.phpt

@JakubOnderka JakubOnderka force-pushed the read-optim branch 2 times, most recently from 304c5ab to 7833cf7 Compare April 10, 2026 14:16
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
php_simdjson.cpp (2)

341-349: ⚠️ Potential issue | 🟠 Major

Reapply the size limit when request_body is already populated.

The cached-body branch skips the SG(post_max_size) check from the fresh-read path, so a pre-buffered request above the configured limit can still be copied and parsed. This should reject oversize input before simdjson_stream_copy_to_mem<true>() the same way the uncached branch does.

🛡️ Suggested guard
     if ((body = SG(request_info).request_body)) { // request body was already processed
+        if (SG(post_max_size) > 0 && SG(read_post_bytes) > (size_t) SG(post_max_size)) {
+            php_simdjson_throw_jsonexception(SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS);
+            RETURN_THROWS();
+        }
+
         // rewind to start of the request
         php_stream_rewind(body);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 341 - 349, The cached-body branch using
SG(request_info).request_body skips the post size guard; before calling
simdjson_stream_copy_to_mem<true>(...) re-check SG(post_max_size) against
SG(read_post_bytes) and reject oversized input the same way the fresh-read path
does (invoke php_simdjson_throw_jsonexception or RETURN_THROWS when the size
exceeds SG(post_max_size)). Ensure you perform this check in the same block that
handles the cached body (around the code referencing
SG(request_info).request_body and simdjson_stream_copy_to_mem<true>) so
pre-buffered requests cannot bypass the limit.

375-390: ⚠️ Potential issue | 🔴 Critical

Drain sapi_read_post_block() until the declared body is fully read.

A single sapi_read_post_block() call is not guaranteed to return the full request body. If it short-reads, this function writes only the prefix into request_body and parses a truncated document, which can turn a valid body into a spurious decode error or silently drop trailing bytes.

🛠️ Fix sketch
-        len = sapi_read_post_block(json, SG(request_info).content_length + 1);
+        const size_t expected = (size_t) SG(request_info).content_length;
+        while (len < expected + 1) {
+            const size_t read = sapi_read_post_block(json + len, (expected + 1) - len);
+            if (read == 0) {
+                break;
+            }
+            len += read;
+        }
+
+        if (UNEXPECTED(len < expected)) {
+            efree(json);
+            zend_throw_exception_ex(simdjson_decoder_exception_ce, simdjson::IO_ERROR,
+                "Failed to read the full request body"
+            );
+            RETURN_THROWS();
+        }

Use this read-only check against official php-src to confirm the API contract. The first grep should show sapi_read_post_block() returning a single block per call; the second should show core callers wrapping it in a loop when they need the full body.

#!/bin/bash
set -euo pipefail

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT

curl -fsSL https://raw.githubusercontent.com/php/php-src/master/main/SAPI.c -o "$tmp/SAPI.c"
curl -fsSL https://raw.githubusercontent.com/php/php-src/master/main/php_variables.c -o "$tmp/php_variables.c"

echo "=== sapi_read_post_block implementation ==="
rg -n -C4 'SAPI_API size_t sapi_read_post_block|read_bytes < buflen|return read_bytes' "$tmp/SAPI.c"

echo
echo "=== core callers that use sapi_read_post_block ==="
rg -n -C4 'sapi_read_post_block\(' "$tmp/php_variables.c"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 375 - 390, The current single-call to
sapi_read_post_block may short-read; change the read logic in php_simdjson.cpp
so you repeatedly call sapi_read_post_block into json+bytes_read until you have
attempted to read SG(request_info).content_length + 1 bytes or the call returns
0 (EOF). Track a total bytes_read variable (instead of overwriting len), pass
the remaining buffer size to each sapi_read_post_block call, and after the loop
check if bytes_read > SG(request_info).content_length to efree(json) and
zend_throw_exception_ex(simdjson_decoder_exception_ce,
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS, ...) as before; then use
php_stream_write(body, json, bytes_read) (and compare bytes_written to
bytes_read) so request_body receives the entire payload. Ensure you still
allocate json with SG(request_info).content_length + simdjson::SIMDJSON_PADDING
and handle zero/EOF and error returns from sapi_read_post_block properly.
simdjson.stub.php (2)

276-285: ⚠️ Potential issue | 🟡 Minor

Document this overload as reading from request input.

The new docblock still reads like the string-based decoder. Generated docs/IDE hints should say this reads the current request body, and the @throws text should reference the input body rather than a generic JSON string.

✏️ Suggested doc update
 /**
+ * Reads JSON from the current request body and converts it into a PHP variable.
+ * Similar to json_decode()
+ *
  * `@param` bool $associative When true, JSON objects will be returned as associative arrays.
  *                          When false, JSON objects will be returned as objects.
  * `@param` int $depth the maximum nesting depth of the structure being decoded.
  * `@return` array|stdClass|string|float|int|bool|null
- * `@throws` SimdJsonDecoderException for invalid JSON
- *                           (or JSON over 4GB long, or out of range integer/float)
+ * `@throws` SimdJsonDecoderException for invalid JSON in the request body
+ *                           (or request body over 4GB long, or out of range integer/float)
  * `@throws` ValueError for invalid $depth
  */
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@simdjson.stub.php` around lines 276 - 285, Update the docblock for
simdjson_decode_from_input to state that it reads and decodes the current
request/input body (not a JSON string parameter), clarify the associative and
depth parameter meanings remain the same, and adjust the `@throws` descriptions to
reference failures decoding the request input/body (e.g., invalid JSON in
request body, body over 4GB, or out of range integer/float) and ValueError for
invalid $depth so IDE/docs accurately reflect this overload's behavior.

227-231: ⚠️ Potential issue | 🟠 Major

Keep the new input-size error under the public SIMDJSON_ERR_* API.

Every other decoder error constant exported here uses SIMDJSON_ERR_*. Exposing SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS leaks the internal prefix into userland and makes this new public symbol inconsistent from day one.

🔧 Suggested rename
 /**
  * `@var` int
  * `@cvalue` SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS
  */
-const SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS = UNKNOWN;
+const SIMDJSON_ERR_INPUT_SIZE_EXCEEDS = UNKNOWN;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@simdjson.stub.php` around lines 227 - 231, The new public error constant uses
the internal-prefixed name SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS; rename it to
SIMDJSON_ERR_INPUT_SIZE_EXCEEDS so it matches the existing public API (replace
the constant name and the `@cvalue` macro reference from
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS to SIMDJSON_ERR_INPUT_SIZE_EXCEEDS in the
declaration) and update any other occurrences in this file that reference the
old symbol to use SIMDJSON_ERR_INPUT_SIZE_EXCEEDS.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@php_simdjson.cpp`:
- Around line 341-349: The cached-body branch using
SG(request_info).request_body skips the post size guard; before calling
simdjson_stream_copy_to_mem<true>(...) re-check SG(post_max_size) against
SG(read_post_bytes) and reject oversized input the same way the fresh-read path
does (invoke php_simdjson_throw_jsonexception or RETURN_THROWS when the size
exceeds SG(post_max_size)). Ensure you perform this check in the same block that
handles the cached body (around the code referencing
SG(request_info).request_body and simdjson_stream_copy_to_mem<true>) so
pre-buffered requests cannot bypass the limit.
- Around line 375-390: The current single-call to sapi_read_post_block may
short-read; change the read logic in php_simdjson.cpp so you repeatedly call
sapi_read_post_block into json+bytes_read until you have attempted to read
SG(request_info).content_length + 1 bytes or the call returns 0 (EOF). Track a
total bytes_read variable (instead of overwriting len), pass the remaining
buffer size to each sapi_read_post_block call, and after the loop check if
bytes_read > SG(request_info).content_length to efree(json) and
zend_throw_exception_ex(simdjson_decoder_exception_ce,
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS, ...) as before; then use
php_stream_write(body, json, bytes_read) (and compare bytes_written to
bytes_read) so request_body receives the entire payload. Ensure you still
allocate json with SG(request_info).content_length + simdjson::SIMDJSON_PADDING
and handle zero/EOF and error returns from sapi_read_post_block properly.

In `@simdjson.stub.php`:
- Around line 276-285: Update the docblock for simdjson_decode_from_input to
state that it reads and decodes the current request/input body (not a JSON
string parameter), clarify the associative and depth parameter meanings remain
the same, and adjust the `@throws` descriptions to reference failures decoding the
request input/body (e.g., invalid JSON in request body, body over 4GB, or out of
range integer/float) and ValueError for invalid $depth so IDE/docs accurately
reflect this overload's behavior.
- Around line 227-231: The new public error constant uses the internal-prefixed
name SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS; rename it to
SIMDJSON_ERR_INPUT_SIZE_EXCEEDS so it matches the existing public API (replace
the constant name and the `@cvalue` macro reference from
SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS to SIMDJSON_ERR_INPUT_SIZE_EXCEEDS in the
declaration) and update any other occurrences in this file that reference the
old symbol to use SIMDJSON_ERR_INPUT_SIZE_EXCEEDS.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 30555d31-6618-401e-a805-e05d8f1491e2

📥 Commits

Reviewing files that changed from the base of the PR and between 3cb5f06 and 304c5ab.

📒 Files selected for processing (13)
  • .github/workflows/integration.yml
  • php_simdjson.cpp
  • php_simdjson.h
  • simdjson.stub.php
  • simdjson_arginfo.h
  • tests/input/post.phpt
  • tests/input/post_bigger.phpt
  • tests/input/post_disable_reading.phpt
  • tests/input/post_disable_reading_read_whole.phpt
  • tests/input/post_empty.phpt
  • tests/input/post_empty_disable_reading.phpt
  • tests/input/post_max_size.phpt
  • tests/input/put.phpt
✅ Files skipped from review due to trivial changes (10)
  • tests/input/post_empty.phpt
  • tests/input/post_empty_disable_reading.phpt
  • tests/input/post_disable_reading.phpt
  • php_simdjson.h
  • .github/workflows/integration.yml
  • tests/input/put.phpt
  • tests/input/post.phpt
  • tests/input/post_max_size.phpt
  • tests/input/post_disable_reading_read_whole.phpt
  • tests/input/post_bigger.phpt

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
php_simdjson.cpp (2)

341-349: ⚠️ Potential issue | 🟠 Major

Reapply the size guard when reusing SG(request_info).request_body.

Once SG(request_info).request_body is already populated, Lines 341-349 skip the post_max_size rejection used on the direct-read path. That makes limit enforcement depend on whether some earlier code path filled the cache first.

🔧 Suggested fix
     if ((body = SG(request_info).request_body)) { // request body was already processed
+        if (SG(post_max_size) > 0 && SG(read_post_bytes) > (size_t) SG(post_max_size)) {
+            zend_throw_exception_ex(simdjson_decoder_exception_ce, SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS,
+                "Request body exceeds the configured post_max_size"
+            );
+            RETURN_THROWS();
+        }
+
         // rewind to start of the request
         php_stream_rewind(body);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 341 - 349, When reusing
SG(request_info).request_body the post_max_size guard is skipped; reapply the
same size check used on the direct-read path before calling
simdjson_stream_copy_to_mem<true>. Specifically, when
SG(request_info).request_body is non-NULL, compare SG(read_post_bytes) (or the
body length you will read) against the configured post_max_size and return/throw
the same error path if it exceeds the limit, then proceed to php_stream_rewind
and simdjson_stream_copy_to_mem<true> only after the size guard passes.

375-390: ⚠️ Potential issue | 🟠 Major

Drain sapi_read_post_block() before parsing.

A single sapi_read_post_block() call at Line 378 can return fewer bytes than requested, and PHP core's own form-data reader handles that by looping until it sees a short read. Stopping after one call here can parse and cache a truncated body instead of the full request. (raw.githubusercontent.com)

Run this to inspect the upstream implementation that drives the short-read behavior:

#!/bin/bash
set -euo pipefail

tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT

curl -sL https://raw.githubusercontent.com/php/php-src/master/main/SAPI.c > "$tmp"

echo "=== sapi_read_post_block ==="
rg -n -A12 -B2 'SAPI_API size_t sapi_read_post_block' "$tmp"

echo
echo "=== sapi_read_standard_form_data ==="
rg -n -A25 -B2 'SAPI_API SAPI_POST_READER_FUNC\(sapi_read_standard_form_data\)' "$tmp"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 375 - 390, The code currently calls
sapi_read_post_block() once and may get a short read; update the logic around
json/len to loop and drain the POST body until either the requested bytes are
read or a short read occurs: allocate the buffer as before (json), then
repeatedly call sapi_read_post_block(target_ptr, remaining + 1) (or just
remaining) appending each chunk (advance target_ptr, subtract from remaining,
add to total_read) until total_read == SG(request_info).content_length or a read
returns 0 or less than requested; after each chunk check if total_read >
SG(request_info).content_length to trigger the existing size-exceeds error
(using simdjson_decoder_exception_ce and SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS),
and finally use php_stream_write(body, json, total_read) and compare against
total_read. Reference symbols: sapi_read_post_block, json, len (use
total_read/remaining), SG(request_info).content_length, php_stream_write,
simdjson_decoder_exception_ce.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@php_simdjson.cpp`:
- Around line 354-366: The error messages currently hard-code "POST" and should
be method-neutral; update the exception strings used where
SG(request_info).content_length is validated: replace "POST Content-Length is
required" in the zend_throw_exception call and the "POST Content-Length of ..."
format string in zend_throw_exception_ex with a method-neutral phrase such as
"request body Content-Length is required" and "request body Content-Length of
... exceeds the limit of ...", preserving the same format specifiers and error
codes (e.g., SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) and leaving the surrounding
logic around SG(request_info).content_length and SG(post_max_size unchanged.

---

Duplicate comments:
In `@php_simdjson.cpp`:
- Around line 341-349: When reusing SG(request_info).request_body the
post_max_size guard is skipped; reapply the same size check used on the
direct-read path before calling simdjson_stream_copy_to_mem<true>. Specifically,
when SG(request_info).request_body is non-NULL, compare SG(read_post_bytes) (or
the body length you will read) against the configured post_max_size and
return/throw the same error path if it exceeds the limit, then proceed to
php_stream_rewind and simdjson_stream_copy_to_mem<true> only after the size
guard passes.
- Around line 375-390: The code currently calls sapi_read_post_block() once and
may get a short read; update the logic around json/len to loop and drain the
POST body until either the requested bytes are read or a short read occurs:
allocate the buffer as before (json), then repeatedly call
sapi_read_post_block(target_ptr, remaining + 1) (or just remaining) appending
each chunk (advance target_ptr, subtract from remaining, add to total_read)
until total_read == SG(request_info).content_length or a read returns 0 or less
than requested; after each chunk check if total_read >
SG(request_info).content_length to trigger the existing size-exceeds error
(using simdjson_decoder_exception_ce and SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS),
and finally use php_stream_write(body, json, total_read) and compare against
total_read. Reference symbols: sapi_read_post_block, json, len (use
total_read/remaining), SG(request_info).content_length, php_stream_write,
simdjson_decoder_exception_ce.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a17c9d30-1138-41a5-81e3-220b16eb8763

📥 Commits

Reviewing files that changed from the base of the PR and between 304c5ab and 7833cf7.

📒 Files selected for processing (13)
  • .github/workflows/integration.yml
  • php_simdjson.cpp
  • php_simdjson.h
  • simdjson.stub.php
  • simdjson_arginfo.h
  • tests/input/post.phpt
  • tests/input/post_bigger.phpt
  • tests/input/post_disable_reading.phpt
  • tests/input/post_disable_reading_read_whole.phpt
  • tests/input/post_empty.phpt
  • tests/input/post_empty_disable_reading.phpt
  • tests/input/post_max_size.phpt
  • tests/input/put.phpt
✅ Files skipped from review due to trivial changes (8)
  • tests/input/post.phpt
  • tests/input/post_bigger.phpt
  • tests/input/post_empty_disable_reading.phpt
  • php_simdjson.h
  • tests/input/put.phpt
  • tests/input/post_disable_reading_read_whole.phpt
  • tests/input/post_max_size.phpt
  • tests/input/post_empty.phpt
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/input/post_disable_reading.phpt
  • .github/workflows/integration.yml

php_simdjson.cpp Outdated
Comment on lines +354 to +366
} else if (SG(request_info).content_length < 0) {
// I am not sure if content-length can be negative, so for now just throw exception
zend_throw_exception(simdjson_decoder_exception_ce, "POST Content-Length is required", SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS);
RETURN_THROWS();
} else if (SG(request_info).content_length > 0xFFFFFFFF) {
php_simdjson_throw_jsonexception(simdjson::CAPACITY);
RETURN_THROWS();
}

if (SG(post_max_size) > 0 && SG(request_info).content_length > SG(post_max_size)) {
zend_throw_exception_ex(simdjson_decoder_exception_ce, SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS,
"POST Content-Length of " ZEND_LONG_FMT " bytes exceeds the limit of " ZEND_LONG_FMT " bytes",
SG(request_info).content_length, SG(post_max_size)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use request-body wording in these exceptions.

Both messages hard-code POST, but this entry point is request-body based. Method-neutral wording keeps failures accurate for every caller.

✏️ Suggested wording
-            zend_throw_exception(simdjson_decoder_exception_ce, "POST Content-Length is required", SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS);
+            zend_throw_exception(simdjson_decoder_exception_ce, "Content-Length is required for request body decoding", SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS);
...
-                "POST Content-Length of " ZEND_LONG_FMT " bytes exceeds the limit of " ZEND_LONG_FMT " bytes",
+                "Request body Content-Length of " ZEND_LONG_FMT " bytes exceeds the limit of " ZEND_LONG_FMT " bytes",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@php_simdjson.cpp` around lines 354 - 366, The error messages currently
hard-code "POST" and should be method-neutral; update the exception strings used
where SG(request_info).content_length is validated: replace "POST Content-Length
is required" in the zend_throw_exception call and the "POST Content-Length of
..." format string in zend_throw_exception_ex with a method-neutral phrase such
as "request body Content-Length is required" and "request body Content-Length of
... exceeds the limit of ...", preserving the same format specifiers and error
codes (e.g., SIMDJSON_PHP_ERR_INPUT_SIZE_EXCEEDS) and leaving the surrounding
logic around SG(request_info).content_length and SG(post_max_size unchanged.

@JakubOnderka JakubOnderka merged commit 08f6dce into master Apr 12, 2026
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant