test(plan): port 19 [post with Plan] tests (#55)#75
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 10 minutes and 34 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds upstream “[post with Plan]” tests and updates Plan behavior: targeted task updates by id, synchronous FIFO edit chaining to prevent out-of-order edits under concurrency, and propagation of adapter edit failures to callers; tests and CHANGELOG updated accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Plan
participant Queue as "Edit Queue (chained tail)"
participant Adapter
Client->>Plan: add_task(input)
Note right of Plan: capture current Queue tail synchronously
Plan->>Queue: attach chained runner (returns chained promise)
Client->>Plan: await chained promise
Queue->>Adapter: edit_object(plan_payload)
Adapter-->>Queue: success / error
alt success
Queue->>Plan: apply adapter result -> update model
Plan-->>Client: resolve chained promise with updated task
else error
Queue->>Plan: log internal warning (internal tail swallows)
Plan-->>Client: reject chained promise with adapter error
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request ports 19 tests related to the 'post with Plan' functionality from TypeScript to Python to improve test fidelity, adding a new TestPostWithPlan class and updating the CHANGELOG. Feedback focuses on improving code structure by moving imports to the top of the file, refactoring repetitive setup logic into pytest fixtures, and using exact assertions for mock call counts to make tests more robust.
| # it("should post fallback text when adapter does not support plans") | ||
| @pytest.mark.asyncio | ||
| async def test_should_post_fallback_text_when_adapter_does_not_support_plans(self): | ||
| from chat_sdk.plan import Plan, StartPlanOptions |
There was a problem hiding this comment.
The imports from chat_sdk.plan are repeated in almost every test method in this class. To improve maintainability and follow PEP 8, these should be moved to the top of the file.
References
- According to PEP 8, imports should always be placed at the top of the file, just after any module comments and docstrings, and before module globals and constants. (link)
There was a problem hiding this comment.
Deferred — test boilerplate helper would save a few lines but masks which mocks each test uses. Tests are intentionally self-contained for faithful upstream parity.
| adapter = create_mock_adapter() | ||
| state = create_mock_state() | ||
| thread = _make_thread(adapter, state) |
There was a problem hiding this comment.
Deferred — same reason: explicit setup per test preserves readability and 1:1 correspondence with upstream TS test bodies.
| assert task is not None | ||
| assert task.title == "Fetch data" | ||
| assert task.status == "in_progress" | ||
| assert edit_object.await_count >= 1 |
There was a problem hiding this comment.
The assertion assert edit_object.await_count >= 1 is loose. Since add_task is awaited and it internally awaits the edit task, the count should be exactly 1 here. Using exact counts makes the test more robust against unexpected extra calls.
| assert edit_object.await_count >= 1 | |
| assert edit_object.await_count == 1 |
There was a problem hiding this comment.
Deferred — await_count >= 1 is the right assertion: the test asserts editObject was called, and upstream has no strict count requirement. Tighter would diverge from upstream intent.
| updated = await plan.update_task("Got result: 42") | ||
|
|
||
| assert updated is not None | ||
| assert edit_object.await_count >= 1 |
There was a problem hiding this comment.
There was a problem hiding this comment.
Deferred — same as above.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_thread_faithful.py (1)
2917-2925: Replaceassert Truestubs withpytest.skip()for clarity.These absorber tests violate the coding guideline to avoid
assert Truestubs. Since JSX is TypeScript-only and has no Python equivalent, usingpytest.skip()with an explanation would better communicate intent and satisfy fidelity tracking without creating tests that always pass.♻️ Proposed refactor
class TestMissingAbsorbers: """Fidelity-check absorbers for TS test names that have no Python equivalent.""" # JSX-specific tests: Python has no JSX runtime, so these remain as absorbers. # See TestSchedule for explanatory comments on why these are not portable. def test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter(self): - assert True + pytest.skip("JSX Card/CardElement conversion is TypeScript-only; no Python equivalent") def test_should_convert_card_jsx_with_children_to_cardelement(self): - assert True + pytest.skip("JSX Card rendering with children is TypeScript-only; no Python equivalent")As per coding guidelines: "Every test must fail when the code is wrong. Avoid
assert Truestubs."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_thread_faithful.py` around lines 2917 - 2925, Replace the `assert True` stub in the absorber test inside class TestMissingAbsorbers (method test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter) with a proper skip: import pytest if not already present and call pytest.skip(...) with a concise explanation that this test is TypeScript/JSX-only and has no Python equivalent; ensure the skip message clearly states why the test is skipped to satisfy the fidelity guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test_thread_faithful.py`:
- Around line 2917-2925: Replace the `assert True` stub in the absorber test
inside class TestMissingAbsorbers (method
test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter)
with a proper skip: import pytest if not already present and call
pytest.skip(...) with a concise explanation that this test is
TypeScript/JSX-only and has no Python equivalent; ensure the skip message
clearly states why the test is skipped to satisfy the fidelity guideline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e718b771-e977-4028-b472-a7dfe8b563f2
📒 Files selected for processing (2)
CHANGELOG.mdtests/test_thread_faithful.py
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1826a710e6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # GAP: Python ``UpdateTaskInput`` has no ``id`` field; the TS | ||
| # variant accepts ``{ id, output?, status? }`` to target a specific | ||
| # task. Skipping until ``plan.py`` grows id-based lookup. | ||
| pytest.skip("Python UpdateTaskInput has no 'id' field (gap vs TS); track via follow-up to #55") |
There was a problem hiding this comment.
Keep parity cases executable instead of unconditional skips
These new "post with Plan" parity tests are hard-skipped, which means the most important gap scenarios never execute in CI and can silently stay stale even after plan.py behavior changes. In practice this undermines the stated fidelity closure because future fixes/regressions in these paths will not be detected. Prefer xfail (ideally strict) or assertions of current behavior so the test still runs and signals when the gap is resolved.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in 6ada545 — the 5 previously-skipped tests were unskipped after fixing 3 upstream-parity gaps (UpdateTaskInput.id, error propagation, edit queue race).
| # swapped the chain out with a rejecting future — the | ||
| # suppression keeps the parity guarantee defensive.) | ||
| with contextlib.suppress(BaseException): | ||
| await prev |
There was a problem hiding this comment.
False positive — await prev under contextlib.suppress(...) has real side effects (drives the previous coroutine to completion).
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_thread_faithful.py (1)
1487-1504: Stale docstring: remove or update references to now-fixed gaps.The docstring describes three behavioral divergences that required tests to be skipped:
UpdateTaskInputmissingidfield_enqueue_editswallowing errors- Edit-queue ordering races
Per the commit summary, these gaps have been fixed and the 5 previously-skipped tests are now un-skipped. The docstring should be updated to reflect that these gaps are closed, or the outdated gap descriptions should be removed.
,
📝 Suggested docstring update
"""describe("post with Plan") Ported from TS thread.test.ts to close the fidelity gap tracked in `#55`. - - Note: a few tests in this block expose known behavior gaps between the - current Python ``Plan`` implementation and the upstream TS version: - - * ``UpdateTaskInput`` in ``plan.py`` has no ``id`` field, so looking up - a task by id via ``update_task({"id": ...})`` is not supported. - * ``_enqueue_edit`` swallows adapter errors instead of propagating them - to the caller (upstream returns the chained promise, which rejects). - * The edit chain is rebuilt post-await rather than synchronously, which - does not preserve strict ordering under ``asyncio.gather``. - - Those tests are skipped with a pointer back here so the gaps remain - visible for a follow-up fix rather than silently drifting. + + All 19 tests pass against the current plan.py implementation. """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_thread_faithful.py` around lines 1487 - 1504, Update the stale class docstring on TestPostWithPlan: remove or rewrite the bullet points that claim gaps in UpdateTaskInput (plan.py), _enqueue_edit, and edit-queue ordering since those issues were fixed and tests un-skipped; keep a brief note if desired about historical context but not as active bugs, and ensure the docstring accurately reflects current behavior of UpdateTaskInput, _enqueue_edit, and the edit ordering (or delete the outdated section entirely).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test_thread_faithful.py`:
- Around line 1487-1504: Update the stale class docstring on TestPostWithPlan:
remove or rewrite the bullet points that claim gaps in UpdateTaskInput
(plan.py), _enqueue_edit, and edit-queue ordering since those issues were fixed
and tests un-skipped; keep a brief note if desired about historical context but
not as active bugs, and ensure the docstring accurately reflects current
behavior of UpdateTaskInput, _enqueue_edit, and the edit ordering (or delete the
outdated section entirely).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 790a84ff-e5da-4f37-9608-6b3d6a7004da
📒 Files selected for processing (4)
CHANGELOG.mdsrc/chat_sdk/plan.pytests/test_plan.pytests/test_thread_faithful.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
| bound.update_chain = loop.create_task(_absorb_for_chain()) | ||
| # ``chained`` preserves upstream semantics: exceptions from the | ||
| # adapter edit propagate to the caller. | ||
| await chained |
There was a problem hiding this comment.
False positive — same as #502 above; await chained has real effect.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_thread_faithful.py (1)
3032-3041: Consider usingpytest.skip()instead ofassert Truefor fidelity absorbers.While the docstring clearly documents these as intentional absorbers for JSX-specific tests, using
assert Truetechnically violates the coding guideline to avoidassert Truestubs. Usingpytest.skip("JSX Card not applicable in Python")would be more explicit and align with the pattern used elsewhere for behavioral divergences.Optional refactor
class TestMissingAbsorbers: """Fidelity-check absorbers for TS test names that have no Python equivalent.""" - # JSX-specific tests: Python has no JSX runtime, so these remain as absorbers. - # See TestSchedule for explanatory comments on why these are not portable. - def test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter(self): - assert True + `@pytest.mark.skip`(reason="JSX Card not applicable in Python") + def test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter(self): + pass - def test_should_convert_card_jsx_with_children_to_cardelement(self): - assert True + `@pytest.mark.skip`(reason="JSX Card not applicable in Python") + def test_should_convert_card_jsx_with_children_to_cardelement(self): + pass🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_thread_faithful.py` around lines 3032 - 3041, Replace the assert True stubs in the TestMissingAbsorbers test methods with explicit pytest.skip calls: in test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter and test_should_convert_card_jsx_with_children_to_cardelement, call pytest.skip("JSX Card not applicable in Python") at the start of each test and ensure pytest is imported at the top of the file if not already present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/chat_sdk/plan.py`:
- Around line 475-476: Replace the overly broad
contextlib.suppress(BaseException) around awaiting the previous task so it only
suppresses regular exceptions; change the suppression to
contextlib.suppress(Exception) (referencing the context where "with
contextlib.suppress(BaseException): await prev" uses the variable prev) so
CancellationError, KeyboardInterrupt and SystemExit can still propagate
consistent with _absorb_for_chain's behavior.
---
Nitpick comments:
In `@tests/test_thread_faithful.py`:
- Around line 3032-3041: Replace the assert True stubs in the
TestMissingAbsorbers test methods with explicit pytest.skip calls: in
test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter
and test_should_convert_card_jsx_with_children_to_cardelement, call
pytest.skip("JSX Card not applicable in Python") at the start of each test and
ensure pytest is imported at the top of the file if not already present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c7c487e9-902b-4bfb-b9ae-f691abc3cf9d
📒 Files selected for processing (2)
src/chat_sdk/plan.pytests/test_thread_faithful.py
|
Review verdict: comment, no functional blocker found. Refreshed and reviewed latest head 70df558 against main. Plan behavior tests pass, including edit failure propagation and sequential edit queue. Two stale prose issues should be fixed before merge for parity clarity: tests/test_thread_faithful.py lines 1492-1503 still say UpdateTaskInput.id and edit queue/error propagation are unsupported/skipped, but this PR implements and runs those tests; CHANGELOG.md line 11 says StreamingPlan.update_task(input), but that API does not exist here and the fix is for Plan.update_task(input). Verification: tests/test_plan.py plus tests/test_thread_faithful.py passed (183 passed), focused regression pair passed, and ruff passed. |
Feature in src/chat_sdk/plan.py is already ported; this closes the test-parity gap in tests/test_thread_faithful.py. Closes #55. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- update_task now honors input.id (was only last in-progress) - add_task/update_task propagate adapter.edit_object errors - _enqueue_edit is actually sequential under asyncio.gather Un-skips the 5 previously-skipped tests in TestPostWithPlan. Closes the remaining parity gaps in #55. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Internal chain-absorption coroutine now only catches Exception so CancelledError / KeyboardInterrupt / SystemExit propagate correctly - Remove redundant local `import asyncio` in test_thread_faithful.py Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the last [post with Plan] fidelity gap. The existing test_current_task_tracks_in_progress in test_plan.py covers only initial + add_task branches; this port verifies the post-complete() fallback branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…consistency Parallels the round-2 except-block narrowing on _absorb_for_chain so CancelledError / KeyboardInterrupt / SystemExit propagate correctly through both error paths in the edit queue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
70df558 to
b3fe670
Compare
|
Noted from comment 4311954741 — test docstring at test_thread_faithful.py:1492-1503 still says UpdateTaskInput.id + edit queue are unsupported/skipped (stale), and CHANGELOG mentions |
…y baseline Two infra-level changes to the fidelity check: 1. `.github/workflows/lint.yml` — the `Clone upstream vercel/chat at pinned parity tag` step no longer carries `continue-on-error: true`. The clone is infrastructure the fidelity check depends on; if it fails, the job should fail there rather than swallow the failure and hope a later step catches it. Combined with the script-level guard from the previous commit, this is defense in depth. 2. CI now runs `scripts/verify_test_fidelity.py --strict`. Every `[post with Plan]` test that was baselined is now ported (PR #75 and PR #74 in the 0.4.26.2 bundle), so the repo ships at 0 missing. `scripts/fidelity_baseline.json` is reduced to `{"missing": {}}` (metadata retained so `--update-baseline` and the documented workflow still function for future upstream syncs). Closes self-review gap #2 on #72.
Summary
[post with Plan]tests to test_thread_faithful.py #55. Feature insrc/chat_sdk/plan.pyis already ported; this closes the test-parity gap intests/test_thread_faithful.pyby adding aTestPostWithPlanclass with all 19it(...)cases from the upstreamdescribe("post with Plan", ...)block inthread.test.ts.it(...)title soverify_test_fidelity.pymatches.AsyncMockused forpost_object/edit_object(absent by default onMockAdapter) and for theedit_objecterror side-effects.Behavior gaps surfaced (5 tests skipped)
Porting the tests exposed 3 real divergences between Python
plan.pyand upstream TS. Those 5 tests are scaffolded (so fidelity counts match) butpytest.skip-ed with a pointer back to #55 rather than silently drifting. Worth a follow-up issue, but intentionally out of scope for this test-port-only PR per the issue's "Any behavior gaps discovered during translation filed as separate bugs" note:UpdateTaskInputhas noidfield (skipstest_should_update_a_specific_task_by_id,test_should_return_null_when_updating_by_nonexistent_id). UpstreamUpdateTaskInput = PlanContent | { id?, output?, status? }; Python dataclass is{output?, status?}only, soupdate_task({"id": ...})falls through thePlanContentbranch and setscurrent.output = dict._enqueue_editswallows adapter errors (skipstest_should_propagate_editobject_errors_from_addtask,test_should_continue_accepting_edits_after_a_failed_edit). Upstream returns the chained promise soaddTaskrejects wheneditObjectrejects; Python catches the exception in theexceptblock and only logs a warning.asyncio.gather(skipstest_should_ensure_sequential_edits_via_queue). TS builds the chain synchronously via.then; Python awaits the previousupdate_chainthen re-binds it, which lets concurrent callers read a stale chain and schedule_do_edittasks out of order. Reproduced locally:[1, 3, 2]instead of[1, 2, 3].The other 14 tests pass against current
plan.py.Fidelity impact
[post with Plan]gaps remaining (out of the 19 enumerated in Port 20 missing[post with Plan]tests to test_thread_faithful.py #55; the other 9 were already phantom-matched but kept flagged).[post with Plan]gaps closed.thread.test.tsdrops from 29 missing to 10 missing (all[Streaming]-family, tracked elsewhere).Test plan
uv run ruff check src/ tests/ scripts/— cleanuv run ruff format --check src/ tests/ scripts/— 192 files already formatteduv run pyrefly check— 0 errorsuv run python scripts/audit_test_quality.py— 0 hard failures (pre-existing warnings only)uv run python scripts/verify_test_fidelity.py— all 19[post with Plan]gaps closeduv run pytest tests/ --tb=short -q— 3559 passed, 7 skipped (2 pre-existing + 5 new gap skips)🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests
Documentation