test(plan): port 19 [post with Plan] tests (#55)#75

Merged. patrick-chinchill merged 5 commits into main from feat/plan-tests-55 on Apr 24, 2026.
Conversation

@patrick-chinchill
Collaborator

@patrick-chinchill patrick-chinchill commented Apr 24, 2026

Summary

  • Closes #55 (Port 20 missing [post with Plan] tests to test_thread_faithful.py). The feature in src/chat_sdk/plan.py is already ported; this PR closes the test-parity gap in tests/test_thread_faithful.py by adding a TestPostWithPlan class with all 19 it(...) cases from the upstream describe("post with Plan", ...) block in thread.test.ts.
  • Ordering preserved vs upstream; each Python test name snake-cases the TS it(...) title so verify_test_fidelity.py matches. AsyncMock used for post_object / edit_object (absent by default on MockAdapter) and for the edit_object error side-effects.
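
The naming convention can be sketched as follows. This is a minimal illustration, assuming names are derived by lowercasing the it(...) title and collapsing every non-alphanumeric run into an underscore. The helper `ts_title_to_test_name` is hypothetical and is not taken from verify_test_fidelity.py.

```python
import re


def ts_title_to_test_name(title: str) -> str:
    """Hypothetical helper: map a TS it(...) title to a pytest-style name."""
    # Lowercase first so camelCase identifiers such as "editObject" collapse
    # to "editobject", matching the test names listed in this PR.
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"test_{slug}"


print(ts_title_to_test_name("should propagate editObject errors from addTask"))
# test_should_propagate_editobject_errors_from_addtask
```

Lowercasing before substitution is what keeps `editObject` and `addTask` as single words rather than splitting them on the capital letter.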

Behavior gaps surfaced (5 tests skipped)

Porting the tests exposed 3 real divergences between Python plan.py and upstream TS. The 5 affected tests are scaffolded (so fidelity counts match) but skipped via pytest.skip with a pointer back to #55 rather than left to drift silently. The gaps are worth a follow-up issue, but fixing them is intentionally out of scope for this test-port-only PR per the issue's "Any behavior gaps discovered during translation filed as separate bugs" note:

  • UpdateTaskInput has no id field (skips test_should_update_a_specific_task_by_id, test_should_return_null_when_updating_by_nonexistent_id). Upstream UpdateTaskInput = PlanContent | { id?, output?, status? }; Python dataclass is {output?, status?} only, so update_task({"id": ...}) falls through the PlanContent branch and sets current.output = dict.
  • _enqueue_edit swallows adapter errors (skips test_should_propagate_editobject_errors_from_addtask, test_should_continue_accepting_edits_after_a_failed_edit). Upstream returns the chained promise so addTask rejects when editObject rejects; Python catches the exception in the except block and only logs a warning.
  • Edit queue races under asyncio.gather (skips test_should_ensure_sequential_edits_via_queue). TS builds the chain synchronously via .then; Python awaits the previous update_chain then re-binds it, which lets concurrent callers read a stale chain and schedule _do_edit tasks out of order. Reproduced locally: [1, 3, 2] instead of [1, 2, 3].

The other 14 tests pass against current plan.py.
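
The edit-queue race can be reproduced in isolation. The sketch below is a simplified model, not the code in plan.py: `StaleChainQueue`, `FifoChainQueue`, and `run` are hypothetical names, and the sleeps stand in for adapter edits of different durations.

```python
import asyncio
import contextlib


class StaleChainQueue:
    """Awaits the previous edit, then re-binds the chain (racy)."""

    def __init__(self) -> None:
        self.chain = None

    async def enqueue(self, job) -> None:
        if self.chain is not None:
            # Suspension point: concurrent callers all wait on the same
            # stale future, then start their jobs without waiting for
            # the jobs queued in between.
            await self.chain
        self.chain = asyncio.ensure_future(job())
        await self.chain


class FifoChainQueue:
    """Captures the tail and re-binds it with no await in between."""

    def __init__(self) -> None:
        self.tail = None

    async def enqueue(self, job) -> None:
        prev = self.tail  # read synchronously

        async def runner() -> None:
            if prev is not None:
                with contextlib.suppress(Exception):
                    await prev
            await job()

        self.tail = asyncio.ensure_future(runner())  # re-bound synchronously
        await self.tail


async def run(queue) -> list:
    order = []

    def make_job(n: int, delay: float):
        async def job() -> None:
            await asyncio.sleep(delay)
            order.append(n)

        return job

    await asyncio.gather(
        queue.enqueue(make_job(1, 0.01)),
        queue.enqueue(make_job(2, 0.05)),  # deliberately slow edit
        queue.enqueue(make_job(3, 0.01)),
    )
    return order


print(asyncio.run(run(StaleChainQueue())))  # [1, 3, 2]
print(asyncio.run(run(FifoChainQueue())))   # [1, 2, 3]
```

In the stale variant, callers 2 and 3 both wake when job 1 finishes and start their jobs concurrently, so the faster job 3 lands first. In the FIFO variant each caller links to the tail before yielding to the event loop, so the order is fixed at call time.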

Fidelity impact

$ uv run python -c "import sys; sys.path.insert(0,'scripts'); import verify_test_fidelity as v; \
  print('post with Plan gaps closed:', not any(d=='post with Plan' for d,_,_ in \
    v.check_fidelity('packages/chat/src/thread.test.ts','tests/test_thread_faithful.py')[0]))"
post with Plan gaps closed: True

Test plan

  • uv run ruff check src/ tests/ scripts/ — clean
  • uv run ruff format --check src/ tests/ scripts/ — 192 files already formatted
  • uv run pyrefly check — 0 errors
  • uv run python scripts/audit_test_quality.py — 0 hard failures (pre-existing warnings only)
  • uv run python scripts/verify_test_fidelity.py — all 19 [post with Plan] gaps closed
  • uv run pytest tests/ --tb=short -q — 3559 passed, 7 skipped (2 pre-existing + 5 new gap skips)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Task updates can target a specific task by ID (returns None for unknown IDs); queued concurrent edits now preserve FIFO ordering; adapter edit failures when applying plan edits propagate to callers instead of being swallowed.
  • Tests

    • Added end-to-end tests covering posting, plan/task lifecycle, queued concurrent edits, failure propagation, and message/content variants.
  • Documentation

    • Updated changelog to document the new tests and revised plan semantics.

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 93840a11-93c1-4f4d-b45b-6a88c11d2e84

📥 Commits

Reviewing files that changed from the base of the PR and between fa81da6 and b3fe670.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • src/chat_sdk/plan.py
  • tests/test_plan.py
  • tests/test_thread_faithful.py
📝 Walkthrough

Adds upstream “[post with Plan]” tests and updates Plan behavior: targeted task updates by id, synchronous FIFO edit chaining to prevent out-of-order edits under concurrency, and propagation of adapter edit failures to callers; tests and CHANGELOG updated accordingly.
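
As a rough illustration of the id-targeting behavior, here is a minimal sketch. It assumes a simplified in-memory model: `Task` and `MiniPlan` are hypothetical stand-ins, the `PlanContent` branch of the upstream union is omitted, and no adapter is involved.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    title: str
    status: str = "in_progress"
    output: str | None = None


@dataclass
class UpdateTaskInput:
    # Mirrors the upstream shape { id?, output?, status? }.
    id: str | None = None
    output: str | None = None
    status: str | None = None


@dataclass
class MiniPlan:
    """Hypothetical, adapter-free stand-in for Plan."""

    tasks: list[Task] = field(default_factory=list)

    def update_task(self, update: UpdateTaskInput) -> Task | None:
        if update.id is not None:
            # Explicit id: target that task, or signal "not found" with None.
            target = next((t for t in self.tasks if t.id == update.id), None)
        else:
            # No id: fall back to the most recent in-progress task.
            target = next(
                (t for t in reversed(self.tasks) if t.status == "in_progress"),
                None,
            )
        if target is None:
            return None
        if update.output is not None:
            target.output = update.output
        if update.status is not None:
            target.status = update.status
        return target


plan = MiniPlan([Task("t1", "Fetch data"), Task("t2", "Summarize")])
plan.update_task(UpdateTaskInput(id="t1", status="complete"))
print(plan.tasks[0].status)                             # complete
print(plan.update_task(UpdateTaskInput(id="missing")))  # None
```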

Changes

| Cohort / File(s) | Summary |
|---|---|
| Changelog (CHANGELOG.md) | Adds an Unreleased entry documenting test parity additions and three Plan runtime changes: task-id targeting, edit propagation, and sequential edit queue chaining. |
| Plan implementation (src/chat_sdk/plan.py) | Adds `UpdateTaskInput.id: str |
| Tests, faithful suite (tests/test_thread_faithful.py) | Adds TestPostWithPlan covering fallback vs native adapter posting, add_task/update_task/complete flows (including id-targeting, non-existent-id -> None, last in-progress fallback), initial message formats, concurrency/FIFO queueing, pre-post no-op/None semantics, and edit failure propagation and recovery. |
| Tests, existing plan tests (tests/test_plan.py) | Updates expectations: adapter edit_object failures from add_task are expected to propagate to caller (test uses pytest.raises), still asserts an internal logged warning, and verifies subsequent successful edits persist. |

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Plan
  participant Queue as "Edit Queue (chained tail)"
  participant Adapter

  Client->>Plan: add_task(input)
  Note right of Plan: capture current Queue tail synchronously
  Plan->>Queue: attach chained runner (returns chained promise)
  Client->>Plan: await chained promise
  Queue->>Adapter: edit_object(plan_payload)
  Adapter-->>Queue: success / error
  alt success
    Queue->>Plan: apply adapter result -> update model
    Plan-->>Client: resolve chained promise with updated task
  else error
    Queue->>Plan: log internal warning (internal tail swallows)
    Plan-->>Client: reject chained promise with adapter error
  end
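
The flow in the diagram can be approximated with plain asyncio. This is a sketch under stated assumptions, not the plan.py implementation: `EditQueue`, `demo`, and the logger name are hypothetical, and `edit` stands in for the adapter's edit_object call.

```python
import asyncio
import logging

logger = logging.getLogger("plan_sketch")  # hypothetical logger name


class EditQueue:
    """Hypothetical stand-in for the chained edit queue."""

    def __init__(self) -> None:
        self._tail = None  # internal tail; it absorbs errors

    async def enqueue(self, edit):
        prev = self._tail  # captured synchronously, before any await

        async def run_after_prev():
            if prev is not None:
                await prev  # the tail absorbed earlier errors already
            return await edit()

        chained = asyncio.ensure_future(run_after_prev())

        async def absorb() -> None:
            try:
                await chained
            except Exception as exc:  # CancelledError is not caught here
                logger.warning("edit failed: %s", exc)

        # Later edits wait on the absorbing tail, so one failure does not
        # poison the queue.
        self._tail = asyncio.ensure_future(absorb())
        return await chained  # the caller still sees the adapter error


async def demo() -> list:
    queue = EditQueue()
    seen = []

    async def failing_edit():
        raise RuntimeError("adapter rejected edit")

    async def ok_edit():
        return "edited"

    try:
        await queue.enqueue(failing_edit)
    except RuntimeError as exc:
        seen.append(f"propagated: {exc}")
    seen.append(await queue.enqueue(ok_edit))  # queue keeps accepting edits
    return seen


print(asyncio.run(demo()))
# ['propagated: adapter rejected edit', 'edited']
```

Splitting the work into a caller-facing `chained` task and an internal absorbing tail is what lets the error reach the caller while keeping the queue usable.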

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through chains of edits and tests,

Tasks now find their names and rest.
Queue lined up, no scramble or tumble,
Errors shouted — then we re-assemble.
Hooray, the burrow’s tidy and blessed.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title accurately summarizes the main change: porting 19 [post with Plan] tests to test_thread_faithful.py with the upstream issue reference (#55). |
| Linked Issues check | ✅ Passed | PR successfully translates the missing [post with Plan] test block; 19 of 20 tests added (14 passing, 5 scaffolded/skipped for known behavioral gaps); satisfies core objective to close TS-Python fidelity gaps via test porting. |
| Out of Scope Changes check | ✅ Passed | All changes directly support test porting: CHANGELOG documents three behavioral fixes that enable test parity, plan.py updates support test requirements (id field, error propagation, queue ordering), and test_plan.py adjusts for new propagation semantics. |

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request ports 19 tests related to the 'post with Plan' functionality from TypeScript to Python to improve test fidelity, adding a new TestPostWithPlan class and updating the CHANGELOG. Feedback focuses on improving code structure by moving imports to the top of the file, refactoring repetitive setup logic into pytest fixtures, and using exact assertions for mock call counts to make tests more robust.

    # it("should post fallback text when adapter does not support plans")
    @pytest.mark.asyncio
    async def test_should_post_fallback_text_when_adapter_does_not_support_plans(self):
        from chat_sdk.plan import Plan, StartPlanOptions
Contributor

medium

The imports from chat_sdk.plan are repeated in almost every test method in this class. To improve maintainability and follow PEP 8, these should be moved to the top of the file.

References
  1. According to PEP 8, imports should always be placed at the top of the file, just after any module comments and docstrings, and before module globals and constants. (link)

Collaborator Author

Deferred — test boilerplate helper would save a few lines but masks which mocks each test uses. Tests are intentionally self-contained for faithful upstream parity.

Comment on lines +1511 to +1513
adapter = create_mock_adapter()
state = create_mock_state()
thread = _make_thread(adapter, state)
Contributor

medium

There is significant boilerplate code for setting up the mock adapter and state in each test. Consider refactoring this into a helper method or a pytest fixture to reduce duplication and improve readability.

Collaborator Author

Deferred — same reason: explicit setup per test preserves readability and 1:1 correspondence with upstream TS test bodies.

assert task is not None
assert task.title == "Fetch data"
assert task.status == "in_progress"
assert edit_object.await_count >= 1
Contributor

medium

The assertion assert edit_object.await_count >= 1 is loose. Since add_task is awaited and it internally awaits the edit task, the count should be exactly 1 here. Using exact counts makes the test more robust against unexpected extra calls.

Suggested change
assert edit_object.await_count >= 1
assert edit_object.await_count == 1

Collaborator Author

Deferred — await_count >= 1 is the right assertion: the test asserts editObject was called, and upstream has no strict count requirement. Tighter would diverge from upstream intent.
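
For context on the two assertion styles, here is a small sketch of how `unittest.mock.AsyncMock` counts awaits. The mock below is a hypothetical stand-in for the adapter's edit_object, and the arguments are made up for illustration.

```python
import asyncio
from unittest.mock import AsyncMock


async def demo() -> AsyncMock:
    # Hypothetical stand-in for adapter.edit_object: fail once, then succeed.
    edit_object = AsyncMock(side_effect=[RuntimeError("boom"), None])
    try:
        await edit_object("msg-1", {"title": "Fetch data"})
    except RuntimeError:
        pass
    await edit_object("msg-1", {"title": "Fetch data"})
    return edit_object


mock = asyncio.run(demo())
assert mock.await_count >= 1  # loose: "was awaited at least once"
assert mock.await_count == 2  # exact: the failed await is counted too
```

The loose form only pins down that the call happened, while the exact form also fixes how many edits the implementation issues, which is an implementation detail the upstream test does not assert.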

updated = await plan.update_task("Got result: 42")

assert updated is not None
assert edit_object.await_count >= 1
Contributor

medium

Similar to the previous comment, the assertion assert edit_object.await_count >= 1 should be exact. In this test, add_task and update_task are both called and awaited, so the count should be exactly 2.

Suggested change
assert edit_object.await_count >= 1
assert edit_object.await_count == 2

Collaborator Author

Deferred — same as above.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/test_thread_faithful.py (1)

2917-2925: Replace assert True stubs with pytest.skip() for clarity.

These absorber tests violate the coding guideline to avoid assert True stubs. Since JSX is TypeScript-only and has no Python equivalent, using pytest.skip() with an explanation would better communicate intent and satisfy fidelity tracking without creating tests that always pass.

♻️ Proposed refactor
 class TestMissingAbsorbers:
     """Fidelity-check absorbers for TS test names that have no Python equivalent."""

     # JSX-specific tests: Python has no JSX runtime, so these remain as absorbers.
     # See TestSchedule for explanatory comments on why these are not portable.
     def test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter(self):
-        assert True
+        pytest.skip("JSX Card/CardElement conversion is TypeScript-only; no Python equivalent")

     def test_should_convert_card_jsx_with_children_to_cardelement(self):
-        assert True
+        pytest.skip("JSX Card rendering with children is TypeScript-only; no Python equivalent")

As per coding guidelines: "Every test must fail when the code is wrong. Avoid assert True stubs."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_thread_faithful.py` around lines 2917 - 2925, Replace the `assert
True` stub in the absorber test inside class TestMissingAbsorbers (method
test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter)
with a proper skip: import pytest if not already present and call
pytest.skip(...) with a concise explanation that this test is
TypeScript/JSX-only and has no Python equivalent; ensure the skip message
clearly states why the test is skipped to satisfy the fidelity guideline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e718b771-e977-4028-b472-a7dfe8b563f2

📥 Commits

Reviewing files that changed from the base of the PR and between d23b6d9 and 1826a71.

📒 Files selected for processing (2)
  • CHANGELOG.md
  • tests/test_thread_faithful.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1826a710e6

Comment thread tests/test_thread_faithful.py Outdated
# GAP: Python ``UpdateTaskInput`` has no ``id`` field; the TS
# variant accepts ``{ id, output?, status? }`` to target a specific
# task. Skipping until ``plan.py`` grows id-based lookup.
pytest.skip("Python UpdateTaskInput has no 'id' field (gap vs TS); track via follow-up to #55")

P2 Badge Keep parity cases executable instead of unconditional skips

These new "post with Plan" parity tests are hard-skipped, which means the most important gap scenarios never execute in CI and can silently stay stale even after plan.py behavior changes. In practice this undermines the stated fidelity closure because future fixes/regressions in these paths will not be detected. Prefer xfail (ideally strict) or assertions of current behavior so the test still runs and signals when the gap is resolved.
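
A minimal sketch of the strict-xfail alternative, assuming pytest is available. The test name and the placeholder assertion are hypothetical and only illustrate the marker.

```python
import pytest


@pytest.mark.xfail(strict=True, reason="UpdateTaskInput has no 'id' field yet")
def test_update_by_id_gap():
    # Hypothetical placeholder for the real assertion: it fails while the
    # gap exists, so pytest reports XFAIL instead of silently skipping.
    supports_id = False
    assert supports_id
```

With `strict=True`, a test that unexpectedly passes is reported as a failure, so the marker has to be removed once plan.py closes the gap. An unconditional `pytest.skip(...)` never runs the body and cannot give that signal.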

Collaborator Author

Addressed in 6ada545 — the 5 previously-skipped tests were unskipped after fixing 3 upstream-parity gaps (UpdateTaskInput.id, error propagation, edit queue race).

Comment thread src/chat_sdk/plan.py
# swapped the chain out with a rejecting future — the
# suppression keeps the parity guarantee defensive.)
with contextlib.suppress(BaseException):
    await prev
Collaborator Author

False positive — await prev under contextlib.suppress(...) has real side effects (drives the previous coroutine to completion).

Comment thread src/chat_sdk/plan.py Fixed
Comment thread tests/test_thread_faithful.py Fixed
Comment thread src/chat_sdk/plan.py Fixed

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/test_thread_faithful.py (1)

1487-1504: Stale docstring: remove or update references to now-fixed gaps.

The docstring describes three behavioral divergences that required tests to be skipped:

  • UpdateTaskInput missing id field
  • _enqueue_edit swallowing errors
  • Edit-queue ordering races

Per the commit summary, these gaps have been fixed and the 5 previously-skipped tests are now un-skipped. The docstring should be updated to reflect that these gaps are closed, or the outdated gap descriptions should be removed.

📝 Suggested docstring update
     """describe("post with Plan")

     Ported from TS thread.test.ts to close the fidelity gap tracked in `#55`.
-
-    Note: a few tests in this block expose known behavior gaps between the
-    current Python ``Plan`` implementation and the upstream TS version:
-
-    * ``UpdateTaskInput`` in ``plan.py`` has no ``id`` field, so looking up
-      a task by id via ``update_task({"id": ...})`` is not supported.
-    * ``_enqueue_edit`` swallows adapter errors instead of propagating them
-      to the caller (upstream returns the chained promise, which rejects).
-    * The edit chain is rebuilt post-await rather than synchronously, which
-      does not preserve strict ordering under ``asyncio.gather``.
-
-    Those tests are skipped with a pointer back here so the gaps remain
-    visible for a follow-up fix rather than silently drifting.
+
+    All 19 tests pass against the current plan.py implementation.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_thread_faithful.py` around lines 1487 - 1504, Update the stale
class docstring on TestPostWithPlan: remove or rewrite the bullet points that
claim gaps in UpdateTaskInput (plan.py), _enqueue_edit, and edit-queue ordering
since those issues were fixed and tests un-skipped; keep a brief note if desired
about historical context but not as active bugs, and ensure the docstring
accurately reflects current behavior of UpdateTaskInput, _enqueue_edit, and the
edit ordering (or delete the outdated section entirely).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 790a84ff-e5da-4f37-9608-6b3d6a7004da

📥 Commits

Reviewing files that changed from the base of the PR and between 1826a71 and 6ada545.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • src/chat_sdk/plan.py
  • tests/test_plan.py
  • tests/test_thread_faithful.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

Comment thread src/chat_sdk/plan.py
bound.update_chain = loop.create_task(_absorb_for_chain())
# ``chained`` preserves upstream semantics: exceptions from the
# adapter edit propagate to the caller.
await chained
Collaborator Author

False positive — same as #502 above; await chained has real effect.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_thread_faithful.py (1)

3032-3041: Consider using pytest.skip() instead of assert True for fidelity absorbers.

While the docstring clearly documents these as intentional absorbers for JSX-specific tests, using assert True technically violates the coding guideline to avoid assert True stubs. Using pytest.skip("JSX Card not applicable in Python") would be more explicit and align with the pattern used elsewhere for behavioral divergences.

Optional refactor
 class TestMissingAbsorbers:
     """Fidelity-check absorbers for TS test names that have no Python equivalent."""

-    # JSX-specific tests: Python has no JSX runtime, so these remain as absorbers.
-    # See TestSchedule for explanatory comments on why these are not portable.
-    def test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter(self):
-        assert True
+    @pytest.mark.skip(reason="JSX Card not applicable in Python")
+    def test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter(self):
+        pass

-    def test_should_convert_card_jsx_with_children_to_cardelement(self):
-        assert True
+    @pytest.mark.skip(reason="JSX Card not applicable in Python")
+    def test_should_convert_card_jsx_with_children_to_cardelement(self):
+        pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_thread_faithful.py` around lines 3032 - 3041, Replace the assert
True stubs in the TestMissingAbsorbers test methods with explicit pytest.skip
calls: in
test_should_convert_jsx_card_elements_to_cardelement_before_passing_to_adapter
and test_should_convert_card_jsx_with_children_to_cardelement, call
pytest.skip("JSX Card not applicable in Python") at the start of each test and
ensure pytest is imported at the top of the file if not already present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/chat_sdk/plan.py`:
- Around line 475-476: Replace the overly broad
contextlib.suppress(BaseException) around awaiting the previous task so it only
suppresses regular exceptions; change the suppression to
contextlib.suppress(Exception) (referencing the context where "with
contextlib.suppress(BaseException): await prev" uses the variable prev) so
CancelledError, KeyboardInterrupt and SystemExit can still propagate
consistent with _absorb_for_chain's behavior.
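
The difference between the two suppression scopes can be seen in a small standalone sketch. It assumes Python 3.8 or later, where `asyncio.CancelledError` derives from `BaseException`. The helpers `wait_for_prev` and `demo` are hypothetical and not part of plan.py.

```python
import asyncio
import contextlib


async def wait_for_prev(prev, suppressed) -> str:
    with contextlib.suppress(suppressed):
        await prev
    return "continued"  # reached only if the exception was swallowed


async def demo(suppressed) -> str:
    prev = asyncio.get_running_loop().create_future()
    waiter = asyncio.ensure_future(wait_for_prev(prev, suppressed))
    await asyncio.sleep(0)  # let the waiter start awaiting prev
    waiter.cancel()
    try:
        return await waiter
    except asyncio.CancelledError:
        return "cancelled"


print(asyncio.run(demo(BaseException)))  # continued
print(asyncio.run(demo(Exception)))      # cancelled
```

With `BaseException` the cancellation request is swallowed and the coroutine keeps running. With `Exception` the cancellation propagates as the caller intended.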

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c7c487e9-902b-4bfb-b9ae-f691abc3cf9d

📥 Commits

Reviewing files that changed from the base of the PR and between 6ada545 and 1e6c25c.

📒 Files selected for processing (2)
  • src/chat_sdk/plan.py
  • tests/test_thread_faithful.py

Comment thread src/chat_sdk/plan.py Outdated
@patrick-chinchill
Collaborator Author

Review verdict: comment, no functional blocker found. Refreshed and reviewed latest head 70df558 against main. Plan behavior tests pass, including edit failure propagation and sequential edit queue. Two stale prose issues should be fixed before merge for parity clarity: tests/test_thread_faithful.py lines 1492-1503 still say UpdateTaskInput.id and edit queue/error propagation are unsupported/skipped, but this PR implements and runs those tests; CHANGELOG.md line 11 says StreamingPlan.update_task(input), but that API does not exist here and the fix is for Plan.update_task(input). Verification: tests/test_plan.py plus tests/test_thread_faithful.py passed (183 passed), focused regression pair passed, and ruff passed.

patrick-chinchill and others added 5 commits April 24, 2026 02:48

Feature in src/chat_sdk/plan.py is already ported; this closes the
test-parity gap in tests/test_thread_faithful.py.

Closes #55.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- update_task now honors input.id (was only last in-progress)
- add_task/update_task propagate adapter.edit_object errors
- _enqueue_edit is actually sequential under asyncio.gather

Un-skips the 5 previously-skipped tests in TestPostWithPlan.
Closes the remaining parity gaps in #55.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Internal chain-absorption coroutine now only catches Exception so
  CancelledError / KeyboardInterrupt / SystemExit propagate correctly
- Remove redundant local `import asyncio` in test_thread_faithful.py

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the last [post with Plan] fidelity gap. The existing
test_current_task_tracks_in_progress in test_plan.py covers only initial
+ add_task branches; this port verifies the post-complete() fallback
branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…consistency

Parallels the round-2 except-block narrowing on _absorb_for_chain so
CancelledError / KeyboardInterrupt / SystemExit propagate correctly
through both error paths in the edit queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@patrick-chinchill patrick-chinchill merged commit b7065ed into main Apr 24, 2026
10 checks passed
@patrick-chinchill
Collaborator Author

Noted from comment 4311954741 — test docstring at test_thread_faithful.py:1492-1503 still says UpdateTaskInput.id + edit queue are unsupported/skipped (stale), and CHANGELOG mentions StreamingPlan.update_task which doesn't exist (should just be Plan.update_task). PR already merged (b7065ed); will fix in the release PR.

@patrick-chinchill patrick-chinchill mentioned this pull request Apr 24, 2026
5 tasks
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
Parity catch-up release for upstream chat@4.26.0. Bundles 8 PRs (#64 #65 #66 #67 #73 #74 #75 #76) + small followup cleanups. See CHANGELOG.md for details.
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
…y baseline

Two infra-level changes to the fidelity check:

1. `.github/workflows/lint.yml` — the `Clone upstream vercel/chat at
   pinned parity tag` step no longer carries `continue-on-error: true`.
   The clone is infrastructure the fidelity check depends on; if it
   fails, the job should fail there rather than swallow the failure
   and hope a later step catches it. Combined with the script-level
   guard from the previous commit, this is defense in depth.

2. CI now runs `scripts/verify_test_fidelity.py --strict`. Every
   `[post with Plan]` test that was baselined is now ported (PR #75
   and PR #74 in the 0.4.26.2 bundle), so the repo ships at 0 missing.
   `scripts/fidelity_baseline.json` is reduced to `{"missing": {}}`
   (metadata retained so `--update-baseline` and the documented
   workflow still function for future upstream syncs).

Closes self-review gap #2 on #72.