test: add loop protocol and max_remembers unit tests#24
Conversation

This pull request needs a small update before review:
📝 Walkthrough

Added a new unit test module that validates the Anchor loop protocol behavior with scripted AI responses. Tests exercise DONE, CLARIFY, and error-marker paths, REMEMBER cycles, and MAX_REMEMBERS limit enforcement, ensuring deterministic coverage of core Phase 1 loop behavior.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (3)
tests/unit/test_loop.py (3)
69-75: Add an explicit unrecognized-marker error test. The objective calls out missing and unrecognized markers, but this file currently covers the missing-marker case only.
Suggested additional test:

```diff
 @pytest.mark.unit
 def test_no_marker_error_path() -> None:
     # no protocol marker => loop exits with stop_reason="error"; kind is still "done"
     anchor = _make_anchor(["Just some text with no protocol marker."])
     result = anchor.run("What is this?")
     assert result.kind == "done"
     assert result.stop_reason == "error"
+
+
+@pytest.mark.unit
+def test_unrecognized_marker_error_path() -> None:
+    anchor = _make_anchor(["I think this is final.\nFINISH"])
+    result = anchor.run("What is this?")
+    assert result.kind == "done"
+    assert result.stop_reason == "error"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_loop.py` around lines 69 - 75, Add a new unit test that mirrors test_no_marker_error_path but supplies an unrecognized protocol marker string to _make_anchor (e.g., a line starting with a wrong marker token) and calls anchor.run("..."); assert the result.stop_reason == "error" and result.kind == "done" to explicitly cover the unrecognized-marker error path; place the new test alongside test_no_marker_error_path and reuse _make_anchor and anchor.run to keep behavior consistent.
77-91: Consider asserting content on the MAX_REMEMBERS exit. This path strips `REMEMBER` before returning. Adding a content assertion makes that contract explicit and guards against marker leakage.

Suggested assertion:

```diff
 def test_max_remembers_guard() -> None:
 @@
     result = anchor.run("Something that requires many lookups.")
     assert result.kind == "done"
     assert result.stop_reason == "max_remembers"
+    assert result.content == "GAP: gap3\nCONTEXT: ctx3"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_loop.py` around lines 77 - 91, The test test_max_remembers_guard currently only checks result.kind and result.stop_reason but does not assert that the returned content has had the "REMEMBER" marker stripped; update the test to also assert that result.content (or the field holding the returned text) does not contain the "REMEMBER" token and matches the expected stripped output after MAX_REMEMBERS triggers; locate the test function test_max_remembers_guard and the anchor fixture (anchor, MAX_REMEMBERS, run) and add an assertion such as verifying "REMEMBER" not in result.content and/or that result.content equals the expected concatenated/contextual text without the marker.
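The stripping contract this comment wants pinned down can be sketched roughly as follows. This is a hypothetical illustration only — `PROTOCOL_MARKERS` and `strip_marker` are invented names here, and the real logic in `loop.py` may differ:

```python
# Hypothetical sketch of trailing-marker stripping; not the actual loop.py code.
PROTOCOL_MARKERS = {"DONE", "REMEMBER", "CLARIFY"}


def strip_marker(text: str) -> str:
    # Drop a trailing protocol marker line so it never leaks into result.content.
    lines = text.strip().splitlines()
    if lines and lines[-1].strip() in PROTOCOL_MARKERS:
        lines = lines[:-1]
    return "\n".join(lines).strip()
```

Under this sketch, the suggested assertion would hold because the final `REMEMBER` line is removed before the content is returned.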
9-56: Strengthen the REMEMBER tests by asserting decompose/light-model calls. Right now these tests can still pass even if REMEMBER stops triggering the expected decompose flow in `src/anchor/loop.py`. Since `ScriptedModel` tracks calls, assert call counts in both REMEMBER tests.

Proposed test hardening:

```diff
 def test_single_remember_then_done() -> None:
     anchor = _make_anchor(
         ai_responses=[
             "GAP: what is X?\nCONTEXT: figuring out X\nREMEMBER",
             "The answer is X.\nDONE",
         ],
         light_ai_responses=["query about X"],
     )
     result = anchor.run("Tell me about X.")
     assert result.kind == "done"
     assert result.stop_reason == "done"
     assert result.content == "The answer is X."
+    assert len(anchor.light_ai_fn.calls) == 1
 @@
 def test_two_remembers_then_done() -> None:
     anchor = _make_anchor(
         ai_responses=[
             "GAP: what is X?\nCONTEXT: figuring out X\nREMEMBER",
             "GAP: what is Y?\nCONTEXT: still working through it\nREMEMBER",
             "The final answer.\nDONE",
         ],
         light_ai_responses=["query about X", "query about Y"],
     )
     result = anchor.run("Tell me about X and Y.")
     assert result.kind == "done"
     assert result.stop_reason == "done"
     assert result.content == "The final answer."
+    assert len(anchor.light_ai_fn.calls) == 2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_loop.py` around lines 9 - 56, The REMEMBER tests should assert the scripted model call counts so the decompose/light-model flow is exercised: modify test_single_remember_then_done and test_two_remembers_then_done to construct ScriptedModel instances explicitly (instead of using _make_anchor shorthand) so you can inspect their call counters, pass them into Anchor (ai_fn and light_ai_fn), call Anchor.run, then assert the light model was called once in the single-REMEMBER test and twice in the two-REMEMBER test, and that the main ai model was called the expected number of times (2 for single, 3 for two); update or add assertions referencing the ScriptedModel instances and Anchor.run to validate these call counts.
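For context, the call-recording pattern the comment relies on can be sketched like this. This is a minimal hypothetical version, not the actual `ScriptedModel` in `unit_harness.py`, whose interface may differ:

```python
# Minimal sketch of a scripted-model test double; the real unit_harness.py
# implementation may differ in interface and behavior.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScriptedModel:
    responses: List[str]                               # canned replies, in order
    calls: List[str] = field(default_factory=list)     # prompts seen so far

    def __call__(self, prompt: str) -> str:
        self.calls.append(prompt)          # record for call-count assertions
        return self.responses.pop(0)       # replay the next scripted response
```

Because every prompt is appended to `calls`, a test can assert `len(model.calls)` to prove the decompose flow actually invoked the light model the expected number of times.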
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b1625342-eacf-4f1f-93c1-911d3b44ddb6
📒 Files selected for processing (1)
tests/unit/test_loop.py
AlexanderGardiner
left a comment
Thanks for the PR! Everything looks good, I'll merge it shortly. Also, make sure to claim any future issues you are working on with `/claim`.
Pull Request

test: add loop protocol and max_remembers unit tests

Summary
Adds offline unit tests for the core orchestrator loop covering all protocol exit paths (DONE, REMEMBER, CLARIFY, no-marker) and the max_remembers guard. The loop was the most critical component with zero unit test coverage.
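The protocol exit paths listed above can be sketched as a marker classifier. This is a hypothetical illustration of the contract the tests exercise — `classify_response` is an invented name, and the real parsing in `loop.py` may work differently:

```python
# Hypothetical sketch of the protocol-marker decision; not the actual loop.py code.
def classify_response(text: str) -> str:
    """Map an AI response's trailing marker line to a stop reason."""
    lines = text.strip().splitlines()
    last = lines[-1].strip() if lines else ""
    if last == "DONE":
        return "done"
    if last == "REMEMBER":
        return "remember"
    if last == "CLARIFY":
        return "clarify"
    return "error"  # missing or unrecognized marker
```

Under this sketch, a response ending in an unknown token like `FINISH` falls through to the same `"error"` path as a response with no marker at all, which is why both cases are worth covering.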
Closes #11 and #12
Context / Motivation
The orchestrator loop in `loop.py` drives all of Anchor's core behavior but had no unit tests. `ScriptedModel` and `FakeMemoryStore` were already available in `unit_harness.py` to support fully offline testing, so this PR puts them to use on the most critical path in the codebase.

Changes
Added `tests/unit/test_loop.py` with 6 unit tests covering: DONE on first call, single REMEMBER cycle, chained REMEMBER cycles, CLARIFY path, no-marker error path (`stop_reason="error"`, `kind="done"`), and the max_remembers guard exit.

How to test
Run `uv run pytest -m unit -v`; `test_loop.py` should pass with no Ollama or network dependency.