
test: add loop protocol and max_remembers unit tests#24

Merged
AlexanderGardiner merged 1 commit into Codelab-Davis:main from kavishkartha05:kavish/loop-unit-tests on Apr 18, 2026

Conversation

@kavishkartha05 (Contributor) commented Apr 18, 2026

Pull Request

test: add loop protocol and max_remembers unit tests

Summary

Adds offline unit tests for the core orchestrator loop covering all protocol exit paths (DONE, REMEMBER, CLARIFY, no-marker) and the max_remembers guard. The loop was the most critical component with zero unit test coverage.

Closes #11 and #12

PR Size

  • [✔] Small (< 100 LOC delta)
  • Medium (100-499 LOC delta)
  • Large (500-999 LOC delta)
  • XL (1000+ LOC delta — must have been pre-discussed in an issue)

PR Type

  • Bug fix
  • New feature
  • Prompt change (prompt label required; eval results required below)
  • [✔] Test
  • Docs
  • Refactor / chore

Context / Motivation

The orchestrator loop in loop.py drives all of Anchor's core behavior but had no unit tests. ScriptedModel and FakeMemoryStore were already available in unit_harness.py to support fully offline testing, so this PR puts them to use on the most critical path in the codebase.
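For readers unfamiliar with the harness, a scripted model can be pictured with a minimal sketch. The name ScriptedModel and its call-tracking behavior come from this PR and its review; the internals below are illustrative assumptions, not the actual unit_harness.py implementation:

```python
from dataclasses import dataclass, field


@dataclass
class ScriptedModel:
    """Minimal scripted stand-in for an AI callable (sketch only).

    Pops pre-canned responses in order and records each prompt so
    tests can later assert on call counts and call arguments.
    """

    responses: list[str]
    calls: list[str] = field(default_factory=list)

    def __call__(self, prompt: str) -> str:
        self.calls.append(prompt)
        return self.responses[len(self.calls) - 1]


# Usage: script two turns, then assert on what was returned and recorded.
model = ScriptedModel(
    [
        "GAP: what is X?\nCONTEXT: figuring out X\nREMEMBER",
        "The answer is X.\nDONE",
    ]
)
first = model("Tell me about X.")
second = model("...follow-up prompt...")
assert first.endswith("REMEMBER")
assert second.endswith("DONE")
assert len(model.calls) == 2
```

Because the fake is deterministic and records its inputs, tests stay fully offline and can make exact assertions about how many times the loop consulted the model.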

Changes

Added tests/unit/test_loop.py with 6 unit tests covering:

  • DONE on the first call
  • a single REMEMBER cycle
  • chained REMEMBER cycles
  • the CLARIFY path
  • the no-marker error path (stop_reason="error", kind="done")
  • the max_remembers guard exit
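The exit paths listed above can be summarized with a small sketch of the marker-to-outcome mapping the tests exercise. The function name and return shape here are hypothetical; the real logic lives in src/anchor/loop.py and may differ in detail:

```python
# Hypothetical sketch of the exit-path classification under test;
# the actual marker handling lives in src/anchor/loop.py.
def classify_exit(response: str) -> tuple[str, str]:
    """Map a model response's trailing protocol marker to (kind, stop_reason)."""
    stripped = response.strip()
    marker = stripped.splitlines()[-1].strip() if stripped else ""
    if marker == "DONE":
        return ("done", "done")
    if marker == "CLARIFY":
        return ("clarify", "clarify")
    if marker == "REMEMBER":
        # Not a final exit: the loop decomposes the gap and iterates.
        return ("remember", "remember")
    # No recognized marker: the loop exits with an error stop reason
    # while still reporting kind="done".
    return ("done", "error")


assert classify_exit("The answer is X.\nDONE") == ("done", "done")
assert classify_exit("Which X do you mean?\nCLARIFY") == ("clarify", "clarify")
assert classify_exit("Just some text with no protocol marker.") == ("done", "error")
```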

How to test

  1. uv run pytest -m unit -v
  2. All 6 tests in test_loop.py should pass with no Ollama or network dependency
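The max_remembers guard being tested can be sketched as follows. The limit value, the result shape, and the response format are assumptions drawn from this PR's description and review comments, not the actual src/anchor/loop.py code:

```python
# Illustrative sketch of the max_remembers guard; the real guard lives
# in src/anchor/loop.py. MAX_REMEMBERS = 3 is an assumed value.
MAX_REMEMBERS = 3


def run_loop(responses: list[str]) -> dict[str, str]:
    remembers = 0
    for response in responses:
        if response.endswith("REMEMBER"):
            remembers += 1
            if remembers >= MAX_REMEMBERS:
                # Guard trips: exit with stop_reason="max_remembers",
                # stripping the trailing REMEMBER marker from the content.
                content = response[: -len("REMEMBER")].rstrip("\n")
                return {"kind": "done", "stop_reason": "max_remembers", "content": content}
            continue  # otherwise keep looping on the next scripted response
        if response.endswith("DONE"):
            content = response[: -len("DONE")].rstrip("\n")
            return {"kind": "done", "stop_reason": "done", "content": content}
    return {"kind": "done", "stop_reason": "error", "content": ""}


# A model that keeps asking to REMEMBER is cut off at the limit.
scripted = [f"GAP: gap{i}\nCONTEXT: ctx{i}\nREMEMBER" for i in range(1, 5)]
result = run_loop(scripted)
assert result["stop_reason"] == "max_remembers"
assert "REMEMBER" not in result["content"]
```

Asserting that the returned content no longer contains the REMEMBER token is what makes the marker-stripping contract explicit, as one of the review comments below also suggests.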

Prompt Change Eval Results

Full eval output (optional)
<paste here>

Checklist

  • [✔] I ran the relevant local checks.
  • [✔] I updated documentation or confirmed no docs changes were needed.
  • [✔] I linked related issues or pull requests and confirmed this is not duplicate work.
  • [✔] This change stays within the stated scope of the PR.

Notes for reviewers

Call out any tradeoffs, follow-up work, or setup needed to validate the change.

Summary by CodeRabbit

  • Tests
    • Added comprehensive unit tests for loop control flow validation, including termination states, clarification requests, error conditions, and memory limits.

@github-actions

This pull request needs a small update before review:

  • Pull requests must complete every checklist item in the template.
  • Add linked context such as Closes #123, Related to #123, or Supersedes #123 in the PR template.

@github-actions bot added the labels needs: compliance (The author or reporter needs to fix template or policy gaps), needs: linked-context (A pull request needs a linked issue or related pull request), and area: tests (Automated tests and test infrastructure) on Apr 18, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 18, 2026

📝 Walkthrough

Added a new unit test module that validates the Anchor loop protocol behavior with scripted AI responses. Tests exercise DONE, CLARIFY, and error marker paths, REMEMBER cycles, and MAX_REMEMBERS limit enforcement, ensuring deterministic coverage of core Phase 1 loop behavior.

Changes

  • Loop Protocol Tests (tests/unit/test_loop.py): New test suite exercising loop protocol markers: validates DONE termination, CLARIFY queries, error handling for missing markers, REMEMBER cycles, and MAX_REMEMBERS enforcement using scripted AI functions.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

  • Issue #12: Directly related—both changes validate MAX_REMEMBERS guard behavior by mutating the limit and asserting stop_reason == "max_remembers" on exceeded threshold.

Poem

🐰 A loop that learns, then learns once more,
Until the markers say "no more"—
DONE, CLARIFY, REMEMBER's dance,
Now tested well, by chance and plan!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4 passed)

  • Description Check ✅: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅: The title accurately summarizes the main change: adding unit tests for loop protocol and max_remembers behavior.
  • Linked Issues Check ✅: The PR addresses all coding requirements from issue #11: tests for DONE, CLARIFY, error paths, REMEMBER cycles, and the max_remembers guard with deterministic offline testing.
  • Out of Scope Changes Check ✅: All changes are scoped to the new test module; no refactoring of core loop logic or out-of-scope additions detected.


@coderabbitai (Contributor) left a comment

🧹 Nitpick comments (3)
tests/unit/test_loop.py (3)

69-75: Add explicit unrecognized-marker error test.

The objective calls out missing and unrecognized markers. This file currently covers missing marker only.

Suggested additional test
 @pytest.mark.unit
 def test_no_marker_error_path() -> None:
     # no protocol marker => loop exits with stop_reason="error"; kind is still "done"
     anchor = _make_anchor(["Just some text with no protocol marker."])
     result = anchor.run("What is this?")
     assert result.kind == "done"
     assert result.stop_reason == "error"
+
+
+@pytest.mark.unit
+def test_unrecognized_marker_error_path() -> None:
+    anchor = _make_anchor(["I think this is final.\nFINISH"])
+    result = anchor.run("What is this?")
+    assert result.kind == "done"
+    assert result.stop_reason == "error"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_loop.py` around lines 69 - 75, Add a new unit test that
mirrors test_no_marker_error_path but supplies an unrecognized protocol marker
string to _make_anchor (e.g., a line starting with a wrong marker token) and
calls anchor.run("..."); assert the result.stop_reason == "error" and
result.kind == "done" to explicitly cover the unrecognized-marker error path;
place the new test alongside test_no_marker_error_path and reuse _make_anchor
and anchor.run to keep behavior consistent.

77-91: Consider asserting content on MAX_REMEMBERS exit.

This path strips REMEMBER before returning. Adding a content assertion makes that contract explicit and guards against marker leakage.

Suggested assertion
 def test_max_remembers_guard() -> None:
@@
     result = anchor.run("Something that requires many lookups.")
     assert result.kind == "done"
     assert result.stop_reason == "max_remembers"
+    assert result.content == "GAP: gap3\nCONTEXT: ctx3"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_loop.py` around lines 77 - 91, The test
test_max_remembers_guard currently only checks result.kind and
result.stop_reason but does not assert that the returned content has had the
"REMEMBER" marker stripped; update the test to also assert that result.content
(or the field holding the returned text) does not contain the "REMEMBER" token
and matches the expected stripped output after MAX_REMEMBERS triggers; locate
the test function test_max_remembers_guard and the anchor fixture (anchor,
MAX_REMEMBERS, run) and add an assertion such as verifying "REMEMBER" not in
result.content and/or that result.content equals the expected
concatenated/contextual text without the marker.

9-56: Strengthen REMEMBER tests by asserting decompose/light-model calls.

Right now these tests can still pass even if REMEMBER stops triggering the expected decompose flow in src/anchor/loop.py. Since ScriptedModel tracks calls, assert call counts in both REMEMBER tests.

Proposed test hardening
 def test_single_remember_then_done() -> None:
     anchor = _make_anchor(
         ai_responses=[
             "GAP: what is X?\nCONTEXT: figuring out X\nREMEMBER",
             "The answer is X.\nDONE",
         ],
         light_ai_responses=["query about X"],
     )
     result = anchor.run("Tell me about X.")
     assert result.kind == "done"
     assert result.stop_reason == "done"
     assert result.content == "The answer is X."
+    assert len(anchor.light_ai_fn.calls) == 1
 
@@
 def test_two_remembers_then_done() -> None:
     anchor = _make_anchor(
         ai_responses=[
             "GAP: what is X?\nCONTEXT: figuring out X\nREMEMBER",
             "GAP: what is Y?\nCONTEXT: still working through it\nREMEMBER",
             "The final answer.\nDONE",
         ],
         light_ai_responses=["query about X", "query about Y"],
     )
     result = anchor.run("Tell me about X and Y.")
     assert result.kind == "done"
     assert result.stop_reason == "done"
     assert result.content == "The final answer."
+    assert len(anchor.light_ai_fn.calls) == 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_loop.py` around lines 9 - 56, The REMEMBER tests should
assert the scripted model call counts so the decompose/light-model flow is
exercised: modify test_single_remember_then_done and
test_two_remembers_then_done to construct ScriptedModel instances explicitly
(instead of using _make_anchor shorthand) so you can inspect their call
counters, pass them into Anchor (ai_fn and light_ai_fn), call Anchor.run, then
assert the light model was called once in the single-REMEMBER test and twice in
the two-REMEMBER test, and that the main ai model was called the expected number
of times (2 for single, 3 for two); update or add assertions referencing the
ScriptedModel instances and Anchor.run to validate these call counts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b1625342-eacf-4f1f-93c1-911d3b44ddb6

📥 Commits

Reviewing files that changed from the base of the PR and between 11610b7 and d9f1d0d.

📒 Files selected for processing (1)
  • tests/unit/test_loop.py

@AlexanderGardiner (Collaborator) left a comment

Thanks for the PR! Everything looks good, I'll merge it shortly. Also, make sure to claim any future issues you are working on with /claim.

@AlexanderGardiner AlexanderGardiner merged commit 8f9ac15 into Codelab-Davis:main Apr 18, 2026
6 of 8 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Apr 21, 2026

Labels

  • area: tests (Automated tests and test infrastructure)
  • needs: compliance (The author or reporter needs to fix template or policy gaps)
  • needs: linked-context (A pull request needs a linked issue or related pull request)


Development

Successfully merging this pull request may close these issues.

[task] Add loop protocol contract tests

2 participants