test: add loop protocol and max_remembers unit tests#24
Conversation

This pull request needs a small update before review:
📝 Walkthrough

Added a new unit test module that validates the Anchor loop protocol behavior with scripted AI responses. Tests exercise DONE, CLARIFY, and error-marker paths, REMEMBER cycles, and MAX_REMEMBERS limit enforcement, ensuring deterministic coverage of core Phase 1 loop behavior.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (3)
tests/unit/test_loop.py (3)
69-75: Add an explicit unrecognized-marker error test. The objective calls out missing and unrecognized markers, but this file currently covers the missing-marker case only.
Suggested additional test:

```diff
 @pytest.mark.unit
 def test_no_marker_error_path() -> None:
     # no protocol marker => loop exits with stop_reason="error"; kind is still "done"
     anchor = _make_anchor(["Just some text with no protocol marker."])
     result = anchor.run("What is this?")
     assert result.kind == "done"
     assert result.stop_reason == "error"
+
+
+@pytest.mark.unit
+def test_unrecognized_marker_error_path() -> None:
+    anchor = _make_anchor(["I think this is final.\nFINISH"])
+    result = anchor.run("What is this?")
+    assert result.kind == "done"
+    assert result.stop_reason == "error"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_loop.py` around lines 69 - 75, Add a new unit test that mirrors test_no_marker_error_path but supplies an unrecognized protocol marker string to _make_anchor (e.g., a line starting with a wrong marker token) and calls anchor.run("..."); assert the result.stop_reason == "error" and result.kind == "done" to explicitly cover the unrecognized-marker error path; place the new test alongside test_no_marker_error_path and reuse _make_anchor and anchor.run to keep behavior consistent.
77-91: Consider asserting content on the MAX_REMEMBERS exit. This path strips `REMEMBER` before returning. Adding a content assertion makes that contract explicit and guards against marker leakage.

Suggested assertion:

```diff
 def test_max_remembers_guard() -> None:
 @@
     result = anchor.run("Something that requires many lookups.")
     assert result.kind == "done"
     assert result.stop_reason == "max_remembers"
+    assert result.content == "GAP: gap3\nCONTEXT: ctx3"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_loop.py` around lines 77 - 91, The test test_max_remembers_guard currently only checks result.kind and result.stop_reason but does not assert that the returned content has had the "REMEMBER" marker stripped; update the test to also assert that result.content (or the field holding the returned text) does not contain the "REMEMBER" token and matches the expected stripped output after MAX_REMEMBERS triggers; locate the test function test_max_remembers_guard and the anchor fixture (anchor, MAX_REMEMBERS, run) and add an assertion such as verifying "REMEMBER" not in result.content and/or that result.content equals the expected concatenated/contextual text without the marker.
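The stripping contract this comment wants pinned down can be sketched roughly as follows. This is a hypothetical illustration only — `PROTOCOL_MARKERS` and `strip_marker` are invented names here, and the real logic in `loop.py` may differ:

```python
# Hypothetical sketch of trailing-marker stripping; not the actual loop.py code.
PROTOCOL_MARKERS = {"DONE", "REMEMBER", "CLARIFY"}


def strip_marker(text: str) -> str:
    # Drop a trailing protocol marker line so it never leaks into result.content.
    lines = text.strip().splitlines()
    if lines and lines[-1].strip() in PROTOCOL_MARKERS:
        lines = lines[:-1]
    return "\n".join(lines).strip()
```

Under this sketch, the suggested assertion would hold because the final `REMEMBER` line is removed before the content is returned.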
9-56: Strengthen the REMEMBER tests by asserting decompose/light-model calls. Right now these tests can still pass even if REMEMBER stops triggering the expected decompose flow in `src/anchor/loop.py`. Since `ScriptedModel` tracks calls, assert call counts in both REMEMBER tests.

Proposed test hardening:

```diff
 def test_single_remember_then_done() -> None:
     anchor = _make_anchor(
         ai_responses=[
             "GAP: what is X?\nCONTEXT: figuring out X\nREMEMBER",
             "The answer is X.\nDONE",
         ],
         light_ai_responses=["query about X"],
     )
     result = anchor.run("Tell me about X.")
     assert result.kind == "done"
     assert result.stop_reason == "done"
     assert result.content == "The answer is X."
+    assert len(anchor.light_ai_fn.calls) == 1
 @@
 def test_two_remembers_then_done() -> None:
     anchor = _make_anchor(
         ai_responses=[
             "GAP: what is X?\nCONTEXT: figuring out X\nREMEMBER",
             "GAP: what is Y?\nCONTEXT: still working through it\nREMEMBER",
             "The final answer.\nDONE",
         ],
         light_ai_responses=["query about X", "query about Y"],
     )
     result = anchor.run("Tell me about X and Y.")
     assert result.kind == "done"
     assert result.stop_reason == "done"
     assert result.content == "The final answer."
+    assert len(anchor.light_ai_fn.calls) == 2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_loop.py` around lines 9 - 56, The REMEMBER tests should assert the scripted model call counts so the decompose/light-model flow is exercised: modify test_single_remember_then_done and test_two_remembers_then_done to construct ScriptedModel instances explicitly (instead of using _make_anchor shorthand) so you can inspect their call counters, pass them into Anchor (ai_fn and light_ai_fn), call Anchor.run, then assert the light model was called once in the single-REMEMBER test and twice in the two-REMEMBER test, and that the main ai model was called the expected number of times (2 for single, 3 for two); update or add assertions referencing the ScriptedModel instances and Anchor.run to validate these call counts.
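For context, the call-recording pattern the comment relies on can be sketched like this. This is a minimal hypothetical version, not the actual `ScriptedModel` in `unit_harness.py`, whose interface may differ:

```python
# Minimal sketch of a scripted-model test double; the real unit_harness.py
# implementation may differ in interface and behavior.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScriptedModel:
    responses: List[str]                               # canned replies, in order
    calls: List[str] = field(default_factory=list)     # prompts seen so far

    def __call__(self, prompt: str) -> str:
        self.calls.append(prompt)          # record for call-count assertions
        return self.responses.pop(0)       # replay the next scripted response
```

Because every prompt is appended to `calls`, a test can assert `len(model.calls)` to prove the decompose flow actually invoked the light model the expected number of times.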
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b1625342-eacf-4f1f-93c1-911d3b44ddb6
📒 Files selected for processing (1)
tests/unit/test_loop.py
AlexanderGardiner
left a comment
Thanks for the PR! Everything looks good, I'll merge it shortly. Also, make sure to claim any future issues you are working on with `/claim`.
Pull Request

test: add loop protocol and max_remembers unit tests

Summary
Adds offline unit tests for the core orchestrator loop covering all protocol exit paths (DONE, REMEMBER, CLARIFY, no-marker) and the max_remembers guard. The loop was the most critical component with zero unit test coverage.
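The protocol exit paths listed above can be sketched as a marker classifier. This is a hypothetical illustration of the contract the tests exercise — `classify_response` is an invented name, and the real parsing in `loop.py` may work differently:

```python
# Hypothetical sketch of the protocol-marker decision; not the actual loop.py code.
def classify_response(text: str) -> str:
    """Map an AI response's trailing marker line to a stop reason."""
    lines = text.strip().splitlines()
    last = lines[-1].strip() if lines else ""
    if last == "DONE":
        return "done"
    if last == "REMEMBER":
        return "remember"
    if last == "CLARIFY":
        return "clarify"
    return "error"  # missing or unrecognized marker
```

Under this sketch, a response ending in an unknown token like `FINISH` falls through to the same `"error"` path as a response with no marker at all, which is why both cases are worth covering.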
Closes #11 and #12
Context / Motivation
The orchestrator loop in `loop.py` drives all of Anchor's core behavior but had no unit tests. `ScriptedModel` and `FakeMemoryStore` were already available in `unit_harness.py` to support fully offline testing, so this PR puts them to use on the most critical path in the codebase.

Changes
Added `tests/unit/test_loop.py` with 6 unit tests covering: DONE on first call, single REMEMBER cycle, chained REMEMBER cycles, CLARIFY path, no-marker error path (`stop_reason="error"`, `kind="done"`), and the max_remembers guard exit.

How to test
Run `uv run pytest -m unit -v`; `test_loop.py` should pass with no Ollama or network dependency.