Multi-dimensional AI-assisted testing framework with tracing, signal rewards, and continuous feedback for agent guidance.
Traditional testing is binary: pass or fail. This works for humans who understand nuance, but AI agents need richer signals. VTR (Verifier-Trace-Reward) treats tests as reward functions that produce continuous metrics, structured execution traces, and contextual prompts to guide agents toward correct behavior.
- Continuous signals —
0.0..1.0distance from target, not just pass/fail. Enables gradient-based improvement. - Multi-dimensional rewards — score coverage, latency, memory simultaneously without hiding trade-offs.
- Structured tracing — semantic categories (state / decision / effect / boundary / checkpoint) with depth-rendered call trees.
- Async-safe depth — depth tracked via span ancestry, not thread-local. Correct across tokio task migration.
- Ring-buffer capture — full-verbosity tracing with zero disk I/O until failure dumps the buffer.
- Correlation IDs — follow a logical operation across async / process boundaries.
- "Never done" recurrence — even when all tests pass, VTR suggests coverage gaps, mutation tests, and adversarial inputs.
[dependencies]
vtr = "0.1"
# optional features
# x11 — Xvfb/Xephyr orchestration, XTEST input, EWMH
# stroboscope — full AST-level tracing (expensive, dev only)
# full — x11 + stroboscopeuse vtr::prelude::*;
#[vtr::test]
fn connection_works() -> Result<(), Box<dyn std::error::Error>> {
let session = connect()?;
// Continuous signals: how close to target?
vtr::signal!("latency_ms", session.latency_ms(), target: 100.0, minimize);
vtr::signal!("coverage", coverage_pct(), target: 0.95, maximize);
// Semantic traces: structured, depth-aware, color-coded
vtr::state!("session", "Disconnected" => "Connected");
vtr::checkpoint!("ready");
Ok(())
}Run with cargo test. VTR produces a rich report:
═══ VTR Report ═══
Score: 0.92 (target: 1.0)
Tests: 1 passed, 0 failed
Signals:
latency_ms: 85 [████████░░] target=100 ↓ ✓
coverage: 0.92 [█████████░] target=0.95 ↑ ✗
Trace:
→ connection_works
◆ Disconnected → Connected
● ready
← connection_works ok (85ms)
Actions:
• Coverage at 92% — add tests for error paths in auth.rs
• Try adversarial: empty input, boundary values
VTR runs one-shot. The agent reacts:
agent runs cargo test → VTR emits scored trace → agent reads, fixes → loop
VTR's job is not to be the loop. It is to telegraph clearly so the agent knows what's wrong, how wrong, in which direction to improve, and what concrete next action to try. The loop is emergent.
| Module | Purpose |
|---|---|
signals |
Binary / Scalar / Vector / Gradient reward types |
trace |
Semantic macros, depth-rendered formatter, ring buffer, journald sinks |
harness |
#[vtr::test] runtime, signal! macro, thread-local context |
feedback |
Meta-review triggers, prompt templates, contextual guidance |
recurrence |
NeverDone, coverage gaps, mutation hints, adversarial inputs |
logging |
Centralized init, SIGHUP hot-reload, TUI buffering |
x11 |
(feature-gated) Xvfb/Xephyr, XTEST, EWMH for GUI integration |
MIT