v0.2.0: CLI, Python package, working e2e replay by pyyush · Pull Request #1 · pyyush/dbar

pyyush · 2026-03-27T16:30:50Z

Summary

CLI: dbar replay --cost, dbar eval, dbar validate — honest cost model with verified figures
Python package: pip install dbar — native browser-use v0.12.5 integration via on_step_end hooks, zero runtime deps, 33 tests
E2E replay works: 5 core bugs fixed — capture → replay at 100% on 3 real websites (books.toscrape.com, example.com, quotes.toscrape.com)
Integrations rebuilt: browser-use (snapshot-only, honest about CDP limits) + Browserbase (@browserbasehq/sdk v2.6.0, full determinism)
All claims verified: cost figures from browser-use benchmark blog, qualified benchmark stats, security hardening (no --api-key CLI flags, masked CDP URLs, PII warnings)

E2E Results

Site	Steps	Requests	Replay Success
books.toscrape.com	3	68	100%
example.com	1	1	100%
quotes.toscrape.com	2	12	100%

Core Fixes

DOM hash non-deterministic — hash getOuterHTML instead of DOMSnapshot (layout-independent)
Accessibility API removed — multi-strategy fallback (legacy → CDP → ariaSnapshot)
Response bodies garbled — hydrateTranscript() resolves deduplicated paths to base64
Virtual time blocks navigation — defer virtualizer to first step(), add suspend()
No multi-step replay — added ReplaySession with step-by-step API
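
As an illustration of the first fix, here is a minimal sketch of hashing a serialized outer-HTML string with SHA-256. In DBAR the string would come from the browser (the PR mentions getOuterHTML). Here it is passed in as a plain string, and `dom_hash` is a hypothetical helper name, not DBAR's actual API.

```python
import hashlib


def dom_hash(outer_html: str) -> str:
    """Return the hex SHA-256 digest of a serialized DOM string (hypothetical helper)."""
    return hashlib.sha256(outer_html.encode("utf-8")).hexdigest()


# Two captures with identical markup hash to the same value,
# regardless of how the page was laid out when it was captured.
page_a = "<html><body><h1>Books</h1></body></html>"
page_b = "<html><body><h1>Books</h1></body></html>"
page_c = "<html><body><h1>Quotes</h1></body></html>"

same = dom_hash(page_a) == dom_hash(page_b)
different = dom_hash(page_a) != dom_hash(page_c)
```

Because only the markup string feeds the hash, layout-dependent values never enter it, which is presumably why this approach is described as layout-independent.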

Test plan

🤖 Generated with Claude Code

DBAR can now record and replay browser-use agent sessions. The bridge connects to browser-use's running Playwright browser via CDP and wraps the page with DBAR capture. File-based signaling (.dbar-step, .dbar-finish) coordinates the Python agent process with the Node.js capture process.

Includes capture.ts, replay.ts, example.py, and supporting config.

Test: tsc --noEmit passes with zero errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
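
The file-based signaling described above can be sketched roughly as follows. Only the marker names .dbar-step and .dbar-finish come from the commit message. The function names, the polling shape, and the choice to delete a marker once it is consumed are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

STEP_FILE = ".dbar-step"      # marker names taken from the commit message
FINISH_FILE = ".dbar-finish"


def signal(signal_dir: Path, name: str) -> None:
    """Agent side: create an empty marker file."""
    (signal_dir / name).touch()


def consume(signal_dir: Path, name: str) -> bool:
    """Capture side: report a marker once, then delete it."""
    marker = signal_dir / name
    if marker.exists():
        marker.unlink()
        return True
    return False


def poll_once(signal_dir: Path) -> str:
    """One polling pass on the capture side: 'step', 'finish', or 'idle'."""
    if consume(signal_dir, STEP_FILE):
        return "step"
    if consume(signal_dir, FINISH_FILE):
        return "finish"
    return "idle"


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    events = [poll_once(directory)]   # nothing signaled yet
    signal(directory, STEP_FILE)
    events.append(poll_once(directory))
    signal(directory, FINISH_FILE)
    events.append(poll_once(directory))
```

A real capture process would call something like `poll_once` in a loop with a short sleep. Marker files keep the two processes decoupled without a socket or shared runtime.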

Browserbase provides cloud browser infrastructure via CDP. This bridge lets Browserbase users attach DBAR to their cloud sessions to produce deterministic capsules that can be replayed locally.

Supports two connection modes:
- Browserbase API: session ID + API key resolves CDP URL automatically
- Direct CDP: raw WebSocket URL for manual connection

Follows the same architecture as the browser-use integration (file-based signaling, same capsule format) adapted for Browserbase's REST API.

Test: typecheck passes (npx tsc --noEmit)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `dbar` CLI binary with three commands:
- `dbar replay <capsule> --cost`: replay with cost comparison
- `dbar eval --capsules <dir> --assertions <yaml>`: batch evaluation
- `dbar validate <capsule>`: structural validation

Includes professional 4K demo page (demo/index.html) with animated terminal, cost comparison widget, and integration showcase.

204 tests passing (171 existing + 33 new CLI tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Demo now shows real SHA-256 hashes from books.toscrape.com capture (Chromium 145.0.7632.6), real browser-use cost numbers ($0.19/task from their published blog), real market stats (84.6K stars, $17M, $67.5M), and real divergence detection (DOM mismatch in 669ms).

Added "The Divergence" section showing DBAR catching timestamp changes between runs, demonstrating the core value proposition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Self-contained 90-second auto-playing HTML presentation for screen recording.

7 scenes: title, problem, capture, replay, value props, integrations, CTA. All real data from books.toscrape.com capture session. Pure CSS animation sequencing, JS only for pause/play/fullscreen. Optimized for 1920x1080 Loom recording.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Automated Apple-keynote-quality demo that captures a real DBAR session on books.toscrape.com with:
- Human-like mouse movement (cubic bezier curves)
- Live updating DBAR dashboard side panel
- 7-scene sequence: navigate, browse, view, cart, capsule, replay, CTA
- Real DBAR capture/replay with actual hashes and cost comparison

Run: npx tsx demo/record.ts
Record screen with QuickTime/Loom while it plays.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
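
For readers unfamiliar with the mouse-movement technique mentioned above, here is a small sketch of sampling points along a cubic Bezier curve. The control points and step count are illustrative assumptions. The actual demo/record.ts is TypeScript and may choose them differently.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def mouse_path(start, end, control1, control2, steps=20):
    """Sample steps + 1 points from start to end along the curve."""
    return [
        cubic_bezier(start, control1, control2, end, i / steps)
        for i in range(steps + 1)
    ]


# Control points pulled off the straight line give the path a gentle arc.
path = mouse_path((100, 200), (600, 400), (250, 120), (480, 460))
```

Feeding the sampled points to a mouse-move call one at a time produces a curved trajectory instead of a straight jump.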

Cost claims audit findings:
- OUTPUT_TOKEN_RATIO 0.15 (agents output ~15% of input)
- Replay shows compute cost (~$0.001), not hardcoded $0.00
- "API savings" replaces misleading "100% savings"

Code quality: args.ts dead branch removed, YAML regex fixed, eval.ts url_contains documented, process.exitCode not exit(0).

206 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
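
The arithmetic behind these findings can be sketched as below. OUTPUT_TOKEN_RATIO = 0.15 comes from the commit message, and the $0.10 task cost and ~$0.001 replay compute cost appear elsewhere in this PR. The function names and the per-million-token prices passed in are hypothetical placeholders, not DBAR's real cost model.

```python
OUTPUT_TOKEN_RATIO = 0.15  # from the commit: agents output ~15% of input


def estimate_api_cost(input_tokens, input_price_per_mtok, output_price_per_mtok):
    """Estimate LLM API cost in dollars from input tokens alone (hypothetical helper).

    Output tokens are derived from the input count via OUTPUT_TOKEN_RATIO.
    """
    output_tokens = input_tokens * OUTPUT_TOKEN_RATIO
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


def savings_fraction(original_cost, replay_compute_cost):
    """Fraction of the original cost avoided when replay still costs compute."""
    return (original_cost - replay_compute_cost) / original_cost


# With the PR's figures: a $0.10 task replayed for about $0.001 of compute.
fraction = savings_fraction(0.10, 0.001)
```

With those two figures the fraction comes out at roughly 0.99. That is consistent with reporting "~99%" rather than "100% savings" once replay compute is no longer treated as free.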

Demo claims (index.html, video.html):
- $0.19/task -> $0.10/task (verified from browser-use benchmark blog)
- $19/100 tasks -> $10/100 tasks
- "$0.00 replay" -> "$0 API cost" with compute footnote
- "84,600+" -> "84K+" (less likely to go stale)
- "97% accuracy" -> "97% on Online-Mind2Web" with WebVoyager note
- "0% can prove it" -> "no hash-verified proof"
- "669ms" -> "~670ms"
- Split "$84.5M combined funding" into separate figures

Integration honesty:
- browser-use README rewritten: honest about parallel-capture architecture, CDP Fetch conflict, limitations section
- Browserbase: removed --api-key CLI flag (env-var only)
- Browserbase: CDP URLs masked in log output
- --no-sandbox gated behind DBAR_NO_SANDBOX=1 env var
- PII security notice added to both integration READMEs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
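
The environment-variable gate for --no-sandbox could look something like the following sketch. The variable name DBAR_NO_SANDBOX and the flag come from the commit message. The `chromium_args` helper and its signature are assumptions for illustration.

```python
import os


def chromium_args(env=None):
    """Build extra Chromium launch flags (hypothetical helper).

    --no-sandbox is only added when DBAR_NO_SANDBOX is exactly "1",
    so the sandbox stays on unless the operator opts out explicitly.
    """
    env = os.environ if env is None else env
    args = []
    if env.get("DBAR_NO_SANDBOX") == "1":
        args.append("--no-sandbox")
    return args


default_args = chromium_args({})
opted_out_args = chromium_args({"DBAR_NO_SANDBOX": "1"})
```

Requiring an exact "1" rather than any non-empty value avoids disabling the sandbox by accident, for example when the variable is set to "0" or "false".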

$0.19 -> $0.10, $0.00 -> $0 API, 100% -> ~99%. Console.log statements are intentional CLI progress output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Complete rewrite addressing review findings:

Architecture: Snapshot-only capture (NOT full determinism). Honest about CDP Fetch conflict with browser-use's cdp-use client. DBAR captures DOM/a11y/screenshot snapshots at step boundaries via raw CDP, without interfering with the agent.

API fixes:
- Browser() constructor, not BrowserConfig (dropped in v0.12.5)
- on_step_end hook for step signaling (not file-only)
- browser-use uses cdp-use, not Playwright (documented)
- Removed replay.ts (snapshot-only mode can't replay)

Pinned: browser-use==0.12.5, cdp-use==1.4.5

11 unit tests for capture utilities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Complete rewrite using official Browserbase SDK:
- session.connectUrl for CDP (not hand-rolled REST calls)
- DBAR OWNS the session: full deterministic capture works (virtual time + network recording + state snapshots)
- Secrets via env vars only (BROWSERBASE_API_KEY, PROJECT_ID)
- connectUrl masked in all log output (contains API key)
- Extracted helpers.ts for testable pure functions
- 17 unit tests for arg parsing, URL masking, manifest building

Pinned: @browserbasehq/sdk ^2.6.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
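
URL masking of the kind listed above might be sketched as follows. The real helper lives in helpers.ts and its exact behavior is not shown in this PR. The `mask_url` name, the example URL, and the decision to replace the whole query string and drop any userinfo are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_url(url: str) -> str:
    """Return a log-safe form of a connection URL (hypothetical helper).

    Keeps scheme, host, port and path. Drops userinfo and the fragment,
    and replaces any query string with a fixed placeholder.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    query = "***" if parts.query else ""
    return urlunsplit((parts.scheme, host, parts.path, query, ""))


# Made-up URL for illustration, not a real Browserbase endpoint.
masked = mask_url("wss://connect.example.test/session?apiKey=secret123")
```

Masking the entire query, rather than only a parameter named apiKey, is the more conservative choice because it does not depend on knowing which parameter carries the secret.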

pip install dbar: three lines to record browser-use agent sessions:

    recorder = DBARRecorder(output_dir="./capsules")
    result = await agent.run(on_step_end=recorder.on_step_end)
    capsule = recorder.finish()

Records DOM hashes, screenshot hashes, actions, and results at each step. Zero runtime deps. Capsule.diff() for comparing runs. Sensitive data redaction by default.

33 Python tests. 206 TS tests unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
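
To show the hook pattern without depending on browser-use or the dbar package, here is a self-contained sketch with stand-in classes. `SketchRecorder`, `StubAgent`, and `current_html` are all hypothetical. They are not DBARRecorder or the browser-use Agent, and the real recorder also stores screenshot hashes, actions, and results.

```python
import asyncio
import hashlib


class SketchRecorder:
    """Hypothetical stand-in for a step recorder driven by an on_step_end hook."""

    def __init__(self) -> None:
        self.steps: list[dict] = []

    async def on_step_end(self, agent) -> None:
        # Assumption: the agent exposes the current page HTML via an async method.
        html = await agent.current_html()
        self.steps.append({
            "step": len(self.steps) + 1,
            "dom_hash": hashlib.sha256(html.encode("utf-8")).hexdigest(),
        })

    def finish(self) -> dict:
        return {"steps": list(self.steps)}


class StubAgent:
    """Fake agent that visits a fixed list of pages, calling the hook after each."""

    def __init__(self, pages: list[str]) -> None:
        self._pages = pages
        self._index = 0

    async def current_html(self) -> str:
        return self._pages[self._index]

    async def run(self, on_step_end) -> None:
        for index in range(len(self._pages)):
            self._index = index
            await on_step_end(self)


async def main() -> dict:
    recorder = SketchRecorder()
    agent = StubAgent(["<h1>Home</h1>", "<h1>Cart</h1>"])
    await agent.run(on_step_end=recorder.on_step_end)
    return recorder.finish()


capsule = asyncio.run(main())
```

The point of the pattern is that the recorder never drives the agent. It only observes state at step boundaries, which matches the snapshot-only description of the browser-use integration.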

5 bugs found and fixed via e2e testing against 3 live sites:

1. DOM hash non-deterministic: hash getOuterHTML instead of DOMSnapshot (layout-independent, structural only)
2. Accessibility API removed: multi-strategy fallback (legacy → CDP getFullAXTree → ariaSnapshot)
3. Response bodies garbled: hydrateTranscript() resolves deduplicated body paths back to base64 before replay
4. Virtual time blocks navigation: defer virtualizer start to first step(), add suspend() for between-step nav
5. No multi-step replay: added ReplaySession with step-by-step API (startReplay → step → finish)

Also: added screenshot_mismatch divergence type.

e2e results (demo/e2e-test.ts):
- books.toscrape.com: 3 steps, 68 requests, 100%
- example.com: 1 step, 1 request, 100%
- quotes.toscrape.com: 2 steps, 12 requests, 100%

206 tests passing. Typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
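
The third fix, resolving deduplicated body references back to base64, can be illustrated with the sketch below. The entry shape, the `body_path` and `body_base64` keys, and the in-memory `body_store` are assumptions. The real hydrateTranscript() is TypeScript and reads from the capsule's own storage layout, which this PR does not show.

```python
import base64


def hydrate_transcript(entries, body_store):
    """Replace each body reference with inline base64 (hypothetical data shapes).

    entries:    list of dicts, some carrying a "body_path" reference
    body_store: mapping from reference to the raw response bytes
    """
    hydrated = []
    for entry in entries:
        item = dict(entry)  # copy so the recorded transcript stays untouched
        ref = item.pop("body_path", None)
        if ref is not None:
            item["body_base64"] = base64.b64encode(body_store[ref]).decode("ascii")
        hydrated.append(item)
    return hydrated


# Two requests share one stored body, as deduplication would produce.
store = {"bodies/ab12": b"hello"}
transcript = [
    {"url": "https://example.com/a", "body_path": "bodies/ab12"},
    {"url": "https://example.com/b", "body_path": "bodies/ab12"},
    {"url": "https://example.com/c"},
]
hydrated = hydrate_transcript(transcript, store)
```

Hydrating before replay means the replay path only ever sees inline bodies, so it does not need to know how the capsule deduplicated them on disk.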

The ReplaySession.step() path was fixed but DBAR.replay() still used "dom_mismatch" for screenshot divergences. Both paths now correctly use "screenshot_mismatch".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
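
One way to avoid this kind of drift between two code paths is to route both through a single classifier, sketched below. The type strings "dom_mismatch" and "screenshot_mismatch" come from the commit message. The step dictionaries and the `first_divergence` helper are hypothetical.

```python
def first_divergence(recorded, replayed):
    """Return the first step whose hashes differ, with a typed reason, or None.

    Each step is a dict with "dom_hash" and "screenshot_hash" keys (assumed shape).
    """
    for index, (rec, rep) in enumerate(zip(recorded, replayed)):
        if rec["dom_hash"] != rep["dom_hash"]:
            return {"step": index, "type": "dom_mismatch"}
        if rec["screenshot_hash"] != rep["screenshot_hash"]:
            return {"step": index, "type": "screenshot_mismatch"}
    return None


recorded = [
    {"dom_hash": "d1", "screenshot_hash": "s1"},
    {"dom_hash": "d2", "screenshot_hash": "s2"},
]
replayed = [
    {"dom_hash": "d1", "screenshot_hash": "s1"},
    {"dom_hash": "d2", "screenshot_hash": "CHANGED"},
]
divergence = first_divergence(recorded, replayed)
```

If both the step-by-step path and the one-shot replay path call the same function, the divergence type cannot be labeled differently in one of them.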

Removed:
- demo/index.html (static landing page with hardcoded data)
- demo/video.html (CSS animation presentation)

Kept:
- demo/e2e-test.ts (real capture+replay on 3 live sites)
- demo/record.ts (real Chrome recording with DBAR capture)
- demo/dashboard.html (live status panel for record.ts)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- package.json: 0.1.0 → 0.2.0
- python/pyproject.toml: 0.1.0 → 0.2.0
- python/dbar/_version.py: 0.1.0 → 0.2.0
- release.yml: added pypi job (hatchling build + twine upload) using secrets.PYPI_TOKEN

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…side-package publish

This narrows the public release surface before the follow-up changeset work. The workflow now blocks tracked operator docs/state files, the README banner uses a hosted asset that renders on npm, and integration packages are marked non-publishable with an explicit prepublish failure.

Constraint: Keep this as a stacked prep PR on top of fix/e2e-replay without bundling unrelated dirty-worktree edits
Rejected: Commit the cleanup from the active dirty worktree directly | risked scooping unrelated uncommitted changes into the PR
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep integration packages private unless they gain a deliberate published-package contract and files allowlist
Tested: Root npm ci/build/typecheck/test, browser-use npm ci/typecheck/test, root npm pack --dry-run, integration npm publish --dry-run failure paths
Not-tested: Browserbase npm ci on this base branch; package-lock.json is out of sync before this prep change

This locks the browser-use and Browserbase integration surfaces to the versions actually verified in this branch and makes those lanes visible in the public README. The browser-use Python extra now reflects its real Python 3.11+ constraint, both private integration packages point at the local repo package, and the docs state the exact versions users should expect.

Constraint: Land on fix/e2e-replay without pulling unrelated dirty-worktree edits from the active checkout
Rejected: Push the active checkout directly | risked bundling unrelated in-progress code and generated state
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep integration docs aligned with exact tested versions; if you loosen pins later, update lockfiles and wording from pinned to supported in the same change
Tested: Root npm ci/build/typecheck/test, browser-use npm ci/typecheck/test, browserbase npm ci/test, Python 3.12 uv install of ./python[browser-use,dev], python/tests
Not-tested: Python 3.11 runtime specifically on this machine; browser-use extra was validated on Python 3.12 because that interpreter was available

This replaces the tag-triggered publish path with a Changesets-driven release workflow on main. The workflow now creates or updates a release PR when changesets are present and only publishes npm plus PyPI after that version PR has landed. A small Python version sync step keeps python/pyproject.toml in lockstep with the package version, and the npm publish helper no-ops when the current version is already on npm so normal main pushes do not republish.

Constraint: Keep release automation aligned with the existing single-package repo shape and the Python package version lockstep
Rejected: Keep tag-triggered publishing and add a main-branch ancestry check | still allowed branch-side release control and skipped the requested release-PR workflow
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Future releases should merge via the Changesets-generated version PR on main; do not push release tags by hand as the primary release path
Tested: Root npm run release:verify, npm run release:publish:npm no-op on already published 0.2.0, browser-use npm ci/typecheck/test, browserbase npm ci/test, Python 3.12 uv install of ./python[dev] and python/tests
Not-tested: End-to-end GitHub changesets/action execution in Actions; behavior is inferred from the official changesets/action contract and local script verification
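
The two release helpers described above, version sync and the publish no-op, could be sketched like this. The repository's real scripts are not shown in the PR, so the function names, the regex-based rewrite, and passing the published versions in as a set are all assumptions.

```python
import json
import re


def sync_pyproject_version(package_json_text: str, pyproject_text: str) -> str:
    """Copy the "version" from package.json text into pyproject.toml text."""
    version = json.loads(package_json_text)["version"]
    updated, count = re.subn(
        r'(?m)^version\s*=\s*"[^"]*"',
        f'version = "{version}"',
        pyproject_text,
        count=1,
    )
    if count != 1:
        raise ValueError("no version line found in pyproject.toml text")
    return updated


def should_publish(current_version: str, published_versions: set) -> bool:
    """Publish only when the current version is not already on the registry."""
    return current_version not in published_versions


package_json = '{"name": "dbar", "version": "0.2.0"}'
pyproject = '[project]\nname = "dbar"\nversion = "0.1.0"\n'
synced = sync_pyproject_version(package_json, pyproject)
```

Making the publish step a no-op for already published versions is what lets the workflow run on every push to main without attempting to republish.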

pyyush and others added 19 commits March 26, 2026 16:31

fix(demo): update record.ts cost claims to verified figures

f7ed7df


fix: screenshot_mismatch type in auto-replay path

24fe16f


pyyush mentioned this pull request Apr 2, 2026

chore(release): lock down public release hygiene before changeset PR #2

Closed

pyyush merged commit daa7ba8 into main Apr 2, 2026
2 checks passed

pyyush deleted the fix/e2e-replay branch April 2, 2026 21:20

pyyush merged 19 commits into main from fix/e2e-replay
