v0.2.0: CLI, Python package, working e2e replay #1
Merged
DBAR can now record and replay browser-use agent sessions. The bridge connects to browser-use's running Playwright browser via CDP and wraps the page with DBAR capture. File-based signaling (.dbar-step, .dbar-finish) coordinates the Python agent process with the Node.js capture process.

Includes capture.ts, replay.ts, example.py, and supporting config.

Test: tsc --noEmit passes with zero errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
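The file-based signaling described above can be sketched on the Python side roughly as follows. Only the file names `.dbar-step` and `.dbar-finish` come from the commit; the JSON payload, the directory argument, and the helper names `write_signal` and `consume_signal` are assumptions for illustration, not the bridge's actual code.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

STEP_SIGNAL = ".dbar-step"      # file name taken from the commit message
FINISH_SIGNAL = ".dbar-finish"  # file name taken from the commit message


def write_signal(signal_dir: Path, name: str, payload: dict) -> Path:
    """Write a signal file atomically so a watcher never reads a partial file."""
    signal_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=signal_dir, prefix=name + ".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    target = signal_dir / name
    os.replace(tmp_path, target)
    return target


def consume_signal(signal_dir: Path, name: str) -> Optional[dict]:
    """Return the payload and delete the file if the signal exists, else None."""
    target = signal_dir / name
    try:
        payload = json.loads(target.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None
    target.unlink()
    return payload
```

In this sketch the agent process would call `write_signal(..., STEP_SIGNAL, ...)` after each step, and the capture process would poll `consume_signal` in a loop until it sees `FINISH_SIGNAL`.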
Browserbase provides cloud browser infrastructure via CDP. This bridge lets Browserbase users attach DBAR to their cloud sessions to produce deterministic capsules that can be replayed locally.

Supports two connection modes:
- Browserbase API: session ID + API key resolves CDP URL automatically
- Direct CDP: raw WebSocket URL for manual connection

Follows the same architecture as the browser-use integration (file-based signaling, same capsule format) adapted for Browserbase's REST API.

Test: typecheck passes (npx tsc --noEmit)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `dbar` CLI binary with three commands:
- `dbar replay <capsule> --cost`: replay with cost comparison
- `dbar eval --capsules <dir> --assertions <yaml>`: batch evaluation
- `dbar validate <capsule>`: structural validation

Includes professional 4K demo page (demo/index.html) with animated terminal, cost comparison widget, and integration showcase.

204 tests passing (171 existing + 33 new CLI tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Demo now shows real SHA-256 hashes from books.toscrape.com capture (Chromium 145.0.7632.6), real browser-use cost numbers ($0.19/task from their published blog), real market stats (84.6K stars, $17M, $67.5M), and real divergence detection (DOM mismatch in 669ms).

Added "The Divergence" section showing DBAR catching timestamp changes between runs, demonstrating the core value proposition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Self-contained 90-second auto-playing HTML presentation for screen recording.

7 scenes: title, problem, capture, replay, value props, integrations, CTA. All real data from books.toscrape.com capture session. Pure CSS animation sequencing, JS only for pause/play/fullscreen. Optimized for 1920x1080 Loom recording.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automated Apple-keynote-quality demo that captures a real DBAR session on books.toscrape.com with:
- Human-like mouse movement (cubic bezier curves)
- Live updating DBAR dashboard side panel
- 7-scene sequence: navigate, browse, view, cart, capsule, replay, CTA
- Real DBAR capture/replay with actual hashes and cost comparison

Run: npx tsx demo/record.ts
Record screen with QuickTime/Loom while it plays.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cost claims audit findings:
- OUTPUT_TOKEN_RATIO 0.15 (agents output ~15% of input)
- Replay shows compute cost (~$0.001), not hardcoded $0.00
- "API savings" replaces misleading "100% savings"

Code quality: args.ts dead branch removed, YAML regex fixed, eval.ts url_contains documented, process.exitCode not exit(0).

206 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
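As a rough illustration of the cost model in this audit: output tokens are estimated as 15% of input tokens, and savings are computed against a small non-zero replay compute cost rather than $0.00. The ratio is from the commit; the function names and the per-million-token price parameters are assumptions for illustration, not the CLI's actual code.

```python
OUTPUT_TOKEN_RATIO = 0.15  # from the commit: agents output ~15% of input


def estimate_api_cost(
    input_tokens: int,
    input_price_per_mtok: float,   # hypothetical parameter: USD per 1M input tokens
    output_price_per_mtok: float,  # hypothetical parameter: USD per 1M output tokens
) -> float:
    """Estimate the LLM API cost of one run from its input token count."""
    output_tokens = input_tokens * OUTPUT_TOKEN_RATIO
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


def api_savings_percent(original_cost: float, replay_compute_cost: float) -> float:
    """Percentage saved by replaying, counting the replay's own compute cost."""
    if original_cost <= 0:
        return 0.0
    return (1 - replay_compute_cost / original_cost) * 100
```

With the figures quoted in this PR ($0.10 per task, roughly $0.001 of replay compute), `api_savings_percent(0.10, 0.001)` comes out at about 99, which is why the demo copy says "~99%" instead of "100%".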
Demo claims (index.html, video.html):
- $0.19/task -> $0.10/task (verified from browser-use benchmark blog)
- $19/100 tasks -> $10/100 tasks
- "$0.00 replay" -> "$0 API cost" with compute footnote
- "84,600+" -> "84K+" (less likely to go stale)
- "97% accuracy" -> "97% on Online-Mind2Web" with WebVoyager note
- "0% can prove it" -> "no hash-verified proof"
- "669ms" -> "~670ms"
- Split "$84.5M combined funding" into separate figures

Integration honesty:
- browser-use README rewritten: honest about parallel-capture architecture, CDP Fetch conflict, limitations section
- Browserbase: removed --api-key CLI flag (env-var only)
- Browserbase: CDP URLs masked in log output
- --no-sandbox gated behind DBAR_NO_SANDBOX=1 env var
- PII security notice added to both integration READMEs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
$0.19 -> $0.10, $0.00 -> $0 API, 100% -> ~99%. Console.log statements are intentional CLI progress output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete rewrite addressing review findings:

Architecture: Snapshot-only capture (NOT full determinism). Honest about CDP Fetch conflict with browser-use's cdp-use client. DBAR captures DOM/a11y/screenshot snapshots at step boundaries via raw CDP, without interfering with the agent.

API fixes:
- Browser() constructor, not BrowserConfig (dropped in v0.12.5)
- on_step_end hook for step signaling (not file-only)
- browser-use uses cdp-use, not Playwright (documented)
- Removed replay.ts (snapshot-only mode can't replay)

Pinned: browser-use==0.12.5, cdp-use==1.4.5

11 unit tests for capture utilities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete rewrite using official Browserbase SDK:
- session.connectUrl for CDP (not hand-rolled REST calls)
- DBAR OWNS the session: full deterministic capture works (virtual time + network recording + state snapshots)
- Secrets via env vars only (BROWSERBASE_API_KEY, PROJECT_ID)
- connectUrl masked in all log output (contains API key)
- Extracted helpers.ts for testable pure functions
- 17 unit tests for arg parsing, URL masking, manifest building

Pinned: @browserbasehq/sdk ^2.6.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
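The URL masking mentioned above could look something like the sketch below. It assumes the secret travels in the query string or userinfo part of the WebSocket URL; the helper name `mask_connect_url` and the example URL are hypothetical, and the real helper lives in the TypeScript `helpers.ts`, which is not reproduced here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def mask_connect_url(url: str) -> str:
    """Return a log-safe copy of a CDP URL with credentials replaced by ***."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    if parts.username or parts.password:
        host = "***@" + host  # hide user:password if present
    # Keep parameter names (useful for debugging) but hide every value.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    masked_query = urlencode([(key, "***") for key, _ in pairs], safe="*")
    # Drop the fragment entirely; it is never needed in logs.
    return urlunsplit((parts.scheme, host, parts.path, masked_query, ""))
```

Masking every query value, not only a known key name, is a deliberately conservative choice here: a new secret-bearing parameter would be hidden without a code change.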
pip install dbar, then three lines to record browser-use agent sessions:

    recorder = DBARRecorder(output_dir="./capsules")
    result = await agent.run(on_step_end=recorder.on_step_end)
    capsule = recorder.finish()

Records DOM hashes, screenshot hashes, actions, and results at each step. Zero runtime deps. Capsule.diff() for comparing runs. Sensitive data redaction by default.

33 Python tests. 206 TS tests unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
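To illustrate what comparing two recorded runs step by step might involve, here is a minimal sketch. `StepRecord` and `diff_steps` are hypothetical stand-ins written for this example; they are not the `dbar` package's `Capsule.diff()` implementation, whose exact return shape is not shown in this PR.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class StepRecord:
    """Hypothetical per-step record: the fields the commit says are captured."""
    dom_hash: str
    screenshot_hash: str
    action: str


def diff_steps(baseline: Sequence[StepRecord], candidate: Sequence[StepRecord]) -> List[str]:
    """List human-readable differences between two runs, step by step."""
    differences: List[str] = []
    for index, (old, new) in enumerate(zip(baseline, candidate)):
        for field in ("dom_hash", "screenshot_hash", "action"):
            if getattr(old, field) != getattr(new, field):
                differences.append(f"step {index}: {field} differs")
    if len(baseline) != len(candidate):
        differences.append(f"step count differs: {len(baseline)} vs {len(candidate)}")
    return differences
```

An empty list means the two runs matched on every recorded hash and action.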
5 bugs found and fixed via e2e testing against 3 live sites:

1. DOM hash non-deterministic: hash getOuterHTML instead of DOMSnapshot (layout-independent, structural only)
2. Accessibility API removed: multi-strategy fallback (legacy → CDP getFullAXTree → ariaSnapshot)
3. Response bodies garbled: hydrateTranscript() resolves deduplicated body paths back to base64 before replay
4. Virtual time blocks navigation: defer virtualizer start to first step(), add suspend() for between-step nav
5. No multi-step replay: added ReplaySession with step-by-step API (startReplay → step → finish)

Also: added screenshot_mismatch divergence type.

e2e results (demo/e2e-test.ts):
- books.toscrape.com: 3 steps, 68 requests, 100%
- example.com: 1 step, 1 request, 100%
- quotes.toscrape.com: 2 steps, 12 requests, 100%

206 tests passing. Typecheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
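Fix 3 is the least obvious of these, so here is a sketch of the idea: response bodies are stored once on disk and referenced by path, and must be turned back into base64 before replay. The field names `bodyPath` and `bodyBase64`, the on-disk layout, and the Python function name are all assumptions for illustration; the real `hydrateTranscript()` is TypeScript and may differ.

```python
import base64
from pathlib import Path
from typing import Any, Dict, List


def hydrate_transcript(entries: List[Dict[str, Any]], bodies_dir: Path) -> List[Dict[str, Any]]:
    """Replace deduplicated body references with inline base64 bodies."""
    hydrated: List[Dict[str, Any]] = []
    for entry in entries:
        copy = dict(entry)  # leave the caller's transcript untouched
        body_path = copy.pop("bodyPath", None)  # hypothetical field name
        if body_path is not None:
            raw = (bodies_dir / body_path).read_bytes()
            copy["bodyBase64"] = base64.b64encode(raw).decode("ascii")
        hydrated.append(copy)
    return hydrated
```

Reading raw bytes and encoding them, without decoding as text first, is what keeps binary responses such as images from coming out garbled.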
The ReplaySession.step() path was fixed but DBAR.replay() still used "dom_mismatch" for screenshot divergences. Both paths now correctly use "screenshot_mismatch".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
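The distinction this fix restores can be sketched as a single shared classifier that both replay paths would call. The divergence type strings are from the commits; the function names and the order of checks (DOM first, then screenshot) are assumptions for illustration.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, as used for DOM and screenshot hashes."""
    return hashlib.sha256(data).hexdigest()


def classify_divergence(
    expected_dom: str,
    actual_dom: str,
    expected_screenshot: str,
    actual_screenshot: str,
) -> Optional[str]:
    """Return the divergence type for one step, or None if the step matches."""
    if expected_dom != actual_dom:
        return "dom_mismatch"
    if expected_screenshot != actual_screenshot:
        return "screenshot_mismatch"  # previously misreported as dom_mismatch
    return None
```

Routing both `ReplaySession.step()` and `DBAR.replay()` through one function like this is one way to keep the two paths from drifting apart again.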
Removed:
- demo/index.html (static landing page with hardcoded data)
- demo/video.html (CSS animation presentation)

Kept:
- demo/e2e-test.ts (real capture+replay on 3 live sites)
- demo/record.ts (real Chrome recording with DBAR capture)
- demo/dashboard.html (live status panel for record.ts)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- package.json: 0.1.0 → 0.2.0
- python/pyproject.toml: 0.1.0 → 0.2.0
- python/dbar/_version.py: 0.1.0 → 0.2.0
- release.yml: added pypi job (hatchling build + twine upload) using secrets.PYPI_TOKEN

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…side-package publish

This narrows the public release surface before the follow-up changeset work. The workflow now blocks tracked operator docs/state files, the README banner uses a hosted asset that renders on npm, and integration packages are marked non-publishable with an explicit prepublish failure.

Constraint: Keep this as a stacked prep PR on top of fix/e2e-replay without bundling unrelated dirty-worktree edits
Rejected: Commit the cleanup from the active dirty worktree directly | risked scooping unrelated uncommitted changes into the PR
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep integration packages private unless they gain a deliberate published-package contract and files allowlist
Tested: Root npm ci/build/typecheck/test, browser-use npm ci/typecheck/test, root npm pack --dry-run, integration npm publish --dry-run failure paths
Not-tested: Browserbase npm ci on this base branch; package-lock.json is out of sync before this prep change
This locks the browser-use and Browserbase integration surfaces to the versions actually verified in this branch and makes those lanes visible in the public README. The browser-use Python extra now reflects its real Python 3.11+ constraint, both private integration packages point at the local repo package, and the docs state the exact versions users should expect.

Constraint: Land on fix/e2e-replay without pulling unrelated dirty-worktree edits from the active checkout
Rejected: Push the active checkout directly | risked bundling unrelated in-progress code and generated state
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep integration docs aligned with exact tested versions; if you loosen pins later, update lockfiles and wording from pinned to supported in the same change
Tested: Root npm ci/build/typecheck/test, browser-use npm ci/typecheck/test, browserbase npm ci/test, Python 3.12 uv install of ./python[browser-use,dev], python/tests
Not-tested: Python 3.11 runtime specifically on this machine; browser-use extra was validated on Python 3.12 because that interpreter was available
This replaces the tag-triggered publish path with a Changesets-driven release workflow on main. The workflow now creates or updates a release PR when changesets are present and only publishes npm plus PyPI after that version PR has landed. A small Python version sync step keeps python/pyproject.toml in lockstep with the package version, and the npm publish helper no-ops when the current version is already on npm so normal main pushes do not republish.

Constraint: Keep release automation aligned with the existing single-package repo shape and the Python package version lockstep
Rejected: Keep tag-triggered publishing and add a main-branch ancestry check | still allowed branch-side release control and skipped the requested release-PR workflow
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Future releases should merge via the Changesets-generated version PR on main; do not push release tags by hand as the primary release path
Tested: Root npm run release:verify, npm run release:publish:npm no-op on already published 0.2.0, browser-use npm ci/typecheck/test, browserbase npm ci/test, Python 3.12 uv install of ./python[dev] and python/tests
Not-tested: End-to-end GitHub changesets/action execution in Actions; behavior is inferred from the official changesets/action contract and local script verification
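The version sync step described in this commit can be approximated as below. This is a sketch under assumptions: it takes the file contents as strings, assumes the first `version = "..."` line in the pyproject text is the project version, and uses a made-up function name; the repository's actual sync script is not shown in this PR.

```python
import json
import re

# Matches a TOML line such as: version = "0.1.0"
VERSION_LINE = re.compile(r'^version\s*=\s*"[^"]*"', re.MULTILINE)


def sync_pyproject_version(package_json_text: str, pyproject_text: str) -> str:
    """Return pyproject text whose version matches the one in package.json."""
    version = json.loads(package_json_text)["version"]
    # A callable replacement avoids any backslash interpretation in the version.
    updated, count = VERSION_LINE.subn(
        lambda _match: f'version = "{version}"', pyproject_text, count=1
    )
    if count != 1:
        raise ValueError("no version line found in pyproject text")
    return updated
```

Failing loudly when no version line is found keeps a malformed pyproject from silently shipping with a stale version.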
Summary
- `dbar replay --cost`, `dbar eval`, `dbar validate`: honest cost model with verified figures
- `pip install dbar`: native browser-use v0.12.5 integration via `on_step_end` hooks, zero runtime deps, 33 tests

E2E Results

Core Fixes
- `getOuterHTML` instead of DOMSnapshot (layout-independent)
- `hydrateTranscript()` resolves deduplicated paths to base64
- Defer virtualizer start to first `step()`, add `suspend()`
- `ReplaySession` with step-by-step API

Test plan
- `demo/e2e-test.ts`: 3 real websites at 100% replay

🤖 Generated with Claude Code