4 changes: 2 additions & 2 deletions AGENTS.md
@@ -4,7 +4,7 @@
- `src/` is the source of truth. `agent/` orchestrates the runtime loop (`tools/agent.ts`), with `shared/` housing DOM capture/runtime utilities and element finding, `examine-dom/` powering `page.perform` (and deprecated `page.aiAction`), `messages/` building prompts, `mcp/` hosting the MCP client, and `error.ts` centralizing agent errors; `debug/options.ts` controls low-level tracing.
- `cdp/` wraps Chrome DevTools (client lifecycle, frame graph/context tracking, element resolution, bounding boxes, and action dispatch). `performAction` prefers these CDP paths and falls back to Playwright helpers when disabled.
- `context-providers/a11y-dom/` is the single DOM provider (accessibility tree + encoded IDs, optional bounding boxes/visual overlays, DOM snapshot cache, streaming). Keep overlay generation (`visual-overlay.ts`) and ID maps (`build-maps.ts`, `dom-cache.ts`) aligned when changing extraction logic. Shared overlay/screenshot helpers live in `context-providers/shared/`.
- `browser-providers/` implement `LocalBrowserProvider` (Playwright `chromium` channel "chrome" with stealth flags) and `HyperbrowserProvider`; extend these instead of launching browsers directly. Base class is in `types/browser-providers/types.ts`.
- `browser-providers/` implement `LocalBrowserProvider` (Playwright `chromium` channel "chrome" with stealth flags), `HyperbrowserProvider`, and `RemoteChromeProvider`; extend these instead of launching browsers directly. Base class is in `types/browser-providers/types.ts`.
- `llm/` houses native adapters (`openai`, `anthropic`, `gemini`, `deepseek`) plus schema/message converters—use `createLLMClient` and update `providers/index.ts` for new backends.
- `types/` centralizes config, browser provider, agent state/action definitions (including `cdpActions`, visual/streaming/cache flags). Add interfaces here before wiring features elsewhere.
- `utils/` collects shared helpers (`ErrorEmitter`, retry/sleep, markdown conversion, DOM settle logic); prefer reuse over reimplementation.
@@ -18,7 +18,7 @@
- `yarn build` wipes `dist/`, runs `tsc` + `tsc-alias`, and restores executable bits on `dist/cli/index.js` and `cli.sh`; run before publishing or cutting releases.
- `yarn lint` / `yarn format` use the flat ESLint config (`eslint.config.mjs`) and Prettier over `src/**/*.ts`; fix warnings instead of suppressing rules.
- `yarn test` launches Jest; set `CI=true` for coverage and deterministic snapshots.
- `yarn cli -c "..." [--debug --hyperbrowser --mcp <path>]` runs the agent; `--hyperbrowser` switches to the remote provider and `--debug` drops artifacts into `debug/`.
- `yarn cli -c "..." [--debug --hyperbrowser --remote-chrome <wsEndpoint> --mcp <path>]` runs the agent; `--hyperbrowser` switches to the Hyperbrowser provider, `--remote-chrome <wsEndpoint>` connects to a remote Chrome instance, and `--debug` drops artifacts into `debug/`.
- `yarn example <path>` (backed by `ts-node -r tsconfig-paths/register`) is the quickest way to execute flows in `examples/` or `scripts/`.
- DOM metadata builds at runtime via the a11y provider; the legacy `build-dom-tree-script` entry points at a removed file—avoid relying on it until refreshed.
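
The flag handling described above can be sketched as a small pure function. This is a minimal sketch, assuming the precedence implied by the ternary in `src/cli/index.ts` (`--hyperbrowser` is checked first, then `--remote-chrome`); `ProviderFlags` and `resolveProvider` are hypothetical names used only for illustration.

```typescript
type BrowserProviders = "Local" | "Hyperbrowser" | "RemoteChrome";

// Hypothetical shape of the parsed CLI options; only the provider flags matter here.
interface ProviderFlags {
  hyperbrowser?: boolean;
  remoteChrome?: string;
}

// Mirrors the order of the CLI ternary: Hyperbrowser first, then RemoteChrome, else Local.
function resolveProvider(flags: ProviderFlags): BrowserProviders {
  if (flags.hyperbrowser) return "Hyperbrowser";
  if (flags.remoteChrome) return "RemoteChrome";
  return "Local";
}
```

Under this reading, passing both flags selects Hyperbrowser and the WebSocket endpoint goes unused, so callers may want to pass only one of the two.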

32 changes: 29 additions & 3 deletions README.md
@@ -59,12 +59,13 @@ $ npx @hyperbrowser/agent -c "Find a route from Miami to New Orleans, and provid
<img src="assets/flight-schedule.gif" alt="Hyperagent Demo"/>
</p>

The CLI supports options for debugging or using hyperbrowser instead of a local browser
The CLI supports options for debugging or using different browser providers

```bash
-d, --debug Enable debug mode
-c, --command <task description> Command to run
--hyperbrowser Use Hyperbrowser for the browser provider
--remote-chrome <wsEndpoint> Use Remote Chrome for the browser provider with specified WebSocket endpoint
```

### Library
@@ -186,9 +187,12 @@ const products = await page.extract(
);
```

## ☁️ Cloud
## ☁️ Cloud & Remote Browsers

You can scale HyperAgent with cloud headless browsers using Hyperbrowser
You can scale HyperAgent with different browser providers:

### Hyperbrowser Provider
Scale with cloud headless browsers using Hyperbrowser

1. Get a free api key from [Hyperbrowser](https://app.hyperbrowser.ai/)
2. Add it to your env as `HYPERBROWSER_API_KEY`
@@ -207,6 +211,28 @@ console.log(response);
await agent.closeAgent();
```

### Remote Chrome Provider
Connect to a remote Chrome instance using a WebSocket endpoint

```typescript
const agent = new HyperAgent({
  browserProvider: "RemoteChrome",
  remoteChromeConfig: {
    wsEndpoint: "ws://your-remote-chrome-instance:9222/devtools/browser/...",
    browserConfig: {
      // Additional Playwright connect options
    }
  }
});

const response = await agent.executeTask(
"Go to hackernews, and list me the 5 most recent article titles"
);

console.log(response);
await agent.closeAgent();
```
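
Because the constructor in this change falls back to `{ wsEndpoint: "" }` when `remoteChromeConfig` is omitted, it may help to check the endpoint before building the agent. The following is a minimal sketch, not part of HyperAgent: `isCdpEndpoint` and `requireCdpEndpoint` are hypothetical helpers, and the accepted schemes assume that Playwright's `connectOverCDP` takes either a WebSocket or an HTTP endpoint URL.

```typescript
// Hypothetical helpers, not part of HyperAgent.
// Accepts ws://, wss://, http://, or https:// followed by a non-empty host.
const ENDPOINT_PATTERN = /^(wss?|https?):\/\/[^\s/]+/i;

function isCdpEndpoint(endpoint: string | undefined): endpoint is string {
  return typeof endpoint === "string" && ENDPOINT_PATTERN.test(endpoint.trim());
}

// Returns the trimmed endpoint, or throws before any connection attempt is made.
function requireCdpEndpoint(endpoint: string | undefined): string {
  if (!isCdpEndpoint(endpoint)) {
    throw new Error(
      `Expected a ws://, wss://, http://, or https:// endpoint, got: ${JSON.stringify(endpoint)}`
    );
  }
  return endpoint.trim();
}
```

The returned string could then be passed as `remoteChromeConfig.wsEndpoint`, so a missing or malformed value fails with a readable message instead of a connection error.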

## Usage Guide

### Multi-Page Management
9 changes: 7 additions & 2 deletions currentState.md
@@ -374,12 +374,17 @@ text = text.replace(`<<${variable.key}>>`, variable.value);
- Cloud-based browser service
- Remote CDP connection

3. **RemoteChromeProvider**
- Connects to a remote Chrome instance via WebSocket endpoint
- Allows connection to existing Chrome instances with custom CDP endpoints

**Selection:** ([index.ts:85-94](src/agent/index.ts#L85-L94))
```typescript
new HyperAgent({
browserProvider: "Local" | "Hyperbrowser",
browserProvider: "Local" | "Hyperbrowser" | "RemoteChrome",
localConfig: { ... },
hyperbrowserConfig: { ... }
hyperbrowserConfig: { ... },
remoteChromeConfig: { ... }
})
```
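
The way the provider string selects a concrete provider type can be sketched with stub classes. This is a minimal sketch under stated assumptions: the `*Stub` classes, `ProviderFor`, and `createProvider` are hypothetical stand-ins for the real providers and constructor logic, and the early throw on a missing endpoint is an illustrative choice rather than what the change itself does.

```typescript
type BrowserProviders = "Local" | "Hyperbrowser" | "RemoteChrome";

// Stub providers (hypothetical); the real classes live in src/browser-providers/.
class LocalStub {
  readonly kind = "Local" as const;
}
class HyperbrowserStub {
  readonly kind = "Hyperbrowser" as const;
}
class RemoteChromeStub {
  readonly kind = "RemoteChrome" as const;
  readonly wsEndpoint: string;
  constructor(wsEndpoint: string) {
    this.wsEndpoint = wsEndpoint;
  }
}

// Nested conditional type: each provider name maps to one provider class.
type ProviderFor<T extends BrowserProviders> = T extends "Hyperbrowser"
  ? HyperbrowserStub
  : T extends "RemoteChrome"
    ? RemoteChromeStub
    : LocalStub;

function createProvider<T extends BrowserProviders>(
  type: T,
  wsEndpoint?: string
): ProviderFor<T> {
  let provider: LocalStub | HyperbrowserStub | RemoteChromeStub;
  if (type === "Hyperbrowser") {
    provider = new HyperbrowserStub();
  } else if (type === "RemoteChrome") {
    // Assumption for this sketch: fail fast instead of defaulting to an empty endpoint.
    if (!wsEndpoint) throw new Error("RemoteChrome requires a wsEndpoint");
    provider = new RemoteChromeStub(wsEndpoint);
  } else {
    provider = new LocalStub();
  }
  // The compiler cannot narrow T from the runtime checks, so a cast is needed.
  return provider as unknown as ProviderFor<T>;
}
```

With this mapping, `createProvider("RemoteChrome", "ws://localhost:9222").wsEndpoint` type-checks, while the same property access on the `"Local"` result is a compile-time error.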

21 changes: 17 additions & 4 deletions src/agent/index.ts
@@ -32,6 +32,7 @@ import {
HyperbrowserProvider,
LocalBrowserProvider,
} from "../browser-providers";
import { RemoteChromeProvider } from "../browser-providers/remote-chrome";
import { HyperagentError } from "./error";
import { findElementWithInstruction } from "./shared/find-element";
import {
@@ -80,7 +81,9 @@ export class HyperAgent<T extends BrowserProviders = "Local"> {
private mcpClient: MCPClient | undefined;
private browserProvider: T extends "Hyperbrowser"
? HyperbrowserProvider
: LocalBrowserProvider;
: T extends "RemoteChrome"
? RemoteChromeProvider
: LocalBrowserProvider;
private browserProviderType: T;
private actions: Array<AgentActionDefinition> = [...DEFAULT_ACTIONS];
private cdpActionsEnabled: boolean;
@@ -132,8 +135,17 @@ export class HyperAgent<T extends BrowserProviders = "Local"> {
...(params.hyperbrowserConfig ?? {}),
debug: params.debug,
})
: new LocalBrowserProvider(params.localConfig)
) as T extends "Hyperbrowser" ? HyperbrowserProvider : LocalBrowserProvider;
: this.browserProviderType === "RemoteChrome"
? new RemoteChromeProvider(
params.remoteChromeConfig ?? { wsEndpoint: "" },
params.debug ?? false
)
: new LocalBrowserProvider(params.localConfig)
) as T extends "Hyperbrowser"
? HyperbrowserProvider
: T extends "RemoteChrome"
? RemoteChromeProvider
: LocalBrowserProvider;

if (params.customActions) {
params.customActions.forEach(this.registerAction, this);
@@ -152,7 +164,8 @@ export class HyperAgent<T extends BrowserProviders = "Local"> {
if (!this.browser) {
this.browser = await this.browserProvider.start();
if (
this.browserProviderType === "Hyperbrowser" &&
(this.browserProviderType === "Hyperbrowser" ||
this.browserProviderType === "RemoteChrome") &&
this.browser.contexts().length > 0
) {
this.context = this.browser.contexts()[0];
3 changes: 2 additions & 1 deletion src/browser-providers/index.ts
@@ -1,4 +1,5 @@
import { HyperbrowserProvider } from "./hyperbrowser";
import { LocalBrowserProvider } from "./local";
import { RemoteChromeProvider } from "./remote-chrome";

export { HyperbrowserProvider, LocalBrowserProvider };
export { HyperbrowserProvider, LocalBrowserProvider, RemoteChromeProvider };
48 changes: 48 additions & 0 deletions src/browser-providers/remote-chrome.ts
@@ -0,0 +1,48 @@
import { chromium, Browser, ConnectOverCDPOptions } from "playwright-core";
import BrowserProvider from "@/types/browser-providers/types";

export interface RemoteChromeConfig {
  wsEndpoint: string;
  browserConfig?: Omit<ConnectOverCDPOptions, "endpointURL">;
}

export class RemoteChromeProvider extends BrowserProvider<Browser> {
  config: RemoteChromeConfig;
  session: Browser | undefined;
  debug: boolean;

  constructor(config: RemoteChromeConfig, debug: boolean = false) {
    super();
    this.config = config;
    this.debug = debug;
    this.session = undefined;
  }

  async start(): Promise<Browser> {
    if (this.debug) {
      console.log("\nConnecting to remote Chrome:", this.config.wsEndpoint, "\n");
    }

    this.session = await chromium.connectOverCDP(
      this.config.wsEndpoint,
      this.config.browserConfig
    );

    if (this.debug) {
      console.log("\nConnected to remote Chrome:", this.config.wsEndpoint, "\n");
    }

    return this.session;
  }

  async close(): Promise<void> {
    await this.session?.close();
  }

  public getSession() {
    if (!this.session) {
      return null;
    }
    return this.session;
  }
}
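
The `start` / `close` / `getSession` lifecycle above can be exercised without a real browser by injecting the connect step. This is a minimal sketch, assuming a test double is acceptable: `BrowserLike`, `FakeRemoteProvider`, and `demoLifecycle` are hypothetical names, and the injected function stands in for `chromium.connectOverCDP`.

```typescript
// Minimal stand-in for Playwright's Browser; only close() is needed for this sketch.
interface BrowserLike {
  close(): Promise<void>;
}

type Connect = (endpoint: string) => Promise<BrowserLike>;

// Hypothetical test double mirroring the provider's three methods.
class FakeRemoteProvider {
  private session: BrowserLike | undefined;
  private readonly wsEndpoint: string;
  private readonly connect: Connect;

  constructor(wsEndpoint: string, connect: Connect) {
    this.wsEndpoint = wsEndpoint;
    this.connect = connect;
  }

  async start(): Promise<BrowserLike> {
    this.session = await this.connect(this.wsEndpoint);
    return this.session;
  }

  async close(): Promise<void> {
    // Like the provider above, this closes the session but keeps the reference.
    await this.session?.close();
  }

  getSession(): BrowserLike | null {
    return this.session ?? null;
  }
}

// Records what happens across one start/close cycle.
async function demoLifecycle(): Promise<string[]> {
  const events: string[] = [];
  const provider = new FakeRemoteProvider("ws://example.invalid:9222", async (endpoint) => {
    events.push(`connect ${endpoint}`);
    return {
      close: async () => {
        events.push("close");
      },
    };
  });
  events.push(`before start: ${provider.getSession() === null ? "null" : "session"}`);
  await provider.start();
  await provider.close();
  events.push(`after close: ${provider.getSession() === null ? "null" : "session"}`);
  return events;
}
```

One thing the sketch makes visible: because `close()` does not reset the session field, `getSession()` still returns the closed session object afterwards. Whether that matters depends on how callers use the return value.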
16 changes: 12 additions & 4 deletions src/cli/index.ts
@@ -39,10 +39,12 @@ program
.option("-f, --file <file path>", "Path to a file containing a command")
.option("-m, --mcp <mcp config file>", "Path to a file containing mcp config")
.option("--hyperbrowser", "Use Hyperbrowser for the browser provider")
.option("--remote-chrome <wsEndpoint>", "Use Remote Chrome for the browser provider with specified WebSocket endpoint")
.action(async function () {
const options = this.opts();
const debug = (options.debug as boolean) || false;
const useHB = (options.hyperbrowser as boolean) || false;
const remoteChromeWsEndpoint = (options.remoteChrome as string) || undefined;
let taskDescription = (options.command as string) || undefined;
const filePath = (options.file as string) || undefined;
const mcpPath = (options.mcp as string) || undefined;
@@ -70,7 +72,9 @@ program

const agent = new HyperAgent({
debug: debug,
browserProvider: useHB ? "Hyperbrowser" : "Local",
browserProvider: useHB ? "Hyperbrowser" : remoteChromeWsEndpoint ? "RemoteChrome" : "Local",
hyperbrowserConfig: useHB ? {} : undefined,
remoteChromeConfig: remoteChromeWsEndpoint ? { wsEndpoint: remoteChromeWsEndpoint } : undefined,
customActions: [
UserInteractionAction(
async ({ message, kind, choices }): Promise<ActionOutput> => {
@@ -267,10 +271,14 @@ program
await agent.initializeMCPClient({ servers: mcpConfig });
}

if (useHB && !debug) {
if ((useHB || remoteChromeWsEndpoint) && !debug) {
await agent.initBrowser();
const session = agent.getSession() as SessionDetail;
console.log(`Hyperbrowser Live URL: ${session.liveUrl}\n`);
if (useHB) {
const session = agent.getSession() as SessionDetail;
console.log(`Hyperbrowser Live URL: ${session.liveUrl}\n`);
} else if (remoteChromeWsEndpoint) {
console.log(`Connected to Remote Chrome: ${remoteChromeWsEndpoint}\n`);
}
}

task = await agent.executeTaskAsync(taskDescription, {
4 changes: 3 additions & 1 deletion src/types/config.ts
@@ -5,6 +5,7 @@ import {
HyperbrowserProvider,
LocalBrowserProvider,
} from "@/browser-providers";
import { RemoteChromeProvider } from "@/browser-providers/remote-chrome";
import type { Page as PlaywrightPage, BrowserContext } from "playwright-core";

export interface MCPServerConfig {
@@ -55,7 +56,7 @@ export interface MCPConfig {
servers: MCPServerConfig[];
}

export type BrowserProviders = "Local" | "Hyperbrowser";
export type BrowserProviders = "Local" | "Hyperbrowser" | "RemoteChrome";

/**
* Placeholder connector configuration for the Phase 4 connector-only flow.
@@ -89,6 +90,7 @@ export interface HyperAgentConfig<T extends BrowserProviders = "Local"> {
"debug"
>;
localConfig?: ConstructorParameters<typeof LocalBrowserProvider>[0];
remoteChromeConfig?: ConstructorParameters<typeof RemoteChromeProvider>[0];

/**
* Configuration for agent actions