This project provides a framework for browser automation using Playwright, designed to be used with agent systems.
The framework allows for programmatic control of web browsers through a simple interface. It can be used to:
- Automate web browsing tasks
- Take screenshots
- Handle user interactions (clicks, typing, scrolling)
- Block access to specified domains for safety
- Computer Protocol: Defines a standard interface for browser automation
- Playwright Integration: Uses the Playwright library for browser control
- Safety Measures: Includes URL blocking for specified domains
- Image Handling: Utilities for displaying and processing screenshots
- API Integration: Helper functions for OpenAI API communication
- Python 3.9+
- Playwright browser binaries
- OpenCV for image processing
- OpenAI API key (for certain integrations)
- Clone the repository:

      git clone https://github.com/startengine/se-agents-examples.git
      cd se-agents-examples

- Set up a virtual environment:

      python3 -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Install Playwright browsers:

      playwright install

- Create a .env file with your API keys (you can copy from the provided template):

      cp .env.example .env
      # Then edit .env with your actual API keys
    from app.browser_agent.local_playwright import LocalPlaywrightComputer

    # Create a computer instance (headless=False to see the browser window)
    with LocalPlaywrightComputer(headless=False) as computer:
        # Navigate to a URL
        computer.goto("https://example.com")

        # Take a screenshot
        screenshot = computer.screenshot()

        # Click on coordinates
        computer.click(100, 200)

        # Type text
        computer.type("Hello, world!")

        # Press keys
        computer.keypress(["Enter"])

        # Wait for page to load
        computer.wait(2000)  # milliseconds

    from app.utils import show_image, show_image_cv2

    # Display screenshot using PIL
    show_image(screenshot)

    # Display screenshot using OpenCV (with auto-close after 2 seconds)
    show_image_cv2(screenshot, timeout=2)

The project includes example scripts in the examples/ directory to help you get started:
The weather_example.py script demonstrates how to:
- Launch a browser and navigate to a weather search page
- Take and display screenshots
- Handle basic browser automation for weather information
Run it with:
    python -m examples.weather_example

The search_example.py script shows how to:
- Perform web searches using a browser
- Interact with search results
- Process and extract information from search pages
Run it with:
    python -m examples.search_example

The web_search_example.py script demonstrates how to:
- Use OpenAI's capabilities to simulate web search
- Perform searches without browser automation
- Generate plausible content based on URLs
Run it with:
    python -m examples.web_search_example

These examples provide a starting point for building more complex automation and search tasks.
- app/: Main package directory
  - browser_agent/: Browser automation implementations
    - computer.py: Protocol defining the browser interface
    - base_playwright.py: Base class for Playwright-based browser automation
    - local_playwright.py: Implementation for local browser automation
  - web_search/: Web search implementations
    - search.py: Protocol defining the web search interface
    - openai_search.py: Implementation using OpenAI's web search capability
  - computers/: Alternative implementation path (identical to browser_agent)
  - utils.py: Utility functions for image handling, URL safety, API communication
- examples/: Example scripts demonstrating usage
  - weather_example.py: Example for checking weather information with browser
  - search_example.py: Example for performing web searches with browser
  - web_search_example.py: Example for web searches using OpenAI API
The project includes domain blocking functionality to prevent navigation to specified domains. This is handled through the BLOCKED_DOMAINS list in utils.py. You can customize this list to add additional domains that should be blocked.
    BLOCKED_DOMAINS = [
        "maliciousbook.com",
        "evilvideos.com",
        # Add your own domains to block
    ]

This project is designed to be used with LLM agents by providing a "Computer" tool that gives agents the ability to control a web browser. The agent can be given access to browser controls through a simple protocol interface.
The Computer protocol defines the interface an agent can use to control the browser:
    from typing import Dict, List, Literal, Protocol

    class Computer(Protocol):
        """Defines the 'shape' (methods/properties) our loop expects."""

        @property
        def environment(self) -> Literal["windows", "mac", "linux", "browser"]: ...
        @property
        def dimensions(self) -> tuple[int, int]: ...

        def screenshot(self) -> str: ...
        def click(self, x: int, y: int, button: str = "left") -> None: ...
        def double_click(self, x: int, y: int) -> None: ...
        def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None: ...
        def type(self, text: str) -> None: ...
        def wait(self, ms: int = 1000) -> None: ...
        def move(self, x: int, y: int) -> None: ...
        def keypress(self, keys: List[str]) -> None: ...
        def drag(self, path: List[Dict[str, int]]) -> None: ...
        def get_current_url(self) -> str: ...

When integrated with an agent system, the agent can use this interface to control the browser and perform complex tasks.
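One way an agent loop might drive this interface is to translate each action the model emits into a call on the Computer. The sketch below is illustrative only and is not taken from this project: the action dictionary shape (`{"type": ..., ...}`), the `dispatch_action` helper, the `ALLOWED_ACTIONS` set, and the `RecordingComputer` stand-in are all assumptions made for the example. The stand-in records calls so the sketch runs without a browser.

```python
from typing import Any, Dict, List

# Hypothetical allowlist: only these Computer methods may be invoked by an agent.
ALLOWED_ACTIONS = {
    "click", "double_click", "scroll", "type",
    "wait", "move", "keypress", "drag", "screenshot",
}


class RecordingComputer:
    """Hypothetical stand-in that records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def click(self, x: int, y: int, button: str = "left") -> None:
        self.calls.append(("click", x, y, button))

    def type(self, text: str) -> None:
        self.calls.append(("type", text))

    def keypress(self, keys: List[str]) -> None:
        self.calls.append(("keypress", tuple(keys)))


def dispatch_action(computer: Any, action: Dict[str, Any]) -> Any:
    """Call the method named by action["type"], passing the other keys as arguments.

    The action format is an assumption of this sketch, not the project's API.
    """
    params = dict(action)          # copy so the caller's dict is not mutated
    name = params.pop("type")
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"Unsupported action: {name!r}")
    method = getattr(computer, name, None)
    if not callable(method):
        raise ValueError(f"Computer does not implement: {name!r}")
    return method(**params)


computer = RecordingComputer()
for action in [
    {"type": "click", "x": 100, "y": 200},
    {"type": "type", "text": "Hello, world!"},
    {"type": "keypress", "keys": ["Enter"]},
]:
    dispatch_action(computer, action)

print(computer.calls)
```

Restricting dispatch to an explicit set of names keeps an agent from invoking arbitrary attributes on the object. Any object that implements the Computer methods, such as LocalPlaywrightComputer, could be passed in place of the stand-in.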
The project includes several utility functions:
- show_image(): Display a base64-encoded image using PIL
- show_image_cv2(): Display a base64-encoded image using OpenCV with auto-close
- calculate_image_dimensions(): Get width and height of an image
- check_blocklisted_url(): Validate URLs against blocked domains
- create_response(): Helper for OpenAI API communication
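To make the URL validation idea concrete, here is a minimal sketch of how a check against the BLOCKED_DOMAINS list could work. It is not the project's check_blocklisted_url() implementation, whose exact behavior is not shown here: the `is_blocked` function name, its boolean return value, and the choice to also block subdomains are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Example entries mirroring the BLOCKED_DOMAINS list in utils.py.
BLOCKED_DOMAINS = ["maliciousbook.com", "evilvideos.com"]


def is_blocked(url: str, blocked_domains=BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or one of its subdomains.

    Hypothetical helper for illustration; URLs without a scheme have no
    parsed hostname and are treated as not blocked.
    """
    hostname = (urlparse(url).hostname or "").lower()
    return any(
        hostname == domain or hostname.endswith("." + domain)
        for domain in blocked_domains
    )


print(is_blocked("https://www.evilvideos.com/watch?v=1"))
print(is_blocked("https://example.com"))
```

Comparing the parsed hostname, rather than searching the whole URL string, avoids false matches such as a blocked domain appearing in a query parameter or as the suffix of an unrelated host name.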
Contributions are welcome! Please feel free to submit a Pull Request.