GitHub - lsy641/WebOasis: A framework for researchers to build and study web agents for real-world applications. It features: 1. Robust handling of complex, dynamic web pages without hassle. 2. Support for both automated and interactive (tutor-style) web agents. 3. Compatibility with multiple UI testing engines.

WebOasis aims to handle "Any site. Any page. Any UI. Any complexity" and to complete web tasks the way a human would.

Updates

Proxy, user-agent, headers: New parameters for custom proxy, user-agent, and request headers in both Playwright and Selenium managers
reCAPTCHA: Detection and handling for reCAPTCHA challenges
Safe close: Improved resource cleanup on browser/manager shutdown
Interactive class patterns: Configurable patterns for identifying interactive elements
frame_mark_elements.js: Page stability checks, chat-interface handling, and DOM element marking
identify_interactive_elements.js: Expanded interactive element detection
Max history messages: Configurable limit on conversation history
Execution outcomes: Richer execution feedback in agent messages
Accessibility tree: Accessibility information included in agent observations
Registry updates: Improved operation registration flow
Demo: Example of adding custom operations
Lowercase operations: Case-insensitive operation extraction in the parser
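
The last item, case-insensitive operation extraction, can be illustrated with a minimal sketch. The regex, the `extract_operation` helper, and the operation names below are hypothetical and are not taken from WebOasis's parser; the sketch only assumes that the model emits calls of the form `Name(args)`.

```python
import re

# Hypothetical set of known operation names, stored in lowercase.
KNOWN_OPERATIONS = {"click", "type", "scroll"}

# Matches a `name(args)` call anywhere in a model response.
CALL_PATTERN = re.compile(r"([A-Za-z_]\w*)\s*\((.*?)\)", re.DOTALL)


def extract_operation(response: str):
    """Return (operation_name, raw_args) for the first known call, else None."""
    for match in CALL_PATTERN.finditer(response):
        name = match.group(1).lower()  # normalize case before the lookup
        if name in KNOWN_OPERATIONS:
            return name, match.group(2).strip()
    return None
```

With this normalization, `CLICK("submit-btn")`, `Click("submit-btn")`, and `click("submit-btn")` all resolve to the same registered operation.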

Features

Any site. Any page. Any UI. Any complexity. Robust handling of dynamic, highly interactive pages. You focus on research—no brittle low‑level UI hacking. If you run into a tricky page the agent can't yet handle, please open a request and we'll help.
One-parameter engine switch (Playwright ↔ Selenium). Choose your UI engine per experiment without changing operation code or test-suite boilerplate.
Dual-agent architecture for clarity and power. Role Agent (human-like intent, high-level reasoning) + Web Agent (browser expert, low-level actions). Clean separation of observation and control.
Supports both task automation and interactive (tutor‑style) agents (TODO). For tutor-style agents, Human (novice) → Role Agent (proficient user) → Web Agent (operator): guide, involve, and supervise actions in the loop.
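
The engine switch and the dual-agent split described above can be sketched in a few lines. This is a toy illustration of the pattern, not WebOasis's implementation: every class and function name below (`ToyEngine`, `ToyRoleAgent`, `ToyWebAgent`, `make_engine`, `step`) is hypothetical, and the engines return strings instead of driving a browser.

```python
from abc import ABC, abstractmethod


class ToyEngine(ABC):
    """Engine-agnostic interface that operations are written against."""

    @abstractmethod
    def click(self, test_id: str) -> str: ...


class ToyPlaywrightEngine(ToyEngine):
    def click(self, test_id: str) -> str:
        return f"playwright: click {test_id}"  # placeholder for a real Playwright call


class ToySeleniumEngine(ToyEngine):
    def click(self, test_id: str) -> str:
        return f"selenium: click {test_id}"  # placeholder for a real Selenium call


ENGINES = {"playwright": ToyPlaywrightEngine, "selenium": ToySeleniumEngine}


def make_engine(name: str) -> ToyEngine:
    """Select the UI engine from a single parameter."""
    try:
        return ENGINES[name.lower()]()
    except KeyError:
        raise ValueError(f"unknown engine: {name!r}") from None


class ToyRoleAgent:
    """Stands in for the high-level reasoner: states intent, not UI actions."""

    def decide(self, observation: str) -> str:
        return "submit the form"


class ToyWebAgent:
    """Stands in for the browser expert: maps intent to a concrete action."""

    def __init__(self, engine: ToyEngine):
        self.engine = engine

    def act(self, intent: str) -> str:
        target = "submit-btn" if "submit" in intent else "unknown"
        return self.engine.click(target)


def step(engine_name: str, observation: str) -> str:
    """One observe -> intent -> action cycle on the chosen engine."""
    role, web = ToyRoleAgent(), ToyWebAgent(make_engine(engine_name))
    return web.act(role.decide(observation))
```

Because the agents only see the `ToyEngine` interface, changing `engine_name` from `"playwright"` to `"selenium"` is the only edit needed to rerun the same step on the other engine.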

Installation

From source (Recommended):

git clone https://github.com/lsy641/WebOasis.git
cd WebOasis
pip install -e .

PyPI:

pip install weboasis

Configuration

WebOasis uses prompt-based configuration for its AI agents. You can customize these prompts by setting the WEBOASIS_PROMPTS_PATH environment variable to point to your own prompts.yaml file.

Customizing Prompts

# Set the path to your custom prompts file
export WEBOASIS_PROMPTS_PATH="/path/to/your/custom/prompts.yaml"

# Run your script
python your_script.py
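
The same variable can also be set from inside Python. This is a minimal sketch, assuming WebOasis reads `WEBOASIS_PROMPTS_PATH` when the package is imported or when an agent is constructed (the README does not say which), so the variable is set before either happens. `use_custom_prompts` is a hypothetical helper, not part of WebOasis.

```python
import os
from pathlib import Path


def use_custom_prompts(path: str) -> str:
    """Check that the prompts file exists, then export WEBOASIS_PROMPTS_PATH."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"prompts file not found: {resolved}")
    # Only affects the current process and its children.
    os.environ["WEBOASIS_PROMPTS_PATH"] = str(resolved)
    return str(resolved)
```

Call the helper at the top of your script, before any `weboasis` import, so that a mistyped path fails early instead of silently falling back to the default prompts.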

Prompts File Format

Your custom prompts.yaml should follow the same structure as the default:

observe_prompt: |-
  # Interactive Elements
  ${interactive_elements_str}
  
  # Your custom instructions here
  - Be more cautious when interacting with elements
  - Focus on accessibility-first interactions
  
  # Response Format
  [Action] (Your custom format)

act_prompt: |-
  # Interactive elements:
  ${interactive_elements_str}
  
  # Action Space
  ${action_space_desc}
  
  # Your custom instructions here
  
  # Goal:
  ${goal}
  
  # Response Format


example_profile: |-
  # Your custom user profile
  You are a [describe your persona]
  
  ## Task Description
  [describe what you want to accomplish]
  
  ## Profile
  [describe your characteristics and preferences]

Available Variables

The following variables can be used in your prompts:

${interactive_elements_str} - List of interactive elements on the page
${action_space_desc} - Available actions the agent can perform
${accessibility_tree} - Page accessibility information
${goal} - Current goal/task to accomplish
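
The `${name}` placeholders use the same syntax as Python's `string.Template`. As an illustration only (the README does not state how WebOasis performs the substitution), a prompt could be rendered as follows; the template text and the values are made up for the example, and `render_prompt` is a hypothetical helper.

```python
from string import Template

# A shortened stand-in for an act_prompt entry from prompts.yaml.
ACT_PROMPT = Template(
    "# Interactive elements:\n"
    "${interactive_elements_str}\n"
    "\n"
    "# Action Space\n"
    "${action_space_desc}\n"
    "\n"
    "# Goal:\n"
    "${goal}\n"
)


def render_prompt(template: Template, **values: str) -> str:
    """Fill known placeholders; unknown ones are left in place rather than raising."""
    return template.safe_substitute(**values)


prompt = render_prompt(
    ACT_PROMPT,
    interactive_elements_str='[1] button "Submit"',
    action_space_desc="click(test_id)",
    goal="Submit the form",
)
print(prompt)
```

`safe_substitute` is used here so that a prompt referencing a variable that was not supplied, such as `${accessibility_tree}`, keeps the placeholder visible instead of raising `KeyError`.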

Run a demo

The demo simulates a prostate cancer patient using a newly developed visit‑prep web app to surface UI design and system usability issues. At each step, the DualAgent observes page dynamics, articulates the user experience, infers intent, and executes the next UI action.

python WebOasis/scripts/demo.py

Demo core logic (simplified):

import os
from openai import OpenAI
from weboasis.act_book import ActBookController
from weboasis.agents import DualAgent
from weboasis.agents.constants import TEST_ID_ATTRIBUTE

# LLM client; the API key is read from the environment
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Register only the operation groups the demo needs
act_book = ActBookController(auto_register=False)
act_book.register("browser/interaction")
act_book.register("browser/navigation")
act_book.register("general/flow")

# web_manager selects the UI engine
agent = DualAgent(
    client=client, model="gpt-4.1-mini",
    act_book=act_book, web_manager="playwright",
    test_id_attribute=TEST_ID_ATTRIBUTE, log_dir="./logs/demo", verbose=True,
)

# Run at most 20 steps, stopping early if the browser is no longer available
for _ in range(20):
    if not agent.web_manager.is_browser_available():
        break
    agent.step()

Project structure

WebOasis/
├── act_book/                      # Operations and registry
│   ├── core/                      # Base classes, registry, automator interface
│   ├── book/
│   │   ├── browser/
│   │   │   ├── interaction.py     # Click/Type/Scroll/... operations
│   │   │   ├── navigation.py      # Navigate/Back/Forward/Tab ops
│   │   │   └── extraction.py      # GetText/Attribute/Screenshot/Title/URL
│   │   ├── dom/selector.py        # Find/Wait/Exists/Visible
│   │   ├── composite/
│   │   │   ├── forms.py           # FillForm/Login/SubmitForm
│   │   │   └── highlighting.py    # Visual highlight helpers
│   │   └── general/flow.py        # NoAction (wait)
│   └── engines/
│       ├── playwright/playwright_automator.py
│       └── selenium/selenium_automator.py
├── ui_manager/                    # Browser managers and parser
│   ├── base_manager.py
│   ├── playwright_manager.py
│   ├── selenium_manager.py
│   ├── parsers/simple_parser.py   # Robust function-call parser
│   ├── js_adapters.py             # Selenium JS adapters (sync/async)
│   └── constants.py               # Loads injected JS utilities
├── agents/                        # Agents and shared types
│   ├── base.py                    # BaseAgent, WebAgent, RoleAgent
│   ├── dual_agent.py              # Orchestrates Role + Web agents
│   ├── constants.py               # Prompts and shared config
│   └── types.py                   # Observation/Message/etc.
├── javascript/                    # Injected browser-side utilities
│   ├── frame_mark_elements.js
│   ├── add_outline_elements.js
│   ├── identify_interactive_elements.js
│   ├── extract_accessbility_tree.js
│   ├── create_developper_panel.js
│   ├── hide_developer_elements.js
│   └── show_developer_elements.js
├── config/prompts.yaml            # Act/observe prompts
└── scripts/demo.py                # Minimal runnable example

Citation

If you use WebOasis in your research, please cite:

@software{siyang_liu_2025_17052503,
  author       = {Siyang Liu},
  title        = {lsy641/WebOasis: 0.1.5},
  month        = sep,
  year         = 2025,
  publisher    = {Zenodo},
  version      = {0.1.5},
  doi          = {10.5281/zenodo.17052503},
  url          = {https://doi.org/10.5281/zenodo.17052503},
  swhid        = {swh:1:dir:3e01a9805d7e6f1f92703629987b90fbc07a3218;origin=https://doi.org/10.5281/zenodo.17052502;visit=swh:1:snp:5d80fd402b00b67f790a56c6532a88a6e9300f88;anchor=swh:1:rel:d8fbac11afe389ce44a306513eefc372905d900b;path=lsy641-WebOasis-3bec1d9},
}

License

Apache License 2.0. See the LICENSE file.
