🔥 WebFetch Proxy - Advanced Web Content Fetching Service

WebFetch Proxy is an async web content fetching proxy with Redis-backed caching, intelligence gathering and tagging, and plugin-based integration for AI/LLM systems. It is built for OpenCode agents and other AI platforms that need reliable web content access.

✨ Features

🎯 Core Capabilities

High-Performance Proxy: Async HTTP/HTTPS proxy with configurable concurrency
Intelligent Caching: Redis-backed caching with TTL and size limits
OSINT Integration: Built-in intelligence gathering and tagging
Security Controls: Rate limiting, domain filtering, API key authentication
Visual Monitoring: Real-time request visualization and logging
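
Configurable concurrency of the kind listed above is commonly implemented with a semaphore. The following is a minimal sketch under that assumption, not the project's actual code: `fetch_one` and `fetch_all` are hypothetical names, and `fetch_one` only simulates an HTTP call.

```python
import asyncio


async def fetch_one(url: str) -> str:
    # Hypothetical stand-in for a real HTTP fetch.
    await asyncio.sleep(0.01)
    return f"fetched {url}"


async def fetch_all(urls: list[str], concurrent_limit: int = 5) -> list[str]:
    # The semaphore lets at most `concurrent_limit` fetches run at once.
    semaphore = asyncio.Semaphore(concurrent_limit)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch_one(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


results = asyncio.run(
    fetch_all(["https://example1.com", "https://example2.com"], concurrent_limit=1)
)
```

With `concurrent_limit=1` the two fetches run one after the other; raising the limit lets them overlap.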

🔧 Technical Features

Async Architecture: Built on FastAPI, with aiohttp as the async HTTP client for outbound requests
Plugin System: Extensible plugin architecture for different AI platforms
Content Analysis: Automatic content type detection and formatting
Fallback System: Graceful degradation when proxy is unavailable
Comprehensive Logging: Structured logs with intelligence data storage
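
The fallback behaviour described above could look roughly like this. It is a sketch only: `fetch_with_fallback` and its two callables are hypothetical, and the exception type caught here is an assumption, since the real client may raise something else when the proxy is down.

```python
from typing import Callable


def fetch_with_fallback(
    url: str,
    via_proxy: Callable[[str], str],
    direct: Callable[[str], str],
) -> tuple[str, str]:
    """Return (source, body), trying the proxy first."""
    try:
        return "proxy", via_proxy(url)
    except OSError:
        # OSError covers the built-in ConnectionError and timeout errors.
        # Proxy unreachable: degrade to a direct fetch.
        return "direct", direct(url)
```

Returning the source alongside the body lets the caller log how often the fallback path is taken.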

🛡️ Security & Compliance

SSL Verification: Configurable SSL certificate verification
Rate Limiting: Per-minute and per-hour request limits
Domain Controls: Whitelist/blacklist domain filtering
API Authentication: Bearer token authentication support
Privacy Protection: Request anonymization and data minimization
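
To illustrate how per-minute and per-hour limits can be enforced together, here is a minimal in-memory sliding-window sketch. `SlidingWindowLimiter` is a hypothetical name, and the sketch assumes a single process and a single client; it does not show how the proxy actually stores its counters.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    def __init__(self, per_minute: int = 60, per_hour: int = 1000, clock=time.monotonic):
        # (window length in seconds, max requests in that window)
        self.limits = ((60.0, per_minute), (3600.0, per_hour))
        self.clock = clock
        self.hits: deque[float] = deque()  # accepted request times, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Forget requests that fell out of the longest window.
        while self.hits and now - self.hits[0] >= 3600.0:
            self.hits.popleft()
        for window, limit in self.limits:
            recent = sum(1 for t in self.hits if now - t < window)
            if recent >= limit:
                return False
        self.hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps retrying while blocked does not extend its own lockout.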

🚀 Quick Start

Prerequisites

# Install system dependencies
brew install redis  # macOS
# or
sudo apt install redis-server  # Ubuntu

# Install Python dependencies (asyncio ships with Python and should not be pip-installed)
pip install fastapi uvicorn aiohttp redis pyyaml pydantic

Running the Proxy

# Clone repository
git clone https://github.com/lostsock1/webfetch-proxy.git
cd webfetch-proxy

# Start Redis
redis-server --daemonize yes

# Start proxy server
python3 webfetch_proxy.py

# Alternative: Use startup script
./start.sh

Testing the Proxy

# Test basic functionality
python3 test_plugin.py

# Test integration
python3 test_proxy_integration.py

# Run comprehensive tests
python3 agent_test.py

📖 Usage

Basic API Usage

import requests

# Single fetch request
response = requests.post(
    "http://localhost:8082/fetch",
    json={
        "url": "https://example.com",
        "method": "GET",
        "headers": {"User-Agent": "Custom-Agent/1.0"},
        "timeout": 30
    },
    headers={"Authorization": "Bearer test-key"}
)

# Bulk fetch (this endpoint also requires authentication)
response = requests.post(
    "http://localhost:8082/fetch/bulk",
    json={
        "urls": ["https://example1.com", "https://example2.com"],
        "concurrent_limit": 5
    },
    headers={"Authorization": "Bearer test-key"}
)

OpenCode Integration

import sys
sys.path.insert(0, "/path/to/webfetch-proxy")

from opencode_plugin import initialize_plugin, webfetch, bulk_webfetch

# Initialize plugin
initialize_plugin()

# Single fetch
result = webfetch("https://example.com", format="markdown")

# Bulk fetch
results = bulk_webfetch([
    "https://news.ycombinator.com",
    "https://github.com/trending"
], concurrent_limit=3)

# Check status
from opencode_plugin import get_plugin_status
print(get_plugin_status())

Command Line Usage

# Start proxy server
python3 webfetch_proxy.py --host 0.0.0.0 --port 8082

# Check proxy status
./status.sh

# Run maintenance
python3 maintenance.py --cleanup

# View logs
tail -f logs/proxy.log

🔌 API Reference

Endpoints

| Method | Endpoint | Description | Authentication |
|--------|----------|-------------|----------------|
| `GET` | `/health` | Health check | Optional |
| `POST` | `/fetch` | Single URL fetch | Required |
| `POST` | `/fetch/bulk` | Bulk URL fetch | Required |
| `GET` | `/stats` | Proxy statistics | Optional |
| `GET` | `/blocked/requests` | Blocked requests log | Required |
| `POST` | `/housekeeping/cleanup` | Clean obsolete files | Required |
| `GET` | `/intelligence/tags` | List intelligence tags | Required |

Request Models

Single Fetch:

{
  "url": "https://example.com",
  "method": "GET",
  "headers": {"User-Agent": "Custom/1.0"},
  "timeout": 30,
  "follow_redirects": true,
  "verify_ssl": true,
  "cache_enabled": true,
  "intelligence_tags": ["osint", "reconnaissance"]
}

Bulk Fetch:

{
  "urls": ["https://site1.com", "https://site2.com"],
  "concurrent_limit": 5,
  "common_headers": {"Accept": "text/html"},
  "intelligence_tags": ["bulk", "analysis"]
}
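
For readers who want to build these payloads in code, the single-fetch model can be mirrored with a small validated class. This sketch uses stdlib dataclasses to stay dependency-free; the project lists pydantic, so its real models likely differ. The default values are assumptions taken from the example payload above, not documented defaults.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class FetchRequest:
    url: str
    method: str = "GET"
    headers: dict[str, str] = field(default_factory=dict)
    timeout: int = 30
    follow_redirects: bool = True
    verify_ssl: bool = True
    cache_enabled: bool = True
    intelligence_tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything that is not an absolute http(s) URL.
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"unsupported URL: {self.url!r}")
        if self.timeout <= 0:
            raise ValueError("timeout must be positive")


payload = {"url": "https://example.com", "timeout": 10, "intelligence_tags": ["osint"]}
request = FetchRequest(**payload)
```

Validating at construction time means a malformed URL fails before any network call is attempted.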

⚙️ Configuration

`config.yaml`

proxy:
  host: "0.0.0.0"
  port: 8082
  workers: 1
  timeout: 30
  max_concurrent: 100
  show_requests: true

caching:
  enabled: true
  ttl: 3600
  redis_url: "redis://localhost:6379/0"
  max_size_mb: 100

security:
  api_key: "your-secret-key"
  allowed_domains: []
  blocked_domains: ["localhost", "127.0.0.1"]
  rate_limiting:
    enabled: true
    requests_per_minute: 60
    requests_per_hour: 1000

intelligence:
  enabled: true
  storage_path: "./intelligence"
  auto_tagging: true
  content_analysis: true
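
One plausible reading of `allowed_domains` and `blocked_domains` is sketched below. It assumes that an empty allow-list means "allow everything not blocked" and that a listed domain also covers its subdomains; neither rule is confirmed by the configuration above, and `is_domain_allowed` is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_domain_allowed(url: str, allowed_domains: list[str], blocked_domains: list[str]) -> bool:
    host = (urlparse(url).hostname or "").lower()

    def matches(domain: str) -> bool:
        # Exact match, or a subdomain of the listed domain.
        domain = domain.lower()
        return host == domain or host.endswith("." + domain)

    # The block-list always wins.
    if any(matches(d) for d in blocked_domains):
        return False
    # Assumption: an empty allow-list permits every non-blocked host.
    return not allowed_domains or any(matches(d) for d in allowed_domains)
```

Note that a name-based block-list like this does not catch other spellings of the loopback address, such as `[::1]`.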

Environment Variables

export WEBFETCH_PROXY_HOST="0.0.0.0"
export WEBFETCH_PROXY_PORT="8082"
export WEBFETCH_REDIS_URL="redis://localhost:6379/0"
export WEBFETCH_API_KEY="your-secret-key"
export WEBFETCH_CACHE_TTL="3600"
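
The relationship between these variables and `config.yaml` is not spelled out here. A common convention is that environment variables override file values; the sketch below assumes that precedence and a hypothetical mapping from variable names to config keys.

```python
import os

# Assumed mapping: variable name -> (config section, key, type).
ENV_TO_CONFIG = {
    "WEBFETCH_PROXY_HOST": ("proxy", "host", str),
    "WEBFETCH_PROXY_PORT": ("proxy", "port", int),
    "WEBFETCH_REDIS_URL": ("caching", "redis_url", str),
    "WEBFETCH_API_KEY": ("security", "api_key", str),
    "WEBFETCH_CACHE_TTL": ("caching", "ttl", int),
}


def apply_env_overrides(config: dict, environ=os.environ) -> dict:
    # Copy each section so the caller's config is left untouched.
    merged = {section: dict(values) for section, values in config.items()}
    for name, (section, key, cast) in ENV_TO_CONFIG.items():
        if name in environ:
            merged.setdefault(section, {})[key] = cast(environ[name])
    return merged
```

Casting on the way in keeps `port` and `ttl` as integers even though environment values are always strings.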

🏗️ Architecture

Project Structure

webfetch-proxy/
├── webfetch_proxy.py           # Main proxy server (35K+ lines)
├── opencode_plugin.py          # OpenCode integration plugin
├── opencode_proxy_plugins.py   # Plugin architecture
├── webfetch_proxy_integration.py # Integration utilities
├── test_proxy_integration.py   # Integration tests
├── test_plugin.py              # Plugin testing
├── agent_test.py               # Agent testing
├── config.yaml                 # Configuration file
├── proxy_config.json           # Proxy configuration
├── proxy_wrapper.sh            # Shell wrapper script
├── start.sh                    # Startup script
├── status.sh                   # Status monitoring script
├── maintenance.py              # Maintenance utilities
├── README.md                   # Documentation
├── __init__.py                 # Python package init
├── logs/                       # Log directory
├── intelligence/               # Intelligence data (JSON)
├── archive/                    # Archived files and docs
└── images/                     # Image assets

Core Components

Proxy Server (webfetch_proxy.py): FastAPI-based async HTTP proxy
Plugin System: Platform-specific integration plugins
Cache Layer: Redis-backed response caching
Intelligence Engine: Content analysis and tagging
Security Module: Authentication and rate limiting
Monitoring System: Real-time request visualization

Data Flow

Client Request → Authentication → Rate Limiting → Cache Check →
Domain Filtering → HTTP Fetch → Content Analysis →
Intelligence Tagging → Cache Storage → Response
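
The flow above can be read as a sequence of early-exit checks. The sketch below follows the documented order with an in-memory dict standing in for Redis; `handle_request` and its parameters are hypothetical, rate limiting is left out for brevity, and the single `"web"` tag is a placeholder for real analysis.

```python
from urllib.parse import urlparse


def handle_request(url, api_key, *, expected_key, cache, blocked_domains, fetch):
    # 1. Authentication
    if api_key != expected_key:
        return {"status": 401, "error": "invalid API key"}
    # 2. Rate limiting would go here (omitted in this sketch).
    # 3. Cache check
    if url in cache:
        return {**cache[url], "cache_hit": True}
    # 4. Domain filtering
    if urlparse(url).hostname in blocked_domains:
        return {"status": 403, "error": "domain blocked"}
    # 5. HTTP fetch (injected so the sketch needs no network access)
    body = fetch(url)
    # 6-7. Content analysis and intelligence tagging (placeholder tag)
    response = {"status": 200, "body": body, "tags": ["web"], "cache_hit": False}
    # 8. Cache storage
    cache[url] = response
    return response
```

Because the cache check precedes domain filtering in this order, a response cached before a domain was blocked would still be served until it expires.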

🔍 Intelligence Features

Automatic Tagging

Content Analysis: Detects content type (HTML, JSON, XML, etc.)
Domain Classification: Categorizes by domain type (news, social, etc.)
OSINT Patterns: Identifies reconnaissance and scanning patterns
Metadata Extraction: Extracts titles, descriptions, keywords
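
As a rough illustration of content-type and domain tagging, the sketch below derives tags from the `Content-Type` header and the hostname. The tag names and both lookup tables are invented for the example; the project's actual patterns live in its own configuration.

```python
from urllib.parse import urlparse

# Hypothetical lookup tables, for illustration only.
CONTENT_TYPE_TAGS = {
    "text/html": "html",
    "application/json": "json",
    "application/xml": "xml",
    "text/xml": "xml",
}
DOMAIN_TAGS = {"news.ycombinator.com": "news", "github.com": "code"}


def auto_tag(url: str, content_type_header: str) -> list[str]:
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    tags = []
    if media_type in CONTENT_TYPE_TAGS:
        tags.append(CONTENT_TYPE_TAGS[media_type])
    host = (urlparse(url).hostname or "").lower()
    if host in DOMAIN_TAGS:
        tags.append(DOMAIN_TAGS[host])
    return tags
```

Unknown content types and hosts simply produce no tag rather than raising an error.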

Storage Format

{
  "request_id": "8cb584ff",
  "timestamp": "2026-01-26T02:33:01",
  "url": "https://example.com",
  "method": "GET",
  "status_code": 200,
  "content_type": "text/html",
  "size_bytes": 15432,
  "execution_time_ms": 245,
  "tags": ["web", "osint", "reconnaissance"],
  "metadata": {
    "title": "Example Domain",
    "description": "Example description",
    "keywords": ["example", "domain"]
  },
  "headers": {...},
  "cache_hit": false
}
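
A record in this shape could be written to the `storage_path` directory as one JSON file per request. The sketch below covers a subset of the fields shown above. The file naming scheme, the 8-character ID derived from a UUID, and the `save_record` helper are all assumptions rather than the project's documented behaviour.

```python
import json
import uuid
from datetime import datetime
from pathlib import Path


def save_record(storage_path, url, status_code, content_type, body: bytes, tags) -> Path:
    record = {
        "request_id": uuid.uuid4().hex[:8],  # assumed ID scheme
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "url": url,
        "method": "GET",
        "status_code": status_code,
        "content_type": content_type,
        "size_bytes": len(body),
        "tags": list(tags),
        "cache_hit": False,
    }
    directory = Path(storage_path)
    directory.mkdir(parents=True, exist_ok=True)
    # Assumed layout: one file per request, named after its ID.
    path = directory / f"{record['request_id']}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```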

🚢 Deployment

Docker Deployment

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8082

CMD ["python", "webfetch_proxy.py"]

Systemd Service (Linux)

[Unit]
Description=WebFetch Proxy Service
After=network.target redis.service

[Service]
Type=simple
User=webfetch
WorkingDirectory=/opt/webfetch-proxy
ExecStart=/usr/bin/python3 webfetch_proxy.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webfetch-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webfetch-proxy
  template:
    metadata:
      labels:
        app: webfetch-proxy
    spec:
      containers:
      - name: proxy
        image: webfetch-proxy:latest
        ports:
        - containerPort: 8082
        env:
        - name: WEBFETCH_REDIS_URL
          value: "redis://redis-service:6379/0"

📊 Monitoring & Metrics

Health Checks

# Check proxy health
curl http://localhost:8082/health

# Get statistics
curl http://localhost:8082/stats

# View logs
tail -f logs/proxy.log

Performance Metrics

Request Latency: Average/95th percentile response times
Cache Hit Rate: Percentage of requests served from cache
Error Rate: HTTP error rate by status code
Concurrency: Active concurrent requests
Throughput: Requests per second/minute
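
Several of these metrics reduce to simple arithmetic over a window of request samples. The sketch below shows one way to compute them, using the nearest-rank method for the 95th percentile; `summarize` is a hypothetical helper and says nothing about what `/stats` actually returns.

```python
def summarize(latencies_ms, cache_hits, status_codes):
    ordered = sorted(latencies_ms)
    # Nearest-rank p95: the smallest value with at least 95% of samples at or below it.
    # Integer arithmetic computes ceil(0.95 * n) without float rounding.
    rank = (95 * len(ordered) + 99) // 100
    errors = sum(1 for code in status_codes if code >= 400)
    return {
        "avg_latency_ms": sum(ordered) / len(ordered),
        "p95_latency_ms": ordered[rank - 1],
        "cache_hit_rate": sum(cache_hits) / len(cache_hits),
        "error_rate": errors / len(status_codes),
    }
```

All three inputs must be non-empty; a real implementation would guard against an empty window.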

🔧 Development

Setting Up Development Environment

# Clone repository
git clone https://github.com/lostsock1/webfetch-proxy.git
cd webfetch-proxy

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

# Run tests
pytest tests/

Adding New Features

New Plugin: Extend opencode_plugin.py base class
New Endpoint: Add route in webfetch_proxy.py
New Intelligence Tag: Update config.yaml patterns
New Cache Strategy: Implement in caching module

Testing

# Run unit tests
python -m pytest tests/unit

# Run integration tests
python -m pytest tests/integration

# Run all tests with coverage
coverage run -m pytest
coverage report

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Code Standards

Follow PEP 8 style guide
Add type hints for all functions
Write comprehensive docstrings
Include unit tests for new features
Update documentation as needed

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

FastAPI Team: For the excellent async web framework
Redis Community: For the robust caching solution
aiohttp Maintainers: For the async HTTP client library
OpenCode Community: For inspiration and testing

📞 Support

Issues: GitHub Issues
Documentation: Project Wiki
Support requests: Open an issue

Built with ❤️ for the AI/LLM community. Keep fetching! 🔥
