Real-time data pipeline for the TUS (Transportes Urbanos de Santander) bus network. It collects live vehicle positions and stop-level ETA predictions to build a historical dataset for delay analysis and ML-based prediction.
- posiciones: GPS positions of buses (lat/lon, timestamp, line, vehicle ID)
- estimaciones_parada: Real-time ETAs for each bus-stop pair
- pasos_parada: Historical passages (stale since June 2025, not used)
GTFS static files from nap.transportes.gob.es:
- stops.txt: Stop coordinates and metadata (for proximity calculation)
- shapes.txt: Detailed route geometries (for GPS map-matching and visualization)
- routes.txt: Route names, colors, and metadata
- trips.txt: Trip patterns and service IDs
- stop_times.txt: Stop sequences and route structure
- calendar_dates.txt: Service exceptions (holidays, special schedules)
Note: GTFS files are stored in data/gtfs-static/ (not tracked in git due to size).
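
The proximity calculation that stops.txt is meant to support is not spelled out here. A minimal sketch, assuming a plain haversine distance between a GPS fix and each stop, might look like the following. `haversine_m` and `nearest_stop` are hypothetical helper names, not functions from this repository; the only inputs assumed are the standard GTFS columns stop_id, stop_lat, and stop_lon.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stop(lat, lon, stops):
    """Return (stop_id, distance_m) for the stop closest to (lat, lon).

    `stops` is an iterable of (stop_id, stop_lat, stop_lon) tuples,
    for example parsed from stops.txt (hypothetical input shape).
    """
    return min(
        ((stop_id, haversine_m(lat, lon, s_lat, s_lon))
         for stop_id, s_lat, s_lon in stops),
        key=lambda pair: pair[1],
    )
```

A linear scan like this is likely adequate for a city-sized stop list; a spatial index would only matter at much larger scale. `nearest_stop` raises `ValueError` if `stops` is empty.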
Source: datos.santander.es
Data Collection:
- Cloudflare Worker (pulsetransit-worker/): Scheduled collection every 2 minutes (estimaciones) and hourly (posiciones), storing in Cloudflare D1 database
- GitHub Actions (Legacy) (.github/workflows/collect.yml): Legacy collector, writes to data/tus.db for development/testing
Database Schema:
- estimaciones: Predictions with UNIQUE(parada_id, linea, fech_actual) to deduplicate
- posiciones: GPS breadcrumbs with UNIQUE(vehiculo, instante) to deduplicate overlapping route histories
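
To illustrate how these UNIQUE constraints deduplicate, here is a minimal sketch using Python's `sqlite3` module with an in-memory database. Only vehiculo and instante come from the schema described above; the remaining columns and the `insert_positions` helper are assumptions for illustration and may differ from schema.sql and db.py. `INSERT OR IGNORE` is one way to rely on the constraint; the actual collectors may use a different conflict clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE posiciones (
        vehiculo INTEGER NOT NULL,
        instante TEXT    NOT NULL,
        linea    TEXT,            -- assumed column
        lat      REAL,            -- assumed column
        lon      REAL,            -- assumed column
        UNIQUE (vehiculo, instante)
    )
    """
)


def insert_positions(conn, rows):
    """Insert rows, silently skipping any (vehiculo, instante) pair
    that is already stored. Returns the number of rows actually added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO posiciones "
        "(vehiculo, instante, linea, lat, lon) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before
```

When two consecutive fetches return overlapping route histories, the second call adds only the breadcrumbs that were not seen before, so the collector can run on a fixed schedule without tracking what it has already stored.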
src/pulsetransit/ # Legacy Python collector (backup/testing)
├── collector.py # API fetching and DB insertion
└── db.py # Schema and connection management
pulsetransit-worker/ # Cloudflare Worker (production collector)
├── src/index.js # Scheduled tasks, API fetching, health endpoint
├── schema.sql # D1 database schema
└── wrangler.jsonc # Cloudflare config and cron triggers
.github/workflows/
├── collect.yml # Manual backup collector
└── monitor.yml # Hourly worker health check
data/
└── tus.db # SQLite database (GitHub Actions/local dev)
- Data collection pipeline (GPS + ETA)
- GTFS static feed integration (stop geometries, scheduled timetables)
- Delay computation (predicted vs actual arrival)
- Weather feature enrichment (via meteomat)
- ML delay prediction model
- Live dashboard
pip install -e .
python src/pulsetransit/collector.py both