Goal
Implement production-grade event storage using GCS + BigQuery, matching patterns from Segment, RudderStack, and Snowplow.
Architecture
Events → GCS (Parquet) → WarehouseLoader → BigQuery
- EventStore Protocol: Pluggable storage (GCS default, users can implement S3/Azure)
- WarehouseLoader Protocol: Pluggable warehouses (BigQuery default, users can implement Snowflake/Redshift)
- Latency: 5-10 minutes (batch pattern)
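
The two pluggable seams above could be expressed as structural Protocols. This is a sketch only; the method names (`write_batch`, `load`) and signatures are assumptions, not the interface from the spec:

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class EventStore(Protocol):
    """Pluggable event storage: GCS by default, S3/Azure user-supplied."""

    def write_batch(self, events: Iterable[dict]) -> str:
        """Persist a batch of events; return the object path written."""
        ...


@runtime_checkable
class WarehouseLoader(Protocol):
    """Pluggable warehouse: BigQuery by default, Snowflake/Redshift user-supplied."""

    def load(self, object_path: str) -> None:
        """Load one stored batch object into the warehouse."""
        ...
```

Because Protocols are structural, a custom S3 or Snowflake implementation only needs matching method signatures, with no base class to import.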
Phases
- GCSEventStore Implementation (1.5 days)
- Warehouse Loader Implementation (2 days)
- Integration & Configuration (1 day)
- BigQuery Setup & Production Scripts (1 day)
- Integration Tests (1.5 days)
- Documentation (0.5 days)
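
To make phase 1 concrete, here is a minimal local stand-in for the GCSEventStore: same hour-partitioned object layout, but JSON Lines on local disk instead of Parquet on GCS, so it runs without cloud credentials. The class name and path scheme are illustrative assumptions:

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


class LocalEventStore:
    """Local stand-in for GCSEventStore: hour-partitioned paths
    (events/YYYY/MM/DD/HH/), JSON Lines instead of Parquet, and a
    filesystem write instead of a GCS upload (all assumptions)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def write_batch(self, events: list[dict]) -> str:
        # Hour partitioning keeps loader polling cheap: the loader only
        # has to list recent partitions, not the whole bucket.
        now = datetime.now(timezone.utc)
        partition = now.strftime("events/%Y/%m/%d/%H")
        path = self.root / partition / f"{uuid.uuid4().hex}.jsonl"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("\n".join(json.dumps(e) for e in events))
        return str(path.relative_to(self.root))
```

The real implementation would swap the JSON write for a Parquet serialization and a GCS blob upload, keeping the same path layout.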
Detailed Spec
See specs/gcs-bigquery-storage/ for full implementation plan.
Timeline
Estimated: 7-8 days (with AI assistance)
Acceptance Criteria
- Events written to GCS as Parquet files
- BigQueryLoader polls GCS and loads to BigQuery
- Events queryable in BigQuery within 10 minutes
- Protocols allow custom implementations (S3, Snowflake, etc.)
- All tests passing
- Documentation complete
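
The poll-and-load criterion above could be sketched as a loop that tracks which objects have already been loaded. The `list_objects` method, the in-memory watermark set, and the 5-minute interval are assumptions for illustration:

```python
import time


class PollingLoader:
    """Sketch of the BigQueryLoader poll loop: `store` lists batch
    objects (e.g. GCS keys) and `loader` loads each one into the
    warehouse. Names and the watermark mechanism are assumptions."""

    def __init__(self, store, loader, interval_s: float = 300.0) -> None:
        self.store = store
        self.loader = loader
        self.interval_s = interval_s   # 5-min poll keeps latency under 10 min
        self.loaded: set[str] = set()  # watermark: objects already loaded

    def poll_once(self) -> list[str]:
        # Load only objects not seen before, oldest path first.
        new = sorted(p for p in self.store.list_objects() if p not in self.loaded)
        for path in new:
            self.loader.load(path)
            self.loaded.add(path)
        return new

    def run(self) -> None:
        while True:
            self.poll_once()
            time.sleep(self.interval_s)
```

In production the `loader.load` call would presumably wrap a BigQuery Parquet load job (e.g. `Client.load_table_from_uri` with `SourceFormat.PARQUET`), and the watermark would live somewhere durable rather than in memory.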