GCS + BigQuery Storage Implementation #24

@prosdev

Description

Goal

Implement production-grade event storage using GCS + BigQuery, matching patterns from Segment, RudderStack, and Snowplow.

Architecture

Events → GCS (Parquet) → WarehouseLoader → BigQuery
  • EventStore Protocol: Pluggable storage (GCS default, users can implement S3/Azure)
  • WarehouseLoader Protocol: Pluggable warehouses (BigQuery default, users can implement Snowflake/Redshift)
  • Latency: 5-10 minutes (batch pattern)
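A minimal sketch of how the two pluggable protocols might look, using `typing.Protocol` for structural typing so user-provided S3/Azure or Snowflake/Redshift implementations need no base class. Method names (`write_batch`, `load`) and the `Event` shape are assumptions for illustration, not the actual spec:

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class Event:
    # Hypothetical event shape; the real schema lives in the spec.
    name: str
    payload: dict = field(default_factory=dict)


@runtime_checkable
class EventStore(Protocol):
    """Pluggable blob storage (GCS by default; S3/Azure user-provided)."""

    def write_batch(self, events: list[Event]) -> str:
        """Persist a batch as Parquet and return its storage path."""
        ...


@runtime_checkable
class WarehouseLoader(Protocol):
    """Pluggable warehouse (BigQuery by default; Snowflake/Redshift user-provided)."""

    def load(self, path: str) -> None:
        """Load one stored batch into the warehouse."""
        ...


class InMemoryEventStore:
    """Toy implementation used here only to show structural typing."""

    def __init__(self) -> None:
        self.batches: dict[str, list[Event]] = {}

    def write_batch(self, events: list[Event]) -> str:
        path = f"mem://batches/batch-{len(self.batches):05d}.parquet"
        self.batches[path] = events
        return path
```

Because the protocols are `runtime_checkable`, `isinstance(InMemoryEventStore(), EventStore)` is true without any explicit inheritance, which is what makes third-party stores drop-in.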

Phases

  1. GCSEventStore Implementation (1.5 days)
  2. Warehouse Loader Implementation (2 days)
  3. Integration & Configuration (1 day)
  4. BigQuery Setup & Production Scripts (1 day)
  5. Integration Tests (1.5 days)
  6. Documentation (0.5 days)
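One concrete detail Phase 1 has to settle is how batch files are keyed in GCS; date-partitioned object keys let the loader and backfills target partitions cheaply. A sketch of one such naming scheme (the Hive-style `dt=`/`hour=` layout and the `batch_object_key` helper are assumptions, not the spec's decision):

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def batch_object_key(prefix: str = "events", now: Optional[datetime] = None) -> str:
    """Build a date-partitioned GCS object key for one Parquet batch.

    Hive-style partitioning (dt=YYYY-MM-DD/hour=HH) is an assumption;
    it is a common convention for warehouse loaders, but the spec may
    choose a different layout.
    """
    now = now or datetime.now(timezone.utc)
    return (
        f"{prefix}/dt={now:%Y-%m-%d}/hour={now:%H}/"
        f"batch-{now:%Y%m%dT%H%M%S}-{uuid.uuid4().hex[:8]}.parquet"
    )
```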

Detailed Spec

See specs/gcs-bigquery-storage/ for the full implementation plan.

Timeline

Estimated: 7-8 days (with AI assistance)

Acceptance Criteria

  • Events written to GCS as Parquet files
  • BigQueryLoader polls GCS and loads to BigQuery
  • Events queryable in BigQuery within 10 minutes
  • Protocols allow custom implementations (S3, Snowflake, etc.)
  • All tests passing
  • Documentation complete
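The "BigQueryLoader polls GCS" criterion reduces to a small, storage-agnostic polling loop. A sketch with injected callables standing in for the real GCS listing and BigQuery load-job calls (all names here are assumptions; a production loader would persist progress durably, e.g. in a manifest table, rather than in memory):

```python
from typing import Callable, Iterable


class PollingLoader:
    """Lists batch files each cycle and loads every new file exactly once.

    `list_batches` and `load_batch` stand in for the real GCS listing
    and BigQuery load calls; the in-memory `loaded` set is a stand-in
    for durable progress tracking.
    """

    def __init__(
        self,
        list_batches: Callable[[], Iterable[str]],
        load_batch: Callable[[str], None],
    ) -> None:
        self.list_batches = list_batches
        self.load_batch = load_batch
        self.loaded: set[str] = set()

    def poll_once(self) -> list[str]:
        # Only paths not yet loaded; sorted for deterministic ordering.
        new = sorted(p for p in self.list_batches() if p not in self.loaded)
        for path in new:
            self.load_batch(path)
            self.loaded.add(path)  # mark only after a successful load
        return new
```

Run on a 5-minute interval, this keeps end-to-end latency within the 10-minute acceptance target while making each file load idempotent.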
