Analytics Engineer | Data Infrastructure | AI-Enabled Analytics
Bay Area • SQL / Python • BigQuery / Snowflake • Airflow • Looker / Power BI
I build scalable analytics systems: clean data models, reliable pipelines, and decision-ready metrics.
My background includes infrastructure analytics (Google/GCP), marketing attribution & ROI, and financial risk/churn analytics.
Currently building LucidParse, an AI-assisted document intelligence pipeline (OCR → chunking → embeddings → structured extraction → validation).
- SQL at scale: complex joins, window functions, query tuning, partitioning/clustering, 100M+ row datasets
- Data modeling: star schema, SCD2, metric layers, semantic consistency for BI
- Pipelines: Python + SQL ETL/ELT, orchestration patterns (Airflow-style), monitoring & validation
- Analytics + product impact: experimentation support, cohort/segmentation, KPI design, executive reporting
- AI workflows (applied): embeddings, retrieval patterns, evaluation/validation to reduce unreliable outputs
Problem: PDFs and scanned docs are hard to turn into reliable structured data
Solution: OCR + chunking + embeddings + extraction + deterministic validation
Focus: tradeoffs between accuracy, latency, and cost; confidence scoring and post-processing guardrails
Repo: (ADD LINK)
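A minimal sketch of what the deterministic validation step could look like, assuming the extraction stage returns a flat dict of string fields. The field names (`invoice_number`, `invoice_date`, `total`), the rules, and the pass-rate confidence score are hypothetical, chosen for illustration; they are not LucidParse's actual schema or scoring method.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from decimal import Decimal, InvalidOperation


def _is_iso_date(value: str) -> bool:
    """True if the value is a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _is_amount(value: str) -> bool:
    """True if the value parses as a non-negative decimal number."""
    try:
        return Decimal(value) >= 0
    except InvalidOperation:
        return False


# Hypothetical rule set: one deterministic check per expected field.
RULES = {
    "invoice_number": lambda v: re.fullmatch(r"[A-Z]{2,4}-\d{3,8}", v) is not None,
    "invoice_date": _is_iso_date,
    "total": _is_amount,
}


@dataclass
class ValidationResult:
    passed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)

    @property
    def confidence(self) -> float:
        # Crude stand-in for a confidence score: share of rules that passed.
        checked = len(self.passed) + len(self.failed)
        return len(self.passed) / checked if checked else 0.0


def validate(extracted: dict[str, str]) -> ValidationResult:
    """Run every rule; a missing field counts as a failure."""
    result = ValidationResult()
    for name, rule in RULES.items():
        value = extracted.get(name)
        ok = value is not None and rule(value.strip())
        (result.passed if ok else result.failed).append(name)
    return result
```

Records with any entry in `failed` can be routed to review or re-extraction instead of being loaded, which keeps a model's plausible-looking but invalid output (for example an OCR'd `12O4.50`) out of downstream tables.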
Problem: fragmented infra signals, slow reporting, inconsistent definitions
Solution: modeled warehouse tables + optimized queries + quality checks + dashboards
Impact: improved data reliability and reduced time-to-insight (add metric)
Repo: (ADD LINK)
Problem: multiple channels, inconsistent conversions, hard to measure ROI
Solution: unified event model + attribution logic + performance-optimized reporting tables
Impact: improved campaign reporting speed and decision-making (add metric)
Repo: (ADD LINK)
Problem: churn/risk insights required repeatable feature pipelines and consistent KPIs
Solution: feature engineering + incremental data updates + evaluation reporting
Impact: reduced manual work and improved monitoring (add metric)
Repo: (ADD LINK)
SQL: BigQuery, Snowflake, Postgres, SQL Server
Python: pandas, APIs, automation, ETL patterns
BI: Looker, Power BI, Tableau
Orchestration/Cloud: Airflow (patterns), GCP, AWS basics