
Samdevelop25/samat-portfolio


Hi, I’m Samat

Analytics Engineer | Data Infrastructure | AI-Enabled Analytics
Bay Area • SQL / Python • BigQuery / Snowflake • Airflow • Looker / Power BI

I build scalable analytics systems: clean data models, reliable pipelines, and decision-ready metrics.
My background includes infrastructure analytics (Google/GCP), marketing attribution & ROI, and financial risk/churn analytics.
Currently building LucidParse — an AI-assisted document intelligence pipeline (OCR → embeddings → structured extraction + validation).


What I’m strong at

  • SQL at scale: complex joins, window functions, query tuning, partitioning/clustering, 100M+ row datasets
  • Data modeling: star schema, SCD2, metric layers, semantic consistency for BI
  • Pipelines: Python + SQL ETL/ELT, orchestration patterns (Airflow-style), monitoring & validation
  • Analytics + product impact: experimentation support, cohort/segmentation, KPI design, executive reporting
  • AI workflows (applied): embeddings, retrieval patterns, evaluation/validation to reduce unreliable outputs
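
To illustrate the SCD2 pattern listed above, here is a minimal Python sketch. It is hypothetical and not taken from any project here. It assumes each load delivers a full snapshot of `{business_key: tracked_attributes}`, and `DimRow` and `apply_scd2` are illustrative names.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    key: str                  # business key, e.g. a customer id (hypothetical)
    attrs: Tuple              # tracked attributes; any change opens a new version
    valid_from: date
    valid_to: Optional[date]  # None means "still current"

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(dim: List[DimRow], snapshot: Dict[str, Tuple], as_of: date) -> List[DimRow]:
    """Merge a full snapshot into an SCD2 dimension, effective `as_of`."""
    current = {r.key: r for r in dim if r.is_current}
    out = [r for r in dim if not r.is_current]  # closed history is never rewritten
    for key, row in current.items():
        new_attrs = snapshot.get(key)
        if new_attrs is None or new_attrs == row.attrs:
            out.append(row)  # unchanged, or missing from snapshot (deletes not handled here)
        else:
            out.append(replace(row, valid_to=as_of))         # close the old version
            out.append(DimRow(key, new_attrs, as_of, None))  # open the new version
    for key, attrs in snapshot.items():
        if key not in current:
            out.append(DimRow(key, attrs, as_of, None))      # first version of a new key
    return out


# Usage: customer c1 changes tier, c2 appears for the first time.
dim = [DimRow("c1", ("Gold",), date(2024, 1, 1), None)]
dim = apply_scd2(dim, {"c1": ("Platinum",), "c2": ("Silver",)}, date(2024, 6, 1))
for row in dim:
    print(row.key, row.attrs, row.valid_from, row.valid_to)
```

In a warehouse, the same close-then-insert logic is often expressed as a single `MERGE` statement. The Python version just makes each step explicit.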

Featured projects

1) LucidParse — AI Document Intelligence (In progress)

Problem: PDFs and scanned docs are hard to turn into reliable structured data
Solution: OCR + chunking + embeddings + extraction + deterministic validation
Focus: accuracy/latency/cost trade-offs, confidence scoring, and post-processing guardrails
Repo: (ADD LINK)
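
The following is a rough sketch of two stages of this kind of pipeline: chunking, and deterministic validation with a confidence score. This is not LucidParse code. The invoice fields (`invoice_date`, `total`, `line_items`, `vendor`), the fixed-size character chunking, and the scoring rule (extractor confidence scaled by the share of checks passed) are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks; neighbours share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def _is_iso_date(value) -> bool:
    try:
        datetime.strptime(str(value), "%Y-%m-%d")
        return True
    except ValueError:
        return False


def validate_invoice(fields: Dict, model_confidence: float) -> Tuple[float, List[str]]:
    """Run rule-based checks on extracted fields; return (score, failed_checks)."""
    total = fields.get("total")
    items = fields.get("line_items") or []
    total_ok = isinstance(total, (int, float))
    checks = {
        "date_is_iso": _is_iso_date(fields.get("invoice_date")),
        "total_is_numeric": total_ok,
        "items_sum_to_total": total_ok and abs(sum(items) - total) < 0.01,
        "vendor_present": bool(str(fields.get("vendor") or "").strip()),
    }
    failed = [name for name, ok in checks.items() if not ok]
    # Each failed rule lowers confidence proportionally; a caller might route
    # low scores to human review instead of loading them.
    score = model_confidence * (1 - len(failed) / len(checks))
    return score, failed


chunks = chunk_text("x" * 1200)
good = {"invoice_date": "2024-03-15", "total": 30.0, "line_items": [10.0, 20.0], "vendor": "Acme"}
bad = {"invoice_date": "15/03/2024", "total": 30.0, "line_items": [10.0], "vendor": "Acme"}
print(len(chunks), validate_invoice(good, 0.9), validate_invoice(bad, 0.9))
```

Because the checks are plain rules rather than model output, a failing field always fails the same way. That predictability is what makes them usable as guardrails.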

2) GCP Infrastructure Audit & Inventory Analytics

Problem: fragmented infra signals, slow reporting, inconsistent definitions
Solution: modeled warehouse tables + optimized queries + quality checks + dashboards
Impact: improved data reliability and reduced time-to-insight (add metric)
Repo: (ADD LINK)
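
Quality checks like these can be written as SQL assertions that count violating rows, where zero means pass. The small sketch below uses Python's built-in `sqlite3` as a stand-in for BigQuery. The `infra_inventory` table, its columns, and the allowed-region list are hypothetical.

```python
import sqlite3
from typing import Dict

# Each check returns the number of violating rows; 0 means the check passes.
QUALITY_CHECKS = {
    "resource_id_not_null":
        "SELECT COUNT(*) FROM infra_inventory WHERE resource_id IS NULL",
    "resource_id_unique": """
        SELECT COUNT(*) FROM (
            SELECT resource_id FROM infra_inventory
            WHERE resource_id IS NOT NULL
            GROUP BY resource_id
            HAVING COUNT(*) > 1
        )""",
    "region_in_allowed_set": """
        SELECT COUNT(*) FROM infra_inventory
        WHERE region IS NULL
           OR region NOT IN ('us-central1', 'us-east1', 'europe-west1')""",
}


def run_checks(conn: sqlite3.Connection) -> Dict[str, int]:
    """Run every check and return {check_name: violating_row_count}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in QUALITY_CHECKS.items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE infra_inventory (resource_id TEXT, project_id TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO infra_inventory VALUES (?, ?, ?)",
    [("vm-1", "proj-a", "us-central1"),
     ("vm-1", "proj-a", "us-central1"),   # duplicate key
     (None, "proj-b", "us-east1")],       # missing key
)
results = run_checks(conn)
failed = [name for name, violations in results.items() if violations > 0]
print(results, failed)
```

Keeping the checks as plain SQL means the same queries could run in the warehouse after each load. An orchestration task could then fail the run whenever any count is non-zero.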

3) Marketing Attribution & ROI Metrics Layer

Problem: multiple channels, inconsistent conversions, hard to measure ROI
Solution: unified event model + attribution logic + performance-optimized reporting tables
Impact: improved campaign reporting speed and decision-making (add metric)
Repo: (ADD LINK)
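
The README does not say which attribution model this layer uses. As an illustration only, here is a sketch of two common rules, last-touch and linear, applied to time-ordered touchpoints. The function name `attribute` and its input shapes are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def attribute(touchpoints: List[Tuple[str, str]], conversions: Dict[str, float],
              model: str = "linear") -> Dict[str, float]:
    """touchpoints: (user_id, channel) pairs in time order; conversions: user_id -> revenue."""
    paths = defaultdict(list)
    for user, channel in touchpoints:
        paths[user].append(channel)

    credit: Dict[str, float] = defaultdict(float)
    for user, revenue in conversions.items():
        path = paths.get(user)
        if not path:
            continue  # conversion with no tracked touchpoint stays unattributed
        if model == "last_touch":
            credit[path[-1]] += revenue          # all credit to the final touch
        elif model == "linear":
            share = revenue / len(path)          # equal credit to every touch
            for channel in path:
                credit[channel] += share
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)


touches = [("u1", "search"), ("u1", "email"), ("u2", "social"), ("u1", "search")]
conv = {"u1": 90.0, "u2": 30.0}
print(attribute(touches, conv, "linear"))
print(attribute(touches, conv, "last_touch"))
```

When every conversion has a path, both rules distribute exactly the attributed revenue (up to floating-point rounding). That makes the channel totals an easy reconciliation check against raw conversion revenue.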

4) Credit Risk & Churn Analytics Pipeline

Problem: churn/risk insights required repeatable feature pipelines and consistent KPIs
Solution: feature engineering + incremental data updates + evaluation reporting
Impact: reduced manual work and improved monitoring (add metric)
Repo: (ADD LINK)
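
One common way to implement incremental updates is a high-watermark on an `updated_at` column: only rows newer than the last processed timestamp are upserted. Below is a minimal in-memory sketch. The row shape (`id`, `updated_at`) and the use of a dict as the target table are assumptions, and a real pipeline would persist the watermark between runs.

```python
from datetime import datetime
from typing import Dict, List


def incremental_upsert(target: Dict[str, dict], source_rows: List[dict],
                       watermark: datetime) -> datetime:
    """Upsert rows newer than `watermark` into `target` (keyed by id); return the new watermark."""
    new_watermark = watermark
    # Sort so that, if an id appears twice in one batch, its latest version wins.
    for row in sorted(source_rows, key=lambda r: r["updated_at"]):
        if row["updated_at"] > watermark:
            target[row["id"]] = row
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


target: Dict[str, dict] = {}
batch1 = [{"id": "a", "updated_at": datetime(2024, 1, 1), "balance": 100},
          {"id": "b", "updated_at": datetime(2024, 1, 2), "balance": 50}]
wm = incremental_upsert(target, batch1, datetime.min)

# Second run: "a" changed; "b" is re-sent unchanged and skipped by the watermark.
batch2 = [{"id": "a", "updated_at": datetime(2024, 1, 3), "balance": 80},
          {"id": "b", "updated_at": datetime(2024, 1, 2), "balance": 50}]
wm = incremental_upsert(target, batch2, wm)
print(wm, target["a"]["balance"], len(target))
```

The strict `>` comparison assumes `updated_at` only moves forward. A row that arrives late with an older timestamp would be skipped, which is one reason pipelines often reprocess a small lookback window.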


Tech stack

SQL: BigQuery, Snowflake, Postgres, SQL Server
Python: pandas, APIs, automation, ETL patterns
BI: Looker, Power BI, Tableau
Orchestration/Cloud: Airflow (patterns), GCP, AWS basics

About

My public data analytics and BI portfolio — SQL, Python, Tableau, Power BI projects
