A production-ready Airflow + dbt + Snowflake data orchestration demo for rapid onboarding and reproducible pipelines.
Highlights
- Built reusable TaskGroups wrapping `dbt run`/`dbt test` with backfill vars `{start_date, end_date}` to reduce boilerplate.
- Designed a layered ELT pipeline (Bronze → Silver → Gold) with tests as quality gates between layers.
- Serialized dbt CLI runs via the Airflow Pool `dbt` to prevent `target/` and `dbt_packages/` race conditions.
- Published Dataset `dbt://gold/fct_orders` for downstream Datasets-based orchestration.
- Integrated Great Expectations for automated data quality validation.
A stable, reproducible local data orchestration template: Apache Airflow for scheduling, dbt for modeling, Postgres as the Airflow metadata DB, and Snowflake as the warehouse. Comes with one‑command startup, health checks, regression validation, Great Expectations data quality, and Mailpit for notifications.
Prerequisites: Docker Desktop ≥ 4.x, GNU Make, bash, curl
- Credentials (local only, not committed)
  - Copy `airflow/.env.example` to `airflow/.env` and fill in the Snowflake vars: `SNOWFLAKE_ACCOUNT`, `SNOWFLAKE_USER`, `SNOWFLAKE_PASSWORD`, `SNOWFLAKE_ROLE`, `SNOWFLAKE_WAREHOUSE`, `SNOWFLAKE_DATABASE`, `SNOWFLAKE_SCHEMA`.
  - Optional: `ALERT_EMAIL` for failure notifications.
- Start (pick one)
  - `make up` (init + start, opens the UI)
  - `./launch.sh --init` (one-time init + start)
  - `make rebuild` or `./launch.sh --rebuild` (rebuild images, then start)
  - `make fresh` (start clean, deletes volumes; dangerous)
- Open `http://localhost:8080` (user/pass: `airflow` / `airflow`)
- Validate
  - Trigger the sample DAGs and wait for them to succeed: `make validate`
  - Or a subset: `make validate-daily` / `make validate-pipelines`
- Clear historical failures (red dots in the UI)
  - Keep run records, clear failed task instances: `make clear-failed`
  - Delete failed runs (destructive): `make clear-failed-hard`
Or simply start: `./launch.sh --fresh --no-open && make validate`
Tested on macOS 14 / Ubuntu 22.04 environments.
./
├─ airflow/ # Airflow (DAGs, container deps, .env)
│ ├─ dags/
│ │ ├─ dbt_daily.py
│ │ ├─ dbt_daily_pipeline.py
│ │ ├─ dbt_layered_pipeline.py
│ │ ├─ smtp_smoke.py
│ │ └─ serving/
│ │ ├─ quality_checks.py
│ │ └─ dbt_gold_consumer.py
│ ├─ requirements.txt # dbt + GE provider installed in the image
│ └─ .env # Snowflake + optional alert email (gitignored)
├─ data_pipeline/ # dbt project
│ ├─ dbt_project.yml
│ ├─ profiles.yml # reads Snowflake creds from env vars
│ ├─ models/
│ │ ├─ bronze/
│ │ ├─ silver/
│ │ └─ gold/
│ └─ snippets/ # copy‑ready templates (sources/tests)
├─ great_expectations/ # GE config, validations, local Data Docs
├─ scripts/ # validation, cleanup, QA helpers
├─ docker-compose.yml # Postgres + Airflow + Mailpit + Nginx (GE docs)
├─ Makefile # handy commands (make help)
└─ README.md
- Airflow 2.9.3 (`apache/airflow:2.9.3-python3.11`)
- Executor: LocalExecutor
- Metadata DB: Postgres 15
- Healthcheck: `airflow db check`
- dbt-core 1.10 + dbt-snowflake 1.10 (installed in the container)
- Great Expectations 0.18 + Airflow provider
- Mailpit (local SMTP sink, UI: `http://localhost:8025`)
- Nginx serves GE Data Docs: `http://localhost:8081`
Mounts
- `./airflow/dags` -> `/opt/airflow/dags`
- `./data_pipeline` -> `/opt/airflow/dbt`
- `./great_expectations` -> `/opt/airflow/great_expectations`
- `dbt_layered_pipeline` (flagship): `dbt_deps → [bronze.run] → [bronze.test] → [silver.run] → [silver.test] → [gold.run] → [gold.test] → publish Dataset dbt://gold/fct_orders`
- `dbt_daily_pipeline`: single-line pipeline using TaskGroups
- `dbt_daily`: minimal smoke test (`dbt_deps → dbt_run → dbt_test`)
- `dbt_gold_consumer`: subscribes to `dbt://gold/fct_orders` and runs downstream (tag: `downstream`)
- `quality_checks`: runs GE checkpoint `daily_metrics_chk` and updates Data Docs
- `smtp_smoke`: SMTP smoke test (requires `ALERT_EMAIL`)
TaskGroup helpers live in `airflow/dags/lib/dbt_groups.py`.
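For orientation, below is a minimal sketch of what a layer-level TaskGroup helper and the Dataset hand-off could look like. It assumes BashOperator-driven dbt CLI calls and the illustrative helper name `dbt_layer_group`; the actual code in `dbt_groups.py` may be structured differently.

```python
# Illustrative sketch only; the real helpers live in airflow/dags/lib/dbt_groups.py.
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.bash import BashOperator
from airflow.utils.task_group import TaskGroup

DBT_DIR = "/opt/airflow/dbt"                      # mounted dbt project
GOLD_DATASET = Dataset("dbt://gold/fct_orders")   # published after the gold layer


def dbt_layer_group(layer: str, outlets=None) -> TaskGroup:
    """Run + test one dbt layer, serialized through the shared `dbt` Pool."""
    backfill_vars = "{start_date: {{ data_interval_start | ds }}, end_date: {{ data_interval_end | ds }}}"
    with TaskGroup(group_id=layer) as group:
        run = BashOperator(
            task_id="run",
            bash_command=f"dbt run --project-dir {DBT_DIR} --select {layer} --vars '{backfill_vars}'",
            pool="dbt",                            # size-1 pool serializes all dbt CLI calls
        )
        test = BashOperator(
            task_id="test",
            bash_command=f"dbt test --project-dir {DBT_DIR} --select {layer}",
            pool="dbt",
            outlets=outlets or [],                 # gold.test publishes the Dataset
        )
        run >> test
    return group


with DAG("dbt_layered_pipeline_sketch", start_date=datetime(2024, 1, 1),
         schedule="@daily", max_active_runs=1, catchup=False):
    deps = BashOperator(task_id="dbt_deps",
                        bash_command=f"dbt deps --project-dir {DBT_DIR}", pool="dbt")
    deps >> dbt_layer_group("bronze") >> dbt_layer_group("silver") \
         >> dbt_layer_group("gold", outlets=[GOLD_DATASET])

# A downstream DAG (like dbt_gold_consumer) can subscribe to the Dataset:
with DAG("dbt_gold_consumer_sketch", start_date=datetime(2024, 1, 1),
         schedule=[GOLD_DATASET], catchup=False, tags=["downstream"]):
    BashOperator(task_id="consume", bash_command="echo 'gold layer refreshed'")
```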
- Local Data Docs: `http://localhost:8081`
- DAG: `quality_checks` runs `daily_metrics_chk` and calls `UpdateDataDocsAction`
- An Airflow task extra link rewrites container `file://...` URLs to host `http://localhost:8081/...`
- Prune historical GE outputs (keep the last N): `make prune_ge` (keeps 5 by default) or `make prune_ge PRUNE_KEEP=10`
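A rough sketch of how such a checkpoint task can be wired with the GE Airflow provider follows; the actual `quality_checks` DAG may differ, and `fail_task_on_validation_failure` is shown here only as an explicit choice.

```python
# Illustrative sketch; the real DAG is airflow/dags/serving/quality_checks.py.
from datetime import datetime

from airflow import DAG
from great_expectations_provider.operators.great_expectations import GreatExpectationsOperator

with DAG("quality_checks_sketch", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False):
    GreatExpectationsOperator(
        task_id="daily_metrics_chk",
        data_context_root_dir="/opt/airflow/great_expectations",  # mounted GE project
        checkpoint_name="daily_metrics_chk",
        # UpdateDataDocsAction on the checkpoint refreshes the Data Docs served on :8081
        fail_task_on_validation_failure=True,
    )
```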
- Dev default: Mailpit UI at `http://localhost:8025`, SMTP `mailpit:1025` (no auth/TLS)
- Switch to real SMTP (example: Gmail)
  - Airflow UI → Admin → Connections → +
    - Conn Id: `smtp_gmail`, Type: `smtp`, Host: `smtp.gmail.com`, Port: `587`
    - Login: your address; Password: an App Password
    - Extra: `{ "starttls": true }`
  - Or via the CLI, then update `smtp_smoke` to use your `conn_id`:
docker compose exec -T webserver \
airflow connections add smtp_gmail \
--conn-type smtp --conn-host smtp.gmail.com --conn-port 587 \
--conn-login YOU@gmail.com --conn-password 'APP_PASSWORD' \
--conn-extra '{"starttls": true}'
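If the `smtp_smoke` DAG sends mail with `EmailOperator`, pointing it at the new connection could look like the sketch below. This is an assumption about how that DAG is built; the repo's actual DAG may wire the connection differently.

```python
# Illustrative sketch: pointing a notification task at the smtp_gmail connection.
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.email import EmailOperator

with DAG("smtp_smoke_sketch", start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False):
    EmailOperator(
        task_id="send_test_mail",
        conn_id="smtp_gmail",                       # omit to keep the dev/Mailpit default
        to=os.getenv("ALERT_EMAIL", "you@example.com"),
        subject="Airflow SMTP smoke test",
        html_content="<p>SMTP connection works.</p>",
    )
```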
- `make env` creates a local venv for dbt, loads `airflow/.env`, and runs a quick check
- Useful targets:
  - `make dbt-debug` / `make dbt-parse` / `make dbt-ls`
  - `make dbt-run-bronze` / `make dbt-run-silver` / `make dbt-run-gold`
  - `make dbt-build` (full build + tests)
  - `make dbt-docs` (generate + serve docs locally)
- `make help`: list available commands
- `make ps`: container status
- `make logs`: follow webserver + scheduler logs
- `make health`: web/scheduler health check
- `make down`: stop containers (keep volumes)
- `make destroy`: stop and delete volumes (dangerous)
- All dbt tasks use the Pool `dbt` (size 1) to serialize CLI runs
- DAGs use `max_active_runs=1` and 1 retry by default
- Keep deps consistent with `dbt deps`; do not delete `target/` or `dbt_packages/` in tasks
- Web health: `curl -fsS http://localhost:8080/health`; if it fails, restart Docker and run `make up`
- No Snowflake creds: dbt tasks are ShortCircuited to avoid noisy failures (see the sketch after this list)
- Red dots in the UI: `make clear-failed` or `make clear-failed-hard`
- GE provider missing: it is installed via `airflow/requirements.txt`
- dbt not found: the container PATH includes `~/.local/bin`; for local dev, run `make env`
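The ShortCircuit guard mentioned above could look roughly like this; it is a sketch that assumes the guard simply checks for the Snowflake env vars, and the repo's actual check may differ.

```python
# Illustrative sketch: skip dbt tasks gracefully when Snowflake credentials are absent.
import os
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.operators.bash import BashOperator

REQUIRED = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")


@task.short_circuit
def snowflake_creds_present() -> bool:
    """Return False to skip downstream dbt tasks when any credential is missing."""
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        print(f"Skipping dbt tasks; missing env vars: {missing}")
    return not missing


with DAG("creds_guard_sketch", start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False):
    snowflake_creds_present() >> BashOperator(task_id="dbt_run",
                                              bash_command="dbt run", pool="dbt")
```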
- `airflow/.env` is gitignored; do not commit real credentials
- For production, bake dependencies into images and use a secret manager (Vault/KMS/Secrets Manager)
MIT — see LICENSE at the repo root.



