Where is laundered, sanctions-busting gold hiding in global trade flows — and can a model spot it from public data alone?
Shadow Gold combines UN Comtrade bilateral trade, World Gold Council mine production, IMF reserves, FATF AML ratings, LBMA refinery disclosures, and GDELT news into a single country-year panel, then trains an XGBoost classifier plus a FinBERT narrative layer to estimate each country's exposure to the "shadow gold chain" — the route that moves artisanal, conflict, or sanctions-busting gold through middleman hubs (Dubai, Turkey, Hong Kong, Switzerland, Singapore) before it enters official reserves.
The core thesis: the recycled-gold HS code loophole lets a country "export" more recycled gold in a year than any plausible domestic base could produce. Combined with routing patterns, sanctions exposure, and narrative signals from news, that impossibility is a laundering fingerprint.
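The impossibility test above can be sketched in a few lines. This is a minimal illustration, not the project's actual Phase 2 code: the `CountryYear` record, the `domestic_base_t` field, the `floor_t` guard, and the threshold of 1.0 are all assumptions made for the example, and the figures are invented.

```python
from dataclasses import dataclass


@dataclass
class CountryYear:
    iso3: str
    year: int
    recycled_export_t: float  # declared recycled-gold exports, tonnes
    domestic_base_t: float    # assumed upper bound on domestic supply, tonnes


def impossibility_ratio(row: CountryYear, floor_t: float = 0.1) -> float:
    """Recycled exports divided by the plausible domestic base.

    floor_t is a hypothetical guard so a near-zero base does not
    produce a division by zero or an unbounded ratio.
    """
    return row.recycled_export_t / max(row.domestic_base_t, floor_t)


def years_impossible(rows: list[CountryYear], threshold: float = 1.0) -> dict[str, int]:
    """Count, per country, the years whose ratio exceeds the threshold."""
    counts: dict[str, int] = {}
    for row in rows:
        counts.setdefault(row.iso3, 0)
        if impossibility_ratio(row) > threshold:
            counts[row.iso3] += 1
    return counts


# Invented figures for illustration only; not taken from the project data.
panel = [
    CountryYear("AAA", 2020, recycled_export_t=50.0, domestic_base_t=10.0),
    CountryYear("AAA", 2021, recycled_export_t=8.0, domestic_base_t=10.0),
    CountryYear("BBB", 2020, recycled_export_t=2.0, domestic_base_t=40.0),
]
print(years_impossible(panel))
```

A ratio above 1.0 means declared recycled exports exceed the assumed base for that year. How the base is actually built from mine output and historical stocks is a modelling choice this sketch leaves open.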
| # | Country | Shadow-chain score (v3) | ML probability | Quant signals | Narrative risk |
|---|---|---|---|---|---|
| 1 | United Arab Emirates | 67.6 | 0.89 | 3 | 100.0 |
| 2 | Tanzania | 35.3 | 0.82 | 3 | 27.8 |
| 3 | Russia | 32.7 | 0.45 | 4 | 39.1 |
| 4 | Turkey | 32.6 | 0.86 | 3 | 36.0 |
| 5 | India | 32.0 | 0.83 | 3 | 7.7 |
The top 5 is almost entirely hub-and-spoke: UAE and Turkey are the transit hubs, while Tanzania, Russia, and India are major origins that either export far more recycled gold than their domestic base allows or route a suspicious share of exports through known middleman countries.
The FinBERT narrative layer independently surfaced the exact routes economists write about: Mali → UAE, DRC → UAE, Sudan → UAE, Uganda → UAE, Russia → UAE / Russia → Turkey, Myanmar → Thailand. That the narrative and the quantitative model converge on the same edges without being jointly trained is the strongest evidence the signal is real.
Model scorecard — 5-fold stratified CV on 190 countries, 79 features, 21% positive rate:
| Metric | CV mean | Out-of-fold |
|---|---|---|
| ROC-AUC | 0.865 ± 0.045 | 0.811 |
| PR-AUC | 0.645 ± 0.103 | 0.544 |
| F1 | 0.557 ± 0.039 | 0.556 |
| Precision | 0.525 | 0.500 |
| Recall | 0.625 | 0.625 |
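The two columns measure different things: the CV mean averages a metric computed inside each fold, while the out-of-fold figure pools every held-out prediction and scores them once. The sketch below shows, on invented numbers, how a pooled ROC-AUC can sit below the per-fold mean when folds produce probabilities on different scales. It is an illustration of the mechanism, not a claim about why this project's numbers differ; `roc_auc` and the toy folds are assumptions for the example.

```python
from statistics import mean


def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Pairwise form, fine for small samples.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out predictions from two folds: (label, probability).
folds = [
    [(1, 0.9), (0, 0.8), (1, 0.85), (0, 0.7)],  # this fold's scores sit high
    [(1, 0.3), (0, 0.2), (1, 0.25), (0, 0.1)],  # this fold's scores sit low
]

# Metric computed inside each fold, then averaged.
per_fold = [roc_auc([y for y, _ in f], [s for _, s in f]) for f in folds]

# Metric computed once over all held-out predictions pooled together.
pooled = [pair for f in folds for pair in f]
oof = roc_auc([y for y, _ in pooled], [s for _, s in pooled])

print("per-fold AUC:", per_fold)
print("CV mean:", mean(per_fold))
print("out-of-fold AUC:", oof)
```

Each fold ranks its own countries perfectly, yet the pooled score drops because negatives from the first fold outrank positives from the second. The pooled figure is the stricter one when a single threshold is applied across all countries.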
All five verification checks pass: no leakage, no NaN/Inf in probabilities, OOF ROC-AUC ≥ 0.75, every known hub (ARE 0.89, TUR 0.86, CHE 0.87, SGP 0.89, HKG 0.74) scores above the calibration floor, and SHAP coverage hits all five signal families (recycled, sanctions, routing, AML, de-dollarization).
Top model drivers (global mean-|SHAP|):
- gold_tonnes: size of declared reserves
- years_impossible_to_date: years in which recycled exports exceeded the possible domestic base
- hub_export_t: tonnes routed through Dubai / Turkey
- recycled_impossibility_ratio: the core laundering fingerprint
- fatf_status_grey: AML regime weakness
- dedollarization_intensity: crossover signal from the companion project
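For readers unfamiliar with the ranking metric, global mean-|SHAP| is the average absolute SHAP value of a feature across all rows. A minimal sketch follows, assuming the SHAP values are already available as a rows-by-features matrix; `mean_abs_shap` is a hypothetical helper and the values are invented, not the project's Phase 4 output.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across all rows."""
    n_rows = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)  # absolute value: sign must not cancel
    means = [total / n_rows for total in totals]
    return sorted(zip(feature_names, means), key=lambda kv: kv[1], reverse=True)


# Invented SHAP values: one row per country, one column per feature.
features = ["gold_tonnes", "hub_export_t", "fatf_status_grey"]
shap_rows = [
    [0.5, -0.25, 0.0],
    [-1.0, 0.25, 0.125],
    [0.0, -0.25, 0.125],
]

ranked = mean_abs_shap(shap_rows, features)
for name, importance in ranked:
    print(f"{name:20s} {importance:.3f}")
```

Taking the absolute value first matters: in this toy matrix the signed mean for `gold_tonnes` is negative, yet it moves predictions more than either other feature.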
flowchart LR
subgraph Sources
A1[UN Comtrade<br/>HS 7108 / 7112]
A2[World Gold Council]
A3[IMF reserves]
A4[FATF AML]
A5[LBMA refineries]
A6[GDELT news]
A7[Project 1<br/>sanctions + UN votes]
end
Sources --> P1[Phase 1<br/>clean + PostgreSQL load]
P1 --> P2[Phase 2<br/>recycled impossibility<br/>hub routing<br/>composite v2]
P2 --> P3[Phase 3<br/>36 features<br/>5 signal families]
P3 --> P4[Phase 4<br/>XGBoost + SHAP<br/>CV ROC-AUC 0.865]
A6 --> P5[Phase 5<br/>FinBERT narrative<br/>route extraction]
P4 --> V3[shadow_chain_score_v3]
P5 --> V3
V3 --> P6[Phase 6<br/>Streamlit dashboard]
Five signal families feed the model, each a distinct theory of how shadow-chain gold looks in public data:
- Recycled impossibility — recycled-gold exports exceed what a country's mines and historical stocks could plausibly yield.
- Sanctions exposure — active OFAC/EU/UN sanctions, persistence over time, intensity.
- Hub routing — share of gold exports that land in Dubai / Turkey before the final buyer. Double-hop paths flagged separately.
- AML weakness — FATF grey/black-list status.
- De-dollarization intensity — from the companion Project 1: countries aggressively swapping USD reserves for gold and voting against the US bloc at the UN.
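As one concrete example of these families, the hub-routing signal can be approximated from bilateral flows alone. The sketch below is an assumption-laden illustration: `hub_share`, `double_hops`, the flow tuples, and the partner codes are hypothetical, and a double hop is inferred here simply from the existence of an origin → hub flow and a hub → hub flow, since aggregate trade data cannot trace an individual shipment.

```python
HUBS = {"ARE", "TUR"}  # Dubai (UAE) and Turkey, per the hub-routing bullet


def hub_share(flows, exporter):
    """Share of an exporter's gold tonnage whose first destination is a hub.

    flows is a list of (exporter, partner, tonnes) tuples.
    """
    total = sum(t for e, _, t in flows if e == exporter)
    to_hub = sum(t for e, p, t in flows if e == exporter and p in HUBS)
    return to_hub / total if total else 0.0


def double_hops(flows, exporter):
    """Hub-to-hub legs that continue from a hub the exporter ships to."""
    first_hubs = {p for e, p, _ in flows if e == exporter and p in HUBS}
    return sorted(
        {(e, p) for e, p, _ in flows if e in first_hubs and p in HUBS and p != e}
    )


# Invented flows for illustration; XXX, YYY, ZZZ are placeholder codes.
flows = [
    ("XXX", "ARE", 30.0),
    ("XXX", "ZZZ", 10.0),
    ("ARE", "TUR", 5.0),
    ("YYY", "ZZZ", 20.0),
]

print(hub_share(flows, "XXX"))    # share of XXX exports landing in a hub
print(double_hops(flows, "XXX"))  # candidate origin -> hub -> hub paths
print(hub_share(flows, "YYY"))
```

Keeping the double-hop flag separate from the share, as the bullet describes, avoids a single large direct flow masking a smaller but more suspicious relay path.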
Shadow_Gold/
├── raw_data/ # unmodified source pulls (Comtrade, WGC, FATF, LBMA, GDELT)
├── cleaned_data/ # Phase 1-5 outputs (CSV) - safe to commit, ~4.5 MB
│ ├── phase2/ # anomaly detection + composite v2
│ ├── phase3/ # feature matrix + diagnostics
│ ├── phase4/ # predictions + SHAP
│ └── phase5/ # FinBERT narrative + route mining
├── scripts/ # one numbered script per step
│ ├── 01..09_*.py # Phase 1 - clean + load
│ ├── phase2/ # Phase 2 - EDA + composite scoring
│ ├── phase3/ # Phase 3 - feature engineering
│ ├── phase4/ # Phase 4 - XGBoost + SHAP
│ ├── phase5/ # Phase 5 - FinBERT narrative
│ └── phase6/dashboard/ # Streamlit app
├── models/ # xgb_shadow_chain.json
├── docker/ # Dockerfiles + pipeline orchestration
├── docker-compose.yml # postgres + pipeline + dashboard
├── Makefile # common shortcuts
├── requirements.txt # full pipeline deps
├── .env.example # docker-compose env template
└── Shadow_Gold_Phase*.docx # per-phase writeups
Requirements: Docker Desktop / Docker Engine with Compose v2.
git clone https://github.com/sathwikarr/Shadow_Gold.git
cd Shadow_Gold
cp .env.example .env # edit if you want to change the DB password
# Build everything and start Postgres + dashboard; the pipeline runs once then exits.
docker compose up --build
# Dashboard: http://localhost:8501

The pipeline service executes Phase 1 → Phase 5 against the Postgres service, populating all 12 tables and writing Phase 2-5 outputs back to cleaned_data/ (bind-mounted). The dashboard service then serves the Streamlit UI off Postgres, with a CSV fallback if Postgres is disabled.
Makefile shortcuts:
make up # build + start the full stack in the background
make pipeline # run the pipeline once, stream logs, exit on completion
make dashboard # only postgres + dashboard (skip pipeline, use committed CSVs)
make psql # psql shell against the DB
make logs # tail all service logs
make down # stop everything (keeps the postgres volume)
make nuke # stop + drop containers + delete pgdata volume

To run only selected phases (here Phases 1 and 2):
PHASES=1,2 docker compose up --build pipeline

If you just want the UI off the committed CSVs:
POSTGRES_ENABLED=false make dashboard

To run without Docker, the steps below assume PostgreSQL 14+ running on localhost:5432 with a writable shadow_gold database.
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Phase 1 - clean + load
python scripts/01_clean_comtrade.py
python scripts/02_clean_bilateral.py
# ... through 07
python scripts/08_setup_database.py
python scripts/09_load_to_postgres.py
# Phase 2-5 scripts run in numbered order from each scripts/phaseN/ folder.
# Phase 6 dashboard:
cd scripts/phase6/dashboard && streamlit run app.py

The DB scripts respect POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD. Defaults match a typical macOS Homebrew Postgres install.
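A minimal sketch of how a script might read those variables is shown below. The function name `postgres_settings` is hypothetical, and the fallback values are assumptions: `localhost`, `5432`, and `shadow_gold` follow the setup described above, while the default user and empty password are placeholders rather than the project's actual defaults.

```python
import os


def postgres_settings(env=None):
    """Build connection settings from environment variables with fallbacks.

    Pass a dict as env to override os.environ (useful in tests).
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": int(env.get("POSTGRES_PORT", "5432")),
        "dbname": env.get("POSTGRES_DB", "shadow_gold"),
        "user": env.get("POSTGRES_USER", "postgres"),    # placeholder default
        "password": env.get("POSTGRES_PASSWORD", ""),    # placeholder default
    }


settings = postgres_settings()
# Avoid echoing the password when logging the resolved settings.
print({key: value for key, value in settings.items() if key != "password"})
```

To point a run at a different instance, export the variables before invoking a script, for example `POSTGRES_HOST=db.internal POSTGRES_PORT=6543 python scripts/09_load_to_postgres.py` (the host name here is illustrative).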
| Source | What it gives | Notes |
|---|---|---|
| UN Comtrade | Bilateral HS-7108 / HS-7112 trade by reporter, partner, year | Unit values reveal under-invoicing |
| World Gold Council | Mine production by country-year | Upper bound on plausible domestic base |
| IMF IFS | Official reserve changes in tonnes | Cross-check for unexplained accumulation |
| FATF | Grey-list / black-list status | AML regime proxy |
| LBMA | Country-of-origin disclosures | Refinery transparency layer |
| GDELT | Global news article index, 2015-present | FinBERT narrative signal |
| Project 1 (companion) | OFAC sanctions, UN voting alignment, reserve composition | De-dollarization features |
Licenses for each are upstream; consult the source before redistributing the raw pulls.
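The Comtrade note about unit values can be made concrete: divide declared trade value by net weight and compare the result with a reference price. The sketch below is illustrative only. The record fields, the `tolerance` of 0.5, and the benchmark figure are assumptions for the example; the benchmark in particular is a placeholder, not an actual gold price.

```python
def unit_value_usd_per_kg(trade_value_usd, net_weight_kg):
    """Declared value per kilogram, or None when weight is missing or zero."""
    if not net_weight_kg or net_weight_kg <= 0:
        return None
    return trade_value_usd / net_weight_kg


def flag_under_invoiced(records, benchmark_usd_per_kg, tolerance=0.5):
    """Return flows whose unit value falls below tolerance * benchmark."""
    flagged = []
    for rec in records:
        unit_value = unit_value_usd_per_kg(rec["trade_value_usd"], rec["net_weight_kg"])
        if unit_value is None:
            continue  # cannot judge a flow with no usable weight
        if unit_value < tolerance * benchmark_usd_per_kg[rec["year"]]:
            flagged.append((rec["reporter"], rec["partner"], rec["year"], unit_value))
    return flagged


# Placeholder benchmark and invented flows; XXX, YYY, ZZZ are not real codes.
benchmark = {2020: 60000.0}
records = [
    {"reporter": "XXX", "partner": "YYY", "year": 2020,
     "trade_value_usd": 1_000_000.0, "net_weight_kg": 50.0},
    {"reporter": "ZZZ", "partner": "YYY", "year": 2020,
     "trade_value_usd": 5_800_000.0, "net_weight_kg": 100.0},
    {"reporter": "ZZZ", "partner": "XXX", "year": 2020,
     "trade_value_usd": 10_000.0, "net_weight_kg": 0.0},
]

flagged = flag_under_invoiced(records, benchmark)
print(flagged)
```

A low unit value is only a prompt for review: purity differences between doré, scrap, and refined bars can produce legitimate gaps from the benchmark.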
This is an open-source research project using public data. It is not an allegation against any country. A high score means the public-data pattern is consistent with the shadow-chain hypothesis; it is not evidence of wrongdoing. In particular:
- GDELT synthetic expansion. Phase 5 expands a 136-row GDELT seed with a synthetic corpus that mimics realistic headline phrasing for the shadow-chain narratives. The expanded corpus and the route-mining output are reproducible from the scripts; downstream metrics are deterministic. A production deployment should swap this for a real GDELT DOC-API pull.
- FinBERT sandbox fallback. 33_finbert_score.py tries to load ProsusAI/finbert from Hugging Face and falls back to a deterministic financial-sentiment lexicon if the model download is blocked. Both paths emit the same output schema; dashboard results reproduce either way.
- Class imbalance. 40 positives in 190 countries. ROC-AUC is strong but PR-AUC (0.645) is the fairer read. Precision at the default threshold is ~0.5.
- Known-hub calibration. The positive labels include Dubai / Turkey / Switzerland / Singapore / Hong Kong explicitly, so the model is expected to score those high. The real test is which non-hub countries rise near the top; Tanzania, Russia, India, Zimbabwe, and Nigeria all did.
- No panel cross-fitting. Scores are cross-validated across countries, not across years-within-country; adding a time split is future work.
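The FinBERT fallback described in the caveats above follows a common try-then-degrade pattern, sketched here. This is not the contents of 33_finbert_score.py: the word lists, `lexicon_score`, and `load_scorer` are hypothetical, and the only real API used is the Hugging Face `transformers.pipeline` call.

```python
# Tiny illustrative lexicons; a real fallback would use a much larger list.
POSITIVE = {"gain", "growth", "approve", "cleared"}
NEGATIVE = {"smuggling", "sanctions", "laundering", "illicit", "seized"}


def lexicon_score(text):
    """Deterministic fallback: label a headline by counting lexicon hits."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos == neg:
        return {"label": "neutral", "score": 0.0}
    label = "positive" if pos > neg else "negative"
    return {"label": label, "score": abs(pos - neg) / (pos + neg)}


def load_scorer(model_name="ProsusAI/finbert"):
    """Return a text -> {"label", "score"} callable, preferring FinBERT."""
    try:
        from transformers import pipeline  # optional dependency
        clf = pipeline("text-classification", model=model_name)
    except Exception:
        # Missing package, blocked download, or no network: use the lexicon.
        return lexicon_score
    return lambda text: clf(text)[0]  # first (only) result for a single string


# The fallback path can be exercised without any download.
print(lexicon_score("Gold smuggling ring seized at border"))
```

Calling `scorer = load_scorer()` once at startup and then `scorer(headline)` keeps downstream code identical on both paths. Note that the two `score` values are not on the same scale (model confidence versus lexicon hit balance), so they should not be mixed within one run.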
Shadow Gold is the second half of a two-project body of work. Project 1 built the de-dollarization tracker — sanctions exposure, UN voting blocs, reserve diversification away from USD into gold. Several of those features (dedollarization_intensity, un_alignment_score, wgc_gold_pct_reserves) appear directly in Shadow Gold's feature matrix, and the dashboard links the two scores so you can see which de-dollarizers are also suspected shadow-chain participants.
MIT. See LICENSE.
