End-to-end tooling for creating differentially private tabular datasets with PrivSyn and AIM.
- Upload & configure: pick a synthesis method, dataset name, privacy budget, and either upload your CSV or load the bundled `adult` sample.
- Confirm metadata: skim the summary card, then fine-tune per-column settings (categorical domains, numeric bounds, binning choices).
- Review results: download the synthesized CSV, preview the first rows, and inspect metadata-aware evaluation metrics.
These screenshots were generated with `python scripts/capture_ui_screenshots.py`. The script drives a Playwright browser through the sample flow and can be rerun whenever the UI changes.
For a deeper walkthrough of each step (UI wiring, API payloads, evaluation metrics), check out docs/frontend.md and docs/backend.md.
- Unified web app. Upload a dataset, review inferred metadata, tweak categorical/numerical encodings, and download synthesized data in minutes.
- Two synthesis engines. PrivSyn (rho-CDP, iterative marginal updates) and AIM (adaptive measurement selection) exposed behind the same API.
- Notebook-friendly modules. Reusable preprocessing (PrivTree, DAWA), marginal selection, and synthesis utilities under `method/synthesis/privsyn` and `method/preprocess_common`.
- Coverage-first test suite. 100+ pytest cases plus Playwright E2E flows keep the UI/back-end contract in check.
- MkDocs documentation. Browse the Markdown guides with `mkdocs serve` (install via `pip install mkdocs`) and open http://127.0.0.1:8000/ for a structured site.
- Built on published research. The core algorithms follow PrivSyn: Differentially Private Data Synthesis (USENIX Security 2021) and The AIM Mechanism for Differentially Private Synthetic Data (PVLDB 2022).
git clone https://github.com/vvv214/privsyn-tabular.git
cd privsyn-tabular
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
python3 -m pip install -r requirements.txt
uvicorn web_app.main:app --reload --port 8001
The API is now live at http://127.0.0.1:8001. Docs: http://127.0.0.1:8001/docs.
cd frontend
npm install
npm run dev -- --port 5174
Visit http://127.0.0.1:5174. The frontend defaults to http://127.0.0.1:8001 unless `VITE_API_BASE_URL` is set.
- Launch both servers.
- Open the app and click Load Sample (loads `adult.csv.zip`).
- Confirm metadata (tweak domains if desired) and click Confirm & Synthesize.
- Download the resulting CSV or explore the preview table.
- Want a quick preview without running anything? Check `docs/sample_output/` for a tiny synthetic CSV and matching metrics JSON.
Note: The backend now treats every column in the uploaded table as part of the feature space—no target column is required or stripped automatically.
We keep the screenshots in docs/media/ up to date via Playwright:
python3 scripts/capture_ui_screenshots.py
The script starts both dev servers (or reuses them if already running), walks through the sample adult dataset, and writes:
- `docs/media/ui_form.png`
- `docs/media/ui_metadata_overview.png`
- `docs/media/ui_metadata_column.png`
- `docs/media/ui_results.png`

Make sure the Playwright browsers have been installed once with `playwright install`, and that no other process is already bound to ports `8001`/`5174`.
All diagrams live in docs/media/; replace or expand them to match your deployment.
Shows how the user-facing frontend interacts with the FastAPI backend: upload → metadata inference → confirmation → synthesis → download.
Highlights the preprocessing stage (metadata normalisation), the PrivSyn core (marginal selection + GUM), and post-processing steps (storage/evaluation).
Summarises AIM’s adaptive measurement loop: initialise workload → iteratively measure queries with DP noise → update the model → generate synthetic data.
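The "measure queries with DP noise" step can be illustrated in isolation. As a minimal sketch (not the repository's implementation), the snippet below measures one marginal with the Gaussian mechanism under rho-zCDP, using the standard calibration sigma = sensitivity / sqrt(2 * rho). The function names, the list-of-dicts row format, and the assumed L2 sensitivity of 1 (adding or removing one record) are illustrative choices, not taken from the codebase.

```python
import itertools
import math
import random
from collections import Counter


def gaussian_sigma(rho, sensitivity=1.0):
    """Noise scale that makes the Gaussian mechanism satisfy rho-zCDP."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return sensitivity / math.sqrt(2.0 * rho)


def noisy_marginal(rows, columns, domains, rho, rng):
    """Return {cell: noisy count} for the marginal over `columns`.

    rows    -- list of dicts (hypothetical in-memory table format)
    domains -- {column: list of allowed values}, assumed to be public
    """
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    sigma = gaussian_sigma(rho)
    cells = itertools.product(*(domains[c] for c in columns))
    # Every cell of the public domain receives noise, including empty ones.
    return {cell: counts.get(cell, 0) + rng.gauss(0.0, sigma) for cell in cells}


rows = [
    {"sex": "F", "income": "<=50K"},
    {"sex": "M", "income": ">50K"},
    {"sex": "F", "income": ">50K"},
]
domains = {"sex": ["F", "M"], "income": ["<=50K", ">50K"]}
marginal = noisy_marginal(
    rows, ["sex", "income"], domains, rho=0.1, rng=random.Random(0)
)
for cell, value in sorted(marginal.items()):
    print(cell, round(value, 2))
```

A smaller rho means a larger sigma and noisier counts. An iterative method has to divide its total budget across all the measurements it makes.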
Illustrates the main request/response boundaries between frontend, backend, synthesis engines, and temporary storage.
- `python3 -m pip install -r requirements.txt pytest-cov` – Python dependencies plus the coverage plugin used by CI.
- `npm install --prefix frontend` – React/Vite dependencies for component tests.
- `python3 -m playwright install` – one-time browser download for Playwright end-to-end flows.

- `pytest -q` – run the full Python suite with terse output.
- `pytest -q -m "not slow"` – skip expensive markers while iterating.
- `pytest --cov=. --cov-report=term` – collect coverage locally (requires `pytest-cov`).
- `pytest -q -W error::DeprecationWarning -W error::FutureWarning -k "web_app or method/preprocess_common or method/synthesis/privsyn"` – enforce warning hygiene on core modules.
# Focus on metadata / preprocessing helpers
pytest -q test/test_metadata_overrides.py test/test_preprocessing.py test/test_data_inference.py
# Exercise the PrivSyn API contract only
pytest -q -k privsyn
# Component tests (Vitest)
cd frontend
npm test -- --run
# Browser E2E flows (requires backend + frontend running)
E2E=1 pytest -q -k e2e
Tip: set `PLAYWRIGHT_HEADLESS=0` if you want to watch the E2E browser session while debugging.
- Local preview: `pip install mkdocs && mkdocs serve` exposes the docs at http://127.0.0.1:8000/ with search + navigation.
- CI build: `.github/workflows/deploy-docs.yml` publishes MkDocs output to the `gh-pages` branch on every push to `main`.
- Vercel publish: point a secondary Vercel project at `gh-pages` (output dir `.`) and update the `vercel.json` rewrites to proxy `/docs/*` to that URL.
- Optional hook: store `VERCEL_DOCS_DEPLOY_HOOK_URL` as a secret so the GitHub Action can trigger a redeploy after pushing docs.
- Diagrams live under `docs/media/` (`sequence.svg`, `privsyn.svg`, `aim.svg`, `flow.svg`) and drive the architecture section above; replace them with your own exports as needed.
| Path | Description |
|---|---|
| `frontend/` | React + Vite SPA (upload flow, metadata editors, results view). |
| `web_app/` | FastAPI backend: metadata inference, synthesis orchestration, evaluation endpoints. |
| `method/api/` | Unified interface (`SynthRegistry`, `PrivacySpec`, `RunConfig`) that normalises every synthesis engine. |
| `method/synthesis/privsyn/` | PrivSyn implementation (marginal selection, GUM synthesis, helpers). |
| `method/synthesis/AIM/` | AIM adapter and reference engine wired into the shared registry. |
| `method/preprocess_common/` | Discretisers (PrivTree, DAWA) and preprocessing pipelines shared across methods. |
| `test/` | Pytest suite; `test/e2e/` houses Playwright browser flows. |
| `sample_data/` | Fixture datasets for local trials (`adult.csv.zip`, etc.). |
| `scripts/` | Automation helpers for booting servers, benchmarks, and screenshot capture. |
- `web_app/data_inference.py` infers column metadata and returns draft `domain.json`/`info.json` payloads to the UI.
- `web_app/synthesis_service.py` rebuilds tabular data, routes execution to the selected engine, and persists session artefacts.
- `web_app/data_comparison.py` computes evaluation metrics and surfaces them via `/evaluate` and the results screen.
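To make the inference step concrete, here is a rough sketch of how a column might be classified as numerical or categorical and given a draft domain. It is not the logic in `web_app/data_inference.py`; the function name `infer_column`, the `max_categories` threshold, and the output keys are assumptions made for illustration.

```python
def infer_column(values, max_categories=20):
    """Draft metadata for one column given its raw string values.

    Hypothetical rule: a column is numerical if every non-empty value
    parses as a float and it has more than `max_categories` distinct values.
    """
    present = [v for v in values if v not in ("", None)]
    distinct = sorted(set(present))
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        numbers = None  # at least one value is not numeric
    if numbers and len(distinct) > max_categories:
        return {"type": "numerical", "min": min(numbers), "max": max(numbers)}
    return {"type": "categorical", "categories": distinct}


table = {
    "age": [str(a) for a in range(17, 91)],
    "sex": ["Female", "Male", "Female"],
}
draft = {name: infer_column(col) for name, col in table.items()}
print(draft["age"])
print(draft["sex"])
```

Bounds read directly from the data are data-dependent, so they are worth reviewing in the confirmation step before synthesis.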
- PrivSyn lives in `method/synthesis/privsyn/privsyn.py`, combining marginal selection with the GUM generator.
- AIM’s adapter (`method/synthesis/AIM/adapter.py`) maps the unified `prepare`/`run` hooks onto the original workflow so the backend treats both engines identically.
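The adapter idea can be sketched with a toy registry: each engine exposes the same `prepare`/`run` pair, so the caller never branches on the method name. Everything below (`ToyRegistry`, `ToyEngine`, `ToyPrivacySpec` and its fields) is hypothetical and far simpler than the real `SynthRegistry`, `PrivacySpec`, and `RunConfig` in `method/api/`.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToyPrivacySpec:
    epsilon: float
    delta: float


@dataclass
class ToyEngine:
    # prepare(rows, privacy) -> fitted state; run(state, n) -> synthetic rows
    prepare: Callable
    run: Callable


class ToyRegistry:
    def __init__(self):
        self._engines: Dict[str, ToyEngine] = {}

    def register(self, name, engine):
        self._engines[name.lower()] = engine

    def synthesize(self, name, rows, privacy, n_samples):
        engine = self._engines[name.lower()]  # KeyError for unknown methods
        state = engine.prepare(rows, privacy)
        return engine.run(state, n_samples)


# Stand-in engine: remembers the column names and emits placeholder rows.
def _prepare(rows, privacy):
    return {"columns": list(rows[0]), "epsilon": privacy.epsilon}


def _run(state, n_samples):
    return [{c: None for c in state["columns"]} for _ in range(n_samples)]


registry = ToyRegistry()
registry.register("privsyn", ToyEngine(_prepare, _run))
registry.register("aim", ToyEngine(_prepare, _run))

out = registry.synthesize(
    "AIM", [{"age": 39, "sex": "Male"}], ToyPrivacySpec(1.0, 1e-5), n_samples=2
)
print(out)
```

Because both names resolve to objects with the same two hooks, adding a third engine in this sketch only requires another `register` call.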
- PrivTree (`method/preprocess_common/privtree.py`) – noisy hierarchical binning with inverse transforms.
- DAWA (`method/preprocess_common/dawa.py`) – adaptive workload partitioning used within AIM.
- Deterministic tests (`test/test_privtree.py`, `test/test_dawa.py`) ensure these helpers remain stable.
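Neither PrivTree nor DAWA is reproduced here. As a much simpler stand-in, the sketch below shows what "binning with an inverse transform" means for a numeric column, using fixed equal-width bins over user-confirmed bounds. The class name `EqualWidthBinner` and the choice to decode each bin to its midpoint are illustrative assumptions.

```python
class EqualWidthBinner:
    """Map numbers in [low, high] to bin indices and back to bin midpoints."""

    def __init__(self, low, high, n_bins):
        if high <= low or n_bins < 1:
            raise ValueError("need low < high and n_bins >= 1")
        self.low, self.high, self.n_bins = low, high, n_bins
        self.width = (high - low) / n_bins

    def transform(self, x):
        # Clip to the declared bounds so out-of-range values stay in the domain.
        x = min(max(x, self.low), self.high)
        # The upper bound itself falls into the last bin.
        return min(int((x - self.low) / self.width), self.n_bins - 1)

    def inverse_transform(self, index):
        return self.low + (index + 0.5) * self.width


binner = EqualWidthBinner(low=0, high=100, n_bins=10)
codes = [binner.transform(v) for v in (3, 47.5, 100, 250)]
decoded = [binner.inverse_transform(c) for c in codes]
print(codes)    # [0, 4, 9, 9]
print(decoded)  # [5.0, 45.0, 95.0, 95.0]
```

The round trip is lossy by design: synthesis works on the bin codes, and decoding only recovers a representative value per bin.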
- `./scripts/start_backend.sh` and `./scripts/start_frontend.sh` wrap the dev commands; pass `prod` to emit gunicorn builds and copy static assets.
- Cloud Run / container deploys: build via Docker, configure `VITE_API_BASE_URL` for the frontend, and extend `allow_origins` in `web_app/main.py` for any new domains.
- Free Cloud Run tiers cap CPU, memory, and request time, so large datasets or AIM runs can hit those limits. If you expect heavier jobs, download the repo and run the backend locally instead.
- CORS extras: set `ADDITIONAL_CORS_ORIGINS` (comma-separated) to whitelist preview/prod frontends without code changes.
- Temp storage: override `PRIVSYN_DATA_ROOT` / `PRIVSYN_EXP_ROOT` if you need deterministic artefact paths (CI caches, shared volumes, etc.).
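As a rough illustration of how such settings could be consumed, the helpers below split a comma-separated origin list and resolve a storage root from an environment mapping. They are not copied from `web_app/main.py`; the function names, the default origin, and the temp-directory fallback are assumptions.

```python
import tempfile
from pathlib import Path


def cors_origins(env, defaults=("http://127.0.0.1:5174",)):
    """Default origins plus any extras from ADDITIONAL_CORS_ORIGINS."""
    extras = env.get("ADDITIONAL_CORS_ORIGINS", "")
    origins = list(defaults)
    for item in extras.split(","):
        item = item.strip()
        # Skip blanks (e.g. trailing commas) and duplicates.
        if item and item not in origins:
            origins.append(item)
    return origins


def data_root(env):
    """Use PRIVSYN_DATA_ROOT when set, else a folder under the system temp dir."""
    fallback = Path(tempfile.gettempdir()) / "privsyn_data"
    return Path(env.get("PRIVSYN_DATA_ROOT") or fallback)


example_env = {
    "ADDITIONAL_CORS_ORIGINS": "https://preview.example.com, https://app.example.com",
    "PRIVSYN_DATA_ROOT": "/mnt/shared/privsyn",
}
print(cors_origins(example_env))
print(data_root(example_env))
```

In a real process you would pass `os.environ` instead of `example_env`. Taking the mapping as a parameter keeps the helpers easy to test.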
- Fork & branch (`git checkout -b feat/awesome-improvement`).
- Keep diffs focused, add/update tests, and run `pytest -q` before pushing.
- Follow Conventional Commits (`feat:`, `fix:`, `chore(scope):`, etc.).
- Attach screenshots or GIFs for UI-facing changes (store assets under `docs/media/`).
PrivSyn is released under the MIT License.
Need help? Open an issue or ping us on GitHub Discussions – we love hearing about new differential privacy use-cases!


