Small Python prototype for running an LLM workflow with:
- pre-call input validation,
- post-call output validation,
- retry / escalate decisions,
- persistence hooks for runs, calls, and escalations.
The current demo path is manual at the LLM boundary: you paste an LLM response into the CLI while the framework persists workflow state and decides whether to complete, retry, or escalate.
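The complete / retry / escalate choice can be pictured with a minimal sketch. The names here (`Decision`, `RetryPolicy`, `decide`) and the default of three attempts are illustrative assumptions, not the actual API in src/engine/decision_engine.py:

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    COMPLETE = "complete"
    RETRY = "retry"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical default; the real policy may use a different limit.
    max_attempts: int = 3


def decide(output_is_valid: bool, attempt: int, policy: RetryPolicy) -> Decision:
    """Pick the next step after validating attempt number `attempt` (1-based)."""
    if output_is_valid:
        return Decision.COMPLETE
    if attempt < policy.max_attempts:
        # Attempts remain, so the phase can be tried again.
        return Decision.RETRY
    # Attempts are exhausted; hand the run over to a human.
    return Decision.ESCALATE
```

Under these assumptions, an invalid response on the first attempt yields a retry, and an invalid response on the final attempt yields an escalation.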
This framework is also consumed by security-ai-eval-lab as a library via its PhaseExecutorAdapter. Both projects share a Postgres instance; ai-reliability-fw owns the reliability.* tables and security-ai-eval-lab stores evaluation tables with UUID cross-references back to workflow_runs, prompts, and llm_calls.
- src/core/models.py - SQLAlchemy models and enums
- src/db/session.py - async DB session setup
- src/db/repository.py - persistence helpers
- src/engine/decision_engine.py - retry / escalation policy logic
- src/engine/phase_executor.py - end-to-end phase orchestration
- src/validators/ - input and output validators
- demo/failure_path_runner.py - manual demo runner
- Create a virtual environment:

      python3 -m venv .venv
      source .venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

- Start Postgres:

      docker compose up -d db

- Optional: configure a database URL if you do not want the default local Postgres settings:

      export DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/reliability_fw

- Apply the initial migration:

      alembic upgrade head

The demo uses PhaseExecutor and ReliabilityRepository. It seeds a workflow, prompt, and run in Postgres, asks you to paste an LLM response, and then applies the retry policy.

    python demo/failure_path_runner.py

When prompted:
- paste a response body,
- press Enter,
- send EOF with Ctrl+D.
To see the retry path, paste invalid JSON first. To see completion, paste JSON that matches the schema in demo/fixtures.py.
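To make the two paths concrete, here is a rough sketch of an output check that rejects invalid JSON and accepts a well-formed response. The function name `validate_response` is hypothetical, and the field names and types are assumed from the example response in this README; the actual schema in demo/fixtures.py and the validators under src/validators/ may differ.

```python
import json

# Assumed fields and types, inferred from the README's example response.
# The real schema in demo/fixtures.py may be stricter or shaped differently.
REQUIRED_FIELDS = {
    "summary": str,
    "risk_level": str,
    "missing_components": list,
}


def validate_response(raw: str) -> list[str]:
    """Return validation errors; an empty list means the response passed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Unparseable input is the case that triggers the retry path.
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value must be a JSON object"]
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            errors.append(f"field {name} must be of type {expected_type.__name__}")
    return errors
```

In this sketch, a non-empty error list corresponds to a failed output validation, which the retry policy would then turn into a retry or an escalation.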
Example valid response:
    {
      "summary": "Authentication requirements are mostly clear, but the protocol reference is not grounded.",
      "risk_level": "HIGH",
      "missing_components": ["Grounded protocol citation", "Non-empty acceptance criteria"]
    }

The smoke tests use fakes for the repository and LLM client, so they do not require Postgres.

    python -m unittest tests.test_phase_executor

- Alembic is configured in alembic.ini and migrations/env.py.
- The initial migration lives in migrations/versions/0001_initial_schema.py.
- The demo expects the migration to be applied before you run it.
- Core schema, repository, and executor field names are aligned.
- The phase executor now uses deterministic call IDs per run / phase / prompt / attempt for replay-safe inserts.
- The demo now exercises the repository-backed phase executor end to end.
- LLM call metadata (tokens/cost/latency/retry) is persisted for downstream reporting.
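
One common way to get deterministic call IDs per run / phase / prompt / attempt is a name-based UUID (UUIDv5). The sketch below illustrates that technique only; the namespace value, the `deterministic_call_id` helper, and the key format are assumptions, and the phase executor may derive its IDs differently.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, provided it never changes,
# because changing it changes every derived call ID.
CALL_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ai-reliability-fw.example")


def deterministic_call_id(
    run_id: uuid.UUID, phase: str, prompt_id: uuid.UUID, attempt: int
) -> uuid.UUID:
    """Derive the same call ID every time for the same run / phase / prompt / attempt."""
    name = f"{run_id}:{phase}:{prompt_id}:{attempt}"
    return uuid.uuid5(CALL_ID_NAMESPACE, name)
```

Because a replayed attempt maps to the same primary key, a repeated insert can be detected as a conflict or skipped instead of creating a duplicate llm_calls row.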