AI Reliability Framework

Small Python prototype for running an LLM workflow with:

pre-call input validation,
post-call output validation,
retry / escalate decisions,
persistence hooks for runs, calls, and escalations.

The current demo path is manual at the LLM boundary: you paste an LLM response into the CLI while the framework persists workflow state and decides whether to complete, retry, or escalate.
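As a rough illustration of the complete / retry / escalate choice described above, here is a minimal sketch. The names Decision, decide, and max_attempts are hypothetical and are not taken from src/engine/decision_engine.py; the actual policy may weigh more signals than a single validity flag and an attempt counter.

```python
from enum import Enum


class Decision(Enum):
    COMPLETE = "complete"
    RETRY = "retry"
    ESCALATE = "escalate"


def decide(output_valid: bool, attempt: int, max_attempts: int = 3) -> Decision:
    """Hypothetical policy, not the project's real decision engine.

    Accept valid output, retry while the attempt budget allows it,
    and escalate once the budget is spent.
    """
    if output_valid:
        return Decision.COMPLETE
    if attempt < max_attempts:
        return Decision.RETRY
    return Decision.ESCALATE
```

With this sketch, an invalid response on attempt 1 yields Decision.RETRY, and an invalid response on attempt 3 yields Decision.ESCALATE.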

This framework is also consumed by security-ai-eval-lab as a library via its PhaseExecutorAdapter. Both projects share a Postgres instance; ai-reliability-fw owns the reliability.* tables and security-ai-eval-lab stores evaluation tables with UUID cross-references back to workflow_runs, prompts, and llm_calls.

Repository layout

src/core/models.py - SQLAlchemy models and enums
src/db/session.py - async DB session setup
src/db/repository.py - persistence helpers
src/engine/decision_engine.py - retry / escalation policy logic
src/engine/phase_executor.py - end-to-end phase orchestration
src/validators/ - input and output validators
demo/failure_path_runner.py - manual demo runner

Bootstrap

Create a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Start Postgres:

docker compose up -d db

Optional: set DATABASE_URL if you do not want the default local Postgres settings:

export DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/reliability_fw

Apply the database migrations (currently just the initial schema):

alembic upgrade head

Run the demo

The demo uses PhaseExecutor and ReliabilityRepository. It seeds a workflow, prompt, and run in Postgres, asks you to paste an LLM response, and then applies the retry policy.

python demo/failure_path_runner.py

When prompted:

paste a response body,
press Enter,
send EOF with Ctrl+D (on Windows, Ctrl+Z followed by Enter).

To see the retry path, paste invalid JSON first. To see completion, paste JSON that matches the schema in demo/fixtures.py.

Example valid response:

{
  "summary": "Authentication requirements are mostly clear, but the protocol reference is not grounded.",
  "risk_level": "HIGH",
  "missing_components": ["Grounded protocol citation", "Non-empty acceptance criteria"]
}
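To make the difference between the retry path and the completion path concrete, the sketch below checks a pasted response against the three fields shown in the example. It is an assumption-laden stand-in: the real schema lives in demo/fixtures.py and may differ, validate_response is a hypothetical name, and the set of allowed risk levels is a guess (only "HIGH" appears in the example).

```python
import json

# Assumed levels; only "HIGH" is confirmed by the example response.
ALLOWED_RISK_LEVELS = ("LOW", "MEDIUM", "HIGH")


def validate_response(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the response passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    errors = []
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        errors.append("summary must be a non-empty string")
    if data.get("risk_level") not in ALLOWED_RISK_LEVELS:
        errors.append("risk_level must be one of " + ", ".join(ALLOWED_RISK_LEVELS))
    components = data.get("missing_components")
    if not isinstance(components, list) or not all(isinstance(c, str) for c in components):
        errors.append("missing_components must be a list of strings")
    return errors
```

Under these assumptions, pasting plain text produces a single "invalid JSON" problem, while the example response above produces an empty list.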

Run tests

The smoke tests use fakes for the repository and LLM client, so they do not require Postgres.

python -m unittest tests.test_phase_executor
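For readers unfamiliar with the fake-based approach, the sketch below shows the general pattern of testing a retry loop without Postgres or a live model. Everything in it is hypothetical: FakeLLMClient, FakeRepository, and run_with_retries are illustrative stand-ins, not the classes used in tests/test_phase_executor.py, and the sketch is synchronous for brevity even though the project's database layer is async.

```python
import unittest


class FakeLLMClient:
    """Returns canned responses in order, so no model is ever called."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return self._responses.pop(0)


class FakeRepository:
    """Records persisted calls in memory instead of writing to Postgres."""

    def __init__(self):
        self.saved = []

    def save_call(self, attempt: int, response: str) -> None:
        self.saved.append((attempt, response))


def run_with_retries(client, repo, prompt, is_valid, max_attempts=3):
    """Stand-in for an executor: call, persist, stop at the first valid response."""
    for attempt in range(1, max_attempts + 1):
        response = client.complete(prompt)
        repo.save_call(attempt, response)
        if is_valid(response):
            return "completed"
    return "escalated"


class RetryPathTest(unittest.TestCase):
    def test_retries_then_completes(self):
        client = FakeLLMClient(["garbage", "ok"])
        repo = FakeRepository()
        status = run_with_retries(client, repo, "prompt", lambda r: r == "ok")
        self.assertEqual(status, "completed")
        # Both attempts are recorded, including the failed one.
        self.assertEqual(repo.saved, [(1, "garbage"), (2, "ok")])
```

Because the fakes keep their state in plain attributes, a test can assert on what would have been persisted as well as on the final status.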

Database notes

Alembic is configured in alembic.ini and migrations/env.py.
The initial migration lives in migrations/versions/0001_initial_schema.py.
The demo expects the migration to be applied before you run it.

Current status

Core schema, repository, and executor field names are aligned.
The phase executor now uses deterministic call IDs per run / phase / prompt / attempt for replay-safe inserts.
The demo now exercises the repository-backed phase executor end to end.
LLM call metadata (tokens/cost/latency/retry) is persisted for downstream reporting.
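
One common way to get deterministic call IDs per run / phase / prompt / attempt is a name-based UUID. The sketch below assumes that approach; the README does not say how the phase executor actually derives its IDs, and CALL_ID_NAMESPACE and deterministic_call_id are hypothetical names.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
CALL_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "ai-reliability-fw/llm-call")


def deterministic_call_id(
    run_id: uuid.UUID, phase: str, prompt_id: uuid.UUID, attempt: int
) -> uuid.UUID:
    """Derive the same UUID for the same (run, phase, prompt, attempt) tuple."""
    name = f"{run_id}:{phase}:{prompt_id}:{attempt}"
    return uuid.uuid5(CALL_ID_NAMESPACE, name)
```

If such an ID is used as the primary key, replaying the same attempt maps to the same row, so a repeated insert can be detected or ignored instead of creating a duplicate.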
