inspect-ai

Here are 7 public repositories matching this topic...

METR / hawk

Run Inspect AI evals in the cloud

aws llm evals inspect-ai

Updated Apr 22, 2026
PLpgSQL

jmagly / matric-eval

Consolidated model evaluation framework for LLM benchmarking with Ollama

python benchmark machine-learning ai evaluation llm ollama inspect-ai

Updated Apr 2, 2026
Python

edward-lcl / factor-ut-untrusted-decomposer

Factor(UT): Controlling Untrusted AI Decomposers — AAAI 2026 workshop paper on monitoring untrusted decomposition in code generation workflows.

ai-safety deception-detection aaai-2026 inspect-ai bigcodebench

Updated Mar 26, 2026
Jupyter Notebook

jacobemmerson / certificate

EuroSafeAI's AI safety certification pipeline.

ai-safety ai-governance inspect-ai ai-safety-research ai-safety-philosophy

Updated Apr 20, 2026
Python

pwenker / inspect-optimize

Automated prompt optimization for Inspect AI via structured failure analysis

ai-safety ai-evaluation prompt-optimization inspect-ai

Updated Feb 4, 2026
Python

jang1563 / SciReplicBench

PaperBench-style Inspect AI benchmark for computational reproducibility in computational biology.

benchmark computational-biology reproducibility llm-evaluation agent-evaluation inspect-ai paperbench

Updated Apr 14, 2026
Python

lucharo / eval-claude

Run inspect_ai evals via Claude Code CLI — use your Claude subscription instead of per-token API billing

benchmarks ai-safety claude llm-evaluation claude-code inspect-ai

Updated Apr 13, 2026
Python
