Repositories
- sandbagging_auditing_games Public
This repository accompanies the research paper "Sandbagging Auditing Games" on detecting sandbagging in frontier AI systems. We provide access to the model organisms used in the paper and tools for interacting with them, enabling AI safety researchers to reproduce our results and develop novel sandbagging detection techniques.
Full repository name: AI-Safety-Institute/sandbagging_auditing_games