Automates attribution-graph analysis via probe prompting: circuit-trace a prompt, auto-generate concept probes, profile feature activations, cluster supernodes.
Updated Apr 24, 2026 · Python
Reproducible case study of pitfalls in contrastive SAE discovery and steering for "consciousness" features (GemmaScope SAEs, Gemma 3 4B/12B): reconstruction confound, delta-steering fix, matched controls, and false-positive scaling law vs dataset size.