## Problem

Only 3 models tested; hard to generalize.

## Tasks

- [ ] Add Mistral, Phi, Gemma to test matrix
- [ ] Use identical attack set and detector config
- [ ] Document hardware/quantization for reproducibility

## Acceptance Criteria

- 6+ models with comparable results
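
One possible way to keep the runs comparable is to build every matrix entry from a single shared attack set and detector config, and to record hardware and quantization next to each model. The sketch below is only an illustration: `ModelSpec`, `build_matrix`, `ATTACK_SET`, `DETECTOR_CONFIG`, and all the values in them are hypothetical placeholders, not names or settings from this repo.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str          # placeholder label, not a real checkpoint ID
    quantization: str  # fill in what was actually used for the run
    hardware: str      # fill in what was actually used for the run


# Defined once and shared by every entry so results stay comparable.
ATTACK_SET = "attack-set-placeholder"   # hypothetical identifier
DETECTOR_CONFIG = {"threshold": 0.5}    # hypothetical setting

# The three models named in the task list; the existing three would be added alongside.
NEW_MODELS = [
    ModelSpec("mistral", "TBD", "TBD"),
    ModelSpec("phi", "TBD", "TBD"),
    ModelSpec("gemma", "TBD", "TBD"),
]


def build_matrix(models):
    """Return one run description per model, all with the same attacks and detector."""
    return [
        {**asdict(m), "attack_set": ATTACK_SET, "detector_config": DETECTOR_CONFIG}
        for m in models
    ]


if __name__ == "__main__":
    # Dump a manifest that can be committed next to the results.
    print(json.dumps(build_matrix(NEW_MODELS), indent=2))
```

Committing the printed manifest with the results would cover the reproducibility task, since each entry carries its own hardware and quantization fields.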