Scientific justification for puzzle design choices

- Research best practices for reasoning puzzles (e.g., ARC-AGI, Raven's Progressive Matrices)
- Ensure community acceptance of evaluation approach
   - Quick experiments to understand the failure modes of current SoTA models with initial puzzle designs
   - Collaboration with researchers and organizations for advice, feedback, and design contributions