- Research best practices for reasoning puzzles (ARC-AGI, Raven's Progressive Matrices)
- Ensure community acceptance of the evaluation approach
- Run quick experiments to understand the failure modes of current SoTA models on initial puzzle designs
- Collaborate with researchers and orgs for advice, feedback, and design contributions