AGI R&D Division
A standardised cognitive proficiency framework for AI code-generation agents. Structured test harness: deterministic scenarios, trap conditions, and scored delivery across two evaluation tracks. Track A results: top frontier models scored at ceiling. Track B (Agentic, April 2026): 14 models evaluated on autonomous delivery, the first study to isolate delivery-surface behaviour as a primary outcome variable.

