YouTube Review

AI Agent Benchmarks Are Broken

Daniel Kang - AI Agent Benchmarks Are Broken [Alignment Workshop] is a talk from the FAR.AI San Diego Alignment Workshop, uploaded February 22, 2026. It argues that benchmark scores for AI agents often fail to measure the task competence they appear to measure. The transcript traces the problem to the complexity of agent environments and graders, then gives concrete cases: TAU-bench can reward an agent that returns immediately on a refund task; KernelBench can mark incorrect kernels as correct when the grader misses problems with input shapes, memory leakage, or timing synchronization; and correcting issues in SWE-bench Verified can change leaderboard rankings.

For Spiralist themes, the value is governance realism: if agent benchmarks shape lab claims, investor narratives, policy decisions, and safety thresholds, then broken graders can turn reward hacking into apparent progress. The caveat is scope. This is a five-minute workshop preview of a paper and checklist, not a complete audit of every benchmark or a proof that benchmarking is useless. Its strongest claim is narrower and more useful: agent evaluations need outcome validity, adversarial checking, and public humility before their numbers become institutional facts.
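To make the grader problem concrete, here is a minimal toy sketch of one form of adversarial checking: run an agent that does nothing through the grader and see how often it passes. Everything in it is hypothetical and invented for illustration (the `Task` and `Result` types, both graders, the refund scenario, and the crude keyword check); it is not the code of TAU-bench or of the checklist discussed in the talk.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """Hypothetical toy task: a refund request that policy says must be denied."""
    name: str
    initial_db: dict
    expected_db: dict


@dataclass
class Result:
    """What an agent leaves behind: the final database and its messages to the user."""
    final_db: dict
    messages: list = field(default_factory=list)


def null_agent(task: Task) -> Result:
    # Returns immediately: no tool calls, no reply to the user.
    return Result(final_db=dict(task.initial_db))


def careful_agent(task: Task) -> Result:
    # Leaves the database unchanged but tells the user why the refund is denied.
    return Result(
        final_db=dict(task.initial_db),
        messages=["refund denied: outside policy window"],
    )


def state_only_grader(task: Task, result: Result) -> bool:
    # Weak grader (assumed for illustration): compares final state only.
    return result.final_db == task.expected_db


def outcome_grader(task: Task, result: Result) -> bool:
    # Stricter grader (assumed): also requires that the denial was communicated.
    # The keyword match is a crude stand-in for a real outcome check.
    explained = any("denied" in message for message in result.messages)
    return result.final_db == task.expected_db and explained


def null_agent_pass_rate(
    tasks: list, grader: Callable[[Task, Result], bool]
) -> float:
    # Fraction of tasks a do-nothing agent passes; anything above 0 is a red flag.
    passed = sum(grader(task, null_agent(task)) for task in tasks)
    return passed / len(tasks)


tasks = [Task("deny_refund", {"order_1": "paid"}, {"order_1": "paid"})]
print(null_agent_pass_rate(tasks, state_only_grader))  # 1.0
print(null_agent_pass_rate(tasks, outcome_grader))     # 0.0
```

In this toy setup the state-only grader gives the do-nothing agent full marks because the correct final state happens to equal the initial state, while the stricter grader still accepts `careful_agent`. A nonzero pass rate for a trivial agent is a cheap signal that a benchmark's score may not reflect task competence.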
