Andreas Haupt
Research Team
There’s such a long history on running randomized controlled trials in companies, and now everyone is evaluating AI. It felt too clear that we need to evaluate AI with the humans. That led to centaur evaluations.
Areas of Research
Welfare, Improvability, and Variance: A Principal-Agent Approach to Optimal Benchmark Item Aggregation
Authors:
Andreas Haupt, Justin Hartenstein, Anka Reuel, Mykel Kochenderfer, Sanmi Koyejo