AI Should Not Be An Imitation Game: Centaur Evaluations
- Economics of Transformative AI
- New measures of the economy
- Working Paper
Benchmarks and evaluations are central to machine learning methodology and direct research in the field. Current evaluations commonly test systems in the absence of humans. This position paper argues that the machine learning community should increasingly use centaur evaluations, in which humans and AI jointly solve tasks.
Centaur evaluations refocus machine learning development toward human augmentation instead of human replacement. They also allow for direct evaluation of human-centered desiderata, such as interpretability and helpfulness, and they can be more challenging and realistic than existing evaluations. By shifting the focus from automation toward collaboration between humans and AI, centaur evaluations can drive progress toward more effective and human-augmenting machine learning systems.
Authors
Andreas Haupt
Postdoctoral Fellow
Andy is a human-centered AI postdoctoral fellow jointly appointed in the Economics and Computer Science Departments.
He is interested in the micro-level interactions of humans (in particular non-experts) with AI systems, and their implications for privacy, oversight, and consumer steering. At the lab, he specializes in evaluating humans and AI systems together. In his work, he develops and applies methods of microeconomic theory, structural econometrics, and reinforcement learning. He received his Ph.D. from MIT in February 2025, with a committee evenly split between Economics and Computer Science. Prior to that, he completed two master's degrees at the University of Bonn: first in Mathematics (2017) and then in Economics (2018), with distinction. He has worked on competition enforcement for the European Commission's Directorate-General for Competition and the U.S. Federal Trade Commission, and taught high school mathematics and computer science in Germany before his Ph.D.
Erik Brynjolfsson
Jerry Yang and Akiko Yamazaki Professor
Erik Brynjolfsson is one of the world's leading experts on the economics of technology and artificial intelligence. He is the Jerry Yang and Akiko Yamazaki Professor and Senior Fellow at the Stanford Institute for Human-Centered AI (HAI), and Director of the Stanford Digital Economy Lab. He is also the Ralph Landau Senior Fellow at the Stanford Institute for Economic Policy Research (SIEPR), Professor by Courtesy at the Stanford Graduate School of Business and the Stanford Department of Economics, and a Research Associate at the National Bureau of Economic Research (NBER).
One of the most-cited authors on the economics of information, Brynjolfsson was among the first researchers to measure productivity contributions of IT and the complementary role of organizational capital and other intangibles.