Top performing AI systems in coding, math, and language-based knowledge tests

Coding performance is measured with the APPS benchmark, math performance with the MATH benchmark, and language-based knowledge tests with the MMLU benchmark.

[Chart: scores (%) of top-performing AI systems from Jul 26, 2019 to Dec 11, 2024. Reference lines show the approximate score of expert humans (math and knowledge tests), the average score of 5 university students (math), and the average score of non-expert humans (knowledge tests).]
