Artificial intelligence: Performance on knowledge tests vs. training computation

Performance on knowledge tests is measured with the MMLU benchmark, here in the 5-shot setting, which gauges a model’s accuracy after it has been shown only five examples for each task. Training computation is measured in total petaFLOP; one petaFLOP is 10¹⁵ floating-point operations.
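
As a rough illustration of the unit, the sketch below converts the chart's training-computation axis values from petaFLOP to raw floating-point operations. The function name `petaflop_to_flop` and the list `axis_ticks` are hypothetical names introduced here for illustration; the tick values are the ones shown on the chart's axis.

```python
PETA = 10**15  # one petaFLOP = 10^15 floating-point operations


def petaflop_to_flop(petaflop: int) -> int:
    """Convert a training-compute figure from petaFLOP to raw FLOP."""
    return petaflop * PETA


# Axis tick values from the chart, in petaFLOP (100,000 up to 10 billion).
axis_ticks = [
    100_000,
    1_000_000,
    10_000_000,
    100_000_000,
    1_000_000_000,
    10_000_000_000,
]

for tick in axis_ticks:
    # Print each tick alongside its equivalent in scientific notation.
    print(f"{tick:>14,} petaFLOP = {petaflop_to_flop(tick):.0e} FLOP")
```

Under this conversion, the axis spans roughly 10²⁰ to 10²⁵ floating-point operations.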

[Chart: Knowledge tests (MMLU) accuracy plotted against training computation (petaFLOP, axis from 100,000 to 10 billion), covering 2019 to 2023. Reference line: expert human performance. Labeled systems: GPT-2 (finetuned), GPT-3 (davinci), Gopher (0.4B), Gopher (7B), OPT, GPT-4. Points are colored by developer of AI system: Bloomberg, Eleuther, Google DeepMind, Google Research, HuggingFace/BigScience, Meta AI, OpenAI, Tsinghua KEG.]
