Cumulative number of large-scale AI models by domain

What you should know about this indicator

  • Game systems are specifically designed for games and excel in understanding and strategizing gameplay. For instance, AlphaGo, developed by DeepMind, defeated the world champion in the game of Go. Such systems use complex algorithms to compete effectively, even against skilled human players.
  • Language systems are tailored to process language, focusing on understanding, translating, and interacting with human languages. Examples include chatbots, machine translation tools like Google Translate, and sentiment analysis algorithms that can detect emotions in text.
  • Multimodal systems are artificial intelligence frameworks that integrate and interpret more than one type of data input, such as text, images, and audio. GPT-4 is an example of a multimodal system, as it can process both textual and visual inputs and generate responses based on them.
  • Vision systems focus on processing visual information, playing a pivotal role in image recognition and related areas. For example, Facebook's photo tagging system uses vision AI to identify faces.
  • Speech systems are dedicated to handling spoken language, serving as the backbone of voice assistants and similar applications. They recognize, interpret, and generate spoken language to interact with users.
  • Biology systems analyze biological data and simulate biological processes, aiding in drug discovery and genetic research.
  • Image generation systems create visual content from text descriptions or other inputs, used in graphic design and content creation.

In the source dataset, the domain is a foreign key field that categorizes the system's domain of machine learning. This field links to the ML Domains table, and domains are selected from the options in that table.

Describes the specific area, application, or field in which a large-scale AI model is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2024 data is incomplete and was last updated 20 June 2024.
Source: Epoch (2024) – with major processing by Our World in Data
Last updated: June 19, 2024
Next expected update: December 2024
Date range: 2017–2024
Unit: AI systems

Sources and processing

This data is based on the following sources

A dataset that tracks compute-intensive AI models, with training compute over 10²³ floating point operations (FLOP). This corresponds to training costs of hundreds of thousands of dollars or more. 

To identify compute-intensive AI models, the team at Epoch AI used various resources, estimating compute when not directly reported. They included benchmarks and repositories, such as Papers With Code and Hugging Face, to find models exceeding 10²³ FLOP. They also explored non-English media and specific leaderboards, particularly focusing on Chinese sources.

Additionally, they examined blog posts, press releases from major labs, and scholarly literature to track new models. A separate table was created for models with unconfirmed but plausible compute levels. Despite thorough methods, proprietary and secretive models may have been missed.
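The inclusion rule described above can be illustrated with a minimal sketch. The records, field names, and the use of `None` for a missing compute estimate are hypothetical and are not taken from Epoch's actual schema; only the 10²³ FLOP threshold comes from the description.

```python
from typing import Optional

# Threshold stated in the dataset description: training compute over 10^23 FLOP.
FLOP_THRESHOLD = 1e23


def is_large_scale(training_compute_flop: Optional[float]) -> bool:
    """Return True when a known compute estimate exceeds the threshold."""
    return training_compute_flop is not None and training_compute_flop > FLOP_THRESHOLD


# Hypothetical records for illustration only (not real models or real estimates).
models = [
    {"name": "model_a", "training_compute_flop": 3.0e24},
    {"name": "model_b", "training_compute_flop": 5.0e22},
    {"name": "model_c", "training_compute_flop": None},  # assumption: estimate unavailable
]

# Models that clear the threshold and would be counted.
large_scale = [m["name"] for m in models if is_large_scale(m["training_compute_flop"])]

# Models without a compute estimate, which this sketch sets aside for separate review.
unconfirmed = [m["name"] for m in models if m["training_compute_flop"] is None]
```

Here `model_a` is the only record that clears the threshold, while `model_c` is set aside rather than counted, loosely mirroring the separate table kept for models with unconfirmed compute.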

Retrieved on: June 19, 2024

Citation
This is the citation of the original data obtained from the source, prior to any processing or adaptation by Our World in Data. To cite data downloaded from this page, please use the suggested citation given in Reuse This Work below.
Robi Rahman, David Owen and Josh You (2024), "Tracking Compute-Intensive AI Models". Published online at epochai.org. Retrieved from: 'https://epochai.org/blog/tracking-compute-intensive-ai-models' [online resource]

How we process data at Our World in Data

All data and visualizations on Our World in Data rely on data sourced from one or several original data providers. Preparing this original data involves several processing steps. Depending on the data, this can include standardizing country names and world region definitions, converting units, calculating derived indicators such as per capita measures, as well as adding or adapting metadata such as the name or the description given to an indicator.

At the link below you can find a detailed description of the structure of our data pipeline, including links to all the code used to prepare data across Our World in Data.

Read about our data pipeline

Notes on our processing step for this indicator

The count of large-scale AI models per domain is derived by tallying the machine learning models classified under each domain category. Because a single model can fall under multiple domains, it is counted once in each domain it belongs to. The classification into domains is determined by the specific area, application, or field that the AI system is primarily designed to operate within.
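As a rough illustration of this tally, the sketch below counts each model once per domain it belongs to and then accumulates the counts year by year. The records and the function name `cumulative_counts_by_domain` are hypothetical; this is not the code used in the Our World in Data pipeline.

```python
from collections import Counter

# Hypothetical records for illustration: (model name, publication year, domains).
models = [
    ("model_a", 2020, ["Language"]),
    ("model_b", 2022, ["Language", "Vision"]),
    ("model_c", 2023, ["Multimodal"]),
    ("model_d", 2023, ["Language"]),
]


def cumulative_counts_by_domain(records):
    """Return {year: {domain: cumulative count}} for every year in the range."""
    yearly = Counter()
    for _name, year, domains in records:
        for domain in set(domains):  # count a model once per domain it belongs to
            yearly[(year, domain)] += 1
    if not yearly:
        return {}

    first_year = min(year for year, _ in yearly)
    last_year = max(year for year, _ in yearly)
    all_domains = sorted({domain for _, domain in yearly})

    running = Counter()
    result = {}
    for year in range(first_year, last_year + 1):
        for domain in all_domains:
            # Years with no new models carry the previous total forward.
            running[domain] += yearly[(year, domain)]
        result[year] = dict(running)
    return result


counts = cumulative_counts_by_domain(models)
```

Because `model_b` is tagged with two domains, the per-domain totals for the final year sum to five even though there are only four models, which is why the domain counts should not be added together to get a total number of models.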

Reuse this work

  • All data produced by third-party providers and made available by Our World in Data are subject to the license terms from the original providers. Our work would not be possible without the data providers we rely on, so we ask you to always cite them appropriately (see below). This is crucial to allow data providers to continue doing their work, enhancing, maintaining and updating valuable data.
  • All data, visualizations, and code produced by Our World in Data are completely open access under the Creative Commons BY license. You have the permission to use, distribute, and reproduce these in any medium, provided the source and authors are credited.

Citations

How to cite this page

To cite this page overall, including any descriptions, FAQs or explanations of the data authored by Our World in Data, please use the following citation:

“Data Page: Cumulative number of large-scale AI models by domain”, part of the following publication: Charlie Giattino, Edouard Mathieu, Veronika Samborska and Max Roser (2023) - “Artificial Intelligence”. Data adapted from Epoch. Retrieved from https://ourworldindata.org/grapher/cumulative-number-of-large-scale-ai-models-by-domain [online resource]

How to cite this data

In-line citation

If you have limited space (e.g. in data visualizations), you can use this abbreviated in-line citation:

Epoch (2024) – with major processing by Our World in Data

Full citation

Epoch (2024) – with major processing by Our World in Data. “Cumulative number of large-scale AI models by domain” [dataset]. Epoch, “Tracking Compute-Intensive AI Models” [original data]. Retrieved July 15, 2024 from https://ourworldindata.org/grapher/cumulative-number-of-large-scale-ai-models-by-domain