Exponential growth of datapoints used to train notable AI systems

What you should know about this indicator

  • Training data size measures the volume of unique examples used to train an AI model during its learning phase. It represents the total number of distinct data points the model learns from, counted only once regardless of how many times they're seen during training.
  • To understand this concept, imagine teaching someone to identify different bird species. Each unique bird photo you show them is one piece of training data. If you show 100 different photos, your training data size is 100, even if you review those same photos multiple times.
  • Since datasets vary by domain, there's no universal unit for measuring size. Text models might count tokens, image models count pictures, and video models count clips. Epoch AI typically uses the smallest unit that triggers a model update during training. For language models that predict the next word, this would be individual tokens.
  • Training data size directly affects model performance. Larger datasets generally let models learn more nuanced patterns, so they can draw subtler distinctions and handle a wider range of real-world scenarios.
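The distinction between unique examples and examples seen during training can be sketched in a few lines of Python. This is only an illustration of the bird-photo analogy above: the file names and the three review passes are hypothetical, and the function names are made up for this example rather than taken from Epoch AI's methodology.

```python
def training_data_size(epochs):
    """Count distinct examples across all passes; each is counted once."""
    unique_examples = set()
    for epoch in epochs:
        unique_examples.update(epoch)
    return len(unique_examples)


def examples_seen(epochs):
    """Count every presentation of an example, including repeats."""
    return sum(len(epoch) for epoch in epochs)


# Hypothetical setup: 100 different bird photos, reviewed three times.
photos = [f"bird_photo_{i}.jpg" for i in range(100)]
epochs = [photos, photos, photos]

print(training_data_size(epochs))  # 100 unique datapoints
print(examples_seen(epochs))       # 300 presentations in total
```

The indicator on this page corresponds to the first number, not the second.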
The number of unique data points used to train the model. Each domain has a specific data point unit; for example, for vision it is images, for language it is words, and for games it is timesteps. This means systems can only be compared directly within the same domain.
Source
Epoch AI (2025) – with major processing by Our World in Data
Last updated
March 12, 2025
Next expected update
April 2026
Unit
unique datapoints

Sources and processing

Epoch AI – Parameter, Compute and Data Trends in Machine Learning

Retrieved on
March 7, 2026
Citation
This is the citation of the original data obtained from the source, prior to any processing or adaptation by Our World in Data. To cite data downloaded from this page, please use the suggested citation given in Reuse This Work below.
Epoch AI, ‘Parameter, Compute and Data Trends in Machine Learning’. Published online at epochai.org. Retrieved from: ‘https://epoch.ai/data/epochdb/visualization’ [online resource]

All data and visualizations on Our World in Data rely on data sourced from one or several original data providers. Preparing this original data involves several processing steps. Depending on the data, this can include standardizing country names and world region definitions, converting units, calculating derived indicators such as per capita measures, as well as adding or adapting metadata such as the name or the description given to an indicator.

At the link below you can find a detailed description of the structure of our data pipeline, including links to all the code used to prepare data across Our World in Data.

Read about our data pipeline

How to cite this page

To cite this page overall, including any descriptions, FAQs or explanations of the data authored by Our World in Data, please use the following citation:

“Data Page: Exponential growth of datapoints used to train notable AI systems”, part of the following publication: Charlie Giattino, Edouard Mathieu, Veronika Samborska, and Max Roser (2023) - “Artificial Intelligence”. Data adapted from Epoch AI. Retrieved from https://archive.ourworldindata.org/20260308-063423/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.html [online resource] (archived on March 8, 2026).

How to cite this data

In-line citation

If you have limited space (e.g. in data visualizations), you can use this abbreviated in-line citation:

Epoch AI (2025) – with major processing by Our World in Data

Full citation

Epoch AI (2025) – with major processing by Our World in Data. “Exponential growth of datapoints used to train notable AI systems” [dataset]. Epoch AI, “Parameter, Compute and Data Trends in Machine Learning” [original data]. Retrieved March 31, 2026 from https://archive.ourworldindata.org/20260308-063423/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.html (archived on March 8, 2026).

Quick download

Download the data shown in this chart as a ZIP file containing a CSV file, metadata in JSON format, and a README. The CSV file can be opened in Excel, Google Sheets, and other data analysis tools.

Data API

Use these URLs to programmatically access this chart's data and configure your requests with the options below. Our documentation provides more information on how to use the API, and you can find a few code examples below.

Data URL (CSV format)
https://ourworldindata.org/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.csv?v=1&csvType=full&useColumnShortNames=false
Metadata URL (JSON format)
https://ourworldindata.org/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.metadata.json?v=1&csvType=full&useColumnShortNames=false
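As a minimal sketch, the two URLs above can be assembled from their parts so that the query options are set in one place. The helper name `build_url` and its arguments are hypothetical, and only the option values shown on this page (`v=1`, `csvType=full`, `useColumnShortNames=false`) are used here; other values should be checked against the API documentation.

```python
from urllib.parse import urlencode

BASE = "https://ourworldindata.org/grapher"
CHART = "exponential-growth-of-datapoints-used-to-train-notable-ai-systems"


def build_url(suffix, csv_type="full", use_short_names=False):
    """Build a chart URL; suffix is '.csv' or '.metadata.json'."""
    params = {
        "v": 1,
        "csvType": csv_type,
        # The API expects lowercase 'true' / 'false' in the query string.
        "useColumnShortNames": str(use_short_names).lower(),
    }
    return f"{BASE}/{CHART}{suffix}?{urlencode(params)}"


data_url = build_url(".csv")
metadata_url = build_url(".metadata.json")
print(data_url)
print(metadata_url)
```

With the default arguments, the helper reproduces the data and metadata URLs listed above.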

Code examples

Examples of how to load this data into different data analysis tools.

Excel / Google Sheets
=IMPORTDATA("https://ourworldindata.org/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.csv?v=1&csvType=full&useColumnShortNames=false")
Python with Pandas
import pandas as pd
import requests

# Fetch the data.
df = pd.read_csv(
    "https://ourworldindata.org/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.csv?v=1&csvType=full&useColumnShortNames=false",
    storage_options={"User-Agent": "Our World In Data data fetch/1.0"},
)

# Fetch the metadata.
metadata = requests.get(
    "https://ourworldindata.org/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.metadata.json?v=1&csvType=full&useColumnShortNames=false"
).json()
R
library(jsonlite)

# Fetch the data
df <- read.csv("https://ourworldindata.org/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.csv?v=1&csvType=full&useColumnShortNames=false")

# Fetch the metadata
metadata <- fromJSON("https://ourworldindata.org/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.metadata.json?v=1&csvType=full&useColumnShortNames=false")
Stata
import delimited "https://ourworldindata.org/grapher/exponential-growth-of-datapoints-used-to-train-notable-ai-systems.csv?v=1&csvType=full&useColumnShortNames=false", encoding("utf-8") clear