
What are the sources for Our World in Data’s population estimates?

Population size is our most commonly used metric throughout Our World in Data. It is either used directly to understand population growth over time, or indirectly to calculate per capita adjustments of the many other metrics we care about: from extreme poverty to electricity access; from CO₂ emissions to vaccination rates.

Many datasets on population cover a specific time period – for example, the UN publishes data from 1950 onwards. However, few maintain very long-term datasets that are continually updated to the present day.

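Our team therefore builds and maintains a long-run dataset on population by country, region, and for the world, based on three key sources:

  • 10,000 BCE to 1799: HYDE version 3.2;
  • 1800 to 1949: Gapminder version 6;
  • 1950 to 2100: UN World Population Prospects.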

The script that produces this long-run dataset can be accessed in our GitHub repository.
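
To illustrate how one preferred source per period can be selected, here is a minimal Python sketch. It is not the script from the repository: the names `Source`, `SOURCES`, `source_for_year` and `splice` are hypothetical, negative years are assumed to stand for BCE, and only the period boundaries come from this article.

```python
from typing import NamedTuple, Optional


class Source(NamedTuple):
    name: str
    first_year: int  # inclusive; negative years stand for BCE (assumption)
    last_year: int   # inclusive


# Period boundaries as stated in this article.
SOURCES = [
    Source("HYDE 3.2", -10000, 1799),
    Source("Gapminder v6", 1800, 1949),
    Source("UN World Population Prospects", 1950, 2100),
]


def source_for_year(year: int) -> Optional[str]:
    """Return the name of the preferred source for a year, or None if uncovered."""
    for source in SOURCES:
        if source.first_year <= year <= source.last_year:
            return source.name
    return None


def splice(records):
    """Keep only records that come from the preferred source for their year.

    `records` is an iterable of (source_name, year, population) tuples.
    The result is a list of (year, population, source_name) sorted by year.
    """
    return sorted(
        (year, population, name)
        for name, year, population in records
        if source_for_year(year) == name
    )
```

With this approach, a HYDE value for 1800 would be dropped in favor of the Gapminder value for the same year, because 1800 falls in the Gapminder period.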

In all sources that we rely on, historical population estimates are based on today’s geographical borders.

We provide a full citation for each of the sources below. If you cite population data for a specific period, please cite the original source. For example, for the period from 1950 onwards, please cite the UN World Population Prospects. You can add “via Our World in Data” if you downloaded the data from us.

10,000 BCE to 1799: HYDE Version 3.2

The very ambitious HYDE database (History database of the Global Environment) is maintained by researchers at the Netherlands Environmental Assessment Agency:

HYDE is an internally consistent combination of updated historical population estimates and land use. Estimates are produced as gridded maps of total, urban, rural population, and population density as well as built-up area. The period covered is 10,000 BCE to 2015. Spatial resolution is 5 arc minutes (approximately 85 km² at the equator).

Full citation: Klein Goldewijk, K., A. Beusen, J. Doelman and E. Stehfest (2017), Anthropogenic land use estimates for the Holocene; HYDE 3.2, Earth System Science Data, 9, 927-953.
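
The spatial resolution quoted above (5 arc minutes, roughly 85 km² at the equator) can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6,371 km and treats each grid cell as a small rectangle. Both are simplifications made for this example and are not taken from the HYDE documentation. The function name `cell_area_km2` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation (assumption)


def cell_area_km2(latitude_deg: float, arc_minutes: float = 5.0) -> float:
    """Approximate area of a square grid cell of the given angular size.

    The cell is treated as a flat rectangle: its north-south side has a
    fixed length, while its east-west side shrinks with the cosine of
    the latitude.
    """
    angle_rad = math.radians(arc_minutes / 60.0)
    north_south_km = EARTH_RADIUS_KM * angle_rad
    east_west_km = north_south_km * math.cos(math.radians(latitude_deg))
    return north_south_km * east_west_km


print(round(cell_area_km2(0.0), 1))   # area at the equator
print(round(cell_area_km2(60.0), 1))  # about half of that at 60° latitude
```

Under these assumptions each side of an equatorial cell is a little over 9 km, which gives an area close to the quoted figure. Cells cover less ground towards the poles.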

The HYDE estimates go up to 2015, but they are only available once per decade for the period 1800–2015. Therefore we favor the Gapminder dataset from 1800 onwards, as it provides annual estimates.

1800 to 1949: Gapminder Version 6

Gapminder maintains a population dataset based on data from Angus Maddison and CLIO INFRA, which we use as our source for the period from 1800 to 1949. Their documentation provides the following details on their sources:

We use Maddison population data improved by CLIO INFRA in April 2015 and Gapminder v3 documented in greater detail by Mattias Lindgren. The main source of v3 was Angus Maddison’s data which is maintained and improved by CLIO Infra Project. The updated Maddison data by CLIO INFRA were based on the following improvements:

i. Whenever estimates by Maddison were available, his figures are being followed in favor of estimates by Gapminder;

ii. For Africa, estimates by Frankema and Jerven (2014) for the period 1850-1960 have been added to the existing database;

iii. For Latin America, estimates by Abad & Van Zanden (2014) for the period 1500-1940 have been added.

Full citation: Gapminder doesn’t provide a preferred citation themselves. We cite their work as: Gapminder population dataset version 6, based on data by Angus Maddison improved by CLIO INFRA.

1950 to 2100: UN World Population Prospects

We rely on the latest issue of the United Nations World Population Prospects as our main source for recent historical data and future projections. We use this data for its reliability, its consistent methods, and because it includes population estimates for almost all territories in the world. The UN updates its dataset every two years, with:

  • Annual historical estimates running from 1950 to the year before the most recent dataset publication;
  • Annual projections running from the year of the most recent dataset publication to 2100. The UN publishes multiple projections based on different scenarios of global fertility rates: a low, medium and high scenario. In our dataset we use the medium-variant scenario.
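
The split between estimates and projections, and the choice of the medium variant, can be expressed as a simple rule. The following Python sketch is illustrative only: the function names and the variant labels `"low"`, `"medium"` and `"high"` are assumptions for this example, not the UN's file format or our actual code.

```python
FIRST_YEAR = 1950
LAST_YEAR = 2100


def kind_of_year(year: int, publication_year: int) -> str:
    """Classify a year as a historical estimate or a projection.

    Estimates run up to the year before publication; projections start
    in the publication year, as described in the list above.
    """
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"Year {year} is outside {FIRST_YEAR}-{LAST_YEAR}")
    return "estimate" if year < publication_year else "projection"


def keep_row(year: int, variant, publication_year: int) -> bool:
    """Keep every estimate, but only the medium variant of projections.

    `variant` is ignored for estimates, so it can be None there.
    """
    if kind_of_year(year, publication_year) == "estimate":
        return True
    return variant == "medium"
```

For the 2019 edition cited below, `kind_of_year(2018, 2019)` returns `"estimate"` and `kind_of_year(2019, 2019)` returns `"projection"`.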

The United Nations estimates may not always reflect the latest censuses or national figures. However, there are several reasons why we use this data over country-by-country national population estimates:

  • The UNWPP dataset is the standard in research. The main reason is that it uses a reliable and standardized methodology for all countries. For example, if we used individual country data, some countries might count overseas workers, expats, and undocumented immigrants in their population while others would not. The UNWPP dataset tries to maintain a consistent methodology across all countries.
  • Using data from the UN allows us to get accurate population estimates for all territories in the world very easily. Finding and maintaining estimates based on national censuses would be very time-consuming and more prone to errors.
  • Other reasons include the availability of yearly data (national censuses are only conducted every few years), and avoiding double-counting in cases of border disputes.

Full citation: United Nations, Department of Economic and Social Affairs, Population Division (2019). World Population Prospects 2019, Online Edition. Rev. 1.
