OWID Data Collection: Inequality and Poverty

Joe Hasell; Pablo Arriagada; Max Roser

Explore a wide range of indicators on inequality and poverty and compare sources.

July 06, 2023

At Our World in Data, we are building an extensive dataset of inequality and poverty indicators, pulling together multiple sources to provide as comprehensive a view as possible.

To make it easier to navigate this wide range of data, below we provide links to a set of Data Explorers that allow you to explore a very detailed range of indicators and compare data across sources. The explorers draw from three prominent sources, each offering a global perspective on poverty and inequality: the World Bank Poverty and Inequality Platform, the Luxembourg Income Study, and the World Inequality Database. Information about the definitions and methods behind the data from each of these sources is provided at the bottom of this page.

The detailed data contained in the explorers collected below is intended for experts or researchers who are already quite familiar with the measures and concepts involved. Users with a more general interest are likely to benefit more from the Data Explorers shown in our topic pages on Inequality and Poverty. These provide an overview of the key indicators from this collection.

Poverty indicators:

Inequality indicators:

Incomes across the distribution:

Details on the methods used by each data source

World Bank PIP

The World Bank Poverty and Inequality Platform (PIP) is an interactive website and API that the World Bank uses to share the estimates it produces as part of its work monitoring global poverty, inequality and shared prosperity.

Here we summarize some key aspects of the definitions and methods used in the platform’s data.

For a more detailed discussion, see the World Bank PIP methodology document.

Welfare measure

The PIP data relates to a mix of after-tax income and consumption, depending on the country and year. In most high-income countries the data relates to after-tax income, while in poorer countries it tends to relate to consumption.

The World Bank pools the data to get a global picture of poverty and inequality. But it’s essential to remember that, depending on the country or year, somewhat different things are being measured.

In the Data Explorers of the World Bank data above, we provide the option of plotting these after-tax income and consumption data points separately.

The World Bank PIP data provides no indicators in terms of before-tax income.

To make absolute comparisons of living standards across countries and over time, the World Bank converts the survey data – measured in local currencies at current prices – into constant international dollars. The World Bank data shown above is all measured in 2017 international dollars.
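The conversion described above can be sketched in two steps, assuming it works by first deflating with a consumer price index to 2017 prices and then dividing by a 2017 purchasing power parity (PPP) conversion factor. The function and parameter names are hypothetical. The World Bank's actual procedure, including which price indices it uses for each country, is set out in its methodology documentation.

```python
def to_constant_intl_dollars(value_lcu, cpi_year, cpi_base, ppp_base):
    """Convert a value in local currency at current prices into constant
    international dollars of the base year (assumed here to be 2017).

    Illustrative assumptions, not the World Bank's exact procedure:
      cpi_year -- consumer price index in the survey year
      cpi_base -- consumer price index in the base year (2017)
      ppp_base -- 2017 PPP factor, in local currency units per international dollar
    """
    # Step 1: re-express the value in base-year (2017) local prices
    value_at_base_prices = value_lcu * cpi_base / cpi_year
    # Step 2: convert local currency into international dollars
    return value_at_base_prices / ppp_base
```

For example, 300 units of local currency in a year when prices are 20% above their 2017 level, with a 2017 PPP factor of 25, correspond to 300 × 100 / 120 / 25 = 10 international dollars at 2017 prices.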

Primary data sources

The World Bank PIP estimates are derived from a large collection of household surveys.

In addition to the difference between income and consumption data mentioned above, there are several other ways in which comparability across household surveys can be limited, both across countries and over time. In collating this survey data, the World Bank takes various steps to harmonize it where possible, but comparability issues remain. The PIP Methodology Handbook provides a good summary of the comparability and data quality issues affecting this data and how the World Bank tries to address them.

To help communicate this limitation, the World Bank produces a companion indicator that groups the data points for each country into ‘spells’. World Bank researchers consider the surveys underlying the data within a given spell to be more comparable with one another. In the Data Explorers of the World Bank data above, we provide the option of plotting the data with the breaks between spells shown.
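Plotting a series with breaks between spells amounts to splitting each country's time series wherever the spell changes. Here is a minimal sketch. It assumes each data point is stored as a (year, value, spell_id) tuple, where spell_id is a hypothetical representation of the World Bank's spell indicator.

```python
from itertools import groupby


def split_into_spells(points):
    """Split one country's time series into segments, one per spell.

    points: list of (year, value, spell_id) tuples (hypothetical layout).
    Consecutive points sharing a spell_id form one line segment; a break
    is drawn wherever the spell_id changes.
    """
    ordered = sorted(points, key=lambda point: point[0])  # sort by year
    segments = []
    for _spell, group in groupby(ordered, key=lambda point: point[2]):
        segments.append([(year, value) for year, value, _spell_id in group])
    return segments
```

Each returned segment can be drawn as a separate line, so no line connects two data points from different spells.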

Accounting for resource sharing within households

The surveys on which the World Bank estimates are based are conducted at the household level. The income or consumption reported in the survey data sums across all household members.

In calculating its poverty and inequality indicators, the World Bank uses per capita income or consumption: it attributes an equal share of household income to each member – adults and children.

Methods and assumptions applied

In some cases, the raw household survey data itself is not made available to the World Bank. In these cases, its estimates are based on ‘grouped data’ – tabulations of the average incomes of richer and poorer segments of the population. To produce its poverty and inequality estimates, the World Bank fits a distribution to this grouped data by making certain assumptions about the shape of that distribution.
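The World Bank's own procedure for grouped data is documented in its methodology handbook. The sketch below uses a simpler stand-in to illustrate the general idea. It assumes that income follows a lognormal distribution and fits that distribution's single shape parameter to quintile income shares. It then uses the fitted distribution and the survey mean to read off the share of people below a poverty line. The lognormal assumption, the grid search and all function names are illustrative choices, not the World Bank's method.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def lognormal_lorenz(p, sigma):
    """Lorenz curve of a lognormal distribution: L(p) = Phi(Phi^-1(p) - sigma)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p) - sigma)


def fit_sigma(group_shares, max_sigma=5.0, steps=5000):
    """Grid-search the lognormal shape parameter that best matches grouped data.

    group_shares: income shares of equal-sized population groups (e.g. quintiles),
    ordered from poorest to richest and summing to 1.
    """
    n = len(group_shares)
    # Cumulative population share and cumulative income share at each group boundary
    points = []
    cumulative = 0.0
    for i, share in enumerate(group_shares[:-1], start=1):
        cumulative += share
        points.append((i / n, cumulative))

    best_sigma, best_error = max_sigma / steps, math.inf
    for k in range(1, steps + 1):
        sigma = max_sigma * k / steps
        # Squared distance between the fitted and observed Lorenz curve points
        error = sum((lognormal_lorenz(p, sigma) - share) ** 2 for p, share in points)
        if error < best_error:
            best_sigma, best_error = sigma, error
    return best_sigma


def headcount_ratio(mean_income, sigma, poverty_line):
    """Share of the population below the poverty line under the fitted lognormal."""
    # A lognormal has mean exp(mu + sigma^2 / 2), so mu = ln(mean) - sigma^2 / 2
    mu = math.log(mean_income) - sigma ** 2 / 2
    return STD_NORMAL.cdf((math.log(poverty_line) - mu) / sigma)
```

A one-parameter form like this cannot reproduce every observed Lorenz curve, so the resulting estimates are only as good as the assumption about the shape of the distribution.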

As we discuss more in this article, a well-known issue with household survey data is that the incomes of the richest are often poorly captured. This can lead to underestimates of inequality. Statistical offices organizing household surveys may adopt various strategies to minimize this, but this varies across countries and over time. In processing this survey data, the World Bank takes no steps to further correct the problem of missing top incomes. As such, inequality indicators based on this data – particularly those sensitive to the top, such as top income shares – may, in many cases, underestimate inequality.

Luxembourg Income Study

The Luxembourg Income Study (LIS) is a collection of household survey data, in which the raw data produced by different statistical offices is reorganized to make it more comparable. One particular benefit of LIS is that it provides access to the ‘microdata’ – the data for particular individuals and households participating in the survey.

From this microdata, Our World in Data calculates a range of poverty and inequality indicators.

Welfare measure

We calculate poverty and inequality indicators for both after-tax and before-tax income. Our definitions align with those used in LIS’ DART data visualization tool and their Key Figures estimates, described here.

As a measure of after-tax income, we use their measure of ‘disposable household income’. This refers to “cash and non-cash income from labor, income from capital, income from pensions (including private and public pensions) and non-pension public social benefits stemming from insurance, universal or assistance schemes (including in-kind social assistance transfers), as well as cash and non-cash private transfers, after deduction of the amount of income taxes and social contributions paid”.

As a measure of before-tax income we use their measure of ‘market income’. This refers to “income received by the households before public redistribution takes place; it includes cash and non-cash income from labor, income from capital, income from private pensions, as well as cash and non-cash private transfers, before deduction of income taxes and social contributions paid”.¹

In order to make absolute comparisons of standards of living across countries and over time, we convert the data – measured in local currencies at current prices – into constant international dollars. The LIS data shown above is all measured in 2017 international dollars.

Primary data sources

The LIS data is a ‘harmonized’ collection of household survey data. This means that the raw data produced by different statistical offices has been reorganized to align the concepts behind the data as much as possible.

The underlying survey data is, however, very heterogeneous, and not all comparability issues can be resolved. To communicate these issues, LIS has released the very helpful Compare.it tool, which provides very detailed comparability notes for each country.

Accounting for resource sharing within households

The surveys LIS collates are conducted at the household level. The income or consumption reported in the survey data sums across all members of the household.

From the LIS microdata, we calculate poverty and inequality indicators based on two approaches for accounting for resource sharing within households:

Per capita income: here, each member of the household (both adults and children) is attributed an income equal to total household income divided by the number of household members.
Equivalized income: on this basis, incomes are adjusted to account for the fact that people in the same household can share costs like rent and heating. We use the ‘square root’ equivalence scale to make this adjustment: each household member (both adults and children) is attributed an income equal to the total household income divided by the square root of the number of household members.
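The two approaches differ only in the divisor applied to total household income. Here is a minimal sketch. It uses hypothetical function names and represents each household as a (total income, number of members) pair. Survey weights, which a real calculation on the LIS microdata would apply, are left out for simplicity.

```python
import math


def per_capita_income(household_income, household_size):
    """Each member, adult or child, gets total income / number of members."""
    return household_income / household_size


def equivalized_income(household_income, household_size):
    """Square-root equivalence scale: total income / sqrt(number of members)."""
    return household_income / math.sqrt(household_size)


def person_level_incomes(households, measure):
    """Expand household records into one income value per person.

    households: list of (total_income, size) pairs. Every member of a
    household is attributed the same value, so each household contributes
    `size` entries to the output.
    """
    incomes = []
    for total_income, size in households:
        incomes.extend([measure(total_income, size)] * size)
    return incomes
```

For a four-person household with a total income of 40,000, each member is attributed 10,000 on a per capita basis and 20,000 (40,000 / √4) on an equivalized basis. For any household with more than one member, the equivalized figure is the higher of the two, reflecting the shared costs the scale is meant to capture.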

Methods and assumptions applied

LIS provides very detailed documentation of how they process the original survey data on two dedicated metadata platforms: METIS and the Compare.it tool.

In calculating inequality and poverty estimates from the LIS microdata, we apply the same ‘top-’ and ‘bottom-coding’ procedure as used by LIS to calculate their summary statistics presented on their website – both the LIS ‘Key Figures’ and the DART interactive visualization tool. This is done to remove extreme values from the raw survey data and to make the data across countries more comparable. For a more detailed discussion of why this is done, the methods used, and how it impacts resulting estimates, see the helpful explainer from LIS.
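In general terms, top- and bottom-coding clamps incomes that fall outside a lower and an upper threshold to those thresholds. Here is a minimal sketch. Its default thresholds (1% of mean income and 10 times median income) are illustrative assumptions only. The exact thresholds LIS uses, and the income concepts they are computed on, are described in the LIS explainer. The sketch is unweighted.

```python
from statistics import median


def top_bottom_code(incomes, bottom_share_of_mean=0.01, top_multiple_of_median=10.0):
    """Clamp extreme incomes to a lower and an upper threshold.

    The default thresholds are illustrative assumptions, not necessarily
    the values LIS applies; survey weights are ignored.
    """
    mean_income = sum(incomes) / len(incomes)
    bottom = bottom_share_of_mean * mean_income          # lower threshold
    top = top_multiple_of_median * median(incomes)       # upper threshold
    # Values below `bottom` are raised to it; values above `top` are lowered to it
    return [min(max(x, bottom), top) for x in incomes]
```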

A well-known issue with household survey data is that the incomes of the richest are often poorly captured. This can lead to underestimates of inequality. Statistical offices organizing household surveys may adopt various strategies to minimize this, but this varies across countries and over time. In processing this survey data, the Luxembourg Income Study itself takes no steps to try to further correct the problem of missing top incomes. As such, inequality indicators based on this data – particularly those sensitive to the top, such as top income shares – may, in many cases, underestimate inequality.

World Inequality Database

The World Inequality Database (WID) is an extensive database on the distribution of income and wealth maintained by the World Inequality Lab (WIL), located at the Paris School of Economics (PSE). The database is the result of a collaborative effort involving many researchers worldwide.

Primary data sources, welfare measures, and methods

A distinctive feature of the WID data is the broad range of raw data sources it draws on.

Most sources of inequality data draw exclusively on household surveys. As we discuss more in this article, a downside of this approach is that the incomes of the richest are often poorly captured in survey data. This can lead to underestimates of inequality, particularly for measures focused on the top of the distribution, such as the share of income of the richest 1%.

The WID database emerged from the substantial literature on ‘top incomes’ that sought to address this shortcoming of survey data by relying instead on data obtained from tax records, or tabulations of such data released by tax authorities. The use of such tax data often limited what concept of income could be analyzed. Inequality estimates produced within the ‘top incomes’ literature have generally been measured in terms of before-tax income, with the exact definitions varying due to differences in the tax system across countries or over time. Since, in many places or periods, it is only a relatively small population of high-earning individuals that file tax returns, the use of tax data also required a focus on the top of the income distribution – for example, on the share of income received by the top 1 or 10%.²

These methodologies have continued to develop, and the WID database has established a more standardized set of methods. Within this approach, tax data is combined with data from household surveys and national accounts to produce Distributional National Accounts (DINA). The survey and tax data are used to understand how different income components are distributed across the population. This is then scaled to match the aggregates given in national accounts. This allows WID to account for income missing from tax and survey data – notably, the profits of firms that are not distributed to shareholders – and to provide a more consistent basis for international comparisons. Using this approach, inequality estimates can be produced not only for top pre-tax income shares but across the whole distribution, according to a range of different income concepts – including after-tax income.
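The scaling step can be illustrated with a deliberately simple sketch. It assumes that each income component observed in the survey and tax data is scaled up in proportion so that it sums to its national accounts total. WID's actual methods are more involved and vary by country and period. Components that are entirely absent from the micro-data, such as undistributed profits, need an allocation rule of their own, which this sketch does not cover. The function name and data layout are hypothetical.

```python
def scale_to_national_accounts(component_incomes, national_totals):
    """Proportionally rescale each income component to its national accounts total.

    component_incomes: dict mapping component name -> list of per-person
        amounts from survey/tax data (all lists in the same person order).
    national_totals: dict mapping component name -> national accounts total.
    Returns one total income per person after scaling.
    """
    n_people = len(next(iter(component_incomes.values())))
    totals = [0.0] * n_people
    for component, amounts in component_incomes.items():
        observed = sum(amounts)
        # Ratio between the national accounts aggregate and what the micro-data captures
        factor = national_totals[component] / observed
        for i, amount in enumerate(amounts):
            totals[i] += amount * factor
    return totals
```

Because each component gets its own scaling factor, components that are more concentrated among the rich (such as capital income) can shift the distribution when they are scaled up more than others.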

Another general difference between WID and other main data sources on inequality is that its methodological approach is aimed more at describing the distribution of earnings itself, rather than the distribution of welfare this income generates.

Because of these differences in the goals and raw data sources used by WID, some definitions and methods they use differ from those of other providers of inequality data.

For example, the scaling of incomes to match national accounts aggregates means that the absolute income levels reported in WID data are much higher across the distribution than in sources based on survey data alone. This reflects the very different income concepts being measured. The after-tax income concept used in the data presented above includes, for example, the value of public services like schools, hospitals or the armed forces.³

Another example is WID’s unusual approach to accounting for pensions in before- and after-tax income concepts. Typically, public pensions are considered part of the redistribution achieved by governments; private pensions are not.⁴ The before-tax income concept we present in the data above is described by WID as ‘pre-tax, post-replacement’ income. It measures income after the operation of both public and private pension systems. This unusual definition of income is used to yield more consistent comparisons across countries, less impacted by the different ways countries organize pensions.

It is worth pointing out that, at its fullest, the Distributional National Accounts approach is very data intensive. At the same time, WID aims to provide wide coverage across countries and time. As such, for many countries and periods, the raw data required to produce DINAs according to the full methodology is often lacking. Depending on data availability, the way the general approach is implemented in particular countries and periods varies considerably. To document the different assumptions and methods applied in particular cases, WID provides notes on its methodology by country.

Accounting for resource sharing within households

The WID data we gather above counts adult individuals only. For example, the income share of the richest 1% means the income received by the richest 1% of adult individuals as a share of income received by all adult individuals.
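The definition of a top income share can be made concrete with a minimal, unweighted sketch. It ranks adult individuals by income and divides the income of the richest fraction by the total. The function name is hypothetical, and the size of the top group is rounded to a whole number of people. Real estimates apply population weights and handle fractional group sizes more carefully.

```python
def top_share(incomes, top_fraction=0.01):
    """Share of total income received by the richest `top_fraction` of individuals.

    Unweighted sketch: the top group size is rounded to the nearest whole
    number of people (at least one).
    """
    ranked = sorted(incomes, reverse=True)  # richest first
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)
```

If the richest incomes are missing from the data, the numerator of this ratio shrinks much more than the denominator. This is why top shares are especially sensitive to the under-coverage of top incomes discussed above.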

The focus on adult individuals means that this data does not make any adjustment to account for the number of children that a person’s income needs to support. This is another reason why the absolute income levels reported in WID data are much higher than in other sources.

The underlying raw data partly drive this choice: in some cases, the reliance on tax data can make it difficult to identify whole households. But it also reflects the goal of the WID data: to measure the distribution of what people earn, rather than the welfare this income generates.

We use the ‘equal split’ data from WID. On this basis, income earned by adults living together is summed up and then allocated equally between them. Depending on the country, this income sharing is done either amongst couples or else amongst all adults living in a household (e.g., both parents and grandparents in multi-generational households). Income is un-equivalized in this data: it makes no adjustment to account for any sharing of costs that multi-person households may benefit from with respect to single-person households.
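The equal-split allocation can be sketched as follows. Adults are represented as (unit_id, income) pairs, where unit_id is a hypothetical identifier for the sharing unit: a couple or a whole household, depending on the country. In line with the description above, children are not counted and no equivalence scale is applied.

```python
from collections import defaultdict


def equal_split(adults):
    """Allocate income equally among the adults of each sharing unit.

    adults: list of (unit_id, income) pairs, one per adult. unit_id is a
    hypothetical identifier for a couple or household. Children are not
    counted, and income is not equivalized.
    """
    unit_totals = defaultdict(float)
    unit_sizes = defaultdict(int)
    for unit_id, income in adults:
        unit_totals[unit_id] += income
        unit_sizes[unit_id] += 1
    # Each adult receives the unit's total income divided by its number of adults
    return [unit_totals[unit_id] / unit_sizes[unit_id] for unit_id, _income in adults]
```

In a couple where one partner earns 60,000 and the other earns nothing, each adult is attributed 30,000, regardless of how many children live with them.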

Endnotes

1. We calculate before-tax (‘market’) income as the sum of income from labor and capital (LIS code: ‘hifactor’), private cash transfers and in-kind goods and services provided (hiprivate), and private pensions (hi33). We only calculate before-tax income for surveys in which the required data on tax and contributions are fully captured (including where it has been imputed).
2. Anthony Atkinson, Thomas Piketty, and Emmanuel Saez (2011) provide a good overview of initial research efforts in this field. Atkinson, Anthony B., Thomas Piketty, and Emmanuel Saez. ‘Top Incomes in the Long Run of History’. Journal of Economic Literature 49, no. 1 (March 2011): 3–71. Available here.
3. WID also produces estimates of after-tax income in which collective expenditures and in-kind transfers like these are not added to individuals’ income, though these are available with lower coverage.
4. This binary distinction is a simplification. It does not capture well the many different types of pensions: for example, whether an employer contributes or not, whether it is organized collectively or individually, and whether it is mandatory or not. The exact treatment of each kind can vary somewhat between data sources.
5. As noted in the World Bank PIP methodology documentation.
6. It’s important to realize that ‘monetary’ poverty also captures sources of income that do not involve money. For example, standard monetary measures of poverty account for the value of food that subsistence farmers grow for their own consumption.

Cite this work

Our articles and data visualizations rely on work from many different people and organizations. When citing this article, please also cite the underlying data sources. This article can be cited as:

Joe Hasell and Pablo Arriagada (2023) - “OWID Data Collection: Inequality and Poverty” Published online at OurWorldinData.org. Retrieved from: 'https://ourworldindata.org/owid-data-collection-inequality-and-poverty' [Online Resource]

BibTeX citation

@article{owid-owid-data-collection-inequality-and-poverty,
    author = {Joe Hasell and Pablo Arriagada},
    title = {OWID Data Collection: Inequality and Poverty},
    journal = {Our World in Data},
    year = {2023},
    note = {https://ourworldindata.org/owid-data-collection-inequality-and-poverty}
}

Reuse this work freely

All visualizations, data, and code produced by Our World in Data are completely open access under the Creative Commons BY license. You have permission to use, distribute, and reproduce these in any medium, provided the source and authors are credited.

The data produced by third parties and made available by Our World in Data is subject to the license terms from the original third-party authors. We will always indicate the original source of the data in our documentation, so you should always check the license of any such third-party data before use and redistribution.

All of our charts can be embedded in any site.