To make progress against the outbreak of the Coronavirus disease – COVID-19 – we need to understand how the pandemic is developing.
For this we need reliable and timely data.
At Our World in Data we have therefore focused our work on bringing together the research and statistics on the COVID-19 outbreak – you can find our work on the pandemic here.
Why we stopped relying on data from the World Health Organization
Up until 18 March 2020, we relied on the World Health Organization (WHO) as our only data source. The WHO publishes daily Situation Reports, and we went through them every day to convert these publications (single PDF files with printed tables) into consistent data files. We undertook this extra effort so that we could rely on the UN agency mandated to provide data on the COVID-19 outbreak.
Unfortunately, on 18 March 2020, Situation Report 58 shifted the reporting cutoff time. This change compromised comparability with earlier figures. There is now an overlap between the last two WHO data releases (Situation Reports 57 and 58): because of the cutoff change, the hours between midnight and 9am (CET) are reported in both reports.
This, together with a series of small errors we found in the WHO dataset, means that we believe it is currently not possible to understand how the pandemic is developing based on the data published by the WHO.
Why we now rely on the data from the European Centre for Disease Prevention and Control (ECDC)
When we realized late at night on March 18th that we could not continue to rely on the WHO data, we brought together the data from the two other global COVID-19 sources (Johns Hopkins University and the ECDC) to compare these alternatives with the statistics reported by the WHO.
As we report below, we did not find major differences, and we therefore decided to rely on the consistently published and cleanly maintained data from the European Centre for Disease Prevention and Control (ECDC).
The ECDC publishes daily statistics on the COVID-19 pandemic, not just for Europe but for the entire world.
It is the EU agency that aims to strengthen Europe’s defense against infectious diseases. Their website is here.
To make this change, we had to republish all of the underlying data of our work, so our charts remain consistent across countries and time. We made this change at 15:30 London time on March 19.
Our standards for selecting data continue to apply as the available data on COVID-19 changes rapidly. Our choice to report the ECDC data followed an analysis of competing sources. We summarize this analysis here in case it is also useful to others.
We will continue exploring and reporting on the relative merits and limitations of existing data sources as new statistical information becomes available.
There are three key sources providing regular updates of COVID-19 cases and deaths globally and by country: Johns Hopkins University; the World Health Organization (WHO); and the European Centre for Disease Prevention and Control (ECDC).
We describe each source, and the differences between them, in more detail in the section below.
Johns Hopkins University is doing very valuable work, but the list of open issues is long; they can be found here. For meticulous readers who care about getting data sources right, it could be very valuable to support the efforts of the researchers at Johns Hopkins (via the same link).
The two visualizations here allow you to compare the total number of confirmed cases and deaths across the three sources over time.
These data are shown up to March 17, the date of our last WHO update. Up-to-date data beyond March 17 can always be found on our main page on COVID-19.
These charts are interactive: you can make comparisons between the three sources for all countries around the world using the “Change country” toggle in the bottom-left of the chart.
The upward shift in the global series for cases in mid-February is due to a reporting change in China. The WHO reported this change later than the other two sources.
Apart from this, the three sources report a similar global picture.
The trends are very close, but Johns Hopkins’ numbers are higher than the numbers of the WHO and the ECDC. This may be because Johns Hopkins also includes estimates of ‘presumptive positive cases’. Presumptive positive cases are those that have been confirmed by state or local labs, but not by national labs (e.g. the US CDC).
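One simple way to quantify how far the sources diverge is to line up their cumulative series by date and compute the relative difference of one source against a reference source. The sketch below is a minimal illustration, assuming each source has already been loaded into a dictionary that maps ISO dates to cumulative confirmed cases; the function name `relative_differences` and the choice of which source serves as the reference are hypothetical, not part of any source's published tooling.

```python
from typing import Dict, Optional

# Assumed input shape for each source: {"2020-03-17": <cumulative cases>, ...}
Series = Dict[str, int]


def relative_differences(
    reference: Series, other: Series
) -> Dict[str, Optional[float]]:
    """Return (other - reference) / reference for every date both series share.

    Dates missing from either series are skipped. A reference value of zero
    yields None, because the relative difference is undefined there.
    """
    result: Dict[str, Optional[float]] = {}
    for date in sorted(reference.keys() & other.keys()):
        ref_value = reference[date]
        if ref_value == 0:
            result[date] = None
        else:
            result[date] = (other[date] - ref_value) / ref_value
    return result
```

With the WHO series as `reference` and the Johns Hopkins series as `other`, positive values would indicate days on which Johns Hopkins reports more cases, which is the pattern one would expect if it also counts presumptive positive cases.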
Global COVID-19 data: source by source
World Health Organization – daily Situation Reports
- Link: WHO Situation Reports
- Reporting time and frequency: Situation Reports are published daily. Up to and including Report 57, all Situation Reports covered data up until 0900 CET of the same day. Starting with Report 58, the data cutoff was shifted to 0000 CET. This affects all reports published from 18 March 2020 onwards.
- Metrics covered: By country: Total confirmed cases; daily new cases; total confirmed deaths; daily new deaths.
- Metrics not covered: Tests, recoveries, or a breakdown of confirmed cases.
- Underlying source of data: Direct reports from national governments
World Health Organization – API (feeds their Dashboard)
- Link: WHO Health Emergency Dashboard
- Reporting time and frequency: The data feeding the live WHO dashboard is updated three times per day. Based on our monitoring of the dashboard, we have identified the following update times: 8am, 10am, and 4pm Geneva time (CET). These times appear to have changed as a consequence of the new reporting cutoff for the Situation Reports (see source above).
- Metrics covered: By country: Total confirmed cases; daily new cases; total confirmed deaths (only the total up to the current update is available, totals as of previous days are not available).
- Metrics not covered: Daily deaths (it is not possible to see deaths over time); tests; recoveries; or a breakdown of confirmed cases.
- Underlying source of data: WHO, National Health Commission of the People’s Republic of China.
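Because the dashboard only exposes the current cumulative death total, a daily deaths series has to be built by recording snapshots over time and differencing them. The sketch below assumes snapshots are collected by some scheduled job (not shown) and stored as (timestamp, cumulative total) pairs; it treats the last snapshot taken on each calendar day as that day's total. The class name `SnapshotLog` is hypothetical and purely illustrative.

```python
from datetime import date, datetime
from typing import Dict, List, Tuple


class SnapshotLog:
    """Stores cumulative totals observed at different times (hypothetical helper)."""

    def __init__(self) -> None:
        self._snapshots: List[Tuple[datetime, int]] = []

    def record(self, observed_at: datetime, cumulative_total: int) -> None:
        """Save one observation of the dashboard's current cumulative total."""
        self._snapshots.append((observed_at, cumulative_total))

    def daily_totals(self) -> Dict[date, int]:
        """Keep the latest snapshot of each calendar day as that day's total."""
        latest: Dict[date, Tuple[datetime, int]] = {}
        for observed_at, total in self._snapshots:
            day = observed_at.date()
            if day not in latest or observed_at > latest[day][0]:
                latest[day] = (observed_at, total)
        return {day: total for day, (_, total) in sorted(latest.items())}

    def daily_new(self) -> Dict[date, int]:
        """Difference consecutive daily totals.

        The first recorded day has no baseline and is omitted. If a day has
        no snapshot at all, the next difference spans more than one day.
        """
        totals = list(self.daily_totals().items())
        return {
            day: total - prev_total
            for (_, prev_total), (day, total) in zip(totals, totals[1:])
        }
```

Which snapshot counts as "the" daily value is a design choice; given update times of 8am, 10am, and 4pm, taking the last one of the day keeps each day's figure as complete as possible.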
European Centre for Disease Prevention and Control (ECDC)
- Links: The ECDC provides three related but different COVID-19 resources
- Reporting time and frequency:
After monitoring their recent updates, our understanding of their data release process is as follows:
Step 1: They collect data in the mornings (6-10am CET) and upload it to the dashboard.
Step 2: They produce the CSV data export with the entire time series for each country, adding the data collected in Step 1.
Step 3: At 1pm CET, they upload the CSV file from Step 2.
Step 4: They update the Situation Update reports to reflect the new data from Step 3. This is done in the afternoon (2-4pm CET).
- Metrics: The data tables include daily new cases and daily new deaths, country by country. These are provided as complete time series in a CSV file, which allows cumulative cases and deaths to be reconstructed. Additionally, the dashboard includes a selection of countries (under “Enhanced Surveillance Area”) for which cases can be broken down by characteristics such as infection source, hospitalization status, and ICU care, as well as by age group and gender.
- Metrics not covered: Tests or recoveries.
- Underlying sources: Global data that the ECDC obtains from multiple sources, such as the websites of ministries and public health institutes.
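Since the ECDC CSV file contains daily new cases and deaths rather than running totals, cumulative figures have to be rebuilt by summing each country's daily values in date order. The sketch below assumes the columns are named `dateRep` (formatted DD/MM/YYYY), `countriesAndTerritories`, `cases`, and `deaths`; these names and the date format are assumptions to check against the actual download, and `cumulative_by_country` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# Assumed column names; verify against the actual ECDC download.
DATE_COL = "dateRep"
COUNTRY_COL = "countriesAndTerritories"


def cumulative_by_country(csv_text: str) -> Dict[str, List[Tuple[str, int, int]]]:
    """Rebuild cumulative cases and deaths per country from daily counts.

    Returns, for each country, a date-ordered list of
    (ISO date, cumulative cases, cumulative deaths).
    """
    rows_by_country: Dict[str, List[Tuple[datetime, int, int]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row[DATE_COL], "%d/%m/%Y")
        rows_by_country[row[COUNTRY_COL]].append(
            (day, int(row["cases"]), int(row["deaths"]))
        )

    result: Dict[str, List[Tuple[str, int, int]]] = {}
    for country, rows in rows_by_country.items():
        total_cases = total_deaths = 0
        series = []
        # Sort by date so totals accumulate chronologically, whatever the row order.
        for day, cases, deaths in sorted(rows):
            total_cases += cases
            total_deaths += deaths
            series.append((day.date().isoformat(), total_cases, total_deaths))
        result[country] = series
    return result
```

If the file contains negative daily values (for example, downward revisions), this sketch passes them through unchanged, so they reduce the running total.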
Johns Hopkins University
- Link: Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
- Reporting time and frequency: The map is maintained in near real time throughout the day through a combination of manual and automated updating. The time of the latest update is noted at the bottom of the dashboard. The GitHub database updates daily at around 11:59 p.m. UTC. The documentation notes that occasional maintenance can result in slower updates.
- Metrics covered: By country: Total confirmed cases; daily new cases; total confirmed deaths; daily new deaths; recovered (available via the API).
- Metrics not covered: Tests; breakdown of confirmed cases.
- Underlying source of data: The documentation notes: “Our primary data source is DXY, an online platform run by members of the Chinese medical community, which aggregates local media and government reports to provide COVID-19 cumulative case totals in near real-time at the province level in China and country level otherwise. Every 15 minutes, the cumulative case counts are updated from DXY for all provinces in China and affected countries and regions. For countries and regions outside mainland China (including Hong Kong, Macau and Taiwan), we found DXY cumulative case counts to frequently lag other sources; we therefore manually update these case numbers throughout the day when new cases are identified. To identify new cases, we monitor various twitter feeds, online news services, and direct communication sent through the dashboard. Before manually updating the dashboard, we confirm the case numbers using regional and local health departments, namely the China CDC (CCDC), Hong Kong Department of Health, Macau Government, Taiwan CDC, European CDC (ECDC), the World Health Organization (WHO), as well as city and state level health authorities. For city level case reports in the U.S., Australia, and Canada, which we began reporting on February 1, we rely on the US CDC, Government of Canada, Australia Government Department of Health and various state or territory health authorities. All manual updates (outside mainland China) are coordinated by a team at JHU.”
- Possible sources of discrepancy with other sources: The documentation notes that confirmed cases include presumptive positive cases. Presumptive positive cases are those that have been confirmed by state or local labs, but not by national labs (e.g. CDC). This may help explain why their figures can sometimes be slightly higher than those published by the WHO.
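Because the Johns Hopkins data mixes province-level rows (for China, for example) with country-level rows, comparing it with country-level sources requires summing provinces up to countries first. The sketch below assumes a wide time-series CSV with the columns `Province/State`, `Country/Region`, `Lat`, and `Long`, followed by one cumulative-count column per date in chronological order; this layout is an assumption about the GitHub files to check before use, and `country_daily_new` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Assumed non-date columns in the wide time-series file.
ID_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def country_daily_new(csv_text: str) -> Dict[str, Dict[str, int]]:
    """Sum cumulative counts over provinces, then difference them by date.

    Date columns are kept in file order, which is assumed to be chronological.
    The first date has no previous value and is therefore omitted.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns: List[str] = [
        col for col in (reader.fieldnames or []) if col not in ID_COLUMNS
    ]

    # country -> date column -> cumulative total summed over provinces
    cumulative: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in reader:
        country = row["Country/Region"]
        for date_col in date_columns:
            cumulative[country][date_col] += int(row[date_col] or 0)

    daily: Dict[str, Dict[str, int]] = {}
    for country, totals in cumulative.items():
        daily[country] = {
            curr: totals[curr] - totals[prev]
            for prev, curr in zip(date_columns, date_columns[1:])
        }
    return daily
```

Aggregating before differencing means a country's daily figure is the change in its summed total, which matches how the country-level sources above report new cases.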