epidemic curve

Communicating data effectively with data visualizations: Part 24 (Mortality Curves)

INTRODUCTION

The continual threat of infection by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has brought the world to a virtual standstill. Economies appear to be under threat of a long recession, political debates have delayed needed relief to citizens, a lack of N95 masks places healthcare workers at greater risk while doing their jobs, and mortality has increased worldwide. We are undoubtedly experiencing a seminal period in the 21st century, and data analysts have rushed to develop stunning visuals and dashboards, such as the ones developed by Johns Hopkins University, “Our World in Data,”[1] and the Centers for Disease Control and Prevention, to show the impact SARS-CoV-2 is having on our world.

 

MOTIVATING EXAMPLE

In this article, we will replicate a plot of total deaths due to SARS-CoV-2 using data from the European Centre for Disease Prevention and Control (ECDC)[2] or from Our World in Data’s GitHub site.[3] Because SARS-CoV-2 data change daily, the data used in this exercise will be out of date by the time you read it. Please visit the ECDC site to download the most recent SARS-CoV-2 data.

The figure we will replicate is one posted on the “Our World in Data” website and looks like the following:

Source: Our World in Data. Total confirmed COVID-19 deaths: How rapidly are they increasing? URL: https://ourworldindata.org/grapher/covid-confirmed-deaths-since-5th-death [last accessed: 17 April 2020].

We will download data from the ECDC and then use Excel to recreate this plot for several countries (we won’t create plots for all the countries, but you can feel free to do so by taking advantage of the available data).

Step 1. Download data from the ECDC.

You can download the raw data from the ECDC’s site here. Alternatively, you can also download the cleaned data for this article here (I cleaned the data and prepared them for use in Excel).

The data have the following format:

Figure 2.png

Each column represents a country, and each row represents the total number of deaths on a given day after the 5th confirmed death.
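If you start from the raw data rather than the cleaned file, the alignment step can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used to prepare the file for this article: the function name `align_since_threshold`, the country names, and the numbers are all made up, and the threshold of 5 deaths is taken from the chart's X-axis label.

```python
def align_since_threshold(cumulative_deaths, threshold=5):
    """Return the series starting on the first day the total reaches `threshold`."""
    for i, total in enumerate(cumulative_deaths):
        if total >= threshold:
            return cumulative_deaths[i:]
    return []  # the threshold was never reached


# Made-up cumulative totals by calendar day (NOT ECDC data).
raw = {
    "Country A": [0, 1, 3, 5, 9, 14, 22],
    "Country B": [0, 0, 0, 2, 6, 11],
}

# Every country now starts at "day 0", the day of its 5th death.
aligned = {name: align_since_threshold(series) for name, series in raw.items()}

# Build the wide table: one row per day, one column per country.
header = ["Days since the 5th total confirmed death"] + list(aligned)
n_days = max(len(series) for series in aligned.values())
rows = []
for day in range(n_days):
    row = [day]
    for series in aligned.values():
        # Leave the cell blank once a country runs out of data.
        row.append(series[day] if day < len(series) else "")
    rows.append(row)

print(header)
for row in rows:
    print(row)
```

Blank cells for countries with shorter series are a deliberate choice here: Excel leaves gaps for empty cells instead of drawing the line down to zero.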

 

Step 2. Select all of the data and insert a line chart.

Once all the data have been downloaded, select all of them. Insert a line chart and Excel will automatically generate a figure for you. This figure will need to be edited further, but Excel does a good job of plotting the total number of deaths (Y-axis) against time (X-axis).

Right-click in the chart region and click on the Select Data option. You want to de-select the “Days since the 5th total confirmed death” series because it is not a value of interest; it supplies the values for the X-axis (time).

Once you de-select the “Days since the 5th total confirmed death,” your line chart should look like the following figure.

Step 3. Changing the scale on the Y-axis.

In the figure from Our World in Data, the values for total deaths are plotted on a log-scale. When Excel generates the line chart, it automatically uses a linear scale on the Y-axis. To change this, right-click on the Y-axis, select Format Axis..., and check the Logarithmic scale option under Axis Options.

Once you make the changes to the Y-axis scale, the line chart should now look similar to the one from Our World in Data.
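To see why the log-scale is useful here, consider a hypothetical series that doubles every day. The short Python sketch below uses invented numbers, not the article's data. It shows that the day-to-day increases keep growing on a linear scale, while on a base-10 log-scale every step has the same size, so steady exponential growth plots as a straight line.

```python
import math

# Hypothetical cumulative deaths that double every day, starting at 5.
deaths = [5 * 2 ** day for day in range(6)]

# The same values after a base-10 log transform.
log_deaths = [math.log10(d) for d in deaths]

# Day-to-day change on each scale.
linear_steps = [b - a for a, b in zip(deaths, deaths[1:])]
log_steps = [b - a for a, b in zip(log_deaths, log_deaths[1:])]

print(deaths)        # the raw values
print(linear_steps)  # increases get larger every day
print([round(s, 3) for s in log_steps])  # increases are constant (log10 of 2)
```

On a log-scale, a flattening line therefore means the growth rate itself is slowing, which is hard to judge from a linear chart.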

Step 4. Adding the axes labels and formatting the lines.

Once the line chart’s Y-axis has been transformed into a log-scale, you can make changes to the axes labels and the line formatting. Select the Design tab to make changes to the Y-axis label. Select Add Chart Element to open the drop-down menu, then select Axis Title followed by the Primary Vertical option. This will allow you to make changes to the Y-axis label.

Change the Y-axis label to read, “Log number of COVID-19 deaths.” Do the same thing for the X-axis label, but change it to read “Days since 5th death occurred.” Your figure should look like the following.

You can make the lines thinner by right-clicking on one of them (e.g., China) to open the options menu. Select Format Data Series… and then adjust the Width to 1.5 points. Thinner lines remain easy to see while taking up less space in the chart.

Repeat this process for all the lines in the chart. Once you have completed that, the line chart should look like the following.

Step 5. Now all that’s left is changing some of the aesthetics.

The final line chart replicates the figure from Our World in Data and includes a reference line for a doubling in the number of deaths. The reference line (“Doubling every 5 days”) was created using a base of 10 on a log-scale to replicate a doubling of that value every 5 days. The reference line was also placed on a secondary axis to create a continuous line (plotting it on the same axis as the countries would have yielded gaps because it only has a value every 5 days). In the Design tab, you can add the secondary horizontal axis to match the primary horizontal axis (time). The font color of the secondary vertical and horizontal axes was changed to white to hide them from view and to clean up the final figure.

Gridlines were added along with the label for the reference line, which marks the pace at which total confirmed SARS-CoV-2 deaths would double every five days.
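The values behind a “doubling every 5 days” reference line can be computed directly. The Python sketch below is one way to do it, not necessarily how the line in the Excel file was built. It assumes the line starts at 5 deaths on day 0, to match the chart's starting point, and the helper names `reference_line` and `doubling_time` are hypothetical.

```python
import math


def reference_line(days, start=5, doubling_days=5):
    """Value of a curve that starts at `start` and doubles every `doubling_days`."""
    return [start * 2 ** (d / doubling_days) for d in days]


def doubling_time(n_start, n_end, elapsed_days):
    """Doubling time implied by growth from n_start to n_end over elapsed_days."""
    return elapsed_days * math.log(2) / math.log(n_end / n_start)


# Reference values at 5-day intervals: each one is twice the previous one.
print(reference_line(range(0, 21, 5)))

# A country that goes from 5 to 40 deaths in 15 days has doubled three times,
# which corresponds to a doubling time of about 5 days.
print(round(doubling_time(5, 40, 15), 2))
```

Computing a value for every day (for example, `reference_line(range(0, 21))`) gives a series with no empty cells, which is another way to avoid gaps in the plotted line.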

Here is the final chart after some formatting changes were made.

Conclusions

Although we plotted total deaths from SARS-CoV-2 using existing data, the plot was limited to five countries. More countries can be added using the available data, and you are encouraged to plot the other countries as an exercise. The reference line shows what a doubling of deaths every five days looks like on a log-scale and gives readers a benchmark against which each country’s trajectory can be compared. China seems to have controlled its total number of deaths, but there is a spike at day 87 that shows an increase in deaths. This may be due to reporting error or a change in how deaths were defined. The US and the European countries are trailing the rest of the world in containing the SARS-CoV-2 pandemic. Hopefully, this type of data visualization will help decision makers develop policies that mitigate the impact of SARS-CoV-2 on mortality in any part of the world.

You can download the data and the completed exercise here.

References

  1. Roser M, Ritchie H, Ortiz-Ospina E, Hasell J. Coronavirus Disease (COVID-19) – Statistics and Research. Our World in Data. March 2020. https://ourworldindata.org/coronavirus. Accessed April 17, 2020.

  2. European Centre for Disease Prevention and Control. Download today’s data on the geographic distribution of COVID-19 cases worldwide. European Centre for Disease Prevention and Control. https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide. Published April 18, 2020. Accessed April 18, 2020.

  3. Our World in Data. GitHub: Owid/Covid-19-Data. Our World in Data; 2020. https://github.com/owid/covid-19-data. Accessed April 18, 2020.

Communicating data effectively with data visualizations: Part 23 (Epidemic Curves)

INTRODUCTION

In December 2019, a novel strain of coronavirus was detected in Wuhan, the capital city of Hubei province in China. This coronavirus was designated as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It shares many characteristics with its earlier relative SARS-CoV-1,[1] which was first detected in 2003 and known simply as SARS. According to a recent study, clinical characteristics of patients in China who were infected with SARS-CoV-2 included fever (up to 88.7% of those who were hospitalized) and cough (67.8%).[2] The median age of patients infected was 47.0 years (IQR: 35.0-58.0) with a large distribution of 58.3% over the age of 50 years having severe symptoms. Additionally, the case fatality rate was reported to be 1.4%.

In January 2020, the World Health Organization (WHO) declared the SARS-CoV-2 outbreak a global health emergency.[3] As SARS-CoV-2 spread across the globe and became a pandemic, many countries started to report the number of attributable cases and deaths. According to the WHO, the global total of confirmed cases stood at 191,127 and global deaths at 7,807 (as reported on March 18, 2020).[4]

One of the most important tools in understanding the SARS-CoV-2 epidemic course is the epidemic curve. Epidemic curves allow epidemiologists to visualize the progression of an outbreak by surveilling the number of cases across time.[5] The epidemic curve informs epidemiologists about the pattern of the outbreak’s spread, magnitude, time to exposure, and outliers. Moreover, the epidemic curve is constantly updated as more data become available.

As the SARS-CoV-2 pandemic spreads to other countries, many data visualizations have been developed to help educate and inform people. Johns Hopkins University has developed a real-time dashboard with epidemic curves on the SARS-CoV-2 pandemic that is an excellent source of global cases and mortality. The Centers for Disease Control and Prevention (CDC) also has a series of data visualizations on the SARS-CoV-2 outbreak in the United States including an epidemic curve.

This article will review the features of an epidemic curve and provide a tutorial on creating one based on the available data from the CDC on the SARS-CoV-2 outbreak in the United States.

EPIDEMIC CURVE

When an outbreak happens, there is an urgency to determine when it first occurred. Epidemiologists carefully collect data to determine who patient zero is and when the case was first identified. This gives them a starting point for when the outbreak occurred. Epidemic curves provide information on the outbreak’s spread, magnitude, incubation period, outliers, and time trend. Key features of the epidemic curve include the number of cases on the Y-axis and the date of illness onset on the X-axis. Figure 1 illustrates the key features of the epidemic curve for a point-source outbreak.

Figure 1. Key features of the epidemic curve for a point-source outbreak.

Epidemic curves can tell us about the outbreak’s pattern of spread. Figure 1 illustrates the pattern of spread for a point-source outbreak. In a point-source outbreak, a single source of contamination affects a group of people at a single event (e.g., rotten potato salad at a dinner party). Other patterns include continuous-source and propagated-source outbreaks. A continuous-source outbreak occurs when a group of people is exposed to a source of contamination over a period of time (e.g., lead poisoning in children). A propagated-source outbreak occurs when the contamination is spread from person to person (e.g., flu). The SARS-CoV-2 outbreak is an example of a mixed-source outbreak, where the early outbreak was due to a common source (e.g., possibly zoonotic transmission from animal to human) followed by a propagated-source outbreak where the virus is spread from person to person via respiratory droplets or physical contact.[6] Based on a recent study, the mean incubation period for SARS-CoV-2 is 5.1 days (95% CI: 4.5 to 5.8 days).[6]

 

Motivating example

The SARS-CoV-2 outbreak in the United States was first reported on January 14, 2020. Data on the number of cases can be downloaded from the CDC’s SARS-CoV-2 surveillance site (Note: CDC data are updated daily; hence, the data for this exercise will not reflect these changes). We will use these data to create an epidemic curve of the SARS-CoV-2 outbreak in the United States using Excel. You can download the data files used in this exercise here.

The data are arranged in a wide format where the date (time) is represented by columns and the number of cases is represented by rows. This makes it much easier to generate the epidemic curve in Excel.
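If your own data arrive as one record per case instead of as daily counts, they first need to be tallied by date. The Python sketch below illustrates that step with invented dates, not CDC data. The helper name `epidemic_curve` is hypothetical, and the choice to fill days with no cases with zero is an assumption made so that the bars stay evenly spaced in time.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onset_dates):
    """Count cases per day, including zero-case days between the first and last date."""
    counts = Counter(onset_dates)
    day, last = min(counts), max(counts)
    curve = []
    while day <= last:
        curve.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return curve


# Invented illness-onset dates, one entry per case.
onsets = [
    date(2020, 3, 1), date(2020, 3, 1),
    date(2020, 3, 2),
    date(2020, 3, 4), date(2020, 3, 4), date(2020, 3, 4),
]
curve = epidemic_curve(onsets)

# Wide format: dates across the columns, case counts in a single row.
dates_row = [day.isoformat() for day, _ in curve]
cases_row = [n for _, n in curve]
print(dates_row)
print(cases_row)

# A quick text version of the curve, one mark per case.
for day, n in curve:
    print(day.isoformat(), "#" * n)
```

The two printed rows can be pasted into Excel as the date and case-count rows described above.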

Step 1. Highlight the data and Insert a bar chart.

Select the data and insert a clustered column chart. The default version will provide a simple epidemic curve. However, we want to remove the spaces between the bars. To do that, we will need to format the data series.

Step 2. Changing the size of the bars.

To change the size of the bars, right-click one of them to bring up the editing menu, then select Format Data Series… to bring up the options. Set the Gap Width of the bars to 0% so that their sides are in contact with each other. To keep the bars distinguishable, change the outline color to white. Increasing the width of the outline will widen the visible gap between bars.

After a few more changes (e.g., color and labels), the final epidemic curve will represent the CDC’s data on SARS-CoV-2 as of March 18, 2020 (Figure 2). Since data are constantly changing and require validation during an outbreak, this epidemic curve will eventually change. It is recommended that you regularly update this exercise’s data in order to have the most recent, accurate, and valid data from the CDC on SARS-CoV-2. You can also compare your findings to those of the CDC at their website.

Figure 2. Cases of SARS-CoV-2 in the United States.

Conclusions

Epidemic curves are helpful in understanding a disease outbreak in a community. They provide us with a visual representation of the outbreak’s magnitude, pattern, and time period, which will allow us to implement public health policy to stem, reduce, and eventually eradicate the contagion from our population. Although this is a short introduction on epidemic curves, it will, hopefully, be enough for you to review and interpret other epidemic curves in the news or literature.

Files related to this exercise are available here.

References

  1. van Doremalen N, Bushmaker T, Morris DH, et al. Aerosol and Surface Stability of SARS-CoV-2 as Compared with SARS-CoV-1. N Engl J Med. 2020. doi:10.1056/NEJMc2004973

  2. Guan W, Ni Z, Hu Y, et al. Clinical Characteristics of Coronavirus Disease 2019 in China. N Engl J Med. 2020. doi:10.1056/NEJMoa2002032

  3. World Health Organization. Statement on the second meeting of the International Health Regulations (2005) Emergency Committee regarding the outbreak of novel coronavirus (2019-nCoV). https://www.who.int/news-room/detail/30-01-2020-statement-on-the-second-meeting-of-the-international-health-regulations-(2005)-emergency-committee-regarding-the-outbreak-of-novel-coronavirus-(2019-ncov). Accessed March 19, 2020.

  4. World Health Organization. Coronavirus Disease 2019 (COVID-19) Situation Report – 58. 2020. https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200318-sitrep-58-covid-19.pdf?sfvrsn=20876712_2. Accessed March 19, 2020.

  5. Centers for Disease Control and Prevention. Interpretation of Epidemic (Epi) Curves during Ongoing Outbreak Investigations | Foodborne Outbreaks | Food Safety | CDC. https://www.cdc.gov/foodsafety/outbreaks/investigating-outbreaks/epi-curves.html. Published November 16, 2018. Accessed March 19, 2020.

  6. Lauer SA, Grantz KH, Bi Q, et al. The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application. Ann Intern Med. March 2020. doi:10.7326/M20-0504