scatter plot

Communicating data effectively with data visualizations: Part 36 (Scatter plots and Moving Averages for COVID-19 vaccination)

INTRODUCTION

As of 27 April 2021, the Centers for Disease Control and Prevention (CDC) reported that approximately 42.7% of the U.S. population had received at least one dose of the coronavirus (COVID-19) vaccine and 29.1% were fully vaccinated.[1] Moreover, over 200 million doses had been administered. Despite the availability and effectiveness of COVID-19 vaccines, some people have decided not to receive the vaccine. The Economist published a data visualization summarizing the results of weekly surveys that asked respondents, “When a covid-19 vaccine becomes available to you, will you get vaccinated?” The figure shows that approximately 60% of respondents will get or have already received the COVID-19 vaccine. However, a little over 20% stated that they will not get the vaccine and nearly 20% are unsure. Using data from The Economist, we can use Microsoft Excel to recreate the figure below:

* Source: The Economist – “When a covid-19 vaccine becomes available to you, will you get vaccinated?” (link). This is for educational purposes only.

DATA

The data to recreate the figure can be found at The Economist YouGov data site (link). I also abstracted the data and saved them in an Excel file, which is located here. Using these data, we will create a scatter plot and add a 2-period moving average to recreate the figure above.

Creating the Scatter Plot with Moving Average

Step 1. Inspect the data

The data include the percentage of respondents who answered “Yes” (which combines those who said yes and those who had already received the vaccine), “No”, and “Not Sure”. The timeline is composed of the calendar year, the month, and the week in which the data were collected. Some months contained 5 survey weeks, so I altered the scale to fit every month into a 4-week bin. Hence, in those months the weekly scale is 1, 2, 3, 3.5, and 4.
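The rescaling described above can be sketched in Python. This is only an illustration, not the workbook's actual formula: the function names, the 0-based `week_index`, and the idea of combining year, month, and week into a single chronological number are assumptions made for the example.

```python
def week_positions(n_weeks):
    """Return the within-month positions for a month with n_weeks survey weeks."""
    if 1 <= n_weeks <= 4:
        return [float(w) for w in range(1, n_weeks + 1)]
    if n_weeks == 5:
        # The extra week is squeezed in at 3.5 so the month still ends at 4.
        return [1.0, 2.0, 3.0, 3.5, 4.0]
    raise ValueError("expected between 1 and 5 survey weeks in a month")


def timeline_position(year, month, week_index, n_weeks, start_year=2020):
    """Map (year, month, week) to one number that increases chronologically.

    Every month spans 4 units, so the position is 4 * months elapsed since
    January of start_year, plus the week's position within its month.
    week_index is 0-based (hypothetical convention for this sketch).
    """
    months_elapsed = (year - start_year) * 12 + (month - 1)
    return 4 * months_elapsed + week_positions(n_weeks)[week_index]


# Fourth survey week of a 5-week April 2021: 4 * 15 + 3.5
print(timeline_position(2021, 4, 3, 5))  # 63.5
```

Sorting rows by such a position would also put them in chronological order, which is what the chart's X-axis ultimately needs.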

Step 2. Highlight the data and insert a line chart

Step 3. Modify the data source

Once the line chart is inserted onto the workspace, you’ll notice that the scatter plot hasn’t been created. A few more steps are required. Right-click on the chart and click on “Select Data” to open the window that will allow you to make modifications to the data source.

Under “Horizontal (Category) Axis Labels,” click on the “Edit” option and select the rows containing the Year and Month on the data sheet. This will create the X-axis labels for our data visualization. You should notice that the X-axis labels have changed to reflect the Year and Month.

To further modify the data source, right-click on the chart again and click on “Select Data….” In the data source window, remove “Year”, “Month”, and “Week” from the legend series so that only the percentages of respondents to the survey question are displayed. Once these have been removed, click “OK.” This will generate a line chart with the respondents’ answers to the survey question.

Step 4. Change line chart to scatter plot

Currently, the X-axis displays the data from 2021 before the data from 2020. We can change this by right-clicking on the X-axis and then clicking on “Format Axis.” This brings up a window where we can check the box next to “Categories in reverse order” so that the X-axis displays the respondents’ answers in chronological order.

To change the line chart to a scatter plot, right-click on a line and select “Format Data Series…” This will open the options for modifying the data series on the chart. Click on the paint bucket and, under the Line group, click on “No line” to remove the line. Then, under the Marker group, select the circle under “Type.” This will replace the line with scatter points.

Repeat this for the other lines and you should have a chart that looks like the following:

Step 5. Adjust the aesthetics and add a moving average line

The final step involves making changes to the aesthetics of the chart. You can remove the labels on the Y-axis, delete the gridlines, delete the chart title, eliminate the border of the X-axis, and then delete the legend. You can add a moving average line for each of the scatter plots by right-clicking on the data points and selecting “Add Trendline…” Then select “Moving average” with “2” periods. Change the moving average line from a dashed line to a solid line and increase its thickness.

The final figure can be achieved by changing the colors of the scatter plots and moving average lines. You can add additional labels to emulate the figure from The Economist.
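To make clear what the “Moving average” trendline with 2 periods plots, here is a minimal Python sketch of the calculation: each trendline point is the average of the current value and the one before it, so the first data point has no trendline value. The function name and the sample percentages are made up for illustration and are not The Economist's data.

```python
def moving_average(values, period=2):
    """Trailing moving average: each point averages the last `period` values.

    Positions that do not yet have `period` values behind them are None,
    which mirrors a moving average line starting at the second data point
    when the period is 2.
    """
    if period < 2:
        raise ValueError("period must be at least 2")
    averaged = []
    for i in range(len(values)):
        if i < period - 1:
            averaged.append(None)  # not enough earlier points yet
        else:
            window = values[i - period + 1 : i + 1]
            averaged.append(sum(window) / period)
    return averaged


# Made-up weekly "Yes" percentages, for illustration only.
yes_share = [40.0, 44.0, 42.0, 50.0]
print(moving_average(yes_share))  # [None, 42.0, 43.0, 46.0]
```

A larger period would smooth the line further, at the cost of starting later and lagging behind recent changes.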

CONCLUSIONS

Using data from The Economist, we recreated the figure showing how respondents’ intentions to get a COVID-19 vaccine changed from 2020 to 2021. All the data files can be downloaded from here.

 

REFERENCES

1. CDC. COVID Data Tracker. Centers for Disease Control and Prevention. Published March 28, 2020. Accessed April 27, 2021. https://covid.cdc.gov/covid-data-tracker

The Economist YouGov data site (link)

 

Communicating data effectively with data visualizations: Part 28 (Scatter plot)

INTRODUCTION

Between 1918 and 1919, the influenza pandemic (also known as the “Spanish Flu”) raged across the world and caused over 40 million deaths. Cities in the United States enacted nonpharmaceutical interventions (e.g., social distancing, shelter-in-place mandates) to reduce transmission, overall and peak attack rates, and the number of deaths. Some cities were successful in mitigating the calamity associated with the pandemic, but others were not. The lessons these cities learned yield important insights for policy makers tackling the current COVID-19 pandemic.

Markel and colleagues (2007) reported on the nonpharmaceutical interventions enacted by cities in the United States and the effect they had on mitigating the influenza pandemic of 1918 to 1919.[1] Briefly, their report highlights that cities that implemented these public health interventions early had greater delays in reaching peak mortality, lower peak mortality rates, and lower total mortality.

We will recreate one of the figures (Figure 1c) from this study using Excel and the data provided.

Figure from the study that we will recreate.[1]*

(*This figure is used for educational purposes only.)

DATA

Data for this tutorial come directly from the study’s Table 1. We will use the Public Health Response (days) on the X-axis and the Excess Pneumonia and Influenza Mortality rate (deaths per 100,000 population) on the Y-axis. You can download the data from the following link.

Step 1. Get the data

Download the data from this link. The data have been cleaned specifically for this tutorial.

Step 2. Insert a scatter plot chart

After downloading the data, open the Excel file. Look for the column that contains the Public health response time, days; this will be the data for the X-axis. Now, look for the column that contains the Excess pneumonia and influenza mortality, deaths / 100,000 population; this is the data for the Y-axis.

In Excel, insert the Scatter plot by selecting the Scatter option in the Charts tab.

Step 3. Select the data for the Scatter plot

An empty figure will appear. Right-click in any area in the empty figure and you should be able to click on “Select Data”. From there, click on “Add” to add data and select the appropriate data for the X-axis values and the Y-axis values.

Clicking “OK” will generate a scatter plot of excess deaths against the public health response time.

Step 4. Adjust the axes

First, we want to move the Y-axis so that it is flush with the left side of the chart instead of intersecting the X-axis at zero. To do this, right-click on the X-axis, select “Format Axis,” and under “Vertical axis crosses” choose “Axis value” and enter -15.

This will change the Y-axis position from its intersection at X-axis = 0 to X-axis = -15.

Step 5. Change the color of the scatter points

To finalize the scatter plot, change the color and size of the markers.

 
 

FINAL SCATTER PLOT

Once all the adjustments have been made, we can add data labels for selected cities, which are also highlighted in a different color.

CONCLUSION

After recreating the figure from the paper by Markel and colleagues,[1] it is clear that as the public health response is delayed, excess deaths due to the influenza pandemic generally tend to increase. Although other types of interventions occurred during this pandemic, the findings from Markel and colleagues provide some empirical evidence that early public health measures contribute significantly to mitigating excess deaths due to the influenza pandemic. Policy makers can use these lessons from the past to gauge the effectiveness of nonpharmaceutical public health interventions in delaying or reducing mortality in the current COVID-19 pandemic.
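One way to put a number on the general trend visible in the scatter plot is a Pearson correlation between response time and excess mortality. The Python sketch below is an illustration only: the function name is hypothetical, and the values are placeholders that were not taken from Table 1 of Markel and colleagues. Substitute the two columns from the downloaded file to run it on the real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only (NOT the study's data).
response_days = [-10, -5, 0, 5, 10]        # public health response time, days
excess_deaths = [300, 400, 450, 500, 650]  # deaths per 100,000 population

print(round(pearson_r(response_days, excess_deaths), 2))
```

A value near +1 would indicate that later responses go together with higher excess mortality. Like the scatter plot itself, a correlation describes an association and does not by itself establish causation.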

REFERENCE

  1. Markel H, Lipman HB, Navarro JA, et al. Nonpharmaceutical Interventions Implemented by US Cities During the 1918-1919 Influenza Pandemic. JAMA. 2007;298(6):644-654. doi:10.1001/jama.298.6.644