excel

Reproduction number—COVID-19

BACKGROUND

As the COVID-19 pandemic, which began in December 2019, continues into its second year, public health measures have been put into place to mitigate its spread. At the time of writing this article, there have been over 4.5 million deaths and over 216 million cases due to COVID-19.[1] Surveillance of COVID-19 remains an important public health measure of understanding the spread and impact. Daily reports such as the John Hopkins COVID-19 dashboard provide end users with visual and statistical information about the surges in cases and deaths associated with COVID-19. However, one measure that is of great interest is the reproduction number or R0.

 

Reproduction number (R0) and effective reproduction number (Rt)

The reproduction number is the number of new cases that is directly caused by exposure to a single case.[2,3] Figure 1 provides a visual explanation of the basic reproduction number. However, the underlying assumption with R0 is that everyone in the population is susceptible to infection. With the introduction of vaccines, the R0 isn’t a good measure of the reproductive capabilities of COVID-19. Instead, the effective reproduction number (Rt) is used to provide a more realistic reproduction number based on the population being infected, recovered, or vaccinated. The Rt changes over time as the population susceptible to infection changes.

Figure 1. Basic reproduction number.

I wanted to create a figure that would highlight the changes associated with the Rt for each state in the United States. To do this, I downloaded the Rt data from the by Xihong Lin's Group in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. They have an amazing COVID-19 tracker dashboard that captures the changing patterns of Rt for each state. Then I created a Cleveland plot to show where the Rt was near the beginning of the pandemic and where it is currently (August 2021). (Note: I wrote a tutorial on creating Cleveland plots that you can review here.) Here is the final figure (because of the length of the figure, I cropped it to show the first 30 states or territories):

 

Figure 2. Effective reproduction number (Rt) for U.S. states and territories, April 17, 2020 (past) to August 14, 2021 (recent).

The blue dots denote the most recent effective reproduction number (14 August 2021) and the past dots denote the earliest effective reproduction number (17 April 2020).

It seems that some states have gotten worse in terms of increase effective reproduction number since the beginning of the pandemic. This could be due to lack of good data in the early phases of the pandemic. However, what is of concern is the high effective reproduction numbers in some states (Rt > 2), which indicates that the pandemic is still spreading at an alarming rate.

There were some missing data which are identified by a single dot (blue or red) or an empty field in the recent or past effective reproduction number. Rather than fill these in, I left them empty. There may be data in between the two time periods that I could have used, but I left those out.

One thing to mention is that this Cleveland plot only tells us one dimension of the effective reproduction number story (the difference between the most recent Rt and the earliest Rt). It doesn’t tell us much about how the effective reproduction number changes across time. For that, I direct your attention to the Lin’s Laboratory Group at Harvard, they have a great figure that shows the fluctuation of the effective reproduction number for the U.S. and its states/territories (see example):

Source: Lin’s Laboratory Group at Harvard (link). [last accessed on 30 August 2021].

CONCLUSIONS

The effective reproduction number provides us with some interesting patterns in spread of COVID-19 by states/territories. It seems to have worsened over time, but this could be due to poor data early in the pandemic. There are some issues with the us of effective reproduction number for policy decisions. Reporting delays can impact the estimates for the effective reproduction number. A technique called “nowcasting” is used to estimate the reproduction number.[3] But when I explored some of the work in this area, there appears to be a variety of methods for performing this technique. Despite this limitation, the effective reproduction number may be useful to evaluate public health policy decisions to reduce the spread of the COVID-19 pandemic.[4,5]

 

DATA SOURCE

I provided the link to the COVID-19 Spread Tracker from the Lin Lab at Harvard. You can also download a curated version of the data for this article from my Dropbox folder. The data are current as of 17 August 2021. If you’re interested in recreating this Cleveland plot, I recommend downloading the most recent data to see how much the effective reproduction number has changed.

REFERENCES

  1. Worldometeres.info. COVID Live Update: 217,770,381 Cases and 4,521,936 Deaths from the Coronavirus - Worldometer. Accessed August 30, 2021. https://www.worldometers.info/coronavirus/

  2. Lim J-S, Cho S-I, Ryu S, Pak S-I. Interpretation of the Basic and Effective Reproduction Number. J Prev Med Pub Health. 2020;53(6):405-408. doi:10.3961/jpmph.20.288

  3. Adam D. A guide to R — the pandemic’s misunderstood metric. Nature. 2020;583(7816):346-348. doi:10.1038/d41586-020-02009-w

  4. Inglesby TV. Public Health Measures and the Reproduction Number of SARS-CoV-2. JAMA. 2020;323(21):2186-2187. doi:10.1001/jama.2020.7878

  5. Pan A, Liu L, Wang C, et al. Association of Public Health Interventions With the Epidemiology of the COVID-19 Outbreak in Wuhan, China. JAMA. 2020;323(19):1915-1923. doi:10.1001/jama.2020.6130

Communicating data effectively with data visualizations: Part 36 (Scatter plots and Moving Averages for COVID-19 vaccination)

INTRODUCTION

As of 27 April 2021, the Centers for Disease Control and Prevention (CDC) reported that approximately 42.7% of the U.S. population have received at least one dose of the coronavirus (COVID-19) vaccine and 29.1% are fully vaccinated.1 Moreover, a total of over 200 million doses have been administered. Despite the effectiveness of the availability of COVID-19 vaccines, there are some people who have decided to not receive the vaccine. The Economist published a data visualization that summarized the results of weekly surveys that asked respondents, “When a covid-19 vaccine becomes available to you, will you get vaccinated?” This figure shows that approximately 60% of the population will get or already have received the COVID-19 vaccine. However, a little over 20% stated that they will not get the vaccine and nearly 20% are unsure. Using data from The Economist, we can use Microsoft Excel to recreate the figure below:

* Source: The Economist – “When a covid-19 vaccine becomes available to you, will you get vaccinated?” (link). This is for educational purposes only.

DATA

The data to recreate the figure can be found at The Economist YouGov data site (link). I also abstracted the data and saved them onto an Excel file, which is located here. Using the data from The Economist, we will create a scatter plot and use the 2-day moving average to re-create the figure above.

Creating the Scatter Plot with Moving Average

Step 1. Inspect the data

The data include the percentage of respondents who answered “Yes” (includes both Yes and those who received the vaccine), “No”, and “Not Sure”. The timeline is composed of the calendar year, month, and the week the data was collected. Notice that for some of the months, there were 5 weeks. I altered the scale so that the data would fit into a 4-week bin. Hence, you’ll notice that the weekly scale is 1, 2, 3, 3.5, and 4.

Step 2. Highlight the data and insert a line chart

Step 3. Modify the data source

Once the line chart is inserted onto the workspace, you’ll notice that the scatter plot hasn’t been created. A few more steps are required. Right-click on the chart and click on “Select Data” to open the window that will allow you to make modifications to the data source.

Click on the “Edit” option and select the rows containing the Year and Month on the data sheet. This will create the X-axis labels for our data visualization. You should notice that the X-axis labels have changed to reflect the Year and Month.

To further modify the data source, right-click on the chart again and click on “Select Data….” At the data source window, remove “Year”, “Month”, and “Week” from the legend series to only display the percentage of respondents to the survey question. Once these have been removed, click “Ok.” This will generate a link chart with the respondents’ answers to the survey question.

Step 4. Change line chart to scatter plot

Currently, the X-axis displays the data from 2021 before 2020. We can change this by right-clicking on the X-axis and then clicking on “Format Axis.” This brings up a window where we can click on the box by “Categories in reverse order” for the X-axis to display the respondents answers in chronological order.

To change the line chart to a scatter plot, right-click on the line and select  “Format Data Series…” This will open the options to modify the data on the chart. Instead of a line chart, we want a scatter plot. To make this change, click on the paint bucket and under the line group click on “No line” to remove the line chart. Then under the Marker group, select the circle under “Type.” This will replace the line chart with a scatter plot.

Repeat this for the other lines and you should have a chart that looks like the following:

Step 5. Adjust the aesthetics and add a moving average line

The final step involves making changes to the aesthetics of the chart. You can remove the labels on the Y-axis, delete the gridlines, delete the chart title, eliminate the border of the X-axis, and then delete the legend.  You can add moving average lines for each of the scatter plots by right-clicking on the data points and selecting “Add Trendline…” Then select “Moving average” with “2” periods. Change the moving average line from a dashed line to a solid line and a higher thickness.

The final figure can be achieved by changing the colors of the scatter plots and moving average lines. You add additional labels to emulate the figure from The Economist.

CONCLUSIONS

Using data from The Economist, we recreated the figure showing how the perceptions of respondents changed regarding getting a COVID-19 vaccine from 2020 to 2021. All the data files can be downloaded from here.

 

REFERENCES

1. CDC. COVID Data Tracker. Centers for Disease Control and Prevention. Published March 28, 2020. Accessed April 27, 2021. https://covid.cdc.gov/covid-data-tracker

The Economist YouGov data site (link)

 

Communicating data effectively with data visualizations: Part 35 (Heat maps – COVID-19 Deaths by Week and State)

INTRODUCTION

With the COVID-19 pandemic entering its second year, the number of deaths in the United States (US), has total over 546 thousand deaths (data from March 26, 2021). Of note is the variation in death rates across US. CDC provides a daily update on the number of deaths reported by each state (and major cities) in the US, which is accessible through their portal. However, it can be difficult to visualize all these data; proper visualizations will assist the audience with interpretability and add to the narrative that the COVID-19 pandemic has been more severe is some states compared to others. The Economist provided some helpful data visualizations for Great Britain by using heat maps to report the deviations from the expected weekly death counts due to COVID-19. Here is an example.

* Source: The Economist – Tracking COVID-19 excess deaths across countries (link). This is for educational purposes only.

From this visualization, it is clear that all sectors of Britain were hit by COVID-19 at similar times. Some areas had higher deviations from the expected mortality rates, while other had lower deviations. Additionally, there appears to be another peak in the deviations from the expected number of deaths in late 2020. Using this example as a model, we can recreate this heat map for the US. Fortunately, The Economist has a GitHub portal with all the data for different countries including the US.

DATA

Data for this exercise is located the The Economist GitHub portal. We will use the “covid-19-excess-deaths-tracker,” which includes data on the deviation from the expected deaths (%). Data on the deviation from the expected deaths (%) for the US will be used to emulate the example from The Economist.

 

Heat map of deviations from the number expected death (%) for the US.

We will use Excel to create a heat map of the weekly deviations from the number of expected death because of COVID-19 in the US from January 2020 to February 2021.

Step 1. Inspect the data:

The data for this exercise is located here. Download and open the file. Review the data under the “exercise data” sheet. It should look like the following:

We will manipulate this data so copy and paste this into a new sheet.

Step 2. Add spaces:

One of the elements that The Economist added to their heat map were spaces between each state or territory.

This can be accomplished manually by inserted a row between each state or territory. However, when there are a lot of rows, this can be a burden. A systematic method includes a couple of columns (number and alternate) to create the spaces we need. These will represent the spaces between the states/territories.

Once you have created these, select the entire data set. Then sort them by “number” and then by “alternate”; see picture below:

You should have spaces between each state/territory now.

Step 3. Change column width and row height:

Select the columns and right click to change the width. Set the value to 2. Once you done this, the weekly deviations should look like the following:

Next, use the Filter feature in Excel to only select the “alternate” values  of 1. This should only show values for the “alternate” column that equal to 1. Select the rows where the column values are 1 and right-click. This will allow you to select “Row Height” so that you can change the value to 25. Please see the illustration on how to do this below:

Now that you’ve change the row height for the “alternate” column values for 1, we will change the row height for the “alternate” column values for 0 and 0.5 using the same method. The only difference is that the row height will be 5 instead of 25. See the illustration below on how to perform this step:

Step 4. Add condition formatting rules:

Now that the data column width and row height have been set properly, you can add the conditional formatting rules to change the color of the boxes to reflect the deviations from the number of expected death (%) due to COVID-19.

First, select all the data. Then go to the Home tab and select “Conditional Formatting.” Make sure to click on “New Rule...” Select the “3-Color Scale” because we will use the gradient levels to create our heat map. The darker color will denote the higher deviations and the lighter color will denote the low or no deviations from the number of expected deaths (%). Make sure to change the “Type” to “Number.” You can enter any values that work for you, but for this example, use “0”, “0.5”, and “1” for the “Minimum”, “Midpoint”, and “Maximum” types, respectively.

Next, with the data still selected, right-click on the data to open a window. Select “Format Cells” because we want to remove the values from the cells. In the Number tab, select “Custom” and enter the following: "";"";"";"". This will change the number values so that it doesn’t show in the cells.

The Excel file should start looking like a heat map. However, there are a few more features we can do to emulate the heat map on The Economist website.

Step 5. Adding borders to the heat map:

After formatting the cells so that the numbers are not visible, you can add borders to the heat map. Select the heat map area, right-click anywhere on the data and select “Format Cells”; you will select the “Border” tab and change the border color to white. Then apply the borders to the heat map.

Once you’ve applied the border color change, the heat map should look like the following:

Next, we will add a border to separate the states/territories from each other.

Use the “Filter” feature and select rows where the “alternate” values are 0.5. The rows will be very narrow since we set them to a value of 5 earlier. Now, select the data area including the state/territories column and right-click on the data. Select “Format Cells” and then change the border color to a light shade of gray and apply this to the top border. This will apply a light gray border to separate the states/territories from each other. See illustration below for how to perform this step.

Once you’ve added the border, the heat map should look like the following:

Step 6. Final steps:

In the last steps, you can change the font, hide some of the values, and add a label to identify the weekly deviations. For the final figure, I hid the “alternate” and “number” columns, and I hid the weekly labels. I replaced the weekly labels with a more informative timeline to based on quarters. I also rearranged the states/territories so that I had the United States at the top and some of the more active states/territories at the top (e.g., New York and New York City). This should be remarkably similar to the heat map developed by The Economist. The final figure can be downloaded from the following link.

CONCLUSIONS

Using the data from The Economist, we recreated their heat map for the US. We can visualize when the COVID-19-related deaths exceeded the expected all-cause mortality deaths on a weekly basis from January 2020 to February 2021. For example, New York City was hit with a high percentage of COVID-19 deaths around April 2020, which was also reported for The District of Columbia, New York State, and New Jersey. However, North Dakota, South Dakota, California, and Arizona were hit with high percentages of COVID-19 deaths that exceeded the expected number of deaths later in the year.

Heat maps provide us with a useful visual to compare and contrast different states/territories and the severity of COVID-19 deaths using simple tools provided by Excel.

 

REFERENCES

I used the lessons from a previous blog on developing heat maps (link).

Data on deviations from the number of expected deaths (%) in the US was provided by The Economist (link). However, I cleaned the data so that you can use it for your own exercise (link).