March 15, 2022

Communicating data effectively with data visualization: Part 40 (Percentage of population with COVID-19 vaccination)

March 15, 2022

INTRODUCTION

Since the COVID-19 pandemic began on 12 December 2019, there have been several innovations in the forms of vaccinations and treatments. Of critical importance is the world-wide international support for COVID-19 vaccination. A timeline of the COVID-19 pandemic can be found at the CDC website.

The website informationisbeautiful.net has a great COVID-19 dashboard that ranks countries that have administered vaccinations.

Source: www.informationisbeautiful.net (https://informationisbeautiful.net/visualizations/covid-19-coronavirus-infographic-datapack/)

TUTORIAL

We will recreate this chart using Excel. Data can be download from my Dropbox folder or from Our World in Data website.

The data will contain information for 39 countries. These variables include the total number of vaccinations, the number of people vaccinated, the number of people who are fully vaccinated, the number of boosters, the total population, percentage of the population who are vaccinated, percentage of the population who are fully vaccinated, and the percentage of the population with boosters.

We will create a vertical bar chart for the percentage of the total population that received at least one COVID-19 vaccination. You can use these methods to recreate the vertical bar charts of the remaining figures in the figure above.

Visually inspect the data. It should look like the following.

We will us the “total_vaccinations,” “percent_vaccinated,” “percent_full_vaccinated,” and “percent_boosted” to generate the figures.

Step 1: Insert the vertical bar chart.

Once the vertical bar chart is inserted onto the Excel sheet, right click on the chart and click on “Select data.” We want to highlight the data column titled, “percent_fully_vaccinated.” For the “Series name,” enter “% fully vaccinated.”

Step 2. Edit the labels.

In the “Horizontal (Category) Axis Labels,” click on the “Edit” button and select the
location.” In the “Axis label range,” highlight the values in the “location” column. This will assign each value of the “percent_fully_vaccinated” with the country.

Step 3. Delete unnecessary labels and titles.

The figure should look like the following. We can delete the x-axis, the axis title, and the chart title. Extend the chart so that you can view all the countries. You can also narrow the chart so that it will resemble the size of the informationisbeautiful.net figures.

Step 4. Create a progress bar

We will add a progress bar, which will allow us to see how much of the population is fully vaccinated. This requires us to use the “full_bar” column which has a value of “1” for each country. This value represents a 100% progress goal.

Right-click on the chart and add a new data entry. Select the data under the column called “full_bar.” The values are all “1” to represent a 100% progress bar.

The chart will have two bars for each country; one bar represents the percentage of the population that is fully vaccinated, and other bar represents 100% progress. We need to edit the overlay and the width of the bar to match the ones on the informationisbeautiful.net charts.

Right-click the chart and select “Format Data Series…” Change the “Series Overlap” to 100% and the “Gap Width” to 40%. You may not see the blue bar chart because the red bar chart will be in front of it. To fix this, go to the “Select Data Source” window and use the arrow to reposition the data series. This will place the blue bar chart in front of the red bar chart.

Step 5. Add the data labels.

The next step will include the data labels to the progress bar. We want to have the data labels aligned on the right. When you right-click the blue chart, you will get the data labels to be aligned to the left or the end of the bar. This does not get us to the data labels aligned on the right.

To do this, we need to right-click the red bar (not the blue bar) and click on “Add data labels.” This will all the data value for the progress bar, which is “1.”

Right-click on one of the data labels that has a “1” and select “Format Data Labels…” This will open a window where you will need to check the box next to “Value From Cells.” When prompted, select the values in the “percent_fully_vaccinated” column. This will add the percentage of the population that is fully vaccinated on the right-side of the bar chart. Additionally, uncheck the box next to “Value.” This will remove the value from the bar chart so that you will no longer see the “1.”

The chart should be updated to have the percentage of the population who are fully vaccinated aligned to the right-side of the bar chart.

Step 6. Modify the chart to improve the aesthetics.

Once you’ve added the data labels and aligned them on the right-side of the bar chart, you can start to edit the font color, bar colors, and delete the grid lines to emulate the chart in informationinbeautiful.net website.

I changed the background to a dark gray and the data labels to a white color. I also changed the font to Arial Narrow.

I only presented part of the figure here. But you can download and view the whole figure from my Drobox folder.

CONCLUSIONS

Using the vertical bar chart with progress bars allow us to emulate the figures generated by informationisbeautiful.net. I like these types of charts for data visualization because they provide some indication of progress. You can see that the United States is behind many nations when it comes to having the population fully vaccinated. This is much lower than the UAE and South Korea where 95.6% and 86.6% of the total population are fully vaccinated. Hopefully, you can use this exercise to help you develop similar charts for your data visualizations.

REFERENCES

Data was access using the Our World in Data COVID-19 site.

The data visualization that inspired this exercise was based on informationisbeautiful.net COVID-19 Coronavirus Data Dashboard.

The Excel file with the data used for this tutorial is available on my Dropbox folder.

Mark Bounthavong

January 18, 2022

Data visualization

Communicating data effectively with data visualization: Part 39 (Heatmaps of COVID-19 deaths)

Mark Bounthavong

January 18, 2022

Data visualization

INTRODUCTION

I wanted to incorporate a heatmap that illustrated the death rates (per 100,000 population) across time in the United States. But I also wanted to show when the coronavirus pandemic 2019 (COVID-19) vaccine was introduced and how it impacted death rates. I thought that a heatmap would do a nice job of illustrating this.

The data visualization by Tynan DeBold and Dov Friedman from the Wall Street Journal has a great visualization on the impact of vaccines for various disease from the measles to smallpox on death rates (See figure below). This heatmap shows the number of measles cases per 100,000 population between 1920 and 2000. Each row represents a state or territory of the United States (U.S.). In this tutorial, we’ll create a similar heatmap for COVID-19 deaths.

Source: **Tynan DeBold** and **Dov Friedman**, Wall Street Journal (link)

I set out to create my own heat map with COVID-related death rates using data from the Centers for Disease Prevention and Protection (CDC). The CDC provides a dashboard to visualize the trends in death rates by states and U.S. territories (Compare Trends in COVID-19 Cases and Deaths in the US). However, the data was not compiled in an easy manner. You can only visualize 6 territories at a time. I was able to download all the data and compile this into a single file for this tutorial, which you can download here. Use the file with the *.xlsm extension, which supports macros.

TUTORIAL

Step 1. Download the Excel file with the data. Use the data from the “data” tab. Inspect the data. The columns represent the weekly death rate (7-day average number of deaths per 100,000 population). The rows represents the states and U.S. territories.

Step 2. Use the VBA macro. In a previous article, I explained how to create a heatmap with different gradient levels. We will use a modified version of the macro for this exercise.

This is the VBA macro that we’ll use (link). Don’t be intimidated by this. I’ll go over how to use this code

I start by determining the number of gradient levels for the heatmap. The average death rate was 0.35 per 100,000 population, so I generated 20 levels of gradient (0 to 0.999, 1.0 to 1.999, 2.0 to 2.999, etc). I wanted a “blue” shade for this heatmap, so I had to figure out the RGB scheme for each level. I identified the RGB color scheme using a gradient generator by ColorDesigner. RGB code uses three values to represents the main color on the spectrum (red, green, blue).

Once you have the RGB codes for the gradient levels, you can edit the VBA macro.

In the Developer tab, click on “Visual Basic.” Make sure that the Developer tab is viewable on the Ribbon. If it is not, then you can activate this by going to the File > Options > Customize Ribbon and activate it by entering a check by the Ribbon box.

The Visual Basic interface is a separate window that pops up.

In the “Sub ChangeCellColor()” macro, we’re going to include 20 gradient levels. It’s important to make sure the Range() includes the data that we’re interested in modifying. Since the first cell is in A1 and the last cell is in CZ61, the range is Range("A1:CZ61").

Then we include the 20 gradient levels by changing the Case statement with the corresponding RGB codes. As you modify each Case statement, make sure to change the value ranges for each statement. For example, if you want to apply the RGB code for (18, 123, 141), the death rate range is 1.8 to 1.8999999. You can do this for all the gradient levels.

Here is an example:

    Case 1.8 To 1.8999999
         oCell.Interior.Color = RGB(18, 23, 141)
         oCell.Font.Bold = True
         oCell.Font.Color = RGB(18, 23, 141)
         oCell.Font.Name = "Times New Roman"
         oCell.HorizontalAlignment = xlCenter

Case 1.8 to 1.8999999 denotes the range of the values in each cell (7-day average deaths per 100,000 population).

oCell.Interior.Color = RGB(18, 23, 141) denotes the RGB color scheme for our gradient

oCell.Font.Bold = True denotes that the font is bolded

oCell.Font.Color = RGB(18, 23, 141) denotes that the font color matches the cell color

oCell.Font.Name = "Times New Roman" denotes that the font is Times New Roman

oCell.HorizontalAlignment = xlCenter denotes that the value is aligned in the center

After you’ve adjusted your code, you can execute the macro. To execute the macro, go to the Ribbon and select “Macros.” The Macro window will appear with three macros. Select the “ChangeCellColor” macro and click “Run.” This should execute the macro, and you will notice that the data will start to change color to the corresponding gradient values.

To sort by the last column, select the “SortColumn” macro and click “Run.”

To create white borders around the cells, select on the “WhiteOutlineCells” macro and click “Run.”

Step 4. Final touches. You can select the columns and change width to 2.

Once you have the correct cell sizes, you can start to add labels to the file. I included a line to delineate when the first vaccine was introduced and a line for when the president announced that COVID-19 was a national emergency. I also added labels to the bottom part of the table to indicate dates along the timeline. The rows represented the states and U.S. territories.

CONCLUSIONS

The number of deaths was high early in the pandemic in a few select places in the U.S. As the vaccine is introduced, the number of deaths reached a zenith around December 2020 before falling to low levels in February 2021. Then, the death rate started to increase around the beginning of July 2021. Based on the heatmap, the vaccine may have resulted in a decrease in deaths. But the death rate increased approximately 6 months later in what appears to be the beginning of a seasonal pattern. It is unclear whether the introduction of new variants causes the increased death rate, but there is speculation that it may be a contributor. This heatmap does not generate any claims to what is actually happening; it only provides a visual of the patterns that are reported across each U.S. state and territory.

REFERENCES

I took inspiration from the data visualization by Tynan DeBold and Dov Friedman from the Wall Street Journal.

Date for this exercise came from the CDC (link).

A previous article on how to create heatmaps is available here.

I used the Gradient Generator by ColorDesigner to find out the RGB values for my gradient levels.

Mark Bounthavong

August 30, 2021

Data visualization, Methods, Commentary

Reproduction number—COVID-19

Mark Bounthavong

August 30, 2021

Data visualization, Methods, Commentary

BACKGROUND

As the COVID-19 pandemic, which began in December 2019, continues into its second year, public health measures have been put into place to mitigate its spread. At the time of writing this article, there have been over 4.5 million deaths and over 216 million cases due to COVID-19.[1] Surveillance of COVID-19 remains an important public health measure of understanding the spread and impact. Daily reports such as the John Hopkins COVID-19 dashboard provide end users with visual and statistical information about the surges in cases and deaths associated with COVID-19. However, one measure that is of great interest is the reproduction number or R0.

Reproduction number (R0) and effective reproduction number (Rt)

The reproduction number is the number of new cases that is directly caused by exposure to a single case.[2,3] Figure 1 provides a visual explanation of the basic reproduction number. However, the underlying assumption with R0 is that everyone in the population is susceptible to infection. With the introduction of vaccines, the R0 isn’t a good measure of the reproductive capabilities of COVID-19. Instead, the effective reproduction number (Rt) is used to provide a more realistic reproduction number based on the population being infected, recovered, or vaccinated. The Rt changes over time as the population susceptible to infection changes.

Figure 1. Basic reproduction number.

I wanted to create a figure that would highlight the changes associated with the Rt for each state in the United States. To do this, I downloaded the Rt data from the by Xihong Lin's Group in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. They have an amazing COVID-19 tracker dashboard that captures the changing patterns of Rt for each state. Then I created a Cleveland plot to show where the Rt was near the beginning of the pandemic and where it is currently (August 2021). (Note: I wrote a tutorial on creating Cleveland plots that you can review here.) Here is the final figure (because of the length of the figure, I cropped it to show the first 30 states or territories):

Figure 2. Effective reproduction number (Rt) for U.S. states and territories, April 17, 2020 (past) to August 14, 2021 (recent).

The blue dots denote the most recent effective reproduction number (14 August 2021) and the past dots denote the earliest effective reproduction number (17 April 2020).

It seems that some states have gotten worse in terms of increase effective reproduction number since the beginning of the pandemic. This could be due to lack of good data in the early phases of the pandemic. However, what is of concern is the high effective reproduction numbers in some states (Rt > 2), which indicates that the pandemic is still spreading at an alarming rate.

There were some missing data which are identified by a single dot (blue or red) or an empty field in the recent or past effective reproduction number. Rather than fill these in, I left them empty. There may be data in between the two time periods that I could have used, but I left those out.

One thing to mention is that this Cleveland plot only tells us one dimension of the effective reproduction number story (the difference between the most recent Rt and the earliest Rt). It doesn’t tell us much about how the effective reproduction number changes across time. For that, I direct your attention to the Lin’s Laboratory Group at Harvard, they have a great figure that shows the fluctuation of the effective reproduction number for the U.S. and its states/territories (see example):

Source: Lin’s Laboratory Group at Harvard (link). [last accessed on 30 August 2021].

CONCLUSIONS

The effective reproduction number provides us with some interesting patterns in spread of COVID-19 by states/territories. It seems to have worsened over time, but this could be due to poor data early in the pandemic. There are some issues with the us of effective reproduction number for policy decisions. Reporting delays can impact the estimates for the effective reproduction number. A technique called “nowcasting” is used to estimate the reproduction number.[3] But when I explored some of the work in this area, there appears to be a variety of methods for performing this technique. Despite this limitation, the effective reproduction number may be useful to evaluate public health policy decisions to reduce the spread of the COVID-19 pandemic.[4,5]

DATA SOURCE

I provided the link to the COVID-19 Spread Tracker from the Lin Lab at Harvard. You can also download a curated version of the data for this article from my Dropbox folder. The data are current as of 17 August 2021. If you’re interested in recreating this Cleveland plot, I recommend downloading the most recent data to see how much the effective reproduction number has changed.

REFERENCES

Worldometeres.info. COVID Live Update: 217,770,381 Cases and 4,521,936 Deaths from the Coronavirus - Worldometer. Accessed August 30, 2021. https://www.worldometers.info/coronavirus/
Lim J-S, Cho S-I, Ryu S, Pak S-I. Interpretation of the Basic and Effective Reproduction Number. J Prev Med Pub Health. 2020;53(6):405-408. doi:10.3961/jpmph.20.288
Adam D. A guide to R — the pandemic’s misunderstood metric. Nature. 2020;583(7816):346-348. doi:10.1038/d41586-020-02009-w
Inglesby TV. Public Health Measures and the Reproduction Number of SARS-CoV-2. JAMA. 2020;323(21):2186-2187. doi:10.1001/jama.2020.7878
Pan A, Liu L, Wang C, et al. Association of Public Health Interventions With the Epidemiology of the COVID-19 Outbreak in Wuhan, China. JAMA. 2020;323(19):1915-1923. doi:10.1001/jama.2020.6130

Mark Bounthavong

June 24, 2021

Data visualization

Communicating data effectively with data visualizations: Part 38 (Replicating the LA Times COVID-19 tracker)

Mark Bounthavong

June 24, 2021

Data visualization

INTRODUCTION

Recently, the staff at the Los Angeles Times (LA Times) provided a COVID-19 tracker on their website. This is an impressive set of data visualizations of COVID-19 cumulative cases, new cases, vaccinations, and deaths. I was particularly struck by the “New cases by day” figure which includes a bar chart overlaid with a 7-day moving average line chart. The visualization effectively used the moving average to adjust for the spikes in new COVID-19 cases but maintained the spikes on a daily basis. None of the data are lost and illustrates the spikes in new COVID-19 cases while adjusting for the moving average. The color schemes were also optimal where the daily new cases used a softer color, but the moving average line used a darker color highlighting its importance in the figure.

I wanted to write an article on how to replicate this figure using Excel.

Source: Los Angeles Times, “Tracking the coronavirus in California,” url: https://www.latimes.com/projects/california-coronavirus-cases-tracking-outbreak/ [Accessed on June 24, 2021] * This is for educational purposes only.

DATA SOURCE

Data used in this article can be found on the LA Times GitHub site. I used the “latimes-county-totals.csv” data (link to the raw data). I also made the data available with the final figure on the following Dropbox location.

TUTORIAL

Step 1. Download and visually inspect the data.

After you’ve downloaded the data, take a moment to inspect them. The columns that are used in this tutorial are “date” and “new_confirmed_cases.” But you can use the other columns to replicate other parts of the LA Times COVID-19 tracker.

Step 2. Insert a bar chart and select the appropriate data.

Insert a clustered column chart using the Insert tab on the Excel ribbon. When selecting the data, make sure that you select the “new_confirmed_cases” (other data are available, but the new cases are what we are replicating in this exercise).

The default bar chart does a pretty good job of replicating the LA Times figure.

However, we’ll have to do a few edits to the axes to match the LA Times figure.

Step 3. Modify the axes.

Let’s focus on the Y-axis first. Right-click on the Y-axis and select “Format Axis…” In the Axis Options panel, change the Minimum value to 0 and the Major value to 20000. This will match the settings in the target figure. (Note: There are negative values in the data, but these are very small numbers and assumed to be ignorable.) Next, in the Number options, change Category to “Number” and the value in the “Decimal places” to 0. Make sure that you check the box next to “Use 1000 Separator (,)” to replicate the same format in the target figure.

For the X-axis, right-click on the bottom axis and select “Format Axis…”This will open the Axis Options panel where you can make several adjustments to the X-axis. First, we want to change the X-axis display values from dates to months. Change the Number Category field to “Custom” then change the Format Code to “mmm”; make sure to click on “Add” for the changes to take effect. Next, go to the Axis Type area and change the Minimum to “02/01/2020” since we want our timeline to begin on Feb of 2020. Then change the Major value to 4 to match the monthly interval of the target figure. The X-axis should be thicker with tick marks on the outside. To modify these, navigate to the Tick Marks option and change Major type to “Outside” and then click on the Paint Bucket (Fill & Line) option; increase the Width to 1.5. These should match the target figure’s X-axis format.

Step 4. Add the 7-day moving average.

Excel has a Data Analysis tool that will automatically estimate the 7-day average. I’ve written a previous tutorial that describes how to use this tool. I’ll briefly review how to estimate a 7-day moving average.

In the Data tab, click on the Data Analysis tool (instructions on how to install the Data Analysis tool is here). This will open the Data Analysis Tools box. Select “Moving Average” from the tools kit and enter the appropriate values in the options box. For the Input Range, select all the values from “new_confirmed_cases” column. Enter a value of “7” in the Interval field; this will automatically calculated the 7-day moving average. In the Output Range, select a single cell where you want to moving average to be pasted after it is calculated. I chose to use the next available cell on the dataset ($F$2).

Step 5. Add the 7-day moving average to the chart.

To include the moving average data to the current daily new cases bar chart, right-click on the chart and select “Select Data.” This will open a box where you can add new data. Select “Add” which will open the “Edit Series” box. Updates the Series name with the name of the column (“moving_avg”). For the Interval field, change this to “7” for the 7-day moving average. Then in the Series values, select the 7-day moving average data.

By default, Excel will generate a bar chart for the 7-day moving average. However, we want a link chart. We can change this by right-clicking on the bars of the chart and selecting “Change Series Chart Type…” This will open a box that will allow us to select the type of chart for each data. For the “moving_avg” data, change the Chart Type to “Line.” This will create a line chart for the 7-day moving average which will be overlaid over the daily new cases.

Step 6. Modifying the chart format.

To closely match the chart to the one presented in the LA Times, I made the following adjustments. Your mileage may vary depending on the library of fonts available. I tried to select fonts that most Excel users will have access to.

I changed the Y-axis font to Adabi script. The X-axis font was changed to Arial Nova.

The width of the horizontal gridlines was increased to 1.5. The color of the daily new cases bar chart was changed to a light blue using a hex code of #8DC6DF. The color of the 7-day moving average was changed to a dark blue using a hex code of #2B869B; additionally, the width was increased to 2.0.

Step 7. Comparison between LA Times and user-generated charts.

Once the modifications have been made, compare the charts.

CONCLUSIONS

Using data from the LA Times, we can replicate the data visuals on their COVID-19 tracker website. This allows users to verify the data that are presented on a public site. Additionally, it allows us to generate our own data visualizations that could inform policy and education the public on the rate of new cases in California.

REFERENCES

Data was based on the LA Times (link), which was accessed on 24 June 2021.

Excel file used for this exercise can be download from the following Dropbox folder.

ACKNOWLEDGEMENT

The data visual used in this exercise was based on the work of the staffers at the LA Times. They deserve all the credit and acknowledgement for developing these stunning visuals.

Mark Bounthavong

May 31, 2021

Data visualization

Communicating data effectively with data visualizations: Part 37 (visualization COVID vaccinations by race, gender, and age)

Mark Bounthavong

May 31, 2021

Data visualization

INTRODUCTION

The introduction of COVID-19 vaccines sparked a race to administer as many people as possible in the United States (US). To date (31 May 2021), there have been over 167 million people with at least one dose of the vaccine administered according to the Centers for Disease Control and Prevention (CDC).1 However, there have been some concerns about disparities in vaccine coverage among racial, gender, and age groups. The CDC Covid Data Tracker indicates that a disproportionate number of racial groups are not receiving the vaccine. Additionally, there are a disproportionate number of older people who have received at least one dose of the vaccine and less males receiving the vaccine compared to the percentages that make up the US population. We will use simple data visualization techniques to characterize the vaccination of racial groups in the US using data from the CDC.

DATA SOURCE

Data used in this article are located at the CDC’s website (link).

There are several pieces of data that are important. The “% Persons at least One Dose” represents the percentage of people who received at least one dose in the US among those with data on race. The “% US population” represents the percentage of the US population made up by the group. For example, among those who received vaccine, 14.2% identified as Hispanic/Latino, which makes up 17.2% of the entire US population.

VACCINATIONS BY RACE

A horizontal bar chart was constructed to visually compare the percentage of people in different racial groups who received a vaccine and the percentage of people that made up that racial group in the US.

Based on this chart, the percentages of Hispanic/Latino and Black, non-Hispanic who received vaccines are less than the percentages that make up the US population. This is in contract to the White non-Hispanic and Asian, non-Hispanic racial groups who have a similar percentages of people who received the vaccine and make up the US population. This figure gives us an illustration on the potential disparities associated with vaccinations.

However, another way to view this figure is to look at the differences in the percentages that received vaccination and make up the US population. With the figure below, we can easily see the disparities.

VACCINATIONS BY GENDER

According to the CDC, the percentage of males who received vaccination is lower than the percentage that make up the US population. Females have the opposite issue; the percentage of females who received vaccination exceed the percentage the make up the US population.

We can also view this as the differences in the percentages that received COVID-19 vaccination and make up the US population. The percentages of females who received a vaccination exceeds the percentage that makes up the US population, but the percentage of males who received a vaccination is below the percentage that makes up the US population.

VACCINATIONS BY AGE GROUPS

According to the CDC, most of the vaccinations occurred among age groups of 40-49, 50-64, 65-74, and 75+ years. However, fewer percentages of the younger age groups have received the vaccination compared to the percentages that make up the US population.

When we change the metric to look at the differences in the percentages that received the vaccination and the percentages that make up the US population, it is apparent that younger age groups are not receiving the vaccines. However, this is a reflection of the vaccination roll out when early on older age groups were given priority.

CONCLUSIONS

Using simple data visualization techniques and differences in the percentages of vaccination and US population composition, we can graphically display disparities in vaccinations by racial, gender, and age groups. It is unclear what causes these disparities, but it will be paramount that any barriers to a fair and equitable distribution of healthcare be addressed. The biggest challenge will be to establish healthcare policies that will distribute vaccines equitably to all patients in the US.

REFERENCES

Centers for Disease Control and Prevention. COVID Data Tracker. Centers for Disease Control and Prevention. Published March 28, 2020. Accessed May 31, 2021. https://covid.cdc.gov/covid-data-tracker

Mark Bounthavong

April 27, 2021

Data visualization

Communicating data effectively with data visualizations: Part 36 (Scatter plots and Moving Averages for COVID-19 vaccination)

Mark Bounthavong

April 27, 2021

Data visualization

INTRODUCTION

As of 27 April 2021, the Centers for Disease Control and Prevention (CDC) reported that approximately 42.7% of the U.S. population have received at least one dose of the coronavirus (COVID-19) vaccine and 29.1% are fully vaccinated.1 Moreover, a total of over 200 million doses have been administered. Despite the effectiveness of the availability of COVID-19 vaccines, there are some people who have decided to not receive the vaccine. The Economist published a data visualization that summarized the results of weekly surveys that asked respondents, “When a covid-19 vaccine becomes available to you, will you get vaccinated?” This figure shows that approximately 60% of the population will get or already have received the COVID-19 vaccine. However, a little over 20% stated that they will not get the vaccine and nearly 20% are unsure. Using data from The Economist, we can use Microsoft Excel to recreate the figure below:

* Source: The Economist – “When a covid-19 vaccine becomes available to you, will you get vaccinated?” (link). This is for educational purposes only.

DATA

The data to recreate the figure can be found at The Economist YouGov data site (link). I also abstracted the data and saved them onto an Excel file, which is located here. Using the data from The Economist, we will create a scatter plot and use the 2-day moving average to re-create the figure above.

Creating the Scatter Plot with Moving Average

Step 1. Inspect the data

The data include the percentage of respondents who answered “Yes” (includes both Yes and those who received the vaccine), “No”, and “Not Sure”. The timeline is composed of the calendar year, month, and the week the data was collected. Notice that for some of the months, there were 5 weeks. I altered the scale so that the data would fit into a 4-week bin. Hence, you’ll notice that the weekly scale is 1, 2, 3, 3.5, and 4.

Step 2. Highlight the data and insert a line chart

Step 3. Modify the data source

Once the line chart is inserted onto the workspace, you’ll notice that the scatter plot hasn’t been created. A few more steps are required. Right-click on the chart and click on “Select Data” to open the window that will allow you to make modifications to the data source.

Click on the “Edit” option and select the rows containing the Year and Month on the data sheet. This will create the X-axis labels for our data visualization. You should notice that the X-axis labels have changed to reflect the Year and Month.

To further modify the data source, right-click on the chart again and click on “Select Data….” At the data source window, remove “Year”, “Month”, and “Week” from the legend series to only display the percentage of respondents to the survey question. Once these have been removed, click “Ok.” This will generate a link chart with the respondents’ answers to the survey question.

Step 4. Change line chart to scatter plot

Currently, the X-axis displays the data from 2021 before 2020. We can change this by right-clicking on the X-axis and then clicking on “Format Axis.” This brings up a window where we can click on the box by “Categories in reverse order” for the X-axis to display the respondents answers in chronological order.

To change the line chart to a scatter plot, right-click on the line and select “Format Data Series…” This will open the options to modify the data on the chart. Instead of a line chart, we want a scatter plot. To make this change, click on the paint bucket and under the line group click on “No line” to remove the line chart. Then under the Marker group, select the circle under “Type.” This will replace the line chart with a scatter plot.

Repeat this for the other lines and you should have a chart that looks like the following:

Step 5. Adjust the aesthetics and add a moving average line

The final step involves making changes to the aesthetics of the chart. You can remove the labels on the Y-axis, delete the gridlines, delete the chart title, eliminate the border of the X-axis, and then delete the legend. You can add moving average lines for each of the scatter plots by right-clicking on the data points and selecting “Add Trendline…” Then select “Moving average” with “2” periods. Change the moving average line from a dashed line to a solid line and a higher thickness.

The final figure can be achieved by changing the colors of the scatter plots and moving average lines. You add additional labels to emulate the figure from The Economist.

CONCLUSIONS

Using data from The Economist, we recreated the figure showing how the perceptions of respondents changed regarding getting a COVID-19 vaccine from 2020 to 2021. All the data files can be downloaded from here.

REFERENCES

1. CDC. COVID Data Tracker. Centers for Disease Control and Prevention. Published March 28, 2020. Accessed April 27, 2021. https://covid.cdc.gov/covid-data-tracker

The Economist YouGov data site (link)

Mark Bounthavong

March 26, 2021

Data visualization

Communicating data effectively with data visualizations: Part 35 (Heat maps – COVID-19 Deaths by Week and State)

Mark Bounthavong

March 26, 2021

Data visualization

INTRODUCTION

With the COVID-19 pandemic entering its second year, the number of deaths in the United States (US), has total over 546 thousand deaths (data from March 26, 2021). Of note is the variation in death rates across US. CDC provides a daily update on the number of deaths reported by each state (and major cities) in the US, which is accessible through their portal. However, it can be difficult to visualize all these data; proper visualizations will assist the audience with interpretability and add to the narrative that the COVID-19 pandemic has been more severe is some states compared to others. The Economist provided some helpful data visualizations for Great Britain by using heat maps to report the deviations from the expected weekly death counts due to COVID-19. Here is an example.

* Source: The Economist – Tracking COVID-19 excess deaths across countries (link). This is for educational purposes only.

From this visualization, it is clear that all sectors of Britain were hit by COVID-19 at similar times. Some areas had higher deviations from the expected mortality rates, while other had lower deviations. Additionally, there appears to be another peak in the deviations from the expected number of deaths in late 2020. Using this example as a model, we can recreate this heat map for the US. Fortunately, The Economist has a GitHub portal with all the data for different countries including the US.

DATA

Data for this exercise is located the The Economist GitHub portal. We will use the “covid-19-excess-deaths-tracker,” which includes data on the deviation from the expected deaths (%). Data on the deviation from the expected deaths (%) for the US will be used to emulate the example from The Economist.

Heat map of deviations from the number expected death (%) for the US.

We will use Excel to create a heat map of the weekly deviations from the number of expected death because of COVID-19 in the US from January 2020 to February 2021.

Step 1. Inspect the data:

The data for this exercise is located here. Download and open the file. Review the data under the “exercise data” sheet. It should look like the following:

We will manipulate this data so copy and paste this into a new sheet.

Step 2. Add spaces:

One of the elements that The Economist added to their heat map were spaces between each state or territory.

This can be accomplished manually by inserted a row between each state or territory. However, when there are a lot of rows, this can be a burden. A systematic method includes a couple of columns (number and alternate) to create the spaces we need. These will represent the spaces between the states/territories.

Once you have created these, select the entire data set. Then sort them by “number” and then by “alternate”; see picture below:

You should have spaces between each state/territory now.

Step 3. Change column width and row height:

Select the columns and right click to change the width. Set the value to 2. Once you done this, the weekly deviations should look like the following:

Next, use the Filter feature in Excel to only select the “alternate” values of 1. This should only show values for the “alternate” column that equal to 1. Select the rows where the column values are 1 and right-click. This will allow you to select “Row Height” so that you can change the value to 25. Please see the illustration on how to do this below:

Now that you’ve change the row height for the “alternate” column values for 1, we will change the row height for the “alternate” column values for 0 and 0.5 using the same method. The only difference is that the row height will be 5 instead of 25. See the illustration below on how to perform this step:

Step 4. Add condition formatting rules:

Now that the data column width and row height have been set properly, you can add the conditional formatting rules to change the color of the boxes to reflect the deviations from the number of expected death (%) due to COVID-19.

First, select all the data. Then go to the Home tab and select “Conditional Formatting.” Make sure to click on “New Rule...” Select the “3-Color Scale” because we will use the gradient levels to create our heat map. The darker color will denote the higher deviations and the lighter color will denote the low or no deviations from the number of expected deaths (%). Make sure to change the “Type” to “Number.” You can enter any values that work for you, but for this example, use “0”, “0.5”, and “1” for the “Minimum”, “Midpoint”, and “Maximum” types, respectively.

Next, with the data still selected, right-click on the data to open a window. Select “Format Cells” because we want to remove the values from the cells. In the Number tab, select “Custom” and enter the following: "";"";"";"". This will change the number values so that it doesn’t show in the cells.

The Excel file should start looking like a heat map. However, there are a few more features we can do to emulate the heat map on The Economist website.

Step 5. Adding borders to the heat map:

After formatting the cells so that the numbers are not visible, you can add borders to the heat map. Select the heat map area, right-click anywhere on the data and select “Format Cells”; you will select the “Border” tab and change the border color to white. Then apply the borders to the heat map.

Once you’ve applied the border color change, the heat map should look like the following:

Next, we will add a border to separate the states/territories from each other.

Use the “Filter” feature and select rows where the “alternate” values are 0.5. The rows will be very narrow since we set them to a value of 5 earlier. Now, select the data area including the state/territories column and right-click on the data. Select “Format Cells” and then change the border color to a light shade of gray and apply this to the top border. This will apply a light gray border to separate the states/territories from each other. See illustration below for how to perform this step.

Once you’ve added the border, the heat map should look like the following:

Step 6. Final steps:

In the last steps, you can change the font, hide some of the values, and add a label to identify the weekly deviations. For the final figure, I hid the “alternate” and “number” columns, and I hid the weekly labels. I replaced the weekly labels with a more informative timeline to based on quarters. I also rearranged the states/territories so that I had the United States at the top and some of the more active states/territories at the top (e.g., New York and New York City). This should be remarkably similar to the heat map developed by The Economist. The final figure can be downloaded from the following link.

CONCLUSIONS

Using the data from The Economist, we recreated their heat map for the US. We can visualize when the COVID-19-related deaths exceeded the expected all-cause mortality deaths on a weekly basis from January 2020 to February 2021. For example, New York City was hit with a high percentage of COVID-19 deaths around April 2020, which was also reported for The District of Columbia, New York State, and New Jersey. However, North Dakota, South Dakota, California, and Arizona were hit with high percentages of COVID-19 deaths that exceeded the expected number of deaths later in the year.

Heat maps provide us with a useful visual to compare and contrast different states/territories and the severity of COVID-19 deaths using simple tools provided by Excel.

REFERENCES

I used the lessons from a previous blog on developing heat maps (link).

Data on deviations from the number of expected deaths (%) in the US was provided by The Economist (link). However, I cleaned the data so that you can use it for your own exercise (link).

Mark Bounthavong

February 22, 2021

Data visualization

Communicating data effectively with data visualizations: Part 34 (Progress bars in Excel)

Mark Bounthavong

February 22, 2021

Data visualization

INTRODUCTION

With the introduction of the COVID-19 vaccines, states have monitored their vaccination progress. Current progress of each state’s vaccination rates are available at USAFacts.org. California, which has been experiencing some of the highest COVID-19 related rates and deaths, has approximately 81.2% of its current allocation of COVID-19 vaccines. Here is a screen capture of California’s progress as of 22 February 2021:

This visualization contains several progress bars showcasing the COVID-19 vaccination progress in California. We will re-create this in Excel using the data presented in this visualization.

Step 1. Enter data into Excel.

Using the visualization, we can enter the data into Excel. The data should be arranged in the following manner as reflected in Table 1:

Step 2. Select the data and insert a clustered bar chart.

The default chart is automatically generated by Excel. However, we want to have the labels for the axis to denote the legend. To do that we need to right-click on the chart area and click on “Select Data…” This will bring up a window where you will need to click on “Switch Row/Column” to change the horizontal axis labels.

Step 3. Modify chart to add data labels.

Right-click on the chart area and click on “Select the data…” This will bring up a window that will allow you to make changes to the data structure. We will unselect the “Goal” and “Current progress” data. This will only leave the Goal (%) and % bars. Next, right-click the top bar and click on “Add data labels.”

Step 4. Overlap the two bars.

Next, we want to overall the two bars so that we can show the progress of vaccinations in California. Right-click on one of the bars and click on “Format Data Series…” Change the series overlap to 100%. This will cause the two bars to overlap, which will look like a progress bar. Since we only left the data label for the top bar, we can see distinctly the vaccination progress in California.

Step 5. Modify the aesthetics.

The progress chart is modified using the Abadi font and changed the patterns and colors of the bars. I also added the title “COVID-19 vaccination progress, California.

Step 6. Add some text.

Then we can add some text to emulate the visualization from USAFacts.org. Change the order of the data so that it reflects the one on the USAFacts.org website.

Conclusions

Using the overlap feature of Excel will allow us to make progress charts. The example from California showcases how this visualization can effectively show the current vaccination progress. These types of visualizations will help decision makers monitor progress and make any changes when necessary. Files for this exercise can be downloaded here.

References

Data from USAFacts.org.

Mark Bounthavong

January 20, 2021

Data visualization

Communicating data effectively with data visualizations: Part 33 (Bar charts with secondary axis)

Mark Bounthavong

January 20, 2021

Data visualization

INTRODUCTION

Secondary axis allows us to plot two pieces of data with large differences in their scale. For instance, plotting the number of new COVID cases, which number in the hundreds of thousands, will outweigh the number of employees who lose their jobs, which number in the tens of thousands. There is an order of magnitude that makes it difficult to see how these two metrics are presented side-by-side on a single figure. However, we can take advantage of the secondary vertical axis to present both pieces of data that will be visually intuitive to interpret, while preserving the differences in their scale. One problem with this method is how Excel executes this. Instead of maintaining the bar or column chart, Excel will overlay both bars (or columns). This is illustrated in Figure 1.

Figure 1. Excel overlays the two bars in the figure when using the secondary axis.

To address this issue, we will use a method described by Steve True on his Excel Dashboard Templates Website.

MOTIVATING EXAMPLE

We will use data from the California’s Employment Development Department to visualize the number of employees who lost their jobs during the COVID-19 pandemic and compare these trends to the number of new COVID-19 cases in California. Data on the number of COVDI-19 cases in California came from the California Data Portal. You can also download the Excel exercise file from the following shared Dropbox location.

Step 1. Selecting the data.

First, inspect the data. There are three columns of data the contain the month, number of employees affected, and the number of new COVID-19 cases. Next select the data and insert a “Clustered Column” chart.

Excel will automatically generate a figure where the dates are on the X-axis and the values for the metrics are on the Primary Y-axis.

Notice that the Number of new cases is exponentially larger than the number of employees who lost their jobs during the pandemic. It is difficult to see how the pandemic impacts the number of employees who’ve lost their jobs. To address this, we’ll use the secondary axis for the Number of new cases.

Step 2. Select the secondary axis.

To select the secondary axis, right-click on one of the orange bars that denotes the Number of new cases. This will open a window where you can select “Format Data Series…” Once you’ve done this, select the “Secondary Axis.” This will change your current bar chart into a chart with two axes.

The problem with this updated chart is the way Excel overlays the bars. Here is an illustration of how Excel does this. By changing the gap width, we reveal that the bars are actually over one another and not stacked.

Step 3. Fixing this problem.

The solution to this issue is to add gaps between the bars. Add two new columns between the Number of Employees affected and the Number of new cases; label these as “gap1” and “gap2.”

Now, select all the data and insert a bar chart. You should get the following chart.

Right-click on the yellow bar to open the Format Data Series option. In the Format Data Series window, use the “Series Options” drop-down button to select the data element we want to edit. The first data element is “gap2” and the second data element is “gap1.” We want to assign “gap2” to the Secondary Axis and “gap1” to the Primary Axis. Make sure that you change the “Gap Width” to 25% for both the “gap1” and “gap2” data elements. Keep the “Series Overlap” at 0% for both data elements.

Now, your bar chart should have the bars for the Number of Employees affected and the Number of new cases side-by-side (rather than overlaid) and using both axes.

We could improve this bar chart by editing the legend to remove the “gap1” and “gap2” labels, change the color of the bars, modify the fonts, add axis labels, and add a chart title.

CONCLUSIONS

It is possible to use Excel to create bar charts with two vertical axes. Although Excel doesn’t do this intuitively, we can use the extra columns denoted by “gap1” and “gap2” to generate the space needed to have the bars side-by-side.

REFERENCES

I ran into this problem when I wanted to use two different scales of metrics on a recent project, and I was perplexed as how Excel doesn’t intuitively create the bar chart that I needed. Fortunately, I found Steve True’s Excel Dashboard Templates website where he details how to solve this issue. I highly recommend visiting his site as he has wealth of resources on how to use Excel efficiently.

California COVID-19 data are located at the California Open Data Portal

Data on California layoffs are located at California Employment Development Department

Mark Bounthavong

November 18, 2020

Data visualization

Communicating data effectively with data visualizations: Part 31 (Visualizing COVID-19 deaths using sparklines)

Mark Bounthavong

November 18, 2020

Data visualization

INTRODUCTION

As the COVID-19 pandemic continues to place a public health burden on our society, the death toll in the United States has reached a terrible milestone; over 240,000 deaths were reported as of November 17, 2020.[1] COVID-19 is now the third leading cause of death behind heart disease and cancer followed by accidents (unintentional injuries) and chronic respiratory diseases (reported by Scientific American).[2] Of note is the epidemiology spread of new deaths over time. New York has borne the brunt of the incidental deaths; early in the pandemic New York City had the highest 7-day moving average of COVID-related death in the country (1,193 deaths in April 21, 2020), but, recently, the 7-day moving average has dropped to 8 deaths in November 16, 2020. One way that we can visualize this is to use sparklines. Excel has a feature that allows users to plot sparklines onto the spreadsheet. We will use this feature to visualize the change in new COVID-related deaths over time by state.

MOTIVATING EXAMPLE

We will recreate the following figure using the Excel sparklines feature to visualize the changing number of new deaths due to COVID-19 from January to October 2020. Data can be downloaded from the CDC Tracker site or from the Dropbox folder. (Data was pulled on November 18, 2020.)

CONSTRUCTING THE SPARKLINES

Step 1. Open and examine the data

You can download and open the Excel file located here. The data should look like the following:

The data are in wide format and are ordered by Total Deaths in descending order. The abbreviation NYC represents New York City instead of the state. The data also include districts (some cities) and US territories.

Step 2. Identify the location of the sparklines options

Excel’s sparkline feature is located in the ribbon area. You can locate this in the Insert tab under the sparklines area.

For this tutorial, we will use the “Column” option. This will generate bar charts within the cell that we identify.

Step 3. Insert the sparkline

In the cell next to the “Total Deaths” column, we will add the column label “New Deaths Over Time.” Select the empty cell for the first state and insert the sparkline. A window will open prompting for information. Data Range should include the data for the state from Jan to Oct. Once you’ve selected your data, click on “OK.”

Step 4. Change row height to fit the sparkline

The sparkline is inserted into a narrow cell with a low row height. We can change this by right-clicking on the row label, selecting “Row Height…” changing the “Row Height” to 45, and then confirming that this is the height you want. Feel free to change the height to fit your needs.

Step 5. Select other states

At this point, the sparklines are nearly complete. You can expand this to the other states by selecting the corner of the cell and dragging the sparkline to the other cells.

Step 6. Hide cells

We can hide the cells with the number of deaths from Jan to Oct. However, when you do this, the sparklines will disappear. Excel uses the unhidden cells to create the sparklines. When you hide the data, the sparklines will not generate the visual. To fix this, open the sparklines ribbon and select the “Edit Data” box. This opens several options. Select the “Hidden & Empty Cell…” and click on the “Show data in hidden rows and columns.” This will allow you to hide cells that the sparklines are using to generate the visuals.

Step 7. Final touches

You can change the color of the sparklines and add some table lines to separate the states. In this tutorial, the Arial Nova Light font was used. Here is the final sparklines for the first five states with the most total number of COVID-related deaths in the United States.

CONCLUSIONS

Using sparklines can enhance a data visualization by adding a time series of historical events (e.g., COVID-19 deaths). This is useful when you want to incorporate this alongside some data in a table, spreadsheet, or dashboard. In our tutorial, we used the historical of deaths for each states in the United States to give the audience some perspective of what has occurred despite the high number of deaths. For instance, New York City had the highest number of total deaths, but historically, these deaths occurred in the months of April to July and dropped significantly in recent months. Other states started out with low number of deaths but have many their total deaths in recent months. This provides us with another dimension to the visualization of how different the epidemiologic spread of deaths occurred across the United States.

REFERENCES

CDC. COVID-19 Cases, Deaths, and Trends in the US | CDC COVID Data Tracker. Centers for Disease Control and Prevention. Published March 28, 2020. Accessed November 18, 2020. https://covid.cdc.gov/covid-data-tracker
Stix YZ Gary. COVID-19 Is Now the Third Leading Cause of Death in the U.S. Scientific American. Accessed November 18, 2020. https://www.scientificamerican.com/article/covid-19-is-now-the-third-leading-cause-of-death-in-the-u-s1/