Communicating data effectively with data visualizations: Part 26 (COVID-19 choropleth)

INTRODUCTION

Power BI is a useful tool that allows you to build dashboards and data visuals. One ability that makes Power BI unique is its mapping feature. Although there are pre-built map functions in Power BI, better and easier applications are available through their AppSource store. This tutorial will go over how to create a choropleth of the most recent COVID-19 pandemic using the incidence rate (dated June 21, 2020) for the United States (US) using Power BI and the Drilldown Choropleth application.

 

DATA

You will need to download two types of data: (1) COVID-19 incidence rate for each county in the United States (You can download the Excel file from this Dropbox link) and (2) Shape files for the United States. Fortunately, there are many sources of shape files for the United States, and I have made one available using the Albers USA projection, which has the state of Alaska and Hawaii in the lower left corner of the map. You can use the link to the map shape file here. Data for the shape files comes from the US Census. [I downloaded the ZIP file and converted the SHP files to TopoJSON using the following conversion tool (Mapshaper.org).] The COVID-19 data contains US county codes called the Federal Information Processing Standard Publication (FIPS), a 5-digit code that identifies counties. Incidence rate is defined as the number of confirmed cases per 100,000 persons.

Albers US projection

POWER BI TUTORIAL

For this tutorial, you will need to have Power BI installed on your desktop to re-create this choropleth.

 

Step 1. Get the data

When you start Power BI, you will see a start screen that allows you to select the data you need. Since the COVID-19 incidence rates for the US data are saved as an Excel file, you will need to select Excel as the data type and then connect to the data.

After clicking “Connect,” the Navigator will open. Select Sheet 1 containing the data. Notice that there are two variables (FIPS1 and FIPS). These are the data that will be used to pair the incidence rate to the US county code. The FIPS code has a “0” in front of some of the county code (e.g., 05001) for a total of 5 digits. However, notice that these are in numeric format so the preceding “0” is missing. To remedy this, we need transform the data. Make sure to select “Transform Data” from the Navigator screen.

After selection “Transform,” you will need to change the FIPS variable from numeric to text.

You will be asked to either “Replace” or “Add New Step;” make sure that you select “Replace.” This will replace the numeric data to text, which is what we want for the FIPS variable. Check the data after you do this; notice that the “0” is now preceding values that are less than 5 digits. (FIPS code is a 5-digit county code.) After the data has been transformed, select “Close & Apply” to finalize the data. This will load the data into Power BI, which you will use to build the choropleth.

 

Step 2. Download and Install “Drilldown Choropleth” app

Power BI allows you to download apps from the AppSource store. Click on the “…” and select “Get more visuals.” This will open the AppSource store where you can type the name of the app you want.

Step 3. Enter data into the Drilldown Choropleth app

After you install the Drilldown Choropleth, an icon will appear in your Visualization panel. Click on this icon (Drilldown Choropleth); this will open up options in the Visualization panel. There are two important fields (“Location” and “Values”). Click and drag the FIPS variable into the “Location” field and the Incidence_Rate variable into the “Values” field.

Next, click on the Paint roll icon to open the Drilldown Choropleth options. Expand the “Shape” option to change the “Projection” to “albersUSA” and then enter the link to the TopoJSON shape file for the United States counties. The TopoJSON link is here: https://raw.githubusercontent.com/mbounthavong/Maps/master/c_03mr20.json

 

 

After entering the projection type and the TopoJSON link, you can make changes to the color of the choropleth. Since this is a choropleth, you will need to enter values for the Minimum, Center, and Maximum colors. This will generate a gradient where the darker colors reflect high incidence rates and the lighter colors reflect low incidence rates of COVID-19.

Finally, you can change the size of the visual and save it. Since there are a lot of data being processed, updating the choropleth will take a few minutes. With a little patience, you will generate a suitable choropleth of the COVID-19 incidence across US counties for June 21, 2020.

CONCLUSIONS

Power BI allows us to create visually spectacular choropleths. Additionally, it allows us to create choropleths that are interactive. You can hover over each shape and see the incidence rate per county. However, there are limitations. Inserting a legend to describe what the colors represent is not easily possible (I was unable to solve this problem). There are roundabout ways to get the legend inserted into this choropleth, but it will require additional software (as far as I know). There is potential for the Drilldown Choropleth app to improve by adding an easy way to include functionalities for legends and labels. Despite this limitation, Power BI allows us to create these choropleths without having to resort to more complex programming in R or Python.

Files used in this tutorial are available here.

REFERENCES

Mapshaper.org (link: https://mapshaper.org/)

US Census Cartographic files (link: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html)

COVID-19 data come from the Johns Hopkins GitHub site (link: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_daily_reports/06-21-2020.csv)