color

Communicating data effectively with data visualizations - Part 10 (Heat Maps)

BACKGROUND

A heat map is a data visualization tool that uses positioning and coloring to identify clusters and correlations in multivariable analysis. The most common type of heat map is a 2 x 2 matrix, where two variables are examined using the rows and columns (R x C) positions on a matrix. A heat map matrix helps us identify any patterns or similarities across the different dimensions. In a heat map, color is critical in denoting degrees of change. (Please see refer to the past blog on colors.)

For example, we can see the changes in opioid overdose rates across time for each state (Figure 1). As the rates increase, the cells are darker (indicated by the legend). All the states experience an increase in opioid overdose deaths, but State 1 is experiencing it faster than States 2 and 3.

In this tutorial, we will perform two methods to create heat maps. The first method will use the built in Excel conditional formatting rules and the second method will use VBA macros.

Figure 1. Heat map matrix.

Figure 1 - heatmap matrix.png

MOTIVATING EXAMPLE

We will continue to use the state-level drug overdose mortality data from the CDC.

https://www.cdc.gov/drugoverdose/data/statedeaths.html

Mortality rate is presented as the number of deaths per 100,000 population.

In this tutorial, we will develop a heat map using opioid overdose mortality from 2013 to 2016.

The data setup in Excel has State indicators as the rows and time indicators as the columns. We are visualizing the change in opioid overdose mortality from 2013 to 2016 for each state. Figure 2 illustrates the data structure for the first seven states.

 

Figure 2. Data structure for the first seven states.  

Figure 2 - Data structure.png

 

METHOD 1: CONDITIONAL FORMATTING RULES

Excel has a convenient tool that allow us to use conditional formatting to shade our heat map matrix.

Step 1: First highlight all the data.

 

Step 2: In the Excel Ribbon, select Conditional formatting and then New.

Figure 3 - conditional formatting.png

Step 3: Select “Format all cells based on their values” and change the Format Style to “3-Color scale.” Change the color to the different shades you are interested in using. (In this example, we used a blue base with varying degrees of shading.) Select percentile and then click “Ok.” The percentile will use the Median for each column to distinguish the middle category for the rates of opioid overdose mortality for each state.

Figure 4 - 3-color scale.png

Step 4: Visually inspect the results. If there are no apparent pattern in the heat map, we will need to sort the rate of opioid overdose mortality for 2016 in descending order.

Figure 4 - Descending order.png

The heat map should look like the following:

 
Figure 6 - heat map example 1.png
 

There is a pattern emerging in regards to the rate of opioid overdose mortality across the different states. West Virginia has the highest opioid overdose mortality rate in 2016, but they also appear to have the highest from 2013 to 2015. The dark cell in 2016 indicates that West Virginia has exceeded the 50 deaths per 1000 population incidence rate. Other states also have high rates of opioid overdose mortality across 2013 to 2016, which continued to increase. The 2013 column is clearly lighter indicating that the opioid overdose mortality has increased over time up to the available data in 2016.

 

Step 5: We can improve this heat map by removing the numbers, which can be distracting. We need to select all the data and format the cells. Select only the numeric data and then format the number. Use the “Custom” category and enter the following: "";"";"";"". This will change the number values so that it doesn’t show in the cells.

Figure 7 - change the number format.png

Step 6: Change the column width and row height so that you have a nice square-like matrix for the final result. We used a column width of 5 and a row height of 30. (Only the first fifteen states are shown.)

The heat map can be used to quickly identify states with the highest opioid overdose mortality and the trends across time. The states with the highest rates of opioid overdose mortality are clustered at the top while the states with the lowest rates are clustered at the bottom.

We could also arrange this into regions of the US to further stratify the results (not shown).

 

METHOD 2: USING VBA MACROS FOR CONDITIONAL FORMATTING OF MORE THAN 3 CATEGORIES

Excel only allows us to choose up to 3-Color scales. If we wanted to use more than 3 color categories, we will need to use VBA macros.

But before we do, we need to think about the colors for the scales. Since we have more than 3 categories, we will need to figure out how to divide the colors.

 

COLOR CODES

We will need to determine the base color for our heat map. In this example, we will use a blue base-color and change the shading using the RGB color values. RGB colors are based on a system using a combination of three base colors (red, green, and blue) that can be used to change the intensity of the color from a range between 0 and 250. An example of an RGC color table can be found in the following site.

For this example, we used the following RGB color values where dark-navy denotes high rates of opioid overdose mortality (50 or more per 1000 population). However, you can change the values of these colors however you like.

Figure color codes.png

We will continue to use the blue color base and change the gradient using the RGB values.

 

VISUAL BASIC FOR APPLICATION (VBA) MACRO

The VBA macro comes from the site Excel For Beginners and written by Kristoff deCunha. We used the VBA code on the site, but modified it for this tutorial.

The VBA code is written in a way where any changes in the values will automatically update the color of the cell. Additionally, the code also sorts the 2016 opioid overdose mortality rate in descending order, adds thin-white continuous borders around the cells, and changes the font to Times New Roman.

Here are the VBA macros used for Method 2.

 

Macro 1 changes the font to Times New Roman.

‘Macro1
Sub ChangeFont()

Dim rng As Range
Set rng = Range("J1:N52")

With rng.Font
    .Name = "Times New Roman"
    .Size = 12
    .Strikethrough = False
    .Superscript = False
    .Subscript = False
    .OutlineFont = False
    .Shadow = False
    .Underline = xlUnderlineStyleNone
    .ThemeColor = xlThemeColorLight1
    .TintAndShade = 0
    .ThemeFont = xlThemeFontNone
End With

End Sub

 

Macro 2 changes the cell shading to the 6-Color scale (blue-base)

‘Macro2
Sub ChangeCellColor()


Dim rng As Range
Dim oCell As Range

Set rng = Range("K2:N52")

For Each oCell In rng
        Select Case oCell.Value
                 Case 50 To 100
                     oCell.Interior.Color = RGB(37, 54, 97)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 40 To 49.999
                     oCell.Interior.Color = RGB(56, 83, 145)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 30 To 39.999
                     oCell.Interior.Color = RGB(147, 168, 215)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 20 To 29.999
                     oCell.Interior.Color = RGB(183, 198, 228)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 10 To 19.999
                     oCell.Interior.Color = RGB(218, 225, 240)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 0 To 9.999
                     oCell.Interior.Color = RGB(230, 237, 253)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                  Case Else
                     oCell.Interior.ColorIndex = xlNone
         End Select
    Next oCell
    
End Sub

 

Macro 3 sorts the 2016 opioid overdose mortality rates in descending order.

‘Macro3
Sub SortColumn()

Dim DataRange As Range
Dim keyRange As Range
Set DataRange = Range("J1:N52")
Set keyRange = Range("N1")
DataRange.Sort Key1:=keyRange, Order1:=xlDescending

End Sub

 

Macro 4 hides the font from the heat map.

‘Macro4
Sub HideFont()
    
Dim rng As Range
Dim oCell As Range

Set rng = Range("K2:N52")

    For Each oCell In rng
        oCell.Font.Color = oCell.Interior.Color
    Next oCell
    
End Sub

 

Macro 5 creates thick white borders for each cell in the table.

‘Macro5
Sub WhiteOutlineCells()

    Dim rng As Range

    Set rng = Range("J1:N52")

    With rng.Borders
        .LineStyle = xlContinuous
        .Color = vbWhite
        .Weight = xlThick
    End With
    
End Sub

INSERTING DATA AND RUNNING MACRO

The five macros are assigned to a button in the Excel Macro-Enabled Workbook. Pressing the button will perform the task of creating a 6-Color scale heat map. Download the Excel Macro-Enabled Workbook here. This file will have the raw data and the macro-enabled worksheet for you to create a heat map.

Step 1: Copy the raw data from the “combined” worksheet.

Figure 9 - Copy Data.png

Step 2: Paste it in the worksheet “heatmap_2” starting on cell "J1".

Figure 10 - Copy onto J1.png

Step 3: Then press the “PressStart” button to run the macros. Your final heat map should look like the following:

Compare the heat map from Method 1 (3-Color scale) to the one generated by Method 2 (6-Color scale). The heat map with the 6-Color scale has a lighter pattern compared to the 3-Color scale heat map. The differences are dramatic. Depending on the granularity of the heat map you want, either one of these color scales would be fine. However, Method 2 requires some VBA coding.

* Not all states shown.

 

Conclusions

Heat maps allow us to observe patterns in the data. In our example, we notice that West Virginia has a high rate of opioid overdose mortality indicated by the clustering of dark cells from 2013 to 2016. Other states had similar patterns as West Virginia. Using heat maps provides a quick and easy interpretation of the changes in opioid overdose mortality across time and the states that are clustered together that have high rates of opioid overdose mortality.

 

REFERENCES

I used the following websites to help develop this tutorial.

Conditional formatting with more than 3 categories:
https://sites.google.com/site/excelforbeginners/Home/vba-codes/conditional-formatting-more-than-3-criterias
 
Changing the RGB color codes:
https://analysistabs.com/excel-vba/change-background-color-of-cell-range/
https://www.w3schools.com/colors/colors_rgb.asp

Excel color palette library:
http://dmcritchie.mvps.org/excel/colors.htm

Excellent site for VBA coding:
https://www.excel-easy.com/vba.html

 

Communicating data effectively with data visualizations - Part 5 (Colors)

Background

As we decide the type of chart we want to visualize our data with, we should also think about our color scheme. Color should represent the data value. We should be able to see a chart, identify the color that interests us and associate a value with it. However, it is easy to misidentify the data that the color is supposed to represent and it is also easy to confuse the magnitude of the difference between groups.

In this tutorial, we will discuss some basic elements of color theory that will help you to select the best color palette for your graphical presentations. We will use the following simple data and color schemes to illustrate some of these lessons (Figure 1). The X column denotes some arbitrary category that uses the alphabet and the Y column denotes the values associated with each category.

Figure 1. Example data used to illustrate the different color schemes.

table example.png

Figure 2. Color schemes used in this example (single-hue color and rainbow color schemes).

color hues.png

 

Bias-Precision Tradeoff

There is always a tradeoff between bias and precision. Bias is associated with the ability to accurately identify the data value associated with the color. Precision is associated with the ability to correctly identify the data with little variation. Figure 3 illustrates the differences between the two.

 

Figure 3. Bias and Precision tradeoff.

Figure 3. Bias-precision tradeoff.png

For example, if you had a single-hue progression color scheme (blue), it is easy to identify the color that contains the higher value versus the color that contains the lower value. In Figure 4A, the darker shade of blue is associated with a higher value compared to the lighter shade of blue. However, it may not be easy when you have a rainbow color scheme because the values can be arbitrarily associated with a different color (Figure 4B). In other words, Figure 4A helps to reduce the bias while Figure 4B can generate a lot of bias.

 

Figure 4. Comparison between single hue color scheme and rainbow color scheme.

Additionally, Figure 4B is easier to distinguish between the groups. You can easily identify Group F compared to Group E because the colors are distinctly differentiable. However, in Figure 4A, it is very difficult to distinguish between Group F and Group E because the shades of blue associated with both groups appear similar. Therefore, the ability to precisely detect the differences in the groups is limited by the single-hue color scheme (Figure 4A) compared to the rainbow color scheme (Figure 4B).

Another way to look at these principles is to think about the concurrence between the true value and the estimated value. Suppose you were given these color schemes and asked to estimate the correct value. How do you think you’ll do? Would it be easier to estimate the true value from the single-hue color scheme or the rainbow color scheme?

 

Figure 5. Performance of color schemes according to estimated and true values.

I generated data in Figure 5 to illustrate the principles in this tutorial. Figure 5 compares performance between the single-hue color scheme and rainbow color scheme. Figure 5A has low bias but some scatter (moderate precision) across the gray line, which denotes the 1:1 accuracy between true and estimated values. However, Figure 5B has high bias but very little scatter (high precision) due to the easy identification of the groups. Whereas, there was some uncertainty in correctly identifying the true value within each group.

 

Summary

When deciding on the color scheme for your data, take into consideration what is more important. Is it critical that your audience precisely distinguish one group from another? Or is it more important to have them visually identify the correct value associated with the color? Ideally, you want to be able to use a color scheme that has low bias and high precision, but in reality, you will need to make a tradeoff between the two.

 

References

I used the following references to develop this tutorial.

Liu Y, Heer J. Somewhere over the rainbow: An empirical assessment of quantitative colormaps. ACM Human Factors in Computing Systems (CHI) 2018, April 21-26, 2018, Montreal, QC, Canada. http://dx.doi.org/10.1145/3173574.3174172

Cromwell W. Colour schemes in data visualisation: Bias and Precision. Presented at the useR! 2016 international R User conference, June 15, 2017. URL: https://channel9.msdn.com/Events/useR-international-R-User-conference/useR2016/Colour-schemes-in-data-visualisation-Bias-and-Precision