data visualization

Communicating data effectively with data visualization – Part 17 (Multivariate Dimensions)

MULTI-DIMENSIONAL DATA VISUALIZATIONS

Data visualizations can improve how we see complex data, in particular, where multiple dimensions are involved. For instance, in an X-Y plan, we can have the months on the X-axis and the number of patients on the Y-axis (Figure 1). Let’s imagine that number of patients represents some outcome you are interested in (e.g., number of patients who has 5+ prescription medications). Time and the number of patients are dimensions on this two dimensional plan.

 

Figure 1. Two-dimensional X-Y axes figure.

Figure 1.png

As a rule, whenever you want to display multiple dimensions, each dimension needs to be represented onto a single figure, which is challenging given that a figure is normally on a two-dimensional plane. What if we wanted to have a figure with more than two dimensions? What if we wanted to have a figure with months on the X-axis, number of patients on the Y-axis, and include a third dimension denoting different genders? How would we go about doing that? Figure 2 illustrates how we can do this by adding lines and labeling them using different colors.

 

Figure 2. Figure with three dimensions.

Figure 2.png

Figure 2 is able to capture three dimensions of data into a single two dimensional figure. The number of patients is captured in the Y-axis and the time in months is captured in the X-axis. Gender is represented by the colored lines that show the difference in the relationship between number of patients and time associated with males and females.

Alternatively, we use the color blue for the following conditions:

E[Y | male]

 

We use the color red for the following conditions:

E[Y | female]

 

The legend providers additional clarification that the different line colors denote the gender types. It is critical to include clear and intuitive legends so that your readers will immediately recognize their reference and label. Without a legend, your audience will have to guess what color belongs to what gender type.

How about adding another dimension such as age? This would increase the number of dimensions on this figure from three to four. For example, what if we wanted to see how being older (80+ years) impacted the relationship between the number of patients and time across genders? Well, this can be accomplished by using different types of lines (e.g., dotted lines and dashed lines).

Figure 3 illustrates how using different types of lines (dotted for the 80+ year old patient and dashed for the <80 year old patient) can provide a visual accounting of the differences across genders and across age in terms of the number of patients and months. The legend provides additional clarification as to the age groups associated with the different line types.

 

Figure 3. Figure with four dimensions.

Figure 3.png

Alternatively, we continue to use the color blue for the following conditions but add different line types for the age groups:

E[Y | 80+years & male] (dotted lines)

E[Y | <80 years & male] (dashed lines)

 

Similarly, we continue to use the color red for the following conditions but add different line types for the age groups:

E[Y | 80+years & female] (dotted lines)

E[Y | <80 years & female] (dashed lines)

Using colors and line types allow us to capture multiple dimensions onto a two dimensional figure. We essentially are showing a stratified descriptive analysis of the age groups nested within each gender and their relationships between the number of patients and time (months).

How about adding a fifth dimension? How could one do that?

A simple way to introduce a fifth dimension is to use the concept of small multiples by Edward Tufte.[1] Tufte uses small multiples to include additional dimensions. Figure 4 illustrates how we can leverage small multiples to look at the differences in the relationships between number of patient and time for different genders and age groups across different states.

 

Figure 4. Small multiples of states with differing patterns of number of patients and time for different genders and age groups.

Figure 4.png

Using small multiples allow us to compare the differences in the association between number of patients and months across different states stratified by gender and age groups. The number of patients increased across time for all gender and age groups in California. Similar patterns are observed in Virginia, but the rate of increase in the number of patients across time is lower in the female group and age cohorts. In Ohio, different patterns are observed compared to California and Virginia. Males and their associated age groups have a decreasing number of patients across time. Conversely, females and their age cohorts have a positive correlation between the number of patients and time.

 

CONCLUSIONS

Adding dimensions can improve the figure you design by incorporating complex relationships across different data characteristics. In our example, we demonstrate how we can integrate dive dimensions of data to a two-dimensional figure that tell us information about the association between the outcomes (number of patients) with time (months) across states stratified by gender and age groups. Be creative with how you integrate multiple dimensions into a figure. Ask yourself if this is something that will help improve the story the figure is conveying. There are times when a simple figure will do. But when you have a lot of data and want to tell a story, consider adding dimensions to the figure to get a narrative that will excite and capture your audience’s attention.

 

REFERENCES

  1. Tufte ER. The Visual Display of Quantitative Information. Second. Cheshire, CT: Graphics Press, LLC.; 2001.

Communicating data effectively with data visualization – Part 16 (UpSet diagrams)

INTRODUCTION

Venn diagrams are useful visualizations that illustrate intersections between several groups. A common Venn diagram includes three transparent circles that overlap each other. In Figure 1, a Venn diagram with three groups denoted as A, B, and C has a total of 4 intersections (AB, AC, BC, and ABC).

Figure 1. Venn diagram with three groups and four intersections.

 
Figure 1.png
 

However, this can get more complicated when there are more than three groups or multiple interactions. Figure 2 illustrates a more complicated example.

Figure 2. Complicated Venn diagram.

 
Figure 2.png
 

How many interactions are there? It’s quite difficult to determine based on this figure. An alternative data visualization method to capture the complexity of interactions and overlaps illustrated by Figure 1 is called an UpSet diagram. Please go to the Gehlenborg Lab for more information about UpSet diagrams.

UpSet diagrams visualizes complex intersections as sets of a matrix where the columns represent the number of interactions across different sets of groups (rows).[1,2] Each set can represent a combination of interactions. This is a very useful way to see how many campaigns are delivered in a single academic detailing visit.

Academic detailers deliver educational outreach to providers using a combination of tools that include unbiased evidence-based educational materials, online clinical dashboards, and audit-and-feedback approaches.[3,4] However, a single academic detailing interaction can include several campaigns and key messages. Since each visit may incorporate several campaigns, there needs to be a mechanism to capture the different sets of combinations. From an operations perspective, monitoring the number of interactions is challenging, especially, when you want to capture workload for your staff or facility. Fortunately, UpSet diagrams, developed by researchers at the Department of Biomedical Informatics, Harvard Medical School and the Institute School of Computing, University of Utah, can easily capture these complex combinations of campaigns with each academic detailing visit.

Below is an example of an UpSet diagram that lists the different campaigns in the rows and the number of visits using the various combinations of campaigns (Figure 3). For example, there are 31 visits that include the Pain and OEND campaigns. Further, there are 119 visits that include the OUD and OEND campaigns. Notice how easy it was to visually see the different campaigns that were combined into a single visit. A single visit can be illustrated to show how many campaigns were involved. By using an UpSet diagram, you can visually capture which complex interactions were more frequent or infrequent.

These UpSet diagrams are useful ways to capture the complex nature of academic detailing and the various combinations of campaigns delivered.

Figure 3. Example UpSet diagram with several campaigns and the number of visits.  

MOTIVATING EXAMPLE

Excel is unable to create UpSet diagrams with its current tools. Fortunately, the researchers at Harvard University and the University of Utah have created an UpSet R Shiny app that you can access online. All you need is your own set of data to upload. The R Shiny app will generate the UpSet figure and you can adjust the settings to get the figure you need.

The UpSet R Shiny app is located here.

When creating a data set, make sure that you save it as a *.CSV file with headers.

Use the following example dataset to upload onto the UpSet R Shiny app, which is located here.

After you open the UpSet R Shiny app in a browser, make sure that you are in the Option 1 tab. This is where you will see some instructions to upload the data you want to view the complex interactions for an academic detailing visit.

Browse for the file that contains the data on all the academic detailing visits with their campaigns. The data should be saved as a *.CSV file. We will use the pbmads_2.csv file. The data structure should have the each campaign as a binary variable (1 = YES, 0=NO) similar to how you would create a dummy variable.

Figure 4_1.png

Make sure to keep the columns separated using the “Comma” separator.

Then, click on “Plot” to see how the UpSet diagram appears in the R Shiny app.

After you click on Plot, you will see the app update and you should be able to view the Setting and Plot areas. The Setting area allows you to configure your plot. You can change the order of the plot and the type of ordering rules (e.g., Frequency and Degree). You can also change the size of the fonts using the Advanced section.

The Frequency order provides the largest number of visits in descending order.

The Degree order provides the campaigns with the most complex interactions or combinations in ascending order. 

Based on these figures, you can easily discern the complexity of interactions academic detailing visits include into a single educational outreach. The most complex interaction are the visits that include the Opioid Use Disorder (OUD), Opioid Overdose Education and Naloxone Distribution (OEND), posttraumatic stress disorder (PTSD), Pain, and Other campaign topics (N=3). The individual solo campaign with the most visits is the OUD (alone) campaign with 175 visits.

 

CONCLUSIONS

UpSet diagrams make it easy to categorize the academic detailing visits into different combination categories. We can apply this method to other program monitoring metrics such the differences in these visit combinations across time and the number of attendees. Other additional areas where UpSet diagrams are useful include complex genetic markers. Try and think of ways where you can use this method to simplify complex Venn diagrams or complex interactions across different groups.

Since the UpSet R Shiny app has limited functionality, you can explore other features using R or Python to generate more complex UpSet diagrams. The UpSetR package is available for the R environment. The UpSetPlot package is available for Python.

You can access the GitHub site for UpSet diagrams here.

I encourage you to read the papers on UpSet diagrams by Conway and colleagues and Lex and colleagues: Paper 1 and Paper 2.

YouTube video on UpSet diagrams.

 

REFERENCES

1. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinforma Oxf Engl. 2017;33(18):2938-2940. doi:10.1093/bioinformatics/btx364

2. Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H. UpSet: Visualization of Intersecting Sets. IEEE Trans Vis Comput Graph. 2014;20(12):1983-1992. doi:10.1109/TVCG.2014.2346248

3. Avorn J, Soumerai SB. Improving drug-therapy decisions through educational outreach. A randomized controlled trial of academically based “detailing.” N Engl J Med. 1983;308(24):1457-1463. doi:10.1056/NEJM198306163082406

4.Avorn J. Academic Detailing: “Marketing” the Best Evidence to Clinicians. JAMA. 2017;317(4):361-362. doi:10.1001/jama.2016.16036

 

Communicating data effectively with data visualization – Part 15 (Diverging Stacked Bar Chart for Likert scales)

BACKGROUND

Surveys or questionnaires are used to capture respondent’s perceptions about any number of products, ideas, or subjects. You can ask someone how they are feeling to how much they agree with a particular statement. Or you can ask someone if they are satisfied with a product or service they recently received. Items (or questions) in surveys can solicit these types of responses. Free response questions allow the respondents to write their responses in an unstructured manner as long as it answers the question or purpose of the survey. But most surveys ask questions that require a specific response using multiple choice, ratings scale, or Likert scales.

In this article, we will discuss the Likert-type scale and how we can visualize this using Excel.

 

LIKERT-TYPE SCALE

The Likert scale was first developed by Rensis Likert, who was a psychologist in the early 20th Century. He developed the 5-point Likert scale as part of his PhD dissertation in order to capture peoples’ ratings of international affairs. The Likert scale is unique because it provides a rating that is ordered sequentially (Positively to Negatively or Agreement to Disagreement).

 

Figure 1. Example of a 5-point Likert scale.

Figure 1.png

The 5-point Likert scale is quite common in psychometric research. A statement is usually provided and the participant is asked to rate their level of agreement. Notice how the scale is ordered sequentially from Strongly Disagree in the left of the scale and Strongly Agree to the right of the scale. This is an important feature of Likert-type scales with an added convenience, higher values are associated with higher agreements. You can reverse this as well, but we will keep the order for the remainder of this article.

 

VISUALIZE THE LIKERT-TYPE SCALE

There are many ways to visualize a Likert scale. We can use pie or bar charts to capture the different responses to a Likert-type question or statement.

 

Figure 2. Bar and Pie charts used to visualize Likert scale responses.

Figure 2.png

However, the best way to visualize Likert scales is to build a Diverging Stacked Bar Chart.

Figure 3. Diverging stacked bar chart using a set of hypothetical data for three statements.

Figure 3-1.png

The red dotted line in Figure 3 represents the divergent point where the stacked horizontal bar chart aligns. This is effective when you want to suggest that certain set of ranked responses are more important than the other. In this example, Strongly Agree and Agree are given more precedence than the other ranks. Rightly so, since a majority of the responses were Strongly Agree or Agree.

 

MOTIVATING EXAMPLE

Let’s assume that we administered a survey with three questions. The results are as follows:

Figure 4.png

We have a 5-point Likert scale with responses in all the different ranks. A majority of the responses were either Strongly Agree or Agree, so we’ll create a diverging point with these two ranks.

 

TUTORIAL

The Excel file for the tutorial is located here.

There are two ways to create this diverging stacked bar chart.

Method 1: I learned how to create diverging stacked bar charts from Stephanie Evergreen’s blog Evergreendata.  I got her method step-by-step and then go over an alternative method.

Part 1.1 Estimate the buffers at the end of the stacks.

First, change the values to percentages (Step 1). Then, determine where the divergence will occur (Step 2). For our example, the divergence is between Neutral and Agree.

The next step involves estimating the values at the ends of the stacked bar chart. There are two ends, left and right (Steps 3 and 4). Once the buffers have been estimated, we will plot the stacked bar chart.

Part 1.2. Plot the divergence stacked bar chart.

First, select all the data (Step1). Then Select the 100% Stacked Bar Chart from the Insert tab (Step 2). This should generate a default stacked bar chart (Step 3).

Right-click on the any area in the chart and click on the “Select data…” to change the data arrangement (Step 4). Select the “Switch row/column” to change the Y-X arrangement of the data.

Once you switch the rows and column, the chart will change and look like the one below.

Now, we want to remove the color of the buffers at the ends of the stacked bar (Step 7). Right-click the left end of the stack and select “No Fill” (Step 8). Repeat this for the right end of the stack.

The diverging stacked bar chart should resemble the figure we presented at the beginning of this article.

Figure 11.png

Removing the buffer labels from the legend, deleting the grid lines, changing the font (Adobe Gothic Standard B), and changing the stacked bars colors can improve the figure.

Figure 12.png

The challenge with this chart is the labels on the axis. The statements are too far to the left of the diverging stacked bar chart. To fix this, delete the labels on the left and insert text boxes with the statements.

Figure 13.png

Remove the borders, right alignment with the statements, and add labels. We should be very close to the figure we presented above.

Figure 14.png

Here is the final diverging stacked bar chart.

Figure 3.png

Method 2: I learned this other method from John Peltier at his Peltier Tech Blog.

This method requires you to create a divergent point based on distance. For instance, if you want to make sure that you have the divergent point where the responses are at Strongly Agree and Agree. So, subtract the distance from that point.

Figure 14-5.png

First, change all the values left of the divergent point to negative (Step 1).

Figure 15.png

Then rearrange the order of responses so that the furthest rank is closest to the divergent point (Step 2).

The select the data (Step 3), Insert a 100% Stacked Bar Chart (Step 4), and then Visualize the chart (Step 5).

Right-click anywhere in the chart area and click “Select Date…” (Step 6). Then Switch row and column (Step 7).

You will get the chart on the left below. However, this is not complete. Right-click on the Y-axis (Step 8) and the Format Axis (Step 9).

Change the label position for “Low” (Step 10) and then review the chart. Notice where the divergent point is located at. This is the same as the previous stacked bar chart that was constructed using Method 1.

Now, you can change the colors, delete the gridlines, remove the X-axis to create a plot below. Notice that there are some values that are negative. That’s because of the data we rearranged earlier to generate the distance from the divergent point. To fix this, you will need to manually change the values.

Figure 21.png

After manually change the values, your plot will look similar to the one below.

Figure 22.png

Conclusions

Visualizing the Likert scale using horizontal diverging stacked bar charts is a good method to see how the participants respond to questions or statements on a survey or questionnaire. However, not all Likert-type scales will necessarily need a diverging stacked bar chart to illustrate its point. You can also use a conventional stacked bar chart, which we will discuss in a future article.

The Excel file for this tutorial is located here.

 

REFERENCES

I used the following references to help write this article.

Stephanie Evergreen’s blog Evergreendata is an excellent resource for learning about other data visualization methods.

John Peltier’s blot  Peltier Tech Blog is another wonderful resource where you can learn more about Excel charts and data visualization.

The following paper provides details on how to create diverging stacked bar charts using R.

Heiberger RM, Robbins NB. Design of diverging stacked bar charts for Likert scales and other applications. Journal of Statistical Software. 2014;57(5): 1-32. https://www.jstatsoft.org/article/view/v057i05/v57i05.pdf

Communicating data effectively with data visualization – Part 14 (Gantt Charts)

INTRODUCTION

A useful calendar can be helpful in scheduling your meetings, avoiding conflicts, and remembering important dates. Applying data visualization to a calendar can help to identify key events throughout the day, week, or month. Here is an example of a color-coded calendar for a single person from Sadiq Javer from BoostSolutions.

Figure 1 - exampel calendar.png

Each meeting or event is color coded to indicate a particular category. For a single user, this is sufficient to manage a complex day, week, or month. However, if you are a project manager or lead, managing the calendars of a group or team, this task can be challenging.

A solution is to use a Gantt chart to organize the calendars of several members in your team. Gantt chart is a type of bar chart that provides a longitudinal visualization of schedules and timelines. It was invented by Henry Gantt who is known for his work on scientific management. Gantt charts are useful for project management and can illustrate major deadlines or milestones in the project’s life cycle.

Conveniently, the same tools used for project management can be applied to managing schedules for multiple team members in a group. In this article, we will apply the Gantt chart to managing a team’s schedule using Excel.

MOTIVATING EXAMPLE

We will create a Gantt chart using a hypothetical team’s schedule to visualize their and vacations.

Download the Excel sheet here.

Suppose we have a team who will be taking vacation in the upcoming calendar year (2019). There are several important dates that the Team will need to block for meetings. In order to avoid conflicts, a Gantt chart is used to plan an efficient annual schedule.

Here is a figure of our Gantt chart.

The Gantt chart blocks weekend and holidays so that the manager can easily see the entire 7-day week. Each column represents a day nested in a 7-day week. The months are color coded to identify when it begins and ends. Each staff has a unique color to identify their days off, and the Team meeting is highlighted in red to indicate the critical meeting dates.


TUTORIAL

You can use any version of Excel to build the Gantt chart. After opening a new Excel sheet, follow these steps.

Step 1. Resize the column’s width:

Figure 3.png

Resizing the column’s width to 2.33 seems to give an efficient size cell for the days.

Step 2. Assigning days and months:

In our example, each column represents one day. Therefore, we can assign 7 days into a week. Since the month starts on different days, we make sure to start with the correct day in our calendar. In 2019, January begins on Tuesday, therefore, our Gantt chart will start on Tuesday.

Figure 4.png

Step 3. Highlight the weekends and holidays:

Hopefully, your team doesn’t have to work on the weekends. However, there are exceptions. Clinicians work on the weekends, so your Gantt chart may need an indicator for differential pay (if it is part of the benefits). In our example, we will assume that no one from the team works on the weekends.

The holidays are highlight with a different color from the weekend

Figure 5.png

Step 4. Include the team members and Meetings to the Gantt chart.

Once you block out the holidays and weekends, you can start entering information on meetings and team members’ vacations. Different colors were used for the meetings and individual team members to provide easy visualization.

Figure 6.png

CONCLUSIONS

The final Gantt chart should be able to help you organize your team’s schedule while making sure that there are no conflicts with important team meetings or deadlines. Although Gantt charts were designed for project management, it can also be used to efficiently manage a team’s complex schedule.

You can download the Excel exercise at this link.

REFERENCES

I used the following references to assist me with this tutorial.

Sadiq Javer’s article on Gantt charts published on the Boostsoultions.com website.

Wikipedia’s page on Gantt charts, Henry Gantt, and Scientific Management.

Communicating data effectively with data visualization - Part 13 (Box and Whisker Diagrams)

BACKGROUND

Box plot (box and whisker diagram) is a great way to display distribution of a continuous (e.g., interval) data variable. A typical box plot will contain the mean, median, interquartile values, and the minimum and maximum values. Figure 1 illustrates these elements on a box plot. Up until recently, Microsoft Excel did not have an option to graph box plots. However, in the 2016 version of Microsoft Excel, box plots were added as part of the statistical features.

Figure 1. Example of a box plot (box and whisker diagram (Figure 1).

 
Figure 1.png
 

MOTIVATING EXAMPLE

We will use data that was randomly generated to create box plots across four hypothetical quarters (Q1FY19, Q2FY19, Q3FY19, and Q4FY19). The data will contact the number of visits to the doctor from several outpatient specialty clinic. Here is what the data looks like from the first two sites. Data for the example can be found here.

 
Figure 2.png
 

Site 1 has 45 visits in Q1FY19 and Site 2 has 44 visits in Q1FY19. To create the box plots, we need to use the long format which uses multiple rows for each site.

 

EXERCISE

In this article, we will generate box plots that will visualize the average number of visits and its distribution across quarters.

Figure 3.png

After clicking on the Box and Whisker plot, you will need to select the data that will be used to generate the box plots across the quarters.  

Figure 4.png
Figure 5.png
Figure 6.png

Click “OK” and the default box plot will look like Figure 2.  

Figure 2. Default box plot generated by Excel 2016.

Figure 7.png

After a few changes to the color and labels, our box plot can be improved (Figure 3).

Figure 3. Updated box plots.

Figure 8.png

These box plots give us an idea of the changes in the number of visits across quarters including the distribution of the data. For each box plots, the mean indicated by the “X” is not too different from the median (indicated by the solid horizontal line).  However, there is greater variation in the distribution of the number of visits in Q2FY19 and Q3FY19 compared to Q1FY19 and Q4FY19. We can see that there was an increase in the number of visits, on average, between Q1FY19 and Q3FY19, but this drop significantly in Q4FY19. This may be due to some kind of change (e.g., seasonal variation) and should be explored.

 

Conclusions

The box plot provides us with a nice data visualization of the mean number of visits across quarters including the variation and distribution of the data. Plotting these in Microsoft Excel 2016 will allow you to explore your data and motivate you to explore and generate some explanation or hypothesis for their behavior.

References

I used several online references to write this article.

The Dummies series provide a good illustration of the box plot elements, which can be located here.  

I watched this YouTube video by stickpet on how to use Microsoft Excel 2016 to generate box plots.

Communicating data effectively with data visualization - Part 12 (Waffle Charts)

BACKGROUND

In data visualization circles, the pie chart is considered an inefficient tool to convey parts of a whole. Edward Tufte often criticizes the use of the pie chart to display data visually stating that

A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between charts.”[1]

The major reason why pie charts are disliked by data scientists and other pundits is due to the way our brain works. Mostly, we are good at judging things visually, but with pie charts, it is hard to distinguish the relative proportion of a slice to the whole. For example, can you tell the difference in proportions between the two pie charts below? Which one has more of component B?

Figure 1.png

Since it is challenging to identify the differences between the two pie charts, several alternatives exists to present the data accurately and effectively. In this article, we will discuss one such method using a waffle chart.

Waffle charts are grid-based visuals that have equal size blocks that convey parts of a whole accurately and efficiently. Some have called waffle charts the “square pie charts.” They are usually proportional and arranged in a 10 x 10 grid.

 
Figure 2.png
 

Colors can be used to distinguish the contribution of groups or categories to the whole where each square represents a percentage point totaling to 100.

Figure 3.png

Waffle charts are great at presenting data where you are describing the proportions or parts of a whole and should be used instead of pie charts.

 

MOTIVATING EXAMPLE

We will use the 2016 National Healthcare Expenditure dataset to illustrate the use of waffle charts. We will compare expenditures from the decades between 1965 and 2015.

You can download the data from the National Health Expenditures Accounts (NHEA) website:

https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/NationalHealthAccountsHistorical.html

The data provide the percentages of expenditures for different components spent on health for each decade from 1965 to 2015 (Table 1).

 

Table 1. National Health Expenditures from 1965 to 2015.

Figure 4a.png

In the above table (we will start with 1965), 89% of Health Consumption Expenditures was due to Personal Health Care (83%), Government Administration And Net Cost Of Health Insurance (4%), and Government Public Health Activities (2%). The remainder was spent on Investments (11%). We have to estimate the cumulative percentage spent across the different categories to generate our waffle charts.

This is easily done by summing the individual components (Personal Health Care, Government Administration And Net Cost Of Health Insurance, Government Public Health Activities, and Investment) so that they will total 100 (Table 2).

 

Table 2. Cumulative values of the individual expenditures.

Figure 4b.png

Step 1. Setting up the grid

In Excel, we want to create a 10 by 10 square grid. To do this, change your view from Normal to Page Layout. We are doing this because Excel has a unique way of measuring row and height size (they are not on the same scale under the Normal view). When you change to the Page Layout, the scales for the columns and rows are in inches. Set the size for the columns and rows to 0.5 inch. You should have a 10 by 10 grid with squared that have a height and length of 0.5 inch.

Figure 5.png

Step 2: Label the squares in the grid

Once the grid has been sized correctly, we will assign a value for each square in the grid. These values will correspond to the percentage point that you have in the cumulative values in Table 2. Start at the lower left with a cell value of 1 up to a cell value of 100 in the upper right of the grid.

 
Figure 6.png
 

Step 3: Apply conditional formatting for each cell in the grid

Once the cells have been assigned a value, we can use Excel’s conditional formatting tool to assign colors for each of expenditure components.

Select the cell that has the value “91.” (It doesn’t matter what cell you select, but we will use the upper-most left corner for simplicity; this is also cell D5).

Change the Select option to “Classic.” Then change the New Formatting Rule to “Use a formula to determine which cells to format.”

Figure 7.png

Insert the formula where you have the value is less than or equal to the total cumulative percentage (e.g., 100 percent).

Figure 8.png

The cells for the equations are specific to this example, but you can apply this towards an example of your own. (The Excel file that is based on this article can be downloaded here.)

 

Step 4: Change colors to match the different expenditure categories

Next, we will change the fill and font colors so that they will match. By having the font color the same as the fill color, the values in the cells will appear invisible, but still referenced using the conditional formatting rules we just created.

Figure 9.png

Step 5: Apply the conditional formatting to ALL the cells in the grid

After you change the colors for the font and fill, you will need to make sure that the conditional formatting rule is applied to the entire grid.

Figure 10.png

You should notice that all the cells in the grid are the same color. This is because the conditional formatting is first based on the total cumulative percentage, which is 100. Therefore, all the values in the cells should be the same color.

 
Figure 11.png
 

Step 6: Add a new conditional formatting rule for the next expenditure component

The next step is to apply another conditional formatting rule for the next highest cumulative percentage value, which would be Government Public Health Access, which is 89 percent.

Figure 12.png

We repeat the process for each expenditure component. When we are done, we should have four conditional formatting rules for each expenditure component.

Figure 13.png

All the formulas for the conditional formatting are listed below:

Figure 14.png

Step 7: Repeat this process for the other decades

Once we complete these conditional formatting for the other decades, we can present these waffle charts together.

 

Final waffle chart

The following waffle chart incorporated the health expenditures for each decade starting from 1965 to 2015. The border color was changed to white and the label used the Century Gothic font. Inside each waffle is the percentage of Investment associated with each decade. You can download this exercise’s Excel file here.

Figure 15.png

Based on the waffle charts, we can see that Investments (spending for noncommercial biomedical research and expenditures by health care establishments on structures and equipment) has decreased over time. Conversely, expenditures for Personal Health Care, Administration, and Health Activities have increased.

CONCLUSIONS

Using waffle charts is a better alternative to pie charts because we can discern the exact value of the parts that make up the whole. In this case, we can easily visualize the decrease in Investments when it comes to health expenditure spending in the US for each decade between 1965 to 2015.

 

You can re-create these findings using the Excel file located here.

REFERENCES

I used the following references to assist with the development of this article. They have been incredibly helpful in learning the methods and better understanding how to leverage the power of using waffle charts.

 

[1] Tufte ER. The Visual Display of Quantitative Information. 2001. Graphic Press. Cheshire. CT.

 

Everyday Office’s YouTube video

https://www.youtube.com/watch?v=HZe5SzgxlsQ

Michael Sandberg’s Data Visualization Blog

https://datavizblog.com/2014/09/09/dataviz-squaring-the-pie-chart-waffle-chart/

Robert Kosara’s Eagereyes Blog

https://eagereyes.org/techniques/pie-charts

Sumit Bansal’s Trump Excel: The Smart Way Blog

https://trumpexcel.com/waffle-chart-excel/

Jonathan Schwabish’s PolicyViz Blog provides another method to creating waffle charts using data validation

https://policyviz.com/2018/04/26/interactive-waffle-charts-in-excel/

 

 

 

Communicating data effectively with data visualization - Part 11 (Waterfall charts)

BACKGROUND

Changes in data values happens. But when you want to visualize these changes, you need to choose the visualization that best explains the narrative. In this article, we will explore the use of waterfall charts, which are a form of data visualization that illustrates changes from the reference value across some sequentially ordered axis (e.g., time).

 

MOTIVATING EXAMPLE

I created a dataset where the average duration of an academic detailing educational outreach visit was estimated from Quarter 1 Fiscal Year 2017 (Q1 FY17) to Quarter 4 Fiscal Year 2018 (Q4 FY18). The initial dataset includes the reference (or base) average duration (in minutes) of an academic detailing educational outreach visit followed by positive and negative changes to the reference value. (Excel example can be retrieved from the following link.)

 
Figure 1.png
 

There are a few things to consider when reviewing this table. The reference value is 25 minutes, which was the average duration in Q1 FY17. However, in Q2 FY17, the average duration decreased by 5 minutes for a new value of 20 minutes. Then in Q3 FY17, the average duration increased by 15 minutes for a net gain of 10 minutes from the reference value of 25 minutes (25 + 10).

Figure 2.png

Understanding how these values reflect the net gain from the reference value is critical in building the waterfall chart.

 

TUTORIAL

Step 1: Create three new columns (base, fall, and rise)

We need to do some calculations to generate the base values for the waterfall chart.

Start by creating three additional columns and label them as base, fall, and rise (I learned how to design my table from this YouTube video by United Computers).

Figure 3.png

 Step 2: Estimate the value for the fall column

For the fall column, we want to look at the Duration column. If the value in the Duration column is positive, then we need to have the value in the “fall” column as zero. If the value in the Duration column is negative, when we want the absolute (positive) value in the “fall” column.  

Here is an illustration of what the “fall” column calculation requires.

Figure 4.png

Step 2: Estimate the value for the rise column

To estimate the rise column we, again, look to the Duration column. If the value in the Duration column is positive, we enter it into the “rise” column. If the value in the Duration column is negative, then we enter 0 in the “rise” column.

Here is an illustration of what the “rise” column calculation requires.

Figure 5.png

Step 3: Estimate the value for the base column

The base column provides the necessary reference for us to create the waterfall chart. Unlike the other columns, we start in the base column Q2 FY17 row since the “base” column value in Q1 FY17 is the reference. Then we add the “rise” column in Q1 FY17 and subtract the “fall” column in Q2 FY17. We do this for all the remaining cells.

Here is an illustration of what the “base” column calculation requires.

Figure 6.png

Step 4: Review the final data

The final data should look like the following.

Final data.png

Step 5: Create a stacked bar chart

Select the data from the Time, base, fall, and rise columns and then insert a stacked bar chart.

Figure 7.png

You should have a chart that looks like the following.

Figure 8.png

Step 6: Hide the base stacked bars

The next step is to hide the base column values by selecting the blue bars and changing the formatting to No Fill and No Line.

Figure 9.png

Step 7: Format the chart for final presentation

You should format the chart using colors that indicate the direction of the rise and fall in the average duration of educational outreach visits. I selected blue for an increase and red for a decrease. I also added the Y-axis label and decreased the gap width to 10%.

 

Final waterfall chart

Figure 10.png

CONCLUSIONS

The waterfall chart visualizes the rise and fall in average duration of an academic detailing visit across two fiscal years. This visualization provides the viewer with an intuitive summary of the changes that occurred and whether action plans are needed. From this example, we notice that the changes did not deviate from the base of 25 minutes per visit across two fiscal years. Therefore, there may not be a need to spend resources to investigate this element of academic detailing.

In this example, I re-created the waterfall chart to illustrate seasonal changes.

Figure 11.png

Like the previous waterfall chart, this is a clear and intuitive visualization of the changes that occurred across two fiscal years. The red bars easily illustrate that the average duration decreased immediately after Q1 FY17, but increased after Q1 FY18 only to follow the same pattern as the previous fiscal year. This tells us that something is going on during Q1 of each fiscal year, which may require some kind of investigation of the program at the beginning of each fiscal year.

Microsoft Office 365 for Windows and Microsoft Office 2016 for the Mac have a new feature for creating waterfall charts. You may need to check the Office Updates to acquire these new features.

 
waterfall icon.png
 

You can learn about these in the following Microsoft Office sites (link 1, link 2)

 

REFERENCES

I used the following YouTube video by United Computers to help generate the waterfall charts in this article.

I also referred to Cole N. Knaflic’s blog on waterfall charts.

 

Communicating data effectively with data visualizations - Part 10 (Heat Maps)

BACKGROUND

A heat map is a data visualization tool that uses positioning and coloring to identify clusters and correlations in multivariable analysis. The most common type of heat map is a 2 x 2 matrix, where two variables are examined using the rows and columns (R x C) positions on a matrix. A heat map matrix helps us identify any patterns or similarities across the different dimensions. In a heat map, color is critical in denoting degrees of change. (Please see refer to the past blog on colors.)

For example, we can see the changes in opioid overdose rates across time for each state (Figure 1). As the rates increase, the cells are darker (indicated by the legend). All the states experience an increase in opioid overdose deaths, but State 1 is experiencing it faster than States 2 and 3.

In this tutorial, we will perform two methods to create heat maps. The first method will use the built in Excel conditional formatting rules and the second method will use VBA macros.

Figure 1. Heat map matrix.

Figure 1 - heatmap matrix.png

MOTIVATING EXAMPLE

We will continue to use the state-level drug overdose mortality data from the CDC.

https://www.cdc.gov/drugoverdose/data/statedeaths.html

Mortality rate is presented as the number of deaths per 100,000 population.

In this tutorial, we will develop a heat map using opioid overdose mortality from 2013 to 2016.

The data setup in Excel has State indicators as the rows and time indicators as the columns. We are visualizing the change in opioid overdose mortality from 2013 to 2016 for each state. Figure 2 illustrates the data structure for the first seven states.

 

Figure 2. Data structure for the first seven states.  

Figure 2 - Data structure.png

 

METHOD 1: CONDITIONAL FORMATTING RULES

Excel has a convenient tool that allow us to use conditional formatting to shade our heat map matrix.

Step 1: First highlight all the data.

 

Step 2: In the Excel Ribbon, select Conditional formatting and then New.

Figure 3 - conditional formatting.png

Step 3: Select “Format all cells based on their values” and change the Format Style to “3-Color scale.” Change the color to the different shades you are interested in using. (In this example, we used a blue base with varying degrees of shading.) Select percentile and then click “Ok.” The percentile will use the Median for each column to distinguish the middle category for the rates of opioid overdose mortality for each state.

Figure 4 - 3-color scale.png

Step 4: Visually inspect the results. If there are no apparent pattern in the heat map, we will need to sort the rate of opioid overdose mortality for 2016 in descending order.

Figure 4 - Descending order.png

The heat map should look like the following:

 
Figure 6 - heat map example 1.png
 

There is a pattern emerging in regards to the rate of opioid overdose mortality across the different states. West Virginia has the highest opioid overdose mortality rate in 2016, but they also appear to have the highest from 2013 to 2015. The dark cell in 2016 indicates that West Virginia has exceeded the 50 deaths per 1000 population incidence rate. Other states also have high rates of opioid overdose mortality across 2013 to 2016, which continued to increase. The 2013 column is clearly lighter indicating that the opioid overdose mortality has increased over time up to the available data in 2016.

 

Step 5: We can improve this heat map by removing the numbers, which can be distracting. We need to select all the data and format the cells. Select only the numeric data and then format the number. Use the “Custom” category and enter the following: "";"";"";"". This will change the number values so that it doesn’t show in the cells.

Figure 7 - change the number format.png

Step 6: Change the column width and row height so that you have a nice square-like matrix for the final result. We used a column width of 5 and a row height of 30. (Only the first fifteen states are shown.)

The heat map can be used to quickly identify states with the highest opioid overdose mortality and the trends across time. The states with the highest rates of opioid overdose mortality are clustered at the top while the states with the lowest rates are clustered at the bottom.

We could also arrange this into regions of the US to further stratify the results (not shown).

 

METHOD 2: USING VBA MACROS FOR CONDITIONAL FORMATTING OF MORE THAN 3 CATEGORIES

Excel only allows us to choose up to 3-Color scales. If we wanted to use more than 3 color categories, we will need to use VBA macros.

But before we do, we need to think about the colors for the scales. Since we have more than 3 categories, we will need to figure out how to divide the colors.

 

COLOR CODES

We will need to determine the base color for our heat map. In this example, we will use a blue base-color and change the shading using the RGB color values. RGB colors are based on a system using a combination of three base colors (red, green, and blue) that can be used to change the intensity of the color from a range between 0 and 250. An example of an RGC color table can be found in the following site.

For this example, we used the following RGB color values where dark-navy denotes high rates of opioid overdose mortality (50 or more per 1000 population). However, you can change the values of these colors however you like.

Figure color codes.png

We will continue to use the blue color base and change the gradient using the RGB values.

 

VISUAL BASIC FOR APPLICATION (VBA) MACRO

The VBA macro comes from the site Excel For Beginners and written by Kristoff deCunha. We used the VBA code on the site, but modified it for this tutorial.

The VBA code is written in a way where any changes in the values will automatically update the color of the cell. Additionally, the code also sorts the 2016 opioid overdose mortality rate in descending order, adds thin-white continuous borders around the cells, and changes the font to Times New Roman.

Here are the VBA macros used for Method 2.

 

Macro 1 changes the font to Times New Roman.

‘Macro1
Sub ChangeFont()

Dim rng As Range
Set rng = Range("J1:N52")

With rng.Font
    .Name = "Times New Roman"
    .Size = 12
    .Strikethrough = False
    .Superscript = False
    .Subscript = False
    .OutlineFont = False
    .Shadow = False
    .Underline = xlUnderlineStyleNone
    .ThemeColor = xlThemeColorLight1
    .TintAndShade = 0
    .ThemeFont = xlThemeFontNone
End With

End Sub

 

Macro 2 changes the cell shading to the 6-Color scale (blue-base)

‘Macro2
Sub ChangeCellColor()


Dim rng As Range
Dim oCell As Range

Set rng = Range("K2:N52")

For Each oCell In rng
        Select Case oCell.Value
                 Case 50 To 100
                     oCell.Interior.Color = RGB(37, 54, 97)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 40 To 49.999
                     oCell.Interior.Color = RGB(56, 83, 145)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 30 To 39.999
                     oCell.Interior.Color = RGB(147, 168, 215)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 20 To 29.999
                     oCell.Interior.Color = RGB(183, 198, 228)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 10 To 19.999
                     oCell.Interior.Color = RGB(218, 225, 240)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 0 To 9.999
                     oCell.Interior.Color = RGB(230, 237, 253)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                  Case Else
                     oCell.Interior.ColorIndex = xlNone
         End Select
    Next oCell
    
End Sub

 

Macro 3 sorts the 2016 opioid overdose mortality rates in descending order.

‘Macro3
Sub SortColumn()

Dim DataRange As Range
Dim keyRange As Range
Set DataRange = Range("J1:N52")
Set keyRange = Range("N1")
DataRange.Sort Key1:=keyRange, Order1:=xlDescending

End Sub

 

Macro 4 hides the font from the heat map.

‘Macro4
Sub HideFont()
    
Dim rng As Range
Dim oCell As Range

Set rng = Range("K2:N52")

    For Each oCell In rng
        oCell.Font.Color = oCell.Interior.Color
    Next oCell
    
End Sub

 

Macro 5 creates thick white borders for each cell in the table.

‘Macro5
Sub WhiteOutlineCells()

    Dim rng As Range

    Set rng = Range("J1:N52")

    With rng.Borders
        .LineStyle = xlContinuous
        .Color = vbWhite
        .Weight = xlThick
    End With
    
End Sub

INSERTING DATA AND RUNNING MACRO

The five macros are assigned to a button in the Excel Macro-Enabled Workbook. Pressing the button will perform the task of creating a 6-Color scale heat map. Download the Excel Macro-Enabled Workbook here. This file will have the raw data and the macro-enabled worksheet for you to create a heat map.

Step 1: Copy the raw data from the “combined” worksheet.

Figure 9 - Copy Data.png

Step 2: Paste it in the worksheet “heatmap_2” starting on cell "J1".

Figure 10 - Copy onto J1.png

Step 3: Then press the “PressStart” button to run the macros. Your final heat map should look like the following:

Compare the heat map from Method 1 (3-Color scale) to the one generated by Method 2 (6-Color scale). The heat map with the 6-Color scale has a lighter pattern compared to the 3-Color scale heat map. The differences are dramatic. Depending on the granularity of the heat map you want, either one of these color scales would be fine. However, Method 2 requires some VBA coding.

* Not all states shown.

 

Conclusions

Heat maps allow us to observe patterns in the data. In our example, we notice that West Virginia has a high rate of opioid overdose mortality indicated by the clustering of dark cells from 2013 to 2016. Other states had similar patterns as West Virginia. Using heat maps provides a quick and easy interpretation of the changes in opioid overdose mortality across time and the states that are clustered together that have high rates of opioid overdose mortality.

 

REFERENCES

I used the following websites to help develop this tutorial.

Conditional formatting with more than 3 categories:
https://sites.google.com/site/excelforbeginners/Home/vba-codes/conditional-formatting-more-than-3-criterias
 
Changing the RGB color codes:
https://analysistabs.com/excel-vba/change-background-color-of-cell-range/
https://www.w3schools.com/colors/colors_rgb.asp

Excel color palette library:
http://dmcritchie.mvps.org/excel/colors.htm

Excellent site for VBA coding:
https://www.excel-easy.com/vba.html

 

Communicating data effectively with data visualizations - Part 9 (Cleveland Plots)

BACKGROUND

Visualizing change across two time points allows your audience to see the impact of a program’s impact. In a previous article, I demonstrated how you can use slope graphs to illustrate changes across two time points. However, there is a risk of the slope graphs becoming a tangled mess (or spaghetti plot) if too many comparisons are being made. An easier way to illustrated changes cross two time points for a large number of groups is with a Cleveland dot plot (or lollipop plot).

By using a horizontal line between two points arranged from most change to least change, your audience can quickly visualize the program’s impact and rank them according to the subject or group.

 

MOTIVATING EXAMPLE

We will continue to use the state-level drug overdose mortality data from the CDC.

https://www.cdc.gov/drugoverdose/data/statedeaths.html

Mortality rate is presented as the number of deaths per 100,000 population.

In a previous tutorial, we only looked at eight different states. In this tutorial, we will illustrate a change in all 50 states onto a single plot using a Cleveland plot.

Here is the data setup in Excel:

Figure 1 - example data.png

The difference is calculated as the difference in mortality rates between 2016 and 2010.

In Excel, select the first two rows (State and 2016 rate) and generate a horizontal bar chart. Make sure to sort the order of the 2016 rate from least to greatest.

Figure 1 a - select horizontal bar.png

The horizontal bar chart should look like the following:

Figure 2 - bar chart of us states.png

After you created the horizontal bar plot, you will need to add error bars. Select the Design tab > Add Chart Elements > Error Bars > More Error Bars Options:

figure 3 - adding error bards.png

In the Format Error Bars options, make sure to check Minus under Directions, No Cap under End Style, and Custom under Error Amount:

figure 4 - error bars format.png

Click Specify Value and then select the differences column for the Negative Error Value:

figure 5 - select the error bar values.png

On the bar chart, select the bars and then under the Format

figure 6 - no fill selection.png

The horizontal bar chart should have the fill remove and the error bars present and should look like the following:

figure 7 - error bars with differences and no fill.png

Select the error bar and go to the Format Error Bars option. Under the Joint type select Round and under the Begin Arrow Size select the Oval Arrow. This will add a circle on the one end of the error bar.

figure 8 - error join type round.png

You can also increase the size of the Oval Arrow by selecting a larger size from the Begin Arrow Size drop down.

 

After selecting the Oval Arrow for the Join Type and the Begin Arrow Type, your chart should look like the following:

figure 9 - error bars with rounded caps.png

The next steps will require you to add a second data series. Right-click on the chart area and select the 2010 mortality rates.

figure 10 - adding a series.png

After selecting the data, make sure to map the Horizontal Axis Labels to the corresponding states.

figure 10a - adding a series p2.png

Your chart should now include the second data series (2010 mortality rates) as horizontal bars.

figure 11 - data series2.png

In the next steps, you will add error bars to the 2010 mortality horizontal bars (similar to the error bars for the 2010 mortality rates data). Instead of adding error bars for the Minus, you will add error bars for the Plus.

figure 12 - horizontal bars series 2.png

Similar to the 2016 mortality rates data, you will remove the horizontal bars by selecting No fill and then setting the Series Overlap to 100%

figure 13 - series overlap.png

Select the error bars and then change the Begin Arrow Type to Oval and increase the size.

figure 14 - oval type series2.png

Your chart should now include two dots with lines between them for each state.

figure 15 - second dots are visitble.png

After a few more adjustments to the dot size and colors, the final Cleveland plot can look like the following:

Figure 1. Cleveland plot comparing the drug overdose mortality rates between 2010 and 2016.

figure 17 - final figure.png

The dots give us a good illustration of the magnitude of change in the drug overdose mortality rates between 2010 and 2016. Additionally, using the colored fonts help to map the drug overdose mortality values with the dots. For example, West Virginia had the highest drug overdose mortality rate in 2016 (blue dot) and a large increase in drug overdose mortality rates between 2010 (red dot) to 2016. Nebraska has the lowest drug overdose mortality rate in 2016 (blue dot) among the states, which was lower than in 2010 (red dot).

 

CONCLUSIONS

Cleveland plots can illustrate the change in drug overdose mortality rates between two time points for multiple groups without cluttering the chart space. Unlike slope graphs which can be difficult to distinguish between states, the Cleveland plot separates each state into their own rows allowing for a simple estimation of change across two time points. Moreover, it is also easier to see the magnitude in change in between 2010 and 2016. We recommend using Cleveland plots when you have a lot of groups (e.g., states) with an outcome that changes across two time points (2010 and 2016).

 

REFERENCES

I used the following websites to help develop this tutorial.

http://stephanieevergreen.com/lollipop/

https://policyviz.com/2016/02/04/lollipop_graph_in_excel/

https://zebrabi.com/lollipop-charts-excel/

https://peltiertech.com/dot-plots-microsoft-excel/

 

The following video provides step-by-step instructions in making a horizontal lollipop chart:

https://www.youtube.com/watch?v=tHa8eEb-LTg

 

The following website provides examples of lollipop charts:

https://www.r-graph-gallery.com/lollipop-plot/

 

 

Communicating data effectively with data visualizations - Part 8 (Slope graphs)

BACKGROUND

Suppose you wanted to show a change across two periods in time for several groups. How would you do that? One of the best data visualizations for this is the slope graph. In this tutorial, we’ll show you how to create a slope graph in Excel and highlight a state using contrasting colors.

 

MOTIVATING EXAMPLE

We will use the state-level drug overdose mortality data from the CDC.

https://www.cdc.gov/drugoverdose/data/statedeaths.html

Here is a sample of set of data from the CDC:

figure 1.png

Each value is a rate for the number of deaths per 100,000 population.

 

In Excel, select the three columns (State, 2010 rate, and 2016 rate).

figure 2.png

Go to Insert Chart and select Line with Markers.

figure 3.png

The chart should look like the following:

figure 4.png

Right-click anywhere on the chart and click Select Date and then click on the Switch Row/Column.

figure 5.png

The Y and X axes should switch places and the chart should look like the following:

figure 6.png

Delete the Chart title, legend, and Y-axis.

 

Right-click on the X-axis and select Format Axis. Then select “On tick marks.”

figure 7.png

This will move the lines to the edges of the chart.

figure 8.png

Now, we want to add labels to the ends of each line. Right click on one of the lines and select Add Data Labels. Then select the left data label and format it according to the following guide:

figure 9.png

Do the same thing for the right label, but make sure to use the right label position. Bring both ends of the chart closer to the middle to give your data labels room to expand (otherwise, the state will stack on top of the value, which you do not want). The chart should look like the following:

figure 10.png

Finally, you can format the figure using different colors and removing the excess gridlines. Try to experiment with different shades of color to generate the desired effect. In our example, we wanted to highlight West Virginia from the rest of the other states.

 

Here is what the slope graph looks like after some tweaking and adjustments:

figure 11.png

In the above slope graph, West Virginia was highlighted using a cranberry color to give it a greater contrast to the lighter grey of the other states. You could use this method to highlight other states of interest such as New Hampshire or the District of Columbia, which had greater percentage increases between 2010 and 2016.

 

In the final slope graph, we highlight West Virginia and New Hampshire in different colors to distinguish them from the other states.

figure 12.png

CONCLUSIONS

Using slope graphs can show simple trends across two periods in time. More complex slope graphs can have multiple time points and more states. In our simple example, we highlight the key steps to generate a slope graph in Excel. Using colors to highlight certain states can make a simple slope graph more visually appealing and assist the eyes in identifying the states that are important according to the data.

 

REFERENCES

Knaflic, CN. Storytelling with Data. 2015. John Wiley & Sons, Inc., Hoboken, New Jersey.