Data visualization

Communicating data effectively with data visualization – Part 15 (Diverging Stacked Bar Chart for Likert scales)


Surveys or questionnaires are used to capture respondent’s perceptions about any number of products, ideas, or subjects. You can ask someone how they are feeling to how much they agree with a particular statement. Or you can ask someone if they are satisfied with a product or service they recently received. Items (or questions) in surveys can solicit these types of responses. Free response questions allow the respondents to write their responses in an unstructured manner as long as it answers the question or purpose of the survey. But most surveys ask questions that require a specific response using multiple choice, ratings scale, or Likert scales.

In this article, we will discuss the Likert-type scale and how we can visualize this using Excel.



The Likert scale was first developed by Rensis Likert, who was a psychologist in the early 20th Century. He developed the 5-point Likert scale as part of his PhD dissertation in order to capture peoples’ ratings of international affairs. The Likert scale is unique because it provides a rating that is ordered sequentially (Positively to Negatively or Agreement to Disagreement).


Figure 1. Example of a 5-point Likert scale.

Figure 1.png

The 5-point Likert scale is quite common in psychometric research. A statement is usually provided and the participant is asked to rate their level of agreement. Notice how the scale is ordered sequentially from Strongly Disagree in the left of the scale and Strongly Agree to the right of the scale. This is an important feature of Likert-type scales with an added convenience, higher values are associated with higher agreements. You can reverse this as well, but we will keep the order for the remainder of this article.



There are many ways to visualize a Likert scale. We can use pie or bar charts to capture the different responses to a Likert-type question or statement.


Figure 2. Bar and Pie charts used to visualize Likert scale responses.

Figure 2.png

However, the best way to visualize Likert scales is to build a Diverging Stacked Bar Chart.

Figure 3. Diverging stacked bar chart using a set of hypothetical data for three statements.

Figure 3-1.png

The red dotted line in Figure 3 represents the divergent point where the stacked horizontal bar chart aligns. This is effective when you want to suggest that certain set of ranked responses are more important than the other. In this example, Strongly Agree and Agree are given more precedence than the other ranks. Rightly so, since a majority of the responses were Strongly Agree or Agree.



Let’s assume that we administered a survey with three questions. The results are as follows:

Figure 4.png

We have a 5-point Likert scale with responses in all the different ranks. A majority of the responses were either Strongly Agree or Agree, so we’ll create a diverging point with these two ranks.



The Excel file for the tutorial is located here.

There are two ways to create this diverging stacked bar chart.

Method 1: I learned how to create diverging stacked bar charts from Stephanie Evergreen’s blog Evergreendata.  I got her method step-by-step and then go over an alternative method.

Part 1.1 Estimate the buffers at the end of the stacks.

First, change the values to percentages (Step 1). Then, determine where the divergence will occur (Step 2). For our example, the divergence is between Neutral and Agree.

The next step involves estimating the values at the ends of the stacked bar chart. There are two ends, left and right (Steps 3 and 4). Once the buffers have been estimated, we will plot the stacked bar chart.

Part 1.2. Plot the divergence stacked bar chart.

First, select all the data (Step1). Then Select the 100% Stacked Bar Chart from the Insert tab (Step 2). This should generate a default stacked bar chart (Step 3).

Right-click on the any area in the chart and click on the “Select data…” to change the data arrangement (Step 4). Select the “Switch row/column” to change the Y-X arrangement of the data.

Once you switch the rows and column, the chart will change and look like the one below.

Now, we want to remove the color of the buffers at the ends of the stacked bar (Step 7). Right-click the left end of the stack and select “No Fill” (Step 8). Repeat this for the right end of the stack.

The diverging stacked bar chart should resemble the figure we presented at the beginning of this article.

Figure 11.png

Removing the buffer labels from the legend, deleting the grid lines, changing the font (Adobe Gothic Standard B), and changing the stacked bars colors can improve the figure.

Figure 12.png

The challenge with this chart is the labels on the axis. The statements are too far to the left of the diverging stacked bar chart. To fix this, delete the labels on the left and insert text boxes with the statements.

Figure 13.png

Remove the borders, right alignment with the statements, and add labels. We should be very close to the figure we presented above.

Figure 14.png

Here is the final diverging stacked bar chart.

Figure 3.png

Method 2: I learned this other method from John Peltier at his Peltier Tech Blog.

This method requires you to create a divergent point based on distance. For instance, if you want to make sure that you have the divergent point where the responses are at Strongly Agree and Agree. So, subtract the distance from that point.

Figure 14-5.png

First, change all the values left of the divergent point to negative (Step 1).

Figure 15.png

Then rearrange the order of responses so that the furthest rank is closest to the divergent point (Step 2).

The select the data (Step 3), Insert a 100% Stacked Bar Chart (Step 4), and then Visualize the chart (Step 5).

Right-click anywhere in the chart area and click “Select Date…” (Step 6). Then Switch row and column (Step 7).

You will get the chart on the left below. However, this is not complete. Right-click on the Y-axis (Step 8) and the Format Axis (Step 9).

Change the label position for “Low” (Step 10) and then review the chart. Notice where the divergent point is located at. This is the same as the previous stacked bar chart that was constructed using Method 1.

Now, you can change the colors, delete the gridlines, remove the X-axis to create a plot below. Notice that there are some values that are negative. That’s because of the data we rearranged earlier to generate the distance from the divergent point. To fix this, you will need to manually change the values.

Figure 21.png

After manually change the values, your plot will look similar to the one below.

Figure 22.png


Visualizing the Likert scale using horizontal diverging stacked bar charts is a good method to see how the participants respond to questions or statements on a survey or questionnaire. However, not all Likert-type scales will necessarily need a diverging stacked bar chart to illustrate its point. You can also use a conventional stacked bar chart, which we will discuss in a future article.

The Excel file for this tutorial is located here.



I used the following references to help write this article.

Stephanie Evergreen’s blog Evergreendata is an excellent resource for learning about other data visualization methods.

John Peltier’s blot  Peltier Tech Blog is another wonderful resource where you can learn more about Excel charts and data visualization.

The following paper provides details on how to create diverging stacked bar charts using R.

Heiberger RM, Robbins NB. Design of diverging stacked bar charts for Likert scales and other applications. Journal of Statistical Software. 2014;57(5): 1-32.

Communicating data effectively with data visualization – Part 14 (Gantt Charts)


A useful calendar can be helpful in scheduling your meetings, avoiding conflicts, and remembering important dates. Applying data visualization to a calendar can help to identify key events throughout the day, week, or month. Here is an example of a color-coded calendar for a single person from Sadiq Javer from BoostSolutions.

Figure 1 - exampel calendar.png

Each meeting or event is color coded to indicate a particular category. For a single user, this is sufficient to manage a complex day, week, or month. However, if you are a project manager or lead, managing the calendars of a group or team, this task can be challenging.

A solution is to use a Gantt chart to organize the calendars of several members in your team. Gantt chart is a type of bar chart that provides a longitudinal visualization of schedules and timelines. It was invented by Henry Gantt who is known for his work on scientific management. Gantt charts are useful for project management and can illustrate major deadlines or milestones in the project’s life cycle.

Conveniently, the same tools used for project management can be applied to managing schedules for multiple team members in a group. In this article, we will apply the Gantt chart to managing a team’s schedule using Excel.


We will create a Gantt chart using a hypothetical team’s schedule to visualize their and vacations.

Download the Excel sheet here.

Suppose we have a team who will be taking vacation in the upcoming calendar year (2019). There are several important dates that the Team will need to block for meetings. In order to avoid conflicts, a Gantt chart is used to plan an efficient annual schedule.

Here is a figure of our Gantt chart.

The Gantt chart blocks weekend and holidays so that the manager can easily see the entire 7-day week. Each column represents a day nested in a 7-day week. The months are color coded to identify when it begins and ends. Each staff has a unique color to identify their days off, and the Team meeting is highlighted in red to indicate the critical meeting dates.


You can use any version of Excel to build the Gantt chart. After opening a new Excel sheet, follow these steps.

Step 1. Resize the column’s width:

Figure 3.png

Resizing the column’s width to 2.33 seems to give an efficient size cell for the days.

Step 2. Assigning days and months:

In our example, each column represents one day. Therefore, we can assign 7 days into a week. Since the month starts on different days, we make sure to start with the correct day in our calendar. In 2019, January begins on Tuesday, therefore, our Gantt chart will start on Tuesday.

Figure 4.png

Step 3. Highlight the weekends and holidays:

Hopefully, your team doesn’t have to work on the weekends. However, there are exceptions. Clinicians work on the weekends, so your Gantt chart may need an indicator for differential pay (if it is part of the benefits). In our example, we will assume that no one from the team works on the weekends.

The holidays are highlight with a different color from the weekend

Figure 5.png

Step 4. Include the team members and Meetings to the Gantt chart.

Once you block out the holidays and weekends, you can start entering information on meetings and team members’ vacations. Different colors were used for the meetings and individual team members to provide easy visualization.

Figure 6.png


The final Gantt chart should be able to help you organize your team’s schedule while making sure that there are no conflicts with important team meetings or deadlines. Although Gantt charts were designed for project management, it can also be used to efficiently manage a team’s complex schedule.

You can download the Excel exercise at this link.


I used the following references to assist me with this tutorial.

Sadiq Javer’s article on Gantt charts published on the website.

Wikipedia’s page on Gantt charts, Henry Gantt, and Scientific Management.

Communicating data effectively with data visualization - Part 13 (Box and Whisker Diagrams)


Box plot (box and whisker diagram) is a great way to display distribution of a continuous (e.g., interval) data variable. A typical box plot will contain the mean, median, interquartile values, and the minimum and maximum values. Figure 1 illustrates these elements on a box plot. Up until recently, Microsoft Excel did not have an option to graph box plots. However, in the 2016 version of Microsoft Excel, box plots were added as part of the statistical features.

Figure 1. Example of a box plot (box and whisker diagram (Figure 1).

Figure 1.png


We will use data that was randomly generated to create box plots across four hypothetical quarters (Q1FY19, Q2FY19, Q3FY19, and Q4FY19). The data will contact the number of visits to the doctor from several outpatient specialty clinic. Here is what the data looks like from the first two sites. Data for the example can be found here.

Figure 2.png

Site 1 has 45 visits in Q1FY19 and Site 2 has 44 visits in Q1FY19. To create the box plots, we need to use the long format which uses multiple rows for each site.



In this article, we will generate box plots that will visualize the average number of visits and its distribution across quarters.

Figure 3.png

After clicking on the Box and Whisker plot, you will need to select the data that will be used to generate the box plots across the quarters.  

Figure 4.png
Figure 5.png
Figure 6.png

Click “OK” and the default box plot will look like Figure 2.  

Figure 2. Default box plot generated by Excel 2016.

Figure 7.png

After a few changes to the color and labels, our box plot can be improved (Figure 3).

Figure 3. Updated box plots.

Figure 8.png

These box plots give us an idea of the changes in the number of visits across quarters including the distribution of the data. For each box plots, the mean indicated by the “X” is not too different from the median (indicated by the solid horizontal line).  However, there is greater variation in the distribution of the number of visits in Q2FY19 and Q3FY19 compared to Q1FY19 and Q4FY19. We can see that there was an increase in the number of visits, on average, between Q1FY19 and Q3FY19, but this drop significantly in Q4FY19. This may be due to some kind of change (e.g., seasonal variation) and should be explored.



The box plot provides us with a nice data visualization of the mean number of visits across quarters including the variation and distribution of the data. Plotting these in Microsoft Excel 2016 will allow you to explore your data and motivate you to explore and generate some explanation or hypothesis for their behavior.


I used several online references to write this article.

The Dummies series provide a good illustration of the box plot elements, which can be located here.  

I watched this YouTube video by stickpet on how to use Microsoft Excel 2016 to generate box plots.

Communicating data effectively with data visualization - Part 12 (Waffle Charts)


In data visualization circles, the pie chart is considered an inefficient tool to convey parts of a whole. Edward Tufte often criticizes the use of the pie chart to display data visually stating that

A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between charts.”[1]

The major reason why pie charts are disliked by data scientists and other pundits is due to the way our brain works. Mostly, we are good at judging things visually, but with pie charts, it is hard to distinguish the relative proportion of a slice to the whole. For example, can you tell the difference in proportions between the two pie charts below? Which one has more of component B?

Figure 1.png

Since it is challenging to identify the differences between the two pie charts, several alternatives exists to present the data accurately and effectively. In this article, we will discuss one such method using a waffle chart.

Waffle charts are grid-based visuals that have equal size blocks that convey parts of a whole accurately and efficiently. Some have called waffle charts the “square pie charts.” They are usually proportional and arranged in a 10 x 10 grid.

Figure 2.png

Colors can be used to distinguish the contribution of groups or categories to the whole where each square represents a percentage point totaling to 100.

Figure 3.png

Waffle charts are great at presenting data where you are describing the proportions or parts of a whole and should be used instead of pie charts.



We will use the 2016 National Healthcare Expenditure dataset to illustrate the use of waffle charts. We will compare expenditures from the decades between 1965 and 2015.

You can download the data from the National Health Expenditures Accounts (NHEA) website:

The data provide the percentages of expenditures for different components spent on health for each decade from 1965 to 2015 (Table 1).


Table 1. National Health Expenditures from 1965 to 2015.

Figure 4a.png

In the above table (we will start with 1965), 89% of Health Consumption Expenditures was due to Personal Health Care (83%), Government Administration And Net Cost Of Health Insurance (4%), and Government Public Health Activities (2%). The remainder was spent on Investments (11%). We have to estimate the cumulative percentage spent across the different categories to generate our waffle charts.

This is easily done by summing the individual components (Personal Health Care, Government Administration And Net Cost Of Health Insurance, Government Public Health Activities, and Investment) so that they will total 100 (Table 2).


Table 2. Cumulative values of the individual expenditures.

Figure 4b.png

Step 1. Setting up the grid

In Excel, we want to create a 10 by 10 square grid. To do this, change your view from Normal to Page Layout. We are doing this because Excel has a unique way of measuring row and height size (they are not on the same scale under the Normal view). When you change to the Page Layout, the scales for the columns and rows are in inches. Set the size for the columns and rows to 0.5 inch. You should have a 10 by 10 grid with squared that have a height and length of 0.5 inch.

Figure 5.png

Step 2: Label the squares in the grid

Once the grid has been sized correctly, we will assign a value for each square in the grid. These values will correspond to the percentage point that you have in the cumulative values in Table 2. Start at the lower left with a cell value of 1 up to a cell value of 100 in the upper right of the grid.

Figure 6.png

Step 3: Apply conditional formatting for each cell in the grid

Once the cells have been assigned a value, we can use Excel’s conditional formatting tool to assign colors for each of expenditure components.

Select the cell that has the value “91.” (It doesn’t matter what cell you select, but we will use the upper-most left corner for simplicity; this is also cell D5).

Change the Select option to “Classic.” Then change the New Formatting Rule to “Use a formula to determine which cells to format.”

Figure 7.png

Insert the formula where you have the value is less than or equal to the total cumulative percentage (e.g., 100 percent).

Figure 8.png

The cells for the equations are specific to this example, but you can apply this towards an example of your own. (The Excel file that is based on this article can be downloaded here.)


Step 4: Change colors to match the different expenditure categories

Next, we will change the fill and font colors so that they will match. By having the font color the same as the fill color, the values in the cells will appear invisible, but still referenced using the conditional formatting rules we just created.

Figure 9.png

Step 5: Apply the conditional formatting to ALL the cells in the grid

After you change the colors for the font and fill, you will need to make sure that the conditional formatting rule is applied to the entire grid.

Figure 10.png

You should notice that all the cells in the grid are the same color. This is because the conditional formatting is first based on the total cumulative percentage, which is 100. Therefore, all the values in the cells should be the same color.

Figure 11.png

Step 6: Add a new conditional formatting rule for the next expenditure component

The next step is to apply another conditional formatting rule for the next highest cumulative percentage value, which would be Government Public Health Access, which is 89 percent.

Figure 12.png

We repeat the process for each expenditure component. When we are done, we should have four conditional formatting rules for each expenditure component.

Figure 13.png

All the formulas for the conditional formatting are listed below:

Figure 14.png

Step 7: Repeat this process for the other decades

Once we complete these conditional formatting for the other decades, we can present these waffle charts together.


Final waffle chart

The following waffle chart incorporated the health expenditures for each decade starting from 1965 to 2015. The border color was changed to white and the label used the Century Gothic font. Inside each waffle is the percentage of Investment associated with each decade. You can download this exercise’s Excel file here.

Figure 15.png

Based on the waffle charts, we can see that Investments (spending for noncommercial biomedical research and expenditures by health care establishments on structures and equipment) has decreased over time. Conversely, expenditures for Personal Health Care, Administration, and Health Activities have increased.


Using waffle charts is a better alternative to pie charts because we can discern the exact value of the parts that make up the whole. In this case, we can easily visualize the decrease in Investments when it comes to health expenditure spending in the US for each decade between 1965 to 2015.


You can re-create these findings using the Excel file located here.


I used the following references to assist with the development of this article. They have been incredibly helpful in learning the methods and better understanding how to leverage the power of using waffle charts.


[1] Tufte ER. The Visual Display of Quantitative Information. 2001. Graphic Press. Cheshire. CT.


Everyday Office’s YouTube video

Michael Sandberg’s Data Visualization Blog

Robert Kosara’s Eagereyes Blog

Sumit Bansal’s Trump Excel: The Smart Way Blog

Jonathan Schwabish’s PolicyViz Blog provides another method to creating waffle charts using data validation




Communicating data effectively with data visualization - Part 11 (Waterfall charts)


Changes in data values happens. But when you want to visualize these changes, you need to choose the visualization that best explains the narrative. In this article, we will explore the use of waterfall charts, which are a form of data visualization that illustrates changes from the reference value across some sequentially ordered axis (e.g., time).



I created a dataset where the average duration of an academic detailing educational outreach visit was estimated from Quarter 1 Fiscal Year 2017 (Q1 FY17) to Quarter 4 Fiscal Year 2018 (Q4 FY18). The initial dataset includes the reference (or base) average duration (in minutes) of an academic detailing educational outreach visit followed by positive and negative changes to the reference value. (Excel example can be retrieved from the following link.)

Figure 1.png

There are a few things to consider when reviewing this table. The reference value is 25 minutes, which was the average duration in Q1 FY17. However, in Q2 FY17, the average duration decreased by 5 minutes for a new value of 20 minutes. Then in Q3 FY17, the average duration increased by 15 minutes for a net gain of 10 minutes from the reference value of 25 minutes (25 + 10).

Figure 2.png

Understanding how these values reflect the net gain from the reference value is critical in building the waterfall chart.



Step 1: Create three new columns (base, fall, and rise)

We need to do some calculations to generate the base values for the waterfall chart.

Start by creating three additional columns and label them as base, fall, and rise (I learned how to design my table from this YouTube video by United Computers).

Figure 3.png

 Step 2: Estimate the value for the fall column

For the fall column, we want to look at the Duration column. If the value in the Duration column is positive, then we need to have the value in the “fall” column as zero. If the value in the Duration column is negative, when we want the absolute (positive) value in the “fall” column.  

Here is an illustration of what the “fall” column calculation requires.

Figure 4.png

Step 2: Estimate the value for the rise column

To estimate the rise column we, again, look to the Duration column. If the value in the Duration column is positive, we enter it into the “rise” column. If the value in the Duration column is negative, then we enter 0 in the “rise” column.

Here is an illustration of what the “rise” column calculation requires.

Figure 5.png

Step 3: Estimate the value for the base column

The base column provides the necessary reference for us to create the waterfall chart. Unlike the other columns, we start in the base column Q2 FY17 row since the “base” column value in Q1 FY17 is the reference. Then we add the “rise” column in Q1 FY17 and subtract the “fall” column in Q2 FY17. We do this for all the remaining cells.

Here is an illustration of what the “base” column calculation requires.

Figure 6.png

Step 4: Review the final data

The final data should look like the following.

Final data.png

Step 5: Create a stacked bar chart

Select the data from the Time, base, fall, and rise columns and then insert a stacked bar chart.

Figure 7.png

You should have a chart that looks like the following.

Figure 8.png

Step 6: Hide the base stacked bars

The next step is to hide the base column values by selecting the blue bars and changing the formatting to No Fill and No Line.

Figure 9.png

Step 7: Format the chart for final presentation

You should format the chart using colors that indicate the direction of the rise and fall in the average duration of educational outreach visits. I selected blue for an increase and red for a decrease. I also added the Y-axis label and decreased the gap width to 10%.


Final waterfall chart

Figure 10.png


The waterfall chart visualizes the rise and fall in average duration of an academic detailing visit across two fiscal years. This visualization provides the viewer with an intuitive summary of the changes that occurred and whether action plans are needed. From this example, we notice that the changes did not deviate from the base of 25 minutes per visit across two fiscal years. Therefore, there may not be a need to spend resources to investigate this element of academic detailing.

In this example, I re-created the waterfall chart to illustrate seasonal changes.

Figure 11.png

Like the previous waterfall chart, this is a clear and intuitive visualization of the changes that occurred across two fiscal years. The red bars easily illustrate that the average duration decreased immediately after Q1 FY17, but increased after Q1 FY18 only to follow the same pattern as the previous fiscal year. This tells us that something is going on during Q1 of each fiscal year, which may require some kind of investigation of the program at the beginning of each fiscal year.

Microsoft Office 365 for Windows and Microsoft Office 2016 for the Mac have a new feature for creating waterfall charts. You may need to check the Office Updates to acquire these new features.

waterfall icon.png

You can learn about these in the following Microsoft Office sites (link 1, link 2)



I used the following YouTube video by United Computers to help generate the waterfall charts in this article.

I also referred to Cole N. Knaflic’s blog on waterfall charts.


Developing choropleths using the United States Veterans Integrated System Network (VISN) shapefiles


When I want to present VISN-level data, I consider using choropleths because they are visually appealing and provide a good reference to other VISNs. Choropleths are maps that uses polygons or shapes that are shaded according to a metric such that the color indicates the intensity of that metric. For instance, if you wanted to compare population density across different states, you can use a choropleth to illustrate this difference.

An example of a choropleth comes from the Centers for Disease Control and Prevention that illustrates the prevalence of obesity by state. The legend tells us the prevalence of obesity at each state and the colors denote the level of the prevalence. The cranberry color denotes an obesity prevalence of 35% or greater whereas a lighter green color with dots denotes a prevalence of less than 20%. This choropleth providers a quick visual guide on the prevalence of obesity across the United States (U.S.).

Figure 1.png

Motivating example

In past reports, I have generated a choropleth using VISN-level data. Unlike the U.S. shapefile (map files with coordinates; normally with the *.shp extension), the VISN shapefile is specific to the VA and doesn’t not follow the borders of the states used in typical U.S. shapefiles.

In this example, I will provide a step by step guide on how you can generate your own VISN-level choropleth for use in reports and presentations. The files for this tutorial are available on following Dropbox link.



Step 1: Download QGIS

Download QGIS, which is a free Geographic Information Systems (GIS) software for both the Windows and Mac operating systems. Watch the following video for a step-by-step guide on downloading and using the program. The program is simple to use and does not involve any coding. After you install the software, proceed to the next step. (Contact your local IT support to have this installed on a government-owned system.)


Step 2: Download the VISN shapefiles

The VA provides shapefiles online at the following link. Download the file titled This file will contain all the necessary files that you will need to build your choropleth.


Step 3: Download VISN-level data on total population

We will need VISN-level data to join with the VISN-level shapefile in QGIS. You can download the data from the following VA public site. Go to the Population Tables and download the “All Veterans Integrated Service Networks” Excel file. It will contain data on the total population at each VISN.

With QGIS, you will need to have two types of files for the data. I recommend using a text editor (not the Windows native notepad) such as Notepad++ or Brackets. In the text editor, open the file with the data and save it as a *.csv. The reason we do this is to make sure that the data is in text format. There are certain values that you want to ensure include the “0” in front of the other numbers (e.g., “01,” “02,” “03,” etc). If you don’t include the “0” you will not be able to join the data to the shapefiles.

After you save this as a *.csv (please include the extension onto the title), then you can open a new document on Notepad++ and enter the data column format. For instance, if column 1 is in text format, then type “String” for the first column. If the second data column is in numeric format, type “Integer.” We have a total of seven columns; therefore, we need to have seven data formats. See the example below.

Step 4: Open QGIS and add the VISN shapefile layer

Click on the Layer tab > Add Layer > Add Vector Layer and browse for the VISN shapefile.

Click on Open and make sure to click Add to add the VISN shapefile onto your QGIS software workspace. You should see the following image appear.

Notice how the polygons are in the form of the VISNs instead of the states. This is an important difference between what you see with the U.S. shapefiles and the VA shapefiles.


Step 5: Add the VISN-level population data

Include the VISN-level population data by downloading it from the VA public site. However, you can also use the Dropbox link that I host with the files already formatted for QGIS here. For this exercise, it would be easier to use the files I provide in the Dropbox link since the formatting may be challenging to implement. For more discussion about the proper formatting, please watch the following video.

You add the VISN-level population data by dragging and dropping it into the Layers panel. Use the file titled “visn_population_2018.csv” and make sure to drop it into the Layers panel. QGIS will automatically recognize the data types because the “visn_population_2018.csvt” file is in the same folder as the “visn_population_2018.csv.”

Step 6: Join the data to the shapefile

Double-click the VISN shapefile in the Layers panel; this will open a new window called the Layers Properties. Click on the JOIN icon and select the data you want to join to the VISN shapefile. Make sure to select VISN for the Join and Target field. This will use the VISN number to join the data to the shapefile. After you select OK, make sure to click on Apply.

Step 7: Adding classes and color

In the Layer Properties window, select the Symbology icon, which will open the menu to add classes and change the color of the different classes. Above the column field, select the Graduated level. In the Column field, select the visn_population_2018_Total, which is the total population of the VISN. Then select Quantile in the Mode field. Change the color ramp field to a blue hue. Click apply and you should immediately see the VISN shapefile file change colors in the workspace.

Your project workspace should look like the following map.

Step 8: Adding labels

Click on the Label icon and select the Single Labels option. Select the Labels variable and then click apply. This may take a while since QGIS has to identify the polygon’s location and insert the labels. The labeling may take about 3-5 minutes because the VISN shapefiles have layers and polygons. I recommend saving this step for when you export the final image generated using the composure function of QGIS to save yourself time.

After the labeling is done, you should see the VISN labels for each polygon.


Step 9: Use the composer to finalize your choropleth

The composer is QGIS’s workspace that allows you to customize the choropleth. Select the composer and name it “VISN population” and then select the sections you want to insert into the composer using the Adds New Map to the Layout icon. Once everything is finalized, you can export this as a *.png or *.pdf file. (At this time, you may turn on the labels if you waited to add these at the very end.)

This is what the choropleth looks like after we finish composing it.

Step 10: Add a coordinate reference system (CRS)

Right now, the map is not an accurate portrayal of the United States in relation to the surface of the Earth. It should be more round at the top due to the distance from the North Pole and the fact that the Earth is a sphere. To ensure that we are accurately portraying the U.S., we need to install the appropriate coordinate reference system (CRS). To do this we need to first click on the Properties of the project and select CRS. We add the CRS from the server using the Datum Transformations window. We change both the Source and Destination CRSs and use the USA_Contiguous_Albers_Equal_Area_Conic CRS (ID = EPSG:102003).

Once the CRS is installed into our CRS database, we can select it to change the shape of the map to correctly conform to the shape of the Earth.

This is what the choropleth looks like in the project workspace.

After we add the labels and compose the final elements, the choropleth looks like the following.


Using choropleths can highlight important differences across VISNs that would be lost in a table or difficult to present in an alternative chart. Based on the population levels, VISN 22, 17, 10 and 8 have large populations of veterans receiving care at the VA. Areas where there is low prevalence of veterans are in VISN 1, 5, 9, and 15. One thing to consider is that we did not normalize the data based on a single denominator. You can play around with how you want to do that by using the U.S. census, which can be found here. As an added exercise, see if you can create something similar using the U.S. shapefiles, which are located here. Additionally, you can use multiple choropleths (small multiples) to show changes across time or another dimension. Choropleths are excellent visuals that can contribute to a narrative; using the VISN shapefiles will allow you to generate visuals that can enhance a report or presentation.



I used the following references to develop this tutorial.


Download QGIS here:


VA population data:


VISN shapefiles:



Communicating data effectively with data visualizations - Part 10 (Heat Maps)


A heat map is a data visualization tool that uses positioning and coloring to identify clusters and correlations in multivariable analysis. The most common type of heat map is a 2 x 2 matrix, where two variables are examined using the rows and columns (R x C) positions on a matrix. A heat map matrix helps us identify any patterns or similarities across the different dimensions. In a heat map, color is critical in denoting degrees of change. (Please see refer to the past blog on colors.)

For example, we can see the changes in opioid overdose rates across time for each state (Figure 1). As the rates increase, the cells are darker (indicated by the legend). All the states experience an increase in opioid overdose deaths, but State 1 is experiencing it faster than States 2 and 3.

In this tutorial, we will perform two methods to create heat maps. The first method will use the built in Excel conditional formatting rules and the second method will use VBA macros.

Figure 1. Heat map matrix.

Figure 1 - heatmap matrix.png


We will continue to use the state-level drug overdose mortality data from the CDC.

Mortality rate is presented as the number of deaths per 100,000 population.

In this tutorial, we will develop a heat map using opioid overdose mortality from 2013 to 2016.

The data setup in Excel has State indicators as the rows and time indicators as the columns. We are visualizing the change in opioid overdose mortality from 2013 to 2016 for each state. Figure 2 illustrates the data structure for the first seven states.


Figure 2. Data structure for the first seven states.  

Figure 2 - Data structure.png



Excel has a convenient tool that allow us to use conditional formatting to shade our heat map matrix.

Step 1: First highlight all the data.


Step 2: In the Excel Ribbon, select Conditional formatting and then New.

Figure 3 - conditional formatting.png

Step 3: Select “Format all cells based on their values” and change the Format Style to “3-Color scale.” Change the color to the different shades you are interested in using. (In this example, we used a blue base with varying degrees of shading.) Select percentile and then click “Ok.” The percentile will use the Median for each column to distinguish the middle category for the rates of opioid overdose mortality for each state.

Figure 4 - 3-color scale.png

Step 4: Visually inspect the results. If there are no apparent pattern in the heat map, we will need to sort the rate of opioid overdose mortality for 2016 in descending order.

Figure 4 - Descending order.png

The heat map should look like the following:

Figure 6 - heat map example 1.png

There is a pattern emerging in regards to the rate of opioid overdose mortality across the different states. West Virginia has the highest opioid overdose mortality rate in 2016, but they also appear to have the highest from 2013 to 2015. The dark cell in 2016 indicates that West Virginia has exceeded the 50 deaths per 1000 population incidence rate. Other states also have high rates of opioid overdose mortality across 2013 to 2016, which continued to increase. The 2013 column is clearly lighter indicating that the opioid overdose mortality has increased over time up to the available data in 2016.


Step 5: We can improve this heat map by removing the numbers, which can be distracting. We need to select all the data and format the cells. Select only the numeric data and then format the number. Use the “Custom” category and enter the following: "";"";"";"". This will change the number values so that it doesn’t show in the cells.

Figure 7 - change the number format.png

Step 6: Change the column width and row height so that you have a nice square-like matrix for the final result. We used a column width of 5 and a row height of 30. (Only the first fifteen states are shown.)

The heat map can be used to quickly identify states with the highest opioid overdose mortality and the trends across time. The states with the highest rates of opioid overdose mortality are clustered at the top while the states with the lowest rates are clustered at the bottom.

We could also arrange this into regions of the US to further stratify the results (not shown).



Excel only allows us to choose up to 3-Color scales. If we wanted to use more than 3 color categories, we will need to use VBA macros.

But before we do, we need to think about the colors for the scales. Since we have more than 3 categories, we will need to figure out how to divide the colors.



We will need to determine the base color for our heat map. In this example, we will use a blue base-color and change the shading using the RGB color values. RGB colors are based on a system using a combination of three base colors (red, green, and blue) that can be used to change the intensity of the color from a range between 0 and 250. An example of an RGC color table can be found in the following site.

For this example, we used the following RGB color values where dark-navy denotes high rates of opioid overdose mortality (50 or more per 1000 population). However, you can change the values of these colors however you like.

Figure color codes.png

We will continue to use the blue color base and change the gradient using the RGB values.



The VBA macro comes from the site Excel For Beginners and written by Kristoff deCunha. We used the VBA code on the site, but modified it for this tutorial.

The VBA code is written in a way where any changes in the values will automatically update the color of the cell. Additionally, the code also sorts the 2016 opioid overdose mortality rate in descending order, adds thin-white continuous borders around the cells, and changes the font to Times New Roman.

Here are the VBA macros used for Method 2.


Macro 1 changes the font to Times New Roman.

Sub ChangeFont()

Dim rng As Range
Set rng = Range("J1:N52")

With rng.Font
    .Name = "Times New Roman"
    .Size = 12
    .Strikethrough = False
    .Superscript = False
    .Subscript = False
    .OutlineFont = False
    .Shadow = False
    .Underline = xlUnderlineStyleNone
    .ThemeColor = xlThemeColorLight1
    .TintAndShade = 0
    .ThemeFont = xlThemeFontNone
End With

End Sub


Macro 2 changes the cell shading to the 6-Color scale (blue-base)

Sub ChangeCellColor()

Dim rng As Range
Dim oCell As Range

Set rng = Range("K2:N52")

For Each oCell In rng
        Select Case oCell.Value
                 Case 50 To 100
                     oCell.Interior.Color = RGB(37, 54, 97)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 40 To 49.999
                     oCell.Interior.Color = RGB(56, 83, 145)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 30 To 39.999
                     oCell.Interior.Color = RGB(147, 168, 215)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 20 To 29.999
                     oCell.Interior.Color = RGB(183, 198, 228)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 10 To 19.999
                     oCell.Interior.Color = RGB(218, 225, 240)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                 Case 0 To 9.999
                     oCell.Interior.Color = RGB(230, 237, 253)
                     oCell.Font.Bold = True
                     oCell.Font.Name = "Times New Roman"
                     oCell.HorizontalAlignment = xlCenter
                  Case Else
                     oCell.Interior.ColorIndex = xlNone
         End Select
    Next oCell
End Sub


Macro 3 sorts the 2016 opioid overdose mortality rates in descending order.

Sub SortColumn()

Dim DataRange As Range
Dim keyRange As Range
Set DataRange = Range("J1:N52")
Set keyRange = Range("N1")
DataRange.Sort Key1:=keyRange, Order1:=xlDescending

End Sub


Macro 4 hides the font from the heat map.

Sub HideFont()
Dim rng As Range
Dim oCell As Range

Set rng = Range("K2:N52")

    For Each oCell In rng
        oCell.Font.Color = oCell.Interior.Color
    Next oCell
End Sub


Macro 5 creates thick white borders for each cell in the table.

Sub WhiteOutlineCells()

    Dim rng As Range

    Set rng = Range("J1:N52")

    With rng.Borders
        .LineStyle = xlContinuous
        .Color = vbWhite
        .Weight = xlThick
    End With
End Sub


The five macros are assigned to a button in the Excel Macro-Enabled Workbook. Pressing the button will perform the task of creating a 6-Color scale heat map. Download the Excel Macro-Enabled Workbook here. This file will have the raw data and the macro-enabled worksheet for you to create a heat map.

Step 1: Copy the raw data from the “combined” worksheet.

Figure 9 - Copy Data.png

Step 2: Paste it in the worksheet “heatmap_2” starting on cell "J1".

Figure 10 - Copy onto J1.png

Step 3: Then press the “PressStart” button to run the macros. Your final heat map should look like the following:

Compare the heat map from Method 1 (3-Color scale) to the one generated by Method 2 (6-Color scale). The heat map with the 6-Color scale has a lighter pattern compared to the 3-Color scale heat map. The differences are dramatic. Depending on the granularity of the heat map you want, either one of these color scales would be fine. However, Method 2 requires some VBA coding.

* Not all states shown.



Heat maps allow us to observe patterns in the data. In our example, we notice that West Virginia has a high rate of opioid overdose mortality indicated by the clustering of dark cells from 2013 to 2016. Other states had similar patterns as West Virginia. Using heat maps provides a quick and easy interpretation of the changes in opioid overdose mortality across time and the states that are clustered together that have high rates of opioid overdose mortality.



I used the following websites to help develop this tutorial.

Conditional formatting with more than 3 categories:
Changing the RGB color codes:

Excel color palette library:

Excellent site for VBA coding:


Communicating data effectively with data visualizations - Part 9 (Cleveland Plots)


Visualizing change across two time points allows your audience to see the impact of a program’s impact. In a previous article, I demonstrated how you can use slope graphs to illustrate changes across two time points. However, there is a risk of the slope graphs becoming a tangled mess (or spaghetti plot) if too many comparisons are being made. An easier way to illustrated changes cross two time points for a large number of groups is with a Cleveland dot plot (or lollipop plot).

By using a horizontal line between two points arranged from most change to least change, your audience can quickly visualize the program’s impact and rank them according to the subject or group.



We will continue to use the state-level drug overdose mortality data from the CDC.

Mortality rate is presented as the number of deaths per 100,000 population.

In a previous tutorial, we only looked at eight different states. In this tutorial, we will illustrate a change in all 50 states onto a single plot using a Cleveland plot.

Here is the data setup in Excel:

Figure 1 - example data.png

The difference is calculated as the difference in mortality rates between 2016 and 2010.

In Excel, select the first two rows (State and 2016 rate) and generate a horizontal bar chart. Make sure to sort the order of the 2016 rate from least to greatest.

Figure 1 a - select horizontal bar.png

The horizontal bar chart should look like the following:

Figure 2 - bar chart of us states.png

After you created the horizontal bar plot, you will need to add error bars. Select the Design tab > Add Chart Elements > Error Bars > More Error Bars Options:

figure 3 - adding error bards.png

In the Format Error Bars options, make sure to check Minus under Directions, No Cap under End Style, and Custom under Error Amount:

figure 4 - error bars format.png

Click Specify Value and then select the differences column for the Negative Error Value:

figure 5 - select the error bar values.png

On the bar chart, select the bars and then under the Format

figure 6 - no fill selection.png

The horizontal bar chart should have the fill remove and the error bars present and should look like the following:

figure 7 - error bars with differences and no fill.png

Select the error bar and go to the Format Error Bars option. Under the Joint type select Round and under the Begin Arrow Size select the Oval Arrow. This will add a circle on the one end of the error bar.

figure 8 - error join type round.png

You can also increase the size of the Oval Arrow by selecting a larger size from the Begin Arrow Size drop down.


After selecting the Oval Arrow for the Join Type and the Begin Arrow Type, your chart should look like the following:

figure 9 - error bars with rounded caps.png

The next steps will require you to add a second data series. Right-click on the chart area and select the 2010 mortality rates.

figure 10 - adding a series.png

After selecting the data, make sure to map the Horizontal Axis Labels to the corresponding states.

figure 10a - adding a series p2.png

Your chart should now include the second data series (2010 mortality rates) as horizontal bars.

figure 11 - data series2.png

In the next steps, you will add error bars to the 2010 mortality horizontal bars (similar to the error bars for the 2010 mortality rates data). Instead of adding error bars for the Minus, you will add error bars for the Plus.

figure 12 - horizontal bars series 2.png

Similar to the 2016 mortality rates data, you will remove the horizontal bars by selecting No fill and then setting the Series Overlap to 100%

figure 13 - series overlap.png

Select the error bars and then change the Begin Arrow Type to Oval and increase the size.

figure 14 - oval type series2.png

Your chart should now include two dots with lines between them for each state.

figure 15 - second dots are visitble.png

After a few more adjustments to the dot size and colors, the final Cleveland plot can look like the following:

Figure 1. Cleveland plot comparing the drug overdose mortality rates between 2010 and 2016.

figure 17 - final figure.png

The dots give us a good illustration of the magnitude of change in the drug overdose mortality rates between 2010 and 2016. Additionally, using the colored fonts help to map the drug overdose mortality values with the dots. For example, West Virginia had the highest drug overdose mortality rate in 2016 (blue dot) and a large increase in drug overdose mortality rates between 2010 (red dot) to 2016. Nebraska has the lowest drug overdose mortality rate in 2016 (blue dot) among the states, which was lower than in 2010 (red dot).



Cleveland plots can illustrate the change in drug overdose mortality rates between two time points for multiple groups without cluttering the chart space. Unlike slope graphs which can be difficult to distinguish between states, the Cleveland plot separates each state into their own rows allowing for a simple estimation of change across two time points. Moreover, it is also easier to see the magnitude in change in between 2010 and 2016. We recommend using Cleveland plots when you have a lot of groups (e.g., states) with an outcome that changes across two time points (2010 and 2016).



I used the following websites to help develop this tutorial.


The following video provides step-by-step instructions in making a horizontal lollipop chart:


The following website provides examples of lollipop charts:



Communicating data effectively with data visualizations - Part 8 (Slope graphs)


Suppose you wanted to show a change across two periods in time for several groups. How would you do that? One of the best data visualizations for this is the slope graph. In this tutorial, we’ll show you how to create a slope graph in Excel and highlight a state using contrasting colors.



We will use the state-level drug overdose mortality data from the CDC.

Here is a sample of set of data from the CDC:

figure 1.png

Each value is a rate for the number of deaths per 100,000 population.


In Excel, select the three columns (State, 2010 rate, and 2016 rate).

figure 2.png

Go to Insert Chart and select Line with Markers.

figure 3.png

The chart should look like the following:

figure 4.png

Right-click anywhere on the chart and click Select Date and then click on the Switch Row/Column.

figure 5.png

The Y and X axes should switch places and the chart should look like the following:

figure 6.png

Delete the Chart title, legend, and Y-axis.


Right-click on the X-axis and select Format Axis. Then select “On tick marks.”

figure 7.png

This will move the lines to the edges of the chart.

figure 8.png

Now, we want to add labels to the ends of each line. Right click on one of the lines and select Add Data Labels. Then select the left data label and format it according to the following guide:

figure 9.png

Do the same thing for the right label, but make sure to use the right label position. Bring both ends of the chart closer to the middle to give your data labels room to expand (otherwise, the state will stack on top of the value, which you do not want). The chart should look like the following:

figure 10.png

Finally, you can format the figure using different colors and removing the excess gridlines. Try to experiment with different shades of color to generate the desired effect. In our example, we wanted to highlight West Virginia from the rest of the other states.


Here is what the slope graph looks like after some tweaking and adjustments:

figure 11.png

In the above slope graph, West Virginia was highlighted using a cranberry color to give it a greater contrast to the lighter grey of the other states. You could use this method to highlight other states of interest such as New Hampshire or the District of Columbia, which had greater percentage increases between 2010 and 2016.


In the final slope graph, we highlight West Virginia and New Hampshire in different colors to distinguish them from the other states.

figure 12.png


Using slope graphs can show simple trends across two periods in time. More complex slope graphs can have multiple time points and more states. In our simple example, we highlight the key steps to generate a slope graph in Excel. Using colors to highlight certain states can make a simple slope graph more visually appealing and assist the eyes in identifying the states that are important according to the data.



Knaflic, CN. Storytelling with Data. 2015. John Wiley & Sons, Inc., Hoboken, New Jersey.

Communicating data effectively with data visualizations - Part 7 (Using Small Multiples or Panel Charts in Excel)


It can be challenging when you’re trying to visualize many different groups using the same metric repeatedly. Initially, you may want to do this with a single figure, but this is too crowded and prevents you from seeing the differences across the groups. Alternatively, you can separate and visualize the groups individually without compromising the space or size of the figure. In this article, we will discuss the use of small multiples or panel charts in Excel. This will make it easier for the eyes to see the differences while presenting a large number of data.



I have created a data set that contains the average temperature (degrees F) across four quarters for eight states in the United States. These temperatures were generated using a random number generator.

The data has the following structure:

Figure 1.png

Using this data, we can generate a single plot that contains all the states and their average temperature for each quarter.

Figure 2.png

There are several issues with this plot. First, there is so much clutter, it is difficult to discern which states are increasing or decreasing over time. Second, the colors, despite being different, seem to mix in too much like spaghetti. In fact, this is commonly called a “spaghetti” plot.[1]

To avoid this cluster of tangled lines, it is best to view these lines separately in small multiples. We can apply the principle of small multiples into an 8 by 1 matrix (8 rows and 1 column of data). 

Figure 3.png

By separating each state into its own separate line, we can identify which states had temperatures that increased across time. From this figure, we can see that Alabama, Alaska, Arizona, and Kansas have temperatures that increased from Q1 to Q4. However, the magnitude in the change for Kansas looks similar to Alaska (absolute difference of 37 degrees F and 76 degrees F, respectively). Despite using the same Y-axis, the compressed scale gives the illusion that all the changes were similar in magnitude, which they were not.

We can improve upon this figure by furthering the use of small multiples. Tufte argues that small multiples can present a large amount of comparisons through repeated panels.[2] With this in mind, we can take the above figure and separate each state into a 2 by 4 matrix.

Figure 4.png

This is vastly improved. We can see trajectory for each state across the quarters. For example, we can clearly see that Alabama, Alaska, Arizona, and Kansas have temperatures that were increasing steadily across time. However, large fluctuations in temperatures were observed for New Jersey, New York, and Oklahoma. Only California appears to have a steady trend in the temperature across time.

The panel chart has equal sized boxes with temperature scales that are identical. This allowed our eyes to make quick comparisons between the states. Moreover, we can also identify the trends much more easily that in the “spaghetti” plot.



To generate the above panel chart, we will use the randomly generate temperature dataset, which can be downloaded from here.

An important element of the dataset is the column called “strata.” This column will allow us to generate a staggered line plot separated by the strata.

The following figure illustrates the different variable names including the strata column.

Figure 5.png

Notice how the strata column alternates between 1 and 2? This will be used to separate the temperatures of each state into clusters. In other words, we will cluster the temperature from Q1, Q2, Q3, and Q4 for each state.

Once you’ve reviewed the data, select all the data and then use the pivot table feature to insert the new table onto a different worksheet.

Figure 6.png

Excel will automatically create a new worksheet with a blank pivot table work space. You can use this work space to generate different tables using the powerful pivot table features.

Figure 7.png

We will use the pivot table builder to generate the table we need for the panel chart. After you insert the pivot table into a new worksheet, move the State variable into the Row box followed by the Quarter variable. (This is denoted by A.) Move the Temperature variable into the Values box. (This is denoted by B.) Then move the Strata variable into the Columns box. (This is denoted by C.) Review your work with the following figure below.

Figure 8.png

Once you set up you pivot table, you should notice that the temperature values are staggered between the states. This is important for when you construct the line graphs for the panel charts.

Next, we will remove the Grand Total column. This isn’t important for us, so let’s right click the cell that contains the column header Grand Total and select Hide.

Figure 9.png

Then click on the Design tab on the ribbon and select Report Layout > Show in Tabular Form. This will change the design of the default pivot table.

Figure 10.png

Next, we are going to remove the total rows for each state. Right click on the first state (Alabama) and select Hide. You should notice that the total row for each state is now hidden. This updated pivot table design will make it easier to create panel charts.

Figure 11.png

You pivot table should look like the following:

Figure 12.png

Copy and paste this table and its values onto a blank section of the worksheet. Then highlight the data for the first four states and insert a line chart.

Figure 13.png

You will see the trend lines for the four states, which will be separated by the different time periods.

Figure 14.png

There are two colors for the alternating states. Change this to a single color (e.g. Blue). Then remove the gridlines, chart title, and the legend. The chart should look like the following:

Figure 15.png

The next part will be to include line partitions between each state, which will allow the eye to distinguish each line as separate trends for the states.

We will need to create a new dataset for the line partitions. To do that, count the number of time periods for one of the states (the time periods should be equivalent for all the states. In our case, there are four quarters for each state). In between Q4 and Q1, there is a gap. If you take all the quarters from Q1 and Q4 for Alabama to California, there are a total of 16 intervals. In between interval 4 and 5 is where we want to put first line partition. Interval 8 to 9 is where we want to put the second line partition, etc. For this example, the dataset should look like the following:

Figure 16.png

Select the chart and click on the Chart Design tab and click on Select Data.

Figure 17.png

The Select Data Source window will appear and list the previous data already used to generate the trend lines for each state. We will add a new dataset (A) and select the Y values from the dataset that was generated for the partitions (B). 

Figure 18.png

This will create a line at the bottom of the chart area.

Figure 19.png

Right click on the line (A) and select Change Chart Type > X Y (Scatter) > Scatter (B).

Figure 20.png

Right click on the scatter and go to Format Data Series > Series Options > Plot Series On > Secondary Axis.

Figure 21.png

Select the Y-axis on the right side of the chart (A).

Figure 22.png

Then go to Format Axis > Axis Options and set the Maximum to 1.0.

Figure 23.png

Next, we will align the scatter points into the correct partition. Right click on the chart area and click on Select Data. Select Series 3 and go to the X values box. In the X values box, select the data in the Partition column.

Figure 24.png

The scatter points appear to be in the correct location. The next steps will involve including error bars for the partition lines and formatting.

Figure 25.png

Click on scatter points and select Chart Design in the Ribbon. Click on the Add Chart Element and select Error Bars > More Error Bars Options….

Figure 26.png

Format the error bars and remove the cap. Depending on where the scatter points are, you want to choose error bars that will cover the empty region. In our example, the scatter points are at the top, so having the “Minus” or “Both” direction options will work for us. We will leave it at “Both” for this example.

Figure 27.png

Click on the horizontal error bars and delete them.

Figure 28.png

Click on the scatter point and go to the Format Data Series. Go to the Fill & Line > Marker Options > None.

Figure 29.png

The chart should look like the following:

Figure 30.png

The remainder of this tutorial will be to make this aesthetically appealing.

We want to hide the secondary Y-axis, which can be done by hiding the line and change the font color to match the background. Then we want to change the size of the chart using the Chart Options.

Figure 31.png

We also changed the font to Helvetica (native to Apple products), resized the chart boxes, included a Vertical Y-title bar for the temperature axis, added a mean Temperature band using 60% transparency, change the color of the vertical error bars to a lighter gray, changed font size on the Y-axis, and moved the X-axis labels to the top. We repeated these steps for the final four states and carefully placed it below the first four states, using the grid to assist with alignment.

The final panel chart:

Figure 32.png



When you encounter a data set that can be separated into smaller graphics, consider using the small multiple principle or panel charts approach. It requires a little more work, but the results can provide a narrative that is easy to visualize and interpret. In our example, we start to notice different trends using different approaches to the small multiple principle. It’s best to experiment which ones work best for your needs. Here are some examples of small multiples that may inspire you: link1 and link2.



1. Knaflic CN. Strategies for avoiding the spaghetti graph. Storytelling with data. Published March 14, 2013. Accessed May 16, 2018.

2. Tufte ER. The Visual Display of Quantitative Information. Second. Cheshire, CT: Graphics Press, LLC.; 2001.