
Communicating data effectively with data visualizations: Part 21 [Examples of famous (and infamous) data visualizations]

FAMOUS (AND INFAMOUS) DATA VISUALIZATIONS

Modern data visualization has a relatively young history compared with other sciences (e.g., physics, mathematics, chemistry, and biology). However, the practice itself is arguably much older. Throughout history, data visualizations have helped us understand communicable diseases, wartime operations, and the diffusion of technology. Each of these topics is important in its own field, but making them comprehensible and intuitive would be nearly impossible without creative data visualizations. This article reviews several key historical data visualizations, from the cholera outbreak to a census of the modern internet, and their impact on our society.

JOHN SNOW AND THE CHOLERA OUTBREAK

In the 19th century, little was known about how diseases were transmitted. The germ theory of disease was still on the horizon, and its significance had yet to make its way into public health policy. This was the situation in London during the cholera outbreak of 1854.[1]

John Snow (1813-1858) was an English physician who is considered one of the founders of epidemiology, the study of health and disease in populations. At the time, diseases were thought to spread through foul air, an idea known as the miasma theory. Snow was one of the first to reject this theory; he believed instead that cholera was spread by contaminated water, which, when drunk, caused severe diarrhea and dehydration that often led to death. His belief was reinforced when he found that sewage was dumped directly into the Thames River, from which parts of the city drew their drinking water. But to prove his theory, Snow had to chart the outbreak of cholera in Soho, a district of London.

Snow meticulously visited the homes of cholera patients and learned where each household obtained its drinking water. He plotted his findings on a map of the neighborhood and observed that cases clustered around specific points, mainly the public water pumps (Figure 1).

Figure 1. John Snow’s map of cholera outbreak in Soho, London, 1854.

Source: John Snow - Published by C.F. Cheffins, Lith, Southhampton Buildings, London, England, 1854 in Snow, John. On the Mode of Communication of Cholera, 2nd Ed, John Churchill, New Burlington Street, London, England, 1855. (This image was originally from en.wikipedia. Image copied from http://matrix.msu.edu/~johnsnow/images/online_companion/chapter_images/fig12-5.jpg)

Snow also noticed that a particularly large cluster of cholera deaths occurred in households near the Broad Street pump (Figure 2). In Figure 2, each stack of bars marks an address, and each bar in the stack represents one death. The concentration of deaths around the Broad Street pump provided further evidence that its water was contaminated and was the source of the outbreak.

Figure 2. John Snow’s map of cholera outbreaks near the Broad Street pump.

To test his conclusion, Snow persuaded the local authorities to remove the handle of the Broad Street pump so that residents had to obtain their water elsewhere. New cases fell off afterward and the outbreak came to an end, although the epidemic was already declining by the time the handle was removed.

This was an early example of using data visualization to track an outbreak as it unfolded and to guide a public health intervention. Plotting cases by location showed them clustering around the suspected source of contamination, illustrating the value of geospatial data visualization. Today, spatial data analysis is used to monitor influenza and many other diseases, helping public health agencies react quickly to contain potential outbreaks.

NAPOLEON’S RUSSIAN CAMPAIGN OF 1812

In the summer of 1812, Napoleon Bonaparte assembled more than 422,000 troops and personnel to invade Russia. The invasion was a response to Tsar Alexander I’s decision to withdraw from the Continental System, Napoleon’s trade blockade against Britain, which undermined Napoleon’s vision of an economically strong, centralized Europe.

Charles Joseph Minard (1781-1870) illustrated Napoleon’s doomed campaign in a graph that famously shows the decline of the once-mighty Grande Armée from the start of the campaign in the summer to its collapse in early winter (Figure 3). The graph tells two stories. The first is the advance, which began in the summer of 1812 and is shown by the brown line running from left to right. The width of the line represents the size of Napoleon’s army, which numbered approximately 422,000 (troops and personnel) at the beginning of the campaign. The line also traces the route the army took to reach Moscow. Along the way, the line narrows, representing the loss of troops to desertion and casualties. By the time Napoleon reached Moscow (the right side of the graph), he had only a fraction of his original strength (approximately 100,000 troops).

The second story is the retreat, represented by the black line. Here the line thins dramatically, and its narrowing coincides with the rapid drop in temperature shown in the chart at the bottom. Desertions, casualties, and the cold reduced Napoleon’s army to approximately 10,000 troops and personnel (less than 3 percent of its original strength) by the time it recrossed the Neman River.
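The attrition figures quoted above can be checked with a little arithmetic. The Python sketch below uses only the approximate strengths given in this section (about 422,000 at the start, about 100,000 at Moscow, and about 10,000 at the Neman River); the function name `percent_remaining` is our own illustration, not anything taken from Minard.

```python
def percent_remaining(current: int, initial: int) -> float:
    """Return the current strength as a percentage of the initial strength."""
    return 100 * current / initial


# Approximate strengths read from Minard's graph, as quoted in the text
START = 422_000
AT_MOSCOW = 100_000
AT_NEMAN = 10_000

for label, strength in [("Moscow", AT_MOSCOW), ("Neman River", AT_NEMAN)]:
    print(f"{label}: {percent_remaining(strength, START):.1f}% of the original army")
```

Run as written, this prints roughly 23.7% at Moscow and 2.4% at the Neman, consistent with the "less than 3 percent" figure.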

Figure 3. Charles J. Minard’s graph depicting the Grande Armée’s ill-fated Russian campaign of 1812.

Source: Charles Joseph Minard's famous graph showing the decreasing size of the Grande Armée as it marches to Moscow (brown line, from left to right) and back (black line, from right to left) with the size of the army equal to the width of the line. Temperature is plotted on the lower graph for the return journey (multiply Réaumur temperatures by 1¼ to get Celsius, e.g. −30 °R = −37.5 °C). Published November 20, 1869. (This image was originally from en.wikipedia. Image copied from https://en.wikipedia.org/wiki/French_invasion_of_Russia#/media/File:Minard.png)

Minard’s graph packs many data dimensions onto a two-dimensional canvas. The lines (both brown and black) show the army’s route, its direction of travel, and its strength. At the very bottom, the temperature chart shows conditions falling well below freezing during the return journey, underscoring the misery of the French troops on the long retreat (Figure 4). This creative use of space allowed Minard to combine several variables into a single graphic that tells the tragic story of Napoleon’s disastrous Russian campaign. To this day, Minard’s graphic remains a reminder both of the devastating defeat of Napoleon’s ambitions in Europe and of how effectively a data visualization can tell a compelling story.[2]

Figure 4. Temperatures on the return journey (read from right to left).
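Minard’s lower chart records temperatures in degrees Réaumur, a scale on which water freezes at 0° and boils at 80°. As the Figure 3 source note says, converting to Celsius is a single multiplication by 1¼ (5/4). A minimal Python sketch; the function name `reaumur_to_celsius` is our own:

```python
def reaumur_to_celsius(reaumur: float) -> float:
    """Convert degrees Réaumur to degrees Celsius (multiply by 5/4)."""
    return reaumur * 1.25


# The example value cited in the Figure 3 source note
print(reaumur_to_celsius(-30))  # -37.5
```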

CARNA BOTNET MAP

In what is now called the Internet Census of 2012, an anonymous hacker produced one of the most important and valuable data visualizations of internet use across the globe.[3] By taking over insecure embedded devices connected to the internet and turning them into a botnet, the hacker was able to probe the entire IPv4 address space with ping requests and port scans and compile a census of active IP addresses worldwide. The botnet was called Carna, after the Roman goddess associated with door hinges (she was also regarded as a protector of the body’s health). The Carna botnet found more than 1.3 billion IP addresses in use around the world.

The Carna botnet map is an animated Graphics Interchange Format (GIF) file that shows a 24-hour cycle of internet use around the globe (Figure 5). The data behind it were collected between June and October 2012 and published in March 2013 by the anonymous hacker, who wanted to illustrate internet use around the world with all of the data available. The hacker’s identity remains unknown.

Figure 5. 24-hour world map of IP addresses observed using ICMP ping requests.

Source: World map of 24-hour relative average utilization of IPv4 addresses observed using ICMP ping requests. Carna Botnet. Internet Census 2012: Port scanning /0 using insecure embedded devices. Data collected June to October 2012; published 16 March 2013.

The author of this animated GIF uses color and contrast effectively to deliver a powerful narrative of the daily cycle of internet use. Warm colors represent internet usage during the day, and cool colors represent usage after sunset. The band of night moves from right to left, giving the impression of the world turning from asleep to awake. More importantly, the map of the world gives the audience a familiar, easy-to-understand frame of reference. The data used to generate this animation continue to be studied by researchers, both for their quality and for their implications for internet security and research ethics.[4,5]

It is highly recommended that you download and view the GIF on your own to appreciate the animation.

CONCLUSIONS

Data visualization is an effective tool for telling complicated stories; sometimes it is the only way. For most of history, this was done without personal computers or visualization software. In most cases, data visualizations were drawn by hand and carefully illustrated like works of art. The stories in these examples, from a cholera outbreak to a failed military ambition to an illegal but comprehensive internet census, have given us a better understanding of how our world operates and of the impact these data have on our society.

REFERENCES

  1. Johnson S. The Ghost Map: The Story of London’s Most Terrifying Epidemic—And How It Changed Science, Cities and the Modern World. New York, NY, USA: Riverhead Books; 2006.

  2. Joyce H. Minard and Napoleon’s march on Moscow. Significance. 2008;5(3):133-134. doi:10.1111/j.1740-9713.2008.00311.x

  3. Internet Census 2012. http://census2012.sourceforge.net/paper.html. Accessed December 12, 2019.

  4. Krenc T, Hohlfeld O, Feldmann A. An Internet Census Taken by an Illegal Botnet: A Qualitative Assessment of Published Measurements. SIGCOMM Comput Commun Rev. 2014;44(3):103-111. doi:10.1145/2656877.2656893

  5. Dittrich D, Carpenter K, Karir M. The Internet Census 2012 Dataset: An Ethical Analysis. IEEE Technology and Society Magazine. June 2015:40-46. doi:10.1109/MTS.2015.2425592