Introduction to Data Visualization
"The simple act of visualising data, even in the brain, makes a difference to how we interpret and understand it." - David McCandless, author of "Information is Beautiful".
1.1 Why Do We Need to Visualize the Data?
In a bustling marketplace in ancient Rome, merchants called out the prices of their wares, relying on the spoken word to convey information. But as societies grew more complex and the amount of information exploded, it became evident that relying solely on words, or for that matter on numbers, had its limits. Mathematicians and others who worked with numbers began organizing them in lists and tables, looking for numerical patterns and sequences, and even depicting data as geometric shapes and spatial patterns. But as data kept growing in volume (the amount being generated) and velocity (the speed at which it is generated), these methods soon proved inadequate. Then came data visualization techniques, from the fundamental number line to bar, line and pie charts, and eventually the far more complex visualizations of today. Enter the realm of data visualization.
Let's imagine a scenario: A CEO is presented with a spreadsheet containing thousands of rows of sales data. Would she be able to quickly discern patterns, trends, or anomalies? Probably not. Now, if the same data were presented as a colourful line chart or bar graph, patterns would emerge, trends would become evident, and decision-making would become far more manageable. This is the power of data visualization. It turns abstract numbers into a form the human brain can understand, process, and act upon efficiently.
Let me take you to an imaginary but picturesque town of Eldoria, nestled among rolling hills and serene lakes. There lived a librarian named Elara. Eldoria's library was not ordinary—it was the town's pride, holding not just books but records of every event, large or small, that had occurred in Eldoria's history.
Elara had an impeccable memory. People from far and wide would visit her, and she would regale them with tales from the past, painting vivid images with her words. However, with the town's rapid expansion and an influx of data, even Elara found it challenging to process everything. During one of these overwhelming days, she had an epiphany: "If only I could show the stories and not just narrate them."
This realization is where our journey into the importance of data visualization begins.
1.1.1 The Power of Sight
Imagine having to describe a sunset to someone who has never witnessed one. No matter how poetic or detailed your description, words would fall short of capturing the myriad hues, the gentle gradation from twilight to dusk, or the sheer beauty of the sun kissing the horizon. The phrase "A picture is worth a thousand words" captures this very idea.
In our world, data is abundant. We are inundated with numbers, from population growth and weather patterns to market trends and medical research. While raw data holds valuable information, its true potential remains locked away, much like the stories in Eldoria's records. When we transform this data into visual formats—charts, graphs, heat maps—we unlock its stories.
For instance, if you hear that global temperatures have risen by 1.2°C since the late 19th century, it might not seem alarming. The same is true of figures describing how temperatures have climbed since industrialization gathered pace or since the world wars. But when the same data is plotted with appropriate colour and shape, even a simple bar chart of temperature differences says more than pages of raw numbers. Pair such plots with the corresponding rise in sea levels, the melting of polar ice caps, and the frequency of wildfires, and the gravity of climate change becomes undeniable.
1.1.2 Making Complex Data Understandable
In Eldoria, our imaginary town, the records grew along with the town. Births, harvests, trade, festivals, and a myriad of other events filled the library's logbooks. One day, the town's council approached Elara, seeking patterns in the annual wheat yield over the past century. Amid thousands of pages of records, Elara was stumped.
This dilemma mirrors the challenges industries face today. For a stock market analyst, sifting through vast datasets to identify patterns might be like searching for a needle in a haystack. However, when this data is visualized using candlestick charts or trend lines, fluctuations in stock prices, market highs and lows, and trading volumes become immediately apparent.
1.1.3 Immediate Insights and Faster Decision Making
When the Mayor of Eldoria wanted to understand the town's population growth to plan infrastructural development, Elara took weeks poring over data. By the time she presented her findings, the council had lost precious time. But what if Elara had a population growth chart spanning decades? Decisions could have been made in mere hours.
Similarly, in sectors like healthcare, time is often of the essence. Visualizing the spread of a virus using heat maps can help governments and organizations deploy resources efficiently, potentially saving thousands of lives.
The same is true of a less dreadful but equally pressing problem: navigating traffic in a bustling metropolis. Tools like Google Maps or Here Maps crunch a continuous flow of live traffic data and present it as colour-coded lines (for example, green, orange and red to show increasing congestion) overlaid on street maps, with your own position tracked by GPS satellites. Could you make that right turn in time if the same information reached you as raw numbers? We have come to depend on digital maps and navigation apps precisely because they turn this raw data into something we can read at a glance.
1.1.4 Facilitating Engagement and Memory Retention
One summer, Eldoria faced a water crisis. Elara tried warning the townsfolk about decreasing water levels using data from the past. But her words fell on deaf ears. Only when a local artist painted a visual timeline of the town's lakes—showing them shrinking over the years—did people notice.
Humans are inherently visual creatures. We are more likely to engage with and remember visual information. For businesses, this means that visualized data can captivate audiences, making presentations more impactful and persuasive.
1.1.5 Building a Universal Language
Eldoria was a melting pot of cultures; not everyone spoke the same language. Yet, when Elara showcased a pictorial representation of the town's history during a festival, everyone understood, irrespective of their language.
Data visualization transcends linguistic barriers. A bar graph or a pie chart looks the same and conveys the same meaning in any language, helping data-driven stories to be understood universally.
As Eldoria thrived, Elara's visual records became the town's cornerstone. The library walls were adorned with colourful charts, maps, and timelines, each telling a unique story.
In our interconnected world, where data-driven decisions are paramount, visualizing data is not just a need—it's an imperative. It empowers individuals and organizations to comprehend complex scenarios, identify patterns, make informed decisions, and share stories that resonate.
And so, as we delve deeper into numbers and datasets, let's not forget the lessons from Eldoria:
For in visual tales, data finds its voice.
1.2 Brief History of Data Visualisation
Long before the digital age's vast datasets and sophisticated visualization tools, humans found ways to represent information visually. From ancient civilizations to modern-day analysts, the evolution of data visualization testifies to our species' ongoing pursuit of understanding and communicating complex information.
The art of presenting information visually dates back millennia. Ancient Egyptians used hieroglyphics and maps. Much later, in 1786, William Playfair, a Scottish engineer and political economist, published the first line and bar charts, adding the pie chart in 1801.
1.2.1 The Middle Ages: Graphical Chronicles
Arguably, the earliest forms of data visualization appeared in cartography. Maps, used initially for navigation, for marking property boundaries, and to satisfy human curiosity, are thought to have existed for at least ten millennia. In antiquity, knowledge of the world, gathered from direct observation and a considerable amount of conjecture, was etched into stone or moulded in clay. Over time, instruments such as the compass (around 200 BC) and the reflecting octant of 1731, forerunner of the sextant, allowed more exact measurements and therefore more precise maps. The advent of the printing press then made it possible to distribute these maps widely.
The Turin Papyrus Map, dated to around 1150 BC, is often cited as the oldest surviving geological map. It illustrates the distribution of geological resources and includes quarrying information.
However, geographical maps alone do not satisfy many in the field of data visualization, who argue against treating them as visual representations of data. For them, data visualization in the form we know today begins with Michael Florent van Langren, a Flemish astronomer who, in 1644, produced the first visual representation of statistical data: a one-dimensional line graph of different estimates of the difference in longitude between Toledo and Rome.
1.2.2 The Renaissance: A Period of Exploration and Discovery
The Renaissance, a period marked by profound intellectual growth, saw the advent of more advanced and accurate map-making techniques. With explorers like Christopher Columbus and Vasco da Gama discovering new lands, cartographers quickly adapted, updating their maps to represent newly found territories.
During the late 18th century, however, some of the foundational elements of modern data visualization emerged. William Playfair, a Scottish engineer and political economist, is credited with inventing the line, bar, and pie charts, and his visualizations transformed the way economic data was represented and understood.
The line chart and bar chart first appeared in 1786, and the pie chart and circle graph in 1801. As these concepts were so new in 1786, he had to describe how to read and understand them.
The best-known example of Playfair's work is his comparison of the price of wheat to wages to establish a link between the cost of living and the average pay of workers.
Playfair also worked as a draughtsman and assistant to James Watt, the pioneer of the modern steam engine. In 1805, he remarked that "whatever can be expressed in numbers may be expressed by lines." Two other sayings are often attributed to him: "Data should speak to the eyes" and "A good data visualization produces form and shape to several separate ideas which are otherwise abstract and unconnected."
Playfair's work extended into the early 19th century, when visualization began to take off.
1.2.3 19th Century: Statistical Graphics and Public Health
The 19th century brought advancements that changed the public's perception of data. Perhaps the most famous is Dr John Snow's map of London in 1854. Amid a severe cholera outbreak, Snow plotted each case on a map of London, visually identifying the water pump at Broad Street (now Broadwick Street) as the outbreak's epicentre. This visualization helped resolve a public health crisis and became an iconic example of data-driven decision-making.
Dr Snow's map, now more than a century and a half old, deserves a closer look. It was novel as a visual representation of information, and it exposed hidden patterns in the data that ultimately led to significant discoveries and saved lives. It is a substantial piece of epidemiological history. Dr Snow, often referred to as the father of modern epidemiology, created the map during the 1854 cholera outbreak in the Soho district of London, and it is famous for its role in identifying the outbreak's source. By plotting cholera cases on the map, he could see a pattern: most cases clustered around the Broad Street pump. His hypothesis was that contaminated water from the pump was the source of the outbreak. To test his theory, he persuaded the local authorities to remove the pump's handle, and soon afterwards the number of new cholera cases decreased.
This was a groundbreaking moment in public health. Before this, the prevailing theory was that diseases like cholera were spread through "miasma," or bad air. Snow's work was pivotal in shifting the understanding towards the role of contaminated water in spreading disease. This map and Snow's research played a significant role in developing modern disease control methods and establishing sanitary reforms in urban areas.
The cholera map by Dr. John Snow is an early example of data visualization and spatial analysis. As a data visualization, it conveys the following:
1. Geographical Context: The map lays out the streets, landmarks, and buildings in the Soho district of London. This provides a precise geographical context and a sense of scale.
2. Disease Concentration: The black bars or dots on the map represent individual cholera cases. By plotting these on the map, clusters or concentrations of cases become apparent. This visually demonstrates where the disease was most prevalent.
3. Point of Interest – The Water Pump: The Broad Street pump, central to Snow's hypothesis, is prominently marked on the map. It is immediately evident that many cholera cases are clustered around this pump, suggesting a potential link.
4. Distribution Pattern: By visualizing the cases, it becomes apparent that the farther away from the Broad Street pump, the fewer cases there are. This diminishing pattern from a central point strongly indicates the pump is a potential source of the outbreak.
5. Comparative Analysis: While the main focus was on the Broad Street pump, other water pumps in the area are also marked. The visualization allows for comparison, showing that other pumps did not have as many cases surrounding them. This strengthens the argument for the Broad Street pump being the primary source.
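The pump-by-pump comparison in points 4 and 5 can be approximated in code. The following is a minimal sketch and not Snow's method. It assigns each case to the nearest pump by straight-line distance, which amounts to a Voronoi-style partition, and then counts the cases per pump. Every coordinate and every pump name other than Broad Street is a hypothetical value invented for illustration. Snow reasoned about walking distance along the streets, and this simplification ignores that.

```python
import math
from collections import Counter

# Hypothetical coordinates for illustration only (NOT Snow's actual data):
# pumps and cases are (x, y) positions in arbitrary units.
PUMPS = {
    "Broad Street": (0.0, 0.0),
    "Pump A": (5.0, 1.0),   # hypothetical pump
    "Pump B": (-4.0, 4.0),  # hypothetical pump
}
CASES = [(0.5, 0.2), (-0.3, 0.8), (1.0, -0.5), (0.2, -1.1), (4.6, 1.4), (-1.2, 0.4)]


def nearest_pump(case, pumps):
    """Return the name of the pump closest to a case (straight-line distance)."""
    return min(pumps, key=lambda name: math.dist(case, pumps[name]))


def cases_per_pump(cases, pumps):
    """Count how many cases lie closer to each pump than to any other pump."""
    counts = Counter(nearest_pump(c, pumps) for c in cases)
    # Include pumps with zero nearby cases so the comparison is complete.
    return {name: counts.get(name, 0) for name in pumps}


if __name__ == "__main__":
    for name, n in cases_per_pump(CASES, PUMPS).items():
        print(f"{name}: {n}")
```

With these made-up points, most cases fall in the Broad Street pump's region. That is the kind of imbalance the map makes visible at a glance.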
In summary, Dr. Snow's map is a powerful data visualization tool that visually communicates the relationship between the geographic distribution of cholera cases and their proximity to the Broad Street pump. It effectively combined epidemiological data with geographical data to identify the cause of the outbreak and is a foundational example of how visualization can solve complex problems.
Around the same period, Florence Nightingale's "rose diagram" played a crucial role in sanitary reform following the Crimean War. The diagram depicted the number of deaths from various causes, revealing that most fatalities were preventable and stemmed from poor sanitary conditions. This revelation catalyzed the push for better sanitation standards in healthcare. The rose diagram was remarkable in two respects, both of which we now take for granted. First, it was a beautiful representation of data, literally beautiful. Today, infographics and the design of appealing visuals are an industry in their own right, yet even by today's standards this handmade chart, which specialists call a "coxcomb chart," is a masterpiece of aesthetic appeal and storytelling. Second, it changed our world. Today we regard sanitation as a basic health measure, especially in hospitals and healthcare, but that was not always so clear. Not, at least, until 1858, when Florence Nightingale, an English nurse, shook the establishment by presenting data that linked unsanitary conditions in military hospitals to soldiers' deaths. That is what the rose diagram shows.
The rose diagram is a type of radial chart: in effect a circular bar chart, with each segment radiating from a common central point like the petals of a rose, hence the name. Florence Nightingale used it to illustrate the number and causes of deaths during each month of the Crimean War. Specifically, the chart distinguished deaths due to preventable diseases, deaths due to wounds, and deaths from other causes. The area of each wedge, or 'petal', represented the number of deaths, so the length of each wedge's radius was proportional to the square root of that number. This was a dramatic way to present data, and it visually demonstrated the impact of unsanitary conditions on soldiers' health, which was her primary concern.
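When a chart encodes a count in wedge area rather than wedge length, the radius has to grow with the square root of the value. The minimal Python sketch below shows this. It assumes equal-angle wedges (twelve by default, one per month) and a hypothetical `scale` factor that converts deaths into area units. The function names are illustrative.

```python
import math


def wedge_radius(deaths, n_wedges=12, scale=1.0):
    """Radius of a rose-diagram wedge whose AREA is proportional to `deaths`.

    A wedge spanning angle theta = 2*pi / n_wedges with radius r has area
    theta * r**2 / 2, so r = sqrt(2 * A / theta), where A = scale * deaths.
    """
    theta = 2 * math.pi / n_wedges
    area = scale * deaths
    return math.sqrt(2 * area / theta)


def wedge_area(radius, n_wedges=12):
    """Area of a wedge with the given radius (the inverse of wedge_radius)."""
    theta = 2 * math.pi / n_wedges
    return theta * radius ** 2 / 2


# Hypothetical counts: four times as many deaths gives a radius only twice as long.
print(wedge_radius(400) / wedge_radius(100))
```

Scaling the radius linearly instead would make each wedge's area grow with the square of the count. That would visually exaggerate the months with the most deaths.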
1.2.4 20th Century: Computers Usher in a New Era
The 20th century saw exponential growth in data generation, driven largely by the advent of computers. The American mathematician John Tukey contributed significantly to statistical practice and to data analysis in general. Some even regard him as the father of Data Science, and at the very least he pioneered many of the critical foundations of what later came to be known by that name. Making sense of data has a long history, and statisticians, mathematicians, scientists and others had addressed it for many years. During the 1960s, however, Tukey challenged the dominance of what he called "confirmatory data analysis": statistical analyses driven by rigid mathematical configurations. He argued for a more flexible attitude, exploring data carefully to see what structures and information it might contain. He called this approach "exploratory data analysis" (EDA). It emphasizes visual methods for finding patterns and anomalies, and he set it out at length in his 1977 book Exploratory Data Analysis. In many ways, EDA was a precursor to Data Science.
Tukey also recognized the importance of computer science to EDA. Graphics are an integral part of EDA methodology. Much of Tukey's work focused on static displays that could be drawn by hand, such as box plots, but he realized that computer graphics would be more effective for studying multivariate data. In the early 1970s he conceived PRIM-9, the first program for viewing multivariate data. This coupling of data analysis and computer science lies at the heart of what is now called Data Science. (Source: https://en.wikipedia.org/wiki/John_Tukey)
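A box plot summarizes a dataset with a handful of numbers. The sketch below computes that summary with Python's `statistics.quantiles` using its `"inclusive"` method, and it applies the common 1.5 × IQR outlier fences associated with Tukey. Treat it as an approximation: Tukey's hand-computed "hinges" can differ slightly from these quartiles. The function name is illustrative.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers for a box plot (1.5 x IQR fences)."""
    data = sorted(values)
    # 'inclusive' quartiles; Tukey's original hinges may differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers reach the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For this sample input, the value 100 lies far beyond the upper fence and is reported as an outlier. That single point is exactly what a box plot draws attention to.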
Furthermore, with computers becoming household items, software like Microsoft Excel became accessible. These tools democratized data visualization, allowing anyone to create simple graphs and charts.
1.2.5 21st Century: The Age of Interactivity
In an era where the hum of computers is the new birdsong and the glow of screens the new sunlight, data is the lifeblood that flows through the arteries of our modern existence. We live in the 21st century—when data doesn't just speak; it sings, dances, and paints a thousand pictures. Data visualizations have transformed from mere numbers and bars into dynamic, interactive storytelling experiences.
In this transformed world, data weaves enchanting tales. It has become the cartographer of our digital landscapes, mapping not just the physical world but the contours of our societies, economies, and even our collective psyche. Interactive dashboards have become the new crystal balls, giving anyone with curiosity a glimpse of possible futures by merely dragging a slider or selecting a filter.
The stories data tells have become personal: no longer a tale of the many, but a conversation with the individual. Retail websites recommend products with an uncanny understanding of personal taste, weaving the narrative of past choices into future delights. Health apps turn heartbeats and footsteps into visuals of wellness, personal triumphs and, sometimes, cautionary tales.
As our world grows more complex, visual stories gain power from their ability to simplify. Big data and machine learning let us see once-invisible connections. Global trade networks and local logistics, the intricate dance of financial markets, and the silent spread of a virus are all laid bare on the canvas of interactive visualizations.
The artists of this age, the data scientists and visualization experts, are no longer just number crunchers; they have become narrators and painters of the pixel. They paint with a palette that includes not only colours and shapes but also motion and interaction. Their canvases are limitless, their galleries are web browsers, and their exhibits are viewed by an audience of millions across the globe, connected by the invisible threads of the internet.
In conclusion, the trajectory of data visualization reflects humanity's relentless quest for knowledge. From clay maps in ancient Babylon to intricate, interactive digital charts today, we have come a long way in our ability to represent, interpret, and understand the world around us. As technology continues to evolve, so will our methods of visualizing data, promising even more insights and discoveries in the future.
1.3 What is Data Visualization?
"By visualizing information, we turn it into a landscape that you can explore with your eyes. A sort of information map. And when you're lost in information, an information map is kind of useful." - David McCandless.
Data visualization is the art and science of transforming raw, often complex, data into visual formats that can be easily interpreted. This centuries-old practice has aided humans in understanding vast amounts of information, drawing insights, and making informed decisions.
Data visualization is the graphical representation of information and data. It uses visual elements like charts, graphs, and maps to provide an accessible way to see and understand trends, outliers, and patterns in data.
We encounter visualizations every day, from a child's drawing to a painting in a museum or a billboard advertisement. Data visualization stands apart from these because of its utilitarian purpose. It is not just about aesthetics; it is about conveying complex information simply and effectively.
For instance, consider a beautiful painting of a landscape. While it conveys mood, emotion, and aesthetic beauty, it doesn't necessarily represent quantifiable information. On the other hand, a bar graph comparing deforestation rates across decades tells a specific story, backed by data, about the environment.
1.3.1 Data Visualization: A Symphony of Numbers and Art
Imagine walking into the office of statistician John Tukey in the 1970s. Amid stacks of research papers and the soft hum of computer servers, Tukey is engrossed in a task that, at first glance, might seem more befitting an artist than a mathematician. With meticulous precision, he's sketching graphs and plots, transforming columns of numbers into visual masterpieces. This is data visualization in action.
Tukey believed that data on its own, presented as mere numbers or text, could often be incomprehensible or even overwhelming. However, patterns, anomalies, and stories would emerge when this data was visually represented. These visual representations made data accessible and understandable, not just to statisticians and mathematicians, but to anyone.
A simple example is a bar chart comparing a company's sales figures across several months. While a table might list these numbers, a bar chart would immediately show which month had the highest sales or if there was a declining trend. The visual element aids in rapid comprehension.
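The way bar length makes the strongest month stand out can be sketched without any plotting library. This minimal Python example renders a text bar chart. The sales figures are hypothetical and the function name is illustrative. It also assumes non-negative values. In practice, a plotting library such as matplotlib would draw the same encoding graphically.

```python
def text_bar_chart(values, width=40, bar_char="#"):
    """Render labelled horizontal bars whose lengths are proportional to values.

    Assumes non-negative values; the largest value gets a bar `width` long.
    """
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        length = round(width * value / largest) if largest else 0
        lines.append(f"{label:<{label_width}} | {bar_char * length} {value}")
    return "\n".join(lines)


# Hypothetical monthly sales figures, for illustration only.
sales = {"Jan": 120, "Feb": 135, "Mar": 160, "Apr": 150, "May": 90}
print(text_bar_chart(sales))
```

Even in plain text, the longest bar (March) and the dip in May are visible immediately. Finding them in a column of numbers would require reading and comparing every value.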
1.3.2 Beyond Just Pictures: The Essence of Data Visualisation
It's essential to understand that data visualization is not just about creating pretty graphics or aesthetically pleasing charts. Edward Tufte, a pioneer in the field, emphasized that the goal of data visualization is to provide a clear, truthful representation of data. It's about maximizing the data-ink ratio, ensuring every bit of ink (or pixel in the digital world) provides meaningful information.
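Tufte gives the data-ink ratio in The Visual Display of Quantitative Information roughly as follows:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```

Take a hypothetical case. Suppose heavy gridlines and a decorative frame account for 40% of a chart's ink and could be erased without losing any data. The ratio is then 1 - 0.4 = 0.6, and removing that ink would raise it towards 1.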
Consider Charles Minard's renowned map of Napoleon's Russian campaign of 1812. Often hailed as one of the most compelling visualizations ever created, it encapsulates six types of data: the size of Napoleon's army, its location (latitude and longitude), its direction of movement, and the temperature on various dates during the retreat. What could have been a convoluted mass of information instead becomes a straightforward, tragic story of a dwindling army facing harsh winter conditions.
1.3.3 How Is Data Visualization Different from Other Visualizations?
While the term "visualization" broadly refers to any technique used to create images, diagrams, or animations to communicate a message, data visualization deals explicitly with the representation of data. It's essential to differentiate between the two.
Think of Leonardo da Vinci's "The Last Supper." It's a visualization in the sense that it presents an artistic representation of a biblical scene. It conveys emotion, narrative, and symbolism. Contrast this with Charles Minard's map of Napoleon's Russian campaign mentioned earlier. While also telling a story, Minard's map is grounded in hard data.
Artistic visualizations, like paintings or sculptures, often originate from the artist's interpretation or imagination. They might not be based on empirical data and can be subjective. On the other hand, data visualizations, whether line graphs, pie charts, or heat maps, are rooted in factual data. They aim to objectively represent this data to highlight patterns or insights.
Additionally, while both forms of visualization aim to communicate, their primary objectives might differ. Artistic visualizations often evoke emotion, capture beauty, or challenge societal norms. Data visualizations, while they can also be aesthetically pleasing, primarily serve to inform, educate, or guide decision-making.
1.3.4 In the Realm of Digital Technology
In today's digital age, the distinction becomes even more pronounced with the rise of Virtual Reality (VR) and Augmented Reality (AR). These technologies create immersive visualizations, transporting users to different worlds or superimposing digital information onto the real world.
For instance, a VR game might take you to a fantastical land filled with dragons and castles – a visual feast, no doubt, but not rooted in data. Contrast this with a VR visualization of a city's traffic patterns. Here, the user might fly over a virtual city, witnessing real-time traffic data represented as light streams. While both are visualizations in VR, one is an artistic representation, and the other is a data visualization.
In conclusion, data visualization stands at the intersection of science and art. While it borrows principles from design and aesthetics, its foundation is empirical data. As we continue to generate more and more data in our increasingly digitized world, the importance of tools and techniques to visualize this data becomes paramount. Whether you're a statistician, a business professional, or just a curious individual, data visualization offers a powerful lens to view, understand, and interact with the world around us.
As we continue our journey into the world of data visualization, remember this:
At its core, data visualization is a bridge—a bridge that takes raw, unprocessed data and transforms it into a format that we, with our human biases and limited attention spans, can understand and use.
As we delve deeper into this subject in the subsequent chapters, we'll discover the techniques, tools, and tales that make this bridge sturdy and captivating.