Visual insights

Large amounts of data hide information that can hardly be recognised by simple means. Special methods of data analysis are in demand, and visualization techniques in particular help us survey the information gained and pass it on in an understandable way.

The media recognised the potential of statistical and other data years ago; this led to what is now practised as data journalism at various large newspapers and in newspaper cooperations.

The Datablog

A pioneer is The Guardian, whose datablog celebrated its 10th anniversary in March 2019:

Computer-assisted reporting

But hardly anyone is ever the first. Especially when it comes to the visualization of data, there are examples that date back centuries.
A new era dawned, however, when computers began to be used in data analysis to generate interesting journalistic stories.
A central figure here is Philip Meyer, who began to practise computer-assisted reporting as a journalist in the 1960s.

In his book ‘Precision Journalism: A Reporter’s Introduction to Social Science Methods’, first published in 1973, Meyer describes demands on journalism that are still valid today and that have shaped what is now called data journalism.

‘There was a time when all you [as a journalist] needed was dedication to truth, plenty of energy, and some talent for writing. You still need those things, but they are no longer sufficient. The world has become so complicated, the growth of available information so explosive, that the journalist needs to be a filter, as well as a transmitter; an organizer and interpreter, as well as one who gathers and delivers facts. In addition to knowing how to get information into print, online, or on the air, he or she also must know how to get it into the receiver’s head. In short, a journalist has to be a database manager, a data processor, and a data analyst. …..
In the information society, the needs are more complex. Read any of the popular journals of media criticism and you will find the same complaints about modern journalism. It misses important stories, is too dependent on press releases, is easily manipulated by politicians and special interests, and does not communicate what it does know in an effective manner. All of these complaints are justified. Their cause is not so much a lack of energy, talent, or dedication to truth, as the critics sometimes imply, but a simple lag in the application of information science – a body of knowledge – to the daunting problems of reporting the news in a time of information overload.
….
Today’s journalist must also be familiar with the growing journalistic body of knowledge, which, therefore, must include these elements:
1. How to find information.
2. How to evaluate and analyze it.
3. How to communicate it in a way that will pierce the babble of information overload and reach the people who need and want it.
4. How to determine, and then obtain, the amount of precision needed for a particular story.’

(Meyer, pp. 1-2)


‘Data is not just about numbers’

Today’s data journalism is closely linked to the philosophy of open data: data should be available in easily usable formats and open to analysis by everyone. Yet current data journalism, as the Guardian authors represent it, still follows the essential ideas of Philip Meyer.

‘We keep some of Meyer’s approach alive in how we do data journalism and we work alongside reporters to get the most out of the combination of data and specialist knowledge. Data is not just about numbers, and behind every row in a database there is a human story. They’re the stories we’re striving to tell.’ (The Guardian, Sat 23 Mar 2019)

Examples

Since then, data-based journalism has become a trend. Many other outlets publish data in graphics and keep looking for new ways to communicate their analyses understandably.
One of many examples is the New York Times, which celebrated the 5th anniversary of The Upshot in 2019:

‘Five years ago today, The New York Times introduced The Upshot with the aim of examining politics, policy and everyday life in new ways. We wanted to experiment with formats, using whatever mix of text, data visualizations, images and interactive features seemed best for the subject at hand.’


Meanwhile, there are networks that share their knowledge and offer help with data journalism, or data-driven journalism (DDJ). One of them (mostly in German) is datenjournalismus.net.

Outstanding

Among the thousands of data-based stories and their visualizations, highlights appear again and again. I don’t want to withhold my recent favourite: the analysis and visualization of internal migration after German reunification, which Die Zeit presented in May 2019 with a great deal of effort and fascinating results.

… and much more

Statistical Self-Defense

Not a day passes without numbers in the (social) media and in everyday life. They do not just want to inform us; they also want to steer us in one direction or another.

And every day, some of them are false or misleading, whether deliberately or unintentionally.

Therefore, the statistics community must arm itself against the misuse of data and repeatedly teach the correct handling of statistical data.

There have long been numerous works on this subject. Here is another quite basic presentation by the Dutch journalist Sanne Blauw.

She picks out five statistical sins.

The fact that such presentations often use numbers themselves, numbers that deserve critical scrutiny too, does not diminish the value of her warnings.

Synchronously Visualized

Once again, the New York Times presents an innovative graphic that you want to watch again and again.

It’s this Downhill Race at the Olympics:


Start

Run

Finish

 

The link to the moving graphic is below this picture:

For Statistics?

It would be exciting to follow such visualizations for statistics, e.g. changes in unemployment, GDP, etc. across different countries from today minus x years up to today.

Easy-to-understand Statistics for the Public

In a recent EUROSTAT publication, the authors call on official statistics to adopt innovative forms of communication so as not to lose their socially important role. Among other things, they demand ‘…. to tell stories close to the people; to create communities around specific themes; to develop among citizens the ability to read the data and understand what is behind the statistical process.’

Telling Stories

The UNECE hackathon that has just been completed responds to this challenge.
‘A hackathon is an intensive problem-solving event. In this case, the focus is on statistical content and effective communication. The teams will be challenged to “Create a user-oriented product that tells a story about the younger population”. During the Hackathon, fifteen teams from nine countries had 64.5 hours to create a product that tells a story about the younger population. The teams were multidisciplinary – with members from statistical offices and other government departments. The product created should be innovative, engaging, and targeted towards the general public (that is, not specialists). There was no limit on the form of the product, but the teams had to include a mandatory SDG indicator in the product.
The mandatory indicator was “Proportion of youth (aged 15-24 years) not in education, employment or training” SDG indicator (Indicator 8.6.1).‘ (Source)

Winners

The hackathon produced impressive results, even though only a few organisations participated.

The four winners are:

My Favourites

My favourites are number 3 from the National Institute of Statistics and Geography (INEGI-Mexico) and number 2 from the Central Statistical Office of Poland.

Why?

The Mexican solution…

…is aesthetically pleasing and easy to use. The interaction is left to users, who can control its pace themselves.

The diagrams do not stand alone, but are explained by short texts while scrolling.

The results are not simply taken for granted. Rather, the concepts are explained and questioned, and the statistics are presented together with their methodological background.

The Polish solution…

…starts with a journalistic approach. Here too, users can control the interactivity at their own pace.

At the end, the authors also seek direct contact with the users: a quiz personalizes the statistical data and shows users where they personally stand with regard to these statistics.

Success Factors

The two applications mentioned above combine decisive user-friendly features:
– visually attractive,
– easy-to-understand navigation that users can control according to their needs,
– the journalistic approach,
– concise and instructive explanations,
– personalization,
– hints on the methodological background.

Many of the other applications show frequently encountered weaknesses: they try to provide too much information and lack the courage to leave things out and concentrate on the most important elements. This leads to long texts and complex navigation, with the effect that users quit quickly.

Learning by Doing

The New York Times did it after the election, in January 2017: ‘You Draw It’, learning statistics by drawing charts and comparing them with the actual data.

‘Draw your guesses on the charts below to see if you’re as smart
as you think you are.’

 

And Bayerischer Rundfunk did it before the election, in April 2017.

This way of conveying information is an excellent strategy for fostering insight and preventing forgetting, and it is an old tradition in didactics. Some 360 years ago, Amos Comenius emphasized this technique in his Didactica Magna:

“Agenda agendo discantur” (‘Things that are to be done should be learned by doing’)

 

You Can See in Numbers

‘We are extremely sad to announce that Professor Hans Rosling died this morning. Hans suffered from a pancreatic cancer which was diagnosed one year ago. He passed away early Tuesday morning, February 7, 2017, surrounded by his family in Uppsala, Sweden.’ Anna R. Rönnlund & Ola Rosling, Co-founders of Gapminder. He died aged 68.

Hans Rosling, preparing his presentation in Geneva, 2009-10-30 (Photo: A. Grossenbacher)

In 2009, the Swiss Statistics Meeting took place in Geneva, Switzerland. Hans Rosling was there, and the topic of his talk was ‘Unveiling the beauty of statistics’. He wanted data to be free, free from legal and technical barriers. His ambition – and his success – was to disseminate these data beautifully … in order to change the world.

A difficult task. In a 2013 interview with the Guardian: “It’s that I became so famous with so little impact on knowledge,” he says, when asked what’s surprised him most about the reaction he’s received. “Fame is easy to acquire, impact is much more difficult. ….” He’s similarly nonplussed about being a data guru. “I don’t like it. My interest is not data, it’s the world. And part of world development you can see in numbers.” (Taken from the Guardian interview, 2013.)

And that’s why statistics and the world need more people like Hans Rosling – more than ever!

Reading a Picture

Visual storytelling

Visualising data helps us understand facts.
Sometimes a graph is very easy to understand; sometimes it has to be read and studied carefully before it reveals unknown territory.

Such graphs are little masterpieces. Here is one of them, and I am sure the authors went through more than one iteration and discussion while creating it.
The graph tells the story of the average disposable income and savings of households in Switzerland, published by the Swiss Federal Statistical Office FSO.


The authors kindly give a short explanation:

How to read this graph.
In one-person households aged 64 or under, the upper-income group has a disposable income of CHF 8487 per month and savings of CHF 2758 per month. Representing 4.0% of all households, this income group corresponds to a fifth of one-person households aged 64 or under (20.1%)
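Reading the graph this way involves two small calculations that are left to the reader. Here is a minimal Python sketch using only the figures quoted above: the savings rate of this income group, and the share of one-person households aged 64 or under among all households implied by the 4.0% and 20.1% figures. The second number is my own back-of-the-envelope inference, not a figure published by the FSO, and since both inputs are rounded it is only approximate.

```python
# Figures quoted in the FSO's "How to read this graph" note
disposable_income_chf = 8487  # per month, upper-income one-person households aged 64 or under
savings_chf = 2758            # per month, same group
share_of_all_households = 4.0   # percent of all households
share_of_household_type = 20.1  # percent of one-person households aged 64 or under

# Savings as a fraction of disposable income
savings_rate = savings_chf / disposable_income_chf

# If 4.0% of all households are 20.1% of this household type, the household type
# itself makes up roughly 4.0 / 0.201 percent of all households.
# Both inputs are rounded, so the result is only approximate (own inference, not FSO data).
household_type_share = share_of_all_households / (share_of_household_type / 100)

print(f"Savings rate: {savings_rate:.1%}")
print(f"One-person households aged 64 or under: about {household_type_share:.1f}% of all households")
```

So roughly a third of this group's disposable income is saved, and one-person households aged 64 or under make up about a fifth of all households.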

Here is another nice graph, somewhat less elaborate, also explained by the authors:


Statistics ♥

But there’s one thing that is not explained:

The confidence interval!

‘A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data,‘ and the above poverty data are from a sample of ‘approximately 7000 households, i.e. more than 17,000 persons who are randomly selected…’.
Or:
‘The confidence intervals for the mean give us a range of values around the mean where we expect the “true” (population) mean is located (with a given level of certainty, see also Elementary Concepts). ….. as we all know from the weather forecast, the more “vague” the prediction (i.e., wider the confidence interval), the more likely it will materialize. Note that the width of the confidence interval depends on the sample size and on the variation of data values…..’

Khan Academy gives lectures about topics like confidence intervals, sampling, etc.


Which one?

The graphs above each use just one of many possible ways of visualising data.


Severino Ribecca’s Data Visualisation Catalogue is one of many websites trying to give an overview. And there is a risk of getting lost in these compilations.

(Image © listverse.com)

Next Step in OGD Websites

What DataUsa is doing could be – I guess – the next step in the evolution of Open Government Data websites. It’s the step from offering file downloads to presenting data (and not files) interactively. And it’s a kind of presentation many official statistical websites would surely be proud of.

César A. Hidalgo from MIT discusses the philosophy behind this. More on that at the end of this post; first, a short look at the website itself.


Bringing data together

Merging data from different sources may have been the most expensive and challenging task, and the conditio sine qua non (the indispensable precondition) for the existence of this website. And perhaps it was more an organizational than a technical challenge.

Seven public data sources are accessible via DataUsa.


Presenting data

Following the habits of internet users, the site’s main entrance is a search bar.


 

Thematic and geographic profiles are available, too, but in a hidden menu.

The presentation of the data is a mix of generated text and various types of graphs.


 

The options above every graph let users share, embed or download it, view the data as a table and even access the data via an API.


 

And finally, thematic maps provide other views and insights.

Storytelling

But the most fascinating part is the Stories section.

Various authors write stories focussing on special topics and using the presentation techniques of the site.

Background

A glossary explains technical terms and the About Section presents the authors and their aim:
‘In 2014, Deloitte, Datawheel, and Cesar Hidalgo, Professor at the MIT Media Lab and Director of MacroConnections, came together to embark on an ambitious journey — to understand and visualize the critical issues facing the United States in areas like jobs, skills and education across industry and geography. And, to use this knowledge to inform decision making among executives, policymakers and citizens.’

And this leads to the
Philosophy behind 

César A. Hidalgo, one of the website’s authors, explains why they did what they did in a blog post titled ‘What’s Wrong with Open-Data Sites–and How We Can Fix Them.’

Here’s the design philosophy in a visual nutshell:


 

‘Our hope is to make the data shopping experience joyful, instead of maddening, and by doing so increase the ease with which data journalists, analysts, teachers, and students, use public data. Moreover, we have made sure to make all visualizations embeddable, so people can use them to create their own stories, whether they run a personal blog or a major newspaper.’

And:

‘After all, the goal of open data should not be just to open files, but to stimulate our understanding of the systems that this data describes. To get there, however, we have to make sure we don’t forget that design is also part of what’s needed to tame the unwieldy bottoms of the deep web.’

 

 

Optimism with Data

What will our future be like? Is there any hope that things will evolve in a good direction? Will we make progress?

Data play a crucial role in answering these questions.

In his answer to the EDGE question of 2016, Steven Pinker (Harvard University, Department of Psychology) names Quantifying Human Progress as the most interesting recent (scientific) news:

‘But the most interesting news is that the quantification of life has been extended to the biggest question of all: Have we made progress? Have the collective strivings of the human race against entropy and the nastier edges of evolution succeeded in improving the human condition?’

‘Human intuition is a notoriously poor guide to reality. …. But the cognitive and data revolutions warn us not to base our assessment of anything on subjective impressions or cherry-picked incidents. As long as bad things haven’t vanished altogether, there will always be enough to fill the news, and people will intuit that the world is falling apart. The only way to circumvent this illusion is to plot the incidence of good and bad things over time. Most people agree that life is better than death, health better than disease, prosperity better than poverty, knowledge better than ignorance, peace better than war, safety better than violence, freedom better than coercion. That gives us a set of yardsticks by which we can measure whether progress has actually occurred.

The interesting news is that the answer is mostly “yes.” …. Economic historians and development scholars (including Gregory Clark, Angus Deaton, Charles Kenny, and Steven Radelet) have plotted the growth of prosperity in their data-rich books, and the case has been made even more vividly in websites with innovative graphics such as Hans Rosling’s Gapminder, Max Roser’s Our World in Data, and Marian Tupy’s HumanProgress.’

What may be true for the world need not be true for individuals.
Let’s have a look at these mostly well-known data sites:

Max Roser: Our World in Data

‘Max Roser is the founder of OurWorldInData. He is an economist working at the University of Oxford. His background is in economics, geoscience and philosophy. His research is focusing on the long-term growth and distribution of living standards.’

‘On my website I am presenting the long-term data on how we are changing our world. The idea is to tell the history of our present world – based on empirical data and visualised in graphs.’


‘Most of the long-run trends are positive and paint an optimistic view of our world. Topic by topic, the empirical view of our world shows how the Enlightenment continues to make our world a better place. It chronicles how we are becoming less violent and increasingly more tolerant. The data displays how new ideas continue to improve living standards, allowing us to live a healthier, richer and happier life. It is the story of declining poverty and better food provision in a world we care about.

The empirical view on our world shows how misplaced doom and defeatism is and my aim is to encourage those who work to make our world a better place still. At the same time my hope is also to help to change the mind of those of you who do not think that we are creating a better world. By looking at the empirical data I want to explain why I am optimistic about how we are changing our world and why I think it is worthwhile to engage in the global long-term project of Enlightenment. Although most trends are clearly going in the right direction I also show where this is not the case. In a world of hysteria we cannot focus on what is important, but a fact based view on our world should help us to focus on the topics that are most important.’  http://ourworldindata.org/about/


HumanProgress.org

Human Progress’ mission statement (http://humanprogress.org/about):

‘Evidence from academic institutions and international organizations shows dramatic improvements in human well-being. These improvements are especially striking in the developing world.
Unfortunately, there is often a wide gap between the reality and public perception, including that of many policymakers, scholars in unrelated fields, and intelligent lay persons. To make matters worse, the media emphasizes bad news, while ignoring many positive long-term trends.

We hope to help in correcting misperceptions regarding the state of humanity through the presentation of empirical data that focuses on long-term developments. All of our wide-ranging data comes from third parties, including the World Bank, the OECD, the Eurostat, and the United Nations. By putting together this comprehensive data in an accessible way, our goal is to provide a useful resource for scholars, journalists, students, and the general public.

While we think that policies and institutions compatible with freedom and openness are important factors in promoting human progress, we let the evidence speak for itself. We hope that this website leads to a greater appreciation of the improving state of the world and stimulates an intelligent debate on the drivers of human progress.

Note: HumanProgress.org is a project of the Cato Institute with major support from the John Templeton Foundation, the Searle Freedom Trust, the Brinson Foundation and the Dian Graves Owen Foundation.’

Some data:


Gapminder

And here is the top star, Hans Rosling, with his gapminder.org, where he uses data to deconstruct misleading opinions that are ‘60 years behind reality’.

An example: Hans Rosling asks: Has the UN gone mad?

‘The United Nations just announced their boldest goal ever: To eradicate extreme poverty for all people everywhere, already by 2030.
Looking at the realities of extremely poor people the goal seems impossible. The rains didn’t fall in Malawi this year. The poor farmers Dunstar & Jenet, gather a tiny maize harvest in a small pile on the ground outside their mud hut. But Dunstar & Jenet know exactly what they need to break the vicious circle of poverty. And Hans Rosling shows how billions of people have already managed. This year’s “hunger season” may very well be Dunster’s & Jenet’s last.
Up-to-date statistics show that recent global progress is ‘the greatest story of our time – possibly the greatest story in all of human history. The goal seems unrealistic to many highly educated people because their worldview is lagging 60 years behind reality.’


A focussed view: OXFAM’s new study

‘An Economy for the 1%

Runaway inequality has created a world where 62 people own as much wealth as the poorest half of the world’s population – a figure that has fallen from 388 just five years ago, according to an Oxfam report published on January 18th.
How privilege and power in the economy drive extreme inequality and how this can be stopped. The global inequality crisis is reaching new extremes. The richest 1% now have more wealth than the rest of the world combined.
Power and privilege is being used to skew the economic system to increase the gap between the richest and the rest. A global network of tax havens further enables the richest individuals to hide $7.6 trillion.’ (see Methodology)
OXFAM’s conclusion:
‘The fight against poverty will not be won until the inequality crisis is tackled.’

Income distribution: data from Max Roser’s Our World in Data

‘A lesson that we can take away from this empirical research is that political forces at work on the national level are possibly important for how incomes are distributed. If there was a universal trend towards more inequality it would be in line with the notion that inequality is determined by global market forces and technological progress where it is very hard (or for other reasons undesirable) to change the forces that lead to higher inequality. Inequality would then be inevitable. The reality of different inequality trends within countries suggests that the institutional and political framework in different countries play a role in shaping inequality of incomes.’

 

 

 

 

In Love with Data

An amazing, year-long, analog data drawing project made by two women: Giorgia Lupi and Stefanie Posavec.

‘Each week we collect and measure a particular type of data about our lives, use this data to make a drawing on a postcard-sized sheet of paper, and then drop the postcard in an English “postbox” (Stefanie) or an American “mailbox” (Giorgia)!’

An example: Week 14 by Stefanie


About the project

‘The process:
Every week we choose a topic we want to explore about our days and lives, and on Monday start our separate-but-parallel data collection.

The data-collecting ends the evening of the following Sunday, and through the course of the following week we analyse our data and draw our postcard, all the while collecting the next dataset.

On Monday we scan and drop our data postcard into the mailbox/postbox and start to plan the next week’s drawing!

The postcards:
The data drawing is shown on the front of the postcard, while the back always includes a “how to read it” key to enable the other to understand the data collection and insight behind the drawing.’

The Book


 

‘The book explores the role that data plays in our lives and originates from a correspondence between the two authors – both data visualisation artists who met at a data conference and chose to keep in touch by sending weekly postcards composed of data visualisations in place of words. The result is described as “a thought-provoking visual feast”.’

Next: Dear Data two

The Dear Data two ‘project was inspired by Dear-Data.com, a wonderful collaboration between Giorgia Lupi and Stefanie Posavec. We (Jeffrey Shaffer and Andy Kriebel) decided to follow in their footsteps and coincidentally, Andy recently moved from California to London, England.’