Visual insights

Large amounts of data hide information that simple means can hardly reveal. Special methods of data analysis are in demand, and visualization techniques in particular help to survey the information gained and to pass it on in an understandable way.

The media recognised the potential of statistical and other data years ago; this led to what is now practised as data journalism in various large newspapers and in newspaper co-operations.

The Datablog

A pioneer is The Guardian, whose datablog celebrated its 10th anniversary in March 2019:

Computer-assisted reporting

But hardly anyone is ever the first. Especially when it comes to the visualization of data, there are examples that date back centuries.
A new era dawned, however, when computers came to be used in data analysis to generate interesting journalistic stories.
Of central importance here is Philip Meyer, who began to use computer-assisted reporting as a journalist in the 1960s.

In his book ‘Precision Journalism: A Reporter’s Introduction to Social Science Methods’, first published in 1973, Meyer describes demands on journalism that are still valid today and that are developing into data journalism.

‘There was a time when all you [as a journalist] needed was dedication to truth, plenty of energy, and some talent for writing. You still need those things, but they are no longer sufficient. The world has become so complicated, the growth of available information so explosive, that the journalist needs to be a filter, as well as a transmitter; an organizer and interpreter, as well as one who gathers and delivers facts. In addition to knowing how to get information into print, online, or on the air, he or she also must know how to get it into the receiver’s head. In short, a journalist has to be a database manager, a data processor, and a data analyst. …
In the information society, the needs are more complex. Read any of the popular journals of media criticism and you will find the same complaints about modern journalism. It misses important stories, is too dependent on press releases, is easily manipulated by politicians and special interests, and does not communicate what it does know in an effective manner. All of these complaints are justified. Their cause is not so much a lack of energy, talent, or dedication to truth, as the critics sometimes imply, but a simple lag in the application of information science – a body of knowledge – to the daunting problems of reporting the news in a time of information overload.
…
Today’s journalist must also be familiar with the growing journalistic body of knowledge, which, therefore, must include these elements:
1 How to find information.
2 How to evaluate and analyze it.
3 How to communicate it in a way that will pierce the babble of information overload and reach the people who need and want it.
4 How to determine, and then obtain, the amount of precision needed for a particular story.’

(Meyer, pp. 1-2)


‘Data is not just about numbers’

Today’s data journalism is closely linked to the philosophy of open data: data should be available in easily usable formats and open to evaluation by everyone. But the aims of current data journalism – as represented by the Guardian authors – still follow the essential ideas of Philip Meyer.

‘We keep some of Meyer’s approach alive in how we do data journalism and we work alongside reporters to get the most out of the combination of data and specialist knowledge. Data is not just about numbers, and behind every row in a database there is a human story. They’re the stories we’re striving to tell.’ (The Guardian, Sat 23 Mar 2019)

Examples

Since then, data-based journalism has set a trend. Many others publish data using graphics and are always looking for new ways to communicate the analysed data in an understandable way.
One of many examples is the New York Times, which celebrated the 5th anniversary of The Upshot in 2019:

‘Five years ago today, The New York Times introduced The Upshot with the aim of examining politics, policy and everyday life in new ways. We wanted to experiment with formats, using whatever mix of text, data visualizations, images and interactive features seemed best for the subject at hand.’


In the meantime, networks have formed that share their knowledge and offer help with data journalism, also called Data Driven Journalism (DDJ). One of them (mostly in German) is datenjournalismus.net.

Outstanding

Among the thousands of data-based stories and their visualizations there are highlights again and again. I don’t want to withhold my recent favourite. It is the analysis and visualization of the internal migration after the German reunification. Die Zeit presented this with a lot of effort and fascinating results in May 2019.

… and much more

Statistical Self-Defense

Not a day passes without numbers in (social) media and in everyday life. They are meant not only to inform us but also to steer us in one direction or the other.

And every day, some of these numbers are false or misleading, whether deliberately or unintentionally.

Therefore, statistics as a discipline must arm itself against the incorrect use of data and keep teaching the correct handling of statistical data.

There have long been numerous works on this subject. Here is another quite basic presentation by the Dutch journalist Sanne Blauw.

She picks out five statistical sins.

The fact that such presentations often use numbers themselves, which would also have to be viewed critically, does not diminish the value of her warnings.

Synchronously Visualized

Once again, the New York Times presents an innovative graphic that you want to watch again and again.

It’s this Downhill Race at the Olympics:


Start

Run

Finish

 

The link to the moving graphic is below this picture:

For Statistics?

It would be exciting to follow such visualizations for statistics as well, e.g. for changes in unemployment, GDP etc. of different countries from today-minus-x to today.

Easy-to-understand Statistics for the Public

In a recently published EUROSTAT publication, the authors demand innovative forms of communication from public statistics, so that it does not lose its socially important role. Among other things, they demand ‘… to tell stories close to the people; to create communities around specific themes; to develop among citizens the ability to read the data and understand what is behind the statistical process.’

Telling Stories

The UNECE hackathon that has just been completed responds to this challenge.
‘A hackathon is an intensive problem-solving event. In this case, the focus is on statistical content and effective communication. The teams will be challenged to “Create a user-oriented product that tells a story about the younger population”. During the Hackathon, fifteen teams from nine countries had 64.5 hours to create a product that tells a story about the younger population. The teams were multidisciplinary – with members from statistical offices and other government departments. The product created should be innovative, engaging, and targeted towards the general public (that is, not specialists). There was no limit on the form of the product, but the teams had to include a mandatory SDG indicator in the product.
The mandatory indicator was “Proportion of youth (aged 15-24 years) not in education, employment or training” SDG indicator (Indicator 8.6.1).’ (Source)
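To make the mandatory indicator concrete, here is a minimal sketch of how such a proportion could be computed from individual records. The `Person` record, its field names and the sample data are purely illustrative assumptions; the official SDG methodology relies on survey data and weighting that are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record layout, invented for this illustration.
    age: int
    in_education: bool
    in_employment: bool
    in_training: bool


def neet_rate(people):
    """Percentage of youth aged 15-24 not in education, employment or training."""
    youth = [p for p in people if 15 <= p.age <= 24]
    if not youth:
        raise ValueError("no persons aged 15-24 in the data")
    neet = [
        p for p in youth
        if not (p.in_education or p.in_employment or p.in_training)
    ]
    return 100 * len(neet) / len(youth)


# Invented sample: four persons in the age range, one of them NEET.
sample = [
    Person(age=17, in_education=True, in_employment=False, in_training=False),
    Person(age=21, in_education=False, in_employment=True, in_training=False),
    Person(age=23, in_education=False, in_employment=False, in_training=False),
    Person(age=19, in_education=False, in_employment=False, in_training=True),
    Person(age=30, in_education=False, in_employment=False, in_training=False),
]

print(neet_rate(sample))  # 25.0
```

The 30-year-old is excluded from both numerator and denominator, because the indicator refers only to the 15-24 age group.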

Winners

The hackathon produced impressive results, even though only a few organisations participated.

The four winners are:

My Favourites

My favourites are number 3 from the National Institute of Statistics and Geography (INEGI-Mexico) and number 2 from the Central Statistical Office of Poland.

Why?

The Mexican solution…

…is aesthetically pleasing and easy to use. The interaction is left to the users, who can control its speed themselves.

The diagrams do not stand alone, but are explained by short texts while scrolling.

The results are not simply taken as given. Rather, the concepts are explained and questioned – statistics are presented with the methodological background.

The Polish solution…

…starts with a journalistic approach. Here too, the user can control the interactivity at the desired speed.

At the end, the authors also seek direct contact with the users; a quiz personalizes the statistical data and gives an individual assessment of where the users stand personally with regard to these statistics.

Success Factors

The two applications mentioned above combine decisive user-friendly features:
– visually attractive,
– easy-to-understand navigation that users can control according to their needs,
– the journalistic approach,
– concise and instructive explanations,
– personalization,
– hints on the methodological background.

Many of the other applications show frequently encountered weaknesses: they try to provide too much information and lack the courage to leave things out and concentrate on the most important elements. This leads to long texts and complex navigation, with the effect that users quit quickly.

Learning by Doing

The New York Times did it after the election, in January 2017: You Draw It, Learning Statistics by drawing and comparing charts.

‘Draw your guesses on the charts below to see if you’re as smart as you think you are.’

 

And Bayerischer Rundfunk did it before the election, in April 2017.

This way of conveying information is an excellent strategy to foster insight and to prevent forgetting. And it’s an old tradition in didactics. 360 years ago, Amos Comenius emphasized this technique in his Didactica Magna:

“Agenda agendo discantur”

 

You Can See in Numbers

‘We are extremely sad to announce that Professor Hans Rosling died this morning. Hans suffered from a pancreatic cancer which was diagnosed one year ago. He passed away early Tuesday morning, February 7, 2017, surrounded by his family in Uppsala, Sweden.’ Anna R. Rönnlund & Ola Rosling, Co-founders of Gapminder. He died aged 68.

Hans Rosling, Geneva, 2009-10-30

In 2009, the Swiss Statistics’ Meeting took place in Geneva, Switzerland. Hans Rosling was there, and the topic of his talk was ‘Unveiling the beauty of statistics’. He wanted data to be free, free from legal and technical barriers. His ambition – and his success – was to disseminate these data beautifully … in order to change the world.

A difficult task. From an interview in the Guardian in 2013: “It’s that I became so famous with so little impact on knowledge,” he says, when asked what’s surprised him most about the reaction he’s received. “Fame is easy to acquire, impact is much more difficult.” … He’s similarly nonplussed about being a data guru. “I don’t like it. My interest is not data, it’s the world. And part of world development you can see in numbers.” (Taken from the Guardian interview 2013).

And that’s why statistics and the world need more people like Hans Rosling – more than ever!

Reading a Picture

Visual storytelling

Visualising data helps us understand facts.
Sometimes it’s very easy to understand a graph; sometimes it’s necessary to read it and to study it to discover unknown territory.

Such graphs are little masterpieces. Here is one of them, and I am sure the authors went through more than one iteration and discussion while creating it.
The graph tells the story of the average disposable income and savings of households in Switzerland, published by the Swiss Federal Statistical Office FSO.


The authors kindly give a short explanation:

How to read this graph.
In one-person households aged 64 or under, the upper-income group has a disposable income of CHF 8487 per month and savings of CHF 2758 per month. Representing 4.0% of all households, this income group corresponds to a fifth of one-person households aged 64 or under (20.1%).
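The figures in this reading aid can be cross-checked with a little arithmetic. The sketch below is an illustration only: the variable names are made up, and the inputs are the rounded values quoted above, so the results are approximate.

```python
# Figures quoted in the FSO reading aid (rounded values from the text).
disposable_income_chf = 8487        # per month, upper-income group
savings_chf = 2758                  # per month, same group
share_of_all_households = 4.0       # percent of all households
share_within_household_type = 20.1  # percent of one-person households aged 64 or under

# Share of disposable income that this group saves.
savings_rate = savings_chf / disposable_income_chf

# If 4.0% of all households equal 20.1% of this household type, the
# household type as a whole makes up about 4.0 / 20.1 of all households.
household_type_share = share_of_all_households / share_within_household_type

print(f"Savings rate: {savings_rate:.1%}")                          # 32.5%
print(f"Household type among all households: {household_type_share:.1%}")  # 19.9%
```

Read this way, the group saves roughly a third of its disposable income, and one-person households aged 64 or under account for about a fifth of all households.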

There’s another nice graph, a little less elaborate, also explained by the authors:


Statistics ♥

But there’s one thing that is not explained:

the confidence interval!

‘A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data,’ and the above poverty data are from a sample of ‘approximately 7000 households, i.e. more than 17,000 persons who are randomly selected…’.
Or:
‘The confidence intervals for the mean give us a range of values around the mean where we expect the “true” (population) mean is located (with a given level of certainty, see also Elementary Concepts). … as we all know from the weather forecast, the more “vague” the prediction (i.e., wider the confidence interval), the more likely it will materialize. Note that the width of the confidence interval depends on the sample size and on the variation of data values…’
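The dependence on sample size described in these definitions can be tried out with a small sketch. It assumes a normal approximation with z = 1.96 for a confidence level of roughly 95% and uses synthetic data; the function name and all numbers are illustrative and are not the method or the data behind the poverty figures above.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean (normal approximation).

    z = 1.96 corresponds to a confidence level of roughly 95%.
    """
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * standard_error, mean + z * standard_error


# Synthetic "population" of monthly incomes; the values are invented.
random.seed(42)
population = [random.gauss(5000, 1500) for _ in range(100_000)]

small_sample = random.sample(population, 100)
large_sample = random.sample(population, 7000)

low_small, high_small = mean_confidence_interval(small_sample)
low_large, high_large = mean_confidence_interval(large_sample)

width_small = high_small - low_small
width_large = high_large - low_large

print(f"n = 100:  {low_small:.0f} to {high_small:.0f} (width {width_small:.0f})")
print(f"n = 7000: {low_large:.0f} to {high_large:.0f} (width {width_large:.0f})")
```

The interval from the larger sample is much narrower. This is why a sample of several thousand households allows fairly precise estimates, while small subgroups within it come with wide intervals.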

Khan Academy offers lectures on topics like confidence intervals, sampling, etc.

Which one?

The above graphs use just one of multiple possibilities for visualising data.


Severino Ribecca’s Data Visualisation Catalogue is one of many websites trying to give an overview. And there is a risk of getting lost in these compilations.

© listverse.com