Easy-to-understand Statistics for the Public

In a recent EUROSTAT publication, the authors call on public statistics to adopt innovative forms of communication so that it does not lose its socially important role. Among other things, they demand ‘…. to tell stories close to the people; to create communities around specific themes; to develop among citizens the ability to read the data and understand what is behind the statistical process.’

Telling Stories

The UNECE hackathon that has just been completed responds to this challenge.
‘A hackathon is an intensive problem-solving event. In this case, the focus is on statistical content and effective communication. The teams will be challenged to “Create a user-oriented product that tells a story about the younger population”. During the Hackathon, fifteen teams from nine countries had 64.5 hours to create a product that tells a story about the younger population. The teams were multidisciplinary – with members from statistical offices and other government departments. The product created should be innovative, engaging, and targeted towards the general public (that is, not specialists). There was no limit on the form of the product, but the teams had to include a mandatory SDG indicator in the product.
The mandatory indicator was “Proportion of youth (aged 15-24 years) not in education, employment or training” SDG indicator (Indicator 8.6.1).’ (Source)
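For readers who wonder what sits behind this indicator: it is a simple share, the number of 15-24 year olds who are not in education, employment or training divided by all 15-24 year olds. Here is a minimal sketch in Python. The class name, the function name and the numbers are my own assumptions for illustration; they are not taken from the hackathon or from any statistical office.

```python
from dataclasses import dataclass


@dataclass
class YouthCounts:
    """Hypothetical counts for people aged 15-24 (illustrative only)."""
    total: int  # all persons aged 15-24
    neet: int   # of those, not in education, employment or training


def neet_rate(counts: YouthCounts) -> float:
    """Return the NEET share as a percentage of the youth population."""
    if counts.total <= 0:
        raise ValueError("total youth population must be positive")
    if not 0 <= counts.neet <= counts.total:
        raise ValueError("neet count must lie between 0 and total")
    return 100.0 * counts.neet / counts.total


# Made-up numbers, not real data.
example = YouthCounts(total=1_200_000, neet=150_000)
print(f"NEET rate: {neet_rate(example):.1f}%")
```

Real figures usually come from labour force surveys with weighting and seasonal effects, so an actual production pipeline would be more involved than this ratio.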


The hackathon produced impressive results, even though only a few organisations participated.

The four winners are:

My Favourites

My favourites are number 3 from the National Institute of Statistics and Geography (INEGI-Mexico) and number 2 from the Central Statistical Office of Poland.


The Mexican solution…

…is aesthetically pleasing and easy to use. The interaction is left to users, who can control its pace themselves.

The diagrams do not stand alone, but are explained by short texts while scrolling.

The results are not simply taken at face value. Rather, the concepts are explained and questioned – statistics are presented together with their methodological background.

The Polish solution…

…starts with a journalistic approach. Here too, users can control the interactivity at their own pace.

At the end, the authors also seek direct contact with the users; a quiz personalizes the statistical data and gives an individual assessment of where the users stand personally with regard to these statistics.

Success Factors

The two applications mentioned above combine decisive user-friendly features:
– visually attractive,
– easy-to-understand navigation that users can control according to their needs,
– the journalistic approach,
– concise and instructive explanations,
– personalization,
– hints on the methodological background.

Many of the other applications show the usual weaknesses: they try to provide too much information and lack the courage to leave something out and concentrate on the most important elements. This leads to long texts and complex navigation, with the effect that users quit quickly.


The Good, the Bad and the Ugly

Communication of statistics in times of fake news

In a recent paper, Emanuele Baldacci (Director, Eurostat) and Felicia Pelagalli (President, InnovaFiducia) deal with the ‘challenges for official statistics of changes in the information market spurred by network technology, data revolution and changes in information consumers’ behaviours’ (p.3).

Three scenarios

The status-quo or bad scenario:

‘Information will continue to be consumed via multiple decentralized channels, with new information intermediaries emerging through social platforms, digital opinion leaders, technologies that reinforce belonging to peers with similar profiles and backgrounds, including in terms of beliefs.’  … ‘Under this scenario it is likely that increased competition from alternative data providers will put pressure on the official statistics position in the information ecosystem and lead to drastic reduction of public resources invested in official statistics, as a result of the perceived lack of relevance.’ (p.8)


The ugly scenario:

‘Big oligopoly giants will emerge by integrating technologies, data and content and providing these to a variety of smaller scale platforms and information intermediaries, with limited pricing power for further dissemination. In this scenario, data generated by sensors and machines connected to the network will increasingly create smart information for individuals. However, individuals will not participate in the data processing task, but will be mostly confined to crowdsourcing data for digital platforms and using information services.’
‘In this scenario, official statistics will be further marginalized and its very existence could be put in jeopardy. More importantly, no public authority with significant influence could be in charge of assessing the quality of data used in the information markets. Statistics as a public good may be curtailed and limited to a narrow set of dimensions. …  Official statisticians will appear as old dinosaurs on the way to extinction, separated from the data ecosystem by a huge technology and capability gap.’ (p.9)


The good scenario:

The authors do not stop here. They also see a good scenario, but a scenario that implies a huge engagement.

This scenario is ‘predicated on two major assumptions.
First, the information market will be increasingly competitive by sound regulations that prevent the emergence of dominant positions in countries and even more important across them.
Second, official statistics pursue a strong modernization to evolve towards the production of smart statistics, which fully leverage technology and new data sources while maintaining and enhancing the quality of the data provided to the public.
In this scenario, official statistics will generate new more sophisticated data analytics that cater to different users by tailored information services. It uses network technologies (e.g., blockchain, networks) to involve individuals, companies and institutions in the design, collection, processing and dissemination of statistics. It engages users with open collaborative tools and invests heavily in data literacy to ensure their usability. It strengthens skills and capacity on statistical communication to help users understand in transparent manners what are the strengths and limitations of official statistics.’ (p. 9/10)


Actions needed to face the challenges ahead

The good scenario already outlines some of the actions official statisticians need to take. The authors conclude with proposals that are not really new: ideas that have been on the table for some time but are not so easy to implement.

‘It is important to change mindsets and practices which have been established, in order to put in contact the citizens with official statistics, to make data accessible, to expand the understanding of their analysis, to support individuals, business and institutions in the decision-making process.

The key issue is how to be authoritative and to develop quality knowledge in the new and changing information market. It is important to know the rules and languages of the media platforms used for communication; to overcome the technicalities; to tell stories close to the people; to create communities around specific themes; to develop among citizens the ability to read the data and understand what is behind the statistical process. In summary, put people at the center (overused phrase, but extremely valuable):
⎯ communicate statistics through engaging experiences and relevant to the people who benefit from them;
⎯ customize the content;
⎯ adopt “user analytics” to acquire the knowledge of the “users” through the analysis of data (web and social analytics) and the understanding of people’s interaction with the different platforms.’ (p.11)

And the concluding words call for external assistance:

‘It will be essential for statisticians to build more tailored data insight services and team up with communication experts to play a more proactive role in contrasting fake news, checking facts appropriately and building users’ capacity to harness the power of data.’ (p.12)






Corporate nieuws

Eurostat’s biennial scientific conference on New Techniques and Technologies for Statistics (NTTS) is over, a labyrinth of a website is online and tons of documents have been published somewhere.

CBS Corporate nieuws summarizes the important trends discussed:
1) New data sources and the consequences
2) The importance of proactive communication
3) Big Data and algorithms in official statistics

CBS, 06-06-2017, Miriam van der Sangen

Corporate websites

Why take this information from CBS (the Dutch Statistical Office) in particular? Because CBS Corporate nieuws is an excellent example of the second trend: proactive communication, that is, proactivity in delivering (statistical) information to users. The website makes corporate information public and gives insights into the activities of CBS and into statistics. You see topics …

… and the people behind it.

The target audience of this corporate website is enterprises, administrations, journalists, students and anyone else who may be interested.

A shorter English version is integrated into the CBS website.

Corporate websites like CBS’s are rather unusual. They are resource-intensive, but they are probably very good at helping people understand statisticians’ mission and work ... and at motivating employees.





Learning by Doing

The New York Times did it after the election, in January 2017: You Draw It, Learning Statistics by drawing and comparing charts.

‘Draw your guesses on the charts below to see if you’re as smart as you think you are.’


And Bayerischer Rundfunk did it before the election, in April 2017.

Presenting information this way is an excellent strategy for fostering insight and preventing forgetting. It is also an old tradition in didactics: some 360 years ago, Amos Comenius emphasized this technique in his Didactica Magna:

“Agenda agendo discantur” (what is to be done should be learned by doing)


Post Post-Truth


‘Fake news’ and ‘post-truth’ (postfaktisch) are the words that dominate many of today’s discussions about truth in communication.

‘... in post-truth [post] has a meaning more like “belonging to a time in which the specified concept [truth] has become unimportant or irrelevant”’ (https://www.oxforddictionaries.com/press/news/2016/11/15/WOTY-16).

False information and even lies are nothing new in the information business. That is why many websites, and ever more of them, help to separate right from wrong:

The Reporters’ Lab maintains a database of global fact-checking sites.


And Alexios Mantzarlis ‘collected 366 links, one for each day of the year … to understand fact-checking in 2016’.


Official Statistics’ Ethical Codex

Official statistics offices have long collected, analyzed and disseminated statistical information, and they too are confronted with incorrect citations, misuse of statistics and lies. Many of the ethical codes of official statistics recommend acting against such false information.

‘In 1992, the United Nations Economic Commission for Europe (UNECE) adopted the fundamental principles of official statistics in the UNECE region. The United Nations Statistical Commission adopted these principles in 1994 at the global level. The Economic and Social Council (ECOSOC) endorsed the Fundamental Principles of Official Statistics in 2013; and in January 2014, they were adopted by General Assembly. This recognition at the highest political level underlines that official statistics – reliable and objective information – is crucial for decision making.’


Two paragraphs are of special interest:

‘2. Professional standards and ethics
To retain trust in official statistics, the statistical agencies need to decide according to strictly professional considerations, including scientific principles and professional ethics, on the methods and procedures for the collection, processing, storage and presentation of statistical data.’


‘4. Prevention of misuse
The statistical agencies are entitled to comment on erroneous interpretation and misuse of statistics.’

The European Statistics Code of Practice says in principle 1:

1.7: The National Statistical Institute and Eurostat and, where appropriate, other statistical authorities, comment publicly on statistical issues, including criticisms and misuses of statistics as far as considered suitable.


N.B.: Wikipedia’s page on Misuse of statistics gives a broad overview of how readers can be fooled by many types of misuse.

It’s dissemination – …

False – and especially deliberately false – information as a weapon for manipulating decisions isn’t new either. What is new is how such information spreads: with the help of social media, dissemination reaches a new level (some compare it to Gutenberg’s printing press).

Anne Applebaum gives a practical illustration of how it can work:

‘I was a victim of a Russian smear campaign. I understand the power of fake news.

It was a peculiar experience, but I learned a lot. As I watched the story move around the Web, I saw how the worlds of fake websites and fake news exist to reinforce one another and give falsehood credence. Many of the websites quoted not the original, dodgy source, but one another. There were more phony sites than I’d realized, though I also learned that many of their “followers” (maybe even most of them) are bots — bits of computer code that can be programmed to imitate human social media accounts and told to pass on particular stories.
But it is also true that we are living through a global media revolution, that people are hearing and digesting political information in brand-new ways and that nobody yet understands the consequences. Fake stories are easier to create, fake websites can be designed to host them, and social media rapidly disseminates disinformation that people trust because they get it from friends. This radical revolution has happened without many politicians noticing or caring — unless, like me, they happened to have seen how the new system of information exchange works.’



May 2017 become the year of people who know about the power and the dangers of misleading information!
My best wishes to colleagues in official statistics and to their professional production and dissemination of information ... and perhaps statistical dissemination will need to become more active on social media, too.



Statistics is Dead – Long Live Statistics

To be an expert in a thematic field!

Lee Baker wrote an article that will please the whole community of official statistics, where specialists from many thematic fields (and not only statisticians or mathematicians or … data scientists) collect, analyse, interpret, explain and publish data.
It’s this core message that counts:
“… if you want to be an expert Data Scientist in Business, Medicine or Engineering”  (or vice versa: An expert statistician in a field of official statistics like demography, economy, etc.)  “then the biggest skill you’ll need will be in Business, Medicine or Engineering…. In other words, …. you really do need to be an expert in your field as well as having some of the other listed skills”

Here is his chain of arguments:

“Statistics is Dead – Long Live Data Science…

by Lee Baker

I keep hearing Data Scientists say that ‘Statistics is Dead’, and they even have big debates about it attended by the good and great of Data Science. Interestingly, there seem to be very few actual statisticians at these debates.

So why do Data Scientists think that stats is dead? Where does the notion that there is no longer any need for statistical analysis come from? And are they right?

Is statistics dead or is it just pining for the fjords?

I guess that really we should start at the beginning by asking the question ‘What Is Statistics?’.
Briefly, what makes statistics unique and a distinct branch of mathematics is that statistics is the study of the uncertainty of data.
So let’s look at this logically. If Data Scientists are correct (well, at least some of them) and statistics is dead, then either (1) we don’t need to quantify the uncertainty or (2) we have better tools than statistics to measure it.

Quantifying the Uncertainty in Data

Why would we no longer have any need to measure and control the uncertainty in our data?
Have we discovered some amazing new way of observing, collecting, collating and analysing our data that we no longer have uncertainty?
I don’t believe so and, as far as I can tell, with the explosion of data that we’re experiencing – the amount of data that currently exists doubles every 18 months – the level of uncertainty in data is on the increase.

So we must have better tools than statistics to quantify the uncertainty, then?
Well, no. It may be true that most statistical measures were developed decades ago when ‘Big Data’ just didn’t exist, and that the ‘old’ statistical tests often creak at the hinges when faced with enormous volumes of data, but there simply isn’t a better way of measuring uncertainty than with statistics – at least not yet, anyway.

So why is it that many Data Scientists are insistent that there is no place for statistics in the 21st Century?

Well, I guess if it’s not statistics that’s the problem, there must be something wrong with Data Science.

So let’s have a heated debate…

What is Data Science?

Nobody seems to be able to come up with a firm definition of what Data Science is.
Some believe that Data Science is just a sexed-up term for statistics, whilst others suggest that it is an alternative name for ‘Business Intelligence’. Some claim that Data Science is all about the creation of data products to be able to analyse the incredible amounts of data that we’re faced with.
I don’t disagree with any of these, but suggest that maybe all these definitions are a small part of a much bigger beast.

To get a better understanding of Data Science it might be easier to look at what Data Scientists do rather than what they are.

Data Science is all about extracting knowledge from data (I think just about everyone agrees with this very vague description), and it incorporates many diverse skills, such as mathematics, statistics, artificial intelligence, computer programming, visualisation, image analysis, and much more.

It is in the last bit, the ‘much more’ that I think defines a Data Scientist more than the previous bits. In my view, if you want to be an expert Data Scientist in Business, Medicine or Engineering then the biggest skill you’ll need will be in Business, Medicine or Engineering. Ally that with a combination of some/all of the other skills and you’ll be well on your way to being in great demand by the top dogs in your field.

In other words, if you want to call yourself a Data Scientist you really do need to be an expert in your field as well as having some of the other listed skills.

Are Computer Programmers Data Scientists?

On the other hand – as seems to be happening in Universities here in the UK and over the pond in the good old US of A – there are Data Science courses full of computer programmers that are learning how to handle data, use Hadoop and R, program in Python and plug their data into Artificial Neural Networks.

It seems that we’re creating a generation of Computer Programmers that, with the addition of a few extra tools on their CV, claim to be expert Data Scientists.

I think we’re in dangerous territory here.

It’s easy to learn how to use a few tools, but much much harder to use those tools intelligently to extract valuable, actionable information in a specialised field.

If you have little/no medical knowledge, how do you know which data outcomes are valuable?
If you’re not an expert in business, then how do you know which insights should be acted upon to make sound business decisions, and which should be ignored?

Plug-And-Play Data Analysis

This, to me, is the crux of the problem. Many of the current crop of Data Scientists – talented computer programmers though they may be – see Data Science as an exercise in plug-and-play.

Plug your dataset into tool A and you get some descriptions of your data. Plug it into tool B and you get a visualisation. Want predictions? Great – just use tool C.

Statistics, though, seems to be lagging behind in the Data Science revolution. There aren’t nearly as many automated statistical tools as there are visualisation tools or predictive tools, so the Data Scientists have to actually do the statistics themselves.

And statistics is hard.
So they ask if it’s really, really necessary.
I mean, we’ve already got the answer, so why do we need to waste our time with stats?


So statistics gets relegated to such an extent that Data Scientists declare it dead.”

The original article and discussion –>here

About the Author

Lee Baker is an award-winning software creator with a passion for turning data into a story.
A proud Yorkshireman, he now lives by the sparkling shores of the East Coast of Scotland. Physicist, statistician and programmer, child of the flower-power psychedelic ‘60s, it’s amazing he turned out so normal!
Turning his back on a promising academic career to do something more satisfying, as the CEO and co-founder of Chi-Squared Innovations he now works double the hours for half the pay and 10 times the stress – but 100 times the fun!

This post is taken from datascience.central and has been published previously in Innovation Enterprise and LinkedIn Pulse