Over 11,000 students from 16 European countries signed up for the fourth edition of the European Statistics Competition (ESC). The ESC is a competition organized by Eurostat and volunteering National Statistical Institutes addressed to secondary education students with the purpose of encouraging students to get familiar with statistics and official statistical sources.
This year 61 teams participated in the European finals, where they prepared a video on the topic »Information and disinformation: Official statistics in a world overflowing with data«. Teams were very creative and demonstrated the importance of official statistical data in everyday life in various ways.
A jury of European experts selected the top five videos in both the age group 14–16 (category B) and the age group 16–18 (category A). The winners of each category will receive their prize at a virtual award ceremony that will take place on 16 June 2021.
The end for the traditional statistical yearbooks – be they printed or as ebooks – is approaching gradually.
The German yearbook has recently been hit. The last edition had its farewell at a press conference on 30 October 2019: “Digitisation is shaping the statistics of the 21st century. The expansion of our digital communication is necessary if we want to remain the leading provider of statistical information about Germany. We say goodbye to the Statistical Yearbook, which stood for our activities for almost seven decades. The yearbook goes, but the data remains. They are already available via our online services in greater abundance than ever before. …. One thing is clear: Rigid reference books are hardly in demand today. The trend is towards up-to-date, digitally available information. The information is researched online.” Source: Press conference destatis, 30 October 2019. Original in German.
The rationale for abolishing printed yearbooks is always the same: digit(al)ization continues, users have new needs and go online.
The three D’s.
What is meant by digit(al)isation?
‘Digitization essentially refers to taking analog information and encoding it into zeroes and ones so that computers can store, process, and transmit such information. … We refer to digitalization as the way in which many domains of social life are restructured around digital communication and media infrastructures.’ Instead, … digital transformation … refers to the customer-driven strategic business transformation that requires cross-cutting organizational change as well as the implementation of digital technologies.
In the final analysis, therefore, we digitize information, we digitalize processes and roles that make up the operations of a business, and we digitally transform the business and its strategy. Each one is necessary but not sufficient for the next, and most importantly, digitization and digitalization are essentially about technology, but digital transformation is not. Digital transformation is about the customer.
There is no doubt that the Internet is the first source of information: the first step leads not to the bookshelf but to a digital device, be it smartphone, tablet or PC. Digitization has taken place; everything is available in binary form. So has digitalization, in the form of digital types of information and communication: there are comprehensive websites of statistical institutions, some with more, some with less sensitive user guidance. And there are many interactive databases of these institutions, too. When users get to these sources, some work awaits them: they have to find their way around and search databases before a table, a file or a simple website appears on the (often too small) screen.
After the end of the German Statistical Yearbook, there is a comprehensive alternative offer for the content: more tables, graphs and methodological explanations can be found on the web, though with a little more effort and not concentrated in one place. It’s like leaving a small, manageable town and having to find your way around a big city. And the alternative is no longer a physically tangible object, guaranteed to be accessible over a long period, and no longer – as a book can be – a visible showpiece and image carrier of the institution.
Yearbooks with stories to tell
A specialty of traditional yearbooks is their texts. They offer a certain kind of storytelling. This is quite demanding because it is more than just boring retelling of table contents and it must not get involved in controversial or even politically colored explanations. Describing the context in the various thematic areas and pointing out remarkable developments make them stand out. They help to get a quick first orientation in the extensive data.
Here are a few examples of such storytelling yearbooks and how they – whether discontinued or not – have responded to the trend to digit(al)ization.
Canada till 2012
The Canadian Statistical Yearbook was an early standard for yearbooks that wanted to present a country and its international position in an attractive and widely understandable form. ‘Presented in almanac style, the 2012 Canada Year Book contains more than 500 pages of tables, charts and succinct analytical articles on every major area of Statistics Canada’s expertise. The Canada Year Book is the premier reference on the social and economic life of Canada and its citizens. This publication has been discontinued as of April 2013. The last issue of this publication was November 2012. The Canada Year Book 2006 to 2012 is available online in html and pdf formats.’
After some changes, it was discontinued in 2013. There is no digital alternative, unless – similar to the German solution – there is a thematically ordered overview of data, analyses, and references.
The Dutch statistical yearbook was converted early from a printed to a PDF version. It was abolished under the title Yearbook, but then continued in 2014 as Trends in the Netherlands.
The yearbook went, ‘Trends in the Netherlands’ came – even with more storytelling than before. To be found on the homepage.
The Swiss Statistical Yearbook is one of the last international editions still to be printed. And it is a comprehensive, multimedia, thematically organized reference work: infographics, extensive texts, tables, references in two languages and abstracts in two other languages make it widely accessible.
Digitalization has not passed this yearbook by either. Older editions can be consulted on the Office’s website and the text of the current yearbook is included as an introductory panorama in each of the thematic pages on the web.
The Panorama: An excerpt from the current printed yearbook, format pdf:
Most existing yearbooks entered the era of digit(al)ization entirely through file lists and interactive databases or through PDF versions. Eurostat has been going the other way for several years. The idea of a storytelling book functioning as a unit has been implemented digitally from the very beginning. And last but not least, with an educational intention that promotes statistical literacy. This edition is therefore also called Statistics Explained.
In each topic, this website finally leads to the all-embracing world of digital data and databases.
What else …. ?
Statistical yearbooks encounter digit(al)ization in very different ways: they disappear into (interactive) databases on the web, survive as PDF editions (more or less well integrated into websites) or celebrate a kind of resurrection in web-based book-like products. The strengths of yearbooks (especially those based on storytelling) are thus more or less lost: For example, a professionally curated, guaranteed reliable, easily usable and explained introduction to the essential data topics in one place and guaranteed to be available for many years to come. And an ever more extensive and better presented world of data on the Internet has emerged. An accessible wealth of information, of which one could hardly dream a few years ago.
But no matter how developed this data offer may be, it still lacks some simplicity and quick access to the right data. Anyone who has ever searched for data on different topics and over different periods knows how frustrating this can be. Which in the mass of partly similar files is the right one? How can various topics be combined in databases? How can the different time series be combined? Is it the right data for the question asked, can I use it without risk? Perhaps there are nuances in the method or definition of the data and they should not be compared with other data?
But often users don’t even come to the official sources, because the most common change in user behavior is googling. And the result may be a single figure or a large number of links to very different sources.
… digital transformation.
Statistical institutions are making great efforts in the field of digital innovation, as shown not least by the sometimes very attractive offerings. Many are working on so-called experimental statistics: Coding data faster and better with the help of artificial intelligence, creating and extracting indicators from big data and much more. All this should make the production of statistical data more efficient, less dependent on human intervention (and human error) and faster. In the field of data dissemination, such experiments are still lacking, at least to this day.
Are all these innovations the often mentioned digital transformation? At best, they are elements of it.
What digital transformation can users dream of? Perhaps that statistical information is produced in a rapid and uninterrupted process (like in a pipeline) and is provided with semantic information in such a way that a simple search over topics and periods delivers an unambiguous result and refers to important context information. That even in a digital transformation human intervention will still be necessary (at the latest in presentation, explanation, and support), is not a paradox: Perhaps the overall package of digital transformation also includes non-digital elements, dedicated print products that skilfully lead into the digital world.
Large amounts of data hide information that can hardly be recognized with simple means. Special methods of data analysis are in demand, and visualization techniques in particular help to get an overview of the information gained and to pass it on in an understandable way.
The media recognised the potential of statistical and other data years ago; this has led to what is practised as data journalism in various large newspapers and also in collaborations between newspapers.
A pioneer is The Guardian, whose datablog celebrated its 10th anniversary in March 2019:
But hardly anyone is ever the first. Especially when it comes to the visualization of data, there are examples that date back centuries. But a new era has dawned with the use of computers in data analysis to generate interesting journalistic stories. Of central importance here is the person of Philip Meyer, who began to use computer-assisted reporting as a journalist in the 1960s.
In his book ‘Precision Journalism: A Reporter’s Introduction to Social Science Methods’, first published in 1973, Meyer describes demands on journalism that are still valid today and that now shape data journalism.
‘There was a time when all you [as a journalist] needed was dedication to truth, plenty of energy, and some talent for writing. You still need those things, but they are no longer sufficient. The world has become so complicated, the growth of available information so explosive, that the journalist needs to be a filter, as well as a transmitter; an organizer and interpreter, as well as one who gathers and delivers facts. In addition to knowing how to get information into print, online, or on the air, he or she also must know how to get it into the receiver’s head. In short, a journalist has to be a database manager, a data processor, and a data analyst. ….. In the information society, the needs are more complex. Read any of the popular journals of media criticism and you will find the same complaints about modern journalism. It misses important stories, is too dependent on press releases, is easily manipulated by politicians and special interests, and does not communicate what it does know in an effective manner. All of these complaints are justified. Their cause is not so much a lack of energy, talent, or dedication to truth, as the critics sometimes imply, but a simple lag in the application of information science – a body of knowledge – to the daunting problems of reporting the news in a time of information overload. …. Today’s journalist must also be familiar with the growing journalistic body of knowledge, which, therefore, must include these elements:
1 How to find information.
2 How to evaluate and analyze it.
3 How to communicate it in a way that will pierce the babble of information overload and reach the people who need and want it.
4 How to determine, and then obtain, the amount of precision needed for a particular story.’
(Meyer, p. 1-2)
‘Data is not just about numbers’
Today’s data journalism is closely linked to the philosophy of open data. Data should be available in easily usable formats and be evaluable for everyone. But the claim of current data journalism – as represented by the Guardian authors – still follows the essential ideas of Philip Meyer.
‘We keep some of Meyer’s approach alive in how we do data journalism and we work alongside reporters to get the most out of the combination of data and specialist knowledge. Data is not just about numbers, and behind every row in a database there is a human story. They’re the stories we’re striving to tell.’ The Guardian, Sat 23 Mar 2019
Since then, data-based journalism has set a trend. Many others publish data using graphics and are always looking for new ways to communicate the analysed data in an understandable way. One of many examples is the New York Times, which celebrated the 5th anniversary of The Upshot in 2019:
‘Five years ago today, The New York Times introduced The Upshot with the aim of examining politics, policy and everyday life in new ways. We wanted to experiment with formats, using whatever mix of text, data visualizations, images and interactive features seemed best for the subject at hand.’
In the meantime there are networks that share their knowledge and offer help for data journalism, or data-driven journalism (DDJ). One of them (mostly in German) is datenjournalismus.net.
Among the thousands of data-based stories and their visualizations there are highlights again and again. I don’t want to withhold my recent favourite. It is the analysis and visualization of the internal migration after the German reunification. Die Zeit presented this with a lot of effort and fascinating results in May 2019.
Hans Rosling (27 July 1948 – 7 February 2017) died two years ago, too young. He was a pioneer and a great inspiration for what public statistics always strives for: more visibility, more understanding and more resonance.
An encounter with Hans Rosling was demanding and enriching. His demand on public statistics was urgent and a prerequisite for his enlightening work: that statistical data should be open to all. Here he saw successes. It was and is enriching how he conveyed these data combined with a message. With innovative, precise, entertaining and always very personal presentations, he clarified what had happened and what developments could be desired. He was a realist regarding his effectiveness and yet always an optimist ….. better: a “possibilist”. What remains for me is how he taught us to see with numbers – a constant challenge for public statistics.
Gapminder (“a fact tank, not a think tank”), with its innovative tools and commitment, continues to live with Anna Rosling Rönnlund and Ola Rosling.
And recently Factfulness, a book by the three (Hans Rosling, Anna Rosling Rönnlund, Ola Rosling), has been published with the subtitle “Ten Reasons We’re Wrong About The World – And Why Things Are Better Than You Think”.
My colleagues published the Slovene multi-player statistical quiz app on Tuesday 22nd. I love it!
We’ve all heard “statistics is boring”, but once you add lovely design, humorous content and a strategic game to it, it can be fun. Within two days of the announcement, more than 1,200 players have already played about 24,000 games. And we know a lot more about Slovenia than we knew three days ago 😉
How the game works: first you select a player name, then a favourite character (a hero from Slovenia), spin the wheel of fortune (automatic selection of the region you play for) and then look for an opponent (a region or a player). Each game has 7 questions, the last one always being a number range slider (statistical data). The winner gets some resources, and so each region gradually evolves from prehistory to the future. The competition among the 12 regions lasts for 7 days, then the game resets to the starting point (keeping the overall scoreboard of players).
In the current version we have about 2,000 questions, 500 of which are statistical (others include interesting info about local peculiarities, history, literature, language, geography …).
At the moment there is only a Slovene version, but the app is ready to be translated or adjusted for another country (adjusting it would also require some investment in graphic design). I sincerely hope we’ll make an English version of the Slovene game someday, for our foreign visitors or fans of Slovenia 😉
In a recently published Eurostat publication, the authors demand innovative forms of communication from public statistics so that it does not lose its socially important role. Among other things, they demand ‘…. to tell stories close to the people; to create communities around specific themes; to develop among citizens the ability to read the data and understand what is behind the statistical process.’
The UNECE hackathon that has just been completed responds to this challenge.
‘A hackathon is an intensive problem-solving event. In this case, the focus is on statistical content and effective communication. The teams will be challenged to “Create a user-oriented product that tells a story about the younger population”. During the Hackathon, fifteen teams from nine countries had 64.5 hours to create a product that tells a story about the younger population. The teams were multidisciplinary – with members from statistical offices and other government departments. The product created should be innovative, engaging, and targeted towards the general public (that is, not specialists). There was no limit on the form of the product, but the teams had to include a mandatory SDG indicator in the product. The mandatory indicator was “Proportion of youth (aged 15-24 years) not in education, employment or training” SDG indicator (Indicator 8.6.1).’ (Source)
And the hackathon shows impressive results, even if only a few organisations have participated.
…starts with a journalistic approach. Here too, the interactivity can be controlled by the user at the desired speed.
At the end, the authors also seek direct contact with the users; a quiz personalizes the statistical data and gives an individual assessment of where the users stand personally with regard to these statistics.
The two applications mentioned above combine decisive user-friendly features:
– visually attractive,
– easy-to-understand navigation that users can control according to their needs,
– the journalistic approach,
– concise and instructive explanations,
– hints on the methodological background.
Many of the other applications show frequently encountered weaknesses: they try to provide too much information and lack the courage to leave something out and concentrate on the most important elements. This leads to long texts and complex navigation, with the effect that users quit quickly.
In a recent paper, Emanuele Baldacci (Director, Eurostat) and Felicia Pelagalli (President, InnovaFiducia) deal with the ‘challenges for official statistics of changes in the information market spurred by network technology, data revolution and changes in information consumers’ behaviours’ (p.3).
The status-quo or bad scenario:
‘Information will continue to be consumed via multiple decentralized channels, with new information intermediaries emerging through social platforms, digital opinion leaders, technologies that reinforce belonging to peers with similar profiles and backgrounds, including in terms of beliefs.’ … ‘Under this scenario it is likely that increased competition from alternative data providers will put pressure on the official statistics position in the information ecosystem and lead to drastic reduction of public resources invested in official statistics, as a result of the perceived lack of relevance.’ (p.8)
The ugly scenario:
‘Big oligopoly giants will emerge by integrating technologies, data and content and providing these to a variety of smaller scale platforms and information intermediaries, with limited pricing power for further dissemination. In this scenario, data generated by sensors and machines connected to the network will increasingly create smart information for individuals. However, individuals will not participate in the data processing task, but will be mostly confined to crowdsourcing data for digital platforms and using information services.’
‘In this scenario, official statistics will be further marginalized and its very existence could be put in jeopardy. More importantly, no public authority with significant influence could be in charge of assessing the quality of data used in the information markets. Statistics as a public good may be curtailed and limited to a narrow set of dimensions. … Official statisticians will appear as old dinosaurs on the way to extinction, separated from the data ecosystem by a huge technology and capability gap.’ (p.9)
The good scenario:
The authors do not stop here. They also see a good scenario, but a scenario that implies a huge engagement.
This scenario is ‘predicated on two major assumptions. First, the information market will be increasingly competitive by sound regulations that prevent the emergence of dominant positions in countries and even more important across them. Second, official statistics pursue a strong modernization to evolve towards the production of smart statistics, which fully leverage technology and new data sources while maintaining and enhancing the quality of the data provided to the public.
In this scenario, official statistics will generate new more sophisticated data analytics that cater to different users by tailored information services. It uses network technologies (e.g., blockchain, networks) to involve individuals, companies and institutions in the design, collection, processing and dissemination of statistics. It engages users with open collaborative tools and invests heavily in data literacy to ensure their usability. It strengthens skills and capacity on statistical communication to help users understand in transparent manners what are the strengths and limitations of official statistics.’ (p. 9/10)
Actions needed to face the challenges ahead
The good scenario already depicts some needed actions to be taken by official statisticians. The authors conclude with proposals that are not really new, ideas that have been on the table for some time but are not so easy to implement.
‘It is important to change mindsets and practices which have been established, in order to put in contact the citizens with official statistics, to make data accessible, to expand the understanding of their analysis, to support individuals, business and institutions in the decision-making process.
The key issue is how to be authoritative and to develop quality knowledge in the new and changing information market. It is important to know the rules and languages of the media platforms used for communication; to overcome the technicalities; to tell stories close to the people; to create communities around specific themes; to develop among citizens the ability to read the data and understand what is behind the statistical process. In summary, put people at the center (overused phrase, but extremely valuable):
⎯ communicate statistics through engaging experiences and relevant to the people who benefit from them;
⎯ customize the content;
⎯ adopt “user analytics” to acquire the knowledge of the “users” through the analysis of data (web and social analytics) and the understanding of people’s interaction with the different platforms.’ (p.11)
And the concluding words call for external assistance:
‘It will be essential for statisticians to build more tailored data insight services and team up with communication experts to play a more proactive role in contrasting fake news, checking facts appropriately and building users’ capacity to harness the power of data.’ (p.12)
CBS Corporate nieuws summarizes the important trends discussed:
1) New data sources and the consequences
2) The importance of a proactive communication
3) Big Data and algorithms in official statistics
CBS, 06-06-2017 Miriam van der Sangen
Why take this information from CBS (the Dutch statistical office) in particular? Because CBS Corporate nieuws is an excellent example of the second trend: proactive communication, proactivity in delivering (statistical) information to users. The website makes corporate information public and gives insights into activities of CBS and statistics. You see topics …
… and the people behind it.
The target audience of this corporate website is enterprises, administrations, journalists, students and whoever else may be interested.
Providing information in this way is an excellent strategy for fostering insight and countering forgetting. And it’s an old tradition in didactics. 360 years ago Amos Comenius emphasized this technique in his Didactica Magna:
Official statistics agencies have long collected, analyzed and disseminated statistical information, and they too are confronted with wrong citations, misuse of statistics and lies. Many of the codes of ethics of official statistics recommend acting against such false information.
‘In 1992, the United Nations Economic Commission for Europe (UNECE) adopted the fundamental principles of official statistics in the UNECE region. The United Nations Statistical Commission adopted these principles in 1994 at the global level. The Economic and Social Council (ECOSOC) endorsed the Fundamental Principles of Official Statistics in 2013; and in January 2014, they were adopted by General Assembly. This recognition at the highest political level underlines that official statistics – reliable and objective information – is crucial for decision making.’
Two paragraphs are of special interest:
‘ 2. Professional standards and ethics To retain trust in official statistics, the statistical agencies need to decide according to strictly professional considerations, including scientific principles and professional ethics, on the methods and procedures for the collection, processing, storage and presentation of statistical data.’
‘4. Prevention of misuse The statistical agencies are entitled to comment on erroneous interpretation and misuse of statistics.’
1.7: The National Statistical Institute and Eurostat and, where appropriate, other statistical authorities, comment publicly on statistical issues, including criticisms and misuses of statistics as far as considered suitable.
N.B.: Wikipedia’s page on Misuse of statistics presents a broad view of how readers can be fooled by many types of misuse.
It’s dissemination – …
False – and especially deliberately false – information as a weapon for manipulating decisions isn’t new either. What is new is how such information spreads: with the help of social media, dissemination reaches a new level (some compare it to Gutenberg’s printing press in its day).
‘I was a victim of a Russian smear campaign. I understand the power of fake news.
It was a peculiar experience, but I learned a lot. As I watched the story move around the Web, I saw how the worlds of fake websites and fake news exist to reinforce one another and give falsehood credence. Many of the websites quoted not the original, dodgy source, but one another. There were more phony sites than I’d realized, though I also learned that many of their “followers” (maybe even most of them) are bots — bits of computer code that can be programmed to imitate human social media accounts and told to pass on particular stories.
But it is also true that we are living through a global media revolution, that people are hearing and digesting political information in brand-new ways and that nobody yet understands the consequences. Fake stories are easier to create, fake websites can be designed to host them, and social media rapidly disseminates disinformation that people trust because they get it from friends. This radical revolution has happened without many politicians noticing or caring — unless, like me, they happened to have seen how the new system of information exchange works.’
May 2017 become the year of people who know about the power and the dangers of misleading information!
My best wishes to the colleagues in official statistics and to their professional production and dissemination of information …. and perhaps statistical dissemination will need to be more active on social media, too.