The Good, the Bad and the Ugly

Communication of statistics in times of fake news

In a recent paper, Emanuele Baldacci (Director, Eurostat) and Felicia Pelagalli (President, InnovaFiducia) deal with the ‘challenges for official statistics of changes in the information market spurred by network technology, data revolution and changes in information consumers’ behaviours’ (p.3).

Three scenarios

The status-quo or bad scenario:

‘Information will continue to be consumed via multiple decentralized channels, with new information intermediaries emerging through social platforms, digital opinion leaders, technologies that reinforce belonging to peers with similar profiles and backgrounds, including in terms of beliefs.’  … ‘Under this scenario it is likely that increased competition from alternative data providers will put pressure on the official statistics position in the information ecosystem and lead to drastic reduction of public resources invested in official statistics, as a result of the perceived lack of relevance.’ (p.8)

 

The ugly scenario:

‘Big oligopoly giants will emerge by integrating technologies, data and content and providing these to a variety of smaller scale platforms and information intermediaries, with limited pricing power for further dissemination. In this scenario, data generated by sensors and machines connected to the network will increasingly create smart information for individuals. However, individuals will not participate in the data processing task, but will be mostly confined to crowdsourcing data for digital platforms and using information services.’
‘In this scenario, official statistics will be further marginalized and its very existence could be put in jeopardy. More importantly, no public authority with significant influence could be in charge of assessing the quality of data used in the information markets. Statistics as a public good may be curtailed and limited to a narrow set of dimensions. …  Official statisticians will appear as old dinosaurs on the way to extinction, separated from the data ecosystem by a huge technology and capability gap.’ (p.9)

 

The good scenario:

The authors do not stop here. They also see a good scenario, but one that demands substantial engagement.

This scenario is ‘predicated on two major assumptions.
First, the information market will be increasingly competitive by sound regulations that prevent the emergence of dominant positions in countries and even more important across them.
Second, official statistics pursue a strong modernization to evolve towards the production of smart statistics, which fully leverage technology and new data sources while maintaining and enhancing the quality of the data provided to the public.
In this scenario, official statistics will generate new more sophisticated data analytics that cater to different users by tailored information services. It uses network technologies (e.g., blockchain, networks) to involve individuals, companies and institutions in the design, collection, processing and dissemination of statistics. It engages users with open collaborative tools and invests heavily in data literacy to ensure their usability. It strengthens skills and capacity on statistical communication to help users understand in transparent manners what are the strengths and limitations of official statistics.’ (p. 9/10)

 

Actions needed to face the challenges ahead

The good scenario already outlines some of the actions official statisticians need to take. The authors conclude with proposals that are not really new: ideas that have been on the table for some time but are not easy to implement.

‘It is important to change mindsets and practices which have been established, in order to put in contact the citizens with official statistics, to make data accessible, to expand the understanding of their analysis, to support individuals, business and institutions in the decision-making process.

The key issue is how to be authoritative and to develop quality knowledge in the new and changing information market. It is important to know the rules and languages of the media platforms used for communication; to overcome the technicalities; to tell stories close to the people; to create communities around specific themes; to develop among citizens the ability to read the data and
understand what is behind the statistical process. In summary, put people at the center (overused phrase, but extremely valuable):
⎯ communicate statistics through engaging experiences and relevant to the people who benefit from them;
⎯ customize the content;
⎯ adopt “user analytics” to acquire the knowledge of the “users” through the analysis of data (web and social analytics) and the understanding of people’s interaction with the different platforms.’ (p.11)

And the concluding words call for external assistance:

‘It will be essential for statisticians to build more tailored data insight services and team up with communication experts to play a more proactive role in contrasting fake news, checking facts appropriately and building users’ capacity to harness the power of data.’ (p.12)


There is no New Thing under the Sun – Yes and No

Twitter reminded me that there’s #NTTS2017 going on, Eurostat’s biennial scientific conference on New Techniques and Technologies for Statistics (NTTS).

The opening session also focused on official statistics and its current and future role in a world of data deluge and alt-facts. What will official statistics be in 30 years?
In Diego Kuonen’s presentation and discussion on ‘Big Data, Data Science, Machine Intelligence and Learning’ I could hear an answer to this question reminding me of a text in the Bible: “… that [thing] which is done is that which shall be done: and there is no new thing under the sun”.
And this is to be understood not statically but dynamically:
The work statistical institutions are doing today will be the same that they do tomorrow … BUT adapted to the changing context.
The algorithms (understood in a broad sense as ‘a set of rules that precisely defines a sequence of operations’) used in collecting, analyzing and disseminating data will change, and manual work will, and must, be replaced by automation and robots. But the core role of being a trusted source of data-based and (in all operations) transparently produced information serving professional decision making will remain.
The challenge will be that these institutions
– are known,
– are noted for their veracity,
– are consulted
and with all this can play their role.
In this fight to be heard, humans will always play a decisive part.
That’s a clear message (as I understood it) of a data scientist looking ahead.
PS. A step towards automation consists of preparing and using linked data. See the NTTS 2017 satellite session “Hands-on workshop on Linked Open Statistical Data (LOD)”

And now: Semantic Statistics (SemStats)

Official Statistics has a long tradition in creating and providing high-quality metadata. And the Semantic Web needs just this: metadata!

So it’s not surprising that these two are finding each other, more and more.
A special workshop will be organized during the 12th International Semantic Web Conference (ISWC), 21-25 October 2013, Sydney, Australia.

It is the 1st International Workshop on Semantic Statistics (SemStats 2013), organized by Raphaël Troncy (EURECOM), Franck Cotton (INSEE), Richard Cyganiak (DERI), Armin Haller (CSIRO) and Alistair Hamilton (ABS).

‘ISWC 2013 is the premier international forum for the Semantic Web / Linked Data Community. Here, scientists, industry specialists, and practitioners meet to discuss the future of practical, scalable, user-friendly, and game-changing solutions.’

The workshop summary

How to publish linked statistics? And: How to use linked data for statistics? These are the key questions of this workshop.

‘The goal of this workshop is to explore and strengthen the relationship between the Semantic Web and statistical communities, to provide better access to the data held by statistical offices. It will focus on ways in which statisticians can use Semantic Web technologies and standards in order to formalize, publish, document and link their data and metadata.

The statistics community sometimes faces challenges when trying to adopt Semantic Web technologies, in particular:

  • difficulty to create and publish linked data: this can be alleviated by providing methods, tools, lessons learned and best practices, by publicizing successful examples and by providing support.
  • difficulty to see the purpose of publishing linked data: we must develop end-user tools leveraging statistical linked data, provide convincing examples of real use in applications or mashups, so that the end-user value of statistical linked data and metadata appears more clearly.
  • difficulty to use external linked data in their daily activity: it is important to develop statistical methods and tools especially tailored for linked data, so that statisticians can get accustomed to using them and get convinced of their specific utility.’
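The first difficulty above — actually creating and publishing linked data — is less daunting than it sounds. A minimal sketch, using only the Python standard library: one statistical observation rendered as Turtle triples in the style of the W3C RDF Data Cube vocabulary (qb:). All URIs, the dataset name and the helper function are illustrative placeholders, not a real publication pipeline (which would use a proper RDF library).

```python
# Minimal sketch: one statistical observation serialized as RDF triples
# in Turtle syntax, following the W3C RDF Data Cube vocabulary (qb:).
# All URIs below are illustrative placeholders.

PREFIXES = """@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix ex:   <http://example.org/stats/> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
"""

def observation_to_turtle(obs_id, dataset, dimensions, value):
    """Render one qb:Observation as a Turtle snippet."""
    lines = [f"ex:{obs_id} a qb:Observation ;",
             f"    qb:dataSet ex:{dataset} ;"]
    for dim, val in dimensions.items():
        lines.append(f"    ex:{dim} ex:{val} ;")
    lines.append(f"    sdmx-measure:obsValue {value} .")
    return "\n".join(lines)

turtle = PREFIXES + observation_to_turtle(
    "obs1", "unemployment",
    {"refArea": "CH", "refPeriod": "Y2012"}, 4.2)
print(turtle)
```

The point of the sketch is that the statistical content (dataset, dimensions, measure) maps almost one-to-one onto the cube vocabulary; the hard work in practice lies in stable URIs and well-modelled code lists, not in the serialization itself.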

A tradition

RDF, triples, Linked Data … these are topics statisticians have already explored and adopted, though rather on individual initiative than as organizations.

This blog has a lot of information about Semantic Web and Official Statistics, about 40 posts since 2007.

See this post (2012) with a recent paper from Statistics Switzerland (where a study on publishing linked data has just been finished in collaboration with the Bern University of Applied Sciences): https://blogstats.wordpress.com/2012/10/15/imaodbc-2012-and-the-winner-is/

Or this (2009) about SDMX and RDF https://blogstats.wordpress.com/2009/10/27/sdmx-and-rdf-getting-acquainted/ or about LOD activities in 2009: https://blogstats.wordpress.com/2009/04/25/semantic-web-and-official-statistics/

Big Data, Open Data and Official Statistics

There are (at least) two big challenges official statistics will face in the next few years, challenges that may well change its quasi-monopolistic position.

.

On the input side it’s Big Data

‘“Big Data” is a term used to describe massive information stores – generally measured in petabytes and exabytes – and also refers to the methods and technologies used to analyze these large data volumes. The core principles of Big Data (data mining, analytics) have been around for some time, but recent technology has enabled the collection and analysis of previously unimaginable data volumes at extremely high speeds.’ So says SAP, for example, and gives some examples of how Big Data will change your life (big words, and they show how the big software and hardware players are beginning to occupy the field).

Official statistics has already put this on the agenda! So has the United Nations Statistics Division’s (UNSD) Friday Seminar on Emerging Issues, 22 February 2013.

Some papers from this Seminar:

Gosse van der Veen (Statistics Netherlands), High Level Group for the Modernization of Statistical Products and Services: Big Data: Big Opportunity!

2013-04-15_vanderveen-statcom2013

The High-Level Group for the Modernisation of Statistical Production and Services (HLG) established an informal task team of national and international experts, coordinated by the UNECE Secretariat. The group’s paper gives an excellent overview of the topic: What Does “Big Data” Mean for Official Statistics?

2013-04-15_HLG-BIGData-Paper

.

Andrew Wyckoff, Big Data for Policy, Development and Official Statistics. Directorate for Science, Technology & Industry, Organisation for Economic Co-operation and Development (OECD) (personal opinion).

2013-04-15_BigDataRoles-WykoffOECD

.

Aspects of Big Data and real-time analytics are provided in another paper by Global Pulse (an innovation initiative launched by the Executive Office of the United Nations Secretary-General): Big Data for Development: Opportunities & Challenges

2013-04-15_globalpulse

.

The discussion is launched, and as the HLG paper mentions: ‘To use Big data, statisticians are needed with a different mind-set and new skills. The processing of more and more data for official statistics requires statistically aware people with an analytical mind-set, an affinity for IT (e.g. programming skills) and a determination to extract valuable ‘knowledge’ from data. These so-called “data scientists” can be derived from various scientific disciplines.’

.

On the output side it’s (Linked) Open Data in combination with APIs

Open Data is not at all a new topic for official statistics. National Statistical Institutes were forerunners in openly providing data; organizations like the UN or Eurostat went this way as well.

Several Open Data initiatives (USA, UK, France, EU …) consist mostly of data catalogues and are, in that sense, also public-relations initiatives. A large part of the data provided this way consists of statistical data that are often already available on the website of the National Statistical Institute concerned. The EU portal, for instance, offers 5716 datasets of statistical data out of a total of 5893 (as of April 2013).

Further central questions are the licensing of data (e.g. CC BY) as well as their availability in machine-readable formats.

Machine-readable statistical data, Application Programming Interfaces (APIs) to the data and especially Linked Open Data (LOD) open the way to creative applications and new models of presenting information.
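What makes an API-first dissemination model so attractive is how little code a reuser needs. A minimal sketch, using only the Python standard library: the response below is a hypothetical, heavily simplified stand-in modelled loosely on SDMX-JSON; real statistical APIs (Eurostat, OECD, …) use richer structures, and the dataset name and figures are invented for illustration.

```python
import json

# Hypothetical, simplified API response (loosely SDMX-JSON-shaped).
# Dataset name and values are invented for illustration only.
RESPONSE = json.dumps({
    "dataset": "gdp_growth",
    "dimensions": {"geo": ["CH", "DE"], "year": ["2011", "2012"]},
    "observations": {"CH:2011": 1.8, "CH:2012": 1.0,
                     "DE:2011": 3.7, "DE:2012": 0.4},
})

def series_for(raw, geo):
    """Extract the time series for one country from the raw JSON."""
    data = json.loads(raw)
    return {key.split(":")[1]: value
            for key, value in data["observations"].items()
            if key.startswith(geo + ":")}

print(series_for(RESPONSE, "CH"))  # {'2011': 1.8, '2012': 1.0}
```

A dozen lines turn a published response into a plottable series — which is exactly why machine-readable formats plus APIs, rather than static tables, invite third-party reuse.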


A Europe-wide Linked Open Data project (LOD2) ‘was launched in September 2010 and will run for four years. It addresses exploitation of the web as a platform for data and information integration, and the use of semantic technologies to make government data more useable.’

Looking for third-party apps

Data providers are looking at applications and mashups made with their data with much interest, and they are even sponsoring competitions and hack days (like Apps4EU) to stimulate the reuse of open data, especially from the public sector.

The most popular app creator and statistical storyteller is Hans Rosling with Gapminder. Rosling himself is a pioneer in the fight for open data.

http://www.youtube.com/watch?feature=player_embedded&v=jbkSRLYSojo

Changing paradigms

Open Data, Linked Open Data and APIs are changing the dissemination paradigm of statistical agencies. More people with new skills will do new things. Coding is becoming the new literacy, says for example Garrett Heath in his advice for his unborn daughter: ‘I was blown away that the buzz is not around mobile apps, but rather around using APIs. Ten years ago saw the creation of the social networking platforms. The past five years has been about accumulating the data. The next five years and beyond will be about interpreting that data. [My daughter will have access to] a boatload of interesting data sitting in accessible databases that is waiting to be exposed and interpreted with her [the programmer’s] creativity.’

Storytelling with data

Storytelling based on data is less and less the exclusive domain of statistical agencies: storytelling can draw on multiple (new) resources and take on new forms. To satisfy the basic idea of an easily understandable and appealing presentation of statistical content, statistical institutions cannot avoid taking measures to improve their content and its presentation. The “composer” must know how the music is to be played: that is, as a quick, competent, qualitatively unique, reliable and indispensable data source.
But this presentation job can no longer be done alone: cooperative partnerships are necessary and have already begun to some extent, both with partners outside statistical institutions and between such institutions. This discussion has been launched.

Statistical Storytelling revisited! More in a paper from IMAODBC Vilnius 2010:

2013-04-20_storytellingrevisited2010.

And this: Many small open data give big data insights

FORGET BIG DATA, SMALL DATA IS THE REAL REVOLUTION, says Rufus Pollock, co-Director of the Open Knowledge Foundation: ‘… the discussions around big data miss a much bigger and more important picture: the real opportunity is not big data, but small data. Not centralized “big iron”, but decentralized data wrangling. Not “one ring to rule them all” but “small pieces loosely joined”.’


Official Statistics’ SWOT

In the official statistics industry (an industry!) reflection and collaboration are high priorities.

As an example: HLG-BAS.

What’s this? ‘The High-Level Group for Strategic Developments in Business Architecture in Statistics (HLG-BAS) was set up by the Bureau of the Conference of European Statisticians in 2010 to oversee and coordinate international work relating to the development of enterprise architectures within statistical organisations.’ More about HLG-BAS on UNECE statistics wikis.

And more about the Conference of European Statisticians CES:

Implement the HLG-BAS vision

HLG-BAS presents a very interesting paper for the 60th plenary session of the Conference of European Statisticians: the ‘Strategy to implement the vision of the High-level Group for Strategic Developments in Business Architecture in Statistics’.

This paper positions official statistics as part of the information industry:
‘The official statistics industry is part of a more extensive information industry. Within this wider information industry other players are claiming their place and statistical organisations cannot automatically assume that they will retain their current position and relevance.’ (point 5)

SWOT

And the paper summarizes in a short and impressive manner the Strengths, Weaknesses, Opportunities and Threats of Official Statistics. (point 9)

‘A SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis was undertaken by Capgemini Consulting working for Statistics Netherlands to define the current situation of the official statistics industry assessing it from an international perspective. This analysis was based on existing information on the industry (including the vision of the HLG-BAS) complemented by interviews with members of the HLG-BAS (internal stakeholders), commercial organisations and government bodies (external stakeholders).

The results of this exercise are:

1. Strengths

(a) High quality with relevant and very strong statistical products over long term;
(b) Strong “brand value” of official statistics locally and internationally;
(c) Ability and ‘stamina’ to produce statistics for long-term records and consistency;
(d) International collaboration has started mainly because it is becoming too expensive for each NSO to individually change their tailor-made production processes and products.

2. Weaknesses

(a) A limited outside and “client-centric” view;
(b) Communication of products and results is often not good enough;
(c) Workforce and processes should be more agile to follow rapidly the changing needs of society;
(d) NSOs are not efficient enough in their processes and rely too much on human effort;
(e) The statistical industry as a whole has no clear silhouette or definition; international coherence is low;
(f) NSOs should provide more information about statistics, regarding both quality and other metadata;
(g) Top-level commitment to bring about the changes needed to align the statistical industry with the changing environment is not broadly understood as the key factor in this change process.

3. Opportunities

(a) In some specific statistical domains, cross-border data become more important (globalisation, enterprise groups, climate change). The work and products of NSOs should be expanded to explain what is happening on a multinational level;
(b) The “open data” movement may increase the sources available for official statistics;
(c) NSOs could collaborate (more) with (commercial) external parties;
(d) The official statistics industry could play a more active role regarding new and alternative data sources and collection methods;
(e) NSOs could be quality institutes that certify statistical inputs/outputs of other (commercial) parties;
(f) In the statistical domain the NSOs can lead when it comes to defining and maintaining international standards;
(g) Standardisation of production process (plug and play technology) and products of NSOs to increase international comparison and quality control of products;
(h) Consolidation of NSOs roles as public supplier of trust and quality;
(i) International coherence and the willingness to form a more closely knit statistical community or industry are beginning to materialize;
(j) Specialisation of NSOs in certain products to increase efficiency in the production process of these products. This specialisation in products could vary across countries and sectors to optimize the possibilities of specialisation.

4. Threats

(a) Other organisations are starting to create output NSOs used to have a monopoly on;
(b) Reduced staff and budget cuts;
(c) Weak/fragile coordination of international collaboration activities;
(d) Society wants more timeliness in statistics, both in disseminating existing products and in developing new products;
(e) Some government clients do not distinguish between official and non-official data sources for ad hoc questions, as long as it meets their purpose;
(f) New technologies like open data can seduce NSOs into losing focus of their core business.’

Official Statistics: Identify Common Challenges

In his blog, Director Groves of the US Census Bureau reports on an important discussion among his colleagues (thanks, Xavier, for this hint):

‘Several weeks ago, at the initiative of Brian Pink, the Australian statistician, leaders of the government statistical agencies from Australia, Canada, New Zealand, United Kingdom, and the United States held a summit meeting to identify common challenges and share information about current initiatives. ..

… They perceive the same likely future challenges for central government statistical agencies, and they are making similar organizational changes to prepare for the future. …

Ingredients of the future vision:

  1. The volume of data generated outside the government statistical systems is increasing much faster than the volume of data collected by the statistical systems; almost all of these data are digitized in electronic files.
  2. As this occurs, the leaders expect that relative cost, timeliness, and effectiveness of traditional survey and census approaches of the agencies may become less attractive.
  3. Blending together multiple available data sources (administrative and other records) with traditional surveys and censuses (using paper, internet, telephone, face-to-face interviewing) to create high quality, timely statistics that tell a coherent story of economic, social and environmental progress must become a major focus of central government statistical agencies.
  4. This requires efficient record linkage capabilities, the building of master universe frames that act as core infrastructure to the blending of data sources, and the use of modern statistical modeling to combine data sources with highest accuracy.
  5. Agencies will need to develop the analytical and communication capabilities to distill insights from more integrated views of the world and impart a stronger systems view across government and private sector information.
  6. There are growing demands from researchers and policy-related organizations to analyze the micro-data collected by the agencies, to extract more information from the data.

… In short, the five countries are actively inventing a future unlike the past, requiring new ways of thinking and calling for new skills.  The payoff sought is timelier, more trustworthy, and lower cost statistical information measuring new components of the society, economy, and environment, telling a richer story of our countries’ progress. ‘
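Point 4 of the vision names efficient record linkage as core infrastructure for blending data sources. A minimal sketch of the idea, with invented data and a deliberately crude normalization rule: deterministic linkage of a survey file to an administrative register on a name/birth-year key. Real linkage at statistical agencies uses probabilistic matching (e.g. Fellegi-Sunter) and carefully maintained master frames, not this toy rule.

```python
# Minimal sketch of deterministic record linkage: enriching survey
# records with register data via a normalized matching key.
# Data and normalization rule are illustrative only.

def link_key(record):
    """Normalize a record into a crude matching key."""
    return (record["name"].strip().lower(), record["birth_year"])

def link(survey, register):
    """Return survey records enriched with register income, where matched."""
    index = {link_key(r): r for r in register}  # build lookup once
    linked = []
    for rec in survey:
        match = index.get(link_key(rec))
        if match:
            linked.append({**rec, "income": match["income"]})
    return linked

survey = [{"name": " Anna Muster", "birth_year": 1980, "employed": True}]
register = [{"name": "anna muster", "birth_year": 1980, "income": 52000}]
print(link(survey, register))
```

Even this toy shows why linkage is infrastructure rather than an afterthought: the quality of the key normalization, not the join itself, decides how many records blend correctly.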

Read the full blog post here: http://directorsblog.blogs.census.gov/2012/02/02/national-statistical-offices-independent-identical-simultaneous-actions-thousands-of-miles-apart/

Snapshots from the Census Years

Recently the ONS published some new interactive content aimed at bringing together visualisation, narrative, audio, data tables, images and animation in a single compact product. The Snapshots from the Census Years product is, in one sense, a logical continuation of the storytelling themes Armin talked about in his recent posting.

Snapshots from the Census Years
Snapshots from the Census Years - integrating content

As we move from print to web, it becomes clear that we need not think of visualisation, narrative and data as separate constructs. Modern web content should allow us to integrate these forms so they can be authored together as a powerful composite product: clicking on a story in the text highlights the appropriate part of the graph; enabling audio allows you to follow the text while looking at the graph. Each component allows and encourages further exploration of the others. Although only a small product aimed at promoting our 2011 Census, it really does suggest there is more scope for integrated outputs from official statistics producers.

Closely watched Office and much-debated GDP

Official statistics are becoming more and more closely watched by blogs. The Guardian DATA BLOG did it yesterday with the ONS’s GDP data delay.

This was the occasion to reopen the discussion about GDP and to show some alternatives, like ranking countries on measures such as wellbeing or happiness.

Different ways of ranking the G20 countries (see also this earlier post about A happy GDP)

A happy GDP

‘What we measure affects what we do; and if our measurements are flawed, decisions may be distorted.’

‘….. there often seems to be a marked distance between standard measures of important socio economic variables like economic growth, inflation, unemployment, etc. and widespread perceptions. The standard measures may suggest, for instance that there is less inflation or more growth than individuals perceive to be the case, and the gap is so large and so universal that it cannot be explained by reference to money illusion or to human psychology. In some countries, this gap has undermined confidence in official statistics …’

These are two citations taken from the so-called Stiglitz report, which deals with the question of how to get a better measurement of the progress of societies.

Stiglitz report

‘The Commission’s [the “Commission on the Measurement of Economic Performance and Social Progress”] aim has been to identify the limits of GDP as an indicator of economic performance and social progress, including the problems with its measurement; to consider what additional information might be required for the production of more relevant indicators of social progress; to assess the feasibility of alternative measurement tools, and to discuss how to present the statistical information in an appropriate way.’

********

At the end of October 2009 the OECD launched a wiki to give interested people a platform to deal with these questions. It’s called Wikiprogress, for healthy societies.

Wikiprogress

‘Wikiprogress is a global platform for sharing information in order to evaluate societal progress. Wikiprogress is a place to find information and statistics to facilitate the exchange of ideas, initiatives and knowledge on “measuring the progress of societies”. It is open to all members and communities for contribution – students and researchers, civil society organisations, governmental and intergovernmental organisations, multilateral institutions, businesses, statistical offices, community organisations and individuals – anyone who has an interest in the concept of “progress”.’

*****

On this topic see also this site:

Beyond GDP

To build an ecosystem of data on the Web

Using statistical data to explain the world, telling stories with statistical data, visualizing statistical data to make them accessible in a quick and instructive manner – all these topics are well known and have been the subject of long and intensive discussions and activities in many institutions of official statistics. Results can be seen on the websites of National Statistical Institutes and international statistical organisations.

Some examples:

oecdexplorer-small

OECD explorer

ecbinflation-small
ECB Inflation dashboard

worldbankatlas-small
World Bank Atlas

businesscyclesmall
Business Cycle Tracer Statistics Netherlands

statatlassmall
Stat@las Statistics Switzerland

tgm-small
Eurostat TGM

There are many other visualizations, and behind all of them are user-friendly databases with free access for everybody. This is the ecosystem of official statistics.

Official statistical data are also used and presented outside the institutions of official statistics (see the earlier posts raw data now and helping free up data); the discussions and aims are comparable, and the instruments are innovative.

Well known is Gapminder, which collects data from many sources and offers a presentation tool that has also been integrated as the motion chart in Google Spreadsheets’ list of visualization widgets.

Google’s Fusion Tables (see the earlier post Fusion Tables and gov.data) provide some more possibilities for data collaboration and data visualization. Listening to Alon Halevy, senior Google engineer, and Peter Gleick, president of the Pacific Institute (which uses Fusion Tables), well-known arguments can be heard: ‘The biggest potential [of Google Fusion Tables] is to build an ecosystem of data on the Web. This means making it easy for people to upload, to merge data sets, to discuss the data, to create visualizations and then to take these visualizations and put them elsewhere on the Web so that there’s better data on the Web.’

Link: Google Fusion Tables