The Challenge of Smart Data

Official statistics have never been exempt from the changes taking place around them. Numerous organisations at national and international level are constantly dealing with it and it is always interesting to see what the current 2019 assessment of future challenges is.

One example

Here is an example: Kurt Vandenberghe from the EU Commission (Directorate A) in his closing speech at the conference on New Techniques and Technologies for Official Statistics (NTTS 2019).

He focuses on data collection – especially smart data – , necessary qualifications and possible support from AI. Dissemination, contact to data users and questions about the comprehensible presentation and correct use of the data are left out. And also no reference to the potential of linked data, with which more can be pulled out of existing sources.

The following text includes the last part of Vandenberghes speech with the conclusion. I have adjusted the layout a bit with highlights:

‘So how will the future look like?

I recently came across a statement on a Eurostat website that in the course of the third decade of this century “Most if not all data is expected to be organic, i.e. by-products of people’s activities, systems and things, including billions of smart devices connected to the internet”. In that new context there is a growing need to examine and make use of the potential of “B-to-G”, business to government data transfer. This involves data from social media, mobile phones, Internet of Things, etc. There should be a new role for statistical institutions, captured by the term
smart statistics”.
I quote from the same Eurostat NTTS related page: “Smart Statistics can be seen as the future extended role of official statistics in a world impregnated with smart technologies.” Finally there is the issue of trusted smart statistics, again with an important role for official statistics, ensuring not only the validity and accuracy of the outputs, but also respecting data subjects’ privacy and protecting confidentiality.

Privacy and confidentiality
are a growing concern and we need more research on techniques and technologies helping to avoid misuses of data on individuals and enterprises.

I guess what we will see in the coming years is, however, not one technique replacing existing ones, but a
coexistence of and synergies between
established and new data sources
and techniques, of public and private ones, and of general and specialised providers that complement each other. This will include traditional questionnaire-based surveys, and administrative data sources, alongside new techniques such as big data. While some of these sources will provide basic structural information in high quality, others will provide more timely data on key trends.
What will be increasingly important is to have rich meta-information and knowledge about the quality of these sources and to guarantee and create trusted statistics, including trusted smart statistics.

And in all of this we cannot forget the role that
people with the right skills
will play. We saw already in the last few years that there is a strong growth in Europe in the demand for big data analysts and for managers who know how to deal with big data. This is only expected to grow further. To avoid a skills gap we will have to encourage young people to take up studies in these fields and educational institutions to provide corresponding courses. In the debate around “the future of work”(future technological change might endanger traditional jobs), there is one thing that is certain: the need for data analysts will grow further.

And I guess it is safe to say that they will be increasingly supported by Artificial Intelligence.
Artificial Intelligence
can help to make sense of increasingly large amounts of data, to check the validity and improve their quality, relieving statisticians from routine tasks. Artificial Intelligence could help us analysing data with greater scope, scale and speed. In fact, a lot of what I said before and what you have discussed during the conference relates – directly or indirectly – to artificial intelligence – although AI does not seem very prominent on the programme. Paraphrasing Isaac Asimov’s quote about computers, we could say ‘I don’t fear AI, I fear the lack of it’. And maybe we should especially fear a lack of a European AI. Europe needs to lead on AI and develop AI that respects European values and makes the lives of Europeans better. The Commission is therefore increasing its annual investments in AI by 70% under the research and innovation programme Horizon 2020. It will reach EUR 1.5 billion for the period 2018-2020, and resources will grow further after 2020. ‘

Smart data and appropriate processes

Smart data is the challenge in data collection. What has to be considered, how the processes have to be adapted in order to connect the different data sources to the standard of public statistics – this is the subject of discussion. Here, too, are two examples (from 2018).


Are Current Frameworks in the Official Statistical Production Appropriate for the Usage of Big Data and Trusted Smart Statistics? Bertrand LOISON Vice-Director, Swiss Federal Statistical Office, Diego KUONEN CEO, Statoo Consulting & Professor of Data Science, University of Geneva

From the abstract:
‘As a sequential approach of statistical production, GSBPM (“Generic Statistical Business Process Model”) has become a well-established standard using deductive reasoning as analytics’ paradigm. For example, the first GSBPM steps are entirely focused on deductive reasoning based on primary data collection and are not suited for inductive reasoning applied to (already existing) secondary data (e.g. big data resulting, for example, from smart ecosystems). Taken into account the apparent potential of big data in the official statistical production, the GSBPM process needs to be adapted to incorporate both complementary approaches of analytics (i.e. inductive and deductive reasoning) … . ‘

[4] Kuonen D. (2018). Production Processes of Official Statistics & Data Innovation Processes Augmented by Trusted Smart Statistics: Friends or Foes? Keynote presentation given on May 15, 2018 at the conference “Big Data for European Statistics (BDES)” in Sofia, Bulgaria
(https://goo.gl/RMfpfB).

Towards a Reference Architecture for Trusted Smart Statistics
Fabio Ricciato, Michail Skaliotis, Albrecht Wirthmann, Kostas Giannakouris, Fernando Reis EUROSTAT Task Force on Big Data, 5, rue Alphonse Weicker, L 2721 Luxembourg

From the abstract:
‘ …. we outline the concept of Trusted Smart Statistics as the natural evolution of official statistics in the new datafied world, where traditional data sources (survey and administrative data) represent a valuable but small portion of the global data stock, much thereof being held in the private sector. In order to move towards practical implementation of this vision a Reference Architecture for Trusted Smart Statistics is required, i.e., a coherent system of technical, organisational and legal means combined to provide an articulated set of trust guarantees to all involved players. In this paper we take a first step in this direction by proposing selected design principles and system components …. .’

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s