The Challenge of Smart Data

Official statistics have never been exempt from the changes taking place around them. Numerous organisations at national and international level constantly deal with these changes, and it is always interesting to see how future challenges are currently being assessed (here: in 2019).

One example

Here is an example: Kurt Vandenberghe from the EU Commission (Directorate A) in his closing speech at the conference on New Techniques and Technologies for Official Statistics (NTTS 2019).

He focuses on data collection (especially smart data), the necessary qualifications and possible support from AI. Dissemination, contact with data users, and questions about the comprehensible presentation and correct use of the data are left out. Nor is there any reference to the potential of linked data, with which more can be extracted from existing sources.

The following text includes the last part of Vandenberghe’s speech with the conclusion. I have adjusted the layout a bit with highlights:

‘So how will the future look like?

I recently came across a statement on a Eurostat website that in the course of the third decade of this century “Most if not all data is expected to be organic, i.e. by-products of people’s activities, systems and things, including billions of smart devices connected to the internet”. In that new context there is a growing need to examine and make use of the potential of “B-to-G”, business to government data transfer. This involves data from social media, mobile phones, Internet of Things, etc. There should be a new role for statistical institutions, captured by the term “smart statistics”.
I quote from the same Eurostat NTTS related page: “Smart Statistics can be seen as the future extended role of official statistics in a world impregnated with smart technologies.” Finally there is the issue of trusted smart statistics, again with an important role for official statistics, ensuring not only the validity and accuracy of the outputs, but also respecting data subjects’ privacy and protecting confidentiality.

Privacy and confidentiality are a growing concern and we need more research on techniques and technologies helping to avoid misuses of data on individuals and enterprises.

I guess what we will see in the coming years is, however, not one technique replacing existing ones, but a coexistence of and synergies between established and new data sources and techniques, of public and private ones, and of general and specialised providers that complement each other. This will include traditional questionnaire-based surveys, and administrative data sources, alongside new techniques such as big data. While some of these sources will provide basic structural information in high quality, others will provide more timely data on key trends.
What will be increasingly important is to have rich meta-information and knowledge about the quality of these sources and to guarantee and create trusted statistics, including trusted smart statistics.

And in all of this we cannot forget the role that people with the right skills will play. We saw already in the last few years that there is a strong growth in Europe in the demand for big data analysts and for managers who know how to deal with big data. This is only expected to grow further. To avoid a skills gap we will have to encourage young people to take up studies in these fields and educational institutions to provide corresponding courses. In the debate around “the future of work” (future technological change might endanger traditional jobs), there is one thing that is certain: the need for data analysts will grow further.

And I guess it is safe to say that they will be increasingly supported by Artificial Intelligence.
Artificial Intelligence can help to make sense of increasingly large amounts of data, to check the validity and improve their quality, relieving statisticians from routine tasks. Artificial Intelligence could help us analysing data with greater scope, scale and speed. In fact, a lot of what I said before and what you have discussed during the conference relates – directly or indirectly – to artificial intelligence – although AI does not seem very prominent on the programme. Paraphrasing Isaac Asimov’s quote about computers, we could say ‘I don’t fear AI, I fear the lack of it’. And maybe we should especially fear a lack of a European AI. Europe needs to lead on AI and develop AI that respects European values and makes the lives of Europeans better. The Commission is therefore increasing its annual investments in AI by 70% under the research and innovation programme Horizon 2020. It will reach EUR 1.5 billion for the period 2018-2020, and resources will grow further after 2020.’

Smart data and appropriate processes

Smart data is the challenge in data collection. What has to be considered, and how processes have to be adapted in order to bring the different data sources up to the standard of official statistics, is the subject of discussion. Here are two examples (from 2018).

Are Current Frameworks in the Official Statistical Production Appropriate for the Usage of Big Data and Trusted Smart Statistics?
Bertrand LOISON, Vice-Director, Swiss Federal Statistical Office; Diego KUONEN, CEO, Statoo Consulting & Professor of Data Science, University of Geneva

From the abstract:
‘As a sequential approach of statistical production, GSBPM (“Generic Statistical Business Process Model”) has become a well-established standard using deductive reasoning as analytics’ paradigm. For example, the first GSBPM steps are entirely focused on deductive reasoning based on primary data collection and are not suited for inductive reasoning applied to (already existing) secondary data (e.g. big data resulting, for example, from smart ecosystems). Taken into account the apparent potential of big data in the official statistical production, the GSBPM process needs to be adapted to incorporate both complementary approaches of analytics (i.e. inductive and deductive reasoning) …’

See also: Kuonen, D. (2018). Production Processes of Official Statistics & Data Innovation Processes Augmented by Trusted Smart Statistics: Friends or Foes? Keynote presentation given on May 15, 2018 at the conference “Big Data for European Statistics (BDES)” in Sofia, Bulgaria.

Towards a Reference Architecture for Trusted Smart Statistics
Fabio Ricciato, Michail Skaliotis, Albrecht Wirthmann, Kostas Giannakouris, Fernando Reis EUROSTAT Task Force on Big Data, 5, rue Alphonse Weicker, L 2721 Luxembourg

From the abstract:
‘ …. we outline the concept of Trusted Smart Statistics as the natural evolution of official statistics in the new datafied world, where traditional data sources (survey and administrative data) represent a valuable but small portion of the global data stock, much thereof being held in the private sector. In order to move towards practical implementation of this vision a Reference Architecture for Trusted Smart Statistics is required, i.e., a coherent system of technical, organisational and legal means combined to provide an articulated set of trust guarantees to all involved players. In this paper we take a first step in this direction by proposing selected design principles and system components …. .’

Big Analytics

Technology Predictions for 2015

For 2015, Bing predicts that the same three technologies will take the top places on all continents: Wearables, Digital Personal Assistants and Home Automation.
All these technologies are part of the Internet of Things (IoT). They have many sensors delivering data, they are connected to several clouds and they provide information about processes or behaviours of people and environments.
Devices communicate with devices, devices with humans and humans with devices. So the IoT is not another Internet, or an Internet alongside the one we already know. It is an expansion of the Internet, and it produces ever more data accessible to known and unknown parties: telecom and cloud providers, enterprises, governments …
Finally, humans also become part of the IoT.

IoT Infographic

These aspects are explained in a good infographic made by Postscapes in collaboration with Harbor Research.
The complete infographic can be found here:

IoT, Big Data and then?

Data from classical statistical surveys as well as from connected devices must be analyzed in order to be of use.
Analytics of things (that is, of data produced by the sensors of connected things, including humans) is the new kind of statistical information that every organisation, private person or government can use for its own purposes, for well-informed decisions and for steering activities. The uses are countless and under no control.
Examples of such statistics and applications already abound:
  • Singapore itraffic
  • Some more example applications of IoT in Postscapes IoT toolkit
  • And some things are intelligent even without being connected, like thermostats.


Big Analytics, Data Science and Official Statistics

The quasi-monopoly of national statistical agencies with the resources to undertake huge surveys is vanishing. A new kind of statistics is emerging, collected, controlled and used by new agents.
To realise the full potential of these data, special qualifications are needed. Classical statistical analysis is expanding into Data Science. (Big) Data need (Big) Analytics, which explains why many predict that statistician will be the sexiest profession.
To get an idea of this expanding, or even exploding, business, Diego Kuonen’s frequent tweets and presentations are an ideal source.
As new sources of data appear and a broader range of analytical techniques becomes necessary, statistical agencies are challenged. Tackling this paradigm change is on the agenda.

Who owns the data?

Who owns the data emerging from my devices (smartphone, wearables, home automation …)? What data come from sensors out of my personal reach?
And what are they doing with these data?
Huge data protection issues await answers. Reality seems to move faster than the rules …
'Estimates vary, but by 2020 there could be over 30 billion devices connected to the Internet. Once dumb, they will have smartened up thanks to sensors and other technologies embedded in them and, thanks to your machines, your life will quite literally have gone online.'


Big Data – Big Projects – Big Discussions

‘Old’ Data vs. Reality Mining

For a long time, official statistics were synonymous with data. With the emergence (or rather, the growing awareness) of new information sources, aka Big Data, this is about to change. The opportunities these data offer are changing too, along with the risks (privacy!).
In the light of the research activities and projects around Big Data and reality mining, traditional statistical data management seems to date from another era. Evidence-based decision making is moving to a new level.
Some prominent examples of Big Data research:

Project FuturICT

One very interesting and very ambitious project facing such a BIG Data opportunity is FuturICT, led by Dirk Helbing from the Swiss Federal Institute of Technology in Zurich (ETHZ).
FuturICT’s ‘ultimate goal … is to understand and manage complex, global, socially interactive systems’ (Homepage FuturICT).
Introducing FuturICT by Dirk Helbing:
Some points (taken from Edge and the FuturICT brochure):
‘There are two big global trends. One is big data. That means in the next ten years we’ll produce as many data, or even more data than in the past 1,000 years.
The other trend is hyperconnectivity. That means we have networking our world going on at a rapid pace; we’re creating an Internet of things. So everyone is talking to everyone else, and everything becomes interdependent. ….
But on the other hand, it turns out that we are, at the same time, creating highways for disaster spreading. We see many extreme events, we see problems such as the flash crash, or also the financial crisis. That is related to the fact that we have interconnected everything. In some sense, we have created unstable systems. We can show that many of the global trends that we are seeing at the moment, like increasing connectivity, increase in the speed, increase in complexity, are very good in the beginning, but (and this is kind of surprising) there is a turning point and that turning point can turn into a tipping point that makes the systems shift in an unknown way. ……
We really need to understand those systems, not just their components. It’s not good enough to have wonderful gadgets like smartphones and computers; each of them working fine in separation. Their interaction is creating a completely new world, and it is very important to recognize that it’s not just a gradual change of our world; there is a sudden transition in the behavior of those systems, as the coupling strength exceeds a certain threshold.’

Three components

‘The [first] component to ‘measure the state of the world’ is called the Planetary Nervous System. It can be imagined as a global sensor network, where ‘sensors’ include anything able to provide data in real-time about socio-economic, environmental or technological systems (including the Internet). Such an infrastructure will enable real-time data mining – reality mining – and the calibration and validation of coupled models of socio-economic, technological and environmental systems with their complex interactions. It will even be possible to extract suitable models in a data-driven way, guided by theoretical knowledge.’ (Future ICT. Global computing for our complex world, p.18)
‘The second component, the Living Earth Simulator will be very important here, because that will look at what-if scenarios. It will take those big data generated by the Planetary Nervous System and allow us to look at different scenarios, to explore the various options that we have, and the potential side effects or cascading effects, and unexpected behaviors, because those interdependencies make our global systems really hard to understand.’
‘The third component will be the Global Participatory Platform. That basically makes those other tools available for everybody: for business leaders, for political decision-makers, and for citizens. We want to create an open data and modeling platform that creates a new information ecosystem that allows you to create new businesses, to come up with large-scale cooperation much more easily, and to lower the barriers for social, political and economic participation.’

Scoop.IT: FuturICT



Social Physics: Another Approach

Alexander Pentland from the MIT Media Lab is also dealing with the opportunities of Big Data. In his book “Social Physics” he reflects on what can be done with this treasure of information, and the approach he follows is rather technocratic.

Social physics?

‘Social physics is a quantitative social science that describes reliable, mathematical connections between information and idea flow on the one hand and people’s behavior on the other. Social physics helps us understand how ideas flow from person to person through the mechanism of social learning and how this flow of ideas ends up shaping the norms, productivity, and creative output of our companies, cities, and societies. It enables us to predict the productivity of small groups, of departments within companies, and even of entire cities. It also helps us tune communication networks so that we can reliably make better decisions and become more productive.’ …
See also Pentland at a Google show:


‘The engine that drives social physics is big data: the newly ubiquitous digital data now available about all aspects of human life. Social physics functions by analyzing patterns of human experience and idea exchange within the digital bread crumbs we all leave behind us as we move through the world—call records, credit card transactions, and GPS location fixes, among others. These data tell the story of everyday life by recording what each of us has chosen to do. And this is very different from what is put on Facebook; postings on Facebook are what people choose to tell each other, edited according to the standards of the day. Who we actually are is more accurately determined by where we spend our time and which things we buy, not just by what we say we do.
The process of analyzing the patterns within these digital bread crumbs is called reality mining, and through it we can tell an enormous amount about who individuals are.’ (From ‘Social Physics: How Good Ideas Spread-The Lessons from a New Science’, The Penguin Press, 2014) .

‘How to re-engineer the world’: The Economist’s critical voice

‘Institutions should be redesigned around social physics, [Pentland] says. For instance, to improve health-care, anonymous medical records could be used to show what treatments work best. Mr Pentland’s research also offers lessons for policymakers and business people. He advances a new way to protect privacy by creating something of a property right for personal information. People would in most cases control what personal data were collected, how they are used, and with whom they are shared, treating their personal data as assets, as they do money in a bank. Yet he is less convincing when he strays from his research to make broader points about politics and economics. He reduces too much of the world’s complexity to something to be solved by data, when they are just part of the solution. His enthusiasm for a world run by datacrats rings of a zealotry that could easily go awry. Still, “Social Physics” is a fascinating look at a new field by one of its principal geeks.’ (From The Economist)


‘A society enabled by Big Data’

‘Reality mining’ is the buzzword, and it is tied to another buzzword, ‘Human-Data Interaction’ (HDI).
  • Human-Data Interaction (HDI):
    ‘Personal data about and by each of us, whether we are aware of it or not, feeds into black-box analytics algorithms to infer facts, both correct and incorrect. These drive actions, whose effects may or may not be visible to us’.

Internet of things

Sources of big data are not only humans with or without their devices but also objects equipped with sensors and machines communicating with machines (M2M). In the Internet of Things (IoT), things exchange data, semantic descriptions support the interoperability of things, and interconnected smart objects become reality.

‘The internet of things is a way to deliver cheap information that could be used for good or ill. So let’s start talking about what we want as a society.’ This is the motto of one of several conferences dealing with this topic.

Data Explosion: Analytics Software Must Adapt or Die

From ReadWriteWeb: Written by Richard MacManus / June 2, 2010 12:30 AM

In my previous few articles, I’ve explored the potential impact of sensors on the Internet. Soon there will be a trillion sensors connected to the Web, which will result in an explosion of online data. How will this mass of new and mostly real-time data be processed and analyzed? Will current data analytics software be able to cope? The short answer is, no it won’t. New types of analytics software will be required, together with much more powerful computers.

During my visit to HP Labs last month, I sat down with Meichun Hsu – director of the Intelligent Information Management Lab at Hewlett Packard – to discuss this issue. Hsu has been researching new real-time, sensor analytics solutions for the coming Internet of Things era.


Web Wide World – Internet of Things (IoT)

The Web evolves. Everything is being tracked. Data and real-world objects are linked together, and the web is the medium where all this happens: such is the (perhaps not so distant) vision.

Nova Spivack discusses this in his article ‘From World Wide Web to Web Wide World — The Web Breaks Out of its Petri Dish’.

A European Union conference starting on 6 October 2008, entitled “Internet of Things – Internet of the Future”, also focuses on this issue: ‘The Internet is at a crossroads of its evolution. Mobile internet and Radio Frequency Identification (RFID), among other key technologies, will soon allow the creation of an « Internet of objects » whose services will weave themselves into users’ daily life. Tomorrow’s Internet services will expand to various fields like health, education, proximity services and energy management.’

Currently the EU has launched a Consultation on the early challenges regarding the “Internet of Things”: ‘The context of this consultation is the preparation of a Communication from the European Commission on the Internet of Things (IoT), planned for the second quarter of 2009. … The Communication on the Internet of Things will propose a policy approach addressing the whole range of political and technological issues related to the move from RFID and sensing technologies to the Internet of Things. It will focus especially on architectures, control of critical infrastructures, emerging applications, security, privacy and data protection, spectrum management, regulations and standards, broader socio-economic aspects.’

A Working Paper of the EU Commission explains, with some instructive examples, what IoT could mean: ‘The phrase “Internet of Things” heralds a vision of the future Internet where connecting physical things, from banknotes to bicycles, through a network will let them take an active part in the Internet, exchanging information about themselves and their surroundings. This will give immediate access to information about the physical world and the objects in it – leading to innovative services and gains in efficiency and productivity.’

What could this mean for statistical information?

First of all, changes in data collection: ‘The Internet of Things will have a profound effect on the way traffic, weather, particles in the air, water pollution, and the environment can be monitored and statistics collected.’ (Working Paper of the EU Commission, p. 5)

But there will also be changes, much more difficult to anticipate, in the dissemination of information. The Semantic Web is often seen as a system of linked data (not documents). Every object gets its URI (its unique address) and is described with a set of specific properties (e.g. as RDF triples).

So in a world of described objects (data or real-world objects), search engines can bring together objects with common properties and open up new dimensions of information and knowledge.
In theory (and perhaps in a distant future), objects in the real world can be linked with many other objects, among them (object-specific) statistical data.
Something to think about!
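
To make the linked-data idea a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the URIs, the property names (ex:locatedIn, ex:hasStatistic) and the helper functions are all hypothetical, and the triples are modelled as simple tuples rather than with a real RDF library.

```python
from collections import defaultdict

# Hypothetical example data: each triple is (subject URI, property, object).
# None of these URIs or property names come from a real vocabulary.
TRIPLES = [
    ("http://example.org/place/zurich", "rdf:type", "ex:City"),
    ("http://example.org/place/zurich", "ex:hasStatistic",
     "http://example.org/stat/population/zurich"),
    ("http://example.org/place/geneva", "rdf:type", "ex:City"),
    ("http://example.org/place/geneva", "ex:hasStatistic",
     "http://example.org/stat/population/geneva"),
    ("http://example.org/sensor/42", "rdf:type", "ex:AirQualitySensor"),
    ("http://example.org/sensor/42", "ex:locatedIn",
     "http://example.org/place/zurich"),
]


def subjects_with(triples, prop, value):
    """Return all subjects that share the given property value, sorted."""
    return sorted({s for s, p, o in triples if p == prop and o == value})


def describe(triples, subject):
    """Collect every property of one subject into a dict of lists."""
    props = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            props[p].append(o)
    return dict(props)


# Bring together objects with a common property (here: all cities) ...
cities = subjects_with(TRIPLES, "rdf:type", "ex:City")

# ... and follow the link from one real-world object to its statistical data.
zurich = describe(TRIPLES, "http://example.org/place/zurich")
zurich_stats = zurich["ex:hasStatistic"]
```

The point of the sketch is only the shape of the data: because every object and every statistic has its own URI, a search over shared properties can connect a city, a sensor located in it and the statistical series describing it without any of them living in the same document.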

See also the ‘Real World Internet’ position paper and the ‘Future Internet Portal’.