Big Data and Official Statistics


 

Big Data is THE topic of the freshly published Statistical Journal of the IAOS – Volume 31, issue 2.


Five articles deal with Big-Data topics:

In the editorial, Fride Eeg-Henriksen and Peter Hackl give an overview of the Big Data discussions held in Official Statistics. Here are some remarks taken from this editorial:

‘In spite of the wide interest in and the great popularity of Big Data, no clear and commonly accepted definition of the notion Big Data could be established so far [3]. Modern technological, social and economic developments including the growth of smart devices and infrastructure, the growing availability and efficiency of the internet, the appeal of social networking sites and the prevalence and ubiquity of IT systems are resulting in the generation of huge streams of digital data. The complexities of the structure and dynamic of corresponding datasets, the challenges in developing the suitable software tools for data analytics, generally the diversity of potentials in making use of the masses of available data make it difficult to find a suitable and generally applicable definition. The often mentioned characterization of Big Data by 3 – or more – Vs (volume, velocity, variety – as well as veracity and value), does not capture the enormous scope of the corresponding data sets and the extensive potentials of making use of these data. A highly relevant aspect is that Big Data are so large and complex that traditional database management tools and data processing applications are not feasible and efficient means. This is illustrated by a look at the categories of data sources which typically are seen in the context of Big Data: Such data sources may be
– Administrative, e.g., medical records, insurance records, bank records.
– Commercial transactions, e.g., credit card transactions, scanners in supermarkets.
– Sensors, e.g., satellite imaging, environmental sensors, road sensors.
– Tracking devices, e.g., tracking data from mobile telephones, GPS
– Tracks of human behaviour, e.g., online searches, online page viewing.
– Documentation of opinion, e.g., comments posted in social media.

……….

‘A general conclusion from the set of articles in this Special Section can be drawn as follows: The feasibility and the potentials of using Big Data in official statistics have to be assessed from case to case. In some areas the use of Big Data sources has already proved to be feasible. The choice of the appropriate IT technology and statistical methods must be specific for each situation. Also issues like the representativity and the quality of the resulting statistics, or the confidentiality and the risk of disclosure of personal data need to be assessed individually for each case. There is no doubt that Big Data will have a place in the future of official statistics, helping to reduce costs and burden on respondents. However, major efforts will be necessary to establish the routine wise use of Big Data, and new approaches will be needed for assessing all aspects of quality.’

[3] C. Reimsbach-Kounatze (2015), “The Proliferation of ‘Big Data’ and Implications for Official Statistics and Statistical Agencies: A Preliminary Analysis”, OECD Digital Economy Papers, No. 245, OECD Publishing. http://dx.doi.org/10.1787/5js7t9wqzvg8-en

 

See also: Big Data in Action May 2015

 

Big Data in Action

Not long ago in Official Statistics the topic ‘Big Data’ was mostly discussed in a theoretical manner.


https://blogstats.wordpress.com/2014/01/25/big-data-events/

Now, however, more and more real, solid examples are appearing that demonstrate how Big Data work and what they can deliver.

Some of these examples come from (official) statistical institutions, which use Big Data as a source and are starting to apply a new analytical paradigm.


Example 1: Global Pulse (UN)

‘Global Pulse is a flagship innovation initiative of the United Nations Secretary-General on big data. Its vision is a future in which big data is harnessed safely and responsibly as a public good. Its mission is to accelerate discovery, development and scaled adoption of big data innovation for sustainable development and humanitarian action. … Big data represents a new, renewable natural resource with the potential to revolutionize sustainable development and humanitarian practice.’

See some examples of using Big Data below:

  • analysing social media data for perceptions related to sanitation, in order to baseline public engagement
  • using mobile phone data as a proxy for food security and poverty indicators
  • inferring risk factors (e.g., tobacco, alcohol, diet and physical activity) of non-communicable diseases (e.g., cancer, diabetes, depression) from big data sources such as social media and online searches


‘This paper outlines the opportunities and challenges, which have guided the United Nations Global Pulse initiative since its inception in 2009. The paper builds on some of the most recent findings in the field of data science, and findings from our own collaborative research projects. It does not aim to cover the entire spectrum of challenges nor to offer definitive answers to those it addresses, but to serve as a reference for further reflection and discussion. The rest of this document is organised as follows: section one lays out the vision that underpins Big Data for Development; section two discusses the main challenges it raises; section three discusses its application. The concluding section examines options and priorities for the future.’


Example 2: CBS

At Statistics Netherlands (CBS), Big Data is an important research topic.


Several examples were studied:

  • road sensors for traffic and transport statistics
  • mobile phone data for travel behaviour (of active phones) or tourism (new phones that register on the network)
  • social media data for sentiment analysis, tracking words with their associated sentiment on Twitter, Facebook, Google+, LinkedIn, etc.
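
To make the last item more concrete, here is a minimal sketch of word-based sentiment tracking. It is not the CBS method: the tiny lexicon, the function names and the sample messages are all invented for illustration, and a real application would need a proper sentiment lexicon, language handling and far more data.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon (an assumption, not the lexicon used by CBS):
# each word carries a sentiment score.
LEXICON = {
    "good": 1, "great": 1, "happy": 1,
    "bad": -1, "worried": -1, "unemployed": -1,
}

def message_score(text, lexicon=LEXICON):
    """Sum the sentiment scores of all lexicon words found in one message."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

def daily_sentiment(messages, lexicon=LEXICON):
    """Average message score per day; messages is an iterable of (date, text)."""
    totals = defaultdict(lambda: [0, 0])  # date -> [score sum, message count]
    for day, text in messages:
        totals[day][0] += message_score(text, lexicon)
        totals[day][1] += 1
    return {day: score / count for day, (score, count) in totals.items()}

# Invented sample messages, for illustration only.
sample = [
    ("2015-05-01", "Great news, I am happy with the new job"),
    ("2015-05-01", "Still unemployed and worried"),
    ("2015-05-02", "Bad traffic again"),
]

print(daily_sentiment(sample))
```

A daily average like this could then be compared with an existing indicator such as consumer confidence, which is where questions of representativity and quality come in.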


Example 3: Report of the Global Working Group on Big Data for Official Statistics

In March 2015, the forty-sixth session of the UN Statistical Commission received a report about Big Data in Official Statistics:

‘The report presents the highlights of the International Conference on Big Data for Official Statistics, the outcome of the first meeting of the Global Working Group and the results of a survey on the use of big data for official statistics.’ …

‘The potential of big data sources resides in the timely — and sometimes real‑time — availability of large amounts of data, which are usually generated at minimal cost.  …. before introducing big data into official statistics …. it needs to adequately address issues pertaining to methodology, quality, technology, data access, legislation, privacy, management and finance, and provide adequate cost-benefit analyses.’

UN Statistical Commission, forty-sixth session, 3-6 March 2015.
The full report: http://www.un.org/ga/search/view_doc.asp?symbol=E/CN.3/2015/4


Example 4: UNECE Statistics Wiki on Big Data in Official Statistics

A dedicated wiki offers an overview of the ever-growing activities in the field of Official Statistics and Big Data. It’s managed by the Geneva Office of UNECE.

The wiki provides an interesting Big Data Inventory.

 

Next Step after OGD: Government’s Big Data Scientist

Open Government Data (OGD) Initiatives have been important steps helping to give broader access to administrative data.

But there was some disappointment because OGD didn’t bring about the mass of apps many had hoped for. Meanwhile, big discussions about using Big Data have emerged.

Now the US is taking a step forward with a Big Data initiative: President Obama has just named DJ Patil as Chief Data Scientist in his Office.

https://m.whitehouse.gov/blog/2015/02/18/white-house-names-dr-dj-patil-first-us-chief-data-scientist

‘Patil’s new role will involve the application of big data to all government areas, but particularly healthcare policy.’ (Source)


Data are from the Past

There’s a lot of discussion and also big hope about what is called Big Data and the role of Data Scientists. Will Data Scientists help us to create a better future?

Yes and no. ‘Making predictions about unprecedented futures requires more than data, it requires theory-driven models that envision futures that do not exist in data. Fortunately, digital tools also assist us in envisioning futures that have never been.’

This talk by Martin Hilbert, published on 13 January 2015, explains why data are from the past and are not enough on their own.

‘During his 15 years at the United Nations Secretariat, Martin Hilbert assisted governments to take advantage of the digital revolution. When the ‘big data’ age arrived, his research was the first to quantify the historical growth of how much technologically mediated information there actually is in the world.’

‘After joining the faculty of the University of California, Davis, he had more time to think more deeply about the theoretical underpinning and fundamental limitations of the ‘big data’ revolution.’

‘Martin Hilbert holds doctorates in Economics and Social Sciences, and in Communication, and has provided hands-on technical assistance to Presidents, government experts, legislators, diplomats, NGOs, and companies in over 20 countries.
At the University of California, Davis, Martin thinks about the fundamental theories of how digitization affects society.’ More http://www.martinhilbert.net

[Source: youtube https://www.youtube.com/watch?v=UXef6yfJZAI]

It’s high time to demystify

Data, Big Data, Data Scientist, Data Mining …. Statistics. And next: Linked Open Data?

Look at this semantically rich clarification by Diego Kuonen. It’s worthwhile!

 

 

See also: https://blogstats.wordpress.com/2013/04/21/big-data-open-data-and-official-statistics/

Big Data Events

The Big Data discussion builds momentum in Official Statistics.

In October 2012, Big Data came up at the UNECE High-level Seminar on Modernization of Statistical Production and Services (HLG) in St. Petersburg, and a strategic paper was requested.

On 22 February 2013, the United Nations Statistics Division (UNSD) organised its Friday Seminar on Emerging Issues, focusing especially on Big Data.

Soon after, the HLG delivered a very good paper on Big Data.

And in September 2013, the Heads of European Statistical Offices (DGINS, Directors General of the National Statistical Institutes) adopted the Scheveningen Memorandum.

Scheveningen Memorandum
Big Data and Official Statistics

‘The DGINS
CONSIDERING
1. Recent innovations in the information and communication technologies have been leading to an increasing degree of digitization of economies and societies at all levels that offer new opportunities for the compilation of statistics.
2. The use of Big Data for statistical purposes challenges the European Statistical System to effectively address a variety of issues.
3. The demand for timely and cost efficient production of high quality statistical data increases, as well the need for new solutions to declining response levels.
4. Official statistics should incorporate as much as possible all potential data sources, including Big Data, into their conceptual design.
5. The distinguishing aspect of many Big Data sources is that they are not confined to national borders and, as such, represent unique opportunities for collaboration at European level as well as on global level.
6. Many European initiatives have a link to Big Data, including the European Commission’s ambition for developing a strategy for the European data value chain, the on-going EU Data Protection reform and the Horizon2020 program.
7. The implementation of new methods of production of European statistics represents an objective of the European Statistical Programme 2013-2017 (1) and aims at efficiency gains and quality improvements, including increased timeliness.

(1) Regulation (EU) No 99/2013 of the European Parliament and of the Council of 15 January 2013 on the European statistical programme 2013-17, OJ L 39, 9.2.2013, p. 12–29

The DGINS
1. Acknowledge that Big Data represent new opportunities and challenges for Official Statistics, and therefore encourage the European Statistical System and its partners to effectively examine the potential of Big Data sources in that regard.
2. Recognise that Big Data is a phenomenon which is impacting on many policy areas. It is therefore essential to develop an ‘Official Statistics Big Data strategy‘ and to examine the place and the interdependencies of this strategy within the wider context of an overall government strategy at national as well as at EU level.
3. Recognise that the implications of Big Data for legislation especially with regard to data protection and personal rights (e.g. access to Big Data sources held by third parties) should be properly addressed as a matter of priority in a coordinated manner.
4. Note that several NSIs are currently initiating or considering different uses of Big Data in a national context. There is a momentum to share experiences obtained from concrete Big Data projects and to collaborate within the ESS and beyond, on a global level.
5. Recognise that developing the necessary capabilities and skills to effectively explore Big Data is essential for their integration into the European Statistical System. This requires systematic efforts like appropriate training courses and establishing dedicated communities including academics for sharing experiences and best practice.
6. Acknowledge that the multidisciplinary character of Big Data requires synergies and partnerships to be effectively built with experts and stakeholders from various domains including government, academics and owners of private data sources.
7. Acknowledge that the use of Big Data in the context of official statistics requires new developments in methodology, quality assessment and IT related issues. The European Statistical System should make a special effort to supports these developments.
8. Agree on the importance of following up the implementation of this memorandum by adopting an ESS action plan and roadmap by mid-2014 that should be further integrated into the Statistical Annual Work Programmes of Eurostat.’

ESS Big Data Event Roma 2014

And now comes the ESS Big Data Event, from 31 March to 1 April 2014. It offers keynotes and seminars dealing with several of the Scheveningen topics. See the programme and the concept paper.

The European Data Forum (EDF) Athens 2014

A few days before the ESS event, the annual European Data Forum will take place in Athens on 19-20 March 2014. Big Data will be a topic there, too.


A practical example: Using Social media analysis for statistics

During the 2013 DGINS meeting in the Netherlands, some examples of Big Data usage for statistical insights were made available as presentations.
The Dutch Statistical Office CBS and Coosto, the social media monitoring and engagement tool, dived into the digital ocean of social media data and made some comparisons.


Big data and official statistics: A conclusion from Els Rijnierse’s presentation on traffic big data:

[Image: conclusion from the presentation]

Big Open Public Official Data – What’s next?

Big Data joined Open Data in last year’s discussions. What’s behind this new buzzword? What’s the impact on traditional official statistics?

‘…. there are the open data and big data communities who have emerged over the last 5 years. Through them, we’ve seen a huge increase in the use of public data, and more importantly, potential opportunities to use new data sources and techniques – that are often faster and cheaper – to supplement, or even replace some of the work of official statistics.
Can this really be done? Can we apply the same statistical rigour to big data sources and techniques to help meet the goals of official statistics?’

These questions get an answer at the World Bank’s Big Data and Official Statistics Event on December 19th.

Big Data Event

Speaker

Paul Cheung will talk about “Big Data, Official Statistics and Social Science Research: Emerging Data Challenges” offering an overview of . Robert Groves will respond to Paul’s presentation, sharing his thinking and experiences informed by his recent work at the US Census Bureau.