There is a lot of discussion, and there are high hopes, about what is called Big Data and the role of Data Scientists. Will Data Scientists help us create a better future?
Yes and no. ‘Making predictions about unprecedented futures requires more than data, it requires theory-driven models that envision futures that do not exist in data. Fortunately, digital tools also assist us in envisioning futures that have never been.’
This talk by Martin Hilbert, published on 13.01.2015, explains why data come from the past and are not enough on their own.
‘During his 15 years at the United Nations Secretariat, Martin Hilbert assisted governments to take advantage of the digital revolution. When the ‘big data’ age arrived, his research was the first to quantify the historical growth of how much technologically mediated information there actually is in the world.’
‘After joining the faculty of the University of California, Davis, he had more time to think more deeply about the theoretical underpinning and fundamental limitations of the ‘big data’ revolution.’
‘Martin Hilbert holds doctorates in Economics and Social Sciences, and in Communication, and has provided hands-on technical assistance to Presidents, government experts, legislators, diplomats, NGOs, and companies in over 20 countries.
At the University of California, Davis, Martin thinks about the fundamental theories of how digitization affects society.’ More http://www.martinhilbert.net
Official statistics will face (at least) two big challenges in the next few years, and they may change its quasi-monopolistic position.
On the input side it’s Big Data
‘“Big Data” is a term used to describe massive information stores – generally measured in petabytes and exabytes – and also refers to the methods and technologies used to analyze these large data volumes. The core principles of Big Data (data mining, analytics) have been around for some time, but recent technology has enabled the collection and analysis of previously unimaginable data volumes at extremely high speeds.’ So says SAP, for example, which also gives some examples of how Big Data will change your life (big words, and they show how big software and hardware players are beginning to occupy the field).
Official Statistics has already put this on the agenda! It was also the topic of the United Nations Statistics Division’s (UNSD) Friday Seminar on Emerging Issues on 22 February 2013.
Some papers from this Seminar:
Gosse van der Veen (Statistics Netherlands), High Level Group for the Modernization of Statistical Products and Services: Big Data: Big Opportunity!
Aspects of Big Data and real-time analytics are provided in another paper by Global Pulse (an innovation initiative launched by the Executive Office of the United Nations Secretary-General): Big Data for Development: Opportunities & Challenges
The discussion has been launched and, as the HLG paper mentions: ‘To use Big data, statisticians are needed with a different mind-set and new skills. The processing of more and more data for official statistics requires statistically aware people with an analytical mind-set, an affinity for IT (e.g. programming skills) and a determination to extract valuable ‘knowledge’ from data. These so-called “data scientists” can be derived from various scientific disciplines.’
On the output side it’s (Linked) Open Data in combination with APIs
Open Data is not at all a new topic for Official Statistics. National Statistical Institutes were forerunners in openly providing data; organizations like the UN and EUROSTAT went this way as well.
Several Open Data initiatives (USA, UK, France, EU …) consist mostly of data catalogues, and are in that sense also public relations initiatives. A large part of the data provided this way consists of statistical data that is often already available on the website of the National Statistical Institute concerned. The EU portal, for instance, offers 5716 datasets of statistical data from a total of 5893 (as of April 2013).
Further central questions are the licensing of data, as well as their availability in machine-readable formats.
Machine-readable statistical data, Application Programming Interfaces (APIs) to the data and especially Linked Open Data (LOD) open the way to creative applications and new models of presenting information.
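As a rough illustration of why machine-readable formats matter for reuse, the following minimal sketch parses a small JSON payload of the kind a statistical API might return and derives two simple results from it. The payload structure, the field names (`indicator`, `observations`, `geo`, `year`, `value`), the country codes and all figures are invented for illustration; they do not come from any real portal or API.

```python
import json

# Hypothetical API response: structure, codes and numbers are invented.
SAMPLE_RESPONSE = """
{
  "indicator": "population_thousands",
  "observations": [
    {"geo": "AA", "year": 2011, "value": 1000},
    {"geo": "AA", "year": 2012, "value": 1010},
    {"geo": "BB", "year": 2011, "value": 500},
    {"geo": "BB", "year": 2012, "value": 495}
  ]
}
"""


def latest_values(payload: str) -> dict:
    """Return the most recent value per geographic code."""
    data = json.loads(payload)
    latest = {}
    for obs in data["observations"]:
        geo = obs["geo"]
        # Keep the observation with the highest year for each geo code.
        if geo not in latest or obs["year"] > latest[geo]["year"]:
            latest[geo] = obs
    return {geo: obs["value"] for geo, obs in latest.items()}


def yearly_change(payload: str, geo: str) -> list:
    """Return (year, change vs. previous year) pairs for one geo code."""
    data = json.loads(payload)
    series = sorted(
        (obs["year"], obs["value"])
        for obs in data["observations"]
        if obs["geo"] == geo
    )
    # Pair each observation with its predecessor and take the difference.
    return [
        (year, value - prev_value)
        for (_, prev_value), (year, value) in zip(series, series[1:])
    ]


if __name__ == "__main__":
    print(latest_values(SAMPLE_RESPONSE))
    print(yearly_change(SAMPLE_RESPONSE, "BB"))
```

The point of the sketch is only that a third party needs a few lines of code, not manual copying from tables or PDFs, once the data is offered in a structured format.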
A Europe-wide Linked Open Data (LOD2) project ‘was launched in September 2010 and will run for four years. It addresses exploitation of the web as a platform for data and information integration, and the use of semantic technologies to make government data more useable.’
Looking for third-party APPs
Data Providers are looking at applications or mashups made with their data with much interest, and they are even sponsoring competitions and hack days (like Apps4EU) to stimulate the reuse of open data, especially from the public sector.
The most popular app creator and statistical storyteller is Hans Rosling with Gapminder. Rosling himself is a pioneer in fighting for open data.
Open Data, Linked Open Data and APIs are changing the dissemination paradigm of statistical agencies. More people with new skills will do new things. Coding is becoming the new literacy, says, for example, Garrett Heath in his advice for his unborn daughter: ‘I was blown away that the buzz is not around mobile apps, but rather around using APIs. Ten years ago saw the creation of the social networking platforms. The past five years has been about accumulating the data. The next five years and beyond will be about interpreting that data. [My daughter will have access to] a boatload of interesting data sitting in accessible databases that is waiting to be exposed and interpreted with her [the programmer’s] creativity.’
Storytelling with data
Storytelling based on data is less and less the domain of statistical agencies. Storytelling can draw on multiple (new) resources and take on new forms. To deliver an easily understandable and appealing presentation of statistical content, statistical institutions cannot avoid taking measures to improve their content and presentation. The “composer” must know how the music is to be played, that is, as a quick, competent, qualitatively unique, reliable and indispensable data source.
But this presentation job can no longer be done on one’s own: cooperative partnerships are necessary and have already begun to some extent, both with partners outside statistical institutions and between such institutions. This discussion has been launched.
And this: Many small open data give big data insights
FORGET BIG DATA, SMALL DATA IS THE REAL REVOLUTION, says Rufus Pollock, co-Director of the Open Knowledge Foundation: ‘… the discussions around big data miss a much bigger and more important picture: the real opportunity is not big data, but small data. Not centralized “big iron”, but decentralized data wrangling. Not “one ring to rule them all” but “small pieces loosely joined”.’