Big Data, Open Data and Official Statistics

There are (at least) two big challenges that official statistics will face in the next few years and that may change its quasi-monopolistic position.


On the input side it’s Big Data

‘“Big Data” is a term used to describe massive information stores – generally measured in petabytes and exabytes – and also refers to the methods and technologies used to analyze these large data volumes. The core principles of Big Data (data mining, analytics) have been around for some time, but recent technology has enabled the collection and analysis of previously unimaginable data volumes at extremely high speeds.’ So says SAP, for example, before giving some examples of how Big Data will change your life (big words, and they show how the big software and hardware players are beginning to occupy the field).

Official Statistics has already put this on the agenda, for instance at the United Nations Statistics Division’s (UNSD) Friday Seminar on Emerging Issues, 22 February 2013.

Some papers from this Seminar:

Gosse van der Veen, Statistics Netherlands. High-Level Group for the Modernisation of Statistical Production and Services. Big Data: Big Opportunity!


The High-Level Group for the Modernisation of Statistical Production and Services (HLG) established an informal Task Team of national and international experts, coordinated by the UNECE Secretariat. The paper by this group gives an excellent overview of the topic: What Does “Big Data” Mean for Official Statistics.



Andrew Wyckoff, Big Data for Policy, Development and Official Statistics, Directorate for Science, Technology & Industry. Organisation for Economic Co-operation and Development OECD (personal opinion).



Aspects of Big Data and real-time analytics are provided in another paper by Global Pulse (an innovation initiative launched by the Executive Office of the United Nations Secretary-General): Big Data for Development: Opportunities & Challenges



The discussion has been launched and, as the HLG paper mentions: ‘To use Big data, statisticians are needed with a different mind-set and new skills. The processing of more and more data for official statistics requires statistically aware people with an analytical mind-set, an affinity for IT (e.g. programming skills) and a determination to extract valuable ‘knowledge’ from data. These so-called “data scientists” can be derived from various scientific disciplines.’


On the output side it’s (Linked) Open Data in combination with APIs

Open Data is not at all a new topic for Official Statistics. National Statistical Institutes were forerunners in openly providing data; organizations like the UN or Eurostat went this way as well.

Several Open Data initiatives (USA, UK, France, EU …) consist mostly of data catalogues and are, in that sense, also public relations initiatives. A large part of the data they provide is statistical data that is often already available on the website of the National Statistical Institute concerned. The EU portal, for instance, offers 5716 statistical datasets out of a total of 5893, about 97% (as of April 2013).

Further central questions are the licensing of data as well as their availability in machine-readable formats.

Machine-readable statistical data, Application Programming Interfaces (APIs) to the data and especially Linked Open Data LOD (–> essentials, –>tutorial) open the way to creative applications and new models of presenting information.

A Europe-wide Linked Open Data (LOD2) project ‘was launched in September 2010 and will run for four years. It addresses exploitation of the web as a platform for data and information integration, and the use of semantic technologies to make government data more useable.’

Looking for third-party APPs

Data providers are watching applications and mashups made with their data with much interest, and they even sponsor competitions and hack days (like Apps4EU) to stimulate the reuse of open data, especially from the public sector.

The most popular APP creator and statistical storyteller is Hans Rosling with Gapminder. Rosling himself is a pioneer in fighting for open data.

Changing paradigms

Open Data, Linked Open Data and APIs are changing the dissemination paradigm of statistical agencies. More people with new skills will do new things. Coding is becoming the new literacy, says, for example, Garrett Heath in his advice for his unborn daughter: ‘I was blown away that the buzz is not around mobile apps, but rather around using APIs. Ten years ago saw the creation of the social networking platforms. The past five years has been about accumulating the data. The next five years and beyond will be about interpreting that data. [My daughter will have access to] a boatload of interesting data sitting in accessible databases that is waiting to be exposed and interpreted with her [the programmer’s] creativity.’

Storytelling with data

Storytelling based on data is less and less the domain of statistical agencies. Storytelling can access multiple (new) resources and take on new forms. To satisfy the basic idea of an easily understandable and appealing presentation of statistical content, statistical institutions cannot avoid taking certain measures to improve their content and presentation. The “composer” must know how the music is to be played, that is, as a quick, competent, qualitatively unique, reliable and indispensable data source.
But this presentation job can no longer be done on one’s own: cooperative partnerships are necessary and have already begun to some extent, both with partners outside statistical institutions and between such institutions. This discussion has been launched.

Statistical Storytelling revisited! More in a paper from IMAODBC Vilnius 2010:


And this: Many small open data give big data insights

FORGET BIG DATA, SMALL DATA IS THE REAL REVOLUTION says Rufus Pollock co-Director of the Open Knowledge Foundation : ‘… the discussions around big data miss a much bigger and more important picture: the real opportunity is not big data, but small data. Not centralized “big iron”, but decentralized data wrangling. Not “one ring to rule them all” but “small pieces loosely joined”.’


Country Portraits – Open and Embeddable

Looking for important statistical indicators of European countries? Comparing these countries? Taking the application to your own website? Making a brochure of it?

All this is provided by a newly designed application on Statistics Switzerland’s portal.




And download all countries as a brochure


Open Data

The source data (from Eurostat and Swiss Statistics) are available as an Excel file, so the data are open, and the app made from these data is open, too. It lets users select and embed indicators and also outputs all indicators as a PDF file. It may be embedded into third-party websites, and other people can write their own apps with the data.


App made with a CMS

This Portrait-App is one of several Apps of the same flavour. There are also portraits of the 26 Swiss Cantons, the biggest Cities and the (more than) 2500 Communes.


A Content Management System helps build these Portrait-Apps once the data are in the correct shape, and does so in a very short time (hours).

API and Apps: An example from official statistics

An example of an API access to statistical data

The U.S. Census Bureau  now offers some of its public data in machine-readable format. This is done via an Application Programming Interface (“API”).
Based on this API, an App has been developed that helps to query data from the Census 2010:

No data without legal clarification. The Census Bureau handles it as follows:

‘You may use the Census Bureau API to develop a service or service to search, display, analyze, retrieve, view and otherwise “get” information from Census Bureau data.
All services, which utilize or access the API, should display the following notice prominently within the application: “This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.” You may use the Census Bureau name in order to identify the source of API content subject to these rules. You may not use the Census Bureau name, or the like to imply endorsement of any product, service, or entity, not-for-profit, commercial or otherwise.’
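How an application might talk to such an API while honouring these terms can be sketched roughly as follows. This is a minimal illustration, not the Census Bureau's own code: the endpoint path, the variable code `P001001` and the header-row-plus-data-rows response shape are assumptions made for the example, and the sample rows are hand-written placeholders rather than real figures. Only the notice text is taken from the terms quoted above.

```python
from urllib.parse import urlencode

# Notice text quoted from the Census Bureau terms above.
NOTICE = ("This product uses the Census Bureau Data API but is not "
          "endorsed or certified by the Census Bureau.")

# Assumed endpoint, for illustration only.
BASE_URL = "https://api.census.gov/data/2010/dec/sf1"


def build_query_url(variables, geography, base_url=BASE_URL):
    """Return a query URL asking for `variables` at the given geography."""
    params = {"get": ",".join(variables), "for": geography}
    # Keep ',', ':' and '*' readable instead of percent-encoding them.
    return base_url + "?" + urlencode(params, safe=",:*")


def rows_to_records(payload):
    """Turn a header-row-plus-data-rows table into a list of dicts."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


# Hand-written sample in the assumed response shape (not real figures).
sample = [
    ["P001001", "NAME", "state"],
    ["1000", "State A", "01"],
    ["2500", "State B", "02"],
]

url = build_query_url(["P001001", "NAME"], "state:*")
records = rows_to_records(sample)

print(url)
for record in records:
    print(record["NAME"], record["P001001"])
print(NOTICE)  # an app would display this prominently
```

A real app would fetch `url` over HTTP and pass the decoded JSON to `rows_to_records`; the network call is left out here so that the sketch runs offline.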

Open Government Data Benchmark: FR, UK, USA

Finally there’s a very interesting comparison of OGD in three leading countries.

qunb did it. Have a look at this presentation.

1) There are lots of duplicates on OGD platforms


2) There are very few structured data yet



3) Apps are the real challenge

There are different strategies for fostering the development of Apps made with open data. The U.K. method seems to be one of the most productive.


The presentation in French

Linked Data: It’s not a top-down system. Berners-Lee and OpenGov

There’s not much noise about the Semantic Web these days. But in the fascinating and creative semantic-web niche, activities go on.

Once more, Tim Berners-Lee explains what Linked Data are.


The 5-star system helps to measure, for example, how far open-gov data are from being part of the Semantic Web.

1) Available on the web (whatever format), but with an open licence

2) Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)

3) As (2), plus a non-proprietary format (e.g. CSV instead of Excel)

4) All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff

5) All the above, plus: link your data to other people’s data to provide context
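Because each star builds on the previous ones, the scheme can be read as a simple cumulative check. The following is a minimal sketch of that reading; the `Dataset` class and its field names are hypothetical and were invented for this illustration, they are not part of any official rating tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    open_licence: bool          # on the web under an open licence
    machine_readable: bool      # structured data, not a scanned image
    open_format: bool           # non-proprietary format such as CSV
    uses_w3c_standards: bool    # RDF / SPARQL, so others can point at it
    links_to_other_data: bool   # links out to other people's data


def star_rating(ds: Dataset) -> int:
    """Count stars in order; each star requires all previous ones."""
    criteria = [
        ds.open_licence,
        ds.machine_readable,
        ds.open_format,
        ds.uses_w3c_standards,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


excel_table = Dataset(True, True, False, False, False)
csv_table = Dataset(True, True, True, False, False)
linked_data = Dataset(True, True, True, True, True)

print(star_rating(excel_table), star_rating(csv_table), star_rating(linked_data))
```

Stopping at the first unmet criterion is a design choice that mirrors the "all the above, plus" wording: a dataset without an open licence gets no stars, however well structured it is.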



And there is an interesting example of how Linked Data are published:

‘Linked data is data in which real-world things are given addresses on the web (URIs), and data is published about them in machine-readable formats at those locations. Other datasets can then point to those things using their URIs, which means that people using the data can find out more about something without that information being copied into the original dataset.

This page lists the sectors for which we currently publish linked data and some additional resources that will help you to use it. Most sectors have one or more SPARQL endpoints, which enable you to perform searches across the data; you can access these interactively on this site’.
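The idea in this quote, that one dataset points at a thing by its URI and the details live in another dataset, can be sketched with plain subject-predicate-object triples. This is only a toy model: all URIs below are hypothetical `example.org` addresses, and a real publisher would use RDF tooling and a SPARQL endpoint rather than Python lists.

```python
# All URIs below are hypothetical examples, not real published resources.
SCHOOL = "http://example.org/id/school/123"
DISTRICT = "http://example.org/id/district/7"
NAME = "http://example.org/def/name"
IN_DISTRICT = "http://example.org/def/inDistrict"
POPULATION = "http://example.org/def/population"

# Dataset A describes a school and only points at its district by URI.
dataset_a = [
    (SCHOOL, NAME, "Example Primary School"),
    (SCHOOL, IN_DISTRICT, DISTRICT),
]

# Dataset B, published elsewhere, holds the facts about the district.
dataset_b = [
    (DISTRICT, NAME, "Example District"),
    (DISTRICT, POPULATION, 52000),
]


def describe(uri, *datasets):
    """Collect every (predicate, object) pair stated about `uri`."""
    return [(p, o) for ds in datasets for (s, p, o) in ds if s == uri]


def follow(uri, predicate, *datasets):
    """Return the objects that `uri` points to via `predicate`."""
    return [o for (p, o) in describe(uri, *datasets) if p == predicate]


# Start from the school, follow the link, then look the district up
# in the other dataset without its details ever being copied into A.
district_uri = follow(SCHOOL, IN_DISTRICT, dataset_a)[0]
district_facts = dict(describe(district_uri, dataset_b))

print(district_facts[NAME], district_facts[POPULATION])
```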


What’s the effect of open data? Some newspapers (like the Guardian) make ample use of open data, but there is little sign of widespread activity or commitment, and few evaluation studies. InformationWeek just published an article about US open government and found that a lot remains to be done, as only small groups seem to take notice of this government activity. ‘The most difficult part of open government may be getting the public to participate. … the “if you build it, they will come” approach simply doesn’t work.’ (InformationWeek, Feb 21, 2011: Open Government Reality Check: Federal agencies are making progress on the Obama administration’s Open Government Directive, but there’s still a long way to go. Here’s our list of top priorities.)

Using facebook apps

As facebook captures more and more of the time users spend online, content providers are more often deciding to move to the continent of more than 500 million users.

Facebook allows this by  integrating apps. Lots of companies specialise in this field and provide facebook-app-development services.

A much discussed facebook app is that of the London School of Business and Finance (LSBF).

LSBF goes where they think people interested in this school are.

Why not follow this idea and put some statistical literacy topics on facebook?


And here are some stats about facebook:

Bye-bye Browser (?)

For more and more online users the device of choice is a mobile device, and for more and more of these users ‘Apps are the Web and the Web is Apps’.

Applications (Apps) for mobile devices can be downloaded and installed in seconds. These apps focus on specific needs, and perhaps half a dozen apps meet the daily online demands of you and me.

With Apple’s planned App store for laptop and desktop computers  these devices join this philosophy, too.  So what about the future of Websurfing using classic browsers? And what about the future of complex Websites offering many levels of browser navigation and tons of pages delivering information?

The discussion (the fight) is under way and the users will decide.

For information suppliers like statistical agencies this issue is of huge importance.

How to ensure the mission for public information and democracy given such developments in the online world?

– with traditional websites?
– with (small) Apps (or Widgets) with specific, user-focused information portions?
– or both (for how long)?
– with integration into existing Apps or platforms where people are, like facebook or Google?

There are already today some interesting developments in statistics’ dissemination giving partial answers.

So have a look at:

CBS iPhone App (search CBS Statline in the iPhone App store)

And also at some of the widgets, for example: