Big Data and Official Statistics


 

Big Data is THE topic of the freshly published Statistical Journal of the IAOS – Volume 31, issue 2.


Five articles deal with Big-Data topics:

In the editorial, Fride Eeg-Henriksen and Peter Hackl give an overview of the Big Data discussions held in official statistics. Here are some remarks taken from this editorial:

‘In spite of the wide interest in and the great popularity of Big Data, no clear and commonly accepted definition of the notion Big Data could be established so far [3]. Modern technological, social and economic developments including the growth of smart devices and infrastructure, the growing availability and efficiency of the internet, the appeal of social networking sites and the prevalence and ubiquity of IT systems are resulting in the generation of huge streams of digital data. The complexities of the structure and dynamic of corresponding datasets, the challenges in developing the suitable software tools for data analytics, generally the diversity of potentials in making use of the masses of available data make it difficult to find a suitable and generally applicable definition. The often mentioned characterization of Big Data by 3 – or more – Vs (volume, velocity, variety – as well as veracity and value), does not capture the enormous scope of the corresponding data sets and the extensive potentials of making use of these data. A highly relevant aspect is that Big Data are so large and complex that traditional database management tools and data processing applications are not feasible and efficient means. This is illustrated by a look at the categories of data sources which typically are seen in the context of Big Data: Such data sources may be
– Administrative, e.g., medical records, insurance records, bank records.
– Commercial transactions, e.g., credit card transactions, scanners in supermarkets.
– Sensors, e.g., satellite imaging, environmental sensors, road sensors.
– Tracking devices, e.g., tracking data from mobile telephones, GPS
– Tracks of human behaviour, e.g., online searches, online page viewing.
– Documentation of opinion, e.g., comments posted in social media.

……….

‘A general conclusion from the set of articles in this Special Section can be drawn as follows: The feasibility and the potentials of using Big Data in official statistics have to be assessed from case to case. In some areas the use of Big Data sources has already proved to be feasible. The choice of the appropriate IT technology and statistical methods must be specific for each situation. Also issues like the representativity and the quality of the resulting statistics, or the confidentiality and the risk of disclosure of personal data need to be assessed individually for each case. There is no doubt that Big Data will have a place in the future of official statistics, helping to reduce costs and burden on respondents. However, major efforts will be necessary to establish the routine wise use of Big Data, and new approaches will be needed for assessing all aspects of quality.’

[3] C. Reimsbach-Kounatze (2015), “The Proliferation of ‘Big Data’ and Implications for Official Statistics and Statistical Agencies: A Preliminary Analysis”, OECD Digital Economy Papers, No. 245, OECD Publishing. http://dx.doi.org/10.1787/5js7t9wqzvg8-en

 

See also: Big Data in Action May 2015

 

A new animated population pyramid for Germany 1950–2060

Today Destatis released a new projection of Germany’s population up to 2060, accompanied by an all-new animated population pyramid. It is the first population pyramid that really moves upwards.

poppyr-mobile

In case the above doesn’t display in your preferred language, here are the distinct links for English, French, Spanish, Russian and German.

The pasted screenshot is the mobile version that you will automatically see on small screens. There is much more to explore on larger displays: birth years are labeled directly, you can lock an outline for comparison, and there are four different variants to choose from, so that you can judge the outcome under different assumptions.

Apart from starting the animation with the (Play) button, you can navigate through the years with the mouse wheel, with the left/right cursor keys or, on touch devices, by swiping up or down directly on the pyramid.

Visual first – Visual.ONS

Visual representations of statistical data are attractive – and worth building a dedicated website with nothing but (info)graphs and maps … and more behind it!

ONS did it:

2015-04-25_VisualONS-livng

‘The Office for National Statistics (ONS) is the UK’s largest independent producer of official statistics and is the recognised national statistical institute for the UK. Visual.ONS is a website exploring new approaches to making ONS statistics accessible and relevant to a wide public audience. The site supports the UK Statistics Authority’s publicly stated intention of “making data, statistics and analysis more accessible, engaging and easier to understand”.
The site will be a home to a variety of different content, including infographics, interactive visualisations and short analysis, exploring data from a range of ONS outputs. It is neither a replacement nor a rebuild of the current ONS website which continues to be the home of ONS’ regular outputs and statistics.’

So much for the ONS’s own statement.


More than pictures

Behind the graphs you can find lots of interactive tools.
A calculator to find out life expectancy is one example:

2015-04-25_lvecalclator

Great! And the graphs and interactive tools can be embedded into other websites.

Thematic Maps Revisited

A month ago ONS did an excellent job in hosting The Graphical Web 2014 conference in Winchester/UK under the theme “Visual Storytelling”. You should check the summary at the conference website and go from there.

It was a great event to meet people from a variety of backgrounds, e.g. academia, the media and of course NSIs (Statistics Norway, Statistics Austria to name just a few).

All presentations were taped and are being released as they are processed. Let me take this opportunity to pitch my own presentation of the German Census Map.

I guess we’ll be talking about this conference for a while, and many presentations worth watching have already been posted. For example, Alan Smith from the ONS shows us why learning to program graphics is a worthwhile endeavour for statisticians who wouldn’t regard themselves as programmers.

His presentation discusses the practicalities of developing that capability in-house as a key part of the corporate skills agenda. It borrows heavily from examples and lessons learned from the 7-year lifespan of the ONS Data Visualisation Centre.

Social Media Hub

Complexity

In earlier times it was so clear and easy: you got a print publication or a letter with the latest news from a statistical agency.

Then came the Web, and you switched over more and more to this information source, with links, bookmarks and RSS feeds.

And then came social media, and you had to monitor more and more information sources, with the risk of getting definitively lost among social media and information offerings.


Help is around the corner: Social Media Hub

The example of the IMF shows how agencies are helping their customers to stay on top of their information and not to lose anything published in one of these multiple social media channels. The Social Media Hub! And it’s the good old Web giving you this overview!

The IMF SM Hub presents 8 social media channels with a direct view in a window, plus links to some more media:

  1. Twitter
  2. Facebook
  3. Google+
  4. IMF Blogs
  5. Weibo
  6. YouTube
  7. Flickr
  8. Podcasts
  9. LinkedIn
  10. IMF Apps and
  11. IMF RSS Feeds

Media Hub

But what about the other resources? I am waiting for the Media Hub that brings all possible channels together (like Web, publications …). Will I get it from my postman?

 

Making Statistical Data Meaningful

Part 4 of UNECE’s series “Making Data Meaningful” is about to be published in 2014.

Its title: ‘A Guide to Statistical Literacy’.

‘The 4th installment of “Making Data Meaningful” series outlines current initiatives in the field of statistical literacy and provides recommendations for working with different groups of users on improving their ability to comprehend statistical information. The guide is intended as a practical tool for managers, statisticians, and communication and media relations officers in statistical organizations.’


‘The Making Data Meaningful guides are intended as a practical tool to help managers, statisticians and media relations officers in statistical organizations use text, tables, charts, maps and other devices to bring statistics to life for non-statisticians.’

All parts can be found here, in English, Spanish, Croatian and Japanese (Part 4 as final draft only, June 2014):

UNECE-MDM

For Everyone

The World Wide Web’s birthday! (webat25.org) And a greeting address from Tim Berners-Lee:

2014-03-13_webat25-video

‘By working together, I believe we can build a Web that truly is for everyone: one that is accessible to all, from any device, and one that empowers all of us to achieve our dignity, rights and potential as humans. Let’s use this landmark birthday as a crucial step on that path.’ (T. B-L)

World Wide! World Wide?


‘the actions of some companies and some governments threaten our fundamental freedoms on the Web.’ https://webwewant.org

A segmented WWW is no longer a WWW.

 

Some events in WWW’s curriculum

To celebrate the 25th year of the Web, an author on the Opera blog has assembled 25 facts he thinks are of interest:

1989: Tim Berners-Lee and the team at CERN invented the World Wide Web.

1990: The Archie Search Engine was created at McGill University. It is considered to be the first internet search engine.

1991: The first ever website went live.

1992: The phrase surfing the internet was coined by Jean Armour Polly.

1993: The Mosaic web browser, often described as the first graphical web browser, was launched.

1994: Yahoo! was launched. It first came out as a web directory.

1995: Opera was born! We turned 18 years old last year.

1996: Nokia released its first phone with internet access, the Nokia 9000 Communicator.

1997: BabelFish, the first automatic-translation application, was launched.

1998: The Google search engine was born.

1999: Napster was launched, changing the way we find and consume music online.

2000:  By 2000, over 20 million websites were up and running.

2001: The first Wikipedia article was published.

2002: Social-networking site Friendster was launched. Today, Friendster now runs as a social-gaming site.

2003: Skype, a voice and video-calling service, was released.

2004: Facebook went online.

2005: The first ever video on YouTube was uploaded. It has a guy and a zoo in it.

2006: Twttr was launched. It’s now called Twitter today and is one of the most used social media services.

2007: Apple released the iPhone, changing the way people use mobile browsers.

2008: Dropbox was launched at TechCrunch50.

2009: WhatsApp, a cross-platform mobile messaging app, was launched.

2010: Instagram was launched. Now, we could finally take pictures of our food.

2011: Google+ was released. It was first launched as an invite-only service.

2012: More than 115,000 websites participated in the largest online protest in history. It was a protest against internet censorship bills SOPA and PIPA.

2013: Internet.org, a project by Facebook in partnership with Opera and other technology companies, was announced. Its aim is to connect the next 5 billion people online.

2014: The year has just begun! What do you think is the most important event so far?

Frightfully Boring? Not at all!

Statistical information is frightfully boring, it doesn’t regard me as a person! Yes and no. Yes, official statistics is not interested in a single person; data protection forbids this. But no, on an aggregated level we can find a lot of knowledge about our own situation. Interactive applications offer this.

There’s a quite new one from the UNESCO Institute for Statistics (UIS). ‘UNESCO is making data count for the millions of children still being denied their right to education by benchmarking and monitoring global progress on education-related Millennium Development Goals and Education for All targets.’
Mind the Gap, a new online tool, highlights the situation of girls and women in education.

2013-10-20_unesco.

ABS Census Spotlight

But my favourite from a presentation point of view is still the Australian Census Spotlight, the new version with a personal infographic and social media link. Why? Perhaps it’s because there’s a speaker helping me navigate the information and leading me to some insights about the group of persons I belong to.

2013-10-20_censussptlight

Big Data, Open Data and Official Statistics

There are (at least) two big challenges that official statistics will face in the next few years and that may change its quasi-monopolistic position.


On the input side it’s Big Data

‘“Big Data” is a term used to describe massive information stores – generally measured in petabytes and exabytes – and also refers to the methods and technologies used to analyze these large data volumes. The core principles of Big Data (data mining, analytics) have been around for some time, but recent technology has enabled the collection and analysis of previously unimaginable data volumes at extremely high speeds.’ So says SAP, for example, which gives some examples of how Big Data will change your life (big words, and they show how big software and hardware players are beginning to occupy the field).

Official statistics has already put this on the agenda! And so has the United Nations Statistics Division (UNSD) in its Friday Seminar on Emerging Issues, 22 February 2013.

Some papers from this Seminar:

Gosse van der Veen, Statistics Netherlands, High-Level Group for the Modernisation of Statistical Production and Services: Big Data: Big Opportunity!

2013-04-15_vanderveen-statcom2013

The High-Level Group for the Modernisation of Statistical Production and Services (HLG) established an informal Task Team of national and international experts, coordinated by the UNECE Secretariat. This group’s paper gives an excellent overview of the topic: What Does “Big Data” Mean for Official Statistics?

2013-04-15_HLG-BIGData-Paper


Andrew Wyckoff, Directorate for Science, Technology & Industry, Organisation for Economic Co-operation and Development (OECD): Big Data for Policy, Development and Official Statistics (personal opinion).

2013-04-15_BigDataRoles-WykoffOECD


Aspects of Big Data and real-time analytics are covered in another paper, by Global Pulse (an innovation initiative launched by the Executive Office of the United Nations Secretary-General): Big Data for Development: Opportunities & Challenges

2013-04-15_globalpulse


The discussion has been launched and, as the HLG paper mentions: ‘To use Big data, statisticians are needed with a different mind-set and new skills. The processing of more and more data for official statistics requires statistically aware people with an analytical mind-set, an affinity for IT (e.g. programming skills) and a determination to extract valuable ‘knowledge’ from data. These so-called “data scientists” can be derived from various scientific disciplines.’


On the output side it’s (Linked) Open Data in combination with APIs

Open Data is not at all a new topic for official statistics. National Statistical Institutes were forerunners in openly providing data; organizations like the UN or Eurostat went this way as well.

Several Open Data initiatives (USA, UK, France, EU …) consist mostly of data catalogues, and are in that sense also public relations initiatives. A large part of the data provided this way consists of statistical data that is often already available on the website of the National Statistical Institute concerned. The EU portal, for instance, offers 5716 datasets of statistical data out of a total of 5893 (as of April 2013).
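
The portal figures quoted above can be turned into a share with a little arithmetic. A minimal Python sketch, using only the two counts given in the text (the variable names are merely illustrative):

```python
# Counts quoted in the text for the EU open data portal (as of April 2013).
statistical_datasets = 5716
total_datasets = 5893

# Share of statistical datasets, and the remainder from other sources.
share = statistical_datasets / total_datasets
other_datasets = total_datasets - statistical_datasets

print(f"Statistical datasets: {share:.1%} of the catalogue")
print(f"Other datasets: {other_datasets}")
```

In other words, roughly 97% of the catalogue consisted of statistical data at that time, with fewer than 200 datasets coming from elsewhere.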

Further central questions are the licensing of data, as well as their availability in machine-readable formats.

Machine-readable statistical data, Application Programming Interfaces (APIs) to the data and especially Linked Open Data (LOD) (→ essentials, → tutorial) open the way to creative applications and new models of presenting information.
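
To make the idea of machine-readable data a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical JSON payload of the kind a statistical API might return; the field names (`indicator`, `unit`, `observations`, `year`, `value`) and all figures are invented for illustration and do not come from any real agency's API.

```python
import json

# Hypothetical payload: structure and numbers are made up for illustration only.
payload = """
{
  "indicator": "population_total",
  "unit": "thousand persons",
  "observations": [
    {"year": 2010, "value": 1000.0},
    {"year": 2011, "value": 1010.0},
    {"year": 2012, "value": 1025.0}
  ]
}
"""


def yearly_change(observations):
    """Return (year, absolute change vs. previous year) pairs, sorted by year."""
    ordered = sorted(observations, key=lambda o: o["year"])
    return [
        (cur["year"], cur["value"] - prev["value"])
        for prev, cur in zip(ordered, ordered[1:])
    ]


data = json.loads(payload)
changes = yearly_change(data["observations"])

for year, delta in changes:
    print(f"{data['indicator']} {year}: {delta:+.1f} {data['unit']}")
```

Because the data arrive in a structured format, a third party can compute and present derived figures without any manual transcription, which is what makes apps and mashups feasible in the first place.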


A Europe-wide Linked Open Data (LOD2) project ‘was launched in September 2010 and will run for four years. It addresses exploitation of the web as a platform for data and information integration, and the use of semantic technologies to make government data more useable.’

Looking for third-party APPs

Data providers are watching applications and mashups made with their data with great interest, and they are even sponsoring competitions and hack days (like Apps4EU) to stimulate the reuse of open data, especially from the public sector.

The most popular app creator and statistical storyteller is Hans Rosling with Gapminder. Rosling himself is a pioneer in fighting for open data.

http://www.youtube.com/watch?feature=player_embedded&v=jbkSRLYSojo

Changing paradigms

Open Data, Linked Open Data and APIs are changing the dissemination paradigm of statistical agencies. More people with new skills will do new things. Coding is becoming the new literacy, says, for example, Garrett Heath in his advice for his unborn daughter: ‘I was blown away that the buzz is not around mobile apps, but rather around using APIs. Ten years ago saw the creation of the social networking platforms. The past five years has been about accumulating the data. The next five years and beyond will be about interpreting that data. [My daughter will have access to] a boatload of interesting data sitting in accessible databases that is waiting to be exposed and interpreted with her [the programmer’s] creativity.’

Storytelling with data

Storytelling based on data is less and less the domain of statistical agencies. Storytelling can access multiple (new) resources and take on new forms.  To satisfy the basic idea of an easily understandable and appealing presentation of statistical content, statistical institutions cannot avoid taking certain measures to improve their content and presentation. The “composer” must know how the music is to be played, that is as a quick, competent, qualitatively unique, reliable and indispensable data source.
But this presentation job can no longer be done on one’s own: cooperative partnerships are necessary and have already begun to some extent, both with partners outside statistical institutions and between such institutions. This discussion has been launched.

Statistical Storytelling revisited! More in a paper from IMAODBC Vilnius 2010:

2013-04-20_storytellingrevisited2010.

And this: Many small open data give big data insights

“FORGET BIG DATA, SMALL DATA IS THE REAL REVOLUTION”, says Rufus Pollock, co-Director of the Open Knowledge Foundation: ‘… the discussions around big data miss a much bigger and more important picture: the real opportunity is not big data, but small data. Not centralized “big iron”, but decentralized data wrangling. Not “one ring to rule them all” but “small pieces loosely joined”.’
