There is no New Thing under the Sun – Yes and No

Twitter reminded me that there’s #NTTS2017 going on, Eurostat’s biennial scientific conference on New Techniques and Technologies for Statistics (NTTS).

The opening session also focused on official statistics and its current and future role in a world of data deluge and alternative facts. What will official statistics be in 30 years?
In Diego Kuonen’s presentation and discussion on ‘Big Data, Data Science, Machine Intelligence and Learning’, I heard an answer to this question that reminded me of a passage in the Bible (Ecclesiastes 1:9): “… that [thing] which is done is that which shall be done: and there is no new thing under the sun”.
This is to be understood not in a static but in a dynamic sense:
The work statistical institutions are doing today will be the same work they do tomorrow … BUT adapted to a changing context.
The algorithms (understood in a broader sense as ‘a set of rules that precisely defines a sequence of operations’) used in collecting, analyzing and disseminating data will change, and manual work will (and must) be replaced by automation and robots. But the core role will remain: being a trusted source of data-based information, produced transparently in all operations, that serves professional decision-making.
The challenge will be to ensure that these institutions
– are known,
– are noted for their veracity,
– are consulted,
and, with all this, can play their role.
In this fight to be heard, humans will always play a decisive part.
That, as I understood it, is a clear message from a data scientist looking ahead.
PS: One step towards automation is preparing and using linked data. See the NTTS 2017 satellite session “Hands-on workshop on Linked Open Statistical Data (LOD)”.

And now: Semantic Statistics (SemStats)

Official Statistics has a long tradition of creating and providing high-quality metadata. And the Semantic Web needs just this: metadata!

So it’s not surprising that the two are coming together more and more.
A special workshop will be organized during the 12th International Semantic Web Conference (ISWC), 21-25 October 2013, Sydney, Australia.

It is the 1st International Workshop on Semantic Statistics (SemStats 2013), organized by Raphaël Troncy (EURECOM), Franck Cotton (INSEE), Richard Cyganiak (DERI), Armin Haller (CSIRO) and Alistair Hamilton (ABS).

‘ISWC 2013 is the premier international forum for the Semantic Web / Linked Data Community. Here, scientists, industry specialists, and practitioners meet to discuss the future of practical, scalable, user-friendly, and game-changing solutions.’

The workshop summary

How to publish linked statistics? And: How to use linked data for statistics? These are the key questions of this workshop.

‘The goal of this workshop is to explore and strengthen the relationship between the Semantic Web and statistical communities, to provide better access to the data held by statistical offices. It will focus on ways in which statisticians can use Semantic Web technologies and standards in order to formalize, publish, document and link their data and metadata.

The statistics community sometimes faces challenges when trying to adopt Semantic Web technologies, in particular:

  • difficulty to create and publish linked data: this can be alleviated by providing methods, tools, lessons learned and best practices, by publicizing successful examples and by providing support.
  • difficulty to see the purpose of publishing linked data: we must develop end-user tools leveraging statistical linked data, provide convincing examples of real use in applications or mashups, so that the end-user value of statistical linked data and metadata appears more clearly.
  • difficulty to use external linked data in their daily activity: it is important to develop statistical methods and tools especially tailored for linked data, so that statisticians can get accustomed to using them and get convinced of their specific utility.’
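To make the first key question (how to publish linked statistics?) a little more concrete, here is a minimal sketch, using only the Python standard library, of how a small statistical table might be written as N-Triples using the W3C RDF Data Cube vocabulary (`qb:`) together with the SDMX-RDF dimension and measure terms. The base URI `http://example.org/stats/`, the dataset name and the sample figures are hypothetical placeholders, and a complete cube would also need a data structure definition, which is left out here.

```python
# Minimal sketch: write a small statistical table as RDF Data Cube
# observations in N-Triples, using only the standard library.
# BASE, the dataset name and the sample figures are hypothetical placeholders.

QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/stats/"  # hypothetical publisher namespace


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def typed_literal(value: str, datatype: str) -> str:
    """Build an XSD-typed literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}{datatype}>'


def to_ntriples(dataset: str, rows: list[tuple[str, int, int]]) -> list[str]:
    """Turn (area code, year, value) rows into one qb:Observation each.

    A complete cube would also link the dataset to a
    qb:DataStructureDefinition via qb:structure; that part is omitted here.
    """
    ds = f"{BASE}dataset/{dataset}"
    triples = [f"{iri(ds)} {iri(RDF_TYPE)} {iri(QB + 'DataSet')} ."]
    for area, year, value in rows:
        obs = f"{ds}/obs/{area}/{year}"
        triples += [
            f"{iri(obs)} {iri(RDF_TYPE)} {iri(QB + 'Observation')} .",
            f"{iri(obs)} {iri(QB + 'dataSet')} {iri(ds)} .",
            f"{iri(obs)} {iri(SDMX_DIM + 'refArea')} {iri(BASE + 'area/' + area)} .",
            f"{iri(obs)} {iri(SDMX_DIM + 'refPeriod')} {typed_literal(str(year), 'gYear')} .",
            f"{iri(obs)} {iri(SDMX_MEASURE + 'obsValue')} {typed_literal(str(value), 'integer')} .",
        ]
    return triples


if __name__ == "__main__":
    # Placeholder values, not real statistics.
    for line in to_ntriples("population", [("CH", 2012, 100), ("FR", 2012, 200)]):
        print(line)
```

Each observation gets its own URI built from its dimension values, so other publishers can link to a single cell of the table; this is the kind of "formalize, publish, document and link" step the workshop summary has in mind, shown here in its simplest form.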

A tradition

RDF, triples, linked data … statisticians have already taken up and adapted these topics, but mostly as individuals rather than as organizations.

This blog has published a lot of information about the Semantic Web and official statistics: about 40 posts since 2007.

See this post (2012) with a recent paper from Statistics Switzerland (where a study on publishing linked data has just been finished in collaboration with the Bern University of Applied Sciences): https://blogstats.wordpress.com/2012/10/15/imaodbc-2012-and-the-winner-is/

Or this post (2009) about SDMX and RDF: https://blogstats.wordpress.com/2009/10/27/sdmx-and-rdf-getting-acquainted/ or this one about LOD activities in 2009: https://blogstats.wordpress.com/2009/04/25/semantic-web-and-official-statistics/

Big Data, Open Data and Official Statistics

There are (at least) two big challenges that official statistics will face in the next few years, and they may change its quasi-monopolistic position.


On the input side it’s Big Data

‘“Big Data” is a term used to describe massive information stores – generally measured in petabytes and exabytes – and also refers to the methods and technologies used to analyze these large data volumes. The core principles of Big Data (data mining, analytics) have been around for some time, but recent technology has enabled the collection and analysis of previously unimaginable data volumes at extremely high speeds.’ So says SAP, for example, which also gives some examples of how Big Data will change your life (big words, and they show how the big software and hardware players are beginning to occupy the field).

Official Statistics has already put this on the agenda! And so has the United Nations Statistics Division (UNSD), with its Friday Seminar on Emerging Issues on 22 February 2013.

Some papers from this Seminar:

Gosse van der Veen (Statistics Netherlands; High-Level Group for the Modernisation of Statistical Production and Services): Big Data: Big Opportunity!


The High-Level Group for the Modernisation of Statistical Production and Services (HLG) established an informal Task Team of national and international experts, coordinated by the UNECE Secretariat. The paper from this group gives an excellent overview of the topic: What Does “Big Data” Mean for Official Statistics?



Andrew Wyckoff, Big Data for Policy, Development and Official Statistics, Directorate for Science, Technology & Industry, Organisation for Economic Co-operation and Development (OECD) (personal opinion).



Aspects of Big Data and real-time analytics are discussed in another paper by Global Pulse (an innovation initiative launched by the Executive Office of the United Nations Secretary-General): Big Data for Development: Opportunities & Challenges



The discussion has been launched, and as the HLG paper notes: ‘To use Big data, statisticians are needed with a different mind-set and new skills. The processing of more and more data for official statistics requires statistically aware people with an analytical mind-set, an affinity for IT (e.g. programming skills) and a determination to extract valuable ‘knowledge’ from data. These so-called “data scientists” can be derived from various scientific disciplines.’


On the output side it’s (Linked) Open Data in combination with APIs

Open Data is not at all a new topic for Official Statistics. National Statistical Institutes were forerunners in providing data openly; organizations like the UN and Eurostat went this way as well.

Several Open Data initiatives (USA, UK, France, EU …) consist mostly of data catalogues and are, in that sense, also public relations initiatives. A large part of the data they provide consists of statistical data that are often already available on the website of the National Statistical Institute concerned. The EU portal, for instance, offers 5716 datasets of statistical data out of a total of 5893 (as of April 2013).

Other central questions are the licensing of data and its availability in machine-readable formats.

Machine-readable statistical data, Application Programming Interfaces (APIs) to the data and especially Linked Open Data (LOD) open the way to creative applications and new models of presenting information.


A Europe-wide Linked Open Data project (LOD2) ‘was launched in September 2010 and will run for four years. It addresses exploitation of the web as a platform for data and information integration, and the use of semantic technologies to make government data more useable.’

Looking for third-party apps

Data providers are following the applications and mashups made with their data with great interest, and they are even sponsoring competitions and hack days (like Apps4EU) to stimulate the reuse of open data, especially from the public sector.

The most popular app creator and statistical storyteller is Hans Rosling with Gapminder. Rosling himself is a pioneer in the fight for open data.

http://www.youtube.com/watch?feature=player_embedded&v=jbkSRLYSojo

Changing paradigms

Open Data, Linked Open Data and APIs are changing the dissemination paradigm of statistical agencies. More people with new skills will do new things. Coding is becoming the new literacy, as Garrett Heath, for example, says in his advice for his unborn daughter: ‘I was blown away that the buzz is not around mobile apps, but rather around using APIs. Ten years ago saw the creation of the social networking platforms. The past five years has been about accumulating the data. The next five years and beyond will be about interpreting that data. [My daughter will have access to] a boatload of interesting data sitting in accessible databases that is waiting to be exposed and interpreted with her [the programmer’s] creativity.’

Storytelling with data

Storytelling based on data is less and less the domain of statistical agencies. Storytelling can draw on multiple (new) resources and take on new forms. To achieve the basic goal of an easily understandable and appealing presentation of statistical content, statistical institutions cannot avoid taking certain measures to improve their content and presentation. The “composer” must know how the music is to be played, that is, how to perform as a quick, competent, qualitatively unique, reliable and indispensable data source.
But this presentation job can no longer be done on one’s own: cooperative partnerships are necessary and have already begun to some extent, both with partners outside statistical institutions and between such institutions. This discussion has been launched.

Statistical Storytelling revisited! More in a paper from IMAODBC Vilnius 2010.

And this: Many small open data give big data insights

“Forget big data, small data is the real revolution,” says Rufus Pollock, co-director of the Open Knowledge Foundation: ‘… the discussions around big data miss a much bigger and more important picture: the real opportunity is not big data, but small data. Not centralized “big iron”, but decentralized data wrangling. Not “one ring to rule them all” but “small pieces loosely joined”.’


Official Statistics’ SWOT

In the official statistics industry (an industry!) reflection and  collaboration are highly prioritized.

As an example: HLG-BAS.

What’s this? ‘The High-Level Group for Strategic Developments in Business Architecture in Statistics (HLG-BAS) was set up by the Bureau of the Conference of European Statisticians in 2010 to oversee and coordinate international work relating to the development of enterprise architectures within statistical organisations.’ More about HLG-BAS on the UNECE statistics wikis.

And more about the Conference of European Statisticians (CES).

Implement the HLG-BAS vision

HLG-BAS presents a very interesting paper for the 60th plenary session of the Conference of European Statisticians. It’s the ‘Strategy to implement the vision of the High-level Group for Strategic Developments in Business Architecture in Statistics’.

This paper positions official statistics as part of the information industry:
‘The official statistics industry is part of a more extensive information industry. Within this wider information industry other players are claiming their place and statistical organisations cannot automatically assume that they will retain their current position and relevance.’ (point 5)

SWOT

The paper also summarizes, in a short and impressive manner, the Strengths, Weaknesses, Opportunities and Threats of Official Statistics (point 9).

‘A SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis was undertaken by Capgemini Consulting working for Statistics Netherlands to define the current situation of the official statistics industry assessing it from an international perspective. This analysis was based on existing information on the industry (including the vision of the HLG-BAS) complemented by interviews with members of the HLG-BAS (internal stakeholders), commercial organisations and government bodies (external stakeholders).

The results of this exercise are:

1. Strengths

(a) High quality with relevant and very strong statistical products over long term;
(b) Strong “brand value” of official statistics locally and internationally;
(c) Ability and ‘stamina’ to produce statistics for long-term records and consistency;
(d) International collaboration has started mainly because it is becoming too expensive for each NSO to individually change their tailor-made production processes and products.

2. Weaknesses

(a) A limited outside and “client-centric” view;
(b) Communication of products and results is often not good enough;
(c) Workforce and processes should be more agile to follow rapidly the changing needs of society;
(d) NSOs are not efficient enough in their processes and rely too much on human effort;
(e) The statistical industry as a whole has no clear silhouette or definition; international coherence is low;
(f) NSOs should provide more information about statistics, regarding both quality and other metadata;
(g) Top-level commitment to bring about the changes needed to align the statistical industry with the changing environment is not broadly understood as the key factor in this change process.

3. Opportunities

(a) In some specific statistical domains, cross-border data become more important (globalisation, enterprise groups, climate change). The work and products of NSOs should be expanded to explain what is happening on a multinational level;
(b) The “open data” movement may increase the sources available for official statistics;
(c) NSOs could collaborate (more) with (commercial) external parties;
(d) The official statistics industry could play a more active role regarding new and alternative data sources and collection methods;
(e) NSOs could be quality institutes that certify statistical inputs/outputs of other (commercial) parties;
(f) In the statistical domain the NSOs can lead when it comes to defining and maintaining international standards;
(g) Standardisation of production process (plug and play technology) and products of NSOs to increase international comparison and quality control of products;
(h) Consolidation of NSOs’ roles as public supplier of trust and quality;
(i) International coherence and the willingness to form a more closely knit statistical community or industry are beginning to materialize;
(j) Specialisation of NSOs in certain products to increase efficiency in the production process of these products. This specialisation in products could vary across countries and sectors to optimize the possibilities of specialisation.

4. Threats

(a) Other organisations are starting to create output NSOs used to have a monopoly on;
(b) Reduced staff and budget cuts;
(c) Weak/fragile coordination of international collaboration activities;
(d) Society wants more timeliness in statistics, both in disseminating existing products and in developing new products;
(e) Some government clients do not distinguish between official and non-official data sources for ad hoc questions, as long as it meets their purpose;
(f) New technologies like open data can seduce NSOs into losing focus of their core business.’

Official Statistics: Identify Common Challenges

In his blog, Director Groves of the US Census Bureau reports on an important discussion among his colleagues (thanks, Xavier, for the hint):

‘Several weeks ago, at the initiative of Brian Pink, the Australian statistician, leaders of the government statistical agencies from Australia, Canada, New Zealand, United Kingdom, and the United States held a summit meeting to identify common challenges and share information about current initiatives. …

… They perceive the same likely future challenges for central government statistical agencies, and they are making similar organizational changes to prepare for the future. …

Ingredients of the future vision:

  1. The volume of data generated outside the government statistical systems is increasing much faster than the volume of data collected by the statistical systems; almost all of these data are digitized in electronic files.
  2. As this occurs, the leaders expect that relative cost, timeliness, and effectiveness of traditional survey and census approaches of the agencies may become less attractive.
  3. Blending together multiple available data sources (administrative and other records) with traditional surveys and censuses (using paper, internet, telephone, face-to-face interviewing) to create high quality, timely statistics that tell a coherent story of economic, social and environmental progress must become a major focus of central government statistical agencies.
  4. This requires efficient record linkage capabilities, the building of master universe frames that act as core infrastructure to the blending of data sources, and the use of modern statistical modeling to combine data sources with highest accuracy.
  5. Agencies will need to develop the analytical and communication capabilities to distill insights from more integrated views of the world and impart a stronger systems view across government and private sector information.
  6. There are growing demands from researchers and policy-related organizations to analyze the micro-data collected by the agencies, to extract more information from the data.

… In short, the five countries are actively inventing a future unlike the past, requiring new ways of thinking and calling for new skills. The payoff sought is timelier, more trustworthy, and lower cost statistical information measuring new components of the society, economy, and environment, telling a richer story of our countries’ progress.’

Read the full blog post here: http://directorsblog.blogs.census.gov/2012/02/02/national-statistical-offices-independent-identical-simultaneous-actions-thousands-of-miles-apart/
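Ingredient 4 of the vision above calls for "efficient record linkage capabilities". As a rough illustration only, here is a minimal Python sketch of the simplest form, deterministic (exact-key) linkage between a survey extract and an administrative register. The field names (`name`, `birth_date`, `id`, `reg_id`) and the choice of key are hypothetical assumptions, not the method of any of the five agencies.

```python
import unicodedata

# Minimal sketch of deterministic (exact-key) record linkage between a
# survey extract and an administrative register. Field names ("name",
# "birth_date", "id", "reg_id") are hypothetical.


def normalize(text: str) -> str:
    """Lower-case, strip accents and collapse whitespace ("Müller " -> "muller")."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def link_key(record: dict) -> tuple[str, str]:
    """Linkage key: normalized name plus ISO date of birth."""
    return normalize(record["name"]), record["birth_date"]


def link(survey: list[dict], register: list[dict]) -> tuple[list, list]:
    """Return (matched pairs, unmatched survey records).

    A survey record is linked only if exactly one register record shares its
    key; zero or several candidates are left for clerical review.
    """
    index: dict[tuple[str, str], list[dict]] = {}
    for rec in register:
        index.setdefault(link_key(rec), []).append(rec)

    matched, unmatched = [], []
    for rec in survey:
        candidates = index.get(link_key(rec), [])
        if len(candidates) == 1:
            matched.append((rec, candidates[0]))
        else:
            unmatched.append(rec)
    return matched, unmatched
```

Exact matching on one key is only a crude baseline: it misses records with typos or changed names. Probabilistic linkage in the tradition of Fellegi and Sunter instead weighs agreement and disagreement across several fields, which is closer to the "modern statistical modeling to combine data sources" the leaders mention.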

Snapshots from the Census Years

Recently the ONS published some new interactive content aimed at bringing together visualisation, narrative, audio, data table, images and animation into a single compact product. The Snapshots from the Census Years product is, in one sense, a logical continuation of the storytelling themes Armin talked about in his recent posting.

Snapshots from the Census Years: integrating content

As we move from print to web, it becomes clear that we need not think of visualisation/narrative/data as such separate constructs.  Modern web content should allow us to integrate these forms so they can be authored together as a powerful composite product:  Clicking on a story in the text highlights the appropriate part of the graph.  Enabling audio allows you to follow the text while looking at the graph.  Each component allows and encourages further exploration of the other components.  Although only a small product aimed at promoting our 2011 Census, it really does suggest there is more scope for integrated outputs from official statistics producers.

Closely watched Office and much-debated GDP

Official Statistics are being watched more and more closely by blogs. The Guardian Data Blog did so yesterday with the ONS GDP data delay.

This was the occasion to relaunch the discussion about GDP and to show some alternatives, like ranking countries by measures such as wellbeing or happiness.

Different ways of ranking the G20 countries (see also this earlier post about “A happy GDP”)