Goodbye Statistical Yearbooks …. ?

The end of traditional statistical yearbooks, whether printed or as e-books, is gradually approaching.

The German yearbook is the latest to go. Its last edition was bid farewell at a press conference on 30 October 2019:
“Digitisation is shaping the statistics of the 21st century. The expansion of our digital communication is necessary if we want to remain the leading provider of statistical information about Germany. We say goodbye to the Statistical Yearbook, which stood for our activities for almost seven decades. The yearbook goes, but the data remains. They are already available via our online services in greater abundance than ever before. ….
One thing is clear: Rigid reference books are hardly in demand today. The trend is towards up-to-date, digitally available information. The information is researched online.”
Source: Press conference destatis, 30 October 2019. Original in German.

Digi…. ?

The rationale for abolishing printed yearbooks is always the same: digit(al)ization continues, users have new needs and go online.

The three D’s.

What is meant by digit(al)isation?

“Digitization essentially refers to taking analog information and encoding it into zeroes and ones so that computers can store, process, and transmit such information. …
We refer to digitalization as the way in which many domains of social life are restructured around digital communication and media infrastructures.”
Instead, … digital transformation … refers to the customer-driven strategic business transformation that requires cross-cutting organizational change as well as the implementation of digital technologies.

In the final analysis, therefore, we digitize information, we digitalize processes and roles that make up the operations of a business, and we digitally transform the business and its strategy. Each one is necessary but not sufficient for the next, and most importantly, digitization and digitalization are essentially about technology, but digital transformation is not. Digital transformation is about the customer.

Source: Forbes

Digital alternatives

There is no doubt that the Internet now comes first as a source of information: the first step is not to the bookshelf but to the smartphone, tablet or PC.
Digitization has taken place: everything is available in binary form.
Digitalization has arrived too, in the form of digital information and communication channels: statistical institutions run comprehensive websites, some with more and some with less sensitive user guidance, as well as many interactive databases.
Once users reach these sources, some work awaits them: they have to find their way around and search databases before a table, a file or a simple web page appears on the (often too small) screen.

Table-based yearbooks

With the end of the German Statistical Yearbook, there is a comprehensive alternative for its content: more tables, graphs and methodological explanations can be found on the web, though scattered rather than concentrated in one place, and finding them takes a little more effort. It’s like leaving a small, manageable town and having to find your way around a big city. And what remains is no longer a physically tangible object, guaranteed to be accessible over a long period, nor, as a book can be, a visible showpiece that carries the institution’s image.

Yearbooks with stories to tell

A special feature of traditional yearbooks is their texts, which offer a certain kind of storytelling. This is demanding: it must be more than a dull retelling of table contents, yet it must not drift into controversial or even politically colored explanations. What makes these texts stand out is that they describe the context in each thematic area and point out remarkable developments, giving readers a quick first orientation in the extensive data.

Here are a few examples of such storytelling yearbooks and how they – whether discontinued or not – have responded to the trend to digit(al)ization.

Canada till 2012

The Canada Year Book set an early standard for yearbooks that aimed to present a country and its international position in an attractive and widely understandable form.
‘Presented in almanac style, the 2012 Canada Year Book contains more than 500 pages of tables, charts and succinct analytical articles on every major area of Statistics Canada’s expertise. The Canada Year Book is the premier reference on the social and economic life of Canada and its citizens.
This publication has been discontinued as of April 2013. The last issue of this publication was November 2012. The Canada Year Book 2006 to 2012 is available online in html and pdf formats.’

After some changes, it was discontinued in 2013. There is no digital successor, apart from a thematically ordered overview of data, analyses and references similar to the German solution.

Netherlands

The Dutch statistical yearbook was converted early on from a printed to a PDF edition. It was abolished under the title Yearbook but continued in 2014 as Trends in the Netherlands.

The yearbook went and ‘Trends in the Netherlands’ came, with even more storytelling than before. It can be found on the homepage.

Switzerland

The Swiss Statistical Yearbook is one of the last yearbooks internationally that are still printed. It is a comprehensive, multimedia, thematically organized reference work: infographics, extensive texts, tables and references in two languages, plus abstracts in two further languages, make it widely accessible.

Digitalization has not passed this yearbook by either. Older editions can be consulted on the Office’s website and the text of the current yearbook is included as an introductory panorama in each of the thematic pages on the web.

The Panorama: an excerpt from the current printed yearbook (PDF).

Eurostat

Most existing yearbooks entered the era of digit(al)ization entirely through file lists and interactive databases or through PDF versions. Eurostat has been going the other way for several years: the idea of a storytelling book that functions as a unit has been implemented digitally from the very beginning, and, last but not least, with an educational intention of promoting statistical literacy. This edition is therefore also called Statistics Explained.

From each topic, this website ultimately leads into the all-embracing world of digital data and databases.

What else …. ?

Statistical yearbooks encounter digit(al)ization in very different ways: they disappear into (interactive) databases on the web, survive as PDF editions (more or less well integrated into websites) or celebrate a kind of resurrection in web-based book-like products.
The strengths of yearbooks (especially those based on storytelling) are thus more or less lost, for example a professionally curated, reliable, easy-to-use and explained introduction to the essential data topics, in one place and guaranteed to remain available for many years to come. At the same time, an ever more extensive and better presented world of data has emerged on the Internet: an accessible wealth of information one could hardly have dreamed of a few years ago.

But no matter how developed this data offer may be, it still lacks simplicity and quick access to the right data. Anyone who has ever searched for data on different topics and over different periods knows how frustrating this can be. Which of the many partly similar files is the right one? How can various topics be combined in databases? How can different time series be linked?
Are these the right data for the question asked, and can I use them without risk? Are there nuances in method or definition that mean they should not be compared with other data?

Often, however, users never reach the official sources at all, because the most common user behavior today is googling. The result may be a single figure or a large number of links to very different sources.

… digital transformation.

Statistical institutions are making great efforts in digital innovation, as shown not least by their sometimes very attractive offerings. Many are working on so-called experimental statistics: coding data faster and better with the help of artificial intelligence, creating and extracting indicators from big data, and much more. All this should make the production of statistical data faster, more efficient and less dependent on human intervention (and human error). In data dissemination, however, such experiments are still lacking, at least so far.

Are all these innovations the often mentioned digital transformation?
At best, they are elements of it.

What digital transformation can users dream of?
Perhaps that statistical information is produced in a rapid and uninterrupted process (like a pipeline) and enriched with semantic information in such a way that a simple search across topics and periods delivers an unambiguous result and points to important context information. That human intervention will still be necessary even in a digital transformation (at the latest in presentation, explanation and support) is not a paradox: perhaps the overall package of digital transformation also includes non-digital elements, such as dedicated print products that skilfully lead into the digital world.

The Challenge of Smart Data

Official statistics have never been exempt from the changes taking place around them. Numerous organisations at national and international level deal with these changes constantly, and it is always interesting to see how future challenges are assessed at present, here in 2019.

One example

Here is one: the closing speech by Kurt Vandenberghe of the European Commission (Directorate A) at the conference on New Techniques and Technologies for Official Statistics (NTTS 2019).

He focuses on data collection (especially smart data), the necessary skills and possible support from AI. Dissemination, contact with data users, and questions about the comprehensible presentation and correct use of the data are left out. Nor is there any reference to the potential of linked data, which can extract more from existing sources.

The following text is the last part of Vandenberghe’s speech, including the conclusion. I have adjusted the layout a bit with highlights:

‘So how will the future look like?

I recently came across a statement on a Eurostat website that in the course of the third decade of this century “Most if not all data is expected to be organic, i.e. by-products of people’s activities, systems and things, including billions of smart devices connected to the internet”. In that new context there is a growing need to examine and make use of the potential of “B-to-G”, business to government data transfer. This involves data from social media, mobile phones, Internet of Things, etc. There should be a new role for statistical institutions, captured by the term
“smart statistics”.
I quote from the same Eurostat NTTS related page: “Smart Statistics can be seen as the future extended role of official statistics in a world impregnated with smart technologies.” Finally there is the issue of trusted smart statistics, again with an important role for official statistics, ensuring not only the validity and accuracy of the outputs, but also respecting data subjects’ privacy and protecting confidentiality.

Privacy and confidentiality
are a growing concern and we need more research on techniques and technologies helping to avoid misuses of data on individuals and enterprises.

I guess what we will see in the coming years is, however, not one technique replacing existing ones, but a
coexistence of and synergies between
established and new data sources
and techniques, of public and private ones, and of general and specialised providers that complement each other. This will include traditional questionnaire-based surveys, and administrative data sources, alongside new techniques such as big data. While some of these sources will provide basic structural information in high quality, others will provide more timely data on key trends.
What will be increasingly important is to have rich meta-information and knowledge about the quality of these sources and to guarantee and create trusted statistics, including trusted smart statistics.

And in all of this we cannot forget the role that
people with the right skills
will play. We saw already in the last few years that there is a strong growth in Europe in the demand for big data analysts and for managers who know how to deal with big data. This is only expected to grow further. To avoid a skills gap we will have to encourage young people to take up studies in these fields and educational institutions to provide corresponding courses. In the debate around “the future of work” (future technological change might endanger traditional jobs), there is one thing that is certain: the need for data analysts will grow further.

And I guess it is safe to say that they will be increasingly supported by Artificial Intelligence.
Artificial Intelligence
can help to make sense of increasingly large amounts of data, to check the validity and improve their quality, relieving statisticians from routine tasks. Artificial Intelligence could help us analysing data with greater scope, scale and speed. In fact, a lot of what I said before and what you have discussed during the conference relates – directly or indirectly – to artificial intelligence – although AI does not seem very prominent on the programme. Paraphrasing Isaac Asimov’s quote about computers, we could say ‘I don’t fear AI, I fear the lack of it’. And maybe we should especially fear a lack of a European AI. Europe needs to lead on AI and develop AI that respects European values and makes the lives of Europeans better. The Commission is therefore increasing its annual investments in AI by 70% under the research and innovation programme Horizon 2020. It will reach EUR 1.5 billion for the period 2018-2020, and resources will grow further after 2020.’

Smart data and appropriate processes

Smart data is the challenge in data collection. What has to be considered, and how processes have to be adapted so that the different data sources meet the standards of official statistics, is the subject of ongoing discussion. Here, too, are two examples (from 2018).

Are Current Frameworks in the Official Statistical Production Appropriate for the Usage of Big Data and Trusted Smart Statistics? Bertrand LOISON Vice-Director, Swiss Federal Statistical Office, Diego KUONEN CEO, Statoo Consulting & Professor of Data Science, University of Geneva

From the abstract:
‘As a sequential approach of statistical production, GSBPM (“Generic Statistical Business Process Model”) has become a well-established standard using deductive reasoning as analytics’ paradigm. For example, the first GSBPM steps are entirely focused on deductive reasoning based on primary data collection and are not suited for inductive reasoning applied to (already existing) secondary data (e.g. big data resulting, for example, from smart ecosystems). Taken into account the apparent potential of big data in the official statistical production, the GSBPM process needs to be adapted to incorporate both complementary approaches of analytics (i.e. inductive and deductive reasoning) … . ‘

[4] Kuonen D. (2018). Production Processes of Official Statistics & Data Innovation Processes Augmented by Trusted Smart Statistics: Friends or Foes? Keynote presentation given on May 15, 2018 at the conference “Big Data for European Statistics (BDES)” in Sofia, Bulgaria (https://goo.gl/RMfpfB).

Towards a Reference Architecture for Trusted Smart Statistics
Fabio Ricciato, Michail Skaliotis, Albrecht Wirthmann, Kostas Giannakouris, Fernando Reis EUROSTAT Task Force on Big Data, 5, rue Alphonse Weicker, L 2721 Luxembourg

From the abstract:
‘ …. we outline the concept of Trusted Smart Statistics as the natural evolution of official statistics in the new datafied world, where traditional data sources (survey and administrative data) represent a valuable but small portion of the global data stock, much thereof being held in the private sector. In order to move towards practical implementation of this vision a Reference Architecture for Trusted Smart Statistics is required, i.e., a coherent system of technical, organisational and legal means combined to provide an articulated set of trust guarantees to all involved players. In this paper we take a first step in this direction by proposing selected design principles and system components …. .’

Statistics is Dead – Long Live Statistics

To be an expert in a thematic field!

Lee Baker has written an article that will please the whole official statistics community, where specialists from many thematic fields (not only statisticians, mathematicians or … data scientists) collect, analyse, interpret, explain and publish data.
It’s this core message that counts:
“… if you want to be an expert Data Scientist in Business, Medicine or Engineering” (or vice versa: An expert statistician in a field of official statistics like demography, economy, etc.) “then the biggest skill you’ll need will be in Business, Medicine or Engineering…. In other words, …. you really do need to be an expert in your field as well as having some of the other listed skills”

Here is his chain of arguments:

“Statistics is Dead – Long Live Data Science…

by Lee Baker

I keep hearing Data Scientists say that ‘Statistics is Dead’, and they even have big debates about it attended by the good and great of Data Science. Interestingly, there seem to be very few actual statisticians at these debates.

So why do Data Scientists think that stats is dead? Where does the notion that there is no longer any need for statistical analysis come from? And are they right?

Is statistics dead or is it just pining for the fjords?

I guess that really we should start at the beginning by asking the question ‘What Is Statistics?’.
Briefly, what makes statistics unique and a distinct branch of mathematics is that statistics is the study of the uncertainty of data.
So let’s look at this logically. If Data Scientists are correct (well, at least some of them) and statistics is dead, then either (1) we don’t need to quantify the uncertainty or (2) we have better tools than statistics to measure it.

Quantifying the Uncertainty in Data

Why would we no longer have any need to measure and control the uncertainty in our data?
Have we discovered some amazing new way of observing, collecting, collating and analysing our data that we no longer have uncertainty?
I don’t believe so and, as far as I can tell, with the explosion of data that we’re experiencing – the amount of data that currently exists doubles every 18 months – the level of uncertainty in data is on the increase.

So we must have better tools than statistics to quantify the uncertainty, then?
Well, no. It may be true that most statistical measures were developed decades ago when ‘Big Data’ just didn’t exist, and that the ‘old’ statistical tests often creak at the hinges when faced with enormous volumes of data, but there simply isn’t a better way of measuring uncertainty than with statistics – at least not yet, anyway.

So why is it that many Data Scientists are insistent that there is no place for statistics in the 21st Century?

Well, I guess if it’s not statistics that’s the problem, there must be something wrong with Data Science.

So let’s have a heated debate…

What is Data Science?

Nobody seems to be able to come up with a firm definition of what Data Science is.
Some believe that Data Science is just a sexed-up term for statistics, whilst others suggest that it is an alternative name for ‘Business Intelligence’. Some claim that Data Science is all about the creation of data products to be able to analyse the incredible amounts of data that we’re faced with.
I don’t disagree with any of these, but suggest that maybe all these definitions are a small part of a much bigger beast.

To get a better understanding of Data Science it might be easier to look at what Data Scientists do rather than what they are.

Data Science is all about extracting knowledge from data (I think just about everyone agrees with this very vague description), and it incorporates many diverse skills, such as mathematics, statistics, artificial intelligence, computer programming, visualisation, image analysis, and much more.

It is in the last bit, the ‘much more’ that I think defines a Data Scientist more than the previous bits. In my view, if you want to be an expert Data Scientist in Business, Medicine or Engineering then the biggest skill you’ll need will be in Business, Medicine or Engineering. Ally that with a combination of some/all of the other skills and you’ll be well on your way to being in great demand by the top dogs in your field.

In other words, if you want to call yourself a Data Scientist you really do need to be an expert in your field as well as having some of the other listed skills.

Are Computer Programmers Data Scientists?

On the other hand – as seems to be happening in Universities here in the UK and over the pond in the good old US of A – there are Data Science courses full of computer programmers that are learning how to handle data, use Hadoop and R, program in Python and plug their data into Artificial Neural Networks.

It seems that we’re creating a generation of Computer Programmers that, with the addition of a few extra tools on their CV, claim to be expert Data Scientists.

I think we’re in dangerous territory here.

It’s easy to learn how to use a few tools, but much much harder to use those tools intelligently to extract valuable, actionable information in a specialised field.

If you have little/no medical knowledge, how do you know which data outcomes are valuable?
If you’re not an expert in business, then how do you know which insights should be acted upon to make sound business decisions, and which should be ignored?

Plug-And-Play Data Analysis

This, to me, is the crux of the problem. Many of the current crop of Data Scientists – talented computer programmers though they may be – see Data Science as an exercise in plug-and-play.

Plug your dataset into tool A and you get some descriptions of your data. Plug it into tool B and you get a visualisation. Want predictions? Great – just use tool C.

Statistics, though, seems to be lagging behind in the Data Science revolution. There aren’t nearly as many automated statistical tools as there are visualisation tools or predictive tools, so the Data Scientists have to actually do the statistics themselves.

And statistics is hard.
So they ask if it’s really, really necessary.
I mean, we’ve already got the answer, so why do we need to waste our time with stats?

Booooring….

So statistics gets relegated to such an extent that Data Scientists declare it dead.”

The original article and discussion: here

About the Author

Lee Baker is an award-winning software creator with a passion for turning data into a story.
A proud Yorkshireman, he now lives by the sparkling shores of the East Coast of Scotland. Physicist, statistician and programmer, child of the flower-power psychedelic ‘60s, it’s amazing he turned out so normal!
Turning his back on a promising academic career to do something more satisfying, as the CEO and co-founder of Chi-Squared Innovations he now works double the hours for half the pay and 10 times the stress – but 100 times the fun!”

This post is taken from Data Science Central and was previously published in Innovation Enterprise and on LinkedIn Pulse.
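
Baker’s argument rests on two quantitative ideas: that statistics quantifies the uncertainty in data, and that data volumes grow very fast. The following is a minimal Python sketch of both, under stated assumptions: the sample values are invented for illustration, the interval uses a simple normal approximation (z = 1.96) rather than a more careful small-sample method, the function names are hypothetical, and the 18-month doubling period is Baker’s figure taken at face value, not a verified rate.

```python
import math
import statistics


def mean_with_interval(sample, z=1.96):
    """Sample mean and an approximate 95% confidence interval.

    Assumption: normal approximation with z = 1.96; for small samples a
    Student-t quantile would be the more careful choice.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean, (mean - z * se, mean + z * se)


def growth_factor(months, doubling_months=18):
    """Growth factor if a quantity doubles every `doubling_months` months."""
    return 2 ** (months / doubling_months)


# Hypothetical measurements, invented for illustration only
sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7]
mean, (low, high) = mean_with_interval(sample)
print(f"mean {mean:.2f}, 95% interval {low:.2f} to {high:.2f}")

# Taking the "doubles every 18 months" figure at face value
for years in (1.5, 3, 10):
    print(f"{years} years: x{growth_factor(years * 12):.1f}")
```

With more observations the standard error shrinks roughly with the square root of the sample size, so the interval narrows but never collapses to zero width: more data reduce uncertainty, they do not remove it. Taken literally, the doubling claim implies a growth factor of 2^(120/18), about 101.6, over ten years.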

Big Data and Official Statistics

Big Data is THE topic of the freshly published issue of the Statistical Journal of the IAOS (Volume 31, Issue 2).

Five articles deal with Big-Data topics:

In the editorial, Fride Eeg-Henriksen and Peter Hackl give an overview of the Big Data discussions held in official statistics. Here are some remarks from the editorial:

‘In spite of the wide interest in and the great popularity of Big Data, no clear and commonly accepted definition of the notion Big Data could be established so far [3]. Modern technological, social and economic developments including the growth of smart devices and infrastructure, the growing availability and efficiency of the internet, the appeal of social networking sites and the prevalence and ubiquity of IT systems are resulting in the generation of huge streams of digital data. The complexities of the structure and dynamic of corresponding datasets, the challenges in developing the suitable software tools for data analytics, generally the diversity of potentials in making use of the masses of available data make it difficult to find a suitable and generally applicable definition. The often mentioned characterization of Big Data by 3 – or more – Vs (volume, velocity, variety – as well as veracity and value), does not capture the enormous scope of the corresponding data sets and the extensive potentials of making use of these data. A highly relevant aspect is that Big Data are so large and complex that traditional database management tools and data processing applications are not feasible and efficient means. This is illustrated by a look at the categories of data sources which typically are seen in the context of Big Data: Such data sources may be
– Administrative, e.g., medical records, insurance records, bank records.
– Commercial transactions, e.g., credit card transactions, scanners in supermarkets.
– Sensors, e.g., satellite imaging, environmental sensors, road sensors.
– Tracking devices, e.g., tracking data from mobile telephones, GPS
– Tracks of human behaviour, e.g., online searches, online page viewing.
– Documentation of opinion, e.g., comments posted in social media.

……….

‘A general conclusion from the set of articles in this Special Section can be drawn as follows: The feasibility and the potentials of using Big Data in official statistics have to be assessed from case to case. In some areas the use of Big Data sources has already proved to be feasible. The choice of the appropriate IT technology and statistical methods must be specific for each situation. Also issues like the representativity and the quality of the resulting statistics, or the confidentiality and the risk of disclosure of personal data need to be assessed individually for each case. There is no doubt that Big Data will have a place in the future of official statistics, helping to reduce costs and burden on respondents. However, major efforts will be necessary to establish the routine wise use of Big Data, and new approaches will be needed for assessing all aspects of quality.’

[3] C. Reimsbach-Kounatze (2015), The Proliferation of “Big Data” and Implications for Official Statistics and Statistical Agencies: A Preliminary Analysis, OECD Digital Economy Papers, No. 245, OECD Publishing. http://dx.doi.org/10.1787/5js7t9wqzvg8-en

See also: Big Data in Action May 2015

Big Data in Action

Not long ago, Big Data was discussed in official statistics mostly in theoretical terms.

https://blogstats.wordpress.com/2014/01/25/big-data-events/

Now, however, more and more real and solid examples are appearing that demonstrate how Big Data work and what they could produce.

Some of these examples come from (official) statistics, where institutions use Big Data as a source and are beginning to apply a new analytical paradigm.

Example 1: Global Pulse (UN)

‘Global Pulse is a flagship innovation initiative of the United Nations Secretary-General on big data. Its vision is a future in which big data is harnessed safely and responsibly as a public good. Its mission is to accelerate discovery, development and scaled adoption of big data innovation for sustainable development and humanitarian action. … Big data represents a new, renewable natural resource with the potential to revolutionize sustainable development and humanitarian practice.’

Some examples of how it uses Big Data:

analyse social media data for perceptions related to sanitation, in order to baseline public engagement
use of mobile phone data as a proxy for food security and poverty indicators
how risk factors (e.g., tobacco, alcohol, diet and physical activity) of non-communicable diseases (e.g., cancer, diabetes, depression) could be inferred from big data sources such as social media and online searches

‘This paper outlines the opportunities and challenges, which have guided the United Nations Global Pulse initiative since its inception in 2009. The paper builds on some of the most recent findings in the field of data science, and findings from our own collaborative research projects. It does not aim to cover the entire spectrum of challenges nor to offer definitive answers to those it addresses, but to serve as a reference for further reflection and discussion. The rest of this document is organised as follows: section one lays out the vision that underpins Big Data for Development; section two discusses the main challenges it raises; section three discusses its application. The concluding section examines options and priorities for the future.’

Example 2: CBS

At Statistics Netherlands (CBS), Big Data is an important research topic.

Several examples were studied:

road sensors for traffic and transport statistics
mobile phone data for travel behaviour (of active phones) or tourism (new phones that register to network)
social media data for sentiment analysis, tracking words and their associated sentiment on Twitter, Facebook, Google+, LinkedIn, etc.

http://www.slideshare.net/pietdaas/big-data-cbs-47331618 (April 2015)

Example 3: Report of the Global Working Group on Big Data for Official Statistics

In March 2015, the forty-sixth session of the UN Statistical Commission received a report about Big Data in Official Statistics:

‘The report presents the highlights of the International Conference on Big Data for Official Statistics, the outcome of the first meeting of the Global Working Group and the results of a survey on the use of big data for official statistics.’ …

‘The potential of big data sources resides in the timely — and sometimes real‑time — availability of large amounts of data, which are usually generated at minimal cost. …. before introducing big data into official statistics …. it needs to adequately address issues pertaining to methodology, quality, technology, data access, legislation, privacy, management and finance, and provide adequate cost-benefit analyses.’

UN Statistical Commission Forty-sixth session 3-6 March 2015,
The full report (http://www.un.org/ga/search/view_doc.asp?symbol=E/CN.3/2015/4)

Example 4: UNECE Statistics Wiki on Big Data in Official Statistics

A dedicated wiki offers an overview of the ever growing activities in the field of Official Statistics and Big Data. It’s managed by the Geneva Office of UNECE.

The wiki provides an interesting Big Data Inventory.

Translators!

Which working model helps to get the best results from data? Not a single specific qualification, but a blend of multiple data skills: data strategy, best methods, analytical and statistical skills. ‘The ability to work together quickly and flexibly is critical.’

Matt Ariker, Peter Breuer and Tim McGuire from McKinsey give some hints in their article ‘How to get the most from big data?’. This could also be of interest to statistical offices, traditional specialists in working with Big Data.

Next Step after OGD: Government’s Big Data Scientist

028 Big Data, 037 Open data initiatives

Open Government Data (OGD) initiatives have been important steps towards broader access to administrative data.

But there was some disappointment, because OGD did not produce the flood of apps many had hoped for. Meanwhile, a big debate about using Big Data emerged.

Now the US is taking a step further with a Big Data initiative: President Obama has just named DJ Patil the first US Chief Data Scientist at the White House.

https://m.whitehouse.gov/blog/2015/02/18/white-house-names-dr-dj-patil-first-us-chief-data-scientist

‘Patil’s new role will involve the application of big data to all government areas, but particularly healthcare policy.’ (Source)

Data are from the Past

028 Big Data, 033 Statistical literacy

There is a lot of discussion, and great hope, about so-called Big Data and the role of data scientists. Will data scientists help us create a better future?

Yes and no. ‘Making predictions about unprecedented futures requires more than data, it requires theory-driven models that envision futures that do not exist in data. Fortunately, digital tools also assist us in envisioning futures that have never been.’

This talk by Martin Hilbert, published on 13 January 2015, explains why data are from the past and why they are not enough.

‘During his 15 years at the United Nations Secretariat, Martin Hilbert assisted governments to take advantage of the digital revolution. When the ‘big data’ age arrived, his research was the first to quantify the historical growth of how much technologically mediated information there actually is in the world.’

‘After joining the faculty of the University of California, Davis, he had more time to think more deeply about the theoretical underpinning and fundamental limitations of the ‘big data’ revolution.’

‘Martin Hilbert holds doctorates in Economics and Social Sciences, and in Communication, and has provided hands-on technical assistance to Presidents, government experts, legislators, diplomats, NGOs, and companies in over 20 countries.
At the University of California, Davis, Martin thinks about the fundamental theories of how digitization affects society.’ More: http://www.martinhilbert.net

[Source: youtube https://www.youtube.com/watch?v=UXef6yfJZAI]

Big Analytics

025 Internet of Things (IoT), 028 Big Data, 09 Stat.Office / Organization

Technology Predictions for 2015

For 2015, Bing predicts the same top three technologies for every continent: wearables, digital personal assistants and home automation.

All these technologies are part of the Internet of Things (IoT). They have many sensors delivering data, they are connected to several clouds and they provide information about processes or behaviours of people and environments.

Devices communicate with devices, devices with humans and humans with devices. So the IoT is not a second Internet alongside the one we already know. It is an expansion of the Internet, and it produces ever more massive data, accessible to known and unknown parties: telecom and cloud providers, enterprises, governments …

Finally, humans, too, become part of the IoT …

IoT Infographic

These aspects are explained in a good infographic made by Postscapes in collaboration with Harbor Research.

The complete infographic can be found here:

https://s3.amazonaws.com/postscapes/IoT-Harbor-Postscapes-Infographic.pdf

IoT, Big Data and then?

Data from classical statistical surveys as well as from connected devices must be analyzed in order to be of use.

The analytics of things (the analysis of data produced by the sensors of connected things, including humans) is a new kind of statistical information that any organisation, private person or government can use for its own purposes: for well-informed decisions and for steering activities. The possible uses are endless and largely uncontrolled.
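To make "analytics of things" a little more concrete, here is a minimal sketch of the most basic step: turning individual sensor readings into an aggregate statistic. The device names, timestamps and temperatures are hypothetical, and hourly means are just one illustrative choice of statistic.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical raw readings: (device_id, ISO timestamp, temperature in °C).
readings = [
    ("thermo-1", "2015-03-01T08:05:00", 19.5),
    ("thermo-1", "2015-03-01T08:35:00", 20.5),
    ("thermo-2", "2015-03-01T08:10:00", 21.0),
    ("thermo-2", "2015-03-01T09:15:00", 22.0),
]


def hourly_means(rows):
    """Aggregate individual sensor readings into one mean value per hour."""
    buckets = defaultdict(list)
    for _device, ts, value in rows:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


for hour, avg in hourly_means(readings).items():
    print(hour.strftime("%Y-%m-%d %H:00"), round(avg, 2))
```

Real device data would add the hard parts this sketch ignores: missing readings, clock drift, and deciding whose devices are counted at all.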

There are already numerous examples of such statistics and applications:

Singapore itraffic
More example applications of the IoT in the Postscapes IoT toolkit
And some things are intelligent even without being connected, like thermostats.

Big Analytics, Data Science and Official Statistics

The quasi-monopoly of national statistical agencies, long the only bodies with the resources to undertake huge surveys, is vanishing. A new kind of statistics is emerging: collected, controlled and used by new agents.

Realising the full potential of these data requires special qualifications. Classical statistical analysis is expanding into data science. (Big) data need (big) analytics, which explains why many predict that statistician will be the sexiest profession.

For an idea of how this field is expanding, even exploding, Diego Kuonen’s frequent tweets and presentations are an ideal source.

Diego Kuonen: A Statistician’s ‘Big Tent’ View on Big Data and Data Science

As new data sources appear and broader analytical techniques become necessary, statistical agencies are challenged. Tackling this paradigm shift is on the agenda.

Who owns the data

Who owns the data emerging from my devices (smartphone, wearables, home automation …)? What data come from sensors beyond my personal reach?

And what is being done with these data?

Huge data protection issues await answers. Reality seems to move faster than the rules …

'Estimates vary, but by 2020 there could be over 30 billion devices connected to the Internet. Once dumb, they will have smartened up thanks to sensors and other technologies embedded in them and, thanks to your machines, your life will quite literally have gone online.'

https://www.aclu.org/blog/technology-and-liberty-free-speech-national-security/invasion-data-snatchers-big-data-and

Statistics with a Feeling of Joy

028 Big Data, 033 Statistical literacy, 071 Hint

‘Attacking statistical problems with a feeling of joy … and not from a position of fear and self-doubt.’ That is the message of John Rauser, who uses computational models in statistical argumentation.
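As an illustration of what a computational argument can look like (our own choice of example, not a transcript of the talk), here is a minimal sketch of a permutation test: instead of consulting a formula, shuffle the group labels many times and count how often chance alone produces a difference as large as the observed one. The two groups of numbers are hypothetical.

```python
import random

# Hypothetical measurements for two groups (e.g. a treatment and a control group).
group_a = [27, 19, 20, 20, 23, 17, 21, 24, 31, 26]
group_b = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24]


def avg(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_shuffles=10_000, seed=42):
    """One-sided permutation test: estimate how often randomly relabelled
    data show a difference in means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = avg(a) - avg(b)
    pooled = a + b
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # assume the group labels carry no information
        diff = avg(pooled[:len(a)]) - avg(pooled[len(a):])
        if diff >= observed:
            extreme += 1
    return observed, extreme / n_shuffles


observed, p_value = permutation_test(group_a, group_b)
print(f"observed difference: {observed:.1f}, estimated p-value: {p_value:.3f}")
```

The appeal of this style is that every step can be explained in plain words, which is exactly what makes statistics less intimidating.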

Watch it: a perfect presentation!

From the 2014 Strata Conference + Hadoop World in New York City, 17.10.2014

Big Data – Big Projects – Big Discussions

025 Internet of Things (IoT), 028 Big Data

‘Old’ Data vs. Reality Mining

For a long time, official statistics were synonymous with data. With the emergence (or rather, the growing awareness) of new information sources, aka Big Data, this is about to change. The opportunities these data offer are changing, too, risks (privacy!) included.

In the light of research activities and projects around Big Data and reality mining, traditional statistical data management seems to belong to another era. Evidence-based decision-making is moving to a new level.

Some prominent examples of Big Data research:

Project FuturICT

One very interesting and very ambitious project tackling this Big Data opportunity is FuturICT, led by Dirk Helbing of the Swiss Federal Institute of Technology in Zurich (ETHZ).

FuturICT’s ‘ultimate goal … is to understand and manage complex, global, socially interactive systems’ (Homepage FuturICT).

Dirk Helbing introduces FuturICT:

Some points (taken from Edge and the FuturICT brochure):

‘There are two big global trends. One is big data. That means in the next ten years we’ll produce as many data, or even more data than in the past 1,000 years.

The other trend is hyperconnectivity. That means we have networking our world going on at a rapid pace; we’re creating an Internet of things. So everyone is talking to everyone else, and everything becomes interdependent. ….

But on the other hand, it turns out that we are, at the same time, creating highways for disaster spreading. We see many extreme events, we see problems such as the flash crash, or also the financial crisis. That is related to the fact that we have interconnected everything. In some sense, we have created unstable systems. We can show that many of the global trends that we are seeing at the moment, like increasing connectivity, increase in the speed, increase in complexity, are very good in the beginning, but (and this is kind of surprising) there is a turning point and that turning point can turn into a tipping point that makes the systems shift in an unknown way. ……

We really need to understand those systems, not just their components. It’s not good enough to have wonderful gadgets like smartphones and computers; each of them working fine in separation. Their interaction is creating a completely new world, and it is very important to recognize that it’s not just a gradual change of our world; there is a sudden transition in the behavior of those systems, as the coupling strength exceeds a certain threshold.’

Three components

‘The [first] component to ‘measure the state of the world’ is called the Planetary Nervous System. It can be imagined as a global sensor network, where ‘sensors’ include anything able to provide data in real-time about socio-economic, environmental or technological systems (including the Internet). Such an infrastructure will enable real-time data mining – reality mining – and the calibration and validation of coupled models of socio-economic, technological and environmental systems with their complex interactions. It will even be possible to extract suitable models in a data-driven way, guided by theoretical knowledge.’ (Future ICT. Global computing for our complex world, p.18)

‘The second component, the Living Earth Simulator will be very important here, because that will look at what-if scenarios. It will take those big data generated by the Planetary Nervous System and allow us to look at different scenarios, to explore the various options that we have, and the potential side effects or cascading effects, and unexpected behaviors, because those interdependencies make our global systems really hard to understand.’

‘The third component will be the Global Participatory Platform. That basically makes those other tools available for everybody: for business leaders, for political decision-makers, and for citizens. We want to create an open data and modeling platform that creates a new information ecosystem that allows you to create new businesses, to come up with large-scale cooperation much more easily, and to lower the barriers for social, political and economic participation.’

Scoop.IT: FuturICT

Social Physics: Another Approach

Alexander Pentland of the MIT Media Lab also explores the opportunities of Big Data. In his book “Social Physics” he reflects on what can be done with this treasure trove of information, and the approach he takes is a rather technocratic one.

Social physics?

‘Social physics is a quantitative social science that describes reliable, mathematical connections between information and idea flow on the one hand and people’s behavior on the other. Social physics helps us understand how ideas flow from person to person through the mechanism of social learning and how this flow of ideas ends up shaping the norms, productivity, and creative output of our companies, cities, and societies. It enables us to predict the productivity of small groups, of departments within companies, and even of entire cities. It also helps us tune communication networks so that we can reliably make better decisions and become more productive.’ …

See also Pentland speaking at Google: http://youtu.be/HMBl0ttu-Ow

‘The engine that drives social physics is big data: the newly ubiquitous digital data now available about all aspects of human life. Social physics functions by analyzing patterns of human experience and idea exchange within the digital bread crumbs we all leave behind us as we move through the world—call records, credit card transactions, and GPS location fixes, among others. These data tell the story of everyday life by recording what each of us has chosen to do. And this is very different from what is put on Facebook; postings on Facebook are what people choose to tell each other, edited according to the standards of the day. Who we actually are is more accurately determined by where we spend our time and which things we buy, not just by what we say we do.

The process of analyzing the patterns within these digital bread crumbs is called reality mining, and through it we can tell an enormous amount about who individuals are.’ (From ‘Social Physics: How Good Ideas Spread – The Lessons from a New Science’, The Penguin Press, 2014.)
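A toy sketch of one such pattern, "where we spend our time", computed from time-stamped location fixes. This is not Pentland's method: the fixes and place labels are hypothetical, real GPS coordinates would first have to be mapped to places, and attributing each interval to the earlier fix is a simplifying assumption.

```python
from collections import Counter
from datetime import datetime

# Hypothetical location fixes for one person: (timestamp, place label).
fixes = [
    ("2014-05-05T08:00", "home"),
    ("2014-05-05T09:00", "office"),
    ("2014-05-05T12:30", "cafe"),
    ("2014-05-05T13:15", "office"),
    ("2014-05-05T18:00", "gym"),
    ("2014-05-05T19:30", "home"),
]


def dwell_minutes(points):
    """Attribute the time between two consecutive fixes to the earlier place."""
    totals = Counter()
    parsed = [(datetime.fromisoformat(ts), place) for ts, place in points]
    for (t0, place), (t1, _next_place) in zip(parsed, parsed[1:]):
        totals[place] += (t1 - t0).total_seconds() / 60
    return totals


for place, minutes in dwell_minutes(fixes).most_common():
    print(f"{place:<8}{minutes:>6.0f} min")
```

Even this small example shows why privacy matters here: a few timestamps and place labels already sketch a daily routine.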

‘How to re-engineer the world’: The Economist’s critical voice

‘Institutions should be redesigned around social physics, [Pentland] says. For instance, to improve health-care, anonymous medical records could be used to show what treatments work best. Mr Pentland’s research also offers lessons for policymakers and business people. He advances a new way to protect privacy by creating something of a property right for personal information. People would in most cases control what personal data were collected, how they are used, and with whom they are shared, treating their personal data as assets, as they do money in a bank. Yet he is less convincing when he strays from his research to make broader points about politics and economics. He reduces too much of the world’s complexity to something to be solved by data, when they are just part of the solution. His enthusiasm for a world run by datacrats rings of a zealotry that could easily go awry. Still, “Social Physics” is a fascinating look at a new field by one of its principal geeks.’ Source: http://www.economist.com/news/books-and-arts/21595883-how-re-engineer-world-measure-man-0

‘A society enabled by Big Data’

‘Reality mining’ is the buzzword, and it is tied to another buzzword: ‘Human-Data Interaction’ (HDI).

Human-Data Interaction (HDI):
‘Personal data about and by each of us, whether we are aware of it or not, feeds into black-box analytics algorithms to infer facts, both correct and incorrect. These drive actions, whose effects may or may not be visible to us’.
–> http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-837.pdf

Anonymized data, visualizations, code, documentation, and papers from MIT Human Dynamics Laboratory can be found at http://realitycommons.media.mit.edu and http://hd.media.mit.edu.

Pentland: Reality mining and the new deal on privacy

Internet of things

Sources of big data are not only humans, with or without their devices, but also objects equipped with sensors and machines communicating with machines (M2M). In the Internet of Things (IoT), things exchange data, semantic descriptions make things interoperable, and interconnected smart objects become reality.

‘The internet of things is a way to deliver cheap information that could be used for good or ill. So let’s start talking about what we want as a society.’ This is the motto of one of several conferences dealing with this topic:

http://gigaom.com/2014/06/09/the-internet-of-things-isnt-about-things-its-about-cheap-data/
