Statistics is Dead – Long Live Statistics

To be an expert in a thematic field!

Lee Baker wrote an article that will please the whole community of official statistics, where specialists from many thematic fields (and not only statisticians or mathematicians or … data scientists) are collecting, analysing, interpreting, explaining and publishing data.
It’s this core message that counts:
“… if you want to be an expert Data Scientist in Business, Medicine or Engineering” (or vice versa: an expert statistician in a field of official statistics like demography, economy, etc.) “then the biggest skill you’ll need will be in Business, Medicine or Engineering…. In other words, …. you really do need to be an expert in your field as well as having some of the other listed skills”

Here is his chain of arguments:

“Statistics is Dead – Long Live Data Science…

by Lee Baker

I keep hearing Data Scientists say that ‘Statistics is Dead’, and they even have big debates about it attended by the good and great of Data Science. Interestingly, there seem to be very few actual statisticians at these debates.

So why do Data Scientists think that stats is dead? Where does the notion that there is no longer any need for statistical analysis come from? And are they right?

Is statistics dead or is it just pining for the fjords?

I guess that really we should start at the beginning by asking the question ‘What Is Statistics?’.
Briefly, what makes statistics unique and a distinct branch of mathematics is that statistics is the study of the uncertainty of data.
So let’s look at this logically. If Data Scientists are correct (well, at least some of them) and statistics is dead, then either (1) we don’t need to quantify the uncertainty or (2) we have better tools than statistics to measure it.

Quantifying the Uncertainty in Data

Why would we no longer have any need to measure and control the uncertainty in our data?
Have we discovered some amazing new way of observing, collecting, collating and analysing our data that we no longer have uncertainty?
I don’t believe so and, as far as I can tell, with the explosion of data that we’re experiencing – the amount of data that currently exists doubles every 18 months – the level of uncertainty in data is on the increase.

So we must have better tools than statistics to quantify the uncertainty, then?
Well, no. It may be true that most statistical measures were developed decades ago when ‘Big Data’ just didn’t exist, and that the ‘old’ statistical tests often creak at the hinges when faced with enormous volumes of data, but there simply isn’t a better way of measuring uncertainty than with statistics – at least not yet, anyway.

So why is it that many Data Scientists are insistent that there is no place for statistics in the 21st Century?

Well, I guess if it’s not statistics that’s the problem, there must be something wrong with Data Science.

So let’s have a heated debate…

What is Data Science?

Nobody seems to be able to come up with a firm definition of what Data Science is.
Some believe that Data Science is just a sexed-up term for statistics, whilst others suggest that it is an alternative name for ‘Business Intelligence’. Some claim that Data Science is all about the creation of data products to be able to analyse the incredible amounts of data that we’re faced with.
I don’t disagree with any of these, but suggest that maybe all these definitions are a small part of a much bigger beast.

To get a better understanding of Data Science it might be easier to look at what Data Scientists do rather than what they are.

Data Science is all about extracting knowledge from data (I think just about everyone agrees with this very vague description), and it incorporates many diverse skills, such as mathematics, statistics, artificial intelligence, computer programming, visualisation, image analysis, and much more.

It is in the last bit, the ‘much more’ that I think defines a Data Scientist more than the previous bits. In my view, if you want to be an expert Data Scientist in Business, Medicine or Engineering then the biggest skill you’ll need will be in Business, Medicine or Engineering. Ally that with a combination of some/all of the other skills and you’ll be well on your way to being in great demand by the top dogs in your field.

In other words, if you want to call yourself a Data Scientist you really do need to be an expert in your field as well as having some of the other listed skills.

Are Computer Programmers Data Scientists?

On the other hand – as seems to be happening in Universities here in the UK and over the pond in the good old US of A – there are Data Science courses full of computer programmers that are learning how to handle data, use Hadoop and R, program in Python and plug their data into Artificial Neural Networks.

It seems that we’re creating a generation of Computer Programmers that, with the addition of a few extra tools on their CV, claim to be expert Data Scientists.

I think we’re in dangerous territory here.

It’s easy to learn how to use a few tools, but much much harder to use those tools intelligently to extract valuable, actionable information in a specialised field.

If you have little/no medical knowledge, how do you know which data outcomes are valuable?
If you’re not an expert in business, then how do you know which insights should be acted upon to make sound business decisions, and which should be ignored?

Plug-And-Play Data Analysis

This, to me, is the crux of the problem. Many of the current crop of Data Scientists – talented computer programmers though they may be – see Data Science as an exercise in plug-and-play.

Plug your dataset into tool A and you get some descriptions of your data. Plug it into tool B and you get a visualisation. Want predictions? Great – just use tool C.

Statistics, though, seems to be lagging behind in the Data Science revolution. There aren’t nearly as many automated statistical tools as there are visualisation tools or predictive tools, so the Data Scientists have to actually do the statistics themselves.

And statistics is hard.
So they ask if it’s really, really necessary.
I mean, we’ve already got the answer, so why do we need to waste our time with stats?

Booooring….

So statistics gets relegated to such an extent that Data Scientists declare it dead.”

The original article and discussion -> here


About the Author

Lee Baker is an award-winning software creator with a passion for turning data into a story.
A proud Yorkshireman, he now lives by the sparkling shores of the East Coast of Scotland. Physicist, statistician and programmer, child of the flower-power psychedelic ‘60s, it’s amazing he turned out so normal!
Turning his back on a promising academic career to do something more satisfying, as the CEO and co-founder of Chi-Squared Innovations he now works double the hours for half the pay and 10 times the stress – but 100 times the fun!


This post is taken from datascience.central and has been published previously in Innovation Enterprise and LinkedIn Pulse

Attention please!

They all use statistics … in the media, in politics, in sports. But they mostly forget that statistics, especially official statistics, are made by professionals in a quite demanding, time- and resource-consuming process. The WO/MAN-IN-THE-MIDDLE, the professionals who provide information and knowledge from facts, remain hidden (despite Google’s statement that statistician will be ‘the sexy job in the next ten years’).


[Image. Source: 'Statistics – A universal language', Swiss Statistics, Neuchâtel]

How to promote the statisticians’ work?

This question is a perennial topic in the statistics community.  And the answers are manifold. Some examples:

Show the results!

Dissemination of statistics is widely developed and of high quality. Websites of statistical institutions present rich information – from simple facts to interactive presentations and visualisations.

New media play their role, too. And they are important. Feeds and tweets are omnipresent (-> Some examples of official statistical tweets).

See Statistics Netherlands’ experience: ‘In addition to the normal distribution of news reports, Twitter has become a standard way for Statistics Netherlands to distribute day-to-day information. The number of followers of @statistiekcbs grew from 14,000 in early 2014 to almost 56,000 by the end of the year. In December, Statistics Netherlands’ tweets were viewed a total of 3.6 million times which represents an average of almost 120,000 per day. In the final months of 2014, news reports were being retweeted on an average of 100 times a day.’ (Statistics Netherlands Annual Report for 2014, p. 9)
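The figures in this quote are easy to check. Here is a minimal sketch in Python; the variable names are mine, the numbers are the ones quoted from the annual report, and 31 days for December is the only thing added.

```python
# Figures as quoted from the Statistics Netherlands Annual Report for 2014.
followers_early_2014 = 14_000
followers_end_2014 = 56_000      # "almost 56,000" in the report
december_views = 3_600_000       # "3.6 million" tweet views in December
days_in_december = 31

# How many times larger the follower count was at the end of the year.
growth_factor = followers_end_2014 / followers_early_2014

# Average number of tweet views per day in December.
views_per_day = december_views / days_in_december

print(f"Follower growth factor: {growth_factor:.1f}")
print(f"Average views per day in December: {views_per_day:,.0f}")
```

So the follower count roughly quadrupled within a year, and the daily average comes out at about 116,000 views, which fits the report's 'almost 120,000 per day'.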

Yes, ok … but the professionals behind these presentations are not visible. Even presenters as brilliant as Hans Rosling let us forget how the facts behind their beautiful visualisations were prepared.

About us

Selfies of statistical institutions are standard on their websites. These presentations are short and normally without a marketing touch.

There are some examples where a self-portrait gets its own website and presents more than static information about the institution. The European Statistical System (ESS) publishes a website serving as a ‘single entry point to relevant information on the organization and activities of the ESS’ and its partner organizations. An RSS feed provides updates and readers can follow the work of more than 30 statistical institutions … so long as they provide their news to the website.

Launching a campaign is another way to attract attention. This is mostly chosen for the periodic census, for instance in the US, Germany or the UK. There are also mini-campaigns. ‘Statistics counts for you’ is one such example.

Clicking on the animated teaser on the homepage opens a new website with the message and a summary of available statistics. There’s no offer to communicate with the reader via news-tweets or newsletters.

And finally there are also examples of more interactive and user-oriented approaches.

CBS Corporate News

is a dedicated website (http://corporate.cbs.nl) and a beautiful presentation of Statistics Netherlands. It shows activities and achievements in six fields, which readers select with filters (projects, events & congresses, new services, innovative developments, user relations and international affairs). It’s attractive, personal and interactive, and it provides updates via a newsletter.

Chronostat

is an interactive, multimedia presentation of Statistics Switzerland’s activities and products.

Five filters for topics (Personalities, Publications, Swiss History and Statistics, Achievements and Methods) and filters for years back to the beginnings of Statistics Switzerland in 1860 let readers follow multiple aspects of Statistics. With this timeline it provides an archive.

Update it!

The be-all and end-all of statistical self-portrayal is updating. Updated information presents an active institution and maintains contact with users and interested groups. It fosters understanding of the work behind the statistical information and helps prevent cuts to necessary resources.

Today – Statsday

October 20th is the day of Official Statistics. It’s the day to highlight the importance of reliable, independent and high-quality numbers. Numbers that help to make good, evidence-based decisions.

This year “Better data, better lives” is the theme of the World Statistics Day selected by the United Nations General Assembly.

https://worldstatisticsday.org/

Many countries also celebrate this day and are planning special events. The new  UN Sustainable Development Goals (SDGs) will be the focus of most of these events.

For a fact-based Worldview

Hans Rosling, co-founder and promoter of the Gapminder Foundation and of gapminder.org, uses statistics to fight myths (‘Our goal is to replace devastating myths with a fact-based worldview.’) and tries to counterbalance media coverage that focusses on war, conflicts and chaos.

Here is one more example (and this in a media interview …): ‘You can’t use media if you want to understand the world’ (sic!)

And this statement on gapminder.org: ‘Statistical facts don’t come to people naturally. Quite the opposite. Most people understand the world by generalizing personal experiences which are very biased. In the media the “news-worthy” events exaggerate the unusual and put the focus on swift changes. Slow and steady changes in major trends don’t get much attention. Unintentionally, people end-up carrying around a sack of outdated facts that you got in school (including knowledge that often was outdated when acquired in school).’ http://www.gapminder.org/ignorance/

Elections, visual

On October 18, 2015 Swiss voters will elect a new Parliament for the next four years. There are some very useful and also beautiful visual tools that help voters to get informed about developments in the political landscape and about candidates.

Background: The Swiss Political System

The full picture of Switzerland’s political institutions and executive authorities can be found in a yearly updated official brochure (the page above is part of it).

See also the Official Website (Federal elections of 18 October 2015) and:

The website of the Federal Statistical Office (FSO), FSO topic: elections (German and French only)

 


Let’s have a look at some of these visual and interactive tools.

 


Find Your Candidates

With interactive tools, one answers questions to define one’s position on the political spectrum and to generate suggestions for candidates to vote for.

Tools from smartvote or vimentis exist for the National Council (200 members and 3,802 candidates) and the Council of States (46 members and 161 candidates).

For smartvote about 80 to 90 percent of the candidates have filled in a questionnaire.

This questionnaire helps to define their political profile, a smartspider.

‘The smartspider presents a political profile based on the agreement about eight topics/aims. A value of 100 represents a strong agreement, a value of 0 a strong disagreement.’

By answering the same questionnaire, a voter defines his own profile, which is matched with the candidates’ profiles. In the end, he gets his own smartspider and suggestions for candidates in his constituency. The more questions a voter answers, the more precise his voting advice will be.
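How such a matching could work in principle can be sketched in a few lines of Python. This is only an illustration under my own assumptions: smartvote's real matching method is not described here, the distance-based score is a swapped-in technique, and the profiles and candidate names are invented. Only the eight topics and the 0 to 100 scale come from the smartspider description above.

```python
import math

NUM_TOPICS = 8  # the smartspider shows eight topics/aims, each valued 0 to 100

# Largest possible distance: one profile all 0, the other all 100.
MAX_DISTANCE = math.sqrt(NUM_TOPICS * 100 ** 2)


def match_score(voter, candidate):
    """Hypothetical similarity in percent: 100 means identical profiles."""
    if len(voter) != NUM_TOPICS or len(candidate) != NUM_TOPICS:
        raise ValueError("each profile needs exactly eight values")
    distance = math.sqrt(sum((v - c) ** 2 for v, c in zip(voter, candidate)))
    return 100 * (1 - distance / MAX_DISTANCE)


def suggest(voter, candidates, top_n=3):
    """Return the top_n candidates as (name, score) pairs, best match first."""
    scored = [(name, match_score(voter, profile))
              for name, profile in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, round(score, 1)) for name, score in scored[:top_n]]


# Invented example data, not real candidates.
voter = [80, 20, 60, 40, 70, 30, 50, 90]
candidates = {
    "Candidate A": [80, 20, 60, 40, 70, 30, 50, 90],
    "Candidate B": [20, 80, 40, 60, 30, 70, 50, 10],
    "Candidate C": [70, 30, 50, 50, 60, 40, 60, 80],
}

suggestions = suggest(voter, candidates, top_n=2)
print(suggestions)
```

In this toy example Candidate A, whose profile is identical to the voter's, comes first, followed by Candidate C, who differs only slightly on every topic.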


Political Shift in Communes 1981 to 2014 – year by year

Do Swiss communes lean more towards the left or the right? Are they more conservative or progressive?

‘Based on the result of every single popular vote since 1983. The Sotomo Research Institute together with the Swiss Broadcasting Corporation (SBC, swissinfo.ch’s parent company) has used the data to find this out.’ A quite complex interactive visualisation depicts this for types of communes and for every single commune.

 



Party Preferences in the Communes

Eleven elections (1971 to 2011) for the Swiss National Council show how 2,345 communes changed their political preferences during these 40 years. The SRF Data Team (@srfdata) created a visualisation out of tons of data (in German only).

After selecting a political party and a commune, the map of Switzerland shows how this commune changed its attitude towards the chosen party. Hovering over the map gives the facts of all other communes for the chosen party.

[Image: timeline for Poschiavo] Great!


Interactive Political Atlas

And not to forget the very rich interactive Political Atlas presented by the Swiss Federal Statistical Office (FSO).

Elections to the National Council can be found from 1919 (!) to today. Popular votes on innumerable topics are also shown, starting in 1866 (!!). Have a look (with Flash enabled).



And that is not all

How did national councillors vote in parliament? (in German, by SRF Data)

Would you like a quiz … and to learn about Swiss political parties?

How well do you know Swiss politics? (by SRF Data)

 

Forgotten something? Sure! There is so much activity in the visualisation scene in Switzerland …