Visualising (not so) Big Data

Facebook is a frequently used source of information. We do not know every kind of data query and reuse that goes on there. But we do know some amazing visualisations made out of this big data.

Faces of Facebook

Here’s http://app.thefacesoffacebook.com by Natalia Rojas:

[Image: 2013-12-01_facesfacebook]

More than

[Image: 2013-12-01_facesfacebook0]

on one page.

And here I am, lost in pixels

[Image: 2013-12-01_facesfacebook3]

and with an image.

[Image: 2013-12-01_facesfacebook4]

Guess who is number 1 on Facebook?

With a little bit of narcissism

If you would love to be part of an exhibition, Facebook is the source for that, too. Intel provides the exhibition at http://museumofme.intel.com/

[Image: 2013-12-01_museumofme]

[Image: 2013-12-01_museumofme2]

The words I used on Facebook:

[Image: 2013-12-01_museumofme4]

And my graph.

[Image: 2013-12-01_museumofme5]

Meanwhile

After a few minutes of editing this blog post, there were 166 new faces on Facebook.

[Image: 2013-12-01_facebook#]

And this happens in an Internet minute, according to http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html :

[Image: internet-minute]

Official Statistics’ SWOT

In the official statistics industry (an industry!), reflection and collaboration are high priorities.

As an example: HLG-BAS.

What’s this? ‘The High-Level Group for Strategic Developments in Business Architecture in Statistics (HLG-BAS) was set up by the Bureau of the Conference of European Statisticians in 2010 to oversee and coordinate international work relating to the development of enterprise architectures within statistical organisations.’ More about HLG-BAS on UNECE statistics wikis.

And more about the Conference of European Statisticians CES:

Implement the HLG-BAS vision

HLG-BAS presents a very interesting paper for the 60th plenary session of the Conference of European Statisticians. It’s the ‘Strategy to implement the vision of the High-level Group for Strategic Developments in Business Architecture in Statistics’.

This paper positions official statistics as part of the information industry:
‘The official statistics industry is part of a more extensive information industry. Within this wider information industry other players are claiming their place and statistical organisations cannot automatically assume that they will retain their current position and relevance.’ (point 5)

SWOT

The paper also gives a short and impressive summary of the Strengths, Weaknesses, Opportunities and Threats of official statistics (point 9).

‘A SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis was undertaken by Capgemini Consulting working for Statistics Netherlands to define the current situation of the official statistics industry assessing it from an international perspective. This analysis was based on existing information on the industry (including the vision of the HLG-BAS) complemented by interviews with members of the HLG-BAS (internal stakeholders), commercial organisations and government bodies (external stakeholders).

The results of this exercise are:

1. Strengths

(a) High quality with relevant and very strong statistical products over long term;
(b) Strong “brand value” of official statistics locally and internationally;
(c) Ability and ‘stamina’ to produce statistics for long-term records and consistency;
(d) International collaboration has started mainly because it is becoming too expensive for each NSO to individually change their tailor-made production processes and products.

2. Weaknesses

(a) A limited outside and “client-centric” view;
(b) Communication of products and results is often not good enough;
(c) Workforce and processes should be more agile to follow rapidly the changing needs of society;
(d) NSOs are not efficient enough in their processes and rely too much on human effort;
(e) The statistical industry as a whole has no clear silhouette or definition; international coherence is low;
(f) NSOs should provide more information about statistics, regarding both quality and other metadata;
(g) Top-level commitment to bring about the changes needed to align the statistical industry with the changing environment is not broadly understood as the key factor in this change process.

3. Opportunities

(a) In some specific statistical domains, cross-border data become more important (globalisation, enterprise groups, climate change). The work and products of NSOs should be expanded to explain what is happening on a multinational level;
(b) The “open data” movement may increase the sources available for official statistics;
(c) NSOs could collaborate (more) with (commercial) external parties;
(d) The official statistics industry could play a more active role regarding new and alternative data sources and collection methods;
(e) NSOs could be quality institutes that certify statistical inputs/outputs of other (commercial) parties;
(f) In the statistical domain the NSOs can lead when it comes to defining and maintaining international standards;
(g) Standardisation of production process (plug and play technology) and products of NSOs to increase international comparison and quality control of products;
(h) Consolidation of NSOs roles as public supplier of trust and quality;
(i) International coherence and the willingness to form a more closely knit statistical community or industry are beginning to materialize;
(j) Specialisation of NSOs in certain products to increase efficiency in the production process of these products. This specialisation in products could vary across countries and sectors to optimize the possibilities of specialisation.

4. Threats

(a) Other organisations are starting to create output NSOs used to have a monopoly on;
(b) Reduced staff and budget cuts;
(c) Weak/fragile coordination of international collaboration activities;
(d) Society wants more timeliness in statistics, both in disseminating existing products and in developing new products;
(e) Some government clients do not distinguish between official and non-official data sources for ad hoc questions, as long as it meets their purpose;
(f) New technologies like open data can seduce NSOs into losing focus of their core business.’

Dürer’s Rhinoceros and Statistics

Mixing Dürer’s Rhino with Statistics might sound a little bit strange.

Dürer’s Rhinoceros – Wikipedia, the free encyclopedia.

But from an epistemological perspective there’s a point. Dürer never saw a rhinoceros; he created it in the process of drawing, in accordance with some information he had received. Statistics, in a sense, do the same, and they do so even with objects which do not exist in ‘reality’.

This topic is itself the subject of several studies. The Norwegians Ann Rudinow Sætnan, Heidi Mork Lomell and Svein Hammer, for example, treat it in their reader ‘The Mutual Construction of Statistics and Society’.

‘How does the act of counting affect the world? How does it change the objects counted, change the lives of those who count (double entendre intended)? … Our argument, briefly stated, is that society and the statistics that measure and describe it are mutually constructed. This argument addresses two counterarguments from seemingly opposite directions. On the one hand, we oppose the notion that statistics are simple, straightforward, objective descriptions of society, gathered from nonparticipant points of observation…. Like all other specific forms of viewing, it is a social act. Counting acts in and upon the social world. Of course, this also means that not counting has an effect on the aspects of the world we (do and/or don’t) count. ….
On the other hand, we also oppose the notion that statistics and/or society are mere fictions, to be invented at will.’ (Introduction, p. 1)

And in his all-time classic ‘The Politics of Large Numbers’, Alain Desrosières treats the same question: ‘… it is difficult to think simultaneously that the objects being measured really do exist, and that this is only a convention’ (p. 1).

And here’s the real rhinoceros (Indian rhinoceros, Rhinoceros unicornis; in German: Panzernashorn).

Statistics are not so bad ... ;-)

2010 in review – WordPress told me

The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health:

Healthy blog!

The Blog-Health-o-Meter™ reads Wow.

Crunchy numbers

About 3 million people visit the Taj Mahal every year. This blog was viewed about 33,000 times in 2010. If it were the Taj Mahal, it would take about 4 days for that many people to see it.

In 2010, there were 73 new posts, growing the total archive of this blog to 302 posts. There were 195 pictures uploaded, taking up a total of 30 MB. That’s about 4 pictures per week.
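The arithmetic behind these comparisons is easy to check. What follows is a minimal sketch in Python, assuming the rounded figures quoted in the report (3 million Taj Mahal visitors per year, 33,000 views, 195 pictures) and a year of 365 days and 52 weeks. The function names are illustrative and not part of the WordPress report.

```python
def days_to_match(views: int, visitors_per_year: int, days_per_year: int = 365) -> float:
    """Days a site with `visitors_per_year` visitors needs to receive `views` visitors."""
    visitors_per_day = visitors_per_year / days_per_year
    return views / visitors_per_day


def per_week(count: int, weeks_per_year: int = 52) -> float:
    """Average number of items per week for a yearly total."""
    return count / weeks_per_year


# Figures quoted in the 2010 report (rounded there, so the results are approximate).
taj_days = days_to_match(views=33_000, visitors_per_year=3_000_000)
pictures_weekly = per_week(195)

print(f"Taj Mahal equivalent: about {taj_days:.1f} days")
print(f"Pictures uploaded: about {pictures_weekly:.2f} per week")
```

Both results round to 4, which matches the "about 4 days" and "about 4 pictures per week" in the report.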

The busiest day of the year was October 20th with 277 views. The most popular post that day was World Statistics Day 2010.

Where did they come from?

The top referring sites in 2010 were googleblog.blogspot.com, crisismaven.wordpress.com, en.wordpress.com, Google Reader, and netvibes.com.

Some visitors came searching, mostly for blog stats, trendalyzer, statistics blog, google trendalyzer, and trendalyzer google.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

1. World Statistics Day 2010 (September 2010), 3 comments

2. Trendalyzer becomes Motion Chart (April 2008), 16 comments

3. Google launches Trendalyzer Gadget (February 2008), 4 comments

4. Google buys Trendalyzer from Gapminder (March 2007), 7 comments

5. Downloads (July 2007), 1 comment

Journalism in the Age of Data

Journalism in the Age of Data from Geoff McGhee on Vimeo.

Journalists are coping with the rising information flood by borrowing data visualization techniques from computer scientists, researchers and artists. Some newsrooms are already beginning to retool their staffs and systems to prepare for a future in which data becomes a medium. But how do we communicate with data, how can traditional narratives be fused with sophisticated, interactive information displays?

Watch the full version with annotations and links at datajournalism.stanford.edu.

Produced during a 2009-2010 John S. Knight Journalism Fellowship at Stanford University.

Mandatory Independence

Among all those nice posts about the latest data visualizations and web 2.0 activities, we must not forget how all the data that we later distribute, publish or visualize is gathered. Keeping the balance between the burden of filling out forms and privacy concerns on the one hand and demands for high-quality data on the other has occupied us for ages. This becomes especially visible with a census, which typically affects the largest number of people.

It probably comes as no surprise that statistical offices aren’t the only ones juggling that balance. More often than not they have superiors, and those are in politics. Recent events in Canada are worth sharing with this community, in case they haven’t been already.

From my understanding of two articles in The Globe and Mail, Canada’s Industry Minister wants to make the long-form census voluntary, against the advice of Statistics Canada. The debate ended with the Canadian chief statistician stepping down:

Dr. Sheikh’s Wednesday night resignation as Statistics Canada’s chief statistician over the census is all the more remarkable because of its rarity. In a world where loyalty is king, bureaucrats of his standing do not tend to quit over differences of opinion.
He did. In doing so, he displayed qualities that have emerged through his 38-year career: stubbornness and independence of mind.
The Globe and Mail, July 23, 2010

This is indeed remarkable. The lack of people speaking up when it comes to political interference with official statistics is no proof that such interference does not exist. In theory there are provisions in some countries such as the following:

The professional independence of statistical authorities from other policy, regulatory or administrative departments and bodies, as well as from private sector operators, ensures the credibility of European Statistics.
Principle 1 of the European Statistics Code of Practice

But at least off the record, many people involved might regard such codes as mere lip service and wouldn’t be surprised to read something like this in the news:

In an interview published Wednesday, Clement said that some people at Statistics Canada “like to think” they are an independent agency, but in fact they report to him as minister.
Postmedia News, July 21, 2010

At the very least, one should spread the word about such instances.

Timeline

This year, the Swiss Federal Statistical Office (FSO) celebrates the 150th anniversary of its founding. On 1 June 1860 the predecessor of the FSO began its activities.

A special website in four languages offers various activities for the “150 years FSO” anniversary year. It was launched on 23 March 2010 (see press release).

Some highlights

The interactive timeline “ChronoStat” (in German and French only)

A quiz to test your knowledge in statistics (in German and French only)

and much more …

Tufte’s Granddad

Are you in need of holiday presents in the office and on a tight budget? Why not go back in time and shop for books that are out of copyright? The Internet Archive is here to help. Check out Willard Cope Brinton’s Graphic Presentation (1939) and delve into an ancestor of the Tufte books.

You can read this book online through the beautiful web-based book reader or download it in a number of formats that allow for high-quality printing. For free.