Big Data is THE topic of the freshly published issue of the Statistical Journal of the IAOS, Volume 31, Issue 2.
Five articles deal with Big Data topics:
- International collaboration to understand the relevance of Big Data for official statistics
- Web scraping techniques to collect data on consumer electronics and airfares for Italian HICP compilation
- The production of salary profiles of ICT professionals: Moving from structured database to big data analytics
- “Re-make/Re-model”: Should big data change the modelling paradigm in official statistics?
- Measuring output quality for multisource statistics in official statistics: Some directions
In the editorial, Fride Eeg-Henriksen and Peter Hackl give an overview of the Big Data discussions held in official statistics. Here are some remarks taken from this editorial:
‘In spite of the wide interest in and the great popularity of Big Data, no clear and commonly accepted definition of the notion Big Data could be established so far. Modern technological, social and economic developments including the growth of smart devices and infrastructure, the growing availability and efficiency of the internet, the appeal of social networking sites and the prevalence and ubiquity of IT systems are resulting in the generation of huge streams of digital data. The complexities of the structure and dynamic of corresponding datasets, the challenges in developing the suitable software tools for data analytics, generally the diversity of potentials in making use of the masses of available data make it difficult to find a suitable and generally applicable definition. The often mentioned characterization of Big Data by 3 – or more – Vs (volume, velocity, variety – as well as veracity and value), does not capture the enormous scope of the corresponding data sets and the extensive potentials of making use of these data. A highly relevant aspect is that Big Data are so large and complex that traditional database management tools and data processing applications are not feasible and efficient means. This is illustrated by a look at the categories of data sources which typically are seen in the context of Big Data: Such data sources may be:
– Administrative, e.g., medical records, insurance records, bank records.
– Commercial transactions, e.g., credit card transactions, scanners in supermarkets.
– Sensors, e.g., satellite imaging, environmental sensors, road sensors.
– Tracking devices, e.g., tracking data from mobile telephones, GPS.
– Tracks of human behaviour, e.g., online searches, online page viewing.
– Documentation of opinion, e.g., comments posted in social media.’
‘A general conclusion from the set of articles in this Special Section can be drawn as follows: The feasibility and the potentials of using Big Data in official statistics have to be assessed from case to case. In some areas the use of Big Data sources has already proved to be feasible. The choice of the appropriate IT technology and statistical methods must be specific for each situation. Also issues like the representativity and the quality of the resulting statistics, or the confidentiality and the risk of disclosure of personal data need to be assessed individually for each case. There is no doubt that Big Data will have a place in the future of official statistics, helping to reduce costs and burden on respondents. However, major efforts will be necessary to establish the routine wise use of Big Data, and new approaches will be needed for assessing all aspects of quality.’
C. Reimsbach-Kounatze (2015), “The Proliferation of ‘Big Data’ and Implications for Official Statistics and Statistical Agencies: A Preliminary Analysis”, OECD Digital Economy Papers, No. 245, OECD Publishing. http://dx.doi.org/10.1787/5js7t9wqzvg8-en
See also: Big Data in Action May 2015