And now: Semantic Statistics (SemStats)

Official Statistics has a long tradition of creating and providing high-quality metadata. And the Semantic Web needs just this: metadata!

So it’s not surprising that the two are coming together more and more.
A special workshop will be held during the 12th International Semantic Web Conference (ISWC), 21-25 October 2013, Sydney, Australia.

It is the 1st International Workshop on Semantic Statistics (SemStats 2013), organized by Raphaël Troncy (EURECOM), Franck Cotton (INSEE), Richard Cyganiak (DERI), Armin Haller (CSIRO) and Alistair Hamilton (ABS).

‘ISWC 2013 is the premier international forum for the Semantic Web / Linked Data Community. Here, scientists, industry specialists, and practitioners meet to discuss the future of practical, scalable, user-friendly, and game changing solutions.’

The workshop summary

How to publish linked statistics? And: How to use linked data for statistics? These are the key questions of this workshop.

‘The goal of this workshop is to explore and strengthen the relationship between the Semantic Web and statistical communities, to provide better access to the data held by statistical offices. It will focus on ways in which statisticians can use Semantic Web technologies and standards in order to formalize, publish, document and link their data and metadata.

The statistics community sometimes faces challenges when trying to adopt Semantic Web technologies, in particular:

  • difficulty to create and publish linked data: this can be alleviated by providing methods, tools, lessons learned and best practices, by publicizing successful examples and by providing support.
  • difficulty to see the purpose of publishing linked data: we must develop end-user tools leveraging statistical linked data, provide convincing examples of real use in applications or mashups, so that the end-user value of statistical linked data and metadata appears more clearly.
  • difficulty to use external linked data in their daily activity: it is important to develop statistical methods and tools especially tailored for linked data, so that statisticians can get accustomed to using them and get convinced of their specific utility.’

A tradition

RDF, triples, Linked Data … these are topics that statisticians have already dealt with and adopted, but mostly on individual tracks rather than as an organization.

This blog has a lot of information about the Semantic Web and Official Statistics: about 40 posts since 2007.

See this post (2012) with a recent paper from Statistics Switzerland (where a study on publishing linked data has just been finished in collaboration with the Bern University of Applied Sciences):

Or this (2009) about SDMX and RDF or about LOD activities in 2009:

Linking UK Government Data, John Sheridan: the UK has a linked data lead. It’s John Sheridan, and he prepared a short presentation promoting (of course) linked data, with a lot of interesting examples!

The summary:


Linked Data: It’s not a top-down system. Berners-Lee and OpenGov

There’s not much noise about the Semantic Web these days. But in the fascinating and creative Semantic Web niche, activities go on.

Once more, Tim Berners-Lee explains what Linked Data is.


The 5-star system helps to measure how far open-gov data are from being part of the Semantic Web.

  1. Available on the web (whatever format), but with an open licence
  2. Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
  3. As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
  4. All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
  5. All the above, plus: link your data to other people’s data to provide context
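
Because each star presupposes all the ones before it, the scheme can be read as a simple cumulative check. The following is only a minimal sketch in Python: the `Dataset` class, its field names and the `star_rating` function are hypothetical names chosen for illustration and are not part of any official tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustration only)."""
    on_web_with_open_licence: bool   # star 1
    machine_readable: bool           # star 2
    non_proprietary_format: bool     # star 3
    uses_w3c_standards: bool         # star 4 (RDF, SPARQL)
    linked_to_other_data: bool       # star 5


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all earlier levels hold."""
    criteria = [
        ds.on_web_with_open_licence,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing level blocks every level after it
        stars += 1
    return stars


# A CSV file under an open licence, but without RDF or links to other data.
csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))
```

Under these assumptions, a table published only as a scanned image with an open licence would stop at one star, however well it is linked elsewhere.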



And there is an interesting example of how Linked Data is published:

‘Linked data is data in which real-world things are given addresses on the web (URIs), and data is published about them in machine-readable formats at those locations. Other datasets can then point to those things using their URIs, which means that people using the data can find out more about something without that information being copied into the original dataset.

This page lists the sectors for which we currently publish linked data and some additional resources that will help you to use it. Most sectors have one or more SPARQL endpoints, which enable you to perform searches across the data; you can access these interactively on this site’.
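
The idea in the quote, that one dataset points at a thing by its URI instead of copying information about it, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a real RDF library or SPARQL engine: all `http://example.org/...` URIs, the values and the helper names `match` and `label_of_observed_area` are made up for the example.

```python
# Hypothetical namespace; none of these URIs refer to real published data.
EX = "http://example.org/"

# Dataset 1: descriptions of places, as (subject, predicate, object) triples.
places = [
    (EX + "id/place/A", EX + "def/label", "Place A"),
]

# Dataset 2: a statistical observation that only *points* at the place URI.
observations = [
    (EX + "id/obs/1", EX + "def/refArea", EX + "id/place/A"),
    (EX + "id/obs/1", EX + "def/value", 42),  # placeholder value
]


def match(graph, s=None, p=None, o=None):
    """Return all triples fitting the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def label_of_observed_area(obs_uri):
    """Follow the refArea link from dataset 2 into dataset 1."""
    area_links = match(observations, s=obs_uri, p=EX + "def/refArea")
    if not area_links:
        return None
    area_uri = area_links[0][2]
    labels = match(places, s=area_uri, p=EX + "def/label")
    return labels[0][2] if labels else None


print(label_of_observed_area(EX + "id/obs/1"))
```

The label lives only in the first dataset; the observation finds it by following the URI. A SPARQL endpoint offers a far richer version of the same pattern matching.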


What’s the effect of open data? Some newspapers (like the Guardian) make ample use of open data, but there is no widespread activity or commitment, and few evaluation studies are to be seen. InformationWeek just published an article about US open gov and found that there is a lot to be done, as only small groups seem to take notice of this government activity. ‘The most difficult part of open government may be getting the public to participate. … the “if you build it, they will come” approach simply doesn’t work.’ (InformationWeek, Feb 21, 2011: Open Government Reality Check: Federal agencies are making progress on the Obama administration’s Open Government Directive, but there’s still a long way to go. Here’s our list of top priorities.)


“Journalism Needs Data in 21st Century”

This is an interesting summary from ReadWriteWeb about the role of data in media, with several fresh, innovative examples that came up in 2009. Some of them have already been presented in earlier posts on this blog. The conclusion is that journalism is data-driven and demands open access to raw data. Data should be transparent and not hidden in PDF files or other formats without links to raw data (micro data). However, the integrity aspects of such transparency are not discussed, and they are certainly an important issue for producers of official and public statistics in this context.

Journalism Needs Data in 21st Century

Written by Guest Author / August 5, 2009 2:00 AM

Journalism has always been about reporting facts and assertions and making sense of world affairs. No news there. But as we move further into the 21st century, we will have to increasingly rely on “data” to feed our stories, to the point that “data-driven reporting” becomes second nature to journalists.

The shift from facts to data is subtle and makes perfect sense. You could say that data are facts, with the difference that they can be computed, analyzed, and made use of in a more abstract way, especially by a computer.