Semantic Web and the practice of Statistical Agencies

Some statistical agencies already use standardized metadata to describe their web pages. Several syntaxes are in use (see also earlier posts 1 2 3).

RDF (Resource Description Framework) is not yet very common; DC (Dublin Core), on the other hand, can be found on several websites. Some examples:

Australian Bureau of Statistics

<head>
<META NAME="DC.Date.modified" SCHEME="ISO8601" CONTENT="2007-05-04">
<META NAME="DC.Coverage.jurisdiction" CONTENT="Commonwealth of Australia">
<META NAME="DC.Coverage.spatial" CONTENT="Australia">
<META NAME="AGLS.Function" SCHEME="AGIFT" CONTENT="information dissemination; census collection; population distribution analysis; collection management; economic statistical collection">
<META NAME="DC.Title" CONTENT="Australian Bureau of Statistics web site">
<META NAME="DC.Language" SCHEME="RFC3066" CONTENT="en">
<META NAME="DC.Rights" CONTENT="© Commonwealth of Australia, 2007">
<META NAME="DC.Creator" SCHEME="GOLD" CONTENT="c=AU; o=Commonwealth of Australia; ou=Australian Bureau of Statistics">
<META NAME="DC.Publisher" SCHEME="GOLD" CONTENT="c=AU; o=Commonwealth of Australia; ou=Australian Bureau of Statistics">
<META NAME="DC.Description" CONTENT="Welcome to the Australian Bureau of Statistics web site. The Australian Bureau of Statistics is Australia's official statistical organisation. We assist and encourage informed decision-making, research and discussion within governments and the community, by providing a high quality, objective and responsive national statistical service.">
<META NAME="DC.Subject" SCHEME="ABS Classification" CONTENT="Building and construction">
<META NAME="DC.Type.category" CONTENT="document">
<META NAME="DC.Type.aggregationLevel" CONTENT="collection">
<META NAME="DC.Type.documentType" SCHEME="agls-document" CONTENT="homepage">
<META NAME="DC.Availability" CONTENT="corporate name:Australian Bureau of Statistics; address:PO Box 10 Belconnen ACT 2616; contact:National Information Referral Service; email:client.services@abs.gov.au; telephone:1300 135 070">
<META NAME="DC.Identifier" SCHEME="URI" CONTENT="www.abs.gov.au/websitedbs/d3310114.nsf/Home/Home!OpenDocument">
<LINK href="/ausstats/wmdata.nsf/stylesheetscurrent/zabs_website.css/$FILE/zabs_website.css" rel="stylesheet" type="text/css" media="all">
…….

Czech Statistical Office

<head>
<title>Český statistický úřad | ČSÚ</title>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250">
<meta http-equiv="Content-Language" content="cs">
<meta name="description" content="Český statistický úřad">
<meta name="keywords" content="státní správa, statistika, ČSÚ">
<!--NK metadata-->
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.Title" content="Český statistický úřad">
<meta name="DC.Creator" content="Český statistický úřad">
<meta name="DC.Subject" scheme="PHNK" content="Český statistický úřad">
<meta name="DC.Subject" scheme="PHNK" content="sběr dat">
<meta name="DC.Subject" scheme="PHNK" content="statistika">
<meta name="DC.Subject" scheme="PHNK" content="Česko">
<meta name="DC.Subject" scheme="PHNK" content="statistické služby">
<meta name="DC.Subject" scheme="MDT_MRF" content="311">
<meta name="DC.Subject" scheme="MDT_MRF" content="311.2">
<meta name="DC.Subject" scheme="MDT_MRF" content="(437.3)">
<meta name="DC.Subject" scheme="MDT_KON" content="311 – Statistika [4]">
<meta name="DC.Subject" scheme="DDC_CON" content="310 – General statistics">
<meta name="DC.Description.abstract" content="Portál Českého statistického úřadu je důvěryhodným informačním nástrojem vytvářející konzistentní obraz o stavu a vývoji společnosti v návaznosti na vyvíjející se potřeby uživatelů služeb státní statistiky. ">
<meta name="DC.Publisher" content="Český statistický úřad">
<meta name="DC.Date" scheme="W3C-DTF" content="2006-07-18">
<meta name="DC.Type" scheme="DCMIType" content="Text">
<meta name="DC.Format" scheme="IMT" content="text/html">
<meta name="DC.Format.medium" content="computerFile">
<meta name="DC.Identifier" content="http://www.czso.cz">
<meta name="DC.Identifier" scheme="URN" content="URN:NBN:cz-nk2006708">
<meta name="DC.Language" scheme="RFC3066" content="cze">
<meta name="DC.Language" scheme="RFC3066" content="eng">
<meta name="DC.Rights" content="© Český statistický úřad, 2007">
……..

Statistics Denmark

<html>
<head>
<title>Danmarks Statistik</title>
<link rel="schema.dc" href="http://purl.org/metadata/dublin_core_elements">
<meta name="DC.Creator" content="Danmarks Statistik / Statistics Denmark">
<meta name="DC.Date" content="2005-06-13">
<meta name="DC.Language" content="Eng">
<meta name="DC.Title" content="Engelsk Forside">
<meta name="DC.Description" content="Engelsk Forside – Engelsk Forside">
<meta name="DC.Subject" content="">
<meta name="DC.Rights" content="Copyright: Danmarks Statistik / Statistics Denmark">
<meta name="DC.Author" content="ANH">
…..
</head>
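
Tags of this kind are easy to read by machine, which is the whole point of embedding them. As a rough illustration, here is a minimal Python sketch that collects the DC.* meta tags of a page into a dictionary. It uses only the standard library; the class and function names (DublinCoreParser, extract_dublin_core) are made up for this example, the sample string is assembled from a few tags of the listings above rather than being a real page, and the markup is assumed to use plain ASCII quotes.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*"> tags (hypothetical helper for this post)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are handled the same way.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if not name.lower().startswith("dc."):
            return
        # DC elements are repeatable (e.g. several DC.Subject tags),
        # so each element maps to a list of (scheme, content) pairs.
        key = name[3:].lower()
        value = (attr.get("scheme"), attr.get("content") or "")
        self.fields.setdefault(key, []).append(value)


def extract_dublin_core(html_text):
    """Return {element: [(scheme, content), ...]} for all DC.* meta tags."""
    parser = DublinCoreParser()
    parser.feed(html_text)
    return parser.fields


# Sample assembled from tags shown in the listings above (not a real page).
sample = """<head>
<title>Danmarks Statistik</title>
<meta name="DC.Creator" content="Danmarks Statistik / Statistics Denmark">
<meta name="DC.Date" content="2005-06-13">
<META NAME="DC.Language" SCHEME="RFC3066" CONTENT="en">
<meta name="DC.Subject" scheme="MDT_MRF" content="311">
<meta name="DC.Subject" scheme="MDT_MRF" content="311.2">
<meta name="description" content="not Dublin Core">
</head>"""

fields = extract_dublin_core(sample)
for element, values in fields.items():
    print(element, values)
```

Keeping the scheme next to each value matters because the same element can carry values from different vocabularies, as the Czech DC.Subject tags show.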

So why did these agencies use resource description? What are the advantages? What did they learn?
Let’s ask them!

4 thoughts on “Semantic Web and the practice of Statistical Agencies”

  1. Well, what did we learn and why did we use resource descriptions at Statistics Denmark?

    To tell the whole truth and nothing but the truth, we at Statistics Denmark did not expect anything to happen. However, we did follow a recommendation on best practices for internet communication prepared by the Danish Ministry of Science, Technology and Innovation (www.itst.dk).

    Each year the ministry stages a competition / review of governmental internet sites. In the review the sites are ranked according to a set of criteria, and resource descriptions are one of these criteria. So the simple answer is that we use Dublin Core because it is recommended to us, and because it is important to our ranking in the yearly review of official Danish websites, not because of its direct application for end users.

    The original thinking behind the recommendation was that search engines should and would do a large part of their indexing from the Dublin Core meta tags. But of course we now know that Google works in a completely different way. And we know that Google is the only search engine used by visitors to http://www.dst.dk.

    So in reality we could remove the Dublin Core tags without affecting our users.

    From time to time we use the Dublin Core fields to insert synonyms aimed at our own robot-based search engine. But at the moment we do not see our “tagging” as a stepping stone to any kind of semantic web. It is my understanding that there are information specialists who do indeed use the information found in the Dublin Core tags.

    Best regards
    Jesper Ellemose

  2. The answer from the colleague at the Czech Statistical Office

    Use of Dublin Core at the CZSO

    In the Czech Republic there is a requirement to use a standardized metadata description for all electronic documents, in accordance with Act No. 365/2000 Coll., with effect from 1 January 2007. The Czech Statistical Office fulfilled this duty in March 2006. The Ministry of Informatics prepared a directive based on the Act, which contains an obligatory description of the metadata system, and this is based on Dublin Core.

    Besides this statutory duty, the CZSO strives to follow recommendations that ensure better data accessibility. Structured data is one way to offer our users additional information that may be relevant to them in many cases (for example, a guarantee of the content or of how recent the information is).
    Beyond meeting the requirements of the Act, we expected that by using metadata (for example page descriptions and keywords) we could make our web pages easier to find and improve their ranking in search results. The metadata displayed on individual pages serves first of all for automatic processing by the “catalogue”, i.e. a special file that keeps metadata for each information source and enables transparent and structured data searching. The “catalogue” allows metadata to be collected from different databases built on different technologies, etc.
    The use of standardized metadata should benefit both sides. Users receive a guarantee of data validity and recency; on the other hand, the traffic to and the ranking of the institution’s web presentation should be higher. Structured data could also be used in communication with other state institutions. But to tell the truth, we cannot currently verify whether using standardized metadata has positively influenced the ranking of the CZSO web pages and met every requirement.

    Standardized metadata are also used in the internal search on the CZSO web pages.
    To optimise the search results, the CZSO uses a controlled vocabulary. It contains 74 keywords selected from an analysis of searched words. Each keyword has its own group of alternative words (synonyms) and a limited number (5) of “selected links”. When the user enters a keyword or a synonym in the search engine, the system displays a page with the searched word and the “selected links” that are most closely related to the searched word or theme.
    Moreover, the CZSO uses a list of thematic groups. Each page created is assigned to one or more thematic groups. When entering a search word, the user can select a thematic group in which to search for it.
    The catalogue used on the CZSO web pages for data searching accurately follows the Dublin Core structure; these data are subsequently provided to the full-text search engine. For the future, the vision is to feed these records into other public catalogues without any fundamental changes.
    We also hope that standardized metadata will help to interconnect the Public Database and the current web presentation.
    A practical illustration of using structured data is the National Statistical Portal. Its main objective is to assemble statistical information from all public institutions in one place, and the obligatory metadata description (in our case Dublin Core) plays an essential informative role here.
    In this case, the catalogue should accumulate metadata on statistical information from all available information sources and then provide them for structured searching. In the future it may be possible to create a Public Administration Information Sources Catalogue. This catalogue would contain a list of data obtained from all bodies of the public administration sector.
