Semantic Web: Basics, RDF, DC and the description of a statistical site

INTRO

Recently I found a very short and very good introduction to the Semantic Web. Here it is (from Radar Networks):

“An Intro To The Semantic Web

The concept of the semantic web is a few years old now, but is only now really beginning to gain real-world traction. The idea is based upon the simple observation that the current web mainly consists of a network of human-readable documents, not computer-parsable data. Because of this, the web is extremely useful for humans to gather data and information, but not at all useful for computers. The semantic web seeks to overcome this limitation by promoting standards for information representation and exchange to create a web of data.

The key technical standards for the Semantic Web are RDF and OWL, both of which were conceived by Tim Berners-Lee and later developed into working standards by the collective efforts of many contributors to W3C working groups. These standards provide a consistent, unified way of representing knowledge and information as well as mechanisms for exchanging this information.

So what can you do with the Semantic Web? Theoretically, lots. Consider the following simple question: what are the homepages for all of the Web 2.0 companies located in San Francisco? With today’s tools, this is a nearly impossible question to answer. Typing “web 2.0 company san francisco” into Google returns a confusing mishmash of 12 million hits, most of which are neither companies nor located in San Francisco. It’s up to you, the human on the other side of the screen, to sift through the dross of ads, conference announcements, articles, and blog chatter to find the few gems you are looking for. It’s also up to you, the human, to cut/paste all of these into a spreadsheet for tracking.

The Semantic Web solves this problem by providing a standard mechanism for web sites to publish data instead of documents. One could imagine that every company that wanted to make its presence known on the Semantic Web would publish a set of RDF tags (<MyCompany, location, San Francisco> and <MyCompany, field, Web 2.0>) describing itself. With the information in a standard format, query tools could then allow construction of targeted queries that answer the specific question at hand.”
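The idea of querying published statements instead of reading documents can be sketched in a few lines of Python. This is a minimal illustration, not an RDF store: statements are plain (subject, predicate, object) tuples in the spirit of <MyCompany, location, San Francisco>, and the second company, the "homepage" predicate, the function names and all URLs are hypothetical placeholders.

```python
# Minimal sketch: statements as (subject, predicate, object) triples.
# "OtherCo", the "homepage" predicate and all URLs are hypothetical.
Triple = tuple[str, str, str]

triples: list[Triple] = [
    ("MyCompany", "location", "San Francisco"),
    ("MyCompany", "field", "Web 2.0"),
    ("MyCompany", "homepage", "http://mycompany.example/"),
    ("OtherCo", "location", "Boston"),
    ("OtherCo", "field", "Web 2.0"),
    ("OtherCo", "homepage", "http://otherco.example/"),
]


def objects(subject: str, predicate: str) -> set[str]:
    """Return every object o for which (subject, predicate, o) is a known triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def homepages(location: str, field: str) -> list[str]:
    """Homepages of all subjects that have both the given location and field."""
    result: list[str] = []
    for s in sorted({s for s, _, _ in triples}):
        if location in objects(s, "location") and field in objects(s, "field"):
            result.extend(sorted(objects(s, "homepage")))
    return result


print(homepages("San Francisco", "Web 2.0"))  # prints ['http://mycompany.example/']
```

Once every site publishes its statements in the same uniform shape, the San Francisco question becomes a simple filter over data rather than a manual sift through search results.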

And here is the link to the full article, with many additional links and further information.

A good, detailed overview can also be found in the article Semantic Web Revisited, written in 2006 by Nigel Shadbolt, Wendy Hall (University of Southampton) and Tim Berners-Lee (Massachusetts Institute of Technology).

RDF and DC

RDF (Resource Description Framework) is a technical standard that can be used to describe web content. But one more standard is needed to structure the description of (the content of) a resource such as a web page, much like the standards used to describe a book in a library catalogue. Dublin Core (DC) is such a standard: it is easy to understand and widely used. Dublin Core defines, for instance, the title, the subject, the date, the creator, the publisher … of a resource.
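To make the division of labour concrete (RDF supplies the statement structure, Dublin Core supplies the vocabulary), here is a minimal sketch using only Python's standard xml.etree.ElementTree module. It writes one resource and its DC elements as RDF/XML. The function name describe and the example URL are hypothetical; the namespace URIs are the standard ones for RDF and Dublin Core 1.1, and the element list is the 15-element DC 1.1 set.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Serialize with the familiar rdf: and dc: prefixes instead of ns0:, ns1:.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe(about: str, elements: dict[str, list[str]]) -> str:
    """Return an RDF/XML description of one resource using DC elements."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
    for name, values in elements.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:  # an element may repeat, e.g. two dc:format values
            ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(describe("http://www.example.org/", {"title": ["Example page"], "type": ["Text"]}))
```

Rejecting names outside the DC set is a design choice for the sketch: a misspelled element would otherwise silently produce metadata that no DC-aware reader recognizes.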

As an example, consider a description of Statistics Switzerland’s portal (http://www.bfs.admin.ch/bfs/portal/en/index.html). Statistics Switzerland does not yet use RDF and DC, but there are tools that can extract information from a web page and express it in RDF and DC. The result looks like this:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-dtd.dtd">

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.bfs.admin.ch/bfs/portal/en/index.html">
<dc:title>
Swiss Federal Statistical Office
</dc:title>
<dc:subject>
pressure; statistics; French; Board; News; eSurvey; wide;
Specialized; Terms; English; Security; Nutshell; Yearbook;
supplement; Subscribe; cantons; informations; Financial;
releases; German; download; Rome; found; Administration;
english; info@bfs.admin.ch; Homepage; Brief; offers;
ClassWEB; press; Forestry; publications; DATA; costs;
Definitive; access; footprint; health; ecological; WORLD;
REGIONAL; admin.ch; communes; offer; copyright;
statistical; Service; national; Bilateral; World; links;
refer; Federal; Swiss; microdata; cooperation; data;
overview; NEWS; Results; Liability; Statistical; Aviation;
Guide; managing; maps; Figures; E-Mail; Health; languages;
public; available; Compare; adopted; SFSO; Encyclopedia;
Schweiz; Espace; Site; European; Data; complete; Regional;
Portraits; Dissemination; content; Publications; TOPICS;
Pocket; Activities; listings; DSBB; based; Rising;
Italiano; Agreements; Information; brochure; 13th; Union;
comparisons; Please; Details; principles; top; Bulletin;
SERVICES; pages; minimum; core; Forum; visit; Address;
entitled; nomenclatures; brief; Maps; Definitions; LIBRARY;
Statweb; Map; set; OECD; Statistics; Statisticians;
Environmental; Edition; Standards; Extended; Civil;
Tourism; Office; Tel; Server; Kids; Monetary; Contact;
search; International; Neuchâtel; Switzerland; indicators;
FAQ; Conference; Treaty; Economic; kindly; compact;
Catalog; INSTITUTIONS; en; Deutsch; de; finances; portrait;
official; major; Social; research; Education; guidelines;
Classifications; Search; June; basis; EU-CH; Thematic;
rumantsch; Total; Newsletters; Eurostat; Web; publishes;
Accounts; Infurmaziuns; Français; Impressum; Italian; Help
</dc:subject>
<dc:description>
The SFSO publishes information on the situation and
developments in Switzerland in a multitude of fields and
plays a part in enabling comparisons in an international
context. It provides the quantitative information needed
for understanding the present and planning for the future.
</dc:description>
<dc:publisher>
</dc:publisher>
<dc:type>
Text
</dc:type>
<dc:format>
text/html; charset=utf-8
</dc:format>
<dc:format>
50275 bytes
</dc:format>
</rdf:Description>
</rdf:RDF>

This description was generated by a tool (http://www.ukoln.ac.uk/metadata/dcdot/), and it is not very good: the subject section in particular is a mixture of sense and nonsense, which shows that such descriptions need to be checked by humans. But it does show the key elements of this metadata technology: how the resource “Statistics Switzerland’s Portal” (rdf:about="http://www.bfs.admin.ch/bfs/portal/en/index.html") is described by a title (dc:title), a subject (dc:subject) and so on.
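What a machine gets out of such a description can be sketched with Python's standard xml.etree.ElementTree module. This minimal sketch assumes the flat layout the tool produced (one rdf:Description with DC child elements); it is not a general RDF/XML parser, which would have to handle many more syntactic forms. The function names read_dc and subject_terms are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"


def read_dc(rdf_xml: str) -> dict[str, dict[str, list[str]]]:
    """Map each rdf:about URI to its Dublin Core elements (name -> list of values)."""
    root = ET.fromstring(rdf_xml)
    result: dict[str, dict[str, list[str]]] = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about", "")
        fields = result.setdefault(about, {})
        for child in desc:
            if child.tag.startswith(f"{{{DC}}}"):
                name = child.tag[len(DC) + 2:]  # strip the "{namespace}" prefix
                text = (child.text or "").strip()
                if text:  # skip empty elements such as <dc:publisher></dc:publisher>
                    fields.setdefault(name, []).append(text)
    return result


def subject_terms(fields: dict[str, list[str]]) -> list[str]:
    """Split dc:subject values on ';' into individual keywords."""
    return [
        term.strip()
        for value in fields.get("subject", [])
        for term in value.split(";")
        if term.strip()
    ]
```

Applied to the description above, the title and description come out cleanly, while subject_terms returns the whole unfiltered keyword list, "Rome" and "Please" included, which is exactly why human review is needed.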

PROBLEMS

Describing resources like websites using standards like RDF (Resource Description Framework) and DC (Dublin Core) is an important step towards the Semantic Web. But there are (at least) two main problems.

1) There is no quality control. Anybody can put whatever information they like into the DC elements in order to attract traffic to their site. Statistical agencies will not do this, but the system does nothing to prevent it.

2) We do not know what search engines do with this metadata. Google seems not to trust it and does not use it; Yahoo is said to make use of it, but how exactly?
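One partial answer to the first problem, and to the need for human control of generated descriptions, is to check subject keywords against a controlled vocabulary and pass everything else to a person. The Python sketch below assumes a hypothetical list of approved terms (APPROVED_SUBJECTS) and a hypothetical function review_subjects; it does not reflect any existing tool or any vocabulary actually used by Statistics Switzerland.

```python
# Hypothetical controlled vocabulary of terms approved for dc:subject.
APPROVED_SUBJECTS = {
    "statistics", "population", "health", "education",
    "tourism", "environment", "finances",
}


def review_subjects(
    terms: list[str], approved: set[str] = APPROVED_SUBJECTS
) -> tuple[list[str], list[str]]:
    """Split terms into (accepted, needs_human_review), ignoring case and duplicates."""
    accepted: list[str] = []
    flagged: list[str] = []
    seen: set[str] = set()
    for term in terms:
        key = term.casefold()
        if key in seen:  # "Statistics" and "statistics" count as one term
            continue
        seen.add(key)
        if key in approved:
            accepted.append(term)
        else:
            flagged.append(term)
    return accepted, flagged
```

The check only narrows the work: it cannot stop someone from filling in approved but misleading terms, so the quality problem itself remains.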
