Freebase – an ambitious database project

Semantic Web projects are emerging. A few days ago Freebase (Metaweb) opened its doors to non-registered users. With Freebase, a new and very ambitious approach is starting, one that combines database technology with social web features. Statistical publications from official sources, tables from Swivel, etc. could be integrated into Freebase and benefit from this metadata-based access.

First: Freebase is creating a central database that can feed everybody’s website.

People are allowed to put their data into one big database. The (flexible) general structure of this database is managed by administrators in order to guarantee a structured order and to prevent redundancy. This database can be integrated into other applications, websites, blogs, etc. The vision: a database that can be used by everybody, with content for everyone, bigger than Google. A database organizing the world’s information and making it accessible from everywhere.

In the words of Freebase: “We will be compared to Google, Yahoo!, Wikipedia and other information driven sites, but such a comparison misses our real purpose: we are trying to enable thousands of organizations and millions of people to build their own Web. Paradoxically, we are decentralizing the experience of information by centralizing the storage of it. … The idea is to create a ‘new web’ with cleaner, more structured information.”

And John Markoff explains it in the following manner:

“The idea of a centralized database storing all of the world’s digital information is a fundamental shift away from today’s World Wide Web, which is akin to a library of linked digital documents stored separately on millions of computers where search engines serve as the equivalent of a card catalog.

In contrast, Mr. Hillis (see company description at the end of this post, AG) envisions a centralized repository that is more like a digital almanac. The new system can be extended freely by those wishing to share their information widely.

On the Web, there are few rules governing how information should be organized. But in the Metaweb database, to be named Freebase, information will be structured to make it possible for software programs to discern relationships and even meaning. …. Contributions already added into the Freebase system include descriptive information about four million songs from Musicbrainz, a user-maintained database; details on 100,000 restaurants supplied by Chemoz; extensive information from Wikipedia; and census data and location information” (NY Times, March 9, 2007).

And Michael Arrington adds: “Like Google Base, Freebase is a massive database. The purpose of the database is to centralize as much data as possible, and allow participants to freely add and access data – developers can extract information from Freebase via a set of APIs and add it to their web applications. It also builds relationships between highly structured pieces of data, something that can’t easily be done with distributed data controlled by different entities.” (9.3.2007)

Second: The structure of Freebase

The topmost level in Freebase is called a “category”. There are currently 9 categories:
Arts & Entertainment,
Society,
Sports,
Products & Services,
Money,
Science & Technology,
Time & Space,
Special Interests,
System.

Categories have domains. For example, the category Society comprises the following domains:

[Image: Freebase Domains]

Domains have types, for example Society->Education->Field of Study or Society->Education->School Newspaper. Types, in turn, have a set of defined properties.

[Image: Freebase Type definition]

The basic concept is the “topic”. For instance, “Reporter” is a topic of the type School Newspaper.

[Image: Freebase Topic]
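The hierarchy described above (category, domain, type, property, topic) can be sketched as a few nested data structures. This is only an illustrative model, not Freebase's actual schema or API: the class names and the property names `name` and `school` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Type:
    """A type groups topics and defines the properties they may carry."""
    name: str
    properties: List[str] = field(default_factory=list)


@dataclass
class Domain:
    """A domain belongs to a category and holds types."""
    name: str
    types: Dict[str, Type] = field(default_factory=dict)


@dataclass
class Category:
    """A top-level category holds domains."""
    name: str
    domains: Dict[str, Domain] = field(default_factory=dict)


@dataclass
class Topic:
    """A topic is a single entry; it can be linked to one or more types."""
    name: str
    types: List[Type] = field(default_factory=list)


# Society -> Education -> School Newspaper, as in the example above.
# The property names are hypothetical.
school_newspaper = Type("School Newspaper", properties=["name", "school"])
field_of_study = Type("Field of Study", properties=["name"])
education = Domain(
    "Education",
    types={t.name: t for t in (school_newspaper, field_of_study)},
)
society = Category("Society", domains={education.name: education})

reporter = Topic("Reporter", types=[school_newspaper])


def path_of(category: Category, domain_name: str, type_name: str) -> str:
    """Return the 'Category->Domain->Type' path, failing if it does not exist."""
    t = category.domains[domain_name].types[type_name]
    return f"{category.name}->{domain_name}->{t.name}"
```

Calling `path_of(society, "Education", "School Newspaper")` walks the same path that the text writes as Society->Education->School Newspaper.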

Third: Freebase uses social web technologies to gather information, and it does so in a way that avoids redundancy and structural chaos.

In the words of O’Reilly:

Freebase makes use of the so-called “architecture of participation”, i.e. “… one of the secrets of success in Web 2.0 is to harness self-interest, not volunteerism, in a natural ‘architecture of participation.’”

” What’s so clever is that by articulating the types as a separate structure from the data, and having instances inherit that structure when they are created, users don’t think they are providing metadata — they think they are just providing data.

Because anyone creating a new instance is prompted to fill out the data in a structured way, that it doesn’t seem like an extra task, but rather that the software is being helpful. Any data field can be left blank, but it can also easily be updated by anyone else who cares to do so.

This is the true Web 2.0 way: don’t ask users to provide structure, unless it’s useful to them. But do design your applications in such a way that structure is generated without extra effort on the user’s part. And mine structure that already exists, even if it’s messy and inefficient.”
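The idea in this quote, that a type is defined separately from the data and that each new instance inherits its structure, can be illustrated with a minimal sketch. It is not how Freebase is implemented; the `TypeSchema` class and the type and field names are hypothetical.

```python
class TypeSchema:
    """A type defined separately from the data: just a name and its fields."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = tuple(fields)

    def new_instance(self, **values):
        """Create an instance that inherits the type's fields.

        Fields the user does not fill in stay blank (None).
        """
        unknown = set(values) - set(self.fields)
        if unknown:
            raise KeyError(f"not defined by type {self.name!r}: {sorted(unknown)}")
        return {f: values.get(f) for f in self.fields}


def blank_fields(instance):
    """List the fields that somebody could still fill in later."""
    return [f for f, v in instance.items() if v is None]


# Hypothetical type and field names, for illustration only.
film = TypeSchema("Film", ["title", "release_date", "director"])

# The first user only knows the title; the structure comes from the type.
entry = film.new_instance(title="Metropolis")

# Another user fills in one of the blanks later.
entry["release_date"] = "1927"
```

The user who creates `entry` only supplies data, yet the result is structured, because the field layout comes from the type rather than from the user.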

Registered users can enter topics and link one or more types to each topic, which creates relations. Anyone can create new types; at first these belong to the creator’s personal domain. If a type turns out to be of general interest, it can be promoted to public (common) status.
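The life cycle of a user-defined type (created in a personal domain, later promoted to common use) might be modelled roughly as follows. Who decides on a promotion and what happens to the personal copy are not stated in the sources quoted here, so the `TypeRegistry` class and its behaviour are assumptions for illustration.

```python
class TypeRegistry:
    """Toy registry: types start in a personal domain and may become common."""

    def __init__(self):
        self.personal = {}   # user name -> set of type names
        self.common = set()  # type names visible to everybody

    def create_type(self, user, type_name):
        """A new type first belongs to its creator's personal domain."""
        self.personal.setdefault(user, set()).add(type_name)

    def promote(self, user, type_name):
        """Move a type from a personal domain to the common set.

        Assumption: the type leaves the personal domain when promoted.
        """
        owned = self.personal.get(user, set())
        if type_name not in owned:
            raise KeyError(f"{user!r} has no personal type {type_name!r}")
        owned.remove(type_name)
        self.common.add(type_name)

    def visible_to(self, user):
        """Types a user can link to topics: common ones plus their own."""
        return self.common | self.personal.get(user, set())


registry = TypeRegistry()
registry.create_type("alice", "School Newspaper")
registry.create_type("alice", "My Draft Type")
registry.promote("alice", "School Newspaper")
```

After the promotion, a second user such as `"bob"` sees `School Newspaper` but not Alice's draft type.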

Fourth: The company’s profile in CrunchBase

“Freebase is a massive, collaboratively edited database of cross-linked data. The idea behind the product is to create a Wikipedia like system for building the semantic web. Freebase allows anyone to contribute, structure, search, copy and use data. It sounds like Wikipedia, but instead of arranging by articles, it is more of an almanac, organized like a database, and readable by people and software.

Freebase was founded in San Francisco in 2005 by Danny Hillis who previously co-founded both Applied Minds and Thinking Machines Corporation. In March 2006, Freebase received $15 million in funding from investors including Benchmark Capital, Millennium Technology Ventures and Omidyar Network.

Freebase is a large public database that collects three kinds of information: data, which includes fine-grained information like the release date of a movie; texts, which include documents such as the topic descriptions; and media, which includes files like images, movies and audio.

The key feature that distinguishes Freebase from database competitors like Google Base and Oracle is that their data is highly interconnected. For instance, if you add a Freebase page for yourself and enter that you went to a certain college as a student then you will show up under the student section of your college’s page. And, if your college doesn’t already have a page then one will be automatically created.

Google Base consists of many separate data sets that are stored in an organized way. This makes it easy for duplicate records of the same data to be uploaded that might conflict with each other. Freebase takes a different approach of reconciling conflicting data and ensuring that each object only exists once in the database. This means that you won’t find multiple pages for your favorite band.

Freebase organized database might be different than Wikipedia’s arranged articles, but Freebase still uses Wikipedia as one of its biggest contributors. Freebase also uses contributors Musicbrainz, with over 4 million songs submissions, and Chemoz, with over 100,000 restaurant review submissions.”
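Two points from this profile lend themselves to a small sketch: a link entered once shows up on both pages (the student and the college), and each object exists only once in the database. The code below is a toy model under stated assumptions, not Freebase's reconciliation logic; the `TopicStore` class and the name-normalisation rule are hypothetical.

```python
class TopicStore:
    """Toy store in which every topic exists exactly once."""

    def __init__(self):
        self._topics = {}

    @staticmethod
    def _key(name):
        # Naive reconciliation (an assumption): ignore case and extra spaces.
        return " ".join(name.lower().split())

    def get_or_create(self, name):
        """Return the existing topic for this name, or create it once."""
        key = self._key(name)
        if key not in self._topics:
            self._topics[key] = {
                "name": name.strip(),
                "attended": set(),
                "students": set(),
            }
        return self._topics[key]

    def add_student(self, person_name, college_name):
        """Record attendance once; it becomes visible from both topics."""
        person = self.get_or_create(person_name)
        # The college topic is created automatically if it is missing.
        college = self.get_or_create(college_name)
        person["attended"].add(college["name"])
        college["students"].add(person["name"])

    def __len__(self):
        return len(self._topics)


store = TopicStore()
store.add_student("Jane Doe", "Example College")
# Same college, entered with a different spelling: no duplicate is created.
store.add_student("John Roe", "example  college")
```

Here the store ends up with three topics (two people and one college), and the college's `students` set lists both people even though nobody edited the college entry directly.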

Fifth: Have a look.

I have 15 invitations to distribute. With an invitation, interested readers get access to Freebase as “authors” and can play around. Send a mail to blogstats at gmx dot net.
