World Bank public data, now in Google search

11/11/2009 11:00:00 AM
When we first launched public data on Google.com, we wanted to make statistics easier to find and to encourage debate based on facts rather than intuition. The day after we launched, a friend who worked at the World Bank called me, her voice filled with enthusiasm, “Did you know that the World Bank also just released an API for their data?” Excited, I checked it out, and found an amazing treasure trove of statistics for most economies in the world. After some hard work and analysis, today we’re happy to announce that 17 World Development Indicators (list below*) are now conveniently available to you in Google search.

With today’s update, you can quickly access more data with a broad range of queries. Search should be intuitive, so we’ve done the work to think through queries where public data will be most relevant to you. To see the new data, try queries like [gdp of indonesia], [life expectancy brazil], [rwanda’s population growth], [energy use of iceland], [co2 emissions of iceland] and [gdp growth rate argentina]. For example, if you search for [internet users in the united states], you will see the following chart at the top of the results page:

[Chart: internet users in the United States, shown at the top of the Google results page]
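The World Bank API mentioned above can also be queried directly. Below is a minimal sketch of what such a call might look like in Python; the v2 endpoint pattern, the helper name fetch_indicator and the indicator code SP.DYN.LE00.IN (life expectancy at birth) are assumptions of mine, so check the World Bank’s API documentation before relying on them.

```python
# A minimal sketch of querying the World Bank API directly (not Google's
# implementation). The v2 endpoint pattern and the indicator code
# SP.DYN.LE00.IN (life expectancy at birth) are assumptions; consult the
# World Bank API documentation for the authoritative details.
import json
import urllib.request

def fetch_indicator(country="BR", indicator="SP.DYN.LE00.IN", start=2000, end=2009):
    url = (f"https://api.worldbank.org/v2/country/{country}/indicator/{indicator}"
           f"?date={start}:{end}&format=json")
    with urllib.request.urlopen(url) as resp:
        _, rows = json.load(resp)          # element 0 of the response is paging metadata
    # Keep only (year, value) pairs that actually have a value
    return {row["date"]: row["value"] for row in rows if row["value"] is not None}

if __name__ == "__main__":
    for year, value in sorted(fetch_indicator().items()):
        print(year, round(value, 1))
```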

 

Small steps

There are specialized search engines like sig.ma that use semantic technology (though only in niches), and the big search engines are slowly starting to take such an approach. Yahoo has been doing it for some time already, and Google is now going for RDFa and microformats.

Google looks for markup formats (microformats and RDFa) and displays reviews, ratings, and information about people in Rich Snippets. New types of data beyond reviews and people will follow.

Google’s Webmaster Blog describes this:

‘Imagine that you have a review of a restaurant on your page. In your HTML, you show the name of the restaurant, the address and phone number, the number of users who have provided reviews, and the average rating. People can read and understand this information, but to a computer it is nothing but strings of unstructured text. With microformats or RDFa, you can label each piece of text to make it clear that it represents a certain type of data: for example, a restaurant name, an address, or a rating. This is done by providing additional HTML tags that computers understand. These don’t affect the appearance of your pages, but Google and any other services that look at the HTML can use the tags to better understand your information, and display it in useful ways—for example, in search results.’
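As a purely illustrative sketch of what the quoted idea means in practice (this is not Google’s Rich Snippets parser; the sample markup and class names below are my own, loosely following the hReview microformat convention), here is how a program could read such labelled text:

```python
# A minimal sketch, not Google's parser: pull a few microformat-style
# fields out of marked-up HTML using only the standard library.
# The sample markup and class names are illustrative, loosely following
# the hReview convention.
from html.parser import HTMLParser

SAMPLE = """
<div class="hreview">
  <span class="item fn">L'Amourita Pizza</span>
  Rating: <span class="rating">4.5</span> out of 5
  (<span class="count">35</span> reviews)
</div>
"""

class MicroformatParser(HTMLParser):
    """Collects the text of elements whose class names mark fields we care about."""
    FIELDS = {"fn": "name", "rating": "rating", "count": "review_count"}

    def __init__(self):
        super().__init__()
        self.data = {}
        self._current = None              # field we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        for cls in dict(attrs).get("class", "").split():
            if cls in self.FIELDS:
                self._current = self.FIELDS[cls]

    def handle_data(self, data):
        if self._current and data.strip():
            self.data[self._current] = data.strip()
            self._current = None

parser = MicroformatParser()
parser.feed(SAMPLE)
print(parser.data)   # {'name': "L'Amourita Pizza", 'rating': '4.5', 'review_count': '35'}
```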

“Data is the New Hot, Drop-dead Gorgeous Field”

Data is the New Hot, Drop-dead Gorgeous Field (From FlowingData and New York Times)

Posted by Nathan / Aug 7, 2009 to Statistics

We all know this already, but it’s nice to get some backing from The New York Times every now and then. In this NYT article, which I’m sure has spread to every statistician’s email inbox by now, Steve Lohr describes the dead sexy that is statistics:

The rising stature of statisticians, who can earn $125,000 at top companies in their first year after getting a doctorate, is a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore — sensor signals, surveillance tapes, social network chatter, public records and more. And the digital data surge only promises to accelerate, rising fivefold by 2012, according to a projection by IDC, a research firm.

Read more….

Timetric Makes Web Data Useful with Time Series Analysis

Timetric Makes Web Data Useful with Time Series Analysis (from ReadWriteSTART)

Written by Jolie O’Dell / August 5, 2009 8:40 PM


A winner at this year’s mini-Seedcamp in London, Timetric is an app from Inkling Software, a three-principal shop composed of chemistry and physics PhDs.

The premise is fairly simple: Timetric was created to store, share, and analyze data over time. For predicting trends, proving assertions, or recommending actions, time series analysis is a highly valuable tool. It’s Facebook’s Lexicon all grown up and actually useful, pulling data from all over the web and querying this huge database to serve significant results.
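As a toy illustration of the kind of question a time-series service answers (this is not Timetric’s code; the function and the series below are my own inventions), fitting a simple trend line already lets you extrapolate a step ahead:

```python
# A toy sketch, not Timetric's code: fit a least-squares trend line to a
# short, invented time series and extrapolate one step ahead.
def linear_trend(values):
    """Return slope and intercept of the least-squares line y = slope*x + intercept."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

series = [102, 105, 103, 110, 114, 118]          # invented monthly figures
slope, intercept = linear_trend(series)
print(f"trend: {slope:+.2f} per period, next value ~ {slope * len(series) + intercept:.1f}")
```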

[Screenshot: timetric.com]

Spezify: Search Engine for Visualized Statistics?

Stockholm-based startup Spezify is a visual search engine that impresses with relevant results displayed in a visual but still functional way.

Founded by Felix af Ekenstam and Per Persson, digital creatives who have over 10 years of experience in the space, Spezify arrived in beta in April and launched officially about six weeks ago. Results are culled from a number of search APIs and include social and multimedia content presented as a mosaic of the “big picture” for any search terms. More….

[Screenshot: spezify.com]

http://spezify.com/#/_22_official%20statistics_22_ (or try it with the name of your statistical agency).

Yebol: Semantic Search Engine for Statistics?

Try it with, for example, “official statistics”!

[Screenshot: yebol.com]

There is more structure in Yebol’s results than in Google’s. Only “statistics”, or combinations with terms like “global” and “international”, gives interesting variations in the patterns of results.

Yebol offers categorised search results for about 10 million search terms. Within half a year “every possible search term” is said to be included. See presentation video.

Yebol’s mission is to build a human-like knowledge base of the world and to provide knowledge-based (semantic) search and services.

Yebol utilizes a combination of patented algorithms paired with human knowledge to build a Web directory for each query and each user. Instead of the common “listing” of Web search queries, Yebol automatically clusters and categorizes search terms, Web sites, pages and content.

Perhaps they will need some assistance to classify all the relevant search terms related to statistics from the international statistical community? Contact for partnership: partners@yebol.com

To compute whatever can be computed about the world: Wolfram Alpha

Wolfram Alpha (WA) is a new search engine launching in May 2009 that could be important for statistics.

Not really a search engine

But, in fact: ‘Wolfram Alpha isn’t really a search engine, because we compute the answers, and we discover new truths. If anything, you might call it a platonic search engine, unearthing eternal truths that may never have been written down before.’

‘Despite his disclaimer, Wolfram Alpha looks like a search engine, in that there’s a one-line box where you type in a question. The output appears a second or two later, as a page of text and graphics below the box. What’s happening behind the scenes? Rather than looking up the answer to your question, Wolfram Alpha figures out what your question means, looks up the necessary data to answer your question, computes an answer, designs a page to present the answer in a pleasing way, and sends the page back to your computer.’

And Wolfram Alpha is about how we might build the edifice of human knowledge from simple primitive computational rules: ‘In a general way, the NKS notion that everything is computable gave me the confidence to go ahead with Wolfram Alpha at all. It’s because of NKS that I’m willing to believe that we can find a computational model for every branch of science.’ (A New Kind of Science, NKS, is a book by Stephen Wolfram; full text here: http://www.wolframscience.com/nksonline/toc.html)

These are the words of Stephen Wolfram. He has been talking with Rudy Rucker; the podcast is published in h+ Magazine.

Rudy Rucker, who himself wrote a book in the spirit of NKS, adds some more remarks in h+ Magazine:

….  ‘As Wolfram Alpha comes into widespread use, Stephen believes “It will raise the level of scientific things that the average person can do. People will find that the world is more predictable than they might have expected. Just as running Google is like having a reference librarian to help you, running Wolfram Alpha will be like having a house scientist to consult for you.” ‘
‘I wondered how Wolfram Alpha compares to the so-called Semantic Web – an intelligent web project that’s been kicking around for several years now. “The problem with the Semantic Web is that the people who post the content are expected to apply the tags,” remarks Wolfram. “And the tagging system involves a complicated categorization of all the things that might exist – what philosophers call an ontology. Like any comprehensive world-system, the Semantic Web ontology is subject to endless revision, with many gray areas. For instance if there’s a cell phone antenna on a bridge girder, is the structure a bridge or a cell phone tower? It’s proved easier for us to hand-curate the existing data that we find in books and on the web. This is feasible because a lot of the data we’re interested in is purely scientific – things like the chemical formula of some compound. As this kind of data isn’t being constantly revised, it’s possible to stay ahead of the curve.”

Some pictures of WA

The namics blog shows some pictures of WA searches (comment in German)

Query „life expectancy age 45 ireland“

WA search result

Stephen Wolfram talks at Harvard Law School, 28th of April 2009

‘WA is a tremendously ambitious project. … Its goal is to find a way to make the systematic knowledge that we’ve accumulated in our civilization computable, to find a way to take sort of all the data out there … combine them … and make them computable …’

Stephen Wolfram at Harvard Law School April 2009

Some facts Stephen Wolfram was searching for in the talk at Harvard Law School:
GDP of France compared to Italy, internet users in Europe, data about a certain location like Lexington, Massachusetts, materials like gold, medicine, stock performance, GDP versus railway length, the president of Brazil in 1920, the tide in New York at a particular time.

The ambition is to reach expert-level knowledge with WA, making expert-level knowledge accessible to everybody.

The 4 elements of WA are:

  1. Data curation: get data from everywhere, clean it up, and make it computable in an automatic and partially human process. WA is keen to work with people from different areas to get data. The metadata, i.e. the ontology, will be published, perhaps in RDF.
  2. Implementation of actual algorithms, methods and models, putting these into Mathematica, the software under the hood. WA is based on Mathematica, an earlier and very successful software product from Stephen Wolfram.
  3. Linguistic analysis to understand the questions. WA doesn’t use natural language processing, but the opposite: it maps the questions onto the symbolic representations of the data in WA (see the sketch after this list).
  4. Automated presentation of the results, e.g. graphical data.
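To make element 3 a little more concrete, here is a purely illustrative Python sketch (in no way WA’s actual implementation; the patterns and figures are invented): a fixed question pattern is mapped onto a symbolic lookup over a tiny “curated” data set, the answer is computed, and then formatted for display.

```python
# A purely illustrative sketch of the pipeline above, not WA's implementation.
# Element 1, caricatured: a tiny "curated" data set (invented figures).
# Element 3: recognise a fixed question pattern and map it to a symbolic form.
# Elements 2 and 4: compute the answer and present it.
import re

POPULATION = {"france": 64.4e6, "italy": 60.0e6}   # invented placeholder figures

PATTERNS = [
    (re.compile(r"population of (\w+)"),
     lambda m: ("population", m.group(1).lower())),
]

def answer(question):
    for pattern, to_symbolic in PATTERNS:
        match = pattern.search(question.lower())
        if match:
            indicator, country = to_symbolic(match)      # symbolic representation
            value = POPULATION.get(country)              # "compute" the answer
            if value is not None:
                return f"{indicator} of {country}: {value:,.0f}"   # present it
    return "I don't understand the question."

print(answer("Population of France"))   # population of france: 64,400,000
```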

See the video, or just listen to the podcast.

Some links

“Wolfram Alpha is coming” by Stephen Wolfram on http://www.wolframalpha.com/

And?

It would be very interesting to see which statistical data this machine will compute and whether the sources will, as a matter of quality, be clearly stated.
The possibility to upload data and have them processed (via an API) is planned and will be usable by everyone.

Google Begins to Make Public Data Searchable (ReadWriteWeb)

Google Begins to Make Public Data Searchable

Written by Marshall Kirkpatrick / April 28, 2009 1:28 PM

Google just announced its first foray into making public data searchable and viewable in graph form. The company is starting with population and unemployment data from around the US but promises to make far more data sets searchable in the future. The potential significance of making aggregate data about our world easy to visualize, cross reference and compare can’t be overstated.

Most of us understand the world based on stories we’ve put together from our own lived experience. Another way to understand things is by finding patterns drawn from everyone’s experience in aggregate. Journalists often find big patterns and then zoom in to particular life stories that exemplify those general trends but make them easier for us to relate to as individuals. Those stories then help move public opinion in favor of policies that aim to change the general trends.

That’s just one way that easily searchable public data can be very, very important. These first data sets come from the U.S. Bureau of Labor Statistics and the U.S. Census Bureau’s Population Division, but as Google explains in its announcement there are far more sources of information that could be included. Those two government agencies alone have a lot more to offer as well.

Continued….