More information about the conference and all presentations: “ICT is the lifeblood of the knowledge society” – Swedish Presidency of the European Union
Category: 06 Search
World Bank public data, now in Google search
Small steps
There are specialized search engines such as sig.ma that use semantic technology (but work in niches), and there are the big search engines, which are slowly starting to adopt such an approach. Yahoo has been doing so for some time, and Google now supports RDFa and microformats.
Google looks for markup formats (microformats and RDFa) and displays reviews, ratings and information about people in Rich Snippets. New types of data beyond reviews and people will follow.
Google’s Webmaster Blog describes this:
‘Imagine that you have a review of a restaurant on your page. In your HTML, you show the name of the restaurant, the address and phone number, the number of users who have provided reviews, and the average rating. People can read and understand this information, but to a computer it is nothing but strings of unstructured text. With microformats or RDFa, you can label each piece of text to make it clear that it represents a certain type of data: for example, a restaurant name, an address, or a rating. This is done by providing additional HTML tags that computers understand. These don’t affect the appearance of your pages, but Google and any other services that look at the HTML can use the tags to better understand your information, and display it in useful ways—for example, in search results.’
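To make the labelling idea in the quotation concrete, here is a minimal Python sketch of how a program could read labelled values out of a page. The restaurant details and the `v:`-prefixed property names are illustrative assumptions, not copied from Google's documentation, and the parser only handles flat, non-nested labels.

```python
from html.parser import HTMLParser

# Hypothetical review snippet; names and values are invented for illustration.
SNIPPET = """
<div typeof="v:Review-aggregate">
  <span property="v:itemreviewed">Example Pizzeria</span>
  <span property="v:address">12 Sample Street</span>
  <span property="v:rating">4.5</span>
  <span property="v:count">27</span>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collects the text of every element that carries a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # label of the element we are currently inside

    def handle_starttag(self, tag, attrs):
        # Remember the label (if any) of the element that just opened.
        self._current = dict(attrs).get("property")

    def handle_data(self, data):
        # Store non-empty text under the label of the enclosing element.
        if self._current and data.strip():
            self.properties[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def extract_properties(html):
    parser = PropertyExtractor()
    parser.feed(html)
    return parser.properties


labels = extract_properties(SNIPPET)
print(labels)
```

To a human reader the page looks the same with or without the attributes; the extra labels only matter to software such as the extractor above.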
“Data is the New Hot, Drop-dead Gorgeous Field”
Data is the New Hot, Drop-dead Gorgeous Field (From FlowingData and New York Times)
We all know this already, but it’s nice to get some backing from The New York Times every now and then. In this NYT article, that I’m sure has spread to every statistician’s email inbox by now, Steve Lohr describes the dead sexy that is statistics:
The rising stature of statisticians, who can earn $125,000 at top companies in their first year after getting a doctorate, is a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore — sensor signals, surveillance tapes, social network chatter, public records and more. And the digital data surge only promises to accelerate, rising fivefold by 2012, according to a projection by IDC, a research firm.
Timetric Makes Web Data Useful with Time Series Analysis
Timetric Makes Web Data Useful with Time Series Analysis (from ReadWriteSTART)
A winner at this year’s mini-Seedcamp in London, Timetric is an app from Inkling Software, a three-principal shop composed of chemistry and physics PhDs.
The premise is fairly simple: Timetric was created to store, share, and analyze data over time. For predicting trends, proving assertions, or recommending actions, time series analysis is a highly valuable tool. It’s Facebook’s Lexicon all grown up and actually useful, pulling data from all over the web and querying this huge database to serve significant results.
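As a rough illustration of what time series analysis for spotting and extending a trend can look like in its simplest form, here is a small Python sketch. It is not Timetric's method (the article does not describe it); the function names and the sample series are assumptions made for this example.

```python
def moving_average(values, window):
    """Smooth a series by averaging each run of `window` consecutive points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def linear_trend(values):
    """Least-squares line through the points (0, v0), (1, v1), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def forecast(values, steps_ahead):
    """Extend the fitted line `steps_ahead` periods past the last observation."""
    slope, intercept = linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


series = [10, 12, 14, 16, 18]  # invented sample observations
print(moving_average(series, 2))
print(linear_trend(series))
print(forecast(series, 1))
```

A straight-line fit is only sensible for series without strong seasonality; it is used here because it is the shortest way to show the idea of predicting from past values.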
Spezify: Search Engine for Visualized Statistics?
Stockholm-based startup Spezify is a visual search engine that impresses with relevant results displayed in a visual but still functional way.
Founded by Felix af Ekenstam and Per Persson, digital creatives who have over 10 years of experience in the space, Spezify arrived in beta in April and launched officially about six weeks ago. Results are culled from a number of search APIs and include social and multimedia content presented as a mosaic of the “big picture” for any search terms.
http://spezify.com/#/_22_official%20statistics_22_ or try with the name of your statistical agency.
Yebol: Semantic Search Engine for Statistics?
Try it with, for example, “official statistics”!
Yebol’s results are more structured than Google’s. Searching for “statistics” alone, or in combination with terms such as “global” and “international”, gives interesting variations in the pattern of results.
Yebol offers categorised search results for about 10 million search terms. Within half a year “every possible search term” is said to be included. See presentation video.
Yebol’s mission is to build human-like world’s knowledge base and provide knowledge based search (semantics) and services.
Yebol utilizes a combination of patented algorithms paired with human knowledge to build a Web directory for each query and each user. Instead of the common “listing” of Web search queries, Yebol automatically clusters and categorizes search terms, Web sites, pages and contents.
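The general idea of presenting categorised groups instead of a flat list can be sketched in a few lines of Python. This is a toy keyword matcher, not Yebol's patented algorithm; the categories, keywords and result titles are all invented for the example.

```python
from collections import defaultdict

# Hypothetical, hand-made categories; a real system would learn or curate these.
CATEGORY_KEYWORDS = {
    "agencies": {"office", "bureau", "agency"},
    "data": {"data", "database", "tables"},
    "methods": {"methodology", "sampling", "survey"},
}


def categorize(titles, category_keywords=CATEGORY_KEYWORDS):
    """Group result titles under every category whose keywords they contain.

    Titles that match no category are collected under "other".
    """
    groups = defaultdict(list)
    for title in titles:
        words = set(title.lower().split())
        matched = [
            category
            for category, keywords in category_keywords.items()
            if words & keywords
        ]
        for category in matched or ["other"]:
            groups[category].append(title)
    return dict(groups)


results = [
    "Federal Statistical Office",
    "Census data tables",
    "Survey sampling methodology",
    "History of statistics",
]
for category, titles in categorize(results).items():
    print(category, titles)
```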
Perhaps they will need some assistance from the international statistical community to classify all the relevant search terms related to statistics? Contact for partnership: partners@yebol.com
Real-time Search
In the age of microblogging, search engines have to be very, very quick at indexing the continuous stream of information. The search engine Collecta indexes blogs, blog comments, microblogs such as Twitter and Jaiku, and also photos on Flickr.
To find out more about search engines beyond Google, see Ryan Singel’s article in WIRED.
Wolfram Alpha is online
Some hours ago the much-discussed new search engine Wolfram Alpha (see blogstats post) went online.
It gives results about statistics (only the first page is shown below)
And it gives ample information about 2 slices of Swiss cheese (plain, low fat etc.) 😉
It gives ample source information:
And it gives this, too:
To compute whatever can be computed about the world: Wolfram Alpha
Wolfram Alpha (WA) is a new search engine, launching in May 2009, that could be important for statistics.
Not really a search engine
But, in fact: ‘Wolfram Alpha isn’t really a search engine, because we compute the answers, and we discover new truths. If anything, you might call it a platonic search engine, unearthing eternal truths that may never have been written down before.’
‘Despite his disclaimer, Wolfram Alpha looks like a search engine, in that there’s a one-line box where you type in a question. The output appears a second or two later, as a page of text and graphics below the box. What’s happening behind the scenes? Rather than looking up the answer to your question, Wolfram Alpha figures out what your question means, looks up the necessary data to answer your question, computes an answer, designs a page to present the answer in a pleasing way, and sends the page back to your computer.’
And: ‘Wolfram Alpha is about how we might build the edifice of human knowledge from simple primitive computational rules. In a general way, the NKS notion that everything is computable gave me the confidence to go ahead with Wolfram Alpha at all. It’s because of NKS that I’m willing to believe that we can find a computational model for every branch of science.’
(NKS stands for A New Kind of Science, a book by Stephen Wolfram; full text here: http://www.wolframscience.com/nksonline/toc.html)
These are the words of Stephen Wolfram. He was talking with Rudy Rucker; the podcast is published in h+ Magazine.
Rudy Rucker, who himself wrote a book along the lines of NKS, adds some more remarks in h+ Magazine:
…. ‘As Wolfram Alpha comes into widespread use, Stephen believes “It will raise the level of scientific things that the average person can do. People will find that the world is more predictable than they might have expected. Just as running Google is like having a reference librarian to help you, running Wolfram Alpha will be like having a house scientist to consult for you.” ‘
‘I wondered how Wolfram Alpha compares to the so-called Semantic Web – an intelligent web project that’s been kicking around for several years now. “The problem with the Semantic Web is that the people who post the content are expected to apply the tags,” remarks Wolfram. “And the tagging system involves a complicated categorization of all the things that might exist – what philosophers call an ontology. Like any comprehensive world-system, the Semantic Web ontology is subject to endless revision, with many gray areas. For instance if there’s a cell phone antenna on a bridge girder, is the structure a bridge or a cell phone tower? It’s proved easier for us to hand-curate the existing data that we find in books and on the web. This is feasible because a lot of the data we’re interested in is purely scientific – things like the chemical formula of some compound. As this kind of data isn’t being constantly revised, it’s possible to stay ahead of the curve.”’
Some pictures of WA
The namics blog shows some pictures of WA searches (comment in German)
Query „life expectancy age 45 ireland“

Stephen Wolfram talks at Harvard Law School, 28th of April 2009
WA is a ‘tremendously ambitious project. … Its goal is to find a way to make the systematic knowledge that we’ve accumulated in our civilization computable, to find a way to take a sort of all the data out there …. combine them … and make them computable …’

Some facts Stephen Wolfram searched for in the talk at Harvard Law School:
GDP of France compared to Italy, internet users in Europe, data about a certain location like Lexington, Massachusetts, materials like gold, medicine, stock performance, GDP versus railway length, president of Brazil in 1920, tide in New York at a particular time.
The ambition is to reach expert-level knowledge with WA, making expert-level knowledge accessible to everybody.
The 4 elements of WA are:
- Data curation: get data from everywhere, clean it up, and make it computable in an automatic and partially human process. WA is keen to work with people from different areas to get data. The metadata, the ontology, will be published, perhaps in RDF.
- Implementation of actual algorithms, methods and models, putting these into Mathematica, the software under the hood. WA is based on Mathematica, an earlier and very successful software package from Stephen Wolfram.
- Linguistic analysis to understand the questions. WA doesn’t use natural language processing, but the opposite: it maps the questions to the symbolic representations of the data in WA.
- Automated presentation of things such as graphical data
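The four elements listed above can be pictured as a small pipeline. The Python sketch below is only a toy under stated assumptions: it is not how Wolfram Alpha or Mathematica work internally, the two question patterns are made up, and the "curated" figures belong to fictional countries and are placeholders.

```python
import re

# 1. Curated data. Placeholder figures for fictional entities, invented for illustration.
CURATED = {
    ("gdp", "atlantis"): 300.0,
    ("gdp", "utopia"): 200.0,
}


def parse(question):
    """3. Linguistic step: map a question to a symbolic form (op, measure, entities)."""
    text = question.lower()
    match = re.fullmatch(r"\s*(\w+) of (\w+) compared to (\w+)\s*", text)
    if match:
        measure, first, second = match.groups()
        return ("ratio", measure, (first, second))
    match = re.fullmatch(r"\s*(\w+) of (\w+)\s*", text)
    if match:
        measure, entity = match.groups()
        return ("value", measure, (entity,))
    raise ValueError(f"cannot interpret question: {question!r}")


def compute(symbolic):
    """2. Algorithm step: look up the curated values and compute the answer."""
    operation, measure, entities = symbolic
    values = [CURATED[(measure, entity)] for entity in entities]
    if operation == "ratio":
        return values[0] / values[1]
    return values[0]


def present(question, result):
    """4. Presentation step: format the answer for display."""
    return f"{question.strip()}: {result:.2f}"


def answer(question):
    return present(question, compute(parse(question)))


print(answer("GDP of Atlantis compared to Utopia"))
```

The point of the separation is that the parser never touches the numbers and the computation never touches the wording, which mirrors the split between linguistic analysis and curated data described in the talk.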
Watch the video or just listen to the podcast.
Some links
Wolfram Alpha is coming by Stephen Wolfram on http://www.wolframalpha.com/
Good Morning Silicon Valley, Search party: Public data from Google, public peek from WolframAlpha
NY Times, The Veil is Lifted From Wolfram Alpha
ReadWriteWeb, Wolfram|Alpha: Our First Impressions
TechCrunch, Nova Spivack, Wolfram Alpha Computes Answers To Factual Questions. This Is Going To Be Big.
And?
It would be very interesting to see which statistical data this machine will compute and whether the source will – as a matter of quality – be clearly stated.
An option to upload data and have it processed (via an API) is planned and should be available to everyone.
Google Begins to Make Public Data Searchable (ReadWriteWeb)
Written by Marshall Kirkpatrick / April 28, 2009 1:28 PM
Google just announced its first foray into making public data searchable and viewable in graph form. The company is starting with population and unemployment data from around the US but promises to make far more data sets searchable in the future. The potential significance of making aggregate data about our world easy to visualize, cross reference and compare can’t be overstated.
Most of us understand the world based on stories we’ve put together from our own lived experience. Another way to understand things is by finding patterns drawn from everyone’s experience in aggregate. Journalists often find big patterns and then zoom in to particular life stories that exemplify those general trends but make them easier for us to relate to as individuals. Those stories then help move public opinion in favor of policies that aim to change the general trends.
That’s just one way that easily searchable public data can be very, very important. These first data sets come from the U.S. Bureau of Labor Statistics and the U.S. Census Bureau’s Population Division, but as Google explains in its announcement there are far more sources of information that could be included. Those two government agencies alone have a lot more to offer as well.