Digging for Data

When searching for data and statistics, usually the best approach is to first consider who would be interested in the information. If you can identify the organization or group that has a need to know the information in order to operate, or is mandated to collect and disseminate the data, you are halfway to finding the data, or at least to assessing whether the data exists.

But what if the information is obscure or the source nebulous? Until recently, conducting this kind of research on the web was difficult, if not impossible. Advanced Google syntaxes are useful, as is adding the word “database” to your search terms, but these methods go only so far since Google doesn’t index the deep web, where such information usually exists. A few search engines, however, have recently made this kind of research much easier.

Zanran is one such search engine. The clever idea behind it is that images often contain numerical data. The search engine finds these images and indexes the surrounding text. It currently extracts tables and images from HTML, PDF, and Excel files and promises to add PowerPoint and Word documents in the near future. It’s a good resource for finding obscure statistics, or at least identifying possible sources by finding related information.

Quandl, another search engine for data, is impressive in its scope, its transparency, and the option to download datasets in a number of formats. It has so far indexed 8 million time-series datasets from 400 quality sources. Scroll down to the bottom of the page of results to see information about the frequency of the data, the date the search engine retrieved the data, a link to the original source, and other relevant information.

DataMarket is a portal to free and proprietary datasets. It is aimed at the enterprise market, but it is free to search and create charts and visualizations of the public data. Find the list of data providers here.

Finally, the University of Auckland Library’s OFFSTATS is worth bookmarking. It is not a search engine, but a directory of official statistical sources on the web, organized by country, region, subject, or a combination of categories. It is a handy resource to consult for locating official sources.

These resources certainly make researching data and statistics easier and more fun. Know of other good statistics search engines or meta-sources? Please share them in the comments!

Photo source: Iman Mosaad, Flickr
