Digging for Data

When searching for data and statistics, usually the best approach is to first consider who would be interested in the information. If you can identify the organization or group that has a need to know the information in order to operate, or is mandated to collect and disseminate the data, you are halfway to finding the data, or at least to assessing whether the data exists.

But what if the information is obscure or the source nebulous? Until recently, conducting this kind of research on the web was difficult, if not impossible. Advanced Google syntaxes are useful, as is adding the word “database” to your search terms, but these methods go only so far since Google doesn’t index the deep web, where such information usually exists. A few search engines, however, have recently made this kind of research much easier.

Zanran is one such search engine. The clever idea behind it is that images often contain numerical data. The search engine finds these images and indexes the surrounding text. It currently extracts tables and images from HTML, PDF, and Excel files and promises to add PowerPoint and Word documents in the near future. It’s a good resource for finding obscure statistics, or at least identifying possible sources by finding related information.

Quandl, another search engine for data, is impressive in its scope and transparency, and it lets you download datasets in a number of formats. It has so far indexed 8 million time-series datasets from 400 quality sources. Scroll down to the bottom of the page of results to see information about the frequency of the data, the date the search engine retrieved the data, a link to the original source, and other relevant information.

DataMarket is a portal to free and proprietary datasets. It is aimed at the enterprise market, but it is free to search and create charts and visualizations of the public data. Find the list of data providers here.

Finally, the University of Auckland Library’s OFFSTATS is worth bookmarking. It is not a search engine, but a directory of official statistical sources on the web, organized by country, region, subject, or a combination of categories. It is a handy resource to consult for locating official sources.

These resources certainly make researching data and statistics easier and more fun. Know of other good statistics search engines or meta-sources? Please share them in the comments!

Are You a Skilled Googler?

Most of us think we’re great Googlers. And it’s a testament to Google’s strength as a mostly reliable search engine that we do usually find what we’re looking for with a few simple keywords. But beyond the quick factual search, things can get tricky, and as a number of studies have shown, most of us miss good information on the open web due to our limited search skills. It’s also worth noting that less than 10% of online information is actually available on the open web via search engines; the rest resides on the deep or invisible web.

There are a number of ways to improve your search skills. While Google appears simple and intuitive on the surface, its power can best be harnessed with some training, and Google provides a number of online training guides to help improve the search skills of its users. Two self-paced courses have been developed for power searching and advanced power searching, and this course, geared to students and their teachers, provides lesson plans and trivia challenges. Also available are webinars that guide the user through a variety of tools and techniques to find higher quality sources more easily.

But no matter how advanced a Googler you become, you’ll be missing a lot of good information if you rely solely on Google. Other search engines such as Bing and DuckDuckGo index the web differently and have different ways of prioritizing results. (See this slide deck from Karen Blakeman of RBA Information Services for some alternatives to Google.) And as mentioned before, only a small fraction of online information is indexed through search engines; countless specialized databases and indexes provide high-quality material that won’t appear in search engine results.

By the way, Google has come up with a fun way to put your Google search skills to the test. A Google a Day is a daily puzzle that can be solved by using clever search skills on Google.

New Resource for Finding Theses and Dissertations

Open Access Theses and Dissertations is a new search engine for locating open access graduate theses and dissertations published around the world. It’s important to note that the search engine does not search the full text of the theses, but rather the metadata drawn from the records of university repositories, consortia, or OCLC WorldCat. Still, it is a good supplement to other resources such as the Networked Digital Library of Theses and Dissertations, ProQuest Theses and Dissertations, and the Theses Canada portal.

Theses are great sources of deep analysis and reliable statistics. They are a secret weapon in my research toolbox.