Digging for Data

When searching for data and statistics, usually the best approach is to first consider who would be interested in the information. If you can identify the organization or group that needs the information in order to operate, or that is mandated to collect and disseminate the data, you are halfway to finding the data, or at least to assessing whether it exists.

But what if the information is obscure or the source nebulous? Until recently, conducting this kind of research on the web was difficult, if not impossible. Advanced Google syntaxes are useful, as is adding the word “database” to your search terms, but these methods go only so far because Google doesn’t index the deep web, where such information usually resides. A few search engines, however, have recently made this kind of research much easier.

Zanran is one such search engine. The clever idea behind it is that images often contain numerical data. The search engine finds these images and indexes the surrounding text. It currently extracts tables and images from HTML, PDF, and Excel files and promises to add PowerPoint and Word documents in the near future. It’s a good resource for finding obscure statistics, or at least identifying possible sources by finding related information.

Quandl, another search engine for data, is impressive in its scope, transparency, and ability to download datasets in a number of formats. It has so far indexed 8 million time-series datasets from 400 quality sources. Scroll down to the bottom of the page of results to see information about the frequency of the data, the date the search engine retrieved the data, a link to the original source, and other relevant information.

DataMarket is a portal to free and proprietary datasets. It is aimed at the enterprise market, but searching and creating charts and visualizations of the public data are free. A list of its data providers is available on the DataMarket site.

Finally, the University of Auckland Library’s OFFSTATS is worth bookmarking. It is not a search engine, but a directory of official statistical sources on the web, organized by country, region, subject, or a combination of categories. It is a handy resource to consult for locating official sources.

These resources certainly make researching data and statistics easier and more fun. Know of other good statistics search engines or meta-sources? Please share them in the comments!

Photo source: Iman Mosaad, Flickr

The Ethics of Social Media Cyber-Sleuthing

Without a doubt, social media and social networking sites like Facebook, Twitter, LinkedIn, and countless others have become indispensable tools in conducting background investigations, due diligence, employment pre-screening, and other types of investigations. Pursuit Magazine recently ran a good two-part series that not only pointed to some lesser-known social media sites but also discussed the importance of adequately capturing and presenting the information found on them.

The articles also highlighted some ethical and legal issues around gathering such information, advising, for example, against using shady techniques like pretexting and password cracking to gain access to protected material. Additionally, in Canada, a number of laws – notably human rights and privacy laws – govern the types of information that may be gathered on social media and elsewhere, the methods used for gathering the information, and the decisions made based on the information.

To stay on the side of the law, it is crucial for organizations and investigators to exercise caution when researching, collecting, and disclosing personal information about individuals. The Information and Privacy Commissioner of British Columbia has released some guidelines for social media background checks (PDF), identifying some pitfalls and issues to keep in mind:

  • Accuracy of information (Is it the right profile? Was the profile created by the individual himself or herself? Is the information current?)
  • Collecting irrelevant or too much information
  • Over-reliance on consent

Exercising good judgment when trawling social media sites isn’t just a matter of law and ethics; it can also save the organization from embarrassment, a lesson that the Toronto Star learned the hard way when it published false allegations against an Ontario MPP based on an old Facebook photo. The newspaper issued a rare front-page apology, citing an “egregious lapse” of standards.

Photo source: Jason Howie, Flickr

Roundup of Subject Guides and Directories

Gwen Harris’s post about the WWW Virtual Library (a directory of recommended web resources in various subject areas, started back in the day by Tim Berners-Lee) inspired me to do a quick roundup of a few of the most useful and well-maintained subject guides and directories I’m aware of.

In the early days of the World Wide Web, when the number of websites was small, directories were common and extremely useful for locating websites in an organized way. Perhaps the best-known one was Yahoo!, which was a hierarchical directory before it was a search engine (it still maintains a directory). Today, with billions of websites online, directories and subject guides are arguably even more important for directing us to vetted, high-quality sources of information and saving us from flailing around on search engines. As Gwen notes, however, subject guides and directories are a dying breed because of the amount of work their upkeep requires.

Some of the guides that are updated regularly include:

  • The Virtual Private Library: A massive (almost overwhelming!) list of resources, branded as Subject Tracers, on a number of research topics. If you’re looking for comprehensiveness rather than curation, these lists are chock-full of links in various subject areas.
  • Toddington’s Free Online Open Source Intelligence (OSINT) Resources: A companion to Toddington’s paid knowledge base, this page lists links to useful resources in a number of categories for online researchers and investigative professionals.
  • University library sites: University library sites provide wonderful guides and pathfinders to reliable research resources. While the material tends to be academic and scholarly (obviously) and is often limited to the library system’s holdings, these guides can provide direction in an unfamiliar subject area, and with a little resourcefulness, one can often access the material in other collections. The University of Toronto Libraries Research Guides and the Harvard Library Research Guides are two good ones, or search for “LibGuides” and your subject area of interest to find others.

What are some of your favourite subject guides and directories?

The Periodic Table of Business Research Databases

I just came across a terrifically handy tool from Alacra called the Periodic Table of Business Research Databases for identifying the right database for business-related research. There is a vast number of databases on the market, each with its own focus and depth of content. This tool provides a quick-and-dirty overview of most of these information sources, grouping them by category of research (company profiles, credit and investment research, market research, news, etc.).

Unfortunately, it is missing some key Canadian databases, such as Infomart and Newscan, but I’ll be bookmarking it.