Quick Tip Tuesday: International Investigative Research

For international due diligence and investigative research, I’ve found Investigative Dashboard to be a wonderful resource:

Investigative Dashboard (ID) has been developed by the Organized Crime and Corruption Reporting Project (OCCRP), the world’s leading cross-border investigative reporting organization. OCCRP designed Investigative Dashboard as a transnational collaborative effort to help journalists and civil society researchers expose organized crime and corruption around the world. It hosts three core tools: a crowd-sourced database of information and documents on persons of interest and their business connections, a worldwide list of online databases and business registries, and a research desk where journalists can go for help in sourcing hard-to-find information.

The collection of business registries and related databases is terrifically useful.

Newspaper Databases: All That’s Not Fit for Research

They say newspapers are the first draft of history. They capture and disseminate noteworthy events as they unfold, and succeeding generations use them to make sense of a nation’s history and identity. Although this chronicling increasingly happens at dizzying speed on the web, newspapers, especially the paper editions captured in databases, will remain a fundamental resource for scholarly and other types of research for the foreseeable future.

Limitations of Full-Image Databases

There are, however, a number of problems with using newspaper databases for research. In his article “Illusionary Order: Online Databases, Optical Character Recognition, and Canadian History, 1997–2010,” Prof. Ian Milligan identifies one such issue: the shortcomings of optical character recognition (OCR) in databases of scanned microfilm. In a related post, he notes that keyword searches in databases of digitized, full-image newspapers often retrieve only some of the relevant articles, because of the nature of the scanned material, the speed with which these databases were created, and the limitations of the technology. Consequently, research results can be problematic when:

[H]yphenations are not covered (problematic in smaller columns, where Woodwork might be hyphenated as Wood-work across two lines), if microfilm streaks obscure a letter, if it was slightly tilted, or if the OCR just plain misses a character.* 

Prof. Milligan likens using these databases uncritically for historical research to “using a volume of the Canadian Historical Review with 10% or so of the pages ripped out.” While he recognizes that these databases are indispensable tools, he urges researchers to be aware of their limitations and to explain how they dealt with them.
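To see why a single hyphen can hide an article from a keyword search, here is a minimal Python sketch. It is purely illustrative: the sample OCR lines and the function names (`naive_search`, `dehyphenate`, `normalized_search`) are hypothetical, and it does not reflect how any particular newspaper database actually indexes its scans.

```python
import re

# Hypothetical OCR output from a narrow column in which "Woodwork"
# was hyphenated across two lines, as in Milligan's example.
ocr_lines = [
    "Fine Wood-",
    "work for sale at the",
    "shop on King Street.",
]


def naive_search(lines: list[str], term: str) -> bool:
    """Match the term line by line, as a simple keyword index might."""
    term = term.lower()
    return any(term in line.lower() for line in lines)


def dehyphenate(lines: list[str]) -> str:
    """Join lines, rejoining words split by a hyphen at a line end.

    Assumption: a hyphen followed by a line break always marks a split
    word. This wrongly merges genuine hyphenated compounds that happen
    to break at the hyphen.
    """
    text = "\n".join(lines)
    text = re.sub(r"-\n", "", text)  # rejoin words split across lines
    return text.replace("\n", " ")   # remaining line breaks become spaces


def normalized_search(lines: list[str], term: str) -> bool:
    """Search the dehyphenated text instead of the raw OCR lines."""
    return term.lower() in dehyphenate(lines).lower()


print(naive_search(ocr_lines, "woodwork"))       # False
print(normalized_search(ocr_lines, "woodwork"))  # True
```

Rejoining split words only addresses the hyphenation problem. It does nothing for microfilm streaks, tilted scans, or misread characters, which is why Milligan’s caution still applies.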

Not Just a Database Issue

As a former researcher at Canada’s “newspaper of record,” I have additional concerns about relying on newspapers and newspaper databases for research. Despite the best efforts of reporters, editors, researchers, and archivists, news articles have long been replete with inaccuracies and omissions. The reasons are numerous and stem from both structural and human shortcomings: the fast pace of news production, limited access to sources and resources, lack of space, human error, editorial bias, editorial decisions about which corrections are worth appending, etc. Once news articles make it into databases, other problems arise: text-based electronic databases do not render graphics, search and display functions have technical shortcomings, etc.

Add to these the continuing economic pressures facing news organizations, which have forced deep cuts at many newspapers, and using newspapers as research sources has become increasingly problematic. In the seemingly endless rounds of layoffs since the start of the Great Recession, copyeditors, researchers, and enhancers/archivists, the guardians of accuracy, clarity, and order, have been hit hardest, while reporters and editors are expected to do more with less. Errors, omissions, bias, and inconsequential content are now baked into the newspaper product, and this will have deep consequences for future scholarship and research.

All this to say that cautionary notes like Prof. Milligan’s are welcome and necessary, and researchers should always, always cross-reference research results with multiple and varied sources.

*It is my understanding that an upgraded version of The Globe and Mail’s Canada’s Heritage from 1844 database is in the works and will address some of these shortcomings. For example, it will use higher-quality OCR and will search and identify articles as a whole, even across pages.

Photo source: Jon S, Flickr

The Periodic Table of Business Research Databases

I just came across a terrifically handy tool from Alacra for identifying the right database for business-related research: the Periodic Table of Business Research Databases. A vast number of databases are available on the market, each with its own focus and depth of content. This tool provides a nice quick-and-dirty overview of most of these information sources, grouping them by category of research (company profiles, credit and investment research, market research, news, etc.).

Unfortunately, it is missing some key Canadian databases, such as Infomart and Newscan, but I’ll be bookmarking it.

New Online Case Law Additions in Canada and the U.S.

The trend toward providing more free online access to court opinions got a massive boost recently in both Canada and the U.S. In Canada, the Law Society of Upper Canada, the copyright holder of the Ontario Reports, made available to CanLII the full historical collection of OR case reports (15,000 decisions published from 1931 to 2013), increasing CanLII’s database for Ontario courts by about 25%.

In the United States, the federal Judiciary and the Government Printing Office (GPO) partnered through the GPO’s Federal Digital System, FDsys, to provide public access to more than 750,000 opinions, many dating back to 2004. Alongside PACER, this gives researchers another source of court-related information. [via InfoDocket]

New Resource for Finding Theses and Dissertations

Open Access Theses and Dissertations is a new search engine for locating open access graduate theses and dissertations published around the world. It’s important to note that the search engine does not search the full text of the theses, but rather the metadata drawn from the records of university repositories, consortia, or OCLC WorldCat. Still, it is a good supplement to other resources such as the Networked Digital Library of Theses and Dissertations, ProQuest Theses and Dissertations, and the Theses Canada portal.

Theses are great sources of deep analysis and reliable statistics. They are a secret weapon in my research toolbox.