I recently sat in on a conference call that the Antitrust Section of the American Bar Association holds from time to time to discuss developments in e-discovery. Lawyers' general reaction to e-discovery (here meaning the use of technology to sift through mountains of documents for the few that may be relevant to a particular case) can be summed up in one word: suspicion. Sound familiar?
Lawyers use predictive coding to pull the few relevant needles out of a haystack of potentially useful documents, just as translators use machine learning to extract terminology from their source texts or from a corpus. Both professions can benefit from this technology, yet many companies and individuals remain deeply mistrustful of the tools.
During this particular call, representatives from law firms, courts, and federal agencies discussed the guidelines they would like to see in place to make clients and judges more comfortable with e-discovery output. Most of their suggestions apply just as well to machine translation and CAT tools:
- Show your work. Be transparent about your process. For lawyers and translators alike, this means defining distinct quality control stages and documenting the output at each one. It also means understanding clearly where your tools fall short and how you can work around those limitations. For instance, a spell checker won't flag "their" typed where "they're" belongs, because both are correctly spelled words, so professional writers rely on a human proofreader at the final stage to catch what the software misses.
- Make multiple passes through your texts so you can better control how your work is refined. For lawyers, this means running a series of progressively narrower search queries to pinpoint key documents, rather than firing off one "high-powered" string of specific search terms on the first attempt. For translators, this means doing your background and terminology research, then working in your CAT tool, then running a concordance search, then perhaps a specialized spell checker, and so on. Bottom line: when you work with huge volumes of information, even with technology-based solutions, it's more effective to take several small bites than to try to swallow the whole project at once.
- Invest substantial time in training your tools on a large body of texts, so that they learn to produce results you can back with a statistically sound confidence level.
- Have a human expert review and approve the results of the machine work before using them in a professional context. This was the most frequently repeated suggestion on the call. People are far more likely to trust a machine that takes over a formerly human task if a seasoned professional can confirm that the machine is, in fact, performing well. Even if you verify only a statistically significant sample of the machine output, you will greatly increase your client's comfort with the non-human processing of language.
Where else have you seen the translation industry's concerns cross over into other fields? How have you addressed concerns about technology use in your own field? What do you do to make sure your time-saving tools work properly?