I have read more eDiscovery case law and commentary on these two matters (Kleen Products and Da Silva Moore) in the last month than I ever wanted. Although the issues around Technology Assisted Review (TAR) are important, the fervor and hype appear to be driven primarily by a wide variety of parties attempting to capitalize on the matters. One has to question the motives of the parties involved given that hearing transcripts have been publicized and press releases issued – these kinds of actions take the focus off the real issues at hand and try to spin these important cases into advertisements. In Da Silva Moore the parties demonstrated laudable cooperation and agreement prior to the first hearing. They agreed to a relatively transparent protocol to tackle a massive collection. All of that has broken down, and now there appears to be what could be a concerted effort to discredit Magistrate Judge Peck and force a recusal. Wow!! Would this promising case have turned so acrimonious without the heavy publicity and marketing budgets of TAR providers? Possibly.
eDJ has interviewed many of the key players in Da Silva Moore and published several perspective pieces on the case. Last week, Barry Murphy and I spoke with Paul Neale, plaintiff consultant/expert in Da Silva Moore and Kleen Products. Unlike some other sideline bloggers, I am going to resist the urge to debate the merits of matters still before the bench or allow my pulpit to be used to put forth either side’s agenda. Instead, I will continue to derive insight from the interesting technical and procedural elements exposed in both cases. In case you have been actually working instead of reading reams of filings, here is my overview:
- Da Silva Moore – the parties agreed to use predictive coding to tackle ~3.3 million items, but disagree on how to define and measure relevance success.
- Kleen Products – Defendants have invested heavily in a more ‘traditional’ Boolean search strategy that Plaintiffs want to replace with a TAR methodology.
So beyond the use of TAR for relevance determination, what do these matters have in common? Mr. Neale asserts that "the plaintiffs in both cases are trying to incorporate methods to measure the accuracy of the defendant’s productions." I agree in general, though I do believe that there is merit in transparent discussion and analysis of the chosen methodology and technology to assess any potential gaps or necessary exception categories. Both cases are stuck on methodology specifics when they should be defining measurement standards. Essentially, focus on the outcome over the process.
I find it interesting that the Kleen Products plaintiffs attack Boolean search as being inaccurate and outmoded, yet the Da Silva Moore seed sets (used for training Recommind in relevance categories) are themselves based on Boolean searches. The pleadings in both cases cite various studies and statistics on the effectiveness of manual review, search criteria and TAR methods that bring to mind Mark Twain’s “lies, damned lies and statistics” quote. I distinctly recall spirited editorial discussions on the early versions of “The Sedona Conference® Best Practices Commentary on Search & Retrieval Methods” about how easy it is to draw the wrong conclusion from the 1985 Blair and Maron study.
Measuring the overall effectiveness of an information retrieval method combines recall and precision, typically expressed as the F1 score – the harmonic mean of the two. Apart from the Wikipedia definitions, page 3 of Mr. Neale’s Declaration in Support of Plaintiffs has some good examples to put these measurements into context.
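For readers who want the arithmetic behind those terms, here is a minimal sketch of how precision, recall and F1 fall out of the counts from a review pass. The numbers are purely hypothetical and are not drawn from either case:

```python
def retrieval_metrics(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from review-pass counts.

    precision = TP / (TP + FP)  -- how much of what you retrieved is relevant
    recall    = TP / (TP + FN)  -- how much of the relevant material you found
    F1        = harmonic mean of precision and recall
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical pass: 800 relevant items retrieved, 200 irrelevant items
# retrieved, and 200 relevant items missed. All three metrics come out to 0.8.
precision, recall, f1 = retrieval_metrics(800, 200, 200)
```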
The prevalence of relevant items in Da Silva Moore appears to be very small, ~1.5% based on the initial random sample. The problem with random samples of very large ESI collections is that few collections even approach a uniform distribution of relevant items outside of email from MS Exchange journals. Instead, we frequently find little pockets of critical items kept inconsistently by custodians. This poses a challenge when trying to train a TAR system. After all, the fancy calculations of content clustering, latent semantic indexing or other TAR mechanisms all ultimately result in highly complex search criteria/rules. I have every confidence that, used correctly, these systems are far more effective at identifying common item characteristics and constructing retrieval criteria than almost any human being. But they only find what we train them to find. Any quality control sampling should be designed to take into account the size, composition, distribution and other aspects of the ESI collection and the matter itself.
eDiscovery providers and consumers seem to be fixated on a magic TAR solution that will solve the relevance problem in under 5,000 licks of the review Tootsie Pop. The question should be how to measure the results of review passes against a predefined threshold of accuracy, agreed upon by the parties and the bench, that represents a reasonable and proportional confidence standard. As Mr. Neale put it, “each use of TAR should be evaluated on its own, but an F1 score needs to take into account an acceptable level of both recall and precision – one should not be sacrificed at the expense of the other.” Discovery counsel and practitioners are just starting to become comfortable with statistical terms such as F1 measurements and with the TREC and other studies on manual vs. TAR retrieval efficiencies.
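Mr. Neale’s point about not sacrificing one metric for the other is easy to see numerically. Because F1 is a harmonic mean, it punishes imbalance in a way an ordinary average would not; the scores below are made up for illustration:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

balanced = f1_score(0.80, 0.80)  # ~0.80
# A system tuned to 99% precision but only 10% recall scores ~0.18:
# high precision cannot rescue a low recall rate.
skewed = f1_score(0.99, 0.10)
```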
Remember that F1 is a balance between recall and precision. It is very possible to train a system to be 95% or even 99% precise but still have a very low recall rate. In other words, a retrieval definition can be tuned to find everything similar to what you told it was relevant, yet still exclude relevant items for which you had no examples. A good process and an appropriate quality check of the excluded (not relevant) set can address this concern, but it should be in common-sense proportion to the overall collection and case. If you are going to take the time to read this and other blogs, then I hope you can find the time to read some of the actual filings in these two cases to inform your own opinion. Be aware of the goals and biases of the parties and eDiscovery pundits when you sift through the material. Did you find any gems hidden under the fertilizer? Write a comment and let everyone hear your takeaways.
Greg Buckles – eDiscoveryJournal