Migrated from eDJGroupInc.com. Author: Greg Buckles. Published: 2014-06-10 20:00:00. Format, images and links may no longer function correctly.
The worlds of academic research and eDiscovery do not collide often enough. All too many practitioners assumed that first-generation eDiscovery processing, search and collection technology was accurate and effective. I will stay off of my soapbox on validation testing, but my long-term readers know my passion for defensible, transparent process and tools. All too many ‘academic white papers’ in eDiscovery are funded from provider marketing budgets, and even government-sponsored efforts such as the Text REtrieval Conference (TREC) can have their results ‘reinterpreted’ by spin doctors to sell products to consumers desperate for reassurance. The hype cycle around Predictive Coding/Technology Assisted Review (PC/TAR) has focused on court acceptance and actual review cost savings. The last couple of weeks have seen a bit of a blogging kerfuffle over the conclusions, methods and implications of the new study by Gordon Cormack and Maura Grossman, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery”. Pioneering analytics guru Herbert L. Roitblat of OrcaTec has published two blog posts (first and second links) critical of the study and its conclusions. As much as I love a spirited debate and have my own history of ‘speaking truth’ in the public forum, I can’t help wondering if this tussle over Continuous Active Learning (CAL) vs. Simple Active Learning (SAL) has lost sight of the forest while looking for the tallest tree in it.
Catalyst, as a provider of CAL, immediately seized on the study as validation of its methodology/technology and promoted it hard. It is not difficult to see why OrcaTec would respond to statements potentially discounting the effectiveness of random sampling as part of the TAR process. All of this controversy around collection richness, gold-standard decisions and comparative methodology misses some critical big-picture items for the average practitioner. As a lowly former CSI, I am not going to step into the ring with all the PhDs arguing over the interpretations of this study.
The first point is that we have a real research study on TAR. These are rarer than four-leaf clovers. So even if you disagree with the conclusions or find fault with the assumptions or methodology, the average practitioner can gain valuable insight into the general approaches to the challenge of testing TAR systems. I happen to believe that some conclusions were overly broad and not justified for many TAR usage scenarios. But some of the detailed information about sample sizes vs. relevance richness is really good information for a practitioner who is struggling with the leap from linear review to TAR.
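To make the richness/sample-size relationship concrete for readers who have not run these numbers, here is a minimal sketch using the standard formula for estimating a proportion at a given confidence level. The richness values and margins of error below are hypothetical illustrations of the general effect, not figures taken from the study.

```python
# Illustrative only: classic sample-size formula for estimating prevalence
# (collection "richness") at a chosen confidence level and margin of error.
import math

def sample_size(expected_richness: float, margin_of_error: float, z: float = 1.96) -> int:
    """Simple random sample size needed to estimate a proportion.

    expected_richness -- best guess at prevalence (e.g. 0.01 for 1%)
    margin_of_error   -- acceptable +/- error on the estimate (e.g. 0.005)
    z                 -- z-score for the confidence level (1.96 ~= 95%)
    """
    p = expected_richness
    return math.ceil((z ** 2) * p * (1 - p) / (margin_of_error ** 2))

# Lower-richness collections need far larger samples for the same relative precision.
for richness in (0.10, 0.01, 0.001):
    print(f"richness {richness:>6.1%}: ~{sample_size(richness, richness / 2):,} docs to sample")
```

Running the toy loop makes the practical point: a 10% rich collection can be characterized with a few hundred sampled documents, while a 0.1% rich collection pushes the sample into the tens of thousands for comparable relative precision.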
I have learned the hard way that every client has different requirements, language usage, environments and ESI characteristics that justify real adoption testing before you rely on any technology, process or provider for a live case. This study lays out how Cormack/Grossman designed their tests to eliminate the usual apples-to-oranges comparisons that I see all too often, usually from over-eager sales reps in a client procurement cycle.
Marketing Language Examples:
“My system only required 5,000 review decisions to train compared to CompetitorX’s results on a completely different case.”
“Look! 95% TAR precision after we used selective collection searches.” (meaning that ALL of the ESI had already been culled with the same set of search terms)
Cormack/Grossman based their testing on the SAME item-level decisions for all three systems tested and automated the test process. That takes more programming than most users could accomplish for 100 review batches. But you don’t need 100 review batches in your real-world usage testing. I can see ways that a clever LitSupport tech could simulate this process with incremental load files or SQL update queries. So there is valuable material in this study.
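For anyone inclined to try that, here is a rough sketch of the incremental-batch idea using SQLite as a stand-in for a review database. Every table name, column, document count and richness figure here is invented for illustration and would need to be mapped to your own platform’s export and load mechanics.

```python
# Rough sketch: simulate releasing review batches with SQL updates.
# "docs", "batch_no", "reviewed" and "relevant" are invented names, not any
# specific review platform's schema.
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, batch_no INTEGER, "
             "reviewed INTEGER DEFAULT 0, relevant INTEGER)")

# Seed a toy collection of 1,000 documents with randomly assigned "truth" calls (~5% rich).
conn.executemany("INSERT INTO docs (doc_id, relevant) VALUES (?, ?)",
                 [(i, random.random() < 0.05) for i in range(1000)])

BATCH_SIZE = 100
for batch_no in range(1, 6):
    # Release the next batch. Here it is a simple random draw; a real test could
    # rank unreviewed docs by the engine's scores exported after the prior batch.
    conn.execute(
        "UPDATE docs SET batch_no = ?, reviewed = 1 WHERE doc_id IN "
        "(SELECT doc_id FROM docs WHERE reviewed = 0 ORDER BY RANDOM() LIMIT ?)",
        (batch_no, BATCH_SIZE))
    found = conn.execute("SELECT SUM(relevant) FROM docs WHERE reviewed = 1").fetchone()[0]
    print(f"after batch {batch_no}: {found} relevant docs surfaced")
```

The same pattern works with incremental load files instead of SQL: export the unreviewed population, score it, load back the next batch, and track cumulative relevant documents found per batch.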
The second big-picture issue is the dynamic nature of relevance in real-world matters. How many cases have you started into review and had to redefine the key case issues after stumbling across a hot item? My team adapted the old military adage to say that “no review protocol survives contact with the docs.” How do you think we found esoteric Enron terms like ‘Death Star’, ‘Fat Boy’ and ‘Get Shorty’ in all those trader memos? The custodians did not just tell us their ‘terms of art’; we had to find examples and then run more searches. The study scenarios seem to rely on what I would call a ‘stable’ relevance definition. Cormack/Grossman compensate for some of this by including fallible training judgments, but I would differentiate a wrong review call from the value of stumbling across an entirely new aspect of relevance. To roughly paraphrase a well-known eDiscovery personality from a Carmel Valley retreat panel, “I am not worried about finding every variation of the documents that I know about. Instead I am worried about missing the one critical document that I had no idea existed.”
Without random or targeted sampling beyond the documents conceptually related to the training set (one simple approach is sketched below), counsel may miss critical documents supporting the adversarial perspective. My main point here is that I do not believe that ANY TAR method is a stand-alone solution for relevance definition. They each have a potential role to play in your process. As someone with no product to pitch or position to defend, I am thankful for the serious work invested in both the study AND Herb’s critique. We need more research and public debate to advance our art in meaningful ways. So don’t be afraid to test your toys, and don’t let fear of criticism keep you from publishing the results. You can always get me to review it and publish it for you.
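On that sampling point, here is one minimal sketch of what a random quality-control draw from the documents a TAR tool has NOT flagged might look like. The function name, score threshold and sample size are hypothetical choices for illustration, not a recommendation from the study or any product.

```python
# Illustrative sketch: pull a random QC sample from the "discard pile" (documents
# scored below the relevance threshold) so human reviewers can look for relevance
# concepts the training set never captured. All names and numbers are hypothetical.
import random

def discard_pile_sample(doc_scores: dict[str, float],
                        threshold: float = 0.5,
                        sample_size: int = 400,
                        seed: int = 42) -> list[str]:
    """Return doc IDs randomly drawn from the below-threshold documents."""
    discard_pile = [doc_id for doc_id, score in doc_scores.items() if score < threshold]
    rng = random.Random(seed)
    return rng.sample(discard_pile, min(sample_size, len(discard_pile)))

# Toy usage: fake relevance scores for 10,000 documents.
scores = {f"DOC-{i:05d}": random.random() for i in range(10_000)}
qc_batch = discard_pile_sample(scores)
print(f"{len(qc_batch)} docs routed to human QC review from the discard pile")
```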
Greg Buckles can be reached at Greg@eDJGroupInc.com for offline comment, questions or consulting. His active research topics include mobile device discovery, the discovery impact of the cloud, Microsoft’s 2013 eDiscovery Center and multi-matter discovery. Recent consulting engagements include managing preservation during enterprise migrations, legacy tape eliminations, retention enablement and many more.
Take eDJ’s monthly survey on Analytics Adoption for Consumers AND Providers to get premium access to profiles.
eDJ Group is proud to promote the Information Governance Initiative’s 2014 IG Annual Survey. We encourage you to participate and will share our insights on year-to-year trends when the survey is closed.
Join Greg in Houston at the upcoming ARMA International regional program on June 30: InfoGov: Getting Your Data House in Order to Avoid Litigation Costs and Risk.