New Study Mired in the TAR Pit?
The worlds of academic research and eDiscovery do not collide often enough. All too many practitioners assumed that first-generation eDiscovery processing, search and collection technologies were accurate and effective. I will stay off my soapbox on validation testing, but my long-term readers know my passion for defensible, transparent processes and tools. All too many ‘academic white papers’ in eDiscovery are funded from provider marketing budgets, and even the results of academic organizations such as the governmental Text Retrieval Conference (TREC) can be ‘reinterpreted’ by spin doctors to sell products to consumers desperate for reassurance. The hype cycle around Predictive Coding/Technology Assisted Review (PC/TAR) has centered on court acceptance and actual review cost savings. The last couple of weeks have seen a bit of a blogging kerfuffle over the conclusions, methods and implications of the new study by Gordon Cormack and Maura Grossman, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery”. Pioneering analytics guru Herbert L. Roitblat of OrcaTec has published two blog posts (first and second links) critical of the study and its conclusions. As much as I love a spirited debate and have my own history of ‘speaking truth’ in the public forum, I can’t help wondering if this tussle over Continuous Active Learning (CAL) vs. Simple Active Learning (SAL) has lost sight of the forest while looking for the tallest tree in it.