Migrated from eDJGroupInc.com. Author: Greg Buckles. Published: 2012-03-07 08:00:11
The value promise of ‘black box’ predictive coding or ‘Easy Button’ review gets marketing departments all excited. So excited that eDJ was inundated with copies of the hearing transcript and opinion in Da Silva Moore v. Publicis Groupe & MSL Group, No. 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Feb. 24, 2012), along with interpretations of how it changed the ground rules of eDiscovery. Marketing departments can spin a mountain out of a molehill. In his final order, Judge Peck pushed back: “To correct the many blogs about this case, initiated by a press release from plaintiffs’ vendor – the Court did not order the parties to use predictive coding. The parties had agreed to defendants’ use of it, but had disputes over the scope and implementation, which the Court ruled on, thus accepting the use of computer-assisted review in this lawsuit.” Check out Mikki Tomlinson’s interviews with Judge Peck or Conor Crowley’s excellent legal summary for more practical interpretations. The arguments over how you know when your predictive training is ‘good enough’ are worth dissecting. eDJ has been researching the methods of technology-assisted review, so I thought it worth extracting some of these key points. Remember that even the vendor experts in this case are representing their technology. Preliminary raw hearing transcripts are rarely released to the public, and I wonder if the parties are as happy as their vendors about all the publicity. This is a nice glimpse into the fray before the dust has settled on technology-assisted review.
First, let’s set the stage. Remember that the parties have already jointly agreed to the overall process and that ALL items reviewed will be produced, excepting privileged items. The dispute is over details like sample sizes and who will decide when the training process is done. The defendant has roughly 3.2 million documents in the collection. The last 300,000 documents were a late collection, so there are a lot of arguments about how to handle iterative updates. There are seven key relevance categories, but two of the categories were late input from the plaintiffs (if I am interpreting this right). That means we get a lot of discussion about whether you need to re-review the initial random seed set (training set) in order to accommodate the new categories.
Predictive Coding Initial Proposal:
- Random Seed Set – 2,399 items for review by Defense
- Category Search Sets – Boolean terms/names – review top 50 hits from each (roughly 4,000 items? – without the Exhibits it is difficult to pin down the exact number of searches or the total number to be reviewed)
- 7 Iterative Predicted Relevant Training Sets – 500 items per set selected by Recommind engine
- Final Random Quality Set – 2,399 items
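For perspective, here is a rough back-of-the-envelope tally of the human review volume that proposal implies. This is my own sketch, not anything from the record, and the 4,000 figure for the category search sets is an estimate since the exhibits with the exact search counts are not public.

```python
# Rough tally of the documents that would get human eyes under the proposal.
seed_set        = 2_399          # initial random seed set
category_hits   = 4_000          # top 50 hits from each Boolean category search (my estimate)
training_rounds = 7 * 500        # seven iterative sets of 500 picked by the engine
final_qc_set    = 2_399          # final random quality-control sample

total_reviewed = seed_set + category_hits + training_rounds + final_qc_set
print(total_reviewed)                       # 12298 documents reviewed by people
print(f"{total_reviewed / 3_200_000:.2%}")  # ~0.38% of the 3.2 million document collection
```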
Let’s start with the sample seed set size, confidence level (95%) and the associated confidence interval (±2%) (Pg. 58). It appears as if the parties are using a standard calculation that assumes a relatively even distribution of relevant items. You can run the calculation yourself with any standard sample size calculator; a quick sketch of the math follows.
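Assuming the parties used the common proportion formula with a worst-case 50% prevalence and a finite population correction (the transcript does not spell out the exact calculator), the numbers line up almost exactly with the 2,399-item sets in the proposal. The function below is my own illustration, not anything from the record.

```python
def seed_sample_size(population, z=1.96, margin=0.02, p=0.5):
    """Standard sample size for estimating a proportion at a given confidence
    level (z=1.96 for 95%) and margin of error, with a finite population
    correction. p=0.5 is the most conservative assumption about prevalence."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2        # infinite-population size (~2,401)
    return round(n0 / (1 + (n0 - 1) / population))   # finite population correction

print(seed_sample_size(3_200_000))  # 2399 -- matching the 2,399-item seed and QC sets
```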
I tackled some issues around sample sizes a couple of years back, but there are no easy ‘one-size-fits-all’ sampling calculations. The plaintiffs’ vendor expert, Paul J. Neale, makes some excellent points in his declaration. The main point that jumps out is a fundamental challenge to many training and search systems: false negative exclusions, or “you only get what you ask for”. Anyone who has participated in a large-scale document review is keenly aware of how the initial relevance and issue profile can and will change dramatically once coding is underway. I ran a small coding shop back in the early 1990s, and I can tell you that the effort required to go back through boxes of hardcopy to ‘re-review’ for new issues or modified relevance taught us some hard lessons. We learned to review diverse samples and get early counsel decisions on small batches until the coding manual and example documents were ‘stable’ before we cranked up the teams.

With propagated decision systems there is the danger that the system will ‘latch on’ to a subset of relevant documents and never present others. The category searches in this protocol are intended to supply relevant exemplars to counter this phenomenon, but that only works for what you know or guess is out there. The plaintiffs seem to feel that the final random QC set of the non-relevant discards is too small. Judge Peck wisely pushed back on making an arbitrary decision until the parties have completed the first round of review. It is interesting that the parties discuss stopping the iterative reviews when a round returns less than 5% new hits (25 of 500). “There is no science to it.” (Pg. 74)
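To make that stopping rule concrete, here is a minimal sketch of the ‘less than 5% new hits’ test as I read the transcript; the function name and structure are mine, not the parties’ protocol.

```python
def looks_stable(new_relevant, batch_size=500, threshold=0.05):
    """Sketch of the stopping rule discussed at the hearing: if an iterative
    training batch of 500 documents surfaces fewer than 5% new relevant hits
    (i.e. fewer than 25), the engine has arguably stopped learning anything
    new. Illustrative only -- as the transcript says, there is no science to it."""
    return new_relevant / batch_size < threshold

# e.g. a round returns 19 newly tagged relevant documents out of 500
print(looks_stable(19))   # True -- a candidate point to stop iterating
```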
What bothered me about the statistical arguments is that there was no discussion of, or profile metrics for, the collection itself. What percentage (if any) of non-text files did it contain? Had the interviewed custodians organized their email and documents in ways that create pockets of relevant items? Will the different sources of ESI affect relevance characteristics and context? Maybe the parties already knew the answers, but it did not feel like statistical variance and relevance distribution were recognized issues. Our ongoing research on technology-assisted review has included some great interviews with users and providers. In upcoming blogs I plan to explore practical approaches to eDiscovery statistics that translate the impact and use of accuracy, precision, recall, F-measure and other terms for the non-statistician (a quick sketch of precision, recall and F-measure follows below). I know that they are long documents, but the transcript and the supporting plaintiff declaration are worth the time to read. I would love to hear your impressions of the arguments.
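As a teaser for those statistics posts, here is a toy sketch of how precision, recall and F-measure relate to one another, using hypothetical counts rather than anything from this case.

```python
def review_metrics(true_pos, false_pos, false_neg):
    """Toy illustration of the measures named above; in practice the counts
    would come from comparing the system's calls against a human-reviewed
    control sample."""
    precision = true_pos / (true_pos + false_pos)   # of what was retrieved, how much was relevant
    recall    = true_pos / (true_pos + false_neg)   # of what was relevant, how much was retrieved
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f_measure

# Hypothetical counts: 400 relevant found, 100 false alarms, 50 relevant missed
p, r, f = review_metrics(400, 100, 50)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")  # precision=0.80 recall=0.89 F1=0.84
```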