Migrated from eDJGroupInc.com. Author: Barry Murphy. Published: 2010-10-28 19:56:47. Format, images and links may no longer function correctly.

Recently, Recommind briefed eDiscoveryJournal on the software vendor’s predictive coding. In the Recommind context, predictive coding starts with a subset of data (derived by various techniques such as concept searching, phrase identification, keyword searching, metadata filters, etc.), and users review and code that seed set for factors such as responsiveness, issues, and privilege. Once that review is complete, the user can hit a “train” button that tells the Axcelerate application to identify conceptually similar documents based on the attributes of the first round of coding. Recommind refers to this as machine learning: the engine learns from the document coding done by humans, and in turn the human reviewers learn from the suggested relevant documents the machine returns. In essence, it is a process in which reviewers are presented with more relevant documents, more often, and see far fewer non-relevant documents that slow down the review. There are checks built in so that case managers can continue to review sets of potentially low-relevance documents; if any of those documents are in fact responsive, they are re-coded and the system applies that learning back to the rest of the corpus.
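Recommind’s Axcelerate engine is proprietary, so the mechanics aren’t public, but the code-then-train-then-rank loop described above can be sketched in plain Python. Everything here – the toy term-weighting scheme, the sample documents, and the function names – is an illustrative assumption of my own, not Recommind’s implementation.

```python
from collections import Counter

def train(seed):
    """Learn per-term weights from human-coded seed documents.
    seed: list of (text, is_responsive) pairs coded by reviewers."""
    pos, neg = Counter(), Counter()
    for text, responsive in seed:
        (pos if responsive else neg).update(text.lower().split())
    # A term seen mostly in responsive documents gets a weight above 1,
    # a term seen mostly in non-responsive documents a weight below 1.
    vocab = set(pos) | set(neg)
    return {t: (pos[t] + 1) / (neg[t] + 1) for t in vocab}

def rank(weights, corpus):
    """Score uncoded documents so reviewers see likely-relevant ones first."""
    def score(text):
        terms = text.lower().split()
        return sum(weights.get(t, 1.0) for t in terms) / max(len(terms), 1)
    return sorted(corpus, key=score, reverse=True)

# Human-coded seed set: (document text, responsive?)
seed = [
    ("rig inspection report oil spill", True),
    ("quarterly marketing newsletter", False),
]
weights = train(seed)

# Uncoded corpus, re-ranked by the "trained" weights.
corpus = [
    "lunch menu for friday",
    "spill response inspection notes",
]
ranked = rank(weights, corpus)
print(ranked[0])  # the spill document should surface first
```

Re-coding a document and calling `train` again on the enlarged seed set is the feedback step the article describes: each round of human coding reshapes the ranking of the remaining corpus.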
The predictive coding technology and process then allow the review team to conduct random sampling of the positive results, or of the documents the application judged not responsive or not privileged (organizations can specify the accuracy level they want; most aim for 95%–99%). If any of the randomly sampled documents turn out to be relevant, the system is retrained on the newly marked documents.
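As a rough illustration of that sampling check (again a hypothetical sketch – the function names, sample size, and pass criterion are my own, not Recommind’s), a QC pass over the documents the engine marked non-responsive might look like:

```python
import random

def sample_for_qc(predicted_nonresponsive, sample_size, seed=0):
    """Draw a random QC sample from documents the engine marked non-responsive."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(predicted_nonresponsive,
                      min(sample_size, len(predicted_nonresponsive)))

def qc_pass(sample, human_says_responsive, target_accuracy=0.95):
    """Return (passed, misses). A miss is a sampled document a human marks
    responsive; misses are re-coded and fed back into the training set."""
    misses = [doc for doc in sample if human_says_responsive(doc)]
    accuracy = 1 - len(misses) / len(sample)
    return accuracy >= target_accuracy, misses

# 200 documents the engine marked non-responsive; two of them are
# actually responsive (the "escapes" the QC sample is meant to catch).
docs = [f"doc-{i}" for i in range(200)]
truly_responsive = {"doc-7", "doc-42"}

sample = sample_for_qc(docs, 50)
passed, misses = qc_pass(sample, lambda d: d in truly_responsive)
```

Any documents in `misses` would be re-coded and the engine retrained, exactly the feedback loop described above; a failed pass signals that the model needs more training before the non-responsive set can be set aside.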
Having been in the eDiscovery industry long enough – and being a professional skeptic – my first question was how predictive coding has stood up to challenges. After all, no one wants to use a process that might get struck down. Recommind points out that many clients use predictive coding as part of their review process across court cases and regulatory hearings, and the process has never been challenged. From Recommind’s perspective, that absence of challenges points to acceptance. But since the vendor can’t publicly name the clients using the process or the cases it’s been used in, it’s hard to say whether predictive coding is truly accepted. What do you think? Is predictive coding defensible?
While Recommind believes that predictive coding will ultimately become the default rather than linear document review (and I bet there are plenty of young associates hoping it does), one thing is clear in my mind: predictive coding can certainly help in the near term with corporations’ early case assessment (ECA) initiatives. In our post on Earlier ECA, we looked at how organizations want to learn as much about their potentially responsive data as quickly, and as early in the eDiscovery process, as possible. Recommind claims that customers can get intimately familiar with the make-up of a data set within 24–48 hours of collection. That is hugely beneficial for activities such as meet-and-confer preparation – one of the use cases associated with ECA. In addition, corporations can start to optimize keyword selection – testing proposed keywords against the data and then negotiating keyword lists from a stronger position. Why give the other side ‘oil spill’ when ‘oil spill Gulf of Mexico’ will give both parties the relevant results and cut out non-relevant noise?
Law firms can reap the benefits of predictive coding, too. As the market evolves and corporations demand more service at less cost, predictive coding promises to make firms efficient enough to compete on fixed-price bids. Will it allow them to eliminate linear document review? Only time will tell; in this industry, things tend to happen slowly.