Migrated from eDJGroupInc.com. Author: Greg Buckles. Published: 2010-03-04 04:00:31. Format, images and links may no longer function correctly. In the weeks following LTNY 2010, I have tried to catch up on the demos and briefings that did not make it into my busy show schedule. I finally managed a look at the new i-Decision automated first pass review from the team at DiscoverReady. It got me thinking about the entire concept of automated relevance designation. Several years back, H5 introduced automated review to the market using their Hi-Q Platform™. Recommind’s Axcelerate, Equivio’s Relevance and now Xerox Litigation Services CategoriX also bring some flavor of automated categorization to the field. Having at least five serious products on the market tells me that customers are paying the relatively high per-item or per-GB rates to bypass a full manual review.
Before we get too far into the topic, it would be good to look at how a couple of these folks define automated document review. For the purposes of this discussion, we can exclude any processing, culling, concept search, near duplicate clustering and other technologies that serve to either remove items prior to review or cluster items with the expectation that they will still be viewed. These methods can reduce the review volume or increase the relative review speed, but human beings are still expected to put eyes on everything that is produced. DiscoverReady, H5, Xerox and Recommind all intend for their systems to effectively replace a full first pass review. Equivio is a bit more conservative with their claims and has done an excellent job of extracting both positive and negative Boolean criteria to support Meet-and-Confer negotiations.
But are counsel ready to produce documents based on black-box technology? Auto-categorization studies like those by the eDiscovery Institute and other vendor-sponsored comparisons have presented excellent evidence that expert application of search, clustering and other technologies consistently retrieves more items matching the relevance criteria than manual review. Notice that I did not say that they retrieved more relevant documents, as that is a subjective judgment. When you can define the characteristics of what you want, technology will always be more accurate and consistent. But the best criteria in the world will only find what you ask for. Most first-year associates, by contrast, have a fair chance of spotting secondary work-product privilege simply because they learn to know it when they see it.
Most of these systems use a single expert reviewer to ‘train’ the system by reviewing small batches of items and extracting relevance characteristics from the resulting buckets of items. In this context, counsel is making the initial calls and the system is just extrapolating the common relevant names, terms and other criteria from these sets. The first stumbling block comes when you want to know the exact reason that an item was automatically put into the Relevant/Non-Relevant bucket. Despite all the assurances of third party validations, studies and the indisputable fact that these systems do a better, cheaper overall job of first pass review, few lawyers can be comfortable taking responsibility for black box decisions.
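To make the extrapolation step concrete, here is a deliberately minimal sketch of the idea: a reviewer marks a small training batch Relevant or Non-Relevant, the system pulls out terms that skew heavily toward the Relevant bucket, and new documents are scored against that profile. The function names, the term-frequency heuristic and the thresholds are all my own illustrative assumptions; commercial systems weigh hundreds of features, not bare word counts.

```python
from collections import Counter

def train_profile(relevant_docs, nonrelevant_docs):
    """Toy relevance profile: keep terms that appear substantially more
    often in the reviewer's Relevant bucket than in the Non-Relevant one.
    (Illustrative only -- real systems combine many weighted features.)"""
    rel = Counter(w for d in relevant_docs for w in d.lower().split())
    non = Counter(w for d in nonrelevant_docs for w in d.lower().split())
    return {w for w, c in rel.items() if c > 2 * non.get(w, 0)}

def predict(profile, doc, threshold=2):
    """Flag a document Relevant if it hits enough profile terms."""
    hits = sum(1 for w in doc.lower().split() if w in profile)
    return hits >= threshold

# Counsel marks a small batch; the system extrapolates the criteria.
profile = train_profile(
    ["merger price fix agreement", "price agreement memo"],
    ["lunch schedule", "holiday party"],
)
print(predict(profile, "draft price agreement"))  # True
print(predict(profile, "lunch schedule today"))   # False
```

Even this toy version shows why the "exact reason" question is hard to answer: the decision is a score over many extracted criteria, not a single rule counsel can point to.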
So what does one of these systems look like from the customer perspective? Equivio’s Relevance is the easiest to understand and makes the most modest claims. It is available for corporate enterprise purchase or through a provider partner on a per-item cost basis. You could use their system in an ECA model with a small investigatory collection or against all or part of your preservation collection. You start with a 40-50 item training set and mark Relevant/Non-Relevant only. The system uses 200+ fields and characteristics to extract criteria and then runs a prediction against the next training set. You know the predicted relevance ratio going into each training set and the system tunes the criteria profile until it meets your predictive confidence level. Equivio does not claim that they can find every relevant document, but they expose the iterative predictive accuracy with a calculated f-measure graph (essentially a weighted harmonic mean of precision and recall). The idea is to run training sets until the system demonstrates that the relevance profile is stable and consistent to your level of confidence and error. This only works for Relevant vs. Non-Relevant, but the ability to access both the positive and exclusion keywords from the profile provides counsel with a scientific means of determining search criteria in compliance with opinions like Judge Paul Grimm’s in Victor Stanley v. Creative Pipe.
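For readers who want the f-measure made explicit, the standard formula is a weighted harmonic mean of precision and recall, where the beta parameter controls how heavily recall counts. This sketch is the generic textbook definition, not Equivio's internal calculation, which they do not publish:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily (favoring completeness);
    beta < 1 favors precision. beta = 1 gives the familiar F1 score.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A hypothetical training round: the profile found 80 of 100 truly
# relevant items (recall 0.8) and 80 of its 100 flagged items were
# correct (precision 0.8).
print(round(f_measure(0.8, 0.8), 3))  # 0.8
```

The harmonic mean is the important design choice: a profile with high precision but poor recall (or vice versa) cannot hide behind a good simple average, which is why a stable f-measure across successive training sets is meaningful evidence that the profile has converged.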
So Equivio carefully positions itself as a categorization/investigation tool in counsel’s arsenal, though the company would not deny that customers could use Equivio for full first pass review if they were comfortable with it. I liked the transparency of the system. Equivio is not going to spell out all of the 200+ factors that they use in their profile system, but a customer can easily convert the profile to a set of Boolean criteria to take to the other side. I also liked the dynamic review cost calculator built into the application. In the next part of this series, we will look at a full-blown automated first pass system to see how that would look from the customer’s perspective.