Matters like Pippins v. KPMG LLP, —F.R.D.—, 2012 WL 370321 (S.D.N.Y. Feb. 3, 2012) provide a unique opportunity to discuss eDiscovery and many of the considerations involved. How do the seven Zubulake factors apply? How much data constitutes an undue burden? Can predictive coding be trusted, and at what point is human interaction required for review? Who is going to do all of the work? These are only a few of the questions brought to light by this matter. Though the Pippins case alone does not resolve any of these questions, it does offer a platform for discussion.
Looking at the first two Zubulake factors, the request in Pippins for over 2,500 hard drives is “specifically tailored to discover relevant information” in that the hard drives could contain evidence of the plaintiffs’ work history. The next factor questions the “availability of such information from other sources.” In Pippins v. KPMG LLP, Dist. Court, S.D.N.Y. 2012, Judge Colleen McMahon writes that “KPMG failed to keep accurate records of the hours that [the plaintiffs] worked.” The hard drives may therefore be necessary to provide a clear picture of the actual hours worked.
Zubulake factors three, four, and five relate to the cost of production. Based on research documented in a white paper written by KPMG employees in 2010, an average laptop hard drive contains approximately 6 GB of “reviewable file types”. Multiplying this by 2,500 drives gives us 15 TB of data to collect and process. Average costs to collect, process, and produce are estimated to be anywhere between $250 and $2,000 per GB. Using $500 per GB as our estimate, the cost for 2,500 hard drives would be $7.5 million. However, according to the decision, KPMG estimates the cost at approximately $600 per hard drive, which would bring the figure down to $1.5 million. KPMG argues that even this estimate would “swallow the amount at stake”.
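For readers who want to check the arithmetic, here it is as a short Python sketch. The rates are the estimates quoted above (the per-GB range and KPMG’s per-drive figure), not numbers from the court record.

```python
# Back-of-the-envelope eDiscovery cost math using the article's estimates.
DRIVES = 2_500
GB_PER_DRIVE = 6        # "reviewable file types" per laptop, per the KPMG white paper
COST_PER_GB = 500       # midpoint estimate; the quoted range is $250-$2,000 per GB
COST_PER_DRIVE = 600    # KPMG's own per-drive estimate from the decision

total_gb = DRIVES * GB_PER_DRIVE
print(f"Data volume: {total_gb:,} GB ({total_gb / 1_000:g} TB)")   # 15,000 GB (15 TB)
print(f"Per-GB estimate:    ${total_gb * COST_PER_GB:,}")          # $7,500,000
print(f"Per-drive estimate: ${DRIVES * COST_PER_DRIVE:,}")         # $1,500,000
```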
Factor six is not necessarily relevant to eDiscovery in this matter, and factor seven examines the “relative benefits to the parties of obtaining the information.” Depending on what analysts discover on the hard drives, either party could benefit from their production. If the hard drives show consistent labor on work product in excess of forty hours per week, the plaintiffs would benefit. Conversely, if the hard drives fail to show this excess, the plaintiffs would lose that supporting data, thus benefiting KPMG. As it stands, Judge McMahon denied KPMG’s appeal, and KPMG is “directed to continue preservation of the existing hard drives […]” as Magistrate Judge James L. Cott ordered in Pippins v. KPMG LLP, Dist. Court, S.D.N.Y. 2011.
How much data does it take to become unduly burdensome? Using the above estimate of 6 GB of “reviewable file types” per laptop, we have 15 TB of data to review. The same research also suggests that a targeted collection contains only about 1.5 GB of data per laptop (roughly 3.75 TB across 2,500 drives). Targeted collections are based on keywords, date ranges, file types, and other criteria, as sketched below. Predictive coding would be useful to further reduce the amount of data requiring review. The problem with using this lower estimate, however, is that more detailed analysis would be required to establish the amount of time the plaintiffs were working.
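As an illustration only, a crude targeted-collection filter might look like the following. The extensions, date range, keywords, and evidence path are all hypothetical placeholders; in practice the parties would negotiate the actual criteria, and commercial tools index file contents rather than just file names.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical targeting criteria -- illustrative, not drawn from the matter.
EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".pdf", ".msg"}
DATE_FROM = datetime(2007, 1, 1)
DATE_TO = datetime(2011, 1, 1)
KEYWORDS = ("audit", "timesheet", "workpaper")

def in_scope(path: Path) -> bool:
    """True if a file matches the file-type, date-range, and keyword criteria."""
    if path.suffix.lower() not in EXTENSIONS:
        return False
    modified = datetime.fromtimestamp(path.stat().st_mtime)
    if not (DATE_FROM <= modified <= DATE_TO):
        return False
    # Crude keyword screen on the file name; real tools search file contents.
    return any(k in path.name.lower() for k in KEYWORDS)

# "/evidence/drive001" is a placeholder mount point for one imaged drive.
collected = [p for p in Path("/evidence/drive001").rglob("*")
             if p.is_file() and in_scope(p)]
print(f"{len(collected)} files in scope")
```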
Assuming the plaintiffs used Windows computers, analysts could review the event logs to determine how many hours a particular employee was logged in to a computer. However, this would require a sufficient history to still be present in the event log. It also only shows that the computer was powered on and that the user’s account was authenticated. Suppose the plaintiffs’ work product consisted of documents and spreadsheets. To establish the amount of time worked during a given week using only these types of files, an analyst would have to make assumptions based on the files’ “created” and “modified” timestamps. Of course, this does not show that the user was actively working during the time between timestamp updates.
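A rough sketch of the logon-hours approach, assuming the Security event log has been exported to CSV: event IDs 4624 (logon) and 4634 (logoff) apply to Windows Vista and later (XP-era machines used 528/538), and the column names here (TimeCreated, EventID, TargetUserName) reflect a hypothetical export format, not a fixed standard.

```python
import csv
from datetime import datetime

LOGON, LOGOFF = "4624", "4634"  # Security log event IDs (Vista and later)

def session_hours(csv_path: str, user: str) -> float:
    """Sum logged-in hours for one account by pairing each logon with
    the next logoff. Assumes the export is in chronological order."""
    total = 0.0
    logon_time = None
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["TargetUserName"] != user:
                continue
            stamp = datetime.fromisoformat(row["TimeCreated"])
            if row["EventID"] == LOGON and logon_time is None:
                logon_time = stamp
            elif row["EventID"] == LOGOFF and logon_time is not None:
                total += (stamp - logon_time).total_seconds() / 3600
                logon_time = None
    return total

# "security_log.csv" and "jdoe" are placeholders for an exported log and account.
print(f"{session_hours('security_log.csv', 'jdoe'):.1f} hours logged in")
```

Even a clean pairing of logon and logoff events measures presence, not productivity, which is exactly the caveat above.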
Perhaps the plaintiffs used some sort of web-based application to track progress or submit information to a database. The analyst would then have to use tools to build a timeline from the browser’s history and cache, which again assumes this information has not been purged. Functionality like this is not included in basic eDiscovery software packages, though it is provided in many forensic analysis tools.
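As a minimal sketch of what such a timeline looks like under the hood, assuming a Chrome-style “History” SQLite database: Chrome stores visit times as microseconds since January 1, 1601 (the WebKit epoch), while other browsers use different schemas, epochs, and cache formats, which is why analysts lean on dedicated forensic tools.

```python
import sqlite3
from datetime import datetime, timedelta

WEBKIT_EPOCH = datetime(1601, 1, 1)  # Chrome's visit_time counts microseconds from this date

def visit_timeline(history_path: str) -> list[tuple[datetime, str]]:
    """Return (timestamp, url) pairs from a Chrome History file, oldest first."""
    con = sqlite3.connect(history_path)
    rows = con.execute(
        "SELECT urls.url, visits.visit_time "
        "FROM visits JOIN urls ON urls.id = visits.url "
        "ORDER BY visits.visit_time"
    ).fetchall()
    con.close()
    return [(WEBKIT_EPOCH + timedelta(microseconds=t), url) for url, t in rows]

# "History" is a placeholder path to a copy of the browser's history database.
for when, url in visit_timeline("History"):
    print(when.isoformat(sep=" "), url)
```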
A matter such as this seems to blur the lines between eDiscovery and forensic investigation. Automating eDiscovery with predictive coding, or technology-assisted review (TAR), is a great step forward in creating efficiencies. However, the need for skilled analysis and human intervention will remain. The questions for the reader: should KPMG be required to produce 2,500 hard drives, and how much analysis is reasonable?
eDiscoveryJournal Contributor – Greg Harris