Migrated from eDJGroupInc.com. Author: Greg Buckles. Published: 2010-06-07 11:08:35

I have always contended that estimates of the true volume and cost of eDiscovery compliance resemble the proverbial iceberg. The Socha-Gelbmann Survey, Gartner Magic Quadrant and Forrester Wave deal only with the small minority of matters that rise above the waterline, because their size or risk forced the parties to treat them with due diligence. The recent run of judicial sanctions and case law has focused entirely on preservation and search criteria issues, but it has raised corporate awareness of the difficulties associated with desktop preservation and collection. I have seen this awareness translate into corporate clients exploring their options and, in some cases, conducting RFP exercises in search of a solution.

Software providers are actually ahead of this wave for once, and we have seen a crop of new offerings targeting desktop preservation and collection. The sheer scope of the potential BP civil and regulatory discovery has raised the question, “What would we do?” Having handled enterprise-wide civil and criminal catastrophes during the Enron fallout, I only wish I had had access to some of these new technologies back in 2000-2005. They would have made my life, and my vendor bills, a lot easier to deal with.

Let’s start by defining the basic methods before we dive into specific solutions. At eDJ, we try to provide information and perspective without advocating any specific solution or technology, in recognition that your discovery demands and environment are unique. eDiscovery is not a one-size-fits-all world.
- Self Collection – The custodians themselves are responsible for identifying, preserving and even collecting/copying potentially responsive ESI. This method dominates corporate preservation practice despite ample evidence that most custodians do not have the time, tools or knowledge to carry out these responsibilities.
- Legal Manual Collection – Counsel interviews custodians and works with them to identify potential ESI, which IT or a specialist then collects. This is much more defensible, but mainly because the courts do not want to second-guess an attorney's onsite judgment calls. It increases the level of diligence, but still cannot identify unknown or inaccessible ESI collections. Savvy counsel will have a checklist and push custodians to look in places they would otherwise have ignored, but they will still miss large portions of potential ESI. However, having an attorney on hand is pretty much the only way to defensibly exclude entire folders during onsite inspection.
- Forensic Imaging – A full forensic image preserves everything on the custodian's desktop, but at a high price in time, cost and volume. Grabbing everything dramatically increases the volume of ESI that must be preserved and processed. Even if you have technology to filter out system files, you are really just deferring the relevance decision to a later stage.
- Local Crawl – A crawl search is a live search of every file or email on the desktop to identify which items match your relevance criteria. Even a check of names, dates, owners or file types can take hours on a large desktop unless you are willing to interrupt user activities. A background crawl with full-text search criteria on a 100 GB laptop can take 10+ hours, and most crawl search systems do not handle a wide variety of file types. If your relevance criteria change, you will have to do this all over again, and you had better hope that your system knows how to handle differential crawls.
- Enterprise Crawl – Guidance Software pioneered this concept back in 2002-2003 and ran into the challenges of relying on the local processor and of limited network connectivity and bandwidth. The method centralizes administration and reporting, with a corresponding reduction in the manpower required. Corporations that minimize local ESI and collect broadly in a high-connectivity environment can benefit the most from this method. It is not for everyone, but some corporations have definitely been able to realize a return on their investment.
- Local Index – This method relies on the local desktop to proactively index ESI and execute preservation/collection searches. It is an example of distributed processing, but it requires an enterprise-wide rollout to minimize user impact and maximize business benefits for users. The main challenge is that most desktop search engines, such as Windows Desktop Search, were not designed to meet discovery accuracy requirements. In simple terms, they tend to focus on common Office file types and do not track unindexed or inaccessible ESI. X1, dtSearch and Isys have tried to bring more accurate engines to the market. Local indexes tend to minimize index lag, but rarely offer anything more than simple Boolean search functionality.
- Enterprise Index – Centralized, server-driven indexes promise enterprise-wide search. Poor connectivity, slow storage, highly mobile decision makers and the radical growth of corporate ESI have kept this promise from becoming reality for most corporations. These corporations have the illusion of live enterprise search, but only as long as they do not look too closely at the search results. Discovery appliances such as StoredIQ and Kazeon have bridged this gap with high-powered servers that index desktops only when they are identified as potentially relevant. Although there is a lag before the index becomes searchable, subsequent refinements of the search criteria do not require a full crawl. Index lag is still an issue, but the limited set of targets enables administrators to force index updates and have a higher level of confidence.
- DR/Archive Index – This method relies on a disaster recovery (DR) backup or archiving solution that keeps copies of all desktop ESI on the network. The only gap is ESI created since the laptop was last connected to the network. It removes the indexing burden from the local desktops, but it does represent a substantial enterprise investment.
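The practical difference between the crawl and index methods above is when the expensive work happens. A crawl re-reads every item each time the criteria change. An index is built once and answers refined criteria without touching the files again, but it should also report the items it could not index. The Python sketch below is a toy illustration under simplifying assumptions: plain-text files, a naive word tokenizer, and AND-only queries. The function names are hypothetical, and the sketch does not reflect how any product named above actually works.

```python
import os
import re
from collections import defaultdict

# Hypothetical tokenizer: lower-case alphanumeric words only. Real engines also
# parse many file formats and support phrases, stemming, proximity, and so on.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return set(TOKEN.findall(text.lower()))


def read_desktop(root):
    """Walk a directory tree and return {path: text}; unreadable items map to None."""
    docs = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    docs[path] = fh.read()
            except (OSError, UnicodeDecodeError):
                docs[path] = None  # binary, encrypted, locked, etc.
    return docs


def crawl_search(docs, terms):
    """Crawl: every document is re-tokenized for every new set of criteria.
    terms: non-empty set of lower-case words, all of which must appear."""
    return {path for path, text in docs.items()
            if text is not None and terms <= tokenize(text)}


def build_index(docs):
    """Index once: term -> set of paths, plus the items the index could not read."""
    index = defaultdict(set)
    exceptions = []
    for path, text in docs.items():
        if text is None:
            exceptions.append(path)  # report these rather than silently skip them
            continue
        for term in tokenize(text):
            index[term].add(path)
    return index, exceptions


def index_search(index, terms):
    """Answer AND queries from the index without touching the documents again.
    terms: non-empty set of lower-case words."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    # Toy "desktop": None stands in for a file the tools could not read.
    docs = {"memo.txt": "Rig inspection memo", "menu.txt": "Lunch menu", "mail.pst": None}
    index, exceptions = build_index(docs)
    print(crawl_search(docs, {"rig"}))           # {'memo.txt'}
    print(index_search(index, {"rig", "memo"}))  # {'memo.txt'}
    print(exceptions)                            # ['mail.pst']
```

In practice, `read_desktop(root)` would feed both approaches. The `exceptions` list corresponds to the unindexed or inaccessible ESI that, as noted above, many desktop search engines fail to track.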
The table below is just a rough comparison based on typical corporate desktops and general methods. Specific technologies will differ from these generic characteristics and should be evaluated against your requirements.
Method | Cost per Custodian | Time | Risk | User Impact | IT Impact | Size¹ | Accuracy |
Self Collection | Zero | <1 hr | High | High | Zero | 0.1 GB | Low |
Legal Manual Collection | $300-500 | 1-2 hrs | Med | Low | Zero | 2.5 GB | Med |
Forensic Imaging | $500-750 | 3-5 hrs | Low | Low | Zero | 100 GB | Over-inclusive |
Local Crawl | $100-500 | 4-10 hrs | Med | Med | Low | 5 GB | Med |
Enterprise Crawl | $25-100² | 5-15 hrs | Med | Low | High | 5 GB | Med |
Local Index | $50-250 | Zero | Med | Low | Low | 5 GB | High |
Enterprise Index | $50-100² | Zero | Med | Low | Med | 5 GB | High |
DR or Archive Index | $25-100² | Zero | Low | Low | Med | 5 GB | High |
¹ Based on a theoretical 100 GB desktop/laptop drive. Estimates may vary for your data profile.
² Cost per custodian based on the overall investment in an enterprise-wide license.
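To put these figures in perspective, the arithmetic below works through two of them. The first line derives the throughput implied by a 10-hour full-text crawl of a 100 GB drive, assuming decimal units. The second shows how a per-custodian cost falls out of an enterprise-wide license. The $100,000 investment and 2,000 custodians are hypothetical round numbers chosen only for illustration, not figures from any vendor.

```latex
\[
\text{crawl throughput} = \frac{100\ \text{GB}}{10\ \text{h}} = 10\ \text{GB/h}
  = \frac{100{,}000\ \text{MB}}{36{,}000\ \text{s}} \approx 2.8\ \text{MB/s}
\]
\[
c_{\text{custodian}} = \frac{I_{\text{license}}}{N_{\text{custodians}}}
  = \frac{\$100{,}000}{2{,}000} = \$50
\]
```

Because the investment is spread across every custodian covered, doubling the number of custodians halves the per-custodian figure. This helps explain why the enterprise methods can show lower per-custodian costs than the desktop-by-desktop methods.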
This gives us a starting point for diving into specific products in the following journal entries. For each product, we can explore the advantages and potential issues of its general methodology and how that product addresses them.