Migrated from eDJGroupInc.com. Author: Greg Buckles. Published: 2013-05-16 07:28:57. Format, images and links may no longer function correctly.
Businesses of all sizes are migrating files from unstructured file shares to onsite and cloud-based content collaboration systems at a remarkable rate. Microsoft’s SharePoint 2010 and 2013 are finally seeing rapid adoption, and eDJ’s working analysts have seen increasing inquiries on managing eDiscovery and compliance risks in these new environments. Almost all of these new ESI repositories come with search indexes to support the end user experience and to satisfy new information governance requirements like the 2010 Dodd-Frank Act. We will be publishing a research report on the IT impact of the new ‘corporate transparency’ mandates shortly, but I wanted to explore the risks and benefits of leveraging the ‘in-place’ search indexes.
The Problem
For too long, the immature eDiscovery lifecycle has required collection and processing before search could be leveraged for most of the market. Enterprise archives from companies like Symantec, EMC, HP-Autonomy and CommVault brought the first functional ‘in-place’ discovery search to selective ESI sources, but they were still primarily repositories of inactive content or compliance copies. Next came eDiscovery ECA appliances such as Clearwell, StoredIQ, Recommind and Octane that connected directly to enterprise sources. Although these systems can index content in place, their primary use case has been to broadly inventory content and then collect/index based on custodians, dates and other file properties. We have seen this same hesitancy to use search terms for collection in the wild from customers with crawl-based search systems (think one-time search with no index) such as EnCase eDiscovery, AD eDiscovery and the Nuix Collection Center.
Unstructured to Searchable – the SharePoint Sea Change
With content migrating from unstructured, inaccessible file shares and PST files to searchable active systems, CIOs and IT directors are asking corporate legal, “Why do we need these appliances and software? It’s all just search, so why can’t you use what we have?” That’s where we have been brought in to test the systems (e.g. Exchange 2010) and to explain to IT that most were not built to meet discovery requirements. Microsoft, Google, Symantec and the other giants struggling to host corporate assets have made a lot of progress, and I am looking forward to putting their latest offerings through the same wringer. The reality is that their indexes may be ‘good enough’ for discovery search of ESI in place. So does that threaten the eDiscovery software market?
Connectors and Federated Search
For now, we expect customers with significant litigation profiles will continue to demand mature, full-featured products specifically built for eDiscovery workflows. That has providers scrambling to create connectors to these new, smarter sources that leverage their indexes to give corporate Legal a single portal. There are many advantages to leaving all the hard work of extracting text and updating indexes to another system while you own the customer interaction. There is an excellent plain-language definition of the federated search options here. From the user perspective, you type in one search and it brings back a unified list of results (generally). The potential problem occurs when this ‘one search fits all’ approach is accepted without understanding how your search criteria are interpreted by each different engine and applied to potentially different fields. Example questions:
- What fields does each system use to determine ownership or date created?
- Do all of the systems support the same Boolean search functions?
- How are duplicate results handled from different systems? (yes, copies happen)
- How are unsearchable file types handled or reported? (audio, images)
- Are there different limits to the number of terms, names or sources?
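To make those questions concrete, here is a minimal Python sketch of a federated search that keeps a per-engine audit trail instead of returning only a unified list. Everything in it is an assumption for illustration: the `Engine` adapter, its `owner_field`, `operators` and `max_terms` attributes, and the sample paths are hypothetical and do not correspond to any real product API. It only models AND and OR queries and treats byte-identical content as a duplicate.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Hit:
    source: str      # which engine returned the hit
    path: str        # location of the item in that source
    content: bytes   # raw content, used here only for de-duplication


@dataclass
class Engine:
    """Hypothetical adapter for one in-place index (not a real product API)."""
    name: str
    owner_field: str   # field this engine treats as "owner"
    operators: set     # Boolean operators the engine supports
    max_terms: int     # limit on terms per query
    items: list = field(default_factory=list)  # (path, content, searchable)

    def search(self, terms, operator):
        if operator not in self.operators:
            raise ValueError(f"{self.name} does not support {operator}")
        if len(terms) > self.max_terms:
            raise ValueError(f"{self.name} accepts at most {self.max_terms} terms")
        hits, skipped = [], []
        for path, content, searchable in self.items:
            if not searchable:
                skipped.append(path)  # e.g. audio or images with no text
                continue
            text = content.decode("utf-8", errors="ignore").lower()
            found = [term.lower() in text for term in terms]
            # This sketch models only AND and OR.
            if all(found) if operator == "AND" else any(found):
                hits.append(Hit(self.name, path, content))
        return hits, skipped


def federated_search(engines, terms, operator="AND"):
    """Run one query against every engine and report what each one did."""
    merged, seen, report = [], set(), {}
    for engine in engines:
        try:
            hits, skipped = engine.search(terms, operator)
        except ValueError as err:
            # Surface unsupported queries instead of silently skipping a source.
            report[engine.name] = {"error": str(err)}
            continue
        duplicates = 0
        for hit in hits:
            digest = hashlib.sha256(hit.content).hexdigest()
            if digest in seen:
                duplicates += 1
            else:
                seen.add(digest)
                merged.append(hit)
        report[engine.name] = {
            "owner_field": engine.owner_field,
            "hits": len(hits),
            "duplicates": duplicates,
            "unsearchable": skipped,
        }
    return merged, report


sharepoint = Engine("sharepoint", "Author", {"AND", "OR"}, 10, [
    ("/sites/legal/memo.docx", b"merger memo draft", True),
    ("/sites/legal/call.wav", b"\x00\x01", False),
])
fileshare = Engine("fileshare", "Owner", {"AND"}, 5, [
    ("/legal/memo.docx", b"merger memo draft", True),
])
hits, report = federated_search([sharepoint, fileshare], ["merger", "memo"])
```

The point of the `report` dictionary is that the person running the search can see, per source, which field stood in for ownership, how many copies were collapsed, which items were never searched, and which sources rejected the query outright. A real connector would need far more than this, but the unified list alone answers none of the questions above.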
I don’t see anything intrinsically wrong with the federated approach. The idea does seem a lot more attainable than the ‘universal’ index concept that has sold a lot of HP-Autonomy products. With so much content flowing into SharePoint and cloud-based content management systems, I expect a steady stream of press releases announcing a wide variety of integration methods. X1’s monthly eDJ briefing covered the new SharePoint integration in their X1 Search released last week. The big question is whether the SharePoint connector and search are so simple that an unsophisticated user could assume that the results were based on the X1 index instead of the SharePoint indexes. As an industry, the eDiscovery market has a bad tendency to offer up ‘Easy Button’ solutions that have the potential to gloss over issues that can impact discovery productions. We are seeing it in the glut of new predictive coding offerings and I expect that federated search strategies will bring their own share of hidden gotchas. Don’t let these issues stop you from embracing innovation. Just take my hard-earned advice and spend the time to both understand how they work and validate them.
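One simple way to approach that validation, sketched here in Python under stated assumptions, is to seed each source with a small control set of documents whose expected hits are known in advance, run the same search through the connector, and compare. The `validate_search` helper and the document IDs are hypothetical, and this is only one possible check, not a complete validation protocol.

```python
def validate_search(expected_ids, returned_ids):
    """Compare a connector's results against a seeded control set."""
    expected, returned = set(expected_ids), set(returned_ids)
    missed = sorted(expected - returned)       # known hits the search did not find
    unexpected = sorted(returned - expected)   # hits outside the control set
    recall = 1.0 if not expected else len(expected & returned) / len(expected)
    return {"recall": recall, "missed": missed, "unexpected": unexpected}


# Hypothetical control set: four seeded documents that should match the query.
expected = ["doc-1", "doc-2", "doc-3", "doc-4"]
returned = ["doc-1", "doc-2", "doc-4", "doc-9"]
result = validate_search(expected, returned)
# result["recall"] is 0.75, result["missed"] is ["doc-3"],
# and result["unexpected"] is ["doc-9"]
```

A missed control document is the signal to go back and ask which field, operator or file type the underlying index handled differently than you assumed.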
Greg Buckles can be reached at Greg@eDJGroupInc for offline comments or questions. His active research topics include mobile device discovery, the discovery impact of the cloud and Microsoft’s 2013 eDiscovery Center. Recent consulting engagements include managing preservation during enterprise migrations, legacy tape eliminations, retention enablement and many more.