Now that my full research paper is complete and available, I wanted to share some highlights. Just yesterday another GC client told me that their IT department wanted to rely on their upcoming upgrade to Exchange 2010 to respond to discovery requests. Microsoft has added some good new features, but I would not want to try to defend their use against any kind of adverse scrutiny. So let’s talk about the new ‘Discovery Search’ interface. First and foremost, this is basically the old administrative multi-mailbox search within the Outlook Web App. The search name and criteria are written to a database table along with the user, date, size estimate and some keyword statistics. The last is a good feature that was undoubtedly driven by a customer request to support keyword negotiations. Here is a look at the landing page:
Getting to this Discovery Search page is not intuitive. You will need to navigate through several Options pages to ‘Manage My Organization’ to find it. Once there, you can create searches with keywords, email addresses and date ranges on all or specific user mailboxes. There is no way to organize or secure your searches by matter, so a consistent naming convention will be critical. A list of raw searches will quickly become an unwieldy mess for any company with a normal litigation load. The good news is that you can run a search just to get a hit count, and it will even break the count down by search term (assuming that your search terms have not been corrupted). The bad news is that your only other real option is to restore the results to the ‘Discovery Mailbox’. Since Exchange 2010 broke its single instance storage (SIS) architecture, every restored copy takes up its own space, so you must either export manually on a continual basis or risk quickly overrunning the 2 TB recommended maximum mailbox size (assuming that you have the optimal Exchange environment).
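Because the interface gives you no matter-level organization, the search name has to carry that information itself. Here is a minimal sketch of one way to build and parse such names consistently. The format (matter ID, short description, date and version joined by double underscores) and the function names `build_search_name` and `parse_search_name` are my own assumptions for illustration; nothing here is an Exchange feature or API.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not an Exchange rule):
#   <MATTER-ID>__<short-description>__<YYYY-MM-DD>__v<NN>
_NAME_RE = re.compile(
    r"^(?P<matter>[A-Z0-9-]+)__(?P<desc>[a-z0-9-]+)"
    r"__(?P<day>\d{4}-\d{2}-\d{2})__v(?P<ver>\d{2})$"
)


def build_search_name(matter, description, day, version):
    """Build a search name that sorts and groups by matter."""
    # Reduce the free-text description to a lowercase, hyphenated slug.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{matter.upper()}__{slug}__{day.isoformat()}__v{version:02d}"


def parse_search_name(name):
    """Split a name back into its parts; reject names that break the convention."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"search name does not follow the convention: {name!r}")
    return {
        "matter": match.group("matter"),
        "description": match.group("desc"),
        "day": date.fromisoformat(match.group("day")),
        "version": int(match.group("ver")),
    }
```

Rejecting non-conforming names at parse time is the point of the exercise: a stray ad hoc search name is exactly what turns the list into an unwieldy mess.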
Exchange and Outlook are not designed to handle potential evidence. I have seen the damage that can occur when an eager legal-eagle attaches the critical PST and starts sorting through email. Just moving the items into a new mailbox can change dates, owner and IDs. In our tests, restored search results had changed Creator, Last Modified By, PR_CREATION_TIME and other properties. This is no way to handle evidence. Just touching an email can change the Read/Unread status, but you don’t need to worry about that, as we experienced odd Read/Unread behavior on restored results anyway.
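If you want to check this kind of drift yourself, the comparison itself is simple once you have the properties in hand. The sketch below assumes that some other tool has already exported each item's properties into a plain dictionary before and after the restore; the function name `diff_properties` is hypothetical and the code does not read from Exchange or a PST.

```python
def diff_properties(before, after):
    """Return {property: (old, new)} for every property that differs.

    `before` and `after` are plain dicts of property name -> value,
    assumed to have been exported by some other tool. A property that
    is missing on one side is reported with None on that side.
    """
    changed = {}
    for key in sorted(set(before) | set(after)):
        old = before.get(key)
        new = after.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed
```

Run against the original item and its restored copy, an empty result means that none of the captured properties changed. Anything else is a change you would need to be able to explain.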
Although Microsoft broke SIS in Exchange 2010, they added an email deduplication feature for restoring your search results. Unfortunately, deduplicated results are dumped into a single folder with no ability to reconstruct their source or folder location, or to account for the theoretical duplicates that were discarded. I have worked with the EDRM Enron Data Set Ver. 1 while testing a wide variety of archiving and discovery software. The early tests seemed to show much heavier deduplication of search results than expected. The EDRM Enron email has a large number of text artifacts/defects that have turned emails that were exact duplicates into ‘near duplicates’. To test what Exchange categorizes as an exact duplicate, a PST was created using a set of 8 original emails. Copies of these emails were placed in folders and altered or acted upon in ways that would result in changes to MAPI properties and content. This PST was ingested, searched and then restored with deduplication enabled. The deduplication function suppressed 30% of the altered emails that were arguably unique, and it seemed to randomly reset the Read/Unread status. It ignored any forward or reply information, although the detail log report did retain category and action flags that were stripped from the results.
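To show why the definition of ‘exact duplicate’ matters so much, here is a minimal sketch of hash-based deduplication in which the caller chooses which fields count toward the fingerprint. This is not how Exchange 2010 decides what is a duplicate (we could only observe its results). The field names, the message dictionaries and the functions `fingerprint` and `deduplicate` are all assumptions for illustration.

```python
import hashlib


def fingerprint(message, fields):
    """Hash only the chosen fields; anything left out is invisible to dedup."""
    digest = hashlib.sha256()
    for name in fields:
        value = str(message.get(name, ""))
        # Separate names and values with NUL bytes so that adjacent
        # fields cannot run together and collide.
        digest.update(name.encode("utf-8") + b"\x00")
        digest.update(value.encode("utf-8") + b"\x00")
    return digest.hexdigest()


def deduplicate(messages, fields):
    """Keep the first message per fingerprint and log every suppressed copy."""
    seen = {}
    kept = []
    suppressed = []  # pairs of (discarded message, the message that was kept)
    for message in messages:
        key = fingerprint(message, fields)
        if key in seen:
            suppressed.append((message, seen[key]))
        else:
            seen[key] = message
            kept.append(message)
    return kept, suppressed


# Illustrative data only: the same email filed twice, once with a category.
original = {"sender": "a@example.com", "subject": "Q3", "body": "See attached.",
            "folder": "Inbox", "category": ""}
flagged = dict(original, folder="Legal Hold", category="Privileged")

narrow = ["sender", "subject", "body"]
wide = narrow + ["category"]
```

With the `narrow` field list the flagged copy is suppressed as a duplicate. With `wide` both copies survive. Keeping the `suppressed` pairs, folder included, is what would let you account for what was discarded and where it came from.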
I could and did go on at length (85 pages to be exact) about everything we found. Technology that is perfectly appropriate for ordinary business usage is just not built to meet the requirements of civil discovery. We submitted our draft findings to Microsoft and got some good feedback on changes in the final SP1 release. We incorporated their feedback into the report, even when we could not find any documentation or online confirmation to back up their assertions. Exchange 2010 is a definite step in the right direction, but we did not find it to be a suitable replacement for existing, mature archiving and discovery technology.