A question to the eDiscovery Facebook group asked how others were handling the unindexed items reported in M365 core and AED searches. If you have not run M365 eDiscovery searches yet or have not noticed the Status section of the search detail page, it provides the item count and volume of ‘unsearchable items’ in the sources that you search. In my recent legal hold validation testing, 1,000 items (1.76 GB) were flagged as ‘unindexable’ in the eDJ Group M365 test environment I was searching. Microsoft assures us, “most organizations customers have less than 1% of content by volume and less than 12% of content by size that is partially indexed”. Those numbers are a bit low for the large enterprise instances I have done validation testing on, but probably a good overall average. So what do you do about the target content that you cannot search?
First, remember that this only matters if you are using search terms/keywords. Many M365 customers and eDiscovery service providers have elected to just ‘pump and dump’ custodial mailboxes, OneDrives and SharePoint sites to Relativity, Logikcull and other platforms with collection integrations instead of trusting the evolving M365 search infrastructure. Have they actually tested the search infrastructure of their chosen eDiscovery platform? If they had, they would quickly find that indexing exceptions are common in every large, diverse ESI collection.
‘But my provider never told me that!’
Check the details in your processing reports. You should find the exception metrics listing corrupt, encrypted, password protected and other problem items. A good PM will have sent that to your point person for approval or instructions on remediation. Assuming that you are using keywords, clustering, TAR or other text content-based relevance determination approaches, you will have exceptions.
Second, you can export a report of the unindexable items for analysis. This will help you understand why M365 flagged these items and what kinds of file types are affected. I dump that report straight into SQL or Access so that I can group by error code and file extension. You should discover that the vast majority of ‘errors’ are embedded images or objects within files flagged as ‘partially unindexable’. As my friend Rob Robinson pointed out, if you are using Advanced eDiscovery (AED) for custodians, the system will attempt to reindex file types that are not indexed by default. If you export a small set of these items (I use a narrow date range and the ‘export only unindexable items’ option) you should find that very few of these items contain any actual text. This exercise is just my way of understanding the true scope and nature of these flagged items. It takes some work to produce a report analyzing the unique composition of your unindexable content, but that report and its exemplars support your counsel’s informed decision as to when and what remediation is required to make a reasonable effort in a given matter.
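If you do not have SQL or Access handy, the same group-by can be roughed out in a few lines of Python. This is only a minimal sketch of the idea, not the format of any particular export: the column names (`Original Path`, `Error Tags`), the semicolon separator between tags, and the tag values in the sample rows are all placeholders that I made up for illustration, so swap in whatever headers and codes your own report actually contains.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath


def summarize_unindexed(report_file, error_col="Error Tags", path_col="Original Path"):
    """Count report rows by (error tag, lower-cased file extension).

    The column names are assumptions; pass the headers from your own report.
    """
    counts = Counter()
    for row in csv.DictReader(report_file):
        # Assumption: a row may carry several tags separated by semicolons.
        raw_tags = row.get(error_col) or ""
        tags = [t.strip() for t in raw_tags.split(";") if t.strip()]
        # Normalize Windows-style separators so the extension parses cleanly.
        path = (row.get(path_col) or "").replace("\\", "/")
        ext = PurePosixPath(path).suffix.lower() or "(none)"
        for tag in tags or ["(no tag)"]:
            counts[(tag, ext)] += 1
    return counts


# Hypothetical sample rows standing in for an exported report.
sample = io.StringIO(
    "Original Path,Error Tags\n"
    "docs/budget.xlsx,encrypted\n"
    "docs/board.XLSX,encrypted\n"
    "mail/scan.pdf,image_only;too_large\n"
    "mail/noext,image_only\n"
)

for (tag, ext), n in summarize_unindexed(sample).most_common():
    print(f"{tag:12} {ext:8} {n}")
```

To run it against a real report, open the file with `open(path, newline="", encoding="utf-8-sig")` and pass the handle in place of `sample`. Sorting by count puts the largest error/extension combinations at the top, which is usually where the embedded-image noise shows up.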
Third, some issues/errors may in fact impact search accuracy/completeness. Here is a handy list:
- Too many files attached to an email message.
- A file attached to an email message is too large.
- The file type is supported for indexing but an indexing error occurred for a specific file.
- Encrypted or password protected items.
- Too many attachments.
- Text exceeded the word breaker limit.
The key here is to understand whether the relevant email or documents are likely to fall into any of these categories. M365 Advanced eDiscovery (AED) has an exception remediation process to download all the passworded files for you to unlock/decrypt and reprocess. Executives may well have passworded sensitive corporate documents that might be relevant to a board dispute. Now you know how to scope the size and nature of potential search exceptions.
Fourth, I recommend drafting a plain-language, standardized disclosure covering these and other known technical exceptions for the requesting party. This shows your good faith and gives them the opportunity to request exemplars or other reasonable accommodations if they can make the relevance argument.
Closing out, thank you for the inquiries last week when we were off diving with the sea turtles. It is nice to know that readers noticed the break in blogs. I am slowly processing my pictures and feeding them to my Instagram account. Back to work!
Greg Buckles wants your feedback, questions or project inquiries at Greg@eDJGroupInc.com. Contact him directly for a free 15 minute ‘Good Karma’ call. He solves problems and creates eDiscovery solutions for enterprise and law firm clients.
Greg’s blog perspectives are personal opinions and should not be interpreted as a professional judgment or advice. Greg is no longer a journalist and all perspectives are based on best public information. Blog content is neither approved nor reviewed by any providers prior to being posted. Do you want to share your own perspective? Greg is looking for practical, professional informative perspectives free of marketing fluff, hidden agendas or personal/product bias. Outside blogs will clearly indicate the author, company and any relevant affiliations.