Increasing numbers of large enterprise customers are evaluating or using Microsoft Purview’s Records Management functionality, either directly or through integrated partner #InfoGov platforms like Gimmal or Veritas. Recent engagements have had me elbow deep in all the new Purview information governance features, so I thought I should write a high-level summary of how some of these features might impact downstream eDiscovery, to help you add value and avoid some potential gotchas.
Start by reviewing the existing policies, protocols, employee training materials and other documentation that show how the company thinks records should be declared, preserved and disposed of. Next, discover what (if any) records management settings, features, policies, scripts, etc. have been implemented within the Microsoft 365 tenant. The last step is to examine the metadata/properties of a designated record file/email, both in place and after collecting a copy. The results of these steps rarely match up with each other.
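The last step boils down to a field-by-field comparison of two metadata snapshots. Here is a minimal Python sketch, assuming you can export both the in-place and the collected properties as simple name/value pairs. The field names and values are hypothetical examples, not actual Purview property names.

```python
# Minimal sketch: compare a record's metadata snapshot taken in place with the
# snapshot from the collected copy. Field names and values are hypothetical.
from typing import Any, Dict, List, Tuple


def diff_metadata(
    in_place: Dict[str, Any], collected: Dict[str, Any]
) -> List[Tuple[str, Any, Any]]:
    """Return (field, in_place_value, collected_value) for every field that
    is missing from one snapshot or differs between the two."""
    differences = []
    for field in sorted(set(in_place) | set(collected)):
        before, after = in_place.get(field), collected.get(field)
        if before != after:
            differences.append((field, before, after))
    return differences


# Hypothetical example: a declared record in SharePoint vs. its exported copy.
in_place = {
    "RetentionLabel": "Contracts-7yr",
    "IsRecord": True,
    "Modified": "2023-03-01T14:22:00Z",
    "Author": "jdoe",
}
collected = {
    "RetentionLabel": None,
    "Modified": "2023-06-15T09:00:00Z",
    "Author": "jdoe",
}

for field, before, after in diff_metadata(in_place, collected):
    print(f"{field}: {before!r} -> {after!r}")
```

Every difference is a question for IT or the collection vendor: was the label stripped on export, or was it never applied in the first place?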
Retention Modules/Components/Elements Primer:
- File plan – List of retention policies, retention periods and rules.
- Retention Policy – Location-based data management.
- Retention Label – Item-level expiry management; the label resides in metadata. Only ONE retention label per item.
- Record Label – A label version that adds change/version controls and creates preservation copies in the ‘dumpster’.
- Regulatory Label – A manually applied label version that locks the item against ALL changes (even by Admins) for the retention period. Per MSFT docs, ‘use with caution’.
- Policy/Label Scopes – These determine where the policies/labels are applied. Worth understanding in the context of your matter scope. A policy can use only one scope type (static or adaptive).
  - Static Scopes – Location-based. Think of these as default retention settings by department or site.
  - Adaptive Scopes – Query-based on groups, keywords and properties.
- Sensitivity Labels – A limited number (5) of labels that are applied manually or by rule to trigger data protection policies. Only one per item, stored in plain-text metadata. Examples: Privileged, Confidential, Trade Secret, PII. You should be taking advantage of these if the corporation has invested the heavy implementation effort to create and tune custom classifiers for them.
- Classifiers – Before you get too excited, I rarely find these being implemented or maintained outside of heavily regulated verticals (and usually through partner platforms). Classifiers generally only apply to new items, though I have seen clever IT teams ‘migrate’ sites to reclassify content.
  - Sensitive Info Types –
    - 265 built-in – Pattern recognizers that must be associated with an active security/compliance policy to start finding SSNs, bank account numbers and similar GREP-style patterns. MSFT owns these and they cannot be copied or tuned. However, the named entity type recognizers can be useful.
    - Custom Info Types – You can create your own pattern classifiers for standard contract numbers, matter numbers, etc.
    - Exact Data Matches/Keywords – As you would expect.
  - Trainable Classifiers – Some kind of AI/cluster/dictionary-based data models that can be used to apply policies/labels. My bet is that this is Equivio behind the veil, but I have never been able to get that confirmed.
    - Built-in – 17 English classifiers detecting profanity, HR, discrimination and other common content types. MSFT owns these, so test before use.
    - Custom – Essentially, you create a data model from a known data set, then train the model with mixed data until you feel it is ready to start finding new items. Only the model creator can update/tune the model. Beyond my own testing, I have never seen these leveraged by clients, for many reasons. If you have, message me!
- Purview Compliance Modules that may impact your eDiscovery – Ask whether users have E3 or E5 licenses, because most of these features are not available to E3 users. The functionality of the Compliance modules overlaps and interacts, so you have to review them all to understand how retention and records are managed.
  - Records management – File plan, label policies, policy lookup, scopes, events and disposition. A good starting place to understand the retention context of your potential collections, especially event triggers that may have created gaps.
  - Data lifecycle management – Similar to above, but focused on data management rather than records. Huge overlap.
  - Information protection – Action policies based on sensitivity labels that can encrypt files (yep, processing impact!), add content marking (which can help or hurt search criteria) and control site access (useful to understand in bad actor/breach investigations).
  - Data classification – Trainable classifiers, sensitive info types, exact data matches, content explorer and activity explorer. If you can get access to content explorer, it can help you evaluate the value of sensitive info or labels without having to collect or run reports. Activity explorer is basically an activity log tool.
  - Communication compliance – Email policy violation monitoring, alerts and reports. I have used these for HR, harassment, EEOC and similar matters/investigations. I encourage clients to connect the profanity, offensive language, discrimination, harassment and threat classifiers to communication policies to monitor culture and show reasonable diligence. Microsoft is adding a ‘Leavers’ classifier to public preview…
  - Data loss prevention – Classifying and protecting content on local Windows/Mac devices. This may help you understand what could or could not exist on custodial machines prior to collection, or when arguing scope.
  - eDiscovery – I am going to save a deep dive for another day. This is where you can determine users/locations/scope under hold and the starting dates of in-place preservation. Remember that M365 legal holds do NOT prevent custodians from changing or deleting active email or files. Instead, silent copies are placed in the ‘dumpsters’ and can only be retrieved via eDiscovery or Content search. You need to know about holds so that you can manage the preservation copies, explain them to counsel and minimize their impact. I have seen perpetual custodians with ‘Recoverable Items Folders’ that exceed 50 GB, so this can have a major impact.
  - Priva privacy risk management/DSAR – A relatively new module that identifies personal data. Frankly, I have not explored/tested it yet, but if your matters have privacy issues it may help. Shoot me any testing write-ups you find.
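If you want to prototype a custom info type for matter or contract numbers before building it in Purview, the Python sketch below loosely mimics the structure Purview custom sensitive info types use: a primary pattern, supporting keywords and a proximity window that together raise confidence. The matter-number format, keywords and 300-character window are hypothetical, and this is a local approximation, not how Purview actually evaluates content.

```python
# Minimal sketch: a local approximation of a custom sensitive info type.
# The "MTR-YYYY-NNNNN" format, keyword list and window size are hypothetical.
import re
from typing import List, Tuple

# Primary element: hypothetical matter-number pattern.
PRIMARY = re.compile(r"\bMTR-(?:19|20)\d{2}-\d{5}\b")
# Supporting elements: keywords that corroborate the primary match.
SUPPORTING = re.compile(r"\b(?:matter|litigation|case)\b", re.IGNORECASE)
WINDOW = 300  # characters on either side of the primary match


def find_matter_numbers(text: str) -> List[Tuple[str, str]]:
    """Return (match, confidence) pairs: 'high' when a supporting keyword
    appears within WINDOW characters of the match, otherwise 'low'."""
    results = []
    for m in PRIMARY.finditer(text):
        start = max(0, m.start() - WINDOW)
        end = min(len(text), m.end() + WINDOW)
        confidence = "high" if SUPPORTING.search(text[start:end]) else "low"
        results.append((m.group(), confidence))
    return results


print(find_matter_numbers("Please preserve all files for matter MTR-2023-00417."))
print(find_matter_numbers("Invoice MTR-2023-00417 attached."))
```

Keeping the primary pattern strict and letting keywords adjust confidence is a cheap way to see how many false positives a pattern would generate on sample data before anyone builds a policy around it.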
Preserving Classification Information:
You are mainly looking to extract the labels that have been applied in the item metadata (retention/record/regulatory/sensitivity) during processing. Because sensitive info types and other classifiers are M365 properties rather than embedded metadata like labels, you may have to run special searches via Purview Content Search or eDiscovery to identify those items. Once upon a time we had to run searches by custodian in email archives to retain that information. Every tenant and corporate data profile is unique. I am calling your attention to these properties so that you can determine whether they exist and whether they will be valuable in your downstream culling, review or even production labeling. Articles like this one from John Hodges can give you a good starting point for syntax, since the MSFT docs are always a challenge.
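For Office documents, one place to look during processing is the package’s custom document properties, where Office has historically written sensitivity label details as `MSIP_Label_<GUID>_<Attribute>` entries. The Python sketch below assumes that layout. Newer Office builds may store label data elsewhere in the package, and retention labels are not covered here, so verify against your own collected samples before relying on it.

```python
# Minimal sketch, assuming the legacy MSIP_Label_<GUID>_<Attribute> custom
# document property layout. Newer Office builds may store label data in a
# separate package part that this sketch does not read.
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Union

CUSTOM_PROPS = "docProps/custom.xml"
NS = {"cp": "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"}
LABEL_PROP = re.compile(r"^MSIP_Label_([0-9A-Fa-f-]{36})_(\w+)$")


def read_msip_labels(source: Union[str, BinaryIO]) -> Dict[str, Dict[str, str]]:
    """Return {label_guid: {attribute: value}} from a .docx/.xlsx/.pptx file's
    custom properties, or {} if the package has no custom properties part."""
    with zipfile.ZipFile(source) as package:
        if CUSTOM_PROPS not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CUSTOM_PROPS))
    labels: Dict[str, Dict[str, str]] = {}
    for prop in root.findall("cp:property", NS):
        match = LABEL_PROP.match(prop.get("name", ""))
        if not match:
            continue  # not a sensitivity label property
        guid, attribute = match.groups()
        # Each property wraps one typed value element such as <vt:lpwstr>.
        labels.setdefault(guid, {})[attribute] = "".join(prop.itertext()).strip()
    return labels
```

Usage is `read_msip_labels("collected/contract.docx")` (hypothetical path); the label GUIDs can then be mapped to display names from the tenant’s label list and carried into review as fields.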
Avoiding Retention Related eDiscovery Headaches:
Start by understanding when/if legal holds were applied to your custodians or overlapping scope. Hopefully a historical hold log exists outside of M365 eDiscovery (Basic or Premium). There is no easy M365 report or query feature to check your matter scope against active and released holds. Holds and retention help you understand what may and may not exist in locations or custodian mailboxes. They are also the key to understanding and explaining why you may have multiple copies/versions of items that appear to have been deliberately deleted.
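If you do have a hold log, checking it against your matter window is a simple interval exercise. Here is a minimal Python sketch, assuming a hypothetical log with one row per custodian hold (applied date, optional release date) and whole-day granularity; `HoldEntry` and `coverage_gaps` are illustrative names, not part of any M365 tool.

```python
# Minimal sketch: find date ranges in a matter window that no logged legal hold
# covers, per custodian. HoldEntry is a hypothetical shape for an external
# hold log; dates are treated as whole days, inclusive on both ends.
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class HoldEntry:
    custodian: str
    applied: date
    released: Optional[date] = None  # None = hold still active


def coverage_gaps(
    log: List[HoldEntry], custodians: List[str], start: date, end: date
) -> Dict[str, List[Tuple[date, date]]]:
    one_day = timedelta(days=1)
    gaps: Dict[str, List[Tuple[date, date]]] = {}
    for custodian in custodians:
        # Active holds are treated as running through the end of the window.
        holds = sorted(
            (h.applied, h.released or end) for h in log if h.custodian == custodian
        )
        cursor = start  # first day not yet known to be covered
        found: List[Tuple[date, date]] = []
        for applied, released in holds:
            if released < cursor or applied > end:
                continue  # hold ends before the uncovered span or starts after the window
            if applied > cursor:
                found.append((cursor, applied - one_day))
            cursor = max(cursor, released + one_day)
        if cursor <= end:
            found.append((cursor, end))
        gaps[custodian] = found
    return gaps
```

Any non-empty gap list marks a period where you should not assume in-place preservation for that custodian, and a prompt to look for retention policies or labels that may have covered it instead.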
You need to determine what encryption may exist and why. Not only do you need the keys to the kingdom to decrypt these items, they may not have been properly classified or found in content-based searches. Email encrypted by rule may have had the sender changed to the security service account or owner. Once an item has been encrypted, most M365 indexing/classification services can no longer see its content. I have seen some ugly search exceptions that traced back to protection rules or to bad actors using data sensitivity labels to hide their communications from the filters.
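To spot these issues early in processing, you can scan collected messages for rights-protected payloads and for senders that are known service accounts. A minimal sketch using Python’s standard `email` package, assuming the common convention that rights-protected mail arrives as a `message.rpmsg` attachment; the service-account list is a hypothetical input you would get from the client’s IT team.

```python
# Minimal sketch: flag collected .eml messages that look rights-protected or
# that were sent from a known security/service account. Assumes the common
# convention that protected mail travels as a "message.rpmsg" attachment
# (content type application/x-microsoft-rpmsg-message); the service-account
# list is hypothetical and must come from the client's IT team.
from email import policy
from email.parser import BytesParser
from typing import Iterable, List

RPMSG_CONTENT_TYPE = "application/x-microsoft-rpmsg-message"
RPMSG_FILENAME = "message.rpmsg"


def triage_message(raw: bytes, service_accounts: Iterable[str]) -> List[str]:
    """Return the reasons (possibly none) a message needs manual review."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    reasons: List[str] = []
    for part in msg.walk():
        filename = (part.get_filename() or "").lower()
        if part.get_content_type() == RPMSG_CONTENT_TYPE or filename == RPMSG_FILENAME:
            reasons.append("rights-protected (rpmsg) payload")
            break
    sender = str(msg.get("From", "")).lower()
    if any(account.lower() in sender for account in service_accounts):
        reasons.append("sent from a service account")
    return reasons
```

Messages flagged this way are candidates for decryption before indexing and for a closer look at who the original sender really was.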
M365 regulatory record label controls are hardcore. If they have been enabled and labels applied, be very careful about which collection tools or methods you use. The existence of regulatory records in a SharePoint site or mailbox can prevent the migration or restoration actions used by many tools.
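Before picking a collection method, it helps to know which sites or mailboxes actually contain regulatory records. A minimal sketch, assuming you have already assembled a label inventory as CSV; the column names (`Location`, `ItemPath`, `RetentionLabel`, `IsRegulatoryRecord`) are hypothetical, not a Purview export format.

```python
# Minimal sketch: count regulatory records per location in a label inventory
# you have already assembled. The CSV columns (Location, ItemPath,
# RetentionLabel, IsRegulatoryRecord) are hypothetical.
import csv
from collections import Counter
from typing import Dict, TextIO

TRUTHY = {"true", "yes", "1"}


def regulatory_record_locations(report: TextIO) -> Dict[str, int]:
    """Return {location: number_of_regulatory_records} for locations that
    contain at least one item flagged as a regulatory record."""
    counts: Counter = Counter()
    for row in csv.DictReader(report):
        if (row.get("IsRegulatoryRecord") or "").strip().lower() in TRUTHY:
            counts[row["Location"]] += 1
    return dict(counts)
```

Locations that appear in the result are the ones to steer away from tools that rely on migration or restoration actions.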
Wrap Up:
This was a long piece that barely scratched the surface of M365 retention capabilities and pitfalls. I hope that it gives peers a starting place to avoid the latter and leverage the former. All of this started with an infogov engagement question of “Why not just use Microsoft?” As always, please comment or message me with your questions and insights.
Greg Buckles wants your feedback, questions or project inquiries at Greg@eDJGroupInc.com. Contact him directly for a free 15-minute ‘Good Karma’ call if he has availability. He solves problems and creates eDiscovery solutions for enterprise and law firm clients.
Greg’s blog perspectives are personal opinions and should not be interpreted as a professional judgment or advice. Greg is no longer a journalist and all perspectives are based on best public information. Blog content is neither approved nor reviewed by any providers prior to being published. Do you want to share your own perspective? Greg is looking for practical, professional informative perspectives free of marketing fluff, hidden agendas or personal/product bias. Outside blogs will clearly indicate the author, company and any relevant affiliations.
Want to see my pandemic project? Visit www.Know-Now.io to explore how eDiscovery tools can be applied on your own data to create a personal knowledge management system. Apply for early access while you can!