New Purview eDiscovery Portal
Overview
eDJ: This quick overview of the new Portal UX is meant to support existing Purview eDiscovery users needing to quickly identify changes that may impact their eDiscovery workflows before the classic UX is decommissioned. It includes some open questions that will be updated after discussions with Microsoft or more extensive testing. Please send comments and feedback to share with the community so that we can all minimize risk and maximize the new capabilities.
The new streamlined case page workflow has Searches, Hold policies, Review Sets and Exports tabs. This does eliminate some overlap and duplication, though there is an argument that the holds could have been merged with Searches. Cases are now the main workflow organizing basis instead of custodians.
You must accept the terms of the data flow disclosure and privacy statement to access the new portal UX. Based on a simple read, the eDiscovery cases do not share data to the Purview Data Map, but this has not been confirmed with MSFT.
The Purview portal has relocated and consolidated some features. The big change for legal is the merging of the old Content Search into a core Searches component in eDiscovery. If your IT, Security or other stakeholders used Content Search for retrievals, investigations or other non-legal business functions, you may have to create dedicated cases for them to work from with appropriate security roles.
Portal Settings (new):
These tenant-wide settings are available from the Settings cog on the navigation panel under the solutions settings. They control defaults and can be overridden by Case Settings.
- User experience setting – will you leave the classic UX available to users?
- Analytics – Do you want to use the new privilege detection model? If so, you will need to upload a .csv file with attorney email addresses. There does not appear to be a case-level list option, so the list will be global and will need to be updated for matters with new counsel. The model gives a privilege score field and can be associated with a smart tag for review sets.
- Guest access – Enable external users with a Microsoft account to be added to cases. Requires tenant external user policy.
- Collections – Lots of new options here that can control default location and collection parameters for the tenant. Default allows case settings to override tenant. The most important difference is whether guest mailboxes, inactive mailboxes, group mailboxes (i.e. SP/Teams messages) and shared Teams channels are included on tenant-wide searches. Retrieval settings control whether chat message hits are expanded to 12-hour conversation chunks, and whether cloud attachments (hyperlinked), document versions and partially indexed items are retrieved.
- Tag templates – Set default tag groups and choices for cases. Good option for consistent universal categories such as privilege, confidential, PII, etc.
- The Communication library and Issuing officer apply if you are issuing hold notices via Purview.
- Historical versions (preview) – The feature language seems to imply that after the preview version search and retrieval will require additional licensing/costs.
Case Settings (where different from Portal):
- Case – Primary actions to close or delete a case
- Premium features – you may have to enable this for legacy classic cases.
- Access and Permissions – Can now add individual users or entire role groups. I believe that users need to belong to role groups to control their access permissions.
- Data sources – This controls default sources in tenant wide searches
- Search & analytics – Classic settings for dups/threading, themes. Can set a list of Ignore words for themes and automatically run OCR where needed. Purview OCR services for compliance and overall enterprise require an Azure account and license, but eDiscovery still seems to be running OCR without charge.
- Review sets – Enable grouping in Review Sets where analytics has been run.
Case level changes/decisions:
The first big change is consolidation of tabs to simple search, hold, review and export. It is important to understand that any search now triggers advanced indexing in order to generate the estimates (and has for a while). Once upon a time only searches committed to review were reprocessed for advanced indexing. Adding items to a review set creates a copy in a secure repository that becomes ‘immutable’, so changes to the source after commitment may not be reflected in the review set. A Hold is basically a search scope with a hold policy associated.
Search
Search query now consists of a simplified set of sources and conditions. There is a new ‘Search by file’ option for premium licenses that will be investigated later. The Search subtabs include Query, Statistics and Sample. It is good to see more detailed search result estimates in the queries, but I wish they restored estimates to hold policies.
Sources
Sources are now driven by recognized entities, People and Groups. Each entity generally owns a mailbox and a SharePoint site (OneDrive) to select from. Per Microsoft, you can search for sources by URL, SMTP, alias, display name and even ExchangeGUID. Although they removed the ability to upload a CSV list of custodians/sources, you can search for up to 100 with a comma delimiter (thanks to the PM team, as that was the one delimiter I did not test…). Users with large lists of custodians or groups can leverage PowerShell or use a partner product to avoid this inefficient exercise. The inability to search for familiar fragments of names/terms buried within the full value still poses a challenge. I believe that Purview practitioners need access and tools to navigate and visualize the tenant’s entities. The new functionality IS a dramatic improvement and minor tweaks to the source selector will make it better.
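The comma-delimited source search described above can be prepared outside the portal. What follows is a minimal sketch, assuming the 100-identifier limit reported in the text and that you already have the custodian identifiers (SMTP addresses, aliases and so on) in a local list. The function name and the `contoso.com` addresses are hypothetical and not part of Purview.

```python
from typing import Iterable, List

# Limit taken from the text above; adjust if your own testing shows otherwise.
MAX_SOURCES_PER_SEARCH = 100


def batch_source_identifiers(
    identifiers: Iterable[str], batch_size: int = MAX_SOURCES_PER_SEARCH
) -> List[str]:
    """Return comma-delimited strings holding at most batch_size unique identifiers each."""
    seen = set()
    unique = []
    for raw in identifiers:
        value = raw.strip()
        key = value.lower()  # treat identifiers as case-insensitive when deduplicating
        if value and key not in seen:
            seen.add(key)
            unique.append(value)
    return [
        ",".join(unique[i:i + batch_size])
        for i in range(0, len(unique), batch_size)
    ]


if __name__ == "__main__":
    custodians = [f"user{i}@contoso.com" for i in range(250)]  # hypothetical addresses
    for number, batch in enumerate(batch_source_identifiers(custodians), start=1):
        print(f"Batch {number}: {len(batch.split(','))} identifiers")
```

Each returned string can be pasted into the source search box one batch at a time. Whether the portal tolerates spaces after the commas was not tested, so the sketch emits none.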
You can add ‘tenant-wide’ sources with the corresponding performance impact. I am happy to see that inactive user assets are now added by default and can be excluded. The release blogs mention a new data source “sync” that ‘alerts eDiscovery users to the new site’s existence’. I will update this after checking with the Purview team. I cannot find any settings for ‘alerts’ in the new hub yet. My guess is that they improved the data source for real-time updates so that it is always accurate. If so, that makes me wonder how often the synchronization ran in the prior system.
Source search seems to be EXACT with a wildcard ending. For example, the “Microsoft Roadmap” Teams site cannot be found by a “roadmap” source search. The additional target identifiers can help, but know that channel display names are added behind the Team name, so searching for them can be tricky. Suffice it to say that you need to gather information about your query/hold targets and make sure they are added to the query.
I still need to test private and shared channel files, as they are stored in a different location. Confirmed that by default selecting the parent Teams site includes ALL of its SharePoint subsites.
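The prefix-only matching observed above can be modeled locally to find the full name a fragment belongs to before searching in the portal. This is a minimal sketch of the observed behavior and not Microsoft's implementation. It assumes you hold a separately exported inventory of display names, and the case-insensitive comparison is an assumption as well.

```python
from typing import Iterable, List


def portal_style_match(display_name: str, term: str) -> bool:
    """Model of the observed behavior: the term must match the START of the name."""
    return display_name.casefold().startswith(term.casefold())


def find_by_fragment(inventory: Iterable[str], fragment: str) -> List[str]:
    """Local helper: return every name that contains the fragment anywhere."""
    needle = fragment.casefold()
    return [name for name in inventory if needle in name.casefold()]


if __name__ == "__main__":
    inventory = ["Microsoft Roadmap", "Roadmap Planning", "Legal Holds"]  # hypothetical names
    for name in find_by_fragment(inventory, "roadmap"):
        reachable = portal_style_match(name, "roadmap")
        print(f"{name!r}: found by a 'roadmap' source search = {reachable}")
```

Names that the local helper finds but the prefix model rejects are the ones to look up by their full display name or by another identifier such as the URL.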
Conditions
Property conditions (selective property:value conditions) now include KeyQL (Keyword Query language) statements. Previously you were limited to using the condition builder or manually creating KeyQL statements.
The new field selector does a better job showing common vs. mailbox vs. SPOD specific conditions. There are 22 fields in the query builder, but you will have to use KeyQL statements for the all-important DocumentLink: property for targeted SPOD folder collection. Unfortunately, mailbox folder targets still require retrieval of the FolderID via a PowerShell query.
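A targeted folder statement of the kind mentioned above can be assembled from a list of folder URLs. This is a minimal sketch, assuming the property:"value" form with OR between clauses. The helper name and the example URLs are hypothetical, and the resulting string should be validated in your own tenant before you rely on it.

```python
from typing import Iterable


def document_link_clause(folder_urls: Iterable[str]) -> str:
    """Build a DocumentLink statement, OR-ing one clause per folder URL."""
    clauses = []
    for url in folder_urls:
        cleaned = url.strip().rstrip("/")
        if not cleaned:
            continue  # skip blank entries
        if '"' in cleaned:
            raise ValueError(f"URL contains a double quote: {cleaned!r}")
        clauses.append(f'DocumentLink:"{cleaned}"')
    if not clauses:
        raise ValueError("no folder URLs supplied")
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    folders = [
        "https://contoso.sharepoint.com/sites/Legal/Shared Documents/Matter 123/",
        "https://contoso-my.sharepoint.com/personal/jdoe_contoso_com/Documents/Drafts",
    ]
    print(document_link_clause(folders))
```

Trailing slashes are trimmed so that the same folder is not listed twice in slightly different forms.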
Content conditions (keywords) no longer have a popup window for large lists. In my quick test search, pasting a list of keywords into the small value window flattened the list into space-separated terms, and the spaces acted as OR connectors. I want to run additional tests for lists of phrases and Boolean (AND, NEAR) clauses. Will update with results.
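Until the pasting behavior is fully tested, one way to avoid relying on implicit OR connectors is to build the keyword statement explicitly before pasting it. This is a minimal sketch, assuming one keyword or phrase per line in the source list and that multi-word phrases should be wrapped in double quotes. The function name and sample terms are hypothetical.

```python
def keywords_to_or_query(keyword_text: str) -> str:
    """Join one-per-line keywords with explicit OR, quoting multi-word phrases."""
    terms = []
    for raw in keyword_text.splitlines():
        term = raw.strip()
        if not term:
            continue  # ignore blank lines
        already_quoted = len(term) >= 2 and term.startswith('"') and term.endswith('"')
        if " " in term and not already_quoted:
            term = f'"{term}"'
        terms.append(term)
    return " OR ".join(terms)


if __name__ == "__main__":
    pasted = 'budget\nproject falcon\n\n"side letter"\n'  # hypothetical terms
    print(keywords_to_or_query(pasted))
```

The sketch treats every line as a literal term or phrase, so lines that already contain operators such as AND or NEAR would be quoted as phrases and need separate handling.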
They have added the ability to search using example files as search input. I am still wrestling with a support ticket blocking some features, so will report when I have tested.
Format query results is a new query panel that allows you to select statistics options for categories, keyword report and partially indexed items. Alternatively, you can go straight to a sample set from this panel.
Partially indexed items – you can return ALL partially indexed items or just those in locations where you have search hits. This is a substantial improvement in understanding the default indexing status on searches. You can elect to kick off advanced indexing on partially indexed items in your query sources. The Statistics page will have information about the advanced indexing success. More information on this after further testing.
Statistics
Query (not hold) Results have expanded statistics with charts that give you volume, top 100 locations with hits and top 100 errors and overall sources with hits. Advanced Purview licenses will report on top sensitive information types (SITs), keywords and item types. Based on retrieval performance, statistics are stored and then must be regenerated if the search results have changed over time.
Sample
A sample set can be created after the query completes based on the number of items from a set number of locations. The sample preview only presents 4 fields but does allow you to preview items prior to adding them to a review set or export.
Add to Review Set/Export Options
There are a LOT of new options that need consideration when committing a search to a review set or export. The downloadable process reports contain a lot more metadata fields and statistical information. You access them through the Process Manager button in top right. I have called out the changes below:
- SPOD versions – latest version only, last 10, last 100 or all versions. This can make a huge impact on matter volume. I recommend establishing a default and updating request parameters so that counsel can call out versions where important. Most of the time I recommend reviewing the latest version and recollecting prior versions during review for important items unless versions are specified in the eDiscovery agreement.
- You can now expand hits to also collect everything in the folder. This is a game changer for developing and testing scope criteria.
- Expand hits in lists to collect all items. Interesting that I could not uncheck this option. I could select whether to collect list attachments.
- Teams/Engage chats can now be collected as 12-hour chunks instead of individual messages with hits.
- Chat conversations can be collected in HTML transcript format. This definitely makes review much easier AND my quick test even found emoji reactions (prior gap). Many processing platforms may not handle the new HTML format well. It also may pose redaction/disclosure issues by collecting non-relevant conversation segments. I can also see the 12-hour cut off generating questions from opposing counsel.
- Chat cloud attachments and their versions from mailbox conversations (that is where chat is stored) can now be retrieved with same options as SPOD.
- Direct query export does not allow you to expand hits by families(email) or conversations(chat) like you can in a Review Set. I will update on any differences in metadata fields between the direct query export and the review set export after analytics.
Review
This needs a lot more exploration, so I am just including a quick list of obvious changes that may impact your workflow. The new features add a lot of functionality, but need testing to understand and validate with your particular content before relying on them.
- You can open the Review Set to item view via the ellipsis (three dots) behind the set name or with the Open Set button below the set properties page.
- The downloaded process reports are zip files with relatively detailed .CSV files covering the search/review set summary, location and item metrics. I can only imagine that this information will be accessible within the Purview UX eventually. The summary does help understand the differences between the query estimates and the review/export item count.
- There are a LOT more metadata fields available for filtering items in the review set. Some are from the source and some are generated during processing. This dramatically expands the ability to test search/hold criteria and to refine saved exclusionary filters prior to export. Most or all new metadata fields can be added as columns in the item view.
- All of the newly available metadata content is viewable in the item preview pane and can be pinned to the view. I recommend trying the Compound Path field as a useful example.
- The ability to change the relative time zone in the results list is useful for cross continental matters.
- Although you can set the analytics parameters at the case level, you still need to manually trigger analytics once the review set is complete. Although I observed improved performance in search and adding to review, initial tests on analytics showed 8,000 to 10,000 items per hour. I expect this performance to vary based on result set composition.
- After analytics has run, you can group results by families(email) or conversations/related items (chat). This same logic is available to expand tags on selected items.
- The analytic reports generate the statistic pie/bar charts for the whole review set showing sources, documents, email and attachment breakdowns. Somewhat similar to those in the Search Statistics.
- For large result sets you must turn off pagination to select all for bulk tagging, download or export.
- You can copy selected items to other review sets.
- Export options include expanding the set by families (email) or conversation (chat), but you cannot expand by BOTH. This means that you may need to segregate exports or make sure that your result set has already been expanded properly.
- I had to add purview.microsoft.com as a pop-up blocker exception in Edge to download a sample set of items. Downloads intermittently failed to appear in my testing.
- The export reports contain a LOT more metadata fields. The classic export reports just gave the basic owner/date/source type fields. Users should evaluate the new metadata in their environment for the different data types to determine which should be removed from direct requests or merged into their external review platform.
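The analytics throughput noted in the list above can be turned into a rough planning estimate. This is a back-of-the-envelope sketch that assumes the observed range of 8,000 to 10,000 items per hour holds for your content, which the text itself expects to vary. The function name and the item count are hypothetical.

```python
from typing import Tuple

# Observed range from initial tests; treat as a rough assumption, not a guarantee.
SLOW_ITEMS_PER_HOUR = 8_000
FAST_ITEMS_PER_HOUR = 10_000


def estimate_analytics_hours(item_count: int) -> Tuple[float, float]:
    """Return (best_case_hours, worst_case_hours) for running analytics on a review set."""
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    best = item_count / FAST_ITEMS_PER_HOUR
    worst = item_count / SLOW_ITEMS_PER_HOUR
    return best, worst


if __name__ == "__main__":
    best, worst = estimate_analytics_hours(100_000)  # hypothetical review set size
    print(f"Estimated analytics time: {best:.1f} to {worst:.1f} hours")
```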
Process Management (Reports)
One of the most important changes for tracking and defensibility is hidden in the Process Management tab. Every major item action (except holds) now has a common set of reports that can be downloaded (zip file) and opened/processed (CSV files) for tracking and QC. The reports give a lot more detail about the differences between search estimates in Statistics and the final number of items when promoted to a review set. The item level report includes most or all of the new expanded metadata fields, which can be leveraged for custom scoping, QC and more. As mentioned above, I believe that much of this detail will eventually be available in the Purview UX. I have not yet determined how long these reports exist, but as this is Microsoft, I would guess no longer than 90 days. While better reports themselves do not provide defensibility, they are an important part of a reasonable, mature eDiscovery process around Purview eDiscovery.
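Because the reports arrive as a zip of CSV files, a quick inventory of each download can support tracking and QC. This is a minimal sketch using only the Python standard library. It assumes the CSV files are UTF-8 encoded with a header row, and it makes no assumption about the actual file or column names inside the zip. The file name in the usage example is hypothetical.

```python
import csv
import io
import zipfile
from typing import Dict


def summarize_report_zip(zip_source) -> Dict[str, dict]:
    """Return the column names and data row count for every CSV inside a report zip.

    zip_source may be a file path or a binary file-like object.
    """
    summary = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # ignore any non-CSV members
            with archive.open(name) as raw:
                # utf-8-sig strips a byte order mark if one is present (assumed encoding)
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                reader = csv.reader(text)
                header = next(reader, [])
                row_count = sum(1 for _ in reader)
            summary[name] = {"columns": header, "rows": row_count}
    return summary


if __name__ == "__main__":
    # Hypothetical file name; point this at a downloaded process report.
    for member, info in summarize_report_zip("process_report.zip").items():
        print(f"{member}: {info['rows']} rows, {len(info['columns'])} columns")
```

Storing this summary alongside the original zip gives a simple record of what each report contained at download time.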
Purview eDiscovery (Premium) Portal Decisions:
- Integration of Content Search. Which existing MIS or other stakeholders previously used Content Search, and how will you support them with dedicated cases?
- Should you leave classic UX available?
- Whether to use the privilege detection model?
- If so, who pulls together the attorney email list from what source?
- Can case owners override the tenant settings?
- Default tenant wide query scope includes guest mailboxes, inactive mailboxes, group mailboxes and shared Teams channels?
- Chat collections/exports to use 12-hour conversation chunks and export in HTML?
- Default SharePoint, OneDrive and chat attachment version collection?
- SharePoint, OneDrive folder hit expansion?
- List hit expansion and attachments?
- Do you use new SIT, Sensitivity levels and other metadata in collection workflow to segregate or tag them?
- Do you download and store process management reports for tracking and defensibility?
- If so, which reports and where to store them?