Migrated from eDJGroupInc.com. Author: Greg Buckles. Published: 2010-07-16 05:00:59. Format, images and links may no longer function correctly.
Almost every new processing or review application that I have seen over the last year has featured a left-hand navigation window that enables users to dynamically filter the collection by Author, Date, Type and more. You can call this faceted navigation, guided search or browsing navigation, but it boils down to the user’s ability to actively browse and filter the collection by metadata characteristics and categories that have been extracted from the index. Although this seems like just another way to construct a search, the feature offers a lot more to the discerning user. In older platforms, users had to run reports on their collections to extract summary population metrics across different fields. The first one that I recall was the Tally function in Summation. It could only tally one field at a time, but unlike most static reports, it could generate the numbers on a set of search results instead of the entire collection. Current review, processing and even archiving products like Clearwell, Relativity, Introspect and Symantec’s Discovery Accelerator can generate these hierarchical ‘facets’ across multiple fields and display the total and item-level counts dynamically in real time.
So how do they do this? During the processing or ingestion of your collection, metadata and content text are extracted from each item and inserted into the application’s database fields and full text index. The platforms vary wildly on what data goes into a database versus the text index, with all kinds of trade-offs in performance, functionality, support expense and more. Whether the back end is entirely database like Concordance, mostly index like ZyLAB’s XML back end, or a hybrid, modern systems use entity extraction techniques to analyze the frequency and relationships of fielded data. These systems automatically create sorted lists of categories such as year/month/day, people, company names, domains, file types, size and many more. All of these things could be found by running simple searches on these fields, but who has the time to run hundreds or thousands of searches and log them all in a spreadsheet? Yes, that is how we used to generate reports for counsel on Custodian counts, the number of Excel files and other document properties.
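To make the counting idea concrete, here is a minimal Python sketch of how facet counts could be derived from extracted metadata. It is an illustration only, not how any of the products named above actually work: the `DOCS` records, the field names and the `derive_fields`, `build_facets` and `apply_filter` helpers are all hypothetical, and a real platform would compute these counts inside its database or index rather than in a loop.

```python
from collections import Counter

# Hypothetical records standing in for metadata extracted during processing.
DOCS = [
    {"author": "alice@acme.com", "file_type": "xlsx", "sent": "2009-03-14"},
    {"author": "bob@acme.com", "file_type": "msg", "sent": "2009-03-20"},
    {"author": "alice@acme.com", "file_type": "msg", "sent": "2009-04-02"},
    {"author": "carol@rival.com", "file_type": "msg", "sent": "2009-04-09"},
]


def derive_fields(doc):
    """Add derived facet values (domain, year-month) to the extracted fields."""
    fields = dict(doc)
    fields["domain"] = doc["author"].split("@", 1)[1]
    fields["month"] = doc["sent"][:7]  # "YYYY-MM"
    return fields


def build_facets(docs, facet_fields):
    """Count how many documents carry each value of each facet field."""
    facets = {name: Counter() for name in facet_fields}
    for doc in docs:
        fields = derive_fields(doc)
        for name in facet_fields:
            facets[name][fields[name]] += 1
    return facets


def apply_filter(docs, field, value):
    """Return only the documents whose (possibly derived) field equals value."""
    return [doc for doc in docs if derive_fields(doc)[field] == value]


# Facets for the whole collection, most frequent value first.
facets = build_facets(DOCS, ["author", "domain", "file_type", "month"])
for name, counts in facets.items():
    print(name, counts.most_common())

# Clicking a facet value is a filter; the remaining facets are then recounted.
msg_only = apply_filter(DOCS, "file_type", "msg")
print(build_facets(msg_only, ["author"])["author"].most_common())
```

One pass over the documents yields every count that would otherwise take a separate search per value, which is the spreadsheet exercise described above.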
Even more important than the time savings, these facets effectively create a property profile of your collection. Search is only as good as the criteria. You have to know what you are looking for in order to create a search with real precision and recall. The facet display allows the collection to tell you about itself. Think about a typical IP scenario where you have a relatively unique product phrase. You can run that search across your collection and one of the first questions will be, “Who touched the toxic phrase?” Dynamic facets would immediately give you a list of Authors, Recipients and found Names in descending frequency to build your list of potentially relevant custodians. I see these kinds of features as essential tools to investigate, create and validate relevance criteria. Recent opinions have made it clear that the bench expects parties to have a reasonable foundation for the search criteria used to determine potentially relevant, privileged or confidential documents within collections. You cannot just make up a list of search terms based on your best guess and the discovery request. Facet navigation can be a strong tool to enable and support your reasonable efforts.
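The “Who touched the toxic phrase?” question can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `MESSAGES` records, the phrase "project bluebird" and the `who_touched` helper are invented for the example, and a plain substring match stands in for a real full text search.

```python
from collections import Counter

# Hypothetical messages; the field names are illustrative only.
MESSAGES = [
    {"author": "alice", "recipients": ["bob", "carol"],
     "text": "Project Bluebird pricing draft"},
    {"author": "bob", "recipients": ["alice"],
     "text": "Re: project bluebird pricing"},
    {"author": "dave", "recipients": ["erin"],
     "text": "Lunch on Friday?"},
    {"author": "alice", "recipients": ["frank"],
     "text": "Launch plan and project bluebird budget"},
]


def who_touched(messages, phrase):
    """Rank people by the number of hit messages they authored or received."""
    needle = phrase.lower()
    people = Counter()
    for msg in messages:
        if needle in msg["text"].lower():
            # A set keeps one message from counting the same person twice.
            people.update({msg["author"], *msg["recipients"]})
    return people.most_common()  # descending frequency


ranking = who_touched(MESSAGES, "project bluebird")
for person, hits in ranking:
    print(person, hits)
```

The people at the top of the ranking are candidates for the custodian list. The counts are only a starting point for investigation, not a finding of relevance.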
After the search, browsing through your facet categories can provide an immediate first-level quality assurance check. It is not a substitute for real sampling or other QA analytics, but it provides fast feedback to spot gross gaps, spikes and unexpected trends. For example, in a large population of results you would normally expect to see a rough bell curve in the chronological frequency. If your hits drop off suddenly in one month and then pick back up, it might indicate a gap in the collection (just remember the holiday cycle as well). Spotting unexpected domain names of competitors or interested third parties can also be useful. Discovery search is still half art and half technology. Luckily for us, the technology is getting better every year. Traditional Boolean search depended entirely on the user to test, preview and revise search criteria. Faceted navigation, concept foldering, pivot tables and other new features now give the user insight into the collection before the search.
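The chronological check described above, where hits drop off in one month and then pick back up, can be approximated with a short Python sketch. The `find_gaps` helper and its `drop_ratio` threshold of 0.25 are assumptions chosen for illustration, not an established QA standard, and a flagged month still needs a human to rule out holidays and other innocent explanations.

```python
from collections import Counter
from datetime import date


def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def find_gaps(hit_dates, drop_ratio=0.25):
    """Flag months whose hit count is far below the average of their neighbours.

    drop_ratio is an arbitrary illustrative threshold: a month is flagged when
    its count is less than drop_ratio times the mean of the months on either side.
    """
    counts = Counter((d.year, d.month) for d in hit_dates)
    if not counts:
        return []
    # Include months with zero hits, which never appear in the counter itself.
    months = list(month_range(min(counts), max(counts)))
    gaps = []
    for prev, cur, nxt in zip(months, months[1:], months[2:]):
        neighbour_avg = (counts[prev] + counts[nxt]) / 2
        if neighbour_avg > 0 and counts[cur] < drop_ratio * neighbour_avg:
            gaps.append(cur)
    return gaps


# Hypothetical hit dates: steady volume with nothing at all in March 2009.
hits = (
    [date(2009, 1, day) for day in range(1, 11)]
    + [date(2009, 2, day) for day in range(1, 13)]
    + [date(2009, 4, day) for day in range(1, 12)]
    + [date(2009, 5, day) for day in range(1, 10)]
)
print(find_gaps(hits))
```

Comparing each month only to its immediate neighbours keeps the check tolerant of the overall bell curve while still catching a sudden hole in the middle of it.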