Proximity Search Challenges in eDiscovery
Searching for a single term within a document is pretty black and white: the term is either present or it is not. When you step up to searching on phrases, proximity terms, concepts and compound term clusters, things get a bit less absolute. Yet simple lists of terms are generally either overly broad or miss relevant ESI.

The simplest search index does not store information about the position(s) of terms within a document. Modern search engines such as Lucene, FAST, IDOL and others rely on term position and other information to derive clusters of two or more related terms (concepts) and relevance weighting factors.

During a recent briefing call with Mike Wade, CTO of Planet Data, we delved into some of the challenges that Planet Data faced in expanding their Exego Early Cost Assessment platform to support concept search and ECA workflow. What really caught my attention was the ability to extract two separate versions of the text from documents: the raw unformatted text and the rendered view. Alternatively, they have developed a merged rendering that embeds the extracted object text in-line with the viewed text.
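To make the point about term positions concrete, here is a minimal sketch of a positional index and a "within N words" query in Python. It is an illustration only, not how Lucene, FAST, IDOL or Exego implement proximity search. The function names (tokenize, build_positional_index, within) and the sample memos are hypothetical, made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Map each term to {doc_id: [token positions]}.

    A plain presence-only index would stop at term -> set of doc ids;
    keeping the positions is what makes proximity queries possible.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index


def within(index, term_a, term_b, max_distance):
    """Return sorted doc ids where the two terms occur within
    max_distance tokens of each other, in either order."""
    postings_a = index.get(term_a.lower(), {})
    postings_b = index.get(term_b.lower(), {})
    hits = []
    # Only documents containing both terms can possibly match.
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(
            abs(pos_a - pos_b) <= max_distance
            for pos_a in postings_a[doc_id]
            for pos_b in postings_b[doc_id]
        ):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample documents, invented for illustration.
docs = {
    "memo1": "The merger agreement was signed before the audit.",
    "memo2": (
        "The audit began in March. Months later, after many "
        "unrelated meetings, a merger was proposed."
    ),
    "memo3": "Quarterly audit results attached.",
}

index = build_positional_index(docs)

# Both memo1 and memo2 contain "merger" and "audit", so a plain
# term search returns both. The proximity query separates them.
print(within(index, "merger", "audit", 6))   # ['memo1']
print(within(index, "merger", "audit", 11))  # ['memo1', 'memo2']
```

Note that the positions come entirely from whatever text is fed to the tokenizer. Index a different version of the same document's text and the distance between two terms can change, which is presumably why the choice between raw, rendered and merged text matters for proximity results.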