Migrated from eDJGroupInc.com. Author: Chuck Rothman. Published: 2012-04-17 07:00:05. Format, images and links may no longer function correctly.

Parts 1 and 2 of this series illustrated some of the issues that should be considered when processing electronic records. This final part continues the discussion and ends with a checklist of questions to ask in addition to the per-gigabyte cost.

Searchable Content

Most electronic records contain words. Some are typed into a computer, some are scanned from printed material, and some are even spoken. In most cases, this content is what a reviewer cares about most.

For files created by typing (such as Word, Excel, PowerPoint and email files), it is a simple procedure for the processing software to extract the text. However, which text should be extracted? Word documents may contain redlining; Excel spreadsheets might contain hidden cells, formulas and notes; PowerPoint files can contain hidden slides and speaker notes.

Whether to include some or all of the non-visible text in the searchable content is usually a legal decision based on the issues in the case. It is still a decision that must be made deliberately. All too often it is overlooked, and the software's default settings are used.

This can have significant implications. Consider the following example. Keywords were run against a collection to triage it for privilege. An Excel spreadsheet contained notes that in-house counsel had attached to cells. However, the text extracted from the spreadsheet and made searchable did not include cell notes, so the spreadsheet was not placed in the potentially privileged set. The reviewer did not examine the cell notes either, so the record was not flagged as privileged. A final keyword search of the responsive records also failed to identify the spreadsheet, again because the notes were not in the searchable text. Ultimately, because it was a spreadsheet, it was produced to opposing counsel natively, in-house counsel's notes and all.
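The Python below is one way a processing step might close this gap. It pulls cell-note text out of an .xlsx file so that the text can be added to the searchable content. It assumes the notes are stored as SpreadsheetML parts whose zip entry names begin with `xl/comments`, which is how Excel normally stores legacy notes. Threaded comments, hidden sheets and formulas are not covered. The function names are illustrative and are not any product's API.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML namespace used by comment (note) parts inside an .xlsx package
SS_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def extract_cell_comments(xlsx_bytes: bytes) -> list[tuple[str, str, str]]:
    """Return (part name, cell reference, note text) for each cell note.

    Assumption: notes live in zip entries whose names start with
    "xl/comments" (e.g. xl/comments1.xml). Threaded comments, hidden
    sheets and formulas are deliberately out of scope for this sketch.
    """
    found = []
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        for name in zf.namelist():
            if not (name.startswith("xl/comments") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            for comment in root.iter(f"{{{SS_NS}}}comment"):
                # Rich-text notes are split across several <t> runs; join them
                text = "".join(t.text or "" for t in comment.iter(f"{{{SS_NS}}}t"))
                found.append((name, comment.get("ref", ""), text))
    return found


def searchable_text(visible_text: str, xlsx_bytes: bytes) -> str:
    """Append any cell-note text to the visible cell text before indexing."""
    notes = "\n".join(text for _, _, text in extract_cell_comments(xlsx_bytes))
    return f"{visible_text}\n{notes}" if notes else visible_text
```

Because the notes are folded into the indexed text, both the privilege screen and the final keyword search in the example above would see them. Whether they should be folded in is still the legal decision described earlier.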

Some files contain content, but not in a form that is easily accessed. A PDF file, for instance, may contain a scanned image of a record, or may have security options that prevent its text from being extracted. The e-discovery processing program needs to identify these types of records and deal with them in one way or another. Examples include OCRing scanned images, or flagging records whose text was not extracted so that they can be addressed if required.
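The sketch below shows a simple triage pass over processed records. The `Record` fields, the `encrypted` flag and the set of image-like file types are hypothetical stand-ins for whatever metadata a given processing tool reports. The point is only that empty extraction results get routed somewhere, to OCR or to manual review, rather than passing through silently.

```python
from dataclasses import dataclass

# Hypothetical set of file types likely to be scanned images when no text
# comes out of them; a real tool would rely on its own type detection.
IMAGE_LIKE_TYPES = {"pdf", "tif", "tiff", "jpg", "jpeg", "png"}


@dataclass
class Record:
    """Hypothetical processed-record shape used only for this sketch."""
    doc_id: str
    file_type: str
    extracted_text: str
    encrypted: bool = False  # e.g. PDF security settings blocked extraction


def triage_text_extraction(records: list[Record]) -> dict[str, list[str]]:
    """Route records with no extracted text to OCR or manual review."""
    routes: dict[str, list[str]] = {"ok": [], "ocr": [], "review": []}
    for rec in records:
        if rec.extracted_text.strip():
            routes["ok"].append(rec.doc_id)
        elif rec.encrypted:
            # Security settings blocked extraction: flag for manual handling
            routes["review"].append(rec.doc_id)
        elif rec.file_type.lower() in IMAGE_LIKE_TYPES:
            # Probably an image of a page: queue for OCR
            routes["ocr"].append(rec.doc_id)
        else:
            # No text and no obvious reason: flag for a human to look at
            routes["review"].append(rec.doc_id)
    return routes
```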

Finally, more and more e-discovery review sets include files that contain audio. A typical example is a voicemail attached to an email. A reviewer can usually listen to the recording during review, but special processing is needed to make the audio content searchable. At the very least, records containing audio content should be flagged for further consideration.
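For email, audio attachments can often be spotted from MIME metadata alone. This sketch uses Python's standard `email` package to list parts whose content type is `audio/*`. Attachments are sometimes labelled generically (for example `application/octet-stream`), so as an assumption it also treats a few common audio file extensions as audio. The extension list is illustrative, not exhaustive. The sketch only flags audio and does not transcribe anything.

```python
from email import message_from_bytes, policy
from pathlib import PurePath

# Assumed fallback list for audio attachments labelled with a generic type
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".wma", ".amr", ".ogg"}


def audio_parts(raw_email: bytes) -> list[str]:
    """Return the filenames of MIME parts that appear to contain audio."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers, not content
        filename = part.get_filename() or ""
        is_audio_type = part.get_content_maintype() == "audio"
        has_audio_ext = PurePath(filename).suffix.lower() in AUDIO_EXTENSIONS
        if is_audio_type or has_audio_ext:
            found.append(filename or "(unnamed audio part)")
    return found
```

A record would be flagged for further consideration whenever `audio_parts` returns a non-empty list.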


It is clear that there is much more involved in e-discovery processing than just the cost. When planning an e-discovery project, consider the following:

  1. How are duplicate files (i.e., non-email records) identified? Is the entire byte content of the file used, or just the searchable text?
  2. What criteria are used to identify email duplicates? Should they include the BCC field?
  3. Are email attachments and stand-alone files compared for duplicate analysis?
  4. What is the native file format for email bodies?
  5. Are objects embedded into Office documents extracted as attachments? If so,

    1. Is this limited to specific Office file types (such as just Excel files)?
    2. Which types of embedded objects are extracted?

  6. When text is extracted for indexing and searching, does it include hidden content?
  7. Are records with inaccessible text content identified?
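Questions 1 to 3 come down to what goes into a duplicate key. The sketch below contrasts a byte-level hash with a hash of normalized searchable text. It also builds an email key from a hypothetical set of fields, with BCC optional. BCC matters because the sender's copy of a message typically records the BCC recipients while the recipients' copies do not. The two copies therefore group as duplicates only when BCC is left out of the key. Passing email attachments and stand-alone files through the same key function is what comparing them (question 3) amounts to. The field names, normalization rules and choice of SHA-256 are illustrative assumptions, not a description of any particular product.

```python
import hashlib
import re


def byte_hash(content: bytes) -> str:
    """Exact duplicate test: every byte of the file must match."""
    return hashlib.sha256(content).hexdigest()


def text_hash(extracted_text: str) -> str:
    """Looser test: only the normalized searchable text must match, so a
    Word file and a PDF of the same letter can be grouped together."""
    normalized = re.sub(r"\s+", " ", extracted_text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical field set for email duplicate keys
EMAIL_FIELDS = ("from", "to", "cc", "subject", "sent", "body")


def email_key(msg: dict[str, str], include_bcc: bool = True) -> str:
    """Hash the chosen header fields and body; BCC is optional."""
    fields = EMAIL_FIELDS + (("bcc",) if include_bcc else ())
    # Collapse whitespace in each field value before hashing
    parts = [f"{name}={' '.join(msg.get(name, '').split())}" for name in fields]
    # \x1f (unit separator) keeps adjacent fields from running together
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def duplicate_groups(items, key_fn) -> list[list[str]]:
    """Group (doc_id, payload) pairs by key; return groups with >1 member."""
    groups: dict[str, list[str]] = {}
    for doc_id, payload in items:
        groups.setdefault(key_fn(payload), []).append(doc_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Grouping with `byte_hash` will typically find fewer duplicates than grouping with `text_hash`, because any change to a file's metadata or container format alters its bytes even when the visible text is identical.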

e-Discovery processing software has been around for at least ten years, and the current offerings are fairly sophisticated. As a result, most of the options listed above should have no impact on the actual processing cost. However, as shown, giving some thought to how electronic records are processed can result in significant cost savings at the review stage and beyond.

eDiscoveryJournal Contributor – Chuck Rothman
