Cracking Office Open XML Files
We all know that Office 2007 and later files use a different format from the traditional DOC/XLS/PPT files, but I thought it was worth exploring them with an eye on their potential impact on eDiscovery activities. First we need a simple explanation of what changed between the Office 2003 and Office 2007 formats. Prior to 2007, Word, Excel and PowerPoint each used a proprietary binary file format that required the application or a viewer to open. Office 2007 adopted an XML-based file format called Office Open XML, which stores a common set of XML files inside a compressed Zip container. These Extensible Markup Language (XML) files are simple text files that resemble HTML. The files now have an X or M added to their traditional file extensions: X indicates a macro-free XML file, and M indicates a file that can contain embedded macro content. So DOC, XLS and PPT have become DOCX/DOCM, XLSX/XLSM and PPTX/PPTM. There are many advantages to the open formats, but we will focus on the potential discovery impact.
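Because the container is an ordinary Zip archive, you can look inside one without Office installed. The following is a minimal sketch using Python's standard zipfile module. The function names list_parts and read_part are my own, and the file name in the usage line is hypothetical. It assumes the parts are UTF-8 encoded, which is typical but not guaranteed.

```python
import sys
import zipfile


def list_parts(source):
    """Return (name, compressed size, uncompressed size) for each part.

    `source` may be a path to a DOCX/XLSX/PPTX file or an open binary file object.
    """
    with zipfile.ZipFile(source) as package:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in package.infolist()
        ]


def read_part(source, part_name):
    """Return the text of one part, e.g. 'word/document.xml' in a DOCX.

    Assumption: the part is UTF-8 encoded; undecodable bytes are replaced.
    """
    with zipfile.ZipFile(source) as package:
        return package.read(part_name).decode("utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print every part in the package given on the command line.
    for name, packed, unpacked in list_parts(sys.argv[1]):
        print(f"{name}\t{packed}\t{unpacked}")
```

Saved as a script (say, list_parts.py), it could be run as `python list_parts.py report.docx` to print each part with its compressed and uncompressed size. Renaming a copy of the file to .zip and opening it in any Zip utility shows the same listing.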