Migrated from eDJGroupInc.com. Author: Greg Buckles. Published: 2011-07-26 08:40:45
Although none of the principals involved want to speak on the record, I managed to get some detailed information on the technical issues behind the Casey Anthony browser analysis. Consider this Part 2 of my blog on the discrepancy raised by the defense between the initial police report, which used Digital Detective’s NetAnalysis and showed 1 visit, and the subsequent SiQuest CacheBack report, which showed 84 visits related to chloroform from the Anthony family computer. The NY Times article does a good job with the event timeline, so I am just going to focus on the deep geek details.
All of this centers on the parsing and extraction of Google searches and site visits from the Firefox 2 browser history. Firefox versions 1 and 2 used a rather unique and problematic database file format named “Mork” after the quirky alien from the TV show ‘Mork & Mindy’.
This semi-anonymous blog from Digital Detective does a good job of breaking down the Mork record structure and the pair of records at issue in the case. It is important to note that the database file was found in unallocated space (i.e., unprotected against being partially overwritten). The author carefully and admirably refrains from commenting on the CacheBack analysis and where it might have run into problems.
It helps to understand that most of our forensic tools have been developed by solo forensic technicians for a very limited market. These are not big-ticket software purchases by corporations. Instead, they are mostly sold for very reasonable license fees to city, state and federal agencies. The handful of forensic developers generally know each other, and none would wish a serious bug or glitch upon a competitor. That is because most of them are former law enforcement officers with a vested interest in the public’s trust in criminal forensics as a science.
Drawing some hypothetical conclusions from all of the available sources, it appears that the Mork database’s inconsistent use of the VisitCount attribute caused the CacheBack software to retrieve the VisitCount = 84 value from the subsequent record, which was associated with a MySpace.com page. It is difficult to tell from the available screenshots, but I understand that the original file may have had a missing closing bracket “]” that further complicated parsing the record.
Software is usually tested on control data from public or private sources where the analyst knows the exact record counts, hits and character of the test data. Obviously the CacheBack software successfully parsed the original control databases during development testing and many others since. The problem is that real-world data frequently contains aberrations in format and content that no developer can anticipate. That is why it is so important to perform validation testing with your own representative data sets. Then you need to have effective post-process quality checks in place to catch these kinds of issues. The mixed binary/text format of the Mork records makes them hard to read on the fly, but I understand that the training and written protocol of the Orange County Sheriff’s department require a visual inspection and confirmation (validation) of relevant information obtained through the use of any forensic software tools.
It is clear from the NY Times article that the pro bono analysis by John Bradley was done with limited time and without any knowledge of the prior analysis performed by Sgt. Kevin Stenger. Double-blind testing with different software tools is an excellent method of validating results. Unfortunately, in this case it appears that the second analysis (using CacheBack) was requested only to obtain output in local Daylight Saving Time rather than UTC times. Event times are critical, but cross-checking results is more important.
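To make the hypothesized failure and the value of a cross-check concrete, here is a minimal Python sketch. It is not real Mork syntax and it is not how CacheBack or NetAnalysis actually work. The bracketed toy format, the example.test URLs and the function names (naive_parse, strict_parse, cross_check) are all invented for illustration. I am also assuming, purely for the example, that a record with a single visit simply omits its VisitCount field.

```python
import re

# Toy history data (hypothetical format, NOT real Mork syntax).
# Record 1 has no VisitCount field and is missing its closing "]".
# Record 2 is well formed and carries VisitCount=84.
RAW = (
    "[url=http://example.test/chloroform"
    "[url=http://example.test/social;VisitCount=84]"
)


def naive_parse(raw):
    """Find each URL, then grab the next VisitCount anywhere after it.

    Nothing stops the search at the record boundary, so a record without
    its own VisitCount inherits the value from the record that follows.
    """
    results = {}
    for match in re.finditer(r"url=([^;\]\[]+)", raw):
        url = match.group(1)
        following = re.search(r"VisitCount=(\d+)", raw[match.end():])
        results[url] = int(following.group(1)) if following else 1
    return results


def strict_parse(raw):
    """Split on record boundaries first, then read fields inside each record.

    A missing VisitCount is treated as 1 (an assumption of this toy format),
    and records without a closing bracket are reported for manual review.
    """
    results = {}
    unclosed = []
    for chunk in raw.split("[")[1:]:
        body = chunk.rstrip("]")
        fields = dict(field.split("=", 1) for field in body.split(";"))
        url = fields["url"]
        results[url] = int(fields.get("VisitCount", 1))
        if not chunk.endswith("]"):
            unclosed.append(url)
    return results, unclosed


def cross_check(report_a, report_b):
    """Return every URL whose visit count differs between two reports."""
    urls = sorted(set(report_a) | set(report_b))
    return {
        url: (report_a.get(url), report_b.get(url))
        for url in urls
        if report_a.get(url) != report_b.get(url)
    }


if __name__ == "__main__":
    tool_a = naive_parse(RAW)
    tool_b, needs_review = strict_parse(RAW)
    print("Discrepancies:", cross_check(tool_a, tool_b))
    print("Unclosed records:", needs_review)
```

On this toy data the naive parser reports 84 visits for the first URL while the stricter one reports 1, and the cross-check surfaces exactly that record. The point is not that either parser is right by construction, only that comparing two independent outputs tells you which records deserve a visual inspection.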
Either the police and prosecutors knew of the discrepancy or they did not compare the results. Jose Baez definitely caught the difference in the results.
Anyone who has worked extensively with load files knows how a missing delimiter or field value can throw off records in your database. Most of us have cursed text embedded within CSV files produced to us by the opposing side. “Do I try and find all the embedded commas or cause a fuss with the attorneys?” I remember how the early versions of the free IPRO iConvert utility struggled with oddities in the Summation DII load file formats and frequently spit out garbage when encountering large records that spanned folders.
Good software supports, but does not replace, good process. Although civil litigation does not have to meet the higher standards of criminal trials, someone will be certifying their reasonable effort in producing a complete and accurate relevant collection. If you are going to be that someone or report to that someone, don’t you want to have confidence in your work? Testing is the only way that I have ever found confidence in my process, people and technologies. I expect to find issues, exceptions and problems. As long as they are documented and disclosed appropriately, I have done my job. So how confident do you feel in your discovery?
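As a footnote to the load file point above, a basic field-count check catches most stray delimiters before they reach your review database. This is a minimal sketch using Python's standard csv module. The column names, the sample rows and the check_field_counts helper are made up for illustration, and a real load file would need its own delimiter and quoting settings.

```python
import csv
import io

# Hypothetical load file: row 0002 has an unquoted embedded comma,
# row 0003 shows the same title properly quoted.
SAMPLE = (
    "DocID,Custodian,Title\n"
    "0001,Smith,Quarterly report\n"
    "0002,Jones,Re: pricing, terms and delivery\n"
    '0003,Lee,"Re: pricing, terms and delivery"\n'
)


def check_field_counts(text, delimiter=","):
    """Compare every row's field count against the header row.

    Returns the expected field count and a list of
    (line_number, actual_field_count) tuples for rows that differ.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    problems = []
    for row in reader:
        if len(row) != expected:
            problems.append((reader.line_num, len(row)))
    return expected, problems


if __name__ == "__main__":
    expected, problems = check_field_counts(SAMPLE)
    print("Expected fields per row:", expected)
    for line_num, count in problems:
        print(f"Line {line_num}: found {count} fields")
```

Here only line 3 is flagged, because the unquoted comma in the title splits it into four fields. A check like this will not tell you which comma is the stray one, but it tells you where to look and gives you a documented exception to disclose.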