Migrated from eDJGroupInc.com. Author: Barry Murphy. Published: 2010-03-10 07:19:01. In his earlier journal entries – Inside of Automated Review Part 1 and Part 2 – Greg Buckles explored the practice of using content analysis software to enable a level of automation for document review. The growing trend of letting software create clusters of content by concept and other analytics, in an effort to decrease massive review costs, is a good indication that automation is here to stay.
Thankfully, I’m seeing more and more indications that content analytics are becoming accepted in the information governance community. At LegalTech, I participated in a panel and one of the questions I received was how organizations can better proactively manage information in order to make eDiscovery as efficient as possible. My answer was to use auto-classification to go through legacy content and identify potential records, knowledge assets, and other retention-worthy content. This answer was the topic of debate, with some folks thinking that auto-classification will never stand up in court or is simply not advanced enough to work. Others feel that there is no way to effectively classify information manually and therefore auto-classification is inevitable.
I fall into the latter camp – there is simply too much information for us to realistically expect that people will classify it. Take email for example – on a good day, I get about 100 emails; there is not enough time in the day for me to classify each one of them. Sticking with the email example, even if I did have enough time to classify each message, there is very little chance that I would correctly classify each one on a consistent basis. One day I might file an email about the Jones contract in the folder labeled “Jones” and another day I might file that email in a folder labeled “contracts.” It’s simply human nature.
To be fair, however, the term automation does imply that the human element of classification will be eliminated. When automated classification became a hot topic in the early 2000s, many touted the death of records management. The records management crowd was looked upon as “old school” and conservative, a hindrance to fully leveraging organizational information. But nothing could have been further from the truth.
What automated classification does is increase the importance of records management departments. In fact, there is a death of records management, but only in name: we should no longer refer to this group as “records management”; it should be called “information management.” While there is a classic definition of a “record,” it’s time to acknowledge the semantics at play here. The 2006 amendments to the Federal Rules of Civil Procedure explicitly made electronically stored information (ESI) discoverable and therefore worthy of some level of retention management. Even if ESI is not a traditional record, we still need to classify it and save it for some finite period of time. To me, the term “record” just denotes an even higher value of information asset.
Getting back to the issue of automation, there is simply too much information – both newly created and legacy – to think that humans could classify all of it. Classification, though, is a must if organizations want to clean up information stores in order to be ready for eDiscovery if and when it strikes. So, there will be a need to rely on at least some level of automation for the classification process. This does not mean that humans are completely removed from the equation. There are two major ways in which humans must be involved in classification efforts.
First, humans are critical to go-forward classification. This means that organizations must make it as easy as possible for users to classify their information assets. Examples include the ability to drag and drop emails into folders to drive classification, and the integration of records declaration/classification into content authoring tools.
Second, the humans who work in information and records management groups are also critical to classification initiatives. These folks know how to organize information and can serve as quality assurance for automated classification, as well as second-level classifiers of the most important information assets. Let’s say that we use auto-classification to go through TBs of information on a network file share. Perhaps 20% of that information is classified as a potential “corporate record.” Most records management file plans have thousands of categories, so it will require an experienced records manager to ensure the right classification of those records. It’s this combination of automation with human intervention that will close the last mile of true information management.
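To make that division of labor concrete, here is a minimal Python sketch of the idea: an automated first pass flags potential records and suggests a category, and everything it flags lands in a queue for a records manager to confirm. The keyword rules, category names, and function names are all hypothetical and invented for illustration; a real auto-classification product would use far richer analytics than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical keyword hints. A real file plan may have thousands of categories.
RECORD_HINTS = {
    "contracts": {"contract", "agreement", "amendment"},
    "invoices": {"invoice", "payment", "remittance"},
}


@dataclass
class Triage:
    name: str
    suggested: Optional[str]  # first-pass category guess; None if nothing matched
    potential_record: bool


def triage(name: str, text: str) -> Triage:
    """Automated first pass: count keyword hits per category."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & hints) for cat, hints in RECORD_HINTS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return Triage(name, None, False)
    return Triage(name, best, True)


def review_queue(docs: Dict[str, str]) -> List[Triage]:
    """Everything flagged as a potential record goes to a human for second-level review."""
    results = (triage(name, text) for name, text in docs.items())
    return [t for t in results if t.potential_record]


docs = {
    "a.txt": "Signed contract amendment for Jones.",
    "b.txt": "Lunch on Friday?",
}
for item in review_queue(docs):
    print(item.name, "suggested:", item.suggested)
```

The point of the sketch is the hand-off, not the matching logic: the software narrows the pile, and the records manager makes the final call on the items that matter most.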
For a long time, the records managers I interacted with were resistant to auto-classification, but now they are warming to it. I was recently at an event with a records manager who was looking at how to manage all of the records created in high-volume systems like email. He believed that without some level of auto-classification, any records management project would fail. It’s great to see records managers embracing technology and I hope that organizations begin to understand the real value that the records managers bring to the table.