Migrated from eDJGroupInc.com. Author: Steve Markey. Published: 2012-02-23 04:00:56. Format, images and links may no longer function correctly.

Yesterday, when I sat down with one of my students to review her class project, I found myself explaining Big Data and why it matters. For her, the relevance is obvious: she is a data analyst, and her employer is a cable-based retailer that sells thousands of items over the course of a year. But why is Big Data relevant for eDiscovery professionals?

Big Data is a new paradigm for many professionals. It matters to eDiscovery professionals because it is a new artifact to consider when planning the collection of content and assessing its discoverability. Oracle defines Big Data as an aggregation of data from three sources: traditional (structured data), sensory (log data, metadata), and social (social media) data.[1] Big Data is often stored in non-relational, distributed databases built on newer technologies such as NoSQL ("Not only SQL").

There are four main types of non-relational database management systems (NRDBMS): columnar, key/value, graph, and document-based. These systems aggregate the source data, while processing frameworks such as MapReduce analyze it. Once Big Data has been aggregated and analyzed, organizations can use the results for market research, supply chain research, process optimization, security incident analysis, or trend analysis.
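
To make the aggregate-then-analyze step concrete, here is a minimal, single-process sketch of the MapReduce model, counting failed logins per host from log ("sensory") records as a stand-in for security incident analysis. The records, their (host, event) layout, and the function names are hypothetical; a real MapReduce framework distributes the map, shuffle, and reduce phases across many machines, whereas this sketch runs them in memory.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical "sensory" log records: (host, event_type)
LOG_RECORDS = [
    ("web-01", "login_failed"),
    ("web-02", "login_ok"),
    ("web-01", "login_failed"),
    ("db-01", "login_failed"),
]


def map_phase(records: Iterable[tuple[str, str]]) -> Iterator[tuple[str, int]]:
    """Emit (host, 1) for every failed login."""
    for host, event in records:
        if event == "login_failed":
            yield host, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as a framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the values for each key to produce the aggregate data."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(records: Iterable[tuple[str, str]]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(run_job(LOG_RECORDS))  # prints {'web-01': 2, 'db-01': 1}
```

Note that the output of `run_job` is itself aggregate data: it no longer contains the individual log lines, which is why the source records and the aggregate results may each need to be preserved.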

Big Data adds value when, for example, market research data is available to support a decision to outsource or in-source, pursue a merger or acquisition, enter a new market, or leave an existing one. Other examples blur the line between eDiscovery and digital forensics, such as a discovery request for incident response information in a case where an organization must prove that it did or did not know of a security incident. Such requests straddle digital forensic evidence and eDiscovery content, but rest assured: if an organization holds the source and aggregate data, a case will eventually arise in which that information is subpoenaed.

This new paradigm brings a need to collect and preserve not only the source data but also the aggregate data. Many organizations will find that they have a series of disjointed solutions for collecting the source data; in that case, collecting the aggregate data becomes even more important for responding to a discovery request. Although the market does not yet offer a solution for collecting, preserving, and analyzing Big Data, one is sure to come. In the meantime, organizations can use tape, cloud, or other backup measures to collect and preserve this information.
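
The article does not prescribe a preservation method. One common integrity practice in eDiscovery is to record a cryptographic hash of each item at collection time, so that a later copy can be shown to be unchanged. The sketch below, assuming each source and aggregate item can be read as bytes, builds a SHA-256 manifest covering both groups and checks it later; the item names and manifest layout are hypothetical, and the sketch is not a substitute for a dedicated collection tool or chain-of-custody documentation.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(source: dict[str, bytes], aggregate: dict[str, bytes]) -> dict:
    """Record a SHA-256 digest for every source and aggregate item at collection time."""
    return {
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "source": {name: sha256_hex(data) for name, data in source.items()},
        "aggregate": {name: sha256_hex(data) for name, data in aggregate.items()},
    }


def verify(manifest: dict, source: dict[str, bytes], aggregate: dict[str, bytes]) -> list[str]:
    """Return the items that are missing or whose digest no longer matches the manifest."""
    problems = []
    for group, items in (("source", source), ("aggregate", aggregate)):
        for name, expected in manifest[group].items():
            data = items.get(name)
            if data is None or sha256_hex(data) != expected:
                problems.append(f"{group}/{name}")
    return problems


if __name__ == "__main__":
    # Hypothetical source log and the aggregate result derived from it
    source = {"weblog-2012-02-22.txt": b"web-01 login_failed\n"}
    aggregate = {"failed_logins_by_host.json": json.dumps({"web-01": 1}).encode()}
    manifest = build_manifest(source, aggregate)
    print(verify(manifest, source, aggregate))  # prints []
    aggregate["failed_logins_by_host.json"] = b"{}"  # simulate later alteration
    print(verify(manifest, source, aggregate))  # prints ['aggregate/failed_logins_by_host.json']
```

Hashing the aggregate results alongside the source data reflects the point above: when source collection is fragmented, a verifiable copy of the aggregate data may be what an organization can most readily produce.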

Stay tuned for more articles from me on Big Data, cloud computing, information governance, and eDiscovery. Depending on reader interest, I plan to post the following:

  • Establishing an eDiscovery strategy.
  • Winning over your stakeholders on incorporating your electronic content/document management (ECM/EDM) systems into your eDiscovery collection solution.
  • Integrating GARP (Generally Accepted Recordkeeping Principles) into your eDiscovery processes.
  • The future of eDiscovery in the Cloud.

eDiscoveryJournal Contributor – Steven Markey

ENDNOTES

  1. Dijcks, Jean-Pierre. "Oracle: Big Data for the Enterprise." Oracle, October 2011. www.oracle.com/us/products/database/big-data-for-enterprise-519135.pdf

 
