… and cost you a fortune as well!
Having spent the better part of the last decade designing Information Management & eDiscovery Solutions, I decided to write a blog to share some of my findings in this space.
This is my first post, and if you are even remotely interested in Information Management for the Enterprise, read on!
Unstructured Information
Corporations large and small create significant amounts of data, including documents, presentations and email messages. Most of this data is “unstructured”, i.e., it resides on file shares and employee laptops/desktops. Industry analysts like Gartner estimate that this unstructured information accounts for 80% of all corporate information and expect it to grow at 60% or more each year.
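To put that growth estimate in perspective, here is a back-of-the-envelope calculation. It assumes the 60% rate compounds annually and stays constant, which is a simplification; V_0 is today’s volume of unstructured data and V_n the volume after n years.

```latex
\begin{aligned}
V_n &= V_0\,(1 + 0.6)^n = V_0 \cdot 1.6^n \\
t_{\text{double}} &= \frac{\ln 2}{\ln 1.6} \approx \frac{0.693}{0.470} \approx 1.47 \text{ years} \\
V_5 &= V_0 \cdot 1.6^5 \approx 10.5\, V_0
\end{aligned}
```

In other words, at that rate the volume doubles roughly every year and a half and grows about tenfold in five years.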
In the simplest terms, unstructured information is nothing but “unmanaged” information. The file systems on which this information resides typically are not monitored, and the content is practically invisible to employees, auditors and corporate compliance officers. To gain greater visibility into and control over this information, so that they can meet compliance reporting requirements and their obligations to respond to corporate litigation, many organizations have implemented one or more of the following technologies, each with its own advantages and disadvantages:
1. Enterprise File Backup
Many companies attempt to solve the problem by creating regular backups of all the data on the network. These backups are saved to tapes, which are typically shipped and stored at a remote location for disaster recovery purposes. Backing up all data regardless of its business value is an inefficient use of time and resources. It increases the cost of tape storage and reduces the likelihood of rapid single-file recovery, which is the most-used aspect of file backup. And it still leaves the problem of quickly identifying “responsive” data unsolved.
2. Enterprise Content Management
Enterprise Content Management (ECM) systems can effectively manage many types of content and can provide access and version control, both of which are key aspects of information management. However, ECM systems also tend to be costly to set up and maintain. These systems typically require an organization to purchase server and user licenses, implement policies and processes for using the system, and train its users. Because of these costs, companies often limit their ECM implementations to specific areas of their business or types of data, such as documents that pertain to finance or HR. According to many analyst organizations, ECM systems are being used to manage approximately ten percent of today’s corporate information.
3. Enterprise Search
Enterprise Search is an effective way to index and find documents or emails that contain certain keywords. Most products are easy to implement, and mid-range enterprise implementations require only a modicum of regular maintenance. Some organizations also deploy Enterprise Search for proactive litigation readiness, i.e., the ability to quickly produce responsive information pertaining to litigation or an investigation.
Unfortunately, most enterprise search engines are tuned to find every document that may contain a particular term, rather than the specific document an auditor may require. It is left to the user to sift through thousands of returned documents to find what they need, which can be a time-consuming and costly exercise. In addition, many enterprise search engines cannot look for sensitive, high-risk Personally Identifiable Information (PII), such as Social Security numbers (SSNs) and driver’s license numbers, or Payment Card Industry (PCI) data, such as credit card numbers, within unstructured data. Search engines also mostly lack the ability to manage or take action on the documents they index. Furthermore, the scalability of many Enterprise Search engines is called into question when they are required to “proactively” index and search data on the order of terabytes or petabytes.
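As a rough illustration of the kind of PII/PCI detection many search engines lack, the Python sketch below flags US Social Security numbers written in the common AAA-GG-SSSS format and card numbers of 13 to 19 digits that pass the Luhn checksum. The patterns and the function names (`find_ssns`, `find_card_numbers`, `scan_document`) are illustrative assumptions, not any product’s API. Real classifiers also handle unformatted SSNs, driver’s license formats (which vary by state) and surrounding keywords to cut down on false positives.

```python
import re

# Assumed format: US SSNs written as AAA-GG-SSSS. Area numbers 000, 666 and
# 900-999, group 00 and serial 0000 are never issued, so they are excluded.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_ssns(text: str) -> list[str]:
    """Return SSN-formatted strings found in text."""
    return SSN_RE.findall(text)


def find_card_numbers(text: str) -> list[str]:
    """Return digit-only card numbers found in text that pass the Luhn check."""
    hits = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        # The Luhn check discards most random digit runs (order IDs, account numbers).
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def scan_document(text: str) -> dict[str, list[str]]:
    """Classify one document's text; an empty dict means nothing sensitive was found."""
    findings = {"ssn": find_ssns(text), "card": find_card_numbers(text)}
    return {kind: hits for kind, hits in findings.items() if hits}
```

The checksum step matters: a pattern that only counts digits would flag order numbers and account IDs, while validating the Luhn check digit filters most of those out before anyone has to review the hit.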
4. Enterprise Email & File Archival
Email and File Archival solutions have gained immense popularity over the past decade or so, thanks to their relatively low implementation and maintenance costs as well as the storage cost and risk reduction benefits they provide to an enterprise.
Email Archive systems provide the ability to “stub” aged or high-risk files and emails into an Email Archive and retrieve them “on demand”. Most Email Archive solutions are modestly priced, based on the number of “seats” (mailboxes) that are archived.
File archival systems, on the other hand, are typically part of Hierarchical Storage Management (HSM) solutions that differentiate high-cost storage from low-cost storage using the concept of tiers.
These solutions allow policy-driven, stub-based file migration across multiple tiers, letting organizations move less frequently accessed or less valuable information to cheaper storage while keeping it available to end users on demand. This can significantly reduce risk from a compliance standpoint and storage costs from an IT standpoint.
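Here is a minimal Python sketch of the kind of age-based policy an HSM product might apply. The `FileRecord` fields, the 180-day idle threshold and the tier names are assumptions for illustration. A real HSM works at the file-system level and leaves an actual stub file at the original path (a placeholder that triggers a recall when opened), rather than the dictionary entry used here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata an HSM scanner might collect for each file."""
    path: str
    size_bytes: int
    last_accessed: datetime
    tier: str  # e.g. "primary" or "archive"


def plan_migration(files, now, max_idle=timedelta(days=180), target_tier="archive"):
    """Return the files an age-based policy would move to target_tier.

    Assumed policy: any file not already on target_tier and not accessed
    within max_idle is migrated.
    """
    cutoff = now - max_idle
    return [f for f in files if f.tier != target_tier and f.last_accessed < cutoff]


def migrate(files, now, max_idle=timedelta(days=180), target_tier="archive"):
    """Apply the plan in memory and return {original path: tier holding the content}.

    The returned mapping stands in for the stubs a real HSM would leave behind.
    """
    stubs = {}
    for f in plan_migration(files, now, max_idle, target_tier):
        f.tier = target_tier
        stubs[f.path] = target_tier
    return stubs
```

Keeping the selection (`plan_migration`) separate from the action (`migrate`) makes it easy to review what a policy would move before actually running it.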
5. Do Nothing!
Unfortunately, as difficult as it may be to comprehend, this is what many companies still choose to do about unstructured information: they do nothing!
Many organizations are still maintaining the “status quo” in their strategy for handling unstructured information because they believe it is insignificant and therefore does not require management. And herein lies the basic problem.
Is the Unmanaged Content Worth Managing?
That raises the question: how much of this unmanaged content in the enterprise is really worth managing? As a matter of fact, empirical data suggests that most unstructured information IS *insignificant*. Take, for example, the following analyst report on storage usage in an enterprise: it indicates that only 20% of the data in the enterprise is in fact active and relevant!

While most files aren’t worth managing, the risk comes from the small number of files that do matter. For instance, your Sarbanes-Oxley policy and procedure manual, which took valuable internal resources, a consulting firm and many months to create, has likely been copied out of the content management system set up specifically for finance-related documents. The next time you update that manual with critical information, you will fulfill one aspect of the act by tracking and recording those changes in your records management system. But what about the dozens of copies that may have spread across shared file servers on the network? How can you be certain those copies are deleted or updated so that people don’t follow outdated procedures or controls? If you aren’t doing anything to manage that data, you are leaving your company exposed and vulnerable.
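One way to start answering “where are the copies?” is to hash file contents and compare them. The Python sketch below assumes the file shares are reachable as local or mounted paths; it walks a set of directory trees and returns every file that is byte-for-byte identical to a reference document such as the SOX manual. The function names are hypothetical. It only finds exact copies: a copy that someone edited, or saved in a different format, produces a different hash and would need near-duplicate detection instead.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_copies(reference_path, roots):
    """Return sorted paths under `roots` whose contents match reference_path exactly."""
    ref_real = os.path.realpath(reference_path)
    ref_size = os.path.getsize(reference_path)
    ref_hash = sha256_of(reference_path)
    copies = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.realpath(path) == ref_real:
                    continue  # the reference file itself is not a copy
                try:
                    # Compare sizes first; only same-sized files are worth hashing.
                    if os.path.getsize(path) == ref_size and sha256_of(path) == ref_hash:
                        copies.append(path)
                except OSError:
                    continue  # unreadable or vanished file: skip it and keep scanning
    return sorted(copies)
```

The size comparison is a cheap pre-filter: on a large share most files differ in size from the reference, so only a small fraction ever needs to be read and hashed.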
Recognizing & Remediating Valuable Information
Addressing these issues is key to an effective solution for Sarbanes-Oxley or any other information governance initiative. Obviously, doing nothing is not the answer. At the same time, it would be cost-prohibitive to manage all files as though they were critical business records. The crucial first step, therefore, is being able to specify which data is critical and worthy of this level of management. If you are aware of the data’s value, you can make educated decisions about the disposition of important data and create an appropriate retention policy. Determining data’s value, in turn, depends on effective information visibility and control.
- Information Visibility: The first step toward recognizing valuable data is making it visible. While your compliance office may have access to all corporate information across the network, the sheer amount of data makes technology necessary to find and manage the appropriate documents.
- Information Control: The second step is to effectively manage and control unstructured information. For this, you need a solution that not only lets you identify valuable information but also lets you “remediate” it, i.e., copy, move, delete or lock down the documents that contain it. Even better, the solution should provide an integrated policy engine that can be customized with your company’s information governance policies.
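To make the idea of an integrated policy engine more concrete, here is a minimal Python sketch of an ordered, first-match-wins rule list that maps document attributes to one of the four remediation actions named above (copy, move, delete, lock). The `Document` fields, the tag names and the seven-year retention period in `EXAMPLE_POLICY` are hypothetical. The sketch only decides what to do; actually copying, moving, deleting or locking files is left to the surrounding system.

```python
from dataclasses import dataclass, field
from typing import Callable

# The remediation actions a rule may request.
ACTIONS = frozenset({"copy", "move", "delete", "lock"})


@dataclass
class Document:
    path: str
    tags: set[str] = field(default_factory=set)  # e.g. labels produced by a content scan
    age_days: int = 0


@dataclass
class Rule:
    name: str
    matches: Callable[[Document], bool]
    action: str

    def __post_init__(self):
        # Reject typos such as "shred" when the policy is defined, not when it runs.
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action {self.action!r} in rule {self.name!r}")


def evaluate(doc, rules):
    """Return (rule name, action) for the first rule that matches doc, or None."""
    for rule in rules:
        if rule.matches(doc):
            return rule.name, rule.action
    return None


# Hypothetical example policy; order matters because the first match wins.
EXAMPLE_POLICY = [
    Rule("legal-hold", lambda d: "legal-hold" in d.tags, "lock"),
    Rule("secure-pii", lambda d: "pii" in d.tags, "move"),
    Rule("retain-finance-record", lambda d: "finance-record" in d.tags, "copy"),
    Rule("expire-stale", lambda d: d.age_days > 7 * 365, "delete"),
]
```

Because the first matching rule wins, rule order encodes priority: placing the legal-hold rule first ensures that a document under hold is locked rather than deleted, even if it is old enough to expire.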
Reducing Risk and Lowering Costs
Setting information governance policies fulfills a basic requirement. Actively managing your unstructured data, i.e., finding, tagging and moving content according to your corporate policies, lowers the risk that information will fall through the cracks and can help protect you from breaking the law. Creating a tiered storage system lets you set retention policies according to the value of the content, saving money and reducing risk. And proving compliance, or at least showing that you are attempting to comply, is sometimes the best way to meet and exceed current and future government regulations, not only around financial systems but around employee and customer privacy as well.
Final Analysis – Act Now!
In the end, visibility and control over your unstructured information reduce the risk of compliance violations, litigation exposure, untimely responses and privacy and security breaches, and they lower costs through streamlined storage operations, improved service levels and automated, policy-driven data management. An organization with better visibility into corporate data is also poised to save thousands, if not millions, of dollars by responding quickly to an investigation or corporate litigation with a streamlined Electronic Discovery (popularly referred to as eDiscovery) process. This underscores the need for corporations that haven’t already done so to quickly plan and invest in one or a combination of the aforementioned technologies that best serves their needs for managing unstructured information in their environment.
Just as no executive likes surprises in the boardroom, you don’t want any nasty surprises resulting from something you didn’t plan for. Don’t let something you didn’t know about, and consequently didn’t do anything about, hurt you! Act now: unstructured information is only going to keep growing, and doing nothing about it will only make matters worse.
What you don’t know CAN hurt you.
Coming Soon – eDiscovery Concepts & Industry Trends
With that, the stage is now set for introducing eDiscovery concepts and industry trends in litigation readiness, which I plan to address in my next post. I sincerely hope you enjoyed reading about this topic and learnt something new along the way. Please feel free to post your comments and feedback.
Stay tuned and till next time, Carpe Diem baby!
- Rakesh Nair

Good start… waiting for the next part
Thank you, Prabhjeet! My next post is due in 2 weeks.
-Rakesh