Many years ago, MANY years ago, I bought a PC with a 10 Megabyte hard drive.  All my friends were envious of me having that tremendous amount of storage. I would joke that I would never use that much storage and I have to admit that during the entire time I owned that PC, the hard drive never filled up.

Moore's law, strictly an observation that the number of transistors on a chip doubles about every two years, is often loosely extended to say that computing power, RAM speed, and hard disk size will increase by about 2 times every 2 years, and that pace has changed how we look at storage. While disk drive size may not have kept pace with the speed of processors or RAM, disks have reached a size that my friends and I would once have scoffed at. Today, a terabyte hard drive can be bought for under $100. If one wants a 1TB hard drive in a new PC, the increase in cost when ordering is minimal.
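To put the jump from my old 10 megabyte drive to a 1 terabyte drive in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units and takes the "doubles every 2 years" rule of thumb at face value; real disk capacities did not grow on such a neat schedule, so treat the result as an illustration only.

```python
import math

# Sizes in bytes, using decimal (SI) units.
OLD_DRIVE_BYTES = 10 * 10**6   # the 10 megabyte drive from the opening story
NEW_DRIVE_BYTES = 1 * 10**12   # a 1 terabyte drive


def doublings_between(small: int, large: int) -> float:
    """Return how many times `small` must double to reach `large`."""
    return math.log2(large / small)


growth_factor = NEW_DRIVE_BYTES // OLD_DRIVE_BYTES
doublings = doublings_between(OLD_DRIVE_BYTES, NEW_DRIVE_BYTES)

# Assumption: one doubling every 2 years, the rule of thumb quoted above.
years_at_two_year_pace = doublings * 2

print(f"growth factor: {growth_factor:,}x")
print(f"doublings needed: {doublings:.1f}")
print(f"years at one doubling per 2 years: {years_at_two_year_pace:.0f}")
```

In other words, the new drive holds one hundred thousand times as much as the old one, which works out to a little under 17 doublings.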

At this point we should probably discuss the definition of a byte of data so we can better understand what it means when we continue discussing the ever-increasing size of data storage.  We all know that a byte of data is 8 bits; these eight bits can represent one character of a word or expression. That means that a million-byte document contains a million characters, counting letters, spaces, and special characters. I realize that is a simplistic explanation but stay with me and it should make sense later.

The ten megabyte hard drive in my first paragraph could store ten million characters used to make up documents. It sounds like a lot even today, and it sure did at the time I purchased my system. We all know that the system hard drive doesn't just store documents but all sorts of binary files that make up the software. The documents and spreadsheets that we create haven't grown much in size. So we should ask the question: why do we have these huge hard drives? The follow-up question is: how come we keep running out of space?

The world has become wrapped up in saving anything and everything in the electronic space.  We have become accustomed to putting all our information in electronic form. An example of this is the pictures we have of our family from numerous occasions. We take them with a digital camera, then offload them to our computer for storage.

We can now view our favorite pictures with just a few clicks. Speaking for myself, even though it is convenient to view, print, or edit the pictures, the collection is still rather cluttered. When I am looking for a particular picture, I find myself viewing page after page of pictures to find the one I want, even though I keep the pictures in folders that somewhat describe the event where they were taken. The point here is that we take more pictures than we used to because they are recorded on a memory device instead of on film. We can view a picture right after it is taken without the expense of having it developed.  As we take more pictures and store even the ones that we may never use, we end up with thousands of pictures to sift through when we look for one in particular.

Our cameras can use multi-gigabyte memory cards, so even if we use a very high resolution format we can save thousands of photos. I have stopped adding up the number of gigabytes of photos I have on my home computer. I also store them on a 1 terabyte USB hard drive as a backup just in case something happens to my computer. So if we had 10 gigabytes of photos, the storage requirement doubles to 20 gigabytes once the backup is counted.

The second reason that we have been getting larger and larger hard drives is that the operating software needed to make the pile of metal, silicon, and other materials become a computer is growing at an amazing pace. The operating systems that we use today couldn't even be loaded onto my old system. The truth is my old system's hard drive wouldn't even come close to matching the amount of RAM needed to run today's "enhanced", "feature rich" operating systems.

We haven't even touched on the applications, virtual memory, or security software needed to perform our daily tasks. All of these necessities add to the amount of disk space needed to make the computer operate, much less store the things we create each day. All of these applications also contribute to the size of the backup space needed in the "rare" event of a system failure. And so far we have only discussed the desktops or workstations sitting at our desks.

When we begin to discuss the systems needed to create and operate a local area network, the storage requirements become much greater. Every system needed to run a network has storage space requirements for its operating system, configurations, and security. Some of these systems need to be redundant to ensure the network will continue to function in case of a system failure. We could go on and on about systems and redundancy but that is for another day, another blog. All these systems need backup disk space, which keeps growing as the "features" keep being added.

Earlier we started to discuss the definition of a byte at a very low level. In order to understand the “data explosion” we have some more terms we need to cover. We know that a megabyte is a million bytes of data, a gigabyte is a billion bytes of data, and a terabyte is a trillion bytes of data.  We utilize these numbers on a daily basis while dealing with computers and storage.

Do we know what comes after terabyte? The next one is a petabyte, which is a quadrillion bytes of data, or a million gigabytes. The one after that is an exabyte, then a zettabyte, then a yottabyte, the last one being slightly humorous if one thinks about it. A yottabyte is a trillion terabytes of data. There are higher numbers but for the purpose of this discussion, I am only talking about these four.
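For anyone who wants to see the ladder of units laid out, here is a small sketch using the decimal (power of ten) definitions used in this post. The names `UNIT_EXPONENTS` and `convert` are my own, made up for illustration; operating systems often report sizes in binary units (powers of 1024) instead, which this sketch deliberately ignores.

```python
# Decimal (SI) storage units, stored as the power of ten they represent.
UNIT_EXPONENTS = {
    "megabyte": 6,
    "gigabyte": 9,
    "terabyte": 12,
    "petabyte": 15,
    "exabyte": 18,
    "zettabyte": 21,
    "yottabyte": 24,
}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert `amount` from one unit to another, e.g. petabytes to gigabytes."""
    bytes_total = amount * 10 ** UNIT_EXPONENTS[from_unit]
    return bytes_total / 10 ** UNIT_EXPONENTS[to_unit]


for name, exponent in UNIT_EXPONENTS.items():
    print(f"1 {name:<10} = 10^{exponent} bytes")

# A petabyte is a million gigabytes; a yottabyte is a trillion terabytes.
print(convert(1, "petabyte", "gigabyte"))
print(convert(1, "yottabyte", "terabyte"))
```

Each step up the ladder is a factor of one thousand, so the exponent grows by three each time.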

According to a report by the International Data Corporation (IDC), the total amount of data in the world in 2009 was 800 exabytes. It is expected to rise to 35 zettabytes by the end of 2020; this is 35 x 10^21 bytes, or 35 trillion gigabytes. The report also stated that 11 zettabytes of this data will be stored in the cloud.
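The figures quoted from the report are easier to appreciate with a little arithmetic. The sketch below uses only the three numbers given above (800 exabytes, 35 zettabytes, and 11 zettabytes) with decimal units; the variable names are mine, and the growth factor and cloud share are simply derived from those numbers, not taken from the report itself.

```python
# Decimal (SI) unit sizes in bytes.
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21

data_2009 = 800 * EXABYTE      # total data in 2009, as quoted
data_2020 = 35 * ZETTABYTE     # projected total by the end of 2020
cloud_2020 = 11 * ZETTABYTE    # portion projected to be stored in the cloud

growth_factor = data_2020 / data_2009
gigabytes_2020 = data_2020 // GIGABYTE
cloud_share = cloud_2020 / data_2020

print(f"2020 total in bytes: 35 x 10^21 = {data_2020:,}")
print(f"2020 total in gigabytes: {gigabytes_2020:,}")
print(f"growth from 2009 to 2020: about {growth_factor:.2f}x")
print(f"share stored in the cloud: about {cloud_share:.0%}")
```

So the projection amounts to roughly a 44-fold increase, with a little under a third of the total living in the cloud.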

This massive explosion of data within the next 9+ years causes many people to pause and ask some intriguing questions. Why do we save everything? Is it because of regulatory compliance?  Is it because everything is being digitized and then saved? I believe all of the above are reasons the explosion is occurring.

Then the questions that need to be answered are: How do we find things that are useful in all that clutter (remember my picture story)? How do we ensure that the data that is truly important is backed up? How do we secure the rapidly growing information within our own companies? Are the storage vendors watching this trend? I would bet money they are, and that they are working diligently to provide products that will answer these questions.

A very knowledgeable storage expert I was talking to once said, "If the data is stored or backed up but it is extremely difficult to find the information you need, the data is just a blob of binary digits."  I believe that we as IT professionals, whether administrative, database, or security, need to stay on top of this so we can enable our end users to deal with information and not just a blob.

As security professionals we need to ensure that confidentiality, integrity, and availability are maintained. This can be a daunting task with the rapid growth of data. We will need to ensure that the data owners of the different silos of data understand the importance of maintaining the Access Control List (ACL) for their data. Another aspect of confidentiality is monitoring the data to verify that users who aren't given full access work within the access they were granted.
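To make the ACL idea a bit more concrete, here is a toy sketch of a data silo whose owner grants permissions, and which records every access request so that attempts outside a user's granted access can be reviewed. Everything in it (the `DataSilo` class, the user names, the action names) is hypothetical and invented for illustration; real systems rely on the access controls built into the operating system, database, or storage product.

```python
from dataclasses import dataclass, field


@dataclass
class DataSilo:
    """A hypothetical data silo with an owner-maintained ACL and an audit log."""
    name: str
    owner: str
    acl: dict = field(default_factory=dict)        # user -> set of allowed actions
    audit_log: list = field(default_factory=list)  # (user, action, allowed) tuples

    def grant(self, user: str, *actions: str) -> None:
        """The data owner adds actions to a user's entry in the ACL."""
        self.acl.setdefault(user, set()).update(actions)

    def request(self, user: str, action: str) -> bool:
        """Check a request against the ACL and record the outcome."""
        allowed = action in self.acl.get(user, set())
        self.audit_log.append((user, action, allowed))
        return allowed

    def denied_attempts(self) -> list:
        """Return the (user, action) pairs that fell outside granted access."""
        return [(user, action) for user, action, ok in self.audit_log if not ok]


silo = DataSilo(name="patient_records", owner="records_manager")
silo.grant("alice", "read", "write")
silo.grant("bob", "read")

print(silo.request("bob", "read"))    # within bob's granted access
print(silo.request("bob", "write"))   # outside bob's granted access
print(silo.denied_attempts())
```

The point of the audit log is the monitoring half of the story: granting access is only part of the job, and someone still has to look at who tried to step outside it.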

Many of the regulations require data at rest to be encrypted to protect the data in the event of a breach. Encryption of the burgeoning data can be an issue due to sheer size. This is an issue when the encryption keys need to be changed periodically while the data is needed for daily business. The tools that will be used to ensure the data is accurate and unmodified by an unauthorized user will need to be resilient.
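On the integrity side, one common building block is a cryptographic hash: record a fingerprint of the data while it is known to be good, then recompute and compare it later. The sketch below uses SHA-256 from Python's standard library; the function names and sample data are my own, and it is only an illustration. A plain hash only helps if the recorded fingerprint is itself kept somewhere an attacker cannot change it.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected: str) -> bool:
    """Compare the data's current fingerprint with the recorded one."""
    return hmac.compare_digest(fingerprint(data), expected)


# Hypothetical sample record, made up for this example.
original = b"account=1001;balance=250.00"
baseline = fingerprint(original)   # recorded while the data is known to be good

tampered = b"account=1001;balance=950.00"

print(is_unmodified(original, baseline))   # unchanged data matches the baseline
print(is_unmodified(tampered, baseline))   # a one-character change does not
```

For very large data sets the same idea is usually applied file by file or block by block, so that a single change does not force rehashing everything.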

Sometimes the most difficult tenet of the security triad is availability; security needs to ensure that secure methods are used when large blocks of data are transmitted in order to satisfy a business need. Secure availability will be especially important to health care facilities because of the push toward Electronic Medical Records (EMR) to allow easier access to patients’ data.  HIPAA's security rule will come into play for all doctors' offices that are moving to EMR.

Data management will be one of the significant issues to face us in the coming decade. The regulations for HIPAA, privacy, and PCI will be driving forces for how data is stored, accessed, and transmitted. The amount of data will continue to grow, as the IDC study projects, and our job as security professionals is to protect it.