The Emergence of a New Architecture for Long-term Data Retention
Last Update: Jun 03, 2014 | 03:33
Originating Author: David Floyer
Source: http://wikibon.org/
Introduction - The Emergence Of A New Architecture For Long-Term Data Retention
The economics of IT has been discussed ad nauseam over the last decade. The issue is that a large percentage of the overall IT budget is consumed in order to maintain the status quo with little to no dollars being spent on innovation or new capabilities that drive business growth and shareholder value. New technologies and capabilities are unveiled every year that tout their economic value in order to get CIOs to pick their heads up from their smart phones to make a purchasing decision. A few examples are server virtualization, data deduplication, and now converged infrastructure solutions.
Each technology came with the promise that by implementing the new capability, IT would save money. In many instances, these technologies did help a specific segment of the infrastructure but may have hurt others. At the end of the day, very few of these new capabilities actually put IT departments in a position to change their investment strategy to focus on innovation. Rather, they just moved money around.
Storage is one area within IT where costs are rising, driven by 50% year-over-year data growth, and where deploying the right innovation can save more than 50% of the storage budget. Several new storage technologies have come on the market with the promise of saving IT money. However, a concept coined by Wikibon in 2012 called "Flape", a combination of flash and tape, has real potential. As Figure 1 shows, using this combination of technologies for long-term archiving can cut costs by roughly a factor of three (on the order of 300%) relative to conventional disk-based approaches over the course of 10 years.
Storage Technology Fundamentals
This section is a technology deep dive into the dilemma created by traditional disk drives in data centers: capacity is growing at roughly 20% per year, while the rate at which data can be read off a disk is growing at only about 10%. In a word, data is going to disk to die. Feel free to skip ahead to the next section if you already understand the fundamentals of this problem.
The concept behind the Flape architecture is to place the most active data as well as the metadata (the data about the data) on flash and the rest of the (cold) data on tape. The combination of flash and tape provides IT with the right balance of performance and cost for a number of use cases. Before we get into details, let’s discuss why Wikibon believes this combination of technologies can have a profound impact on storage budgets, and on the overall storage market.
The first issue lies with the HDD manufacturers themselves, and the second is the disruption to the HDD business caused by flash. Two characteristics that are important in the HDD space are areal density and performance. The areal density of an HDD is defined by the following:
- Disk Areal Density = Linear Density x Tracks/mm
Previously, areal density was improving by about 35%/year (roughly doubling every two years) because Moore's Law applied to the read/write head. However, because of flash technologies driven by the consumer market, the market for disk drives in PCs has gone to hell in a hand-basket. As a result, vendors are making no investment in disk technology beyond capacity, whose growth has itself slowed from 35%/year to 20%/year. Linear density and tracks/mm are therefore each growing at about the same rate: √(1 + Areal Density CGR) − 1 ≈ 9.6%/year.
Disk bandwidth is defined by the following equation:
- Disk Bandwidth = Linear Density x Rotational Speed
Increasing spinning disk speed is difficult and expensive. Today's maximum rotation is 15,000 RPM, but the most cost-effective speed for capacity is 7,200 RPM, with little prospect of increasing. Smaller disks could spin faster, but no one is investing in new disk technologies to accomplish this. In theory you could increase the number of heads by having more arms - but there is no investment there either.
Therefore rotational speed will remain constant. So:
- Disk Bandwidth CGR = Linear Density CGR = √(1 + Areal Density CGR) − 1 ≈ 9.6%/year (with areal density growing at 20%/year).
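A minimal sketch of that arithmetic, assuming the 20% areal-density CGR quoted above and that linear density and tracks/mm improve at equal rates:

```python
# Sketch of the growth arithmetic above; the 20% areal-density CGR is the
# article's figure, and we assume linear density and tracks/mm improve at
# equal rates.

def component_cgr(areal_cgr: float) -> float:
    """Per-component growth rate implied by an areal-density CGR."""
    return (1.0 + areal_cgr) ** 0.5 - 1.0

disk_areal_cgr = 0.20
disk_bandwidth_cgr = component_cgr(disk_areal_cgr)   # bandwidth tracks linear density only
print(f"Implied disk bandwidth CGR: {disk_bandwidth_cgr:.1%}")   # ~9.5%, quoted as 9.6% above
```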
The capacity available on an HDD is approaching its practical limits. The reality is that HDDs cannot spin any faster, no more heads can be put on an HDD, nor can more platters be added to an HDD (at least under current design constraints).
This formula tells us that as HDDs grow in capacity, getting data off them becomes more difficult. Imagine trying to get data off of a 10 terabyte HDD. Now take that a step further and imagine the rebuild time in a RAID set for a 10 TB HDD. Further, given that HDD MTBF (mean time between failures) isn't getting any better, the likelihood of a second drive failing during a rebuild grows as capacity, and hence rebuild time, grows. Additionally, there is very little investment being put into HDDs to change this. At the end of the day, the linear density of disk, and therefore its bandwidth, is only growing at about 9.6%.
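To make that concrete, here is a back-of-the-envelope estimate; the 200 MB/s sustained rate is an assumed figure for a 7,200 RPM capacity drive, not one from the article, and real RAID rebuilds are slower still because the array keeps serving other I/O.

```python
# Back-of-the-envelope read-out time for a 10 TB HDD.
# The sustained rate is an assumption for a 7,200 RPM capacity drive.
capacity_tb = 10
sustained_mb_per_s = 200

seconds = capacity_tb * 1_000_000 / sustained_mb_per_s   # 10 TB expressed in MB
print(f"Best-case time to read {capacity_tb} TB once: ~{seconds / 3600:.0f} hours")  # ~14 hours
```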
On the other side of the equation is SSD, or flash disk. Flash/SSD performance increases linearly with capacity, offers better storage utilization and better environmentals (power, cooling, floor space), and can help lower overall software licensing costs. Figure 2 represents the overall savings IT can achieve when using all-flash storage with atomic writes, NVM compression, and in-line deduplication.
Comparing column 1 to column 3 in Figure 2 (an illustrative cost sketch follows this list):
- 38% lower license costs:
- Fewer cores, lower database maintenance, lower infrastructure software costs.
- Higher storage costs:
- More than 50% higher, but offset by lower maintenance and simplified management.
- 17% Fewer Servers:
- Fewer cores, fewer network connections, lower maintenance.
- 74% lower environmental costs:
- Power, space, cooling, etc.
- 35% lower operational costs.
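A rough sketch of how these deltas might combine; only the percentages come from the list above, while the baseline dollar figures are hypothetical placeholders chosen purely for illustration.

```python
# Illustrative only: percentage deltas are from the Figure 2 comparison above;
# the baseline dollar figures are hypothetical placeholders.
baseline = {              # hypothetical multi-year costs for the disk-based configuration ($K)
    "licenses": 1000, "storage": 400, "servers": 600, "environment": 200, "operations": 500,
}
deltas = {                # column 1 vs. column 3 deltas quoted above
    "licenses": -0.38, "storage": +0.50, "servers": -0.17, "environment": -0.74, "operations": -0.35,
}
all_flash = {k: cost * (1 + deltas[k]) for k, cost in baseline.items()}
saving = 1 - sum(all_flash.values()) / sum(baseline.values())
print(f"Disk baseline: ${sum(baseline.values()):,}K   All-flash: ${sum(all_flash.values()):,.0f}K   Saving: {saving:.0%}")
```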
Of course, not all data should be stored on SSD, since it is too expensive, especially given how the value of data declines over time. The question becomes where to place the "cold" data. Low-cost HDD isn't the answer: as discussed above, its utilization, environmental, and operational costs, especially for large data sets, are extremely high relative to how this data will actually be used. The question is, "Is there a better alternative for storing this data while meeting the business requirements?"
Tape Never Died
Tape has always been the best long-term storage medium when it comes to cost (1/5th the cost of disk), as shown in Figure 3, but it also has many other characteristics in which it outperforms disk.
Enterprise tape is 100,000 times more reliable than consumer disk (see Table 1 in Footnotes) and has greater longevity. It is portable, its areal density is growing faster, and vendors are making significant investment in tape today, more than in HDDs. The next question is whether tape can achieve the performance required to meet business objectives. Applying the same areal density formula used above to tape shows:
- Tape Areal Density = Linear Density x Tracks/mm
- Tape Bandwidth = Linear Density x Tape Speed x Tracks Read in Parallel, therefore:
- Tape Bandwidth CGR ≅ Areal Density CGR = 30% (assuming tape speed remains constant)
This tells us that the ability to get data off tape is improving much faster than it is for disk: tape areal density, and with it tape bandwidth, is growing at approximately 30%/year, versus disk bandwidth growth of only 9.6%/year. Projected over time, that gap compounds until data can be streamed off tape roughly 4x faster than off HDD. This is a relative number, so Figure 4 projects it out; a small sketch of the compounding follows.
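A minimal sketch of how that growth gap compounds, using the 30% and 9.6% CGRs from the text; the starting point (parity) and the projection horizons are assumptions, since Figure 4 is not reproduced here.

```python
# How the tape vs. disk bandwidth growth gap compounds, starting from parity.
# The 30% and 9.6% CGRs come from the text; the horizons are assumptions.
tape_cgr, disk_cgr = 0.30, 0.096

for years in (5, 8, 10):
    advantage = ((1 + tape_cgr) / (1 + disk_cgr)) ** years
    print(f"After {years:>2} years, tape bandwidth leads by ~{advantage:.1f}x")
# Under these assumptions, the ~4x figure corresponds to roughly an 8-year projection.
```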
Time To First Byte Vs. Time To Last Byte
How can tape be 4x faster than disk? This is an important concept and makes a real difference when architecting a flape infrastructure to meet performance requirements. It is what we call "time to first byte" versus "time to last byte". The formulas above show that streaming data off tape is faster than streaming it off disk, with one caveat: once you get to the first byte of data, tape is faster, but getting to that first byte is the trick. If reaching that first byte means you need to go get the tape, mount it in the library, and index it before you can begin looking for the data, then disk wins, because its time to first byte is far shorter. However, if the tape is already in the library, mounted and indexed, then the time to first byte is greatly reduced.
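A simple model makes the trade-off concrete: total retrieval time is time to first byte plus object size divided by streaming bandwidth. The latencies and bandwidths below are illustrative assumptions, not measured or vendor figures.

```python
# Total retrieval time = time to first byte + size / streaming bandwidth.
# All numbers are illustrative assumptions for a mounted, in-library tape vs. an HDD.

def retrieval_seconds(size_gb: float, first_byte_s: float, mb_per_s: float) -> float:
    return first_byte_s + size_gb * 1000 / mb_per_s

disk = dict(first_byte_s=0.01, mb_per_s=180)    # assumed HDD: near-instant first byte
tape = dict(first_byte_s=90.0, mb_per_s=360)    # assumed tape: locate/position dominates

for size_gb in (1, 50, 500):
    d, t = retrieval_seconds(size_gb, **disk), retrieval_seconds(size_gb, **tape)
    winner = "disk" if d < t else "tape"
    print(f"{size_gb:>4} GB: disk ~{d:6.0f}s, tape ~{t:6.0f}s -> {winner} wins")
# Small objects favor disk; large streaming reads favor tape once its higher
# bandwidth amortizes the time to first byte.
```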
Figure 5 shows application spaces as a function of "time to first byte" and "time to last byte". What Figure 5 shows is that when the time to both first byte and last byte must be low (imagine a small medical record needed in the ER), flash/disk is the better solution. If time to first byte is not critical, flash/tape is the lower cost solution. When time to last byte is longer (imagine pulling in all of a patient's medical records with all the X-rays, angiograms, CAT scans, etc.), flash/tape is the better solution.
The question is, what can the industry do to enable a better time to first byte? This is where we see the investment going into tape, and more needs to happen.
Linear Tape File System - LTFS
In 2008/2009 IBM invented a technology called LTFS, the Linear Tape File System. Today, LTFS is an open, self-describing tape format that allows tape to look like a file system. The advent of LTFS has added the benefit of simplicity to tape, making it easier to use and leverage and making it more ubiquitous and accessible to a broader audience. Because tape now looks like a file system, and because most implementations of LTFS include a policy engine (different vendors offer different capabilities in their versions), cold or "stale" data can be:
- Quickly and easily migrated to tape,
- Quickly found and recalled from tape,
- Easily managed on tape.
By leveraging LTFS and integrating it as a part of the data lifecycle, businesses have the ability to make sure they are achieving one core metric: storing data of lower business value on the lowest cost storage medium possible.
LTFS is typically implemented so that data tapes are written in the LTFS format and can be used independently of any external database or storage system, allowing direct access to both file content and file metadata. LTFS sits beside a disk-based file system, and data is migrated to LTFS based on policy, typically once usage becomes low; a minimal sketch of such a policy follows.
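A minimal sketch of an age-based migration policy, assuming LTFS is mounted as an ordinary file system; the mount points and the 180-day threshold are hypothetical examples, not any vendor's defaults.

```python
import os
import shutil
import time

# Hypothetical mount points: an active disk/flash file system and an LTFS mount.
DISK_POOL = "/mnt/disk_pool"
LTFS_POOL = "/mnt/ltfs_pool"
STALE_AFTER_DAYS = 180          # example policy: not accessed for six months

def migrate_stale_files() -> None:
    """Move files not accessed since the cutoff onto the LTFS (tape) mount."""
    cutoff = time.time() - STALE_AFTER_DAYS * 86400
    for root, _dirs, files in os.walk(DISK_POOL):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getatime(src) < cutoff:
                rel = os.path.relpath(src, DISK_POOL)
                dst = os.path.join(LTFS_POOL, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)   # because LTFS looks like a file system, a move is enough

if __name__ == "__main__":
    migrate_stale_files()
```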
A good example is a media and entertainment company. While a company works on files that are a part of a movie, this data lives on active storage (HDD or SSD). As the movie is produced and released, the data that makes up the movie is then migrated to lower cost storage, tape, via LTFS. This is good because with LTFS the metadata is readily available to the consumers/knowledge workers of the data. So when “part 2” of the movie is being worked on, video editors can easily and quickly grab data from the first movie that they may want to use in the second, perhaps for a flashback. This keeps the work process flowing very smoothly. LTFS changes the dynamic from application or device performance to true work performance which, in this case, has real business value.
Another important aspect of LTFS is that most vendors implement a very simple and smart pricing model. The software is licensed by the relative size of the configuration, small or large, not by the terabyte, which tends to be a big challenge for customers. Software priced on a per-TB license gets more expensive to maintain as the environment grows; a unit price ensures that software and maintenance costs don't grow, which helps keep long-term data retention costs low over time. As a tape environment grows, the only real growth costs are the number of tapes needed for capacity and perhaps the number of libraries, which allow control over the "time to first byte" characteristic by keeping more tapes in the library.
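A small sketch of why that matters over time; the prices, starting capacity, and license terms below are purely hypothetical, with only the 50% data-growth figure echoing the rate cited earlier in the article.

```python
# Hypothetical comparison of per-TB vs. flat configuration-based licensing.
PER_TB_PRICE = 100        # assumed $/TB/year license plus maintenance
FLAT_PRICE = 20_000       # assumed flat annual license for a "large" configuration
capacity_tb = 200         # assumed starting archive size
GROWTH = 0.50             # 50% annual data growth, as cited earlier in the article

per_tb_total = flat_total = 0.0
for _year in range(10):
    per_tb_total += capacity_tb * PER_TB_PRICE
    flat_total += FLAT_PRICE
    capacity_tb *= 1 + GROWTH

print(f"10-year per-TB licensing: ${per_tb_total:,.0f}")
print(f"10-year flat licensing:   ${flat_total:,.0f}")
```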
Flash/SSD
Today, flash/SSD is still an expensive storage technology. However, we have discussed its overall economic benefits: it not only meets the performance needs of today, but will grow and scale into the future, saving the business money in the long term. Figure 4 shows flash/SSD bandwidth projections growing at 43%, faster than both disk and tape. Combining this with Figure 3, which shows the cost of flash/SSD converging with the cost of disk over time, explains the disruption we are seeing today.
In summary, what does all this mean? The projected growth in disk areal density is 20% (from a number of projections), but disk bandwidth improvement comes only from the linear density component, since track density does not affect bandwidth.
Meanwhile, the higher investment in tape heads is driving tape areal density up at 30%/year. Tape bandwidth gains come from both increased linear density and an increased number of heads reading across the tape, hence a 30% improvement in tape bandwidth.
For flash/SSD technology, bandwidth increases directly as a function of density. The strong consumer-driven investment in flash is improving density, and therefore bandwidth, at 43%. At the end of the day, flash is getting faster and less expensive, tape is getting faster and is already inexpensive, and HDDs are not getting much faster and, given how they must be deployed (RAID protection and so on) to store data reliably, are very costly.
Making Flape Even More Usable
One way IT is solving the "time to first byte" challenge is by using flash to manage the metadata for the data that sits on tape, making tape even more effective. Keeping the metadata on flash/SSD, so the system knows exactly where the data lives on tape, speeds up the process of retrieving it. The combination of flash and tape allows IT to meet performance objectives for the most active data while achieving the best overall $/IOP, and using tape as the bulk storage medium delivers the best overall $/GB for long-term data retention. Both tiers are simple to manage, provide much higher utilization, and carry much lower environmental costs, all of which save money and free up time for IT to work on projects that drive innovation.
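A minimal sketch of the idea: a small, flash-resident catalog that records where each cold object lives on tape, consulted before any cartridge is mounted. The field and function names are illustrative, not any product's API.

```python
from dataclasses import dataclass

@dataclass
class TapeLocation:
    tape_barcode: str     # which cartridge holds the object
    in_library: bool      # is the cartridge already in a library slot?
    file_path: str        # path within the LTFS file system on that tape
    size_bytes: int

# Hot metadata lives on flash/SSD: tiny records, fast lookups.
catalog: dict[str, TapeLocation] = {}

def recall(name: str) -> TapeLocation:
    """Find where an object lives before touching any tape hardware."""
    loc = catalog[name]
    # A real system would now mount loc.tape_barcode if needed, then read
    # loc.file_path from the LTFS mount; the lookup itself never touches tape.
    return loc

catalog["final_cut.mxf"] = TapeLocation("A00123L6", True, "/movies/final_cut.mxf", 20_000_000_000)
print(recall("final_cut.mxf"))
```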
Figure 5 above shows examples of how to leverage flash, disk and tape technologies to meet specific business requirements. The chart helps to explain where data can live based on access requirements.
The concept of flape is very real, and a number of customers have deployed the combination of flash and tape in their environments today. IT should consider two things when looking at flape technology. First, while a number of ISVs have worked with various vendors to integrate their software more seamlessly with LTFS, more work needs to be done. For straightforward file system data moves based on simple policies, such as the age of a file or its last-touched date, a standard LTFS implementation fits most needs. However, if there are other parameters, a flape environment may be trickier to implement. One example is a media company whose value-add is managing metadata to perform lifecycle management of its customers' media data. It has a cloud-based solution to support the majority of its customers, who want to monetize their video assets or preserve those assets for foreseeable or unforeseeable purposes. The media company is required to always keep two copies of the content and to provide this data over the cloud. Its tool gives the content owner the ability to identify and pull out subsections of a 20TB movie file in order to create a 30-second promo. The company needed a way to manage this data in an elegant, cost-efficient way, and it ended up building its own metadata manager on top of LTFS that allows it to find data inside millions of movie files.
Each industry will have its own specific requirements for metadata management. Over time, we believe a variety of ISVs, vendors, or even partners will develop software and tools that live on top of the flape (with LTFS) solution, enabling users to get at the data faster. LTFS itself needs to advance, for example by supplying RESTful APIs, so that these new middleware layers can have better access to the data.
The other thing to keep in mind about a flape infrastructure is that a lot of the common “storage services” that are available today for storage arrays just aren’t available for tape. This is where the whole software-led storage conversation begins to make a lot of sense. The ability to abstract storage services such as:
- Replication,
- Encryption,
- Copy Management,
- Compression / Deduplication and
- REST APIs
from the array, and to leverage them across the storage infrastructure so that these services are common from platform to platform, will be key to ensuring that a flape architecture can provide even greater value to the enterprise. Wikibon believes that as the software-led data center and software-led storage become more commonplace, these storage service capabilities will be available to all devices in the infrastructure and offered in a common way that provides seamless data management across the enterprise.
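A hedged sketch of what abstracting those services away from the array could look like: a common interface that disk, flash, and tape (LTFS) backends all implement, so policy code is identical from platform to platform. The class and method names are illustrative, not any vendor's API.

```python
from abc import ABC, abstractmethod

class StorageService(ABC):
    """Common storage services, independent of the underlying tier."""
    @abstractmethod
    def replicate(self, obj_id: str, target_site: str) -> None: ...
    @abstractmethod
    def encrypt(self, obj_id: str) -> None: ...
    @abstractmethod
    def compress(self, obj_id: str) -> None: ...

class LTFSBackend(StorageService):
    def replicate(self, obj_id: str, target_site: str) -> None:
        print(f"writing a second copy of {obj_id} to tape at {target_site}")
    def encrypt(self, obj_id: str) -> None:
        print(f"enabling drive-level encryption for {obj_id}")
    def compress(self, obj_id: str) -> None:
        print(f"enabling drive compression for {obj_id}")

def protect(backend: StorageService, obj_id: str) -> None:
    """The same policy code runs no matter which tier holds the data."""
    backend.compress(obj_id)
    backend.encrypt(obj_id)
    backend.replicate(obj_id, "dr-site")

protect(LTFSBackend(), "archive/2014/records.tar")
```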
The figure above makes reference to six common use cases for flape:
- Archiving,
- Backup,
- Long-term Retention,
- Scale-out NAS,
- Cloud, and
- Tiered Storage.
The overall architecture has a few characteristics that support these use cases very well. First, high performance. By leveraging flash/SSD, each use case avoids the typical tape performance barriers. Putting the most frequently accessed data and the metadata on the flash/SSD tier gives users and applications fast access, enabling faster backups and restores as well as quick access to data that may live in the cloud.
Second, low total cost of ownership, or TCO. By taking advantage of the tape layer, use cases like archiving, long-term retention, and tiered storage (where 70+% of the data is stale) can live on a low-cost storage medium. There really is no comparison between tape and disk, or even deduplicated disk, in this case. Deduplication arrays do a good job for workloads with a great deal of redundant data added day after day, such as operational backup, but for longer-term storage at scale, tape is unmatched. Deduplication arrays are more expensive: the deduplication software commands a premium, and the data migrations and multiple systems required as the environment scales cost even more.
Lastly, tape file systems based on industry standards allow ISVs to write to these platforms, making it much easier to adopt new capabilities. There is no comparable standard for deduplication, so deduplication arrays make adoption of those capabilities more difficult and introduce inflexibility into the environment.
Blind Alleys
Some of the objections to tape and flape come from a misunderstanding and misapplication of the technology. For example, disk-based de-duplication systems such as Data Domain from EMC and StoreOnce from HP have been very successful in solving the problems of rapid recovery from backup. For many organizations, backup has been used as a type of archive and method of last resort for recovering data. The argument goes that backup appliances should be the model for long term retention, as the data is already stored in the backup and therefore will not take any more space if the backup system is expanded to include long-term retention.
There is a huge difference between the optimal structure for backups and the optimal structure for long-term retention. Both may include compression and de-duplication, but the fundamental access methods, access requirements and metadata to support (for example) compliance and provenance are completely different. Using a backup structure for long-term retention is not a strategic option for allowing access to and mining of long-term data that is infrequently accessed.
Conclusion And Recommendations
The "missing links" in designing flape systems are the file and meta-data systems that will support this environment. Oracle is investing in this area, with a long history of innovation in tape and tape libraries. IBM, HP and Spectra are also in the tape business.
Some companies that Wikibon interviewed have developed their own "flape" solutions to this problem, such as in the media library area. The business benefit has been the ability to reuse and repurpose film and similar assets. Other areas such as storing very large problem datasets in simulation and oil exploration already have tape solutions well established. Future designers of Server SAN software as part of software-led infrastructure should include tape and tape libraries as media alternatives. Future developers of archive, backup and Big Data systems should allow for higher performance and lower cost systems by utilizing flape.
Wikibon recommends that CIOs and CTOs seriously consider flape architectures to reduce the cost of long-term retention of data and as a fundamental way to enable the business to turn unmanageable data into a true business asset.
Tape is not dead and never was dead. By the same token, it's important to remember that disk is NOT dead either. Magnetic media will continue to be the low cost method for holding data for the foreseeable future. Disk will continue to be an important part of a storage hierarchy, especially for legacy systems and new systems where small objects are needed infrequently but quickly. Vendors who claim that either tape or disk is dead are doing a disservice to their customers.
Action Item: CIOs & CTOs should seriously consider a flape architecture that helps meet the performance requirements of live, active data (best $/IOP) and the budget requirements for long-term data retention (best $/GB). Given the investments being made in flash/SSD, tape and HDD and the costs of each solution over time, a flape architecture makes sense, especially when looking at the manageability and environmentals of the solution. As shown in Figure 1 above, flape solutions can enable IT to turn unmanageable, infrequently accessed "big data" into a true business asset.
Footnotes: