Most users have no idea that reading electronic information from a data storage medium like a hard disk drive (HDD) or solid state drive (SSD) is plagued with read errors. For this reason error correction codes (ECC) are used to fix the random bit errors that arise during the reading process before the incorrect data is returned to the user. But the error correction codes can only handle so many errors at one time. If data errors exceed the ECC limits, the data goes uncorrected and is lost forever. More recent ECC algorithms like the LSI SHIELD error correction technology go a lot farther to protect user data than prior solutions.
What happens to the data when the ECC fails?
If the ECC fails, only a backup protection mechanism will recover the data. There are three alternatives. First, users should always back up their critical data since ECC failure and other threats can destroy data or render it inaccessible such as natural disasters (earthquakes, tornadoes, flooding etc.) that cause heavy damage to buildings and their contents, lightning overloading that can burn up a computer without adequate electrical protection, and of course computer theft. Any backup system should be either automated or at least run consistently if it is manual. Industry reports cite that less than 10% of computer users back up their data. That is not very comforting.
The second solution is to employ a RAID (Redundant Array of Independent Disks) array that uses multiple storage devices with one or more of the drives acting as a parity device to provide redundancy. That way if one drive fails, the redundant drive provides enough parity information to restore the original data. This type of system is very common in enterprise environments—a work computer—but hardly used in home systems or laptop PC.
Is the third solution simple, automatic, and operable in a single-drive environment?
Yes. Yes. And Yes. LSI® SandForce® flash and SSD controllers have a feature called RAISE™ data protection that meets all of these needs. Introduced in 2009 with the first SandForce controller, RAISE technology stands for Redundant Array of Independent Silicon Elements. It sounds like RAID, and acts something like RAID, but protects data using a single drive. With RAISE technology, the individual flash die act like the drives in a RAID array. The original RAISE level 1 technology protects against single page and block failures in the flash. These types of failures are beyond the protection of the ECC, but RAISE technology can recover the data.
With the introduction of the SF3700 this month, RAISE technology now offers more flexibility to deliver greater data protection. With the original RAISE level 1, the space of a full flash die had to be allocated solely to protect user data. In small-capacity configurations, like 64GB, RAISE level 1 required too much over provisioning and therefore had to be disabled or, with RAISE left on, its available capacity reduced to 60GB or 55GB. With a new enhancement to the SF3700, no such tradeoff is necessary. The new Fractional RAISE option for this first level of protection uses only a small portion of a die to protect user data in even the smallest configurations and preserve over provisioning (OP). This is important because, as I explained in my blog titled Gassing up your SSD, the more space you allocate for OP, the lower the write amplification, which translates to higher performance during writes and longer endurance of the flash memory.
Stronger data protection with RAISE level 2
A new RAISE level 2 capability offers even stronger data protection, safeguarding against multiple, simultaneous page and block failures, as well as a full die failure. If a die fails, the SandForce controller recovers the user data. RAISE level 2 includes Auto-Reallocation that can be set up to automatically redistribute and protect user data in the event of a subsequent die failure. Because the option to protect against a second die failure would reduce the available OP area, the RAISE level 2 feature can be set up to simply drop back to RAISE level 1 protection without sacrificing any OP space. .
Another new capability is an additional (9th) flash channel that enables the manufacturer to populate an extra flash package with one die that enables full RAISE level 1 protection while maintaining maximum user data capacity such as 64GB, 128GB, 256GB, etc. Without the 9th channel option, the SSD capacity would be forced to sacrifice a few GBs of capacity (reducing available user capacity to 60GB, 120GB, 240GB, respectively) because RAISE requires extra storage space.
Although all these new features cannot protect against the would-be thief or catastrophic drive failures from electrical surges or natural disasters, the probability of those events is much lower than a simple ECC failure. That’s why you would be best suited to have an SSD with RAISE technology to automatically protect against the more common ECC failures and then make a backup copy of your system at least periodically to protect your data against those far more serious events.
Each new generation of NAND flash memory reduces the fabrication geometry – the dimension of the smallest part of an integrated circuit used to build up the components inside the chip. That means there are fewer electrons storing the data, leading to increased errors and a shorter life for the flash memory. No need to worry. Today’s flash memory depends upon the intelligence and capabilities of the solid state drive (SSD) controller to help keep errors in check and get the longest life possible from flash memory, making it usable in compute environments like laptop computers and enterprise datacenters.
Today’s volume NAND flash memory uses a 20nm and 19nm manufacturing process, but the next generation will be in the 16nm range. Some experts speculate that today’s controllers will struggle to work with this next generation of flash memory to support the high number of write cycles required in datacenters. Also, the current multi-level cell (MLC) flash memory is transitioning to triple-level cell (TLC), which has an even shorter life expectancy and higher error rates.
Can sub-20nm flash survive in the datacenter?
Yes, but it will take a flash memory controller with smarts the industry has never seen before. How intelligent? Sub-20nm flash will need to stretch the life of the flash memory beyond the flash manufacturer’s specifications and correct far more errors than ever before, while still maintaining high throughput and very low latency. And to protect against periodic error correction algorithm failures, the flash will need some kind of redundancy (backup) of the data inside the SSD itself.
When will such a controller materialize?
LSI this week introduced the third generation of its flagship SSD and flash memory controller, called the SandForce SF3700. The controller is newly engineered and architected to solve the lifespan, performance, and reliability challenges of deploying sub-20nm flash memory in today’s performance-hungry enterprise datacenters. The SandForce SF3700 also enables longer periods between battery recharges for power-sipping client laptop and ultrabook systems. It all happens in a single ASIC package. The SandForce SF3700 is the first SSD controller to include both PCIe and SATA host interfaces natively in one chip to give customers of SSD manufacturers an easy migration path as more of them move to the faster PCIe host interface.
How does the SandForce SF3700 controller make sub-20nm flash excel in the datacenter?
Our new controller builds on the award-winning capabilities of the current SandForce SSD and flash controllers. We’ve refined our DuraWrite™ data reduction technology to streamline the way it picks blocks, collects garbage and reduces the write count. You’ll like the result: longer flash endurance and higher read and write speeds.
The SandForce SF3700 includes SHIELD™ error correction, which applies LDPC and DSP technology in unique ways to correct the higher error rates from the new generations of flash memory. SHIELD technology uses a multi-level error correction schema to optimize the time to get the correct data. Also, with its exclusive Adaptive Code Rate feature, SHIELD leverages DuraWrite technology’s ability to span internal NAND flash boundaries between the user data space and the flash manufacturer’s dedicated ECC field. Other controllers only use one size of ECC code rate for flash memory – the one largest size designed to support the end of the flash’s life. Early in the flash life, a much smaller size ECC is required, and SHIELD technology scales down the ECC accordingly, diverting the remaining free space as additional over provisioning. SHIELD partially increases the ECC size over time as the flash ages to correct the increasing failures, but does not use the largest ECC size until the flash is nearly at the end of its life.
Why is this good? Greater over provisioning over the life of the SSD improves performance and increases the endurance. SHIELD also allows the ECC field to grow even larger after it reaches its specified end of life. The big takeaway: All of these SHIELD capabilities increase flash write endurance many times beyond the manufacturer’s specification. In fact at the 2013 Flash Memory Summit exposition in Santa Clara, CA, SHIELD was shown to extend the endurance of a particular Micron NAND flash by nearly six times.
That’s not all. The SandForce SF3700 controller’s RAISE™ data reliability feature now offers stronger protection, including full die failure and more options for protecting data on SSDs with low (e.g., 32GB & 64GB) and binary (256GB vs. 240GB) capacities.
So what about end user systems?
The beauty of all SandForce flash and SSD controllers is its onboard firmware, which takes the one common hardware component – the ASIC – and adapts it to the user’s storage environment. For example, in client applications the firmware helps the controller preserve SSD power to enable users of laptop and ultrabook systems to remain unplugged longer between battery recharges. In contrast, enterprise environments require the highest possible performance and lowest latency. This higher performance draws more power, a tradeoff the enterprise is willing to make for the fastest time-to-data. The firmware makes other similar tradeoffs based on which storage environment it is serving.
Although most people consider the enterprise and client storage needs are very diverse, we think the new SandForce SF3700 Flash and SSD controller showcases the perfect balance of power and performance that any user hanging ten can appreciate.