For the uninitiated, low-density parity-check (LDPC) code is an error correction code (ECC) that is used to both detect and correct errors on data that is transmitted from one point to another. All ECC types include correction data, so when information is transmitted with errors, the receiver has enough information to fix the errors without having to ask the source for the data again.
This enables transmitted data to maintain a constant speed as is required with digital television signals. What you don’t want is for the image to freeze repeatedly while waiting for correction data to be sent multiple times.
LDPC code was first presented to the world by Robert G. Gallager at MIT in 1960. It was very advanced for its time and, as it turned out, required a fantastic amount of computation to use real-time. The problem was that back in 1960, vacuum tube computers of that period performed about 100 times less work than the microprocessor-powered computers of today. In 1960, you would need a computer the size of a 2,000-square-foot house to process the LDPC correction information in real-time. This was hardly economical, so LDPC was mostly lost for nearly 40 years, as other simpler codes took its place over that period.
What was old is new again
In the mid-1990s, engineers working on satellite transmissions for digital television dusted off the LDPC codes and started using them for real-time operations. By then, computer processing had seen dramatic reductions in size and costs. Fast-forward to the past 5 years: we have seen a major increase in LDPC development and use because it appears to be the best solution for high-speed data transmissions, especially those subject to heavy levels of electrical noise that induce higher error rates. Also, the processing power of target devices like WiFi receivers and HDDs has grown even stronger and faster than some of the main CPUs of a few years ago. This enables LDPC to be deployed for little additional cost with the advantage of real-time data correction superior to correction offered by simpler codes.
If you have seen one LDPC solution, have you seen them all?
Nothing could be further from the truth. For example, an LDPC solution designed for satellite communication cannot be used for HDDs since there is no direct porting of the code, though there are distinct advantages to the two engineering teams sharing their knowledge and experience through their development efforts. Take, for example, the LDPC code that LSI has been shipping in its TrueStore® HDD read channel solutions for 3 years now. When LSI acquired SandForce and started work on SHIELD™ error correction code (based on LDPC) for flash controllers, there was no direct porting of that HDD code to support SSDs. However, the HDD development team’s knowledge and experience from creating the HDD code greatly improved the SSD team’s ability to more quickly bring SHIELD technology to the next-generation SandForce flash controller.
How do LDPC solutions for SSDs differ?
Many LDPC providers claim that their offerings rival the capabilities of competitive solutions, though often they aren’t telling the whole story. All LDPC solutions start with what is called hard-decision LDPC – a digital correction algorithm that operates at line rate on all data passing through the correction engine. The algorithm uses the meta-data generated from the user and system data stored on the flash memory, and helps recreate the user data when the flash memory returns it with errors. Hard-decision LDPC catches most errors from the flash memory, though sometimes it can be overwhelmed by an inordinate number of errors. That is where soft-decision LDPC – a more analog-based correction algorithm – comes into play.
Can a soft-decision be strong enough for my data?
Soft-decision LDPC is an error correction method that looks at other information beyond the actual ECC data. Soft-decision, in a sense, looks at the meta-data of the meta-data. The simplest form of soft-decision LDPC may just re-read the data at a different reference voltage, as if asking a person “can you say that again?” More complex soft-decision might be compared to listening to a man with a heavy French accent speaking English. You know he just said something in English, but you could not clearly grasp what he said. You ask some questions and, from his answers, soon realize what he originally said and are now back on track. While this might seem more like guessing at the answer, soft-decision LDPC uses statistics to help ensure the answers are not false positive results. As a result, soft-decision LDPC uncovers a new set of engineering problems that need to be solved, opening new opportunities for flash controller manufacturers to create powerful intellectual property (IP). For that reason, you’re likely to learn very little about how a given company’s soft-decision LDPC works.
At the 2013 Flash Memory Summit in Santa Clara, California, LSI demonstrated its SHIELD Advanced Error Correction Technology. SHIELD technology includes hard and soft-decision LDPC with digital signal processing (DSP) and a number of other unique features designed to optimize future NAND flash memory operation in compute environments. One feature, called Adaptive Code Rate, works with other LSI features to enable the spare area in flash memory reserved for ECC data to occupy less space than the manufacturer allocation and then dynamically grow to accommodate inevitable increases in flash error rates. The soft-decision LDPC capability offers multiple strengths of correction, with each activating only as necessary to ensure the lowest possible real-time latency.
So it’s clear that all LDPC solutions are far from the same. When evaluating LDPC solutions, be sure to understand how they manage data correction when it exceeds the ability of the hard-decision LDPC. Also, make sure the algorithms are actually in use in a product. Otherwise, the product might turn out to be a science experiment that never works.
Tags: Adaptive Code Rate, ECC, error correction, flash controllers, flash memory, Flash Memory Summit, LDPC, NAND flash, Robert Gallager, SandForce, SHIELD, solid state drive, SSD, TrueStore
Imagine a bathtub full of water and asking someone to empty the tub while you turn your back for a moment. When you look again and see the water is gone, do you just assume someone pulled the drain plug?
I think most people would, but what about the other methods of removing the water like with a siphon, or using buckets to bail out the water? In a typical bathroom you are not likely to see these other methods used, but that does not mean they do not exist. The point is that just because you see a certain result does not necessarily mean the obvious solution was used.
I see a lot of confusion in forum posts from SandForce Driven™ SSD users and reviewers over how the LSI® DuraWrite™ data reduction and advanced garbage collection technology relates to the SATA TRIM command. In my earlier blog on TRIM I covered this topic in great detail, but in simple terms the operating system uses the TRIM command to inform an SSD what information is outdated and invalid. Without the TRIM command the SSD assumes all of the user capacity is valid data. I explained in my blog Gassing up your SSD that creating more free space through over provisioning or using less of the total capacity enables the SSD to operate more efficiently by reducing the write amplification, which leads to increased performance and flash memory endurance. So without TRIM the SSD operates at its lowest level of efficiency for a particular level of over provisioning.
Will you drown in invalid data without TRIM?
TRIM is a way to increase the free space on an SSD – what we call “dynamic over provisioning” – and DuraWrite technology is another method to increase the free space. Since DuraWrite technology is dependent upon the entropy (randomness) of the data, some users will get more free space than others depending on what data they store. Since the technology works on the basis of the aggregate of all data stored, boot SSDs with operating systems can still achieve some level of dynamic over provisioning even when all other files are at the highest entropy, e.g., encrypted or compressed files.
With an older operating system or in an environment that does not support TRIM (most RAID configurations), DuraWrite technology can provide enough free space to offer the same benefits as having TRIM fully operational. In cases where both TRIM and DuraWrite technology are operating, the combined result may not be as noticeable as when they’re working independently since there are diminishing returns when the free space grows to greater than half of the SSD storage capacity.
So the next time you fill your bathtub, think about all the ways you can get the water out of the tub without using the drain. That will help you remember that both TRIM and DuraWrite technology can improve SSD performance using different approaches to the same problem. If that analogy does not work for you, consider the different ways to produce a furless feline, and think about what opening graphic image I might have used for a more jolting effect. Although in that case you might not have seen this blog since that image would likely have gotten us banned from Google® “safe for work” searches.
I presented on this topic in detail at the Flash Memory Summit in 2011. You can read it in full here: http://www.lsi.com/downloads/Public/Flash%20Storage%20Processors/LSI_PRS_FMS2011_T1A_Smith.pdf
Tags: bathtub drain, controller, data reduction technology, DuraWrite, flash, flash controller, flash memory, Flash Memory Summit, NAND, over-provisioning, SandForce, SandForce Driven SSD, SATA, Serial ATA, solid state drive, TRIM
Have you ever run out of gas in your car? Do you often risk running your gas tank dry? Hopefully you are more cautious than that and you start searching for a gas station when you get down to a ¼ tank. You do this because you want plenty of cushion in case something comes up that prevents you from getting to a station before it is too late.
The reason most people stretch their tank is to maximize travel between station visits. The downside to pushing the envelope to “E” is you can end up stranded with a dead vehicle waiting for AAA to bring you some gas.
Now most people know you don’t put gas in a solid state drive (SSD), but the pros and cons associated with how much you leave in the “tank” is very relevant to SSDs.
To understand how these two seemingly unrelated technologies are similar, we first need to drill into some technical SSD details. To start, SSDs act, and often look, like traditional hard disk drives (HDDs), but they do not record data in the same way. SSDs today typically use NAND flash memory to store data and a flash controller to connect the memory with the host computer. The flash controller can write a page of data (often 4,096 bytes) directly to the flash memory, but cannot overwrite the same page of data without first erasing it. The erase cycle cannot expunge only a single page. Instead, it erases a whole block of data (usually 128 pages). Because the stored data is sometimes updated randomly across the flash, the erase cycle for NAND flash requires a process called garbage collection.
Garbage collection is just dumping the trash
Garbage collection starts when a flash block is full of data, usually a mix of valid (good) and invalid (older, replaced) data. The invalid data must be tossed out to make room for new data, so the flash controller copies the valid data of a flash block to a previously erased block, and skips copying the invalid data of that block. The final step is to erase the original whole block, preparing it for new data to be written.
Before and during garbage collection, some data – valid data copied during garbage collection and the (typically) multiple copies of the invalid data – is in two or more locations at once, a phenomenon known as write amplification. To store this extra data not counted by the operating system, the flash controller needs some spare capacity beyond what the operating system knows. This is called over-provisioning (OP), and it is a critical part of every NAND flash-based SSD.
Over-provisioning is like the gas that remains in your tank
While every SSD has some amount of OP, some will have more or less than others. The amount of OP varies depending on trade-offs made between total storage capacity and benefits in performance and endurance. The less OP allocated in an SSD, the more information a user can store. This is like the driver who will take their tank of gas clear down to near-empty just to maximize the total number of miles between station visits.
What many SSD users don’t realize is there are major benefits to NOT stretching this OP area too thin. When you allocate more space for OP, you achieve a lower write amplification, which translates to a higher performance during writes and longer endurance of the flash memory. This is like the driver who is more cautious and visits the gas station more often to enable greater flexibility in selecting a more cost-effective station, and allows for last-minute deviations in travel plans that end up burning more fuel than originally anticipated.
The choice is yours
Most SSD users do not realize they have full control of how much OP is configured in their SSD. So even if you buy an SSD with “0%” OP, you can dedicate some of the user space back to OP for the SSD.
A more detailed presentation of how OP works and what 0% OP really means was presented at the Flash Memory Summit 2012 and can be viewed with this link for your convenience: http://www.lsi.com/downloads/Public/Flash%20Storage%20Processors/LSI_PRS_FMS2012_TE21_Smith.pdf
It pays to be the cautious driver who fills the gas tank long before you get to empty. When it comes to both performance and endurance, your SSD will cover a lot more ground if you treat the over-provisioning space the same way – keeping more in reserve.