Back in the 1990s, a new paradigm was forced into space exploration. NASA faced big cost cuts. But grand ambitions for missions to Mars were still on its mind. The problem was it couldn’t dream and spend big. So the NASA mantra became “faster, better, cheaper.” The idea was that the agency could slash costs while still carrying out a wide variety of programs and space missions. This led to some radical rethinks, and some fantastically successful programs that had very outside-the-box solutions. (Bouncing Mars landers anyone?)
That probably sounds familiar to any IT admin. And that spirit is alive at LSI’s AIS – The Accelerating Innovation Summit, which is our annual congress of customers and industry pros, coming up Nov. 20-21 in San Jose. Like the people at Mission Control, they all want to make big things happen… without spending too much.
Take technology and line of business professionals. They need to speed up critical business applications. A lot. Or IT staff for enterprise and mobile networks, who must deliver more work to support the ever-growing number of users, devices and virtualized machines that depend on them. Or consider mega datacenter and cloud service providers, whose customers demand the highest levels of service, yet get that service for free. Or datacenter architects and managers, who need servers, storage and networks to run at ever-greater efficiency even as they grow capability exponentially.
(LSI has been working on many solutions to these problems, some of which I spoke about in this blog.)
It’s all about moving data faster, better, and cheaper. If NASA could do it, we can too. In that vein, here’s a look at some of the topics you can expect AIS to address around doing more work for fewer dollars:
And, I think you’ll find some astounding products, demos, proof of concepts and future solutions in the showcase too – not just from LSI but from partners and fellow travelers in this industry. Hey – that’s my favorite part. I can’t wait to see people’s reactions.
Since they rethought how to do business in 2002, NASA has embarked on nearly 60 Mars missions. Faster, better, cheaper. It can work here in IT too.
Tags: 12Gb/s SAS, AIS, big data analytics, cloud infrastructure, cloud services, datacenter, flash, flash memory, hyperscale datacenters, NAS, NASA, SAN, SDN, shareable DAS, software-defined networks, sub-20nm flash, triple-level cell flash, VDI, web 2.0
All NAND flash-based SSDs use a process called garbage collection (GC) so the flash memory can be rewritten with fresh data, enabling the SSD to function like any other rewritable storage device. The number of rewrites (program/erase cycles) possible with NAND flash memory is finite. That’s why it’s essential to ensure that each P/E cycle really counts – that is, each is performed with top efficiency – to help preserve optimum SSD performance.
Collecting the garbage takes time
In my 2011 Flash Memory Summit presentation , I went into great detail about how GC – the automatic memory management process of clearing invalid data from memory to give new data a clean slate – works in an SSD. Here’s a recap: flash memory is organized in groups of pages where data can be written. Once a page is written, it cannot be rewritten until it is erased. Simple enough. But a page can only be erased within a group of typically 128 pages called a block. But wait. The complexity of writing data really starts to escalate in the case of random writes replacing previously written data. Random writes put the new data in previously erased pages elsewhere, peppering a block of valid data with “patches of invalid data.” In order to write new data to these patches, the whole block – all 128 pages – must be erased. But first all surrounding pages with valid data must be read and then rewritten to blank pages. The newly erased block of blank pages is then ready to save new data.
So why’s this a problem? This rewriting process shares the same path to the flash memory as new data arriving from the host system. You see the issue – a bottleneck. What you may not know is that this traffic jam can severely degrade overall write performance, sometimes as much as 90%.
Why not collect the garbage when the SSD is idle?
To improve write performance, many SSDs perform idle-time GC or background GC. When the SSD is idle – not performing reads or writes from the host system – the data paths to the flash memory are open. In a perfect world, the SSD controller would move all valid data into a contiguous group of new blocks so that all the free space would be consolidated into a few very large areas. Then, when new data arrived, the controller would send it directly to the fresh blocks and be spared from having to move data around just to free up space on pages of invalid data. But the world is far from perfect.
No free lunch, even from the garbage can
As might be expected, background or idle-time GC has drawbacks. The two main downsides are:
1. For users of Ultrabooks and other mobile systems, battery power is precious. The longer users can work unplugged between battery charges, the better. To make the most of a single charge, these systems use features like DevSleep to drastically reduce power to internal components not in use. At times when no data is being stored to or retrieved from the SSD, the host system gears down the SSD into a low power state (like DevSleep) to reduce power draw. In this state, an SSD with background or idle-time GC has no power to perform GC.
The upshot is that the SSD will be very slow when the system turns it back on and starts sending new data that must be saved in the spaces not yet cleared out by garbage-collected while the SSD was asleep. Alternately, the SSD may temporarily override the low power command from the host in order to perform the background GC, pulling more power from the battery and shortening the time remaining before it needs to be recharged.
2. When the SSD is performing GC, invalid data is ignored and only valid data is moved before the block is erased. Now imagine a large 2GB file on the SSD that the user plans to delete tomorrow. The SSD has no clue this will happen, so it automatically performs background GC around the 2GB file – and all other data – today, consuming one more of the very limited, precious program/erase cycles for all the flash holding that data. Ideally, the SSD would have waited one more day to garbage collect, the user would have already deleted the file, and the SSD wouldn’t have had to move all that data to new locations. No unnecessary data movement. No unnecessary use of a precious program/erase cycle.
Many people don’t realize that the number of background reads and writes initiated by the operating system, virus checkers, browsers, etc., far outstrip the number initiated by the computer user. Some users rarely delete files, believing they’ll extend the life of their SSD. The truth is, user file deletions are a drop in the bucket. It’s not their use of the computer storage that causes the most wear and tear. Background action from applications and the operating system is the real culprit.
Is there a better option?
What’s an SSD user to do? A super-fast foreground GC engine is the best solution. The key is special hardware and firmware integrated into the flash controller that streamlines garbage collection so it can run in the foreground with incoming data. The engine also enables high-speed writes to the flash memory. By maintaining high write speeds for GC operations, the SSD can afford to leave all valid data mixed with invalid data. That way the blocks are not recycled until absolutely necessary, dramatically reducing wear. Plus, the longer the SSD waits to GC pages, the higher the likelihood other pages of data will have already been made invalid by the operating system or user. The result is lower write amplification, longer flash endurance, and even higher performance.
All LSI® SandForce® Flash Controllers employ foreground GC to provide these invaluable benefits to the user.
Where did my email go?
This week I was dragged into to the virtualized cloud kicking and screaming … well, sort of. LSI has moved me, and all my co-workers, from my nice, safe local Exchange server to one in the amorphous, mysterious cloud. Scary. My IT department says the new cloud email is a great thing. They are now promising me unlimited email storage. Long gone will be the days of harrowing emails telling me I am approaching my storage limit and soon will be unable to send new email.
With cloud email, I can keep everything forever! I am not quite sure that saving mountains of email will be a good thing :-). Other than having to redirect my tablet and smartphone to this new service, update my webmail bookmark and empty my email inbox, there was not much I had to do. So far, so good. I have not experienced any challenges or performance issues. A key reason is flash storage.
To be sure, virtualization is a great tool for improving physical server utilization and flexibility as well as reducing power, cooling and datacenter footprint costs. That’s why the use of virtualization for email, databases and desktops is growing dramatically. But virtualized servers are only as effective as the storage performance that supports them. If, as a datacenter manager, your clients cannot access their application data quickly or boot their virtual desktop in a reasonable time, your company’s productivity and client satisfaction can drop dramatically.
Today, applications most likely run on virtualized servers. The upside of server virtualization is that a company can improve server utilization and run more applications on fewer physical servers. This can reduce power consumption, make more efficient use of datacenter floor space and make it easier to configure servers and deploy applications. The cloud also helps streamline application development, allowing companies to more efficiently and cost effectively test software applications across a broad set of configurations and operating systems.
A heated dispute – storage contention
Once application testing is complete, a virtual server’s configuration and image can be put on a virtual shelf until they are needed again, freeing up memory, processing, storage and other resources on the physical server for new virtual servers with just a few keystrokes. But with virtualization and the cloud there can be downsides, like slow performance – especially storage performance.
When a number of virtual servers are all using the same physical storage, there can be infighting for storage capacity, generally known as storage contention. These internecine battles can slow application response to a frustrating glacial pace and lead to issues like VDI Boot and Login Storm that can extend the time it takes for users to login to tens of minutes.
Flash helps alleviate slowdowns in storage performance
Here is where flash comes to the rescue. New flash storage solutions are being deployed to help improve virtualized storage performance and alleviate productivity-sapping slowdowns caused by VDI Boot and Login Storm — the crush of end users booting up or logging in within a small window that overwhelms the server with data requests and degrades response times. Flash can be used as primary storage inside servers running virtual machines to dramatically speed storage response time. Flash can also be deployed as an intelligent cache for DAS- or SAN-connected storage and even as an external shared storage pool.
It’s clear that virtualization will require higher storage performance and better, more cost-effective ways to deploy flash storage. But how much flash you need depends on your particular virtualization challenge, configuration and of course budget: while flash storage is extremely fast, it is costlier than disk-based storage. So choosing the right storage acceleration solution – one is LSI® Nytro™ Application Acceleration – can be as important as choosing the right cloud provider for your company’s email.
While my email is now stored in the cloud in Timbuktu, I know the flash storage solutions in that datacenter help keep my mail quickly accessible 24/7 whether I access it from my computer, tablet or smartphone, giving my productivity a boost. I can be assured that every action item I am sent will quickly make it to my inbox and be added to my ever-growing to-do list. Now my next big challenge is to improve my own response performance to those email requests!
Most consumers are skeptical when they see a manufacturer whipping out grandiose performance claims. And for good reason. The manufacturer could be stretching the truth, twisting the results, or just being downright misleading. From this distrust grew demand for 3rd-party writers to review products, test claims and provide an unbiased analysis of the device’s performance and other capabilities – as consumers would experience themselves.
Who can really claim to be an SSD benchmarking expert?
Solid state drive (SSD) technology is still relatively new in the computer industry, and in many ways SSDs are profoundly different from hard disk drives – perhaps most notably, in the way they record data, to a NAND cell rather than on spinning media. Because of differences in their operation, SSDs have to be tested in ways that are not necessarily obvious.
Can anyone who simply runs a benchmark application claim to be an expert? I would say not. Just as anyone sitting behind the wheel of a car is not necessarily an expert driver. The problem is that it is hard to determine the thoroughness and expertise of an SSD reviewer. Does the author really understand the details behind the technology to run adequate tests and analyze the results?
Can “experts” present bad data?
Maybe it is obvious, but of course experts can be wrong, especially when they are self-proclaimed mavens without deep experience in the technology they cover. At a minimum, you can generally count on them to act in good faith – that is, to not be intentionally misleading – but they can easily be misinformed (for instance, by manufacturers) and perpetuate the misinformation. What’s more, some reviewers are pressured to do a cursory analysis of an SSD as they crank through countless product evaluations under unremitting deadlines – a crush that can cause oversights in telling aspects of a drive’s performance. In any case, it is not good to rely on bad data no matter the intention.
What makes for a thorough SSD review?
Some reviewers have gone to great lengths to ensure their SSD analysis is extremely detailed and represents a real-world environment and performance. These reviewers will generally talk about how their analysis simulates a true user or server environment. The trouble can begin if a reviewer doesn’t recognize normal operation of an SSD in its own environment. With SSDs, “normal” is when garbage collection is operating, which greatly impacts overall performance. It’s important for reviewers to recognize that, with a new SSD, garbage collection is inactive until at least one full physical capacity of data has been sequentially written to the device. For example, with a 256GB SSD, 256GB of data must be written to trigger garbage collecting. At that point, garbage collection is ongoing, the drive has reached its steady-state performance, and the device is ready for evaluation. Random writes are another story, requiring up to three passes (full-capacity writes) randomly written to the SSD before the steady-state performance level shown below is reached.
You can see that running only a few minutes of random write tests on this SSD logs performance of over 275 MB/s. However, once garbage collection starts, performance plunges and then takes up to 3 hours before the true performance of 25 MB/s (a 90% drop) is finally evident – a phenomenon that often is not communicated clearly in reviews nor widely understood.
Good benchmarkers will discuss how their review factors in both garbage collection preparation and steady-state performance testing. Test results that purportedly achieve steady state in less time than in the example above are unlikely to reflect real-world performance. This is all part of what is called SSD preconditioning, but keep in mind that different tests require different steps for preconditioning.
For additional information on this topic, you can review my presentation from Flash Memory Summit 2013 on “Don’t let your favorite benchmarks lie to you.”
For the uninitiated, low-density parity-check (LDPC) code is an error correction code (ECC) that is used to both detect and correct errors on data that is transmitted from one point to another. All ECC types include correction data, so when information is transmitted with errors, the receiver has enough information to fix the errors without having to ask the source for the data again.
This enables transmitted data to maintain a constant speed as is required with digital television signals. What you don’t want is for the image to freeze repeatedly while waiting for correction data to be sent multiple times.
LDPC code was first presented to the world by Robert G. Gallager at MIT in 1960. It was very advanced for its time and, as it turned out, required a fantastic amount of computation to use real-time. The problem was that back in 1960, vacuum tube computers of that period performed about 100 times less work than the microprocessor-powered computers of today. In 1960, you would need a computer the size of a 2,000-square-foot house to process the LDPC correction information in real-time. This was hardly economical, so LDPC was mostly lost for nearly 40 years, as other simpler codes took its place over that period.
What was old is new again
In the mid-1990s, engineers working on satellite transmissions for digital television dusted off the LDPC codes and started using them for real-time operations. By then, computer processing had seen dramatic reductions in size and costs. Fast-forward to the past 5 years: we have seen a major increase in LDPC development and use because it appears to be the best solution for high-speed data transmissions, especially those subject to heavy levels of electrical noise that induce higher error rates. Also, the processing power of target devices like WiFi receivers and HDDs has grown even stronger and faster than some of the main CPUs of a few years ago. This enables LDPC to be deployed for little additional cost with the advantage of real-time data correction superior to correction offered by simpler codes.
If you have seen one LDPC solution, have you seen them all?
Nothing could be further from the truth. For example, an LDPC solution designed for satellite communication cannot be used for HDDs since there is no direct porting of the code, though there are distinct advantages to the two engineering teams sharing their knowledge and experience through their development efforts. Take, for example, the LDPC code that LSI has been shipping in its TrueStore® HDD read channel solutions for 3 years now. When LSI acquired SandForce and started work on SHIELD™ error correction code (based on LDPC) for flash controllers, there was no direct porting of that HDD code to support SSDs. However, the HDD development team’s knowledge and experience from creating the HDD code greatly improved the SSD team’s ability to more quickly bring SHIELD technology to the next-generation SandForce flash controller.
How do LDPC solutions for SSDs differ?
Many LDPC providers claim that their offerings rival the capabilities of competitive solutions, though often they aren’t telling the whole story. All LDPC solutions start with what is called hard-decision LDPC – a digital correction algorithm that operates at line rate on all data passing through the correction engine. The algorithm uses the meta-data generated from the user and system data stored on the flash memory, and helps recreate the user data when the flash memory returns it with errors. Hard-decision LDPC catches most errors from the flash memory, though sometimes it can be overwhelmed by an inordinate number of errors. That is where soft-decision LDPC – a more analog-based correction algorithm – comes into play.
Can a soft-decision be strong enough for my data?
Soft-decision LDPC is an error correction method that looks at other information beyond the actual ECC data. Soft-decision, in a sense, looks at the meta-data of the meta-data. The simplest form of soft-decision LDPC may just re-read the data at a different reference voltage, as if asking a person “can you say that again?” More complex soft-decision might be compared to listening to a man with a heavy French accent speaking English. You know he just said something in English, but you could not clearly grasp what he said. You ask some questions and, from his answers, soon realize what he originally said and are now back on track. While this might seem more like guessing at the answer, soft-decision LDPC uses statistics to help ensure the answers are not false positive results. As a result, soft-decision LDPC uncovers a new set of engineering problems that need to be solved, opening new opportunities for flash controller manufacturers to create powerful intellectual property (IP). For that reason, you’re likely to learn very little about how a given company’s soft-decision LDPC works.
At the 2013 Flash Memory Summit in Santa Clara, California, LSI demonstrated its SHIELD Advanced Error Correction Technology. SHIELD technology includes hard and soft-decision LDPC with digital signal processing (DSP) and a number of other unique features designed to optimize future NAND flash memory operation in compute environments. One feature, called Adaptive Code Rate, works with other LSI features to enable the spare area in flash memory reserved for ECC data to occupy less space than the manufacturer allocation and then dynamically grow to accommodate inevitable increases in flash error rates. The soft-decision LDPC capability offers multiple strengths of correction, with each activating only as necessary to ensure the lowest possible real-time latency.
So it’s clear that all LDPC solutions are far from the same. When evaluating LDPC solutions, be sure to understand how they manage data correction when it exceeds the ability of the hard-decision LDPC. Also, make sure the algorithms are actually in use in a product. Otherwise, the product might turn out to be a science experiment that never works.
Tags: Adaptive Code Rate, ECC, error correction, flash controllers, flash memory, Flash Memory Summit, LDPC, NAND flash, Robert Gallager, SandForce, SHIELD, solid state drive, SSD, TrueStore
Part two of this Write Amplification (WA) series covered how WA works in solid-state drives (SSDs) that use data reduction technology. I mentioned that, with one of these SSDs, the WA can be less than one, which can greatly improve flash memory performance and endurance.
Why is it important to know your SSD write amplification?
Well, it’s not really necessary to know the write amplification of your SSD at any particular point in time, but you do want an SSD with the lowest WA available. The reason is that the limited number of program/erase cycles that NAND flash can support keeps dropping with each generation of flash developed. A low WA will ensure the flash memory lasts longer than flash on an SSD with a higher WA.
A direct benefit of a WA below one is that the amount of dynamic over provisioning is higher, which generally provides higher performance. In the case of over provisioning, more is better, since a key attribute of SSD is performance. Keep in mind that, beyond selecting the best controller, you cannot control the WA of an SSD.
Just how smart are the SSD SMART attributes?
The monitoring system SMART (Self-Monitoring, Analysis and Reporting Technology) tracks various indicators of hard disk solid state drive reliability – including the number of errors corrected, bytes written, and power-on hours – to help anticipate failures, enabling users to replace storage before a failure causes data loss or system outages.
Some of these indicators, or attributes, point to the status of the drive health and others provide statistical information. While all manufacturers use many of these attributes in the same or a similar way, there is no standard definition for each attribute, so the meaning of any attribute can vary from one manufacturer to another. What’s more, there’s no requirement for drive manufacturers to list their SMART attributes.
How to measure missing attributes by extrapolation
Most SSDs provide some list of SMART attributes, but WA typically is excluded. However, with the right tests, you can sometimes extrapolate, with some accuracy, the WA value. We know that under normal conditions, an SSD will have a WA very close to 1:1 when writing data sequentially.
For an SSD with data reduction technology, you must write data with 100% entropy to ensure you identify the correct attributes, then rerun the tests with an entropy that matches your typical data workload to get a true WA calculation. SSDs without data reduction technology do not benefit from entropy, so the level of entropy used on them does not matter.
To measure missing attributes by extrapolation, start by performing a secure erase of the SSD, and then use a program to read all the current SMART attribute values. Some programs do not accurately display the true meaning of an attribute simply because the attribute itself contains no description. For you to know what each attribute represents, the program reading the attribute has to be pre-programmed by the manufacturer. The problem is that some programs mislabel some attributes. Therefore, you need to perform tests to confirm the attributes’ true meaning.
Start writing sequential data to the SSD, noting how much data is being written. Some programs will indicate exactly how much data the SSD has written, while others will reveal only the average data per second over a given period. Either way, the number of bytes written to the SSD will be clear. You want to write about 10 or more times the physical capacity of the SSD. This step is often completed with IOMeter, VDbench, or other programs that can send large measurable quantities of data.
At the end of the test period, print out the SMART attributes again and look for all attributes that have a different value than at the start of the test. Record the attribute number and the difference between the two test runs. You are trying to find one that represents a change of about 10, or the number of times you wrote to the entire capacity of the SSD. The attribute you are trying to find may represent the number of complete program/erase cycles, which would match your count almost exactly. You might also find an attribute that is counting the number of gigabytes (GBs) of data written from the host. To match that attribute, take the number of times you wrote to the entire SSD and multiply by the physical capacity of the flash. Technically, you already know how much you wrote from the host, but it is good to have the drive confirm that value.
Doing the math
When you find candidates that might be a match (you might have multiple attributes), secure erase the drive again, this time writing randomly with 4K transfers. Again, write about 10 times the physical capacity of the drive, then record the SMART attributes and calculate the difference from the last recording of the same attributes that changed between the first two recordings. This time, the change you see in the data written from the host should be nearly the same as with the sequential run. However, the attribute that represents the program/erase cycles (if present) will be many times higher than during the sequential run.
To calculate write amplification, use this equation:
( Number of erase cycles x Physical capacity in GB ) / Amount of data written from the host in GB
With sequential transfers, this number should be very close to 1. With random transfers, the number will be much higher depending on the SSD controller. Different SSDs will have different random WA values.
If you have an SSD with the type of data reduction technology used in the LSI® SandForce® controller, you will see lower and lower write amplification as you approach your lowest data entropy when you test with any entropy lower than 100%. With this method, you should be able to measure the write amplification of any SSD as long as it has erase cycles and host data-written attributes or something that closely represents them.
Protect your SSD against degraded performance
The key point to remember is that write amplification is the enemy of flash memory performance and endurance, and therefore the users of SSDs. This three-part series examined all the elements that affect WA, including the implications and advantages of a data reduction technology like the LSI SandForce DuraWrite™ technology. Once you understand how WA works and how to measure it, you will be better armed to defend yourself against this beastly cause of degraded SSD performance.
This three-part series examines some of the details behind write amplification, a key property of all NAND flash-based SSDs that is often misunderstood in the industry.
In part one of this Write Amplification (WA) series, I examined how WA works in basic solid-state drives (SSDs). Part two now takes a deeper look at how SSDs that use some form of data reduction technology can see a very big and positive impact on WA.
Data reduction technology can master data entropy
The performance of all SSDs is influenced by the same factors – such as the amount of over provisioning and levels of random vs. sequential writing – with one major exception: entropy. Only SSDs with data reduction technology can take advantage of entropy – the degree of randomness of data – to provide significant performance, endurance and power-reduction advantages.
Data reduction technology parlays data entropy (not to be confused with how data is written to the storage device – sequential vs. random) into higher performance. How? When data reduction technology sends data to the flash memory, it uses some form of data de-duplication, compression, or data differencing to rearrange the information and use fewer bytes overall. After the data is read from flash memory, data reduction technology, by design, restores 100% of the original content to the host computer This is known as “loss-less” data reduction and can be contrasted with “lossy” techniques like MPEG, MP3, JPEG, and other common formats used for video, audio, and visual data files. These formats lose information that cannot be restored, though the resolution remains adequate for entertainment purposes.
The multi-faceted power of data reduction technology
My previous blog on data reduction discusses how data reduction technology relates to the SATA TRIM command and increases free space on the SSD, which in turn reduces WA and improves subsequent write performance. With a data-reduction SSD, the lower the entropy of the data coming from the host computer, the less the SSD has to write to the flash memory, leaving more space for over provisioning. This additional space enables write operations to complete faster, which translates not only into a higher write speed at the host computer but also into lower power use because flash memory draws power only while reading or writing. Higher write speeds also mean lower power draw for the flash memory.
Because data reduction technology can send less data to the flash than the host originally sent to the SSD, the typical write amplification factor falls below 1.0. It is not uncommon to see a WA of 0.5 on an SSD with this technology. Writing less data to the flash leads directly to:
Each of these in turn produces other benefits, some of which circle back upon themselves in a recursive manner. This logic diagram highlights those benefits.
So this is a rare instance when an amplifier – namely, Write Amplification – makes something smaller. At LSI, this unique amplifier comes in the form of the LSI® DuraWrite™ data reduction technology in all SandForce® Driven™ SSDs.
This three-part series examines some of the details behind write amplification, a key property of all NAND flash-based SSDs that is often misunderstood in the industry.
In today’s solid state drives (SSDs), the NAND flash memory must be erased before it can store new data. In other words, data cannot be overwritten directly as it is in a hard disk drive (HDD). Instead, SSDs use a process called garbage collection (GC) to reclaim the space taken by previously stored data. This means that write demands are heavier on SSDs than HDDs when storing the same information.
This is bad because the flash memory in the SSD supports only a limited number of writes before it can no longer be read. We call this undesirable effect write amplification (WA). In my blog, Gassing up your SSD, I describe why WA exists in a little more detail, but here I will explain what controls it.
It’s all about the free space
I often tell people that SSDs work better with more free space, so anything that increases free space will keep WA lower. The two key ways to expand free space (thereby decreasing WA) are to 1) increase over provisioning and 2) keep more storage space free (if you have TRIM support).
As I said earlier, there is no WA before GC is active. However, this pristine pre-GC condition has a tiny life span – just one full-capacity write cycle during a “fresh-out-of-box” (FOB) state, which accounts for less than 0.04% of the SSD’s life. Although you can manually recreate this condition with a secure erase, the cost is an additional write cycle, which defeats the purpose. Also keep in mind that the GC efficiency and associated wear leveling algorithms can affect WA (more efficient = lower WA).
The other major contributor to WA is the organization of the free space (how data is written to the flash). When data is written randomly, the eventual replacement data will also likely come in randomly, so some pages of a block will be replaced (made invalid) and others will still be good (valid). During GC, valid data in blocks like this needs to be rewritten to new blocks. This produces another write to the flash for each valid page, causing write amplification.
With sequential writes, generally all the data in the pages of the block becomes invalid at the same time. As a result, no data needs relocating during GC since there is no valid data remaining in the block before it is erased. In this case, there is no amplification, but other things like wear leveling on blocks that don’t change will still eventually produce some write amplification no matter how data is written.
Calculating write amplification
Write Amplification is fundamentally the result of data written to the flash memory divided by data written by the host. In 2008, both Intel and SiliconSystems (acquired by Western Digital) were the first to start talking publically about WA. At that time, the WA of all SSDs was something greater than 1.0. It was not until SandForce introduced the first SSD controller with DuraWrite™ technology in 2009 that WA could fall below 1.0. DuraWrite technology increases the free space mentioned above, but in a way that is unique from other SSD controllers. In part two of my write amplification series, I will explain how DuraWrite technology works.
This three-part series examines some of the details behind write amplification, a key property of all NAND flash-based SSDs that is often misunderstood in the industry.
It seems like our smartphones are getting bigger and bigger with each generation. Sometimes I see people holding what appears to be a tablet computer up next to their head. I doubt they know how ridiculous they look to the rest of us, and I wonder what pants today have pockets that big.
I certainly do like the convenience of the instant-on capabilities my smart phone gives me, but I still need my portable computer with its big screen and keyboard separate from my phone.
A few years ago, SATA-IO, the standards body, added a new feature to the Serial ATA (SATA) specification designed to further reduce battery consumption in portable computer products. This new feature, DevSleep, enables solid state drives (SSDs) to act more like smartphones, allowing you to go days without plugging in to recharge and then instantly turn them on and see all the latest email, social media updates, news and events.
Why not just switch the system off?
When most PC users think about switching off their system, they dread waiting for the operating system to boot back up. That is one of the key advantages of replacing the hard disk drive (HDD) in the system with a faster SSD. However, in our instant gratification society, we hate to wait even seconds for web pages to come up, so waiting minutes for your PC to turn on and boot up can feel like an eternity. Therefore, many people choose to leave the system on to save those precious moments… but at the expense of battery life.
Can I get this today?
To further extend battery life, the new DevSleep feature requires a signal change on the SATA connector. This change is currently supported only in new Intel® Haswell chipset-based platforms announced this June. What’s more, the SSD in these systems must support the DevSleep feature and monitor the signal on the SATA connector. Most systems that support DevSleep will likely be very low-power notebook systems and will likely already ship with an SSD installed using a small mSATA, M.2, or similar edge connector. Therefore, the signal change on the SATA interface will not immediately affect the rest of the SSDs designed for desktop systems shipping through retail and online sources. Note that not all SSDs are created equal and, while many claim support for DevSleep, be sure to look at the fine print to compare the actual power draw when in DevSleep.
At Computex last month, LSI announced support for the DevSleep feature and staged demonstrations showing a 400x reduction in idle power. It should be noted that a 400x reduction in power does not directly translate to a 400x increase in battery life, but any reduction in power will give you more time on the battery, and that will certainly benefit any user who often works without a power cord.
Not likely. But you might think that solving your computer data security problems is very well possible when someone tells you that TCG Opal is the key. According to its website, “The Trusted Computing Group (TCG) is a not-for-profit organization formed to develop, define and promote open, vendor-neutral, global industry standards, supportive of a hardware-based root of trust, for interoperable trusted computing platforms.”
That might take a bit to digest, but think about TCG as a group of companies creating standards to simplify deployment and increase adoption of data security. The consortium has two better known specifications called TCG Enterprise and TCG Opal.
Sorting through the alphabet soup of data security
“Our SED with TCG Opal provides FDE.” While this might look like a spoonful of alphabet soup, it is music to the ears of a corporate IT manager. Let me break it down for those who just hear fingernails on the chalkboard. A self-encrypting drive (SED) is one that embeds a hardware-based encryption engine in the storage device. One chief benefit is that the hardware engine performs the encryption, preserving full performance of the host CPU. An SED can be a hard disk drive (HDD) or a solid state drive (SSD). True, traditional software encryption can secure data going to the storage device, but it consumes precious host CPU bandwidth. The related term, full drive encryption (FDE), is used to describe any drive (HDD or SSD) that stores data in an encrypted form. This can be through either software-based (host CPU) or hardware-based (an SED) encryption.
Most people would assume that if their work laptop were lost or stolen, they would suffer only some lost productivity for a short time and about $1,500 in hardware costs. However, a study by Intel and the Ponemon Institute found that the cost of a lost laptop totaled nearly $50,000 when you account for lost IP, legal costs, customer notifications, lost business, harm to reputation, and damages associated with compromising confidential customer information. When the data stored on the laptop is encrypted, this cost is reduced by nearly $20,000. This difference certainly supports the need for better security for these mobile platforms.
When considering a security solution for this valuable data, you have to decide between a hardware-based SED and a host-based software solution. The primary problem with software solutions is they require the host CPU to do all of the encryption. This detracts from the CPU’s core computing work, leaving users with a slower computer or forcing them to pay for greater CPU performance. Another drawback of many software encryption solutions is that they can be turned off by the computer user, leaving data in the clear and vulnerable. Since hardware-based encryption is native to the HDD or SSD, it cannot be disabled by the end user.
In April 2013, LSI and a few other storage companies worked with the Ponemon Institute to better understand the value of hardware-based encryption. You can read about the details in the study here, but the quick summary is that hardware-based encryption solutions can offer a 75% total cost savings over software-based solutions, on average.
When is this available?
At the Computex Taipei 2013 show earlier this month, LSI announced availability of a firmware update for SandForce® controllers that adds support for TCG Opal. The LSI suite at the show featured TCG Opal demonstrations using self-encrypting SSDs provided by SandForce Driven™ member companies, including Kingston, A-DATA, Avant and Edge. (Contact SSD manufacturers directly for product availability.)