Ever since SandForce introduced data reduction technology with the DuraWrite™ feature in 2009, some users have been confused about how it works and questioned whether it delivers the benefits we claim. Some even believe there are downsides to using DuraWrite with an SSD. In this blog, I will dispel those misconceptions.
Data reduction technology refresher
Four of my previous blogs cover the many advantages of using data reduction technology like DuraWrite:
In a nutshell, data reduction technology reduces the size of data written to the flash memory, but returns 100% of the original data when reading it back from the flash. This reduction in the required storage space helps accelerate reads and writes, extend the life of the flash and increase the dynamic over provisioning (OP).
What is incompressible data?
Data is incompressible when data reduction technology is unable to reduce the size of a dataset – in which case the technology offers no benefit for the user. File types that are altogether or mostly incompressible include MPEG, JPEG, ZIP and encrypted files. However, data reduction technology is applied to an entire SSD, so the free space resulting from the smaller, compressed files increases OP for all file types, even incompressible files.
The images below help illustrate this process. The image on the left represents a standard SSD 256GB SSD filled to about 80% capacity with a typical operating system, applications and user data. The remaining 20% of free space is automatically used by the SSD as dynamic OP. The image on the right shows how the same data stored on a data reduction-capable SSD can nearly double the available OP for the SSD because the operating system, applications and half of the user data can be reduced in this example.
Why is dynamic OP so important?
OP is the lifeblood of a flash memory-based SSD (nearly all of them available today). Without OP the SSD could not operate. Allocating more space for OP increases an SSD’s performance and endurance, as well as reduces it power consumption. In the illustrations above, both SSDs are storing about 30% of user data as incompressible files like MPEG movies and JPG images. As I mentioned, data reduction technology can’t compress those files, but the rest of the data can be reduced. The result is the SSD with data reduction delivers higher overall performance than the standard SSD even with incompressible data.
Misconception 1: Data reduction technology is a trick
There’s no trickery with data reduction technology. The process is simple: It reduces the size of data differently depending on the content, increasing SSD speed and endurance.
Misconception 2: Users with movie, picture, and audio files will not benefit from data reduction
As illustrated above, as long as an operating system and other applications are stored on the SSD, there will be at least some increase in dynamic OP and performance despite the incompressible files.
Misconception 3: Testing with all incompressible data delivers worst-case performance
Given that a typical SSD stores an operating system, programs and other data files, an SSD test that writes only incompressible data to the device would underestimate the performance of the SSD in user deployments.
Data reduction technology delivers
Data reduction technology, like LSI® SandForce® DuraWrite, is often misunderstood to the point that users believe they would be better off without it. The truth is, with data reduction technology, nearly every user will see performance and endurance gains with their SSD regardless of how much incompressible data is stored.
It may sound crazy, but hard disk drives (HDDs) do not have a delete command. Now we all know HDDs have a fixed capacity, so over time the older data must somehow get removed, right? Actually it is not removed, but overwritten. The operating system (OS) uses a reference table to track the locations (addresses) of all data on the HDD. This table tells the OS which spots on the HDD are used and which are free. When the OS or a user deletes a file from the system, the OS simply marks the corresponding spot in the table as free, making it available to store new data.
The HDD is told nothing about this change, and it does not need to know since it would not do anything with that information. When the OS is ready to store new data in that location, it just sends the data to the HDD and tells it to write to that spot, directly overwriting the prior data. It is simple and efficient, and no delete command is required.
However, with the advent of NAND flash-based solid state drives (SSDs) a new problem emerged. In my blog, Gassing up your SSD, I explain how NAND flash memory pages cannot be directly overwritten with new data, but must first be erased at the block level through a process called garbage collection (GC). I further describe how the SSD uses non-user space in the flash memory (over provisioning or OP) to improve performance and longevity of the SSD. In addition, any user space not consumed by the user becomes what we call dynamic over provisioning – dynamic because it changes as the amount of stored data changes.
When less data is stored by the user, the amount of dynamic OP increases, further improving performance and endurance. The problem I alluded to earlier is caused by the lack of a delete command. Without a delete command, every SSD will eventually fill up with data, both valid and invalid, eliminating any dynamic OP. The result would be the lowest possible performance at that factory OP level. So unlike HDDs, SSDs need to know what data is invalid in order to provide optimum performance and endurance.
Keeping your SSD TRIM
A number of years ago, the storage industry got together and developed a solution between the OS and the SSD by creating a new SATA command called TRIM. It is not a command that forces the SSD to immediately erase data like some people believe. Actually the TRIM command can be thought of as a message from the OS about what previously used addresses on the SSD are no longer holding valid data. The SSD takes those addresses and updates its own internal map of its flash memory to mark those locations as invalid. With this information, the SSD no longer moves that invalid data during the GC process, eliminating wasted time rewriting invalid data to new flash pages. It also reduces the number of write cycles on the flash, increasing the SSD’s endurance. Another benefit of the TRIM command is that more space is available for dynamic OP.
Today, most current operating systems and SSDs support TRIM, and all SandForce Driven™ member SSDs have always supported TRIM. Note that most RAID environments do not support TRIM, although some RAID 0 configurations have claimed to support it. I have presented on this topic in detail previously. You can view the presentation in full here. In my next blog I will explain how there may be an alternate solution using SandForce Driven member SSDs.
The term global warming can be very polarizing in a conversation and both sides of the argument have mountains of material that support or discredit the overall situation. The most devout believers in global warming point to the average temperature increases in the Earth’s atmosphere over the last 100+ years. They maintain the rise is primarily caused by increased greenhouse gases from humans burning fossil fuels and deforestation.
The opposition generally agrees with the measured increase in temperature over that time, but claims that increase is part of a natural cycle of the planet and not something humans can significantly impact one way or another. The US Energy Information Administration estimates that 90% of world’s marketed energy consumption is from non-renewable energy sources like fossil fuels. Our internet-driven lives run through datacenters that are well-known to consume large quantities of power. No matter which side of the global warming argument you support, most people agree that wasting power is not a good long-term position. Therefore, if the power consumed by datacenters can be reduced, especially as we live in an increasingly digitized world, this would benefit all mankind.
When we look at the most power-hungry components of a datacenter, we find mainly server and storage systems. However, people sometimes forget that those systems require cooling to counteract the heat generated. But the cooling itself consumes even more energy. So anything that can store data more efficiently and quickly will reduce both the initial energy consumption and the energy to cool those systems. As datacenters demand faster data storage, they are shifting to solid state drives (SSDs). SSDs generally provide higher performance per watt of power consumed over hard disk drives, but there is still more that can be done.
Reducing data to help turn down the heat
The good news is that there’s a way to reduce the amount of data that reaches the flash memory of the SSD. The unique DuraWrite™ technology found in all LSI® SandForce® flash controllers reduces the amount of data written to the flash memory to cut the time it takes to complete the writes and therefore reduce power consumption, below levels of other SSD technologies. That, in turn, reduces the cooling needed to further reduce overall power consumption. Now this data reduction is “loss-less,” meaning 100% of what is saved is returned to the host, unlike MPEG, JPEG, and MP3 files, which tolerate some amount of data loss to reduce file sizes.
Today you can find many datacenters already using SandForce Driven SSDs and LSI Nytro™ application acceleration products (which use DuraWrite technology as well). When we start to see datacenters deploying these flash storage products by the millions, you will certainly be able to measure the reduction in power consumed by datacenters. Unfortunately, LSI will not be able to claim it stopped global warming, but at least we, and our customers, can say we did something to help defer the end result.