Ever since SandForce introduced data reduction technology with the DuraWrite™ feature in 2009, some users have been confused about how it works and questioned whether it delivers the benefits we claim. Some even believe there are downsides to using DuraWrite with an SSD. In this blog, I will dispel those misconceptions.
Data reduction technology refresher
Four of my previous blogs cover the many advantages of using data reduction technology like DuraWrite:
In a nutshell, data reduction technology reduces the size of data written to the flash memory, but returns 100% of the original data when reading it back from the flash. This reduction in the required storage space helps accelerate reads and writes, extend the life of the flash and increase the dynamic over provisioning (OP).
What is incompressible data?
Data is incompressible when data reduction technology is unable to reduce the size of a dataset – in which case the technology offers no benefit for the user. File types that are altogether or mostly incompressible include MPEG, JPEG, ZIP and encrypted files. However, data reduction technology is applied to an entire SSD, so the free space resulting from the smaller, compressed files increases OP for all file types, even incompressible files.
The images below help illustrate this process. The image on the left represents a standard SSD 256GB SSD filled to about 80% capacity with a typical operating system, applications and user data. The remaining 20% of free space is automatically used by the SSD as dynamic OP. The image on the right shows how the same data stored on a data reduction-capable SSD can nearly double the available OP for the SSD because the operating system, applications and half of the user data can be reduced in this example.
Why is dynamic OP so important?
OP is the lifeblood of a flash memory-based SSD (nearly all of them available today). Without OP the SSD could not operate. Allocating more space for OP increases an SSD’s performance and endurance, as well as reduces it power consumption. In the illustrations above, both SSDs are storing about 30% of user data as incompressible files like MPEG movies and JPG images. As I mentioned, data reduction technology can’t compress those files, but the rest of the data can be reduced. The result is the SSD with data reduction delivers higher overall performance than the standard SSD even with incompressible data.
Misconception 1: Data reduction technology is a trick
There’s no trickery with data reduction technology. The process is simple: It reduces the size of data differently depending on the content, increasing SSD speed and endurance.
Misconception 2: Users with movie, picture, and audio files will not benefit from data reduction
As illustrated above, as long as an operating system and other applications are stored on the SSD, there will be at least some increase in dynamic OP and performance despite the incompressible files.
Misconception 3: Testing with all incompressible data delivers worst-case performance
Given that a typical SSD stores an operating system, programs and other data files, an SSD test that writes only incompressible data to the device would underestimate the performance of the SSD in user deployments.
Data reduction technology delivers
Data reduction technology, like LSI® SandForce® DuraWrite, is often misunderstood to the point that users believe they would be better off without it. The truth is, with data reduction technology, nearly every user will see performance and endurance gains with their SSD regardless of how much incompressible data is stored.
Last week at LSI’s annual Accelerating Innovation Summit (AIS) the company took the wraps off a vision that should lead its technical direction for the next few years.
The LSI keynote featured a video of three situations as they might evolve in the future:
I’ll focus on just one of these to show how LSI expects the future to develop. In the bicycle accident scenario, a businessman falls to the ground while riding a bicycle in a foreign country. Security cameras that have been upgraded to understand what they see notify an emergency services agency which sends an ambulance to the scene. The paramedic performs a retinal scan on the victim, using it to retrieve his medical records, including his DNA sequence, from the web.
The businessman’s wearable body monitoring system also communicates with the paramedic’s instruments to share his vital signs. All of this information is used by cloud-based computers to determine a course of action which, in the video, requires an injection that has been custom-tuned to the victim’s current situation, his medical history, and his genetic makeup.
That’s a pretty tall order, and it will require several advances in the state of the art, but LSI is using this and other scenarios to work with its clients and translate this vision into the products of the future.
What are the key requirements to make this happen? Talwalkar told the audience that we need to create a society that is supported by preventive, predictive and assisted analytics to move in a direction where the general welfare is assisted by all that the Internet and advanced computing have to offer. Since data is growing at an exponential rate, he argued that this will require the instant retrieval of interlinked data objects at scale. Everything that is key to solving the task must be immediately available, and must be quickly analyzed to provide a solution to the problem at hand. The key will be the ability to process interlinked pieces of data that have not been previously structured to handle any particular situation.
To achieve this we will need larger-scale computing resources than are currently available, all closely interconnected, that all operate at very high speeds. LSI hopes to tap into these needs through its strengths in networking and communications chips for the communications, its HDD and server and storage connectivity array chips and boards for large-scale data, and its flash controller memory and PCIe SSD expertise for high performance.
LSI brought to AIS several of the customers and partners it is working with using to develop these technologies. Speakers from Intel, Microsoft, IBM, Toshiba, Ericsson and others showed how they are working with LSI’s various technologies to improve the performance of their own systems. On the exhibition floor booths from LSI and many of its clients demonstrated new technologies that performed everything from high-speed stock market analysis to fast flash management.
It’s pretty exciting to see a company that has a clear vision of its future and is committed to moving its entire ecosystem ahead to make that happen and help companies manage their business more effectively during what LSI calls the “Datacentric Era.” LSI has certainly put a lot of effort into creating a vision and determining where its talents can be brought to bear to improve our lives in the future.
Tags: AIS, chips, communications, connectivity, data, Datacentric Era, Ericsson, flash, flash memory, hard disk drive, HDD, IBM, Intel, large-scale data, Microsoft, Networking, server, Storage, Toshiba
The lifeblood of any online retailer is the speed of its IT infrastructure. Shoppers aren’t infinitely patient. Sluggish infrastructure performance can make shoppers wait precious seconds longer than they can stand, sending them fleeing to other sites for a faster purchase. Our federal government’s halting rollout of the Health Insurance Marketplace website is a glaring example of what can happen when IT infrastructure isn’t solid. A few bad user experiences that go viral can be damaging enough. Tens of thousands can be crippling.
In hyperscale datacenters, any number of problems including network issues, insufficient scaling and inconsistent management can undermine end users’ experience. But one that hits home for me is the impact of slow storage on the performance of databases, where the data sits. With the database at the heart of all those online transactions, retailers can ill afford to have their tier of database servers operating at anything less than peak performance.
Slow storage undermines database performance
Typically, Web 2.0 and e-commerce companies run relational databases (RDBs) on these massive server-centric infrastructures. (Take a look at my blog last week to get a feel for the size of these hyperscale datacenter infrastructures). If you are running that many servers to support millions of users, you are likely using some kind of open-sourced RDB such as MySQL or other variations. Keep in mind that Oracle 11gR2 likely retails around $30K per core but MSQL is free. But the performance of both, and most other relational databases, suffer immensely when transactions are retrieving data from storage (or disk). You can only throw so much RAM and CPU power at the performance problem … sooner rather than later you have to deal with slow storage.
Almost everyone in industry – Web 2.0, cloud, hyperscale and other providers of massive database infrastructures – is lining up to solve this problem the best way they can. How? By deploying flash as the sole storage for database servers and applications. But is low-latency flash enough? For sheer performance it beats rotational disk hands down. But … even flash storage has its limitations, most notably when you are trying to drive ultra-low latencies for write IOs. Most IO accesses by RDBs, which do the transactional processing, are a mix or read/writes to the storage. Specifically, the mix is 70%/30% reads/writes. These are also typically low q-depth accesses (less than 4). It is those writes that can really slow things down.
PCIe flash reduces write latencies
The good news is that the right PCIe flash technology in the mix can solve the slowdowns. Some interesting PCIe flash technologies designed to tackle this latency problem are on display at AIS this week. DRAM and in particular NVDRAM are being deployed as a tier in front of flash to really tackle those nasty write latencies.
Among other demos, we’re showing how a Nytro™ 6000 series PCIe flash card helps solve the MySQL database performance issues. The typical response time for a small data read (this is what the database will see for a Database IO) from an HDD is 5ms. Flash-based devices such as the Nytro WarpDrive® card can complete the same read in less than 50μs on average during testing, an improvement of several orders-of-magnitude in response time. This response time translates to getting much higher transactions out of the same infrastructure – but with less space (flash is denser) and a lot less power (flash consumes a lot lower power than HDDs).
We’re also showing the Nytro 7000 series PCIe flash cards. They reach even lower write latencies than the 6000 series and very low q-depths. The 7000 series cards also provide DRAM buffering while maintaining data-integrity even in the event of a power loss.
For online retailers and other businesses, higher database speeds mean more than just faster transactions. They can help keep those cash registers ringing.
Tags: AIS, database, DRAM, e-commerce, flash, flash memory, hard disk drive, HDD, hyperscale datacenter, latency, MySQL, NVDRAM, Nytro 6000, Nytro 7000, Nytro WarpDrive, Oracle, PCIe flash, relational database, storage latency, web 2.0, write latency
Back in the 1990s, a new paradigm was forced into space exploration. NASA faced big cost cuts. But grand ambitions for missions to Mars were still on its mind. The problem was it couldn’t dream and spend big. So the NASA mantra became “faster, better, cheaper.” The idea was that the agency could slash costs while still carrying out a wide variety of programs and space missions. This led to some radical rethinks, and some fantastically successful programs that had very outside-the-box solutions. (Bouncing Mars landers anyone?)
That probably sounds familiar to any IT admin. And that spirit is alive at LSI’s AIS – The Accelerating Innovation Summit, which is our annual congress of customers and industry pros, coming up Nov. 20-21 in San Jose. Like the people at Mission Control, they all want to make big things happen… without spending too much.
Take technology and line of business professionals. They need to speed up critical business applications. A lot. Or IT staff for enterprise and mobile networks, who must deliver more work to support the ever-growing number of users, devices and virtualized machines that depend on them. Or consider mega datacenter and cloud service providers, whose customers demand the highest levels of service, yet get that service for free. Or datacenter architects and managers, who need servers, storage and networks to run at ever-greater efficiency even as they grow capability exponentially.
(LSI has been working on many solutions to these problems, some of which I spoke about in this blog.)
It’s all about moving data faster, better, and cheaper. If NASA could do it, we can too. In that vein, here’s a look at some of the topics you can expect AIS to address around doing more work for fewer dollars:
And, I think you’ll find some astounding products, demos, proof of concepts and future solutions in the showcase too – not just from LSI but from partners and fellow travelers in this industry. Hey – that’s my favorite part. I can’t wait to see people’s reactions.
Since they rethought how to do business in 2002, NASA has embarked on nearly 60 Mars missions. Faster, better, cheaper. It can work here in IT too.
Tags: 12Gb/s SAS, AIS, big data analytics, cloud infrastructure, cloud services, datacenter, flash, flash memory, hyperscale datacenters, NAS, NASA, SAN, SDN, shareable DAS, software-defined networks, sub-20nm flash, triple-level cell flash, VDI, web 2.0
It may sound crazy, but hard disk drives (HDDs) do not have a delete command. Now we all know HDDs have a fixed capacity, so over time the older data must somehow get removed, right? Actually it is not removed, but overwritten. The operating system (OS) uses a reference table to track the locations (addresses) of all data on the HDD. This table tells the OS which spots on the HDD are used and which are free. When the OS or a user deletes a file from the system, the OS simply marks the corresponding spot in the table as free, making it available to store new data.
The HDD is told nothing about this change, and it does not need to know since it would not do anything with that information. When the OS is ready to store new data in that location, it just sends the data to the HDD and tells it to write to that spot, directly overwriting the prior data. It is simple and efficient, and no delete command is required.
However, with the advent of NAND flash-based solid state drives (SSDs) a new problem emerged. In my blog, Gassing up your SSD, I explain how NAND flash memory pages cannot be directly overwritten with new data, but must first be erased at the block level through a process called garbage collection (GC). I further describe how the SSD uses non-user space in the flash memory (over provisioning or OP) to improve performance and longevity of the SSD. In addition, any user space not consumed by the user becomes what we call dynamic over provisioning – dynamic because it changes as the amount of stored data changes.
When less data is stored by the user, the amount of dynamic OP increases, further improving performance and endurance. The problem I alluded to earlier is caused by the lack of a delete command. Without a delete command, every SSD will eventually fill up with data, both valid and invalid, eliminating any dynamic OP. The result would be the lowest possible performance at that factory OP level. So unlike HDDs, SSDs need to know what data is invalid in order to provide optimum performance and endurance.
Keeping your SSD TRIM
A number of years ago, the storage industry got together and developed a solution between the OS and the SSD by creating a new SATA command called TRIM. It is not a command that forces the SSD to immediately erase data like some people believe. Actually the TRIM command can be thought of as a message from the OS about what previously used addresses on the SSD are no longer holding valid data. The SSD takes those addresses and updates its own internal map of its flash memory to mark those locations as invalid. With this information, the SSD no longer moves that invalid data during the GC process, eliminating wasted time rewriting invalid data to new flash pages. It also reduces the number of write cycles on the flash, increasing the SSD’s endurance. Another benefit of the TRIM command is that more space is available for dynamic OP.
Today, most current operating systems and SSDs support TRIM, and all SandForce Driven™ member SSDs have always supported TRIM. Note that most RAID environments do not support TRIM, although some RAID 0 configurations have claimed to support it. I have presented on this topic in detail previously. You can view the presentation in full here. In my next blog I will explain how there may be an alternate solution using SandForce Driven member SSDs.