In part one of this Write Amplification (WA) series, I examined how WA works in basic solid-state drives (SSDs). Part two now takes a deeper look at how SSDs that use some form of data reduction technology can see a very big and positive impact on WA.
Data reduction technology can master data entropy
The performance of all SSDs is influenced by the same factors – such as the amount of over provisioning and levels of random vs. sequential writing – with one major exception: entropy. Only SSDs with data reduction technology can take advantage of entropy – the degree of randomness of data – to provide significant performance, endurance and power-reduction advantages.
Data reduction technology parlays data entropy (not to be confused with how data is written to the storage device – sequential vs. random) into higher performance. How? When data reduction technology sends data to the flash memory, it uses some form of data de-duplication, compression, or data differencing to rearrange the information and use fewer bytes overall. After the data is read from flash memory, data reduction technology, by design, restores 100% of the original content to the host computer This is known as “loss-less” data reduction and can be contrasted with “lossy” techniques like MPEG, MP3, JPEG, and other common formats used for video, audio, and visual data files. These formats lose information that cannot be restored, though the resolution remains adequate for entertainment purposes.
The multi-faceted power of data reduction technology
My previous blog on data reduction discusses how data reduction technology relates to the SATA TRIM command and increases free space on the SSD, which in turn reduces WA and improves subsequent write performance. With a data-reduction SSD, the lower the entropy of the data coming from the host computer, the less the SSD has to write to the flash memory, leaving more space for over provisioning. This additional space enables write operations to complete faster, which translates not only into a higher write speed at the host computer but also into lower power use because flash memory draws power only while reading or writing. Higher write speeds also mean lower power draw for the flash memory.
Because data reduction technology can send less data to the flash than the host originally sent to the SSD, the typical write amplification factor falls below 1.0. It is not uncommon to see a WA of 0.5 on an SSD with this technology. Writing less data to the flash leads directly to:
Each of these in turn produces other benefits, some of which circle back upon themselves in a recursive manner. This logic diagram highlights those benefits.
So this is a rare instance when an amplifier – namely, Write Amplification – makes something smaller. At LSI, this unique amplifier comes in the form of the LSI® DuraWrite™ data reduction technology in all SandForce® Driven™ SSDs.
This three-part series examines some of the details behind write amplification, a key property of all NAND flash-based SSDs that is often misunderstood in the industry.
Every year I diligently get in line for my annual flu (or more technically accurate “seasonal influenza”) shot. I’m not particularly fond of needles, but I have seen what the flu can do and the how many die each year from this seasonal virus.
When you get the flu shot – or, now, the nasal mist – you and I are trusting a lot of people that what you are taking will actually help protect you. According to the CDC (Centers for Disease Control and Prevention), there are 3 three strains, (A, B &C Antigenic) of influenza virus and of those three types, two cause the seasonal epidemics we suffer through each year.
Not to get too technical, but I learned that the A strain is further segregated by 2 proteins and are given code names like H1N1, H3N2 and H5N1. They can even be updated by year if there is a change in them. An example of this was in 2009, when the H1N1 became the 2009 H1N1. So where we may just call it H1N1, the World Health Organization has a whole taxonomy to describe a seasonal influenza strain.
This taxonomy includes:
As you can see, it can really get complicated quickly. If you would like to go deeper, you can read more about this here. While much of this information seems pretty arcane to the lay reader, you quickly can see that the sheer volume of information collected, stored and analyzed to combat seasonal influenza is a great example of big data.
In the US, once the CDC sifts through this data – using big data analytics tools – it uses its findings to determine what strains might affect the US and build a flu shot to combat those strains. During the 2012/2013 season, the predominant virus was Influenza A (H3N2), though some influenza B viruses contained a dash of influenza A (H1N1) pdm09 (pH1N1). (See the full report here.)
In addition to identifying dominant viruses, the CDC also uses big data to track the spread and potential effect on the population. Reviewing information from prior outbreaks, population data, and even weather patterns, the CDC uses big data analytics to quickly estimate and attempt to determine where viruses might hit first, hardest and longest so that a targeted vaccine can be produced in sufficient quantities, in the required timeframe and even for the right geography. The faster and more accurately this can be done, the more people can get this potentially life saving vaccine before the virus travels to their area.
As I stated in my previous blog post, the Hadoop® architecture is a great tool for efficiently storing and processing the growing amount of data worldwide, but Hadoop is only as good as the processing and storage performance that supports it. As with weather predictions, the more data you can quickly and efficiently analyze, the greater the likelihood of an accurate prediction. When it comes to weather and flu vaccines, these predictions can help save lives. In my final blog post in this series, I will explore how big data helps the fashion industry.
Whether in medical, weather or other fields that leverage big data technologies, the use of Hadoop for high levels of speed and accuracy in big data analysis requires computers with application acceleration. One such tool is LSI® Nytro™ Application Acceleration. You can go to TheSmarterWayToFaster™ for more information on the Nytro product family.
Part two of this three-part series continues to examine some of the diverse and potentially life-saving uses of big data in our everyday lives. It also explores how expanded data access and higher processing and storage speed can help optimize big data application performance.
We all watch the local weather and wonder how forecasters predict (or in some cases mis-predict) the future of weather. While they may not all agree on the forecast, they do agree that the more current and historical data you have, the better your ability to predict what might happen over the next hours, days and weeks.
A term used to describe this growing amount of information is Big Data, and more and more of it leverages Hadoop, a flexible architecture that provides the analysis tools and scalability required to comb through and utilize all available data. When recently talking to a US-based meteorologist (the technical name for a degreed weather forecaster), I learned that meteorologists rely on many different weather models from various sources to help create their forecasts.
Weather spawns downpour of Big Data
These models collect massive amounts of weather information from around the world. Using this information, computers then run billions of calculations to mimic the motion of weather patterns in the Earth’s dynamic atmosphere and produce forecasts for any given location over time. It was interesting to learn that not all weather models are equal.
While weather modeling websites worldwide collect this atmospheric data and provide it to meteorologists, the European community is seen as having the most accurate information. When I asked why, I learned that European weather modeling sites have some of the fastest computer hardware and technology, enabling them to analyze more data faster, which produces better overall forecasts. The US weather professional I spoke with tends to use these European sites as part of his analysis, and when European models conflict with those from US sites, he often leans toward the European data.
His use of the European weather modeling sites points to the value of fast, accurate analysis of Big Data. It also underscores the implications of vast amounts of data overwhelming the ability of the compute and storage resources available to process it. An accurate and timely weather forecast is critical and a bad or missed forecast can have terrible and even deadly consequences.
A case in point: Hurricane Sandy
In this article on Hurricane Sandy forecast speed and accuracy, you can see how removing just one source of data can dramatically reduce the accuracy of predicting a critical event such as where a hurricane will make landfall. To be sure, the more data you can store and the faster you can process it for analysis, the greater your potential competitive advantage, even in the vaunted halls of meteorological analysis and prediction.
The Hadoop® architecture is a great tool for efficiently storing and processing the growing amount of data worldwide, but Hadoop is only as good as the processing and storage performance that supports it. This gets interesting as you think about and explore the ripple effect of accurate or inaccurate forecasting in many areas. In my next blog post I will explore one of those – flu vaccines.
Whether in meteorology or other fields that leverage Big Data technologies, the use of Hadoop for high levels of speed and accuracy in Big Data analysis requires computers with application acceleration. One such tool is LSI® Nytro™ Application Acceleration. You can go to TheSmarterWayToFaster™ for more information on the Nytro product family.
This three-part series examines some of the diverse uses of Big Data in our everyday lives. It also explores how expanded data access and higher processing and storage speed can help optimize Big Data application performance.
Tags: application accleration, big data, European weather modeling, flash, flash storage, Hadoop, Hurricane Sandy, meterology, Nytro, processing performance, storage performance, weather modeling
In today’s solid state drives (SSDs), the NAND flash memory must be erased before it can store new data. In other words, data cannot be overwritten directly as it is in a hard disk drive (HDD). Instead, SSDs use a process called garbage collection (GC) to reclaim the space taken by previously stored data. This means that write demands are heavier on SSDs than HDDs when storing the same information.
This is bad because the flash memory in the SSD supports only a limited number of writes before it can no longer be read. We call this undesirable effect write amplification (WA). In my blog, Gassing up your SSD, I describe why WA exists in a little more detail, but here I will explain what controls it.
It’s all about the free space
I often tell people that SSDs work better with more free space, so anything that increases free space will keep WA lower. The two key ways to expand free space (thereby decreasing WA) are to 1) increase over provisioning and 2) keep more storage space free (if you have TRIM support).
As I said earlier, there is no WA before GC is active. However, this pristine pre-GC condition has a tiny life span – just one full-capacity write cycle during a “fresh-out-of-box” (FOB) state, which accounts for less than 0.04% of the SSD’s life. Although you can manually recreate this condition with a secure erase, the cost is an additional write cycle, which defeats the purpose. Also keep in mind that the GC efficiency and associated wear leveling algorithms can affect WA (more efficient = lower WA).
The other major contributor to WA is the organization of the free space (how data is written to the flash). When data is written randomly, the eventual replacement data will also likely come in randomly, so some pages of a block will be replaced (made invalid) and others will still be good (valid). During GC, valid data in blocks like this needs to be rewritten to new blocks. This produces another write to the flash for each valid page, causing write amplification.
With sequential writes, generally all the data in the pages of the block becomes invalid at the same time. As a result, no data needs relocating during GC since there is no valid data remaining in the block before it is erased. In this case, there is no amplification, but other things like wear leveling on blocks that don’t change will still eventually produce some write amplification no matter how data is written.
Calculating write amplification
Write Amplification is fundamentally the result of data written to the flash memory divided by data written by the host. In 2008, both Intel and SiliconSystems (acquired by Western Digital) were the first to start talking publically about WA. At that time, the WA of all SSDs was something greater than 1.0. It was not until SandForce introduced the first SSD controller with DuraWrite™ technology in 2009 that WA could fall below 1.0. DuraWrite technology increases the free space mentioned above, but in a way that is unique from other SSD controllers. In part two of my write amplification series, I will explain how DuraWrite technology works.
This three-part series examines some of the details behind write amplification, a key property of all NAND flash-based SSDs that is often misunderstood in the industry.
It is always good to hear the opinions of your customers and end users, and in that respect June was a banner month for LSI® SandForce® flash controllers.
In a survey soliciting responses from more than 1 million members of on-line groups and other sources by IT Brand Pulse, an independent product testing and validation lab, LSI SandForce controllers ranked at the top of all six SSD controller chip sub-categories: market, price, performance, reliability, service and support, and innovation. Last August, the LSI SandForce controllers won in three of the six sub-categories, so we’re thrilled to see momentum building.
Winning all six awards is no easy task. Some of the sub-categories could be considered mutually exclusive, requiring customers to make trade-offs among product attributes. For example, often a product with the best price is considered to have skimped on quality compared to pricier solutions. A product with screaming performance, ironically, is seen as something of a market laggard because it usually does not carry the best price. So it is exciting to strike the right balance among all six measures and sweep the product category. You can find more details on the awards here: http://itbrandpulse.com/research/brand-leader-program/225-ssd-controller-chips-2013
For those of us in product marketing, winning product awards voted on by your peers can bring on a feeling similar to that warm afterglow parents bask in when they hear their child has made the honor roll or was named the valedictorian for his or her graduating class.
So please pardon us, for a moment, as we beam with pride.
It seems like our smartphones are getting bigger and bigger with each generation. Sometimes I see people holding what appears to be a tablet computer up next to their head. I doubt they know how ridiculous they look to the rest of us, and I wonder what pants today have pockets that big.
I certainly do like the convenience of the instant-on capabilities my smart phone gives me, but I still need my portable computer with its big screen and keyboard separate from my phone.
A few years ago, SATA-IO, the standards body, added a new feature to the Serial ATA (SATA) specification designed to further reduce battery consumption in portable computer products. This new feature, DevSleep, enables solid state drives (SSDs) to act more like smartphones, allowing you to go days without plugging in to recharge and then instantly turn them on and see all the latest email, social media updates, news and events.
Why not just switch the system off?
When most PC users think about switching off their system, they dread waiting for the operating system to boot back up. That is one of the key advantages of replacing the hard disk drive (HDD) in the system with a faster SSD. However, in our instant gratification society, we hate to wait even seconds for web pages to come up, so waiting minutes for your PC to turn on and boot up can feel like an eternity. Therefore, many people choose to leave the system on to save those precious moments… but at the expense of battery life.
Can I get this today?
To further extend battery life, the new DevSleep feature requires a signal change on the SATA connector. This change is currently supported only in new Intel® Haswell chipset-based platforms announced this June. What’s more, the SSD in these systems must support the DevSleep feature and monitor the signal on the SATA connector. Most systems that support DevSleep will likely be very low-power notebook systems and will likely already ship with an SSD installed using a small mSATA, M.2, or similar edge connector. Therefore, the signal change on the SATA interface will not immediately affect the rest of the SSDs designed for desktop systems shipping through retail and online sources. Note that not all SSDs are created equal and, while many claim support for DevSleep, be sure to look at the fine print to compare the actual power draw when in DevSleep.
At Computex last month, LSI announced support for the DevSleep feature and staged demonstrations showing a 400x reduction in idle power. It should be noted that a 400x reduction in power does not directly translate to a 400x increase in battery life, but any reduction in power will give you more time on the battery, and that will certainly benefit any user who often works without a power cord.
Not likely. But you might think that solving your computer data security problems is very well possible when someone tells you that TCG Opal is the key. According to its website, “The Trusted Computing Group (TCG) is a not-for-profit organization formed to develop, define and promote open, vendor-neutral, global industry standards, supportive of a hardware-based root of trust, for interoperable trusted computing platforms.”
That might take a bit to digest, but think about TCG as a group of companies creating standards to simplify deployment and increase adoption of data security. The consortium has two better known specifications called TCG Enterprise and TCG Opal.
Sorting through the alphabet soup of data security
“Our SED with TCG Opal provides FDE.” While this might look like a spoonful of alphabet soup, it is music to the ears of a corporate IT manager. Let me break it down for those who just hear fingernails on the chalkboard. A self-encrypting drive (SED) is one that embeds a hardware-based encryption engine in the storage device. One chief benefit is that the hardware engine performs the encryption, preserving full performance of the host CPU. An SED can be a hard disk drive (HDD) or a solid state drive (SSD). True, traditional software encryption can secure data going to the storage device, but it consumes precious host CPU bandwidth. The related term, full drive encryption (FDE), is used to describe any drive (HDD or SSD) that stores data in an encrypted form. This can be through either software-based (host CPU) or hardware-based (an SED) encryption.
Most people would assume that if their work laptop were lost or stolen, they would suffer only some lost productivity for a short time and about $1,500 in hardware costs. However, a study by Intel and the Ponemon Institute found that the cost of a lost laptop totaled nearly $50,000 when you account for lost IP, legal costs, customer notifications, lost business, harm to reputation, and damages associated with compromising confidential customer information. When the data stored on the laptop is encrypted, this cost is reduced by nearly $20,000. This difference certainly supports the need for better security for these mobile platforms.
When considering a security solution for this valuable data, you have to decide between a hardware-based SED and a host-based software solution. The primary problem with software solutions is they require the host CPU to do all of the encryption. This detracts from the CPU’s core computing work, leaving users with a slower computer or forcing them to pay for greater CPU performance. Another drawback of many software encryption solutions is that they can be turned off by the computer user, leaving data in the clear and vulnerable. Since hardware-based encryption is native to the HDD or SSD, it cannot be disabled by the end user.
In April 2013, LSI and a few other storage companies worked with the Ponemon Institute to better understand the value of hardware-based encryption. You can read about the details in the study here, but the quick summary is that hardware-based encryption solutions can offer a 75% total cost savings over software-based solutions, on average.
When is this available?
At the Computex Taipei 2013 show earlier this month, LSI announced availability of a firmware update for SandForce® controllers that adds support for TCG Opal. The LSI suite at the show featured TCG Opal demonstrations using self-encrypting SSDs provided by SandForce Driven™ member companies, including Kingston, A-DATA, Avant and Edge. (Contact SSD manufacturers directly for product availability.)
Imagine a bathtub full of water and asking someone to empty the tub while you turn your back for a moment. When you look again and see the water is gone, do you just assume someone pulled the drain plug?
I think most people would, but what about the other methods of removing the water like with a siphon, or using buckets to bail out the water? In a typical bathroom you are not likely to see these other methods used, but that does not mean they do not exist. The point is that just because you see a certain result does not necessarily mean the obvious solution was used.
I see a lot of confusion in forum posts from SandForce Driven™ SSD users and reviewers over how the LSI® DuraWrite™ data reduction and advanced garbage collection technology relates to the SATA TRIM command. In my earlier blog on TRIM I covered this topic in great detail, but in simple terms the operating system uses the TRIM command to inform an SSD what information is outdated and invalid. Without the TRIM command the SSD assumes all of the user capacity is valid data. I explained in my blog Gassing up your SSD that creating more free space through over provisioning or using less of the total capacity enables the SSD to operate more efficiently by reducing the write amplification, which leads to increased performance and flash memory endurance. So without TRIM the SSD operates at its lowest level of efficiency for a particular level of over provisioning.
Will you drown in invalid data without TRIM?
TRIM is a way to increase the free space on an SSD – what we call “dynamic over provisioning” – and DuraWrite technology is another method to increase the free space. Since DuraWrite technology is dependent upon the entropy (randomness) of the data, some users will get more free space than others depending on what data they store. Since the technology works on the basis of the aggregate of all data stored, boot SSDs with operating systems can still achieve some level of dynamic over provisioning even when all other files are at the highest entropy, e.g., encrypted or compressed files.
With an older operating system or in an environment that does not support TRIM (most RAID configurations), DuraWrite technology can provide enough free space to offer the same benefits as having TRIM fully operational. In cases where both TRIM and DuraWrite technology are operating, the combined result may not be as noticeable as when they’re working independently since there are diminishing returns when the free space grows to greater than half of the SSD storage capacity.
So the next time you fill your bathtub, think about all the ways you can get the water out of the tub without using the drain. That will help you remember that both TRIM and DuraWrite technology can improve SSD performance using different approaches to the same problem. If that analogy does not work for you, consider the different ways to produce a furless feline, and think about what opening graphic image I might have used for a more jolting effect. Although in that case you might not have seen this blog since that image would likely have gotten us banned from Google® “safe for work” searches.
I presented on this topic in detail at the Flash Memory Summit in 2011. You can read it in full here: http://www.lsi.com/downloads/Public/Flash%20Storage%20Processors/LSI_PRS_FMS2011_T1A_Smith.pdf
Tags: bathtub drain, controller, data reduction technology, DuraWrite, flash, flash controller, flash memory, Flash Memory Summit, NAND, over-provisioning, SandForce, SandForce Driven SSD, SATA, Serial ATA, solid state drive, TRIM
I want to warn you, there is some thick background information here first. But don’t worry. I’ll get to the meat of the topic and that’s this: Ultimately, I think that PCIe® cards will evolve to more external, rack-level, pooled flash solutions, without sacrificing all their great attributes today. This is just my opinion, but other leaders in flash are going down this path too…
I’ve been working on enterprise flash storage since 2007 – mulling over how to make it work. Endurance, capacity, cost, performance have all been concerns that have been grappled with. Of course the flash is changing too as the nodes change: 60nm, 50nm, 35nm, 24nm, 20nm… and single level cell (SLC) to multi level cell (MLC) to triple level cell (TLC) and all the variants of these “trimmed” for specific use cases. The spec “endurance” has gone from 1 million program/erase cycles (PE) to 3,000, and in some cases 500.
It’s worth pointing out that almost all the “magic” that has been developed around flash was already scoped out in 2007. It just takes a while for a whole new industry to mature. Individual die capacity increased, meaning fewer die are needed for a solution – and that means less parallel bandwidth for data transfer… And the “requirement” for state-of-the-art single operation write latency has fallen well below the write latency of the flash itself. (What the ?? Yea – talk about that later in some other blog. But flash is ~1500uS write latency, where state of the art flash cards are ~50uS.) When I describe the state of technology it sounds pretty pessimistic. I’m not. We’ve overcome a lot.
We built our first PCIe card solution at LSI in 2009. It wasn’t perfect, but it was better than anything else out there in many ways. We’ve learned a lot in the years since – both from making them, and from dealing with customer and users – about our own solutions and our competitors. We’re lucky to be an important player in storage, so in general the big OEMs, large enterprises and the hyperscale datacenters all want to talk with us – not just about what we have or can sell, but what we could have and what we could do. They’re generous enough to share what works and what doesn’t. What the values of solutions are and what the pitfalls are too. Honestly? It’s the hyperscale datacenters in the lead both practically and in vision.
If you haven’t nodded off to sleep yet, that’s a long-winded way of saying – things have changed fast, and, boy, we’ve learned a lot in just a few years.
Most important thing we’ve learned…
Most importantly, we’ve learned it’s latency that matters. No one is pushing the IOPs limits of flash, and no one is pushing the bandwidth limits of flash. But they sure are pushing the latency limits.
PCIe cards are great, but…
We’ve gotten lots of feedback, and one of the biggest things we’ve learned is – PCIe flash cards are awesome. They radically change performance profiles of most applications, especially databases allowing servers to run efficiently and actual work done by that server to multiply 4x to 10x (and in a few extreme cases 100x). So the feedback we get from large users is “PCIe cards are fantastic. We’re so thankful they came along. But…” There’s always a “but,” right??
It tends to be a pretty long list of frustrations, and they differ depending on the type of datacenter using them. We’re not the only ones hearing it. To be clear, none of these are stopping people from deploying PCIe flash… the attraction is just too compelling. But the problems are real, and they have real implications, and the market is asking for real solutions.
Of course, everyone wants these fixed without affecting single operation latency, or increasing cost, etc. That’s what we’re here for though – right? Solve the impossible?
A quick summary is in order. It’s not looking good. For a given solution, flash is getting less reliable, there is less bandwidth available at capacity because there are fewer die, we’re driving latency way below the actual write latency of flash, and we’re not satisfied with the best solutions we have for all the reasons above.
If you think these through enough, you start to consider one basic path. It also turns out we’re not the only ones realizing this. Where will PCIe flash solutions evolve over the next 2, 3, 4 years? The basic goals are:
One easy answer would be – that’s a flash SAN or NAS. But that’s not the answer. Not many customers want a flash SAN or NAS – not for their new infrastructure, but more importantly, all the data is at the wrong end of the straw. The poor server is left sucking hard. Remember – this is flash, and people use flash for latency. Today these SAN type of flash devices have 4x-10x worse latency than PCIe cards. Ouch. You have to suck the data through a relatively low bandwidth interconnect, after passing through both the storage and network stacks. And there is interaction between the I/O threads of various servers and applications – you have to wait in line for that resource. It’s true there is a lot of startup energy in this space. It seems to make sense if you’re a startup, because SAN/NAS is what people use today, and there’s lots of money spent in that market today. However, it’s not what the market is asking for.
Another easy answer is NVMe SSDs. Right? Everyone wants them – right? Well, OEMs at least. Front bay PCIe SSDs (HDD form factor or NVMe – lots of names) that crowd out your disk drive bays. But they don’t fix the problems. The extra mechanicals and form factor are more expensive, and just make replacing the cards every 5 years a few minutes faster. Wow. With NVME SSDs, you can fit fewer HDDs – not good. They also provide uniformly bad cooling, and hard limit power to 9W or 25W per device. But to protect the storage in these devices, you need to have enough of them that you can RAID or otherwise protect. Once you have enough of those for protection, they give you awesome capacity, IOPs and bandwidth, too much in fact, but that’s not what applications need – they need low latency for the working set of data.
What do I think the PCIe replacement solutions in the near future will look like? You need to pool the flash across servers (to optimize bandwidth and resource usage, and allocate appropriate capacity). You need to protect against failures/errors and limit the span of failure, commit writes at very low latency (lower than native flash) and maintain low latency, bottleneck-free physical links to each server… To me that implies:
That means the performance looks exactly as if each server had multiple PCIe cards. But the capacity and bandwidth resources are shared, and systems can remain resilient. So ultimately, I think that PCIe cards will evolve to more external, rack level, pooled flash solutions, without sacrificing all their great attributes today. This is just my opinion, but as I say – other leaders in flash are going down this path too…
What’s your opinion?
Tags: DAS, datacenter, direct attached storage, enterprise IT, flash, hard disk drive, HDD, hyperscale, latency, NAS, network attached storage, NVMe, PCIe, SAN, solid state drive, SSD, storage area network
Big data and Hadoop are all about exploiting new value and opportunities with data. In financial trading, business and some areas of science, it’s all about being fastest or first to take advantage of the data. The bigger the data sets, the smarter the analytics. The next competitive edge with big data comes when you layer in flash acceleration. The challenge is scaling performance in Hadoop clusters.
The most cost-effective option emerging for breaking through disk-to-I/O bottlenecks to scale performance is to use high-performance read/write flash cache acceleration cards for caching. This is essentially a way to get more work for less cost, by bringing data closer to the processing. The LSI® Nytro™ product has been shown during testing to improve the time it takes to complete Hadoop software framework jobs up to a 33%.
Flash cache cards increase Hadoop application performance
Combining flash cache acceleration cards with Hadoop software is a big opportunity for end users and suppliers. LSI estimates that less than 10% of Hadoop software installations today incorporate flash acceleration1. This will grow rapidly as companies see the increased productivity and ROI of flash to accelerate their systems. And Hadoop software adoption is also growing fast. IDC predicts a CAGR of as much as 60% by 20162. Drivers include IT security, e-commerce, fraud detection and mobile data user management. Gartner predicts that Hadoop software will be in two-thirds of advanced analytics products by 20153. Many thousands of Hadoop software clusters are already deployed.
Where flash makes the most immediate sense is with those who have smaller clusters doing lots of in-place batch processing. Hadoop is purpose-built for analyzing a variety of data, whether structured, semi-structured or unstructured, without the need to define a schema or otherwise anticipate results in advance. Hadoop enables scaling that allows an unprecedented volume of data to be analyzed quickly and cost-effectively on clusters of commodity servers. Speed gains are about data proximity. This is why flash cache acceleration typically delivers the highest performance gains when the card is placed directly in the server on the PCI Express® (PCIe) bus.
Combining the best of flash and HDDs to drive higher performance and storage capacity
PCIe flash cache cards are now available with multiple terabytes of NAND flash storage, which substantially increases the hit rate. We offer a solution with both onboard flash modules and Serial-Attached SCSI (SAS) interfaces to enable high-performance direct-attached storage (DAS) configurations consisting of solid state and hard disk drive storage. This couples the low-latency performance benefits of flash with the capacity and cost-per-gigabyte advantages of HDDs.
To keep the processor close to the data, Hadoop uses servers with DAS. And to get the data even closer to the processor, the servers are usually equipped with significant amounts of random access memory (RAM). An additional benefit: Smart implementation of Hadoop and flash components can reduce the overall server footprint and simplify scaling, with some solutions enabling up to 128 devices to share a very high bandwidth interface. Most commodity servers provide 8 or less SATA ports for disks, reducing expandability.
Hadoop is great, but flash-accelerated Hadoop is best. It’s an effective way, as you work to extract full value from big data, to secure a competitive edge.