Big data and Hadoop are all about exploiting new value and opportunities with data. In financial trading, business and some areas of science, it’s all about being fastest or first to take advantage of the data. The bigger the data sets, the smarter the analytics. The next competitive edge with big data comes when you layer in flash acceleration. The challenge is scaling performance in Hadoop clusters.
The most cost-effective option emerging for breaking through disk-to-I/O bottlenecks to scale performance is to use high-performance read/write flash cache acceleration cards for caching. This is essentially a way to get more work for less cost, by bringing data closer to the processing. The LSI® Nytro™ product has been shown during testing to improve the time it takes to complete Hadoop software framework jobs up to a 33%.
Flash cache cards increase Hadoop application performance
Combining flash cache acceleration cards with Hadoop software is a big opportunity for end users and suppliers. LSI estimates that less than 10% of Hadoop software installations today incorporate flash acceleration1. This will grow rapidly as companies see the increased productivity and ROI of flash to accelerate their systems. And Hadoop software adoption is also growing fast. IDC predicts a CAGR of as much as 60% by 20162. Drivers include IT security, e-commerce, fraud detection and mobile data user management. Gartner predicts that Hadoop software will be in two-thirds of advanced analytics products by 20153. Many thousands of Hadoop software clusters are already deployed.
Where flash makes the most immediate sense is with those who have smaller clusters doing lots of in-place batch processing. Hadoop is purpose-built for analyzing a variety of data, whether structured, semi-structured or unstructured, without the need to define a schema or otherwise anticipate results in advance. Hadoop enables scaling that allows an unprecedented volume of data to be analyzed quickly and cost-effectively on clusters of commodity servers. Speed gains are about data proximity. This is why flash cache acceleration typically delivers the highest performance gains when the card is placed directly in the server on the PCI Express® (PCIe) bus.
Combining the best of flash and HDDs to drive higher performance and storage capacity
PCIe flash cache cards are now available with multiple terabytes of NAND flash storage, which substantially increases the hit rate. We offer a solution with both onboard flash modules and Serial-Attached SCSI (SAS) interfaces to enable high-performance direct-attached storage (DAS) configurations consisting of solid state and hard disk drive storage. This couples the low-latency performance benefits of flash with the capacity and cost-per-gigabyte advantages of HDDs.
To keep the processor close to the data, Hadoop uses servers with DAS. And to get the data even closer to the processor, the servers are usually equipped with significant amounts of random access memory (RAM). An additional benefit: Smart implementation of Hadoop and flash components can reduce the overall server footprint and simplify scaling, with some solutions enabling up to 128 devices to share a very high bandwidth interface. Most commodity servers provide 8 or less SATA ports for disks, reducing expandability.
Hadoop is great, but flash-accelerated Hadoop is best. It’s an effective way, as you work to extract full value from big data, to secure a competitive edge.