Pushing your enterprise cluster solution to deliver the highest performance at the lowest cost is key in architecting scale-out datacenters. Administrators must expand their storage to keep pace with their compute power as capacity and processing demands grow.
Beyond price and capacity, storage resources must also deliver enough bandwidth to support these growing demands. Without enough I/O bandwidth, connected servers and users can bottleneck, requiring sophisticated storage tuning to maintain reasonable performance. By using direct attached storage (DAS) server architectures, IT administrators can reduce the complexities and performance latencies associated with storage area networks (SANs). Now, with LSI 12Gb/s SAS or MegaRAID® technology, or both, connected to 12Gb/s SAS expander-based storage enclosures, administrators can leverage DataBolt™ technology to clear I/O bandwidth bottlenecks. The result: better overall resource utilization, while preserving legacy drive investments. Typically, a slower end device would step the entire 12Gb/s SAS storage subsystem down to 6Gb/s SAS speeds. How does DataBolt technology overcome this? Without diving too deep into the nuts and bolts, intelligence in the expander buffers data and transfers it out to the drives at 6Gb/s speeds, matching the bandwidth between faster hosts and slower SAS or SATA devices.
So for this demonstration at AIS, we are showcasing two Hadoop Distributed File System (HDFS) servers. Each server houses the newly shipping MegaRAID 9361-8i 12Gb/s SAS RAID controller connected to a drive enclosure featuring a 12Gb/s SAS expander and 32 6Gb/s SAS hard drives. One server has DataBolt enabled; the other has it disabled.
For the benchmarks, we ran DFSIO, which simulates MapReduce workloads and is typically used to detect network performance bottlenecks, tune hardware configurations and measure overall I/O performance.
The primary goal of the DFSIO benchmarks is to saturate the storage arrays with random read workloads in order to ensure maximum performance of a cluster configuration. In our tests, MapReduce jobs completed faster in 12Gb/s mode, and overall throughput increased by 25%.
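If you want to try a similar measurement on your own cluster, here is a minimal sketch of driving the stock TestDFSIO tool from Python. The jar path, file count and file size are assumptions for illustration, and the exact flag names vary slightly between Hadoop versions.

```python
import subprocess

# Assumed install path; adjust for your Hadoop distribution and version.
JOBCLIENT_TESTS_JAR = (
    "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar"
)

def run_dfsio(mode: str, n_files: int = 32, file_size_mb: int = 1024) -> None:
    """Launch a TestDFSIO pass ('write' lays down files, 'read' measures them)."""
    subprocess.run(
        ["hadoop", "jar", JOBCLIENT_TESTS_JAR, "TestDFSIO",
         f"-{mode}",
         "-nrFiles", str(n_files),         # one map task per file
         "-fileSize", str(file_size_mb)],  # MB per file (newer builds use -size)
        check=True,
    )

if __name__ == "__main__":
    run_dfsio("write")  # populate HDFS with the test files first
    run_dfsio("read")   # then measure aggregate read throughput
```

The read pass is the one that stresses the storage subsystem the way our demonstration does; the write pass just stages the data.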
Lenovo is whopping big: the planet's second largest PC maker, the sixth largest server vendor and China's top server supplier.
So when a big gun like Lenovo recognizes us with its Technology Innovation award for our 12G SAS technology, we love to talk about it. The lofty honor came at the recent Lenovo Supplier Conference in Hefei, China.
Hefei is big too. As recently as the mid-1930s, Hefei was a quiet market town of only about 30,000. Today, it's home to more than 7 million people spread across 4,300 square miles. No matter how you cut it, that's explosive growth – and no less dizzying than the global seam-splitting growth that Lenovo is helping companies worldwide manage with its leading servers.
For more than a decade, LSI has been the strategic SAS/RAID partner for Lenovo, and in 2009 Lenovo chose LSI as its exclusive SAS/RAID vendor. The reason: our ability to provide enterprise-class, industry-leading SAS/RAID solutions. Lenovo says it better.
"In 2012, Lenovo began to sharpen its focus on the enterprise server business with the goal of becoming a tier-1 server vendor in the global market," said Jack Xing, senior sales manager in China. "To support this strategy, the company realized the importance of selecting a trusted and innovative SAS/RAID partner, which is why it has turned to LSI exclusively for its 12G SAS technology."
Trust. Innovation. High compliments from Lenovo, a major engine of technology innovation in one of the world's fastest-growing economies. It's dizzying, even heady. You can see why we love to talk about it.
Big data and Hadoop are all about exploiting new value and opportunities with data. In financial trading, business and some areas of science, it's all about being fastest or first to take advantage of the data. The bigger the data sets, the smarter the analytics. The next competitive edge with big data comes when you layer in flash acceleration. The challenge is scaling performance in Hadoop clusters.
The most cost-effective option emerging for breaking through disk-to-I/O bottlenecks to scale performance is to use high-performance read/write flash cache acceleration cards for caching. This is essentially a way to get more work for less cost by bringing data closer to the processing. The LSI® Nytro™ product has been shown during testing to improve the time it takes to complete Hadoop software framework jobs by up to 33%.
Combining flash cache acceleration cards with Hadoop software is a big opportunity for end users and suppliers. LSI estimates that less than 10% of Hadoop software installations today incorporate flash acceleration [1]. This will grow rapidly as companies see the increased productivity and ROI of flash to accelerate their systems. And use of Hadoop software is also growing fast. IDC predicts a CAGR of as much as 60% by 2016 [2]. Drivers include IT security, e-commerce, fraud detection and mobile data user management. Gartner predicts that Hadoop software will be in two-thirds of advanced analytics products by 2015 [3]. There are many thousands of Hadoop software clusters already deployed.
Where flash makes the most immediate sense is with those who have smaller clusters doing lots of in-place batch processing. Hadoop is purpose-built for analyzing a variety of data, whether structured, semi-structured or unstructured, without the need to define a schema or otherwise anticipate results in advance. Hadoop enables scaling that allows an unprecedented volume of data to be analyzed quickly and cost-effectively on clusters of commodity servers. Speed gains are about data proximity. This is why flash cache acceleration typically delivers the highest performance gains when the card is placed directly in the server on the PCI Express® (PCIe) bus.
PCIe flash cache cards are now available with multiple terabytes of NAND flash storage, which substantially increases the hit rate. We offer a solution with both onboard flash modules and Serial-Attached SCSI (SAS) interfaces to create high-performance direct-attached storage (DAS) configurations consisting of solid state and hard disk drive storage. This couples the low latency performance benefits of flash with the capacity and cost per gigabyte advantages of HDDs.
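To see why the hit rate matters so much, consider this back-of-envelope sketch. The latency figures are illustrative assumptions (roughly 0.1 ms for a PCIe flash hit, 10 ms for an HDD access), not measured values:

```python
def avg_read_latency_ms(hit_rate: float,
                        flash_ms: float = 0.1,    # assumed PCIe flash latency
                        hdd_ms: float = 10.0      # assumed HDD seek + read
                        ) -> float:
    """Expected read latency for a flash-cached HDD tier."""
    return hit_rate * flash_ms + (1.0 - hit_rate) * hdd_ms

# A bigger flash card raises the hit rate, and the hit rate dominates:
for rate in (0.50, 0.80, 0.95):
    print(f"hit rate {rate:.0%}: {avg_read_latency_ms(rate):.2f} ms average")
```

Under these assumptions, moving from an 80% to a 95% hit rate cuts the average read latency from about 2.1 ms to about 0.6 ms, which is why multi-terabyte cache cards pay off.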
To keep the processor close to the data, Hadoop uses servers with DAS. And to get the data even closer to the processor, the servers are usually equipped with significant amounts of random access memory (RAM). An additional benefit: smart implementation of Hadoop and flash components can reduce the overall server footprint required. Scaling is simplified, with some solutions able to support up to 128 devices sharing a very high-bandwidth interface. Most commodity servers provide eight or fewer SATA ports for disks, limiting expandability.
Hadoop is great, but flash-accelerated Hadoop is better. It's an effective way to secure a competitive edge as you work to extract full value from big data.
There's no need to wait for higher speed: server builders can take advantage of 12Gb/s SAS now, even as HDD and SSD makers continue to tweak, tune and otherwise prepare their 12Gb/s SAS products for market. The next generation of SAS without supporting drives? What gives?
It's simple. LSI is already producing 12Gb/s ROC and IOC solutions, meaning that customers can take advantage of 12Gb/s SAS performance today with currently shipping systems and storage. As for the numbers, LSI 12Gb/s SAS enables performance increases of up to 45% in throughput and up to 58% in IOPS when compared to 6Gb/s SAS.
True, 12Gb/s SAS isn't a Big Bang Disruption in storage systems; rather, it's an evolutionary change, but a big step forward. It may not be clear why it matters so much, so I want to briefly explain. In latest-generation PCIe 3 systems, 6Gb/s SAS is the bottleneck that prevents systems from achieving full PCIe 3 throughput of 6,400 MB/s.
With 12Gb/s SAS, customers will be able to take full advantage of the performance of PCIe 3 systems. Earlier this month at the CeBIT computer expo in Hanover, Germany, we announced that we are the first to ship production-level 12Gb/s SAS ROC (RAID-on-Chip) and IOC (I/O controller) solutions to OEM customers. This convergence of new technologies and the expansion of existing capabilities create significant improvements for datacenters of all kinds.
At CeBIT, we demonstrated our 12Gb/s SAS solutions with the unique DataBolt™ feature and showed how, with DataBolt, systems with 6Gb/s SAS HDDs can achieve 12Gb/s SAS performance.
DataBolt uses bandwidth aggregation to accelerate throughput. Most importantly, customers don't have to wait for the next inflection in drive design to get the highest possible performance and connectivity.
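As a rough illustration of what bandwidth aggregation buys you (this is simple line-rate arithmetic, not a model of the actual DataBolt implementation): a buffering 12Gb/s expander can interleave transfers from multiple 6Gb/s drives so the host-side link stays busy.

```python
HOST_LINK_GBPS = 12.0  # expander-to-controller link speed
DRIVE_GBPS = 6.0       # legacy 6Gb/s SAS or SATA device speed

def host_link_utilization(active_drives: int) -> float:
    """Fraction of the 12Gb/s host link that buffered drive traffic can fill."""
    offered = active_drives * DRIVE_GBPS
    return min(offered, HOST_LINK_GBPS) / HOST_LINK_GBPS

print(host_link_utilization(1))  # 0.5 -> one 6Gb/s drive leaves half the link idle
print(host_link_utilization(2))  # 1.0 -> two busy drives saturate the 12Gb/s link
```

Without the buffering, the whole subsystem would train down to 6Gb/s, so even two busy drives could never fill the host link.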
I've spent a lot of time with hyperscale datacenters around the world trying to understand their problems – and I really don't care what area those problems are in, as long as they're important to the datacenter. What is the #1 real problem for many hyperscale datacenters? It's something you've probably never heard about, and probably have not even thought about. It's called false disk failure. Some hyperscale datacenters have crafted their own solutions – but most have not.
Why is this important, you ask? Many large datacenters today have 1 million to 4 million hard disk drives (HDDs) in active operation. In anyone's book that's a lot. It's also a very interesting statistical sample size of HDDs. Hyperscale datacenters get great pricing on HDDs – probably better than OEMs get, and certainly better than the $79 for buying one HDD at your local Fry's store. So you would imagine that if a disk fails, no one cares – they're cheap and easy to replace. But the burden of a failed disk is much more than the raw cost of the disk itself.
Let's put some scale to this problem, and you'll begin to understand the issue. One modest-size hyperscale datacenter has been very generous in sharing its real numbers. (When I say modest, they are ~1/4 to 1/2 the size of many other hyperscale datacenters, but they are still huge – more than 200k servers.) Other hyperscale datacenters I have checked with say – yep, that's about right. And one engineer I know at an HDD manufacturer said – "wow – I expected worse than that. That's pretty good." To be clear – these are very good HDDs they are using; it's just that the numbers add up.
The raw data:
RAIDed SAS HDDs
Non-RAIDed (direct map) SATA drives behind HBAs
What's interesting is the relative failure rate of SAS drives vs. SATA: it's about an order of magnitude worse in SATA drives than SAS. Frankly, some of this is due to protocol differences. SAS allows far more error recovery capabilities, and because SAS drives also tend to be more expensive, I believe manufacturers invest in slightly higher quality electronics and components. I know the electronics we ship into SAS drives are certainly more sophisticated than what goes into SATA drives.
False fail? What? Yeah, that's an interesting topic. It turns out that about 40% of the time with SAS and about 50% of the time with SATA, the drive didn't actually fail. It just lost its marbles for a while. When they pull the drive out and put it into a test jig, everything is just fine. And more interesting, when they put the drive back into service, it is no more statistically likely to fail again than any other drive in the datacenter. Why? No one knows. I have my suspicions, though.
I used to work on engine controllers. That's a very paranoid business. If something goes wrong and someone crashes, you have a lawsuit on your hands. If a controller needs a recall, that's millions of units to replace, with a multi-hundred-dollar module and hundreds of dollars in labor for each one replaced. No one is willing to take that risk. So we designed very carefully to handle soft errors in memory and registers. We incorporated ECC like servers use, background code checksums and scrubbing, and all sorts of proprietary techniques, including watchdogs and super-fast self-resets that could get the controller operational again in less than a full revolution of the engine. Why? The events were statistically rare – the average controller might see one or two in its lifetime, and a turn of the ignition would reset that state. But the events do happen, and so do recalls and lawsuits… HDD controllers don't have these protections, which is reasonable. It would be an inappropriate cost burden for their price point.
You remember the Toyota Prius accelerator problems? I know that controller was not protected for soft errors. And the source of the problem remained a "mystery." Maybe it just lost its marbles for a while? A false fail, if you will. Just sayin'.
Back to HDDs. False fail is especially frustrating, because half the HDDs actually didn't need to be replaced. All the operational costs were paid for no reason. The disk just needed a power-cycle reset. (OK, that introduces all sorts of complex management by the RAID controller or application to handle that 10-second power reset cycle and the application traffic created in that time – but we can handle that.)
Daily, this datacenter has to pull and replace failed drives, redistribute the affected data, and absorb the management interventions and maintenance time that go with each failure. And half of that is for no reason at all.
First – why not rebuild the disk if it's RAIDed? Usually hyperscale datacenters use clustered applications. A traditional RAID rebuild drives the server performance to ~50%, and for a 2TB drive under heavy application load (the definition of a hyperscale datacenter) can truly take up to a week. 50% performance for a week? In a cluster, that means the overall cluster is running at ~50% performance. Say 200 nodes in a cluster – that means you just lost ~100 nodes of work, or 50% of cluster performance. It's much simpler to just take the node with the failed drive offline, get 99.5% cluster performance, and operationally redistribute the workload across multiple nodes (because you have replicated data elsewhere). But after rebuild, the node will have to be re-synced or re-imaged. There are ways to fix all this; we'll talk about them another day. Or you can simply run direct-mapped storage and unmount the failed drive.
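A quick back-of-envelope check of those numbers, assuming (as the scenario above does) that a tightly coupled job paces to its slowest node and that work redistributes evenly when a node drops out:

```python
NODES = 200

# Option 1: rebuild in place. The degraded node runs at ~50%,
# and a tightly coupled cluster job paces to its slowest node.
rebuild_cluster_perf = 0.50

# Option 2: take the failed-drive node offline and redistribute its work.
offline_cluster_perf = (NODES - 1) / NODES  # 199/200 = 99.5%

print(f"rebuild in place: {rebuild_cluster_perf:.1%} of cluster throughput")
print(f"node offline:     {offline_cluster_perf:.1%} of cluster throughput")
```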
Next – why replicate data over the network, and why is that a big deal? For geographic redundancy (say, a natural disaster at one facility) and regional locality, hyperscale datacenters need multiple data copies – often 3 copies so they can do double duty as high-availability copies, or in the case of some erasure coding, 2.2 to 2.5 copies (yeah – weird math – how do you have 0.5 of a copy…). When you lose one copy, you are down to 2, possibly 1. You need to get back to a reliable number again. Fast. Customers are loyal because of your perfect data retention. So you need to replicate that data and redistribute it across the datacenter on multiple servers. That's network traffic, and possibly congestion, which affects other aspects of the operations of the datacenter. In this datacenter it's about 50 hours of 10G Ethernet traffic every day.
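To put those 50 hours in perspective, here is the line-rate arithmetic (it ignores protocol overhead, so the real payload is somewhat less):

```python
HOURS_PER_DAY = 50  # the datacenter's reported daily replication traffic
LINK_GBPS = 10      # 10G Ethernet line rate

gigabits = HOURS_PER_DAY * 3600 * LINK_GBPS  # 1.8 million gigabits
terabytes = gigabits / 8 / 1000              # Gb -> GB -> TB

print(f"~{terabytes:,.0f} TB of replica data moved per day")  # ~225 TB
```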
To be fair, there is a new standard in SAS interfaces that will facilitate resetting a disk in situ. And there is the start of discussion of the same around SATA – but that's more problematic. Whatever the case, it will be years before the ecosystem is in place to handle the problems this way.
What's that mean to you?
Well, you can expect something like 1/100 of your drives to really fail this year. And you can expect another 1/100 of your drives to "fail" this year without actually being failed. You'll still pay all the operational overhead for drives that never actually failed – rebuilds, disk replacements, management interventions, scheduled downtime/maintenance time, and the OEM replacement price for each drive – what, $600 or so?… Depending on your size, that's either a don't-care or a big deal. There are ways to handle this, and they're not expensive – much less than the disk carrier you already pay for to allow you to replace that drive – and it can be handled transparently, with just a log entry and no performance hiccups. You just need to convince your OEM to carry the solution.
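Putting those figures into a quick sketch shows how the false-fail tax scales with fleet size. The ~1% annual false-fail rate and the roughly $600 OEM replacement price are the numbers quoted above; the fleet sizes are arbitrary examples:

```python
def false_fail_cost_usd(drive_count: int,
                        false_fail_rate: float = 0.01,  # ~1/100 drives per year
                        replacement_usd: float = 600.0  # OEM replacement price
                        ) -> float:
    """Annual cost of replacing drives that never actually failed."""
    return drive_count * false_fail_rate * replacement_usd

for fleet in (1_000, 100_000, 1_000_000):
    print(f"{fleet:>9,} drives: ${false_fail_cost_usd(fleet):>12,.0f} per year")
```

And that is the replacement price alone; the rebuilds, interventions and maintenance windows come on top of it.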