I started working years ago to engage large datacenters, learn what their problems are and try to craft solutions for their problems. It’s taken years, but we engaged them, learned, changed how we thought about storage and began creating solutions that are being deployed at scale.
We’ve started to do the same with the Chinese Internet giants. They’re growing at an incredible rate. They have similar problems, but it’s surprising how different their solution approaches are. Each one is unique. And we’re constantly learning from these guys.
So to wrap up the blog series on my interview with CEO & CIO magazine, here are the last two questions and my answers, which explain a bit more.
CEO & CIO: Please use examples to tell the stories about the forward-looking technologies and architectures that LSI has jointly developed with Internet giants.
While our host bus adapters (HBAs) and MegaRAID® solutions have been part of the hyperscale Internet companies’ infrastructure since the beginning, we have only recently worked very closely with them to drive joint innovation. In 2009 I led the first LSI engagement with what we then called “mega datacenters.” It took a while to understand what they were doing and why. By 2010 we realized there were specialized needs, and began to imagine new hardware products that worked with these datacenters. Out of this work came the realization that flash was important for efficiency and capability, and the “invention” of the LSI® Nytro™ product portfolio (with more in the pipeline). We have worked closely with hyperscale datacenters to evolve and tune these solutions, to the point where Nytro products have become the backbone of their main revenue platforms. Facebook has been a vitally important partner in evolving our Nytro platform – teaching us what was truly needed – and now much of its infrastructure runs on LSI products. These same products are a good fit for other hyperscale customers, and we are steadily winning many of the large ones.
Looking forward, we are partnered with several Internet giants in the U.S. and China to work on cold storage solutions and, more importantly, shared DAS (distributed DAS: D-DAS) solutions. We have been demonstrating prototypes. These solutions enable pooled architectures and rack scale architecture, and can be made to work tightly with software-defined datacenters (SDDCs). They simplify management and resource allocation – making task deployment more efficient and easier. Shared DAS solutions increase infrastructure efficiency and improve lifecycle management of components. And they have the potential to radically improve application performance and infrastructure costs.
Looking further into the future, we see even more radical changes in silicon supporting transport protocols and storage models, and in rack scale architectures supporting storage and pooled memory. And cold storage is a huge, though some would say boring, problem that we are also focused on – storing lots of data for almost nothing and using no power to do it… but I really can’t talk about any of that yet.
CEO & CIO: LSI maintains good contact with big Internet companies in China. What are the biggest differences between dealing with these Internet enterprises and dealing with traditional partners?
Yes, we have a very good relationship with large Chinese Internet companies. In fact, I will be visiting Tencent, Alibaba and Baidu in a few weeks. I’d like to think of one of the CTOs as a friend; that is, we have fun talking together about the future.
These meetings have evolved. The first meetings LSI had about two years ago were sales calls, or support for OEM storage solutions. These accomplished very little. Once we began visiting as architects speaking to architects, real dialogs began. Our CEO has been spending time in China meeting with these Internet companies both to learn, and to make it clear that they are important to us, and we want a chance to solve their problems. But the most interesting conversations have been the architectural ones. There have been very clear changes in the two years I have traveled within China – from standard enterprise to hyperscale architectures.
We’ve received fascinating feedback on architecture, use, application profiles, platforms, problems and goals. We have strong engagement with the U.S. Internet giants. At the highest level, the Chinese Internet companies have similar problems and goals. But the details quickly diverge because of revenue per user, resources, power availability, datacenter ownership and Internet company age. The use of flash is very different.
The Chinese Internet giants are at an amazing change point. Most are ready for explosive growth of infrastructure and deployment of cloud services. Most are changing from standard OEM systems and architectures to self-designed hyperscale systems after experimenting with Scorpio and microserver deployments. Several, like JD.com (an Amazon-like company), are moving from hosted to self-built infrastructure. And there seems to be a general realization that the datacenter has changed from a compute-centric model to a dataflow model, where storage and network dictate how much work gets done more than the CPU does. These giants are leveraging their experience and capability to move very quickly, and in a few cases are working to create true pooled rack-level architectures much like Facebook and Google have started in the U.S. In fact, Baidu is similar to Facebook in this approach, but differs in its longer-term goals for the architecture.
The Chinese companies are amazingly diverse, even within one datacenter, and arguments on architectural direction are raging within these Internet giants – it’s healthy and exciting. However, the innovations that are coming are similar to those developed by large U.S. Internet companies. Personally I have found these Internet companies much more exciting and satisfying to work with than traditional OEMs. The speed and cadence of advancement, the recognition of problems and their importance, the focus on efficiency and optimization have been much more exciting. And the youthful mentality and view to problems, without being burdened by “the way we’ve always done this” has been wonderful.
Also see these blogs of mine over the past year, where you can read more about some of these changes:
“Postcard from Shenzhen: China’s hyperscale datacenter growth, mixed with a more traditional approach”
“China in the clouds, again”
“China: A lot of talk about resource pooling, a better name for disaggregation”
Or see them (and others) all here.
Summary: So it’s taken years, but we engaged U.S. Internet giants, learned about their problems, changed how we thought about storage and began creating solutions that are now being deployed at scale. And we’re constantly learning from these guys. Constantly, because their problems are constantly changing.
We’ve now started to do the same with the Chinese Internet giants. They have similar problems, and will need similar solutions, but they are not the same. And just like the U.S. Internet giants, each one is unique.
Tags: Alibaba, Amazon, Baidu, CEO & CIO Magazine, China, cloud services, cold storage, D-DAS, DAS, datacenter, datacenter ecosystem, direct attached storage, distributed DAS, Facebook, flash, flash storage, Google, HBA, host bus adapter, hyperscale datacenter, Internet, JD.com, MegaRAID, OEM, original equipment manufacturer, Scorpio, Tencent
Pushing your enterprise cluster solution to deliver the highest performance at the lowest cost is key in architecting scale-out datacenters. Administrators must expand their storage to keep pace with their compute power as capacity and processing demands grow.
Beyond price and capacity, storage resources must also deliver enough bandwidth to support these growing demands. Without enough I/O bandwidth, connected servers and users can bottleneck, requiring sophisticated storage tuning to maintain reasonable performance. By using direct attached storage (DAS) server architectures, IT administrators can reduce the complexities and performance latencies associated with storage area networks (SANs). Now, with LSI 12Gb/s SAS or MegaRAID® technology, or both, connected to 12Gb/s SAS expander-based storage enclosures, administrators can leverage DataBolt™ technology to clear I/O bandwidth bottlenecks. The result: better overall resource utilization, while preserving legacy drive investments. Typically a slower end device would step the entire 12Gb/s SAS storage subsystem down to 6Gb/s SAS speeds. How does DataBolt technology overcome this? Well, without diving too deep into the nuts and bolts, intelligence in the expander buffers data and then transfers it out to the drives at 6Gb/s speeds in order to match the bandwidth between faster hosts and slower SAS or SATA devices.
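To make the bandwidth-matching idea concrete, here is a rough back-of-envelope model. This is purely illustrative arithmetic, not LSI firmware behavior; the link and drive speeds are the nominal SAS rates, and the all-or-nothing step-down in the unbuffered case is a simplifying assumption.

```python
# Illustrative sketch: why expander-side buffering keeps a 12Gb/s SAS
# host link busy even when the attached drives only run at 6Gb/s.
HOST_LINK_GBPS = 12.0   # host-to-expander SAS link speed
DRIVE_GBPS = 6.0        # slower SAS/SATA end devices

def subsystem_bandwidth(num_drives: int, buffered: bool) -> float:
    """Aggregate bandwidth seen by the host, in Gb/s.

    Without buffering, a slow end device steps the whole subsystem
    down to 6Gb/s. With DataBolt-style buffering, the expander buffers
    data and feeds each drive at 6Gb/s while the host link keeps
    running at 12Gb/s, so enough drives can saturate the host link.
    """
    if not buffered:
        return DRIVE_GBPS
    return min(HOST_LINK_GBPS, num_drives * DRIVE_GBPS)

print(subsystem_bandwidth(4, buffered=False))  # stepped down: 6.0
print(subsystem_bandwidth(4, buffered=True))   # host link saturated: 12.0
```

The toy model captures the headline effect: past two drives, the buffered configuration doubles the usable host bandwidth.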
So for this demonstration at AIS, we are showcasing two Hadoop Distributed File System (HDFS) servers. Each server houses the newly shipping MegaRAID 9361-8i 12Gb/s SAS RAID controller connected to a drive enclosure featuring a 12Gb/s SAS expander and 32 6Gb/s SAS hard drives. One has a DataBolt-enabled configuration, while the other is disabled.
For the benchmarks, we ran DFSIO, which simulates MapReduce workloads and is typically used to detect performance network bottlenecks and tune hardware configurations as well as overall I/O performance.
The primary goal of the DFSIO benchmarks is to saturate storage arrays with random read workloads in order to ensure maximum performance of a cluster configuration. Our tests resulted in MapReduce Jobs completing faster in 12Gb/s mode, and overall throughput increased by 25%.
With the much anticipated launch of 12Gb/s SAS MegaRAID and 12Gb/s SAS expanders featuring DataBolt™ SAS bandwidth-aggregation technologies, LSI is taking the bold step of moving beyond traditional I/O performance benchmarks like IOMeter to benchmarks that simulate real-world workloads.
In order to fully illustrate the actual benefit many IT administrators can realize using 12Gb/s SAS MegaRAID products on their new server platforms, LSI is demonstrating application benchmarks on top of actual enterprise applications at AIS.
For our end-to-end 12Gb/s SAS MegaRAID demonstration, we chose Benchmark Factory® for Databases running on a MySQL database. Benchmark Factory, a database performance testing tool that allows you to conduct database workload replay, industry-standard benchmark testing and scalability testing, uses real database application workloads such as TPC-C, TPC-E and TPC-H. We chose the TPC-H benchmark, a decision-support benchmark, because of its large streaming query profile. TPC-H shows the performance of decision support systems – which examine large volumes of data to simplify the analysis of information for business decisions – making it an excellent benchmark to showcase 12Gb/s SAS MegaRAID technology and its ability to maximize the bandwidth of PCIe 3.0 platforms, compared to 6Gb/s SAS.
The demo uses the latest Intel R2216GZ4GC servers based on the new Intel® Xeon® processor E5-2600 v2 product family to illustrate how 12Gb/s SAS MegaRAID solutions are needed to take advantage of the bandwidth capabilities of PCIe® 3.0 bus technology.
When the benchmarks are run side-by-side on the two platforms, you can quickly see how much faster data transfers complete, and how much more efficiently the Intel servers handle data traffic. We used Quest Software’s Spotlight® on MySQL tool to monitor and measure data traffic from end storage devices to the clients running the database queries. More importantly, the test also illustrates how many more user queries the 12Gb/s SAS-based system can handle before complete resource saturation – 60 percent more than 6Gb/s SAS in this demonstration.
What does this mean to IT administrators? Clearly, resource utilization is much higher, improving total cost of ownership (TCO) for their database servers. Or, conversely, 12Gb/s SAS can reduce cost per I/O, since fewer drives are needed per server to deliver the same performance as the previous 6Gb/s SAS generation of storage infrastructure.
I remember in the mid-1990s the question of how many minutes away from a diversion airport a two-engine passenger jet should be allowed to fly in the event of an engine failure. Staying in the air long enough is one of those high-availability functions that really matters. The Boeing 777 was the first aircraft to enter service with a 180-minute extended operations (ETOPS) certification. This meant that longer over-water and remote-terrain routes were immediately possible.
The question was “can a two-engine passenger aircraft be as safe as a four-engine aircraft for long-haul flights?” The short answer is yes. Reducing the points of failure from four engines to two, while meeting strict maintenance requirements and maintaining redundant systems, reduces the probability of a failure. The 777 and many other aircraft have proven to be safe for these longer flights. Recently, the 777 received FAA approval for a 330-minute ETOPS rating, which allows airlines to offer routes that are longer, straighter and more economical.
What does this have to do with a datacenter? It turns out that some hyperscale datacenters house hundreds of thousands of servers, each with its own boot drive. Each of these boot drives is a potential point of failure, which can drive up acquisition and operating costs and the odds of a breakdown. Datacenter managers need to control CapEx, so for the sheer volume of server boot drives they commonly use the lowest cost 2.5-inch notebook SATA hard drives. The problem is that these commodity hard drives tend to fail more often. This is not a huge issue with only a few servers. But in a datacenter with 200,000 servers, LSI has found through internal research that, on average, 40 to 200 drives fail per week! (2.5″ hard drive, ~2.5 to 4-year lifespan, which equates to a conservative 5% failure rate/year).
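The weekly failure figure follows directly from the numbers in the post. A quick back-of-envelope check, using the post’s assumed 200,000 servers (one boot drive each) and a conservative 5% annual failure rate:

```python
# Back-of-envelope check of the drive-failure math above.
# Values are the post's stated assumptions, not measured data.
servers = 200_000               # boot drives in the datacenter, one per server
annual_failure_rate = 0.05      # conservative 5% of drives fail per year
failures_per_week = servers * annual_failure_rate / 52

print(round(failures_per_week))  # about 190 drives per week
```

That lands near the top of the 40-to-200 range; the lower bound corresponds to longer drive lifespans than the conservative case.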
Traditionally, a hyperscale datacenter has a sea of racks filled with servers. LSI approximates that, in the majority of large datacenters, at least 60% of the servers (Web servers, database servers, etc.) use a boot drive requiring no more than 40GB of storage capacity since it performs only boot-up and journaling or logging. For higher reliability, the key is to consolidate these low-capacity drives, virtually speaking. With our Syncro™ MX-B Rack Boot Appliance, we can consolidate the boot drives for 24 or 48 of these servers into a single mirrored array (using LSI MegaRAID technology), which makes 40GB of virtual disk space available to each server.
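The consolidation arithmetic is worth spelling out. This sketch uses the sizes given in the post (48 servers, 40GB of virtual disk each); the point is how few physical mirrored drives are needed to replace a rack’s worth of boot drives:

```python
# Sketch of the boot-drive consolidation math for the 48-port appliance.
# Sizes come from the post; this is illustrative, not a product spec.
servers_per_appliance = 48       # servers served by one Syncro MX-B (48-port)
virtual_boot_gb = 40             # virtual boot volume exposed to each server

total_virtual_gb = servers_per_appliance * virtual_boot_gb
boot_drives_replaced = servers_per_appliance   # one local boot drive each

print(total_virtual_gb)          # 1920 GB of virtual boot space in total
print(boot_drives_replaced)      # 48 commodity boot drives consolidated
```

So 48 failure-prone notebook-class drives collapse into one mirrored array, which is where the reliability and power savings in the next paragraphs come from.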
Combining all these boot drives with fewer larger drives that are mirrored helps reduce total cost of ownership (TCO) and improves reliability, availability and serviceability. If a rack boot appliance drive fails, an alert is sent to the IT operator. The operator then simply replaces the failed drive, and the appliance automatically copies the disk image from the working drive. The upshot is that operations are simplified, OpEx is reduced, and there is usually no downtime.
Syncro MX-B not only improves reliability by reducing failure points; it also significantly reduces power requirements (up to 40% less in the 24-port version, up to 60% less in the 48-port version) – a good thing for the corporate utility bill and climate change. This, in turn, reduces cooling requirements, and helps make hardware upgrades less costly. With the boot drives disaggregated from the servers, there’s no need to simultaneously upgrade the drives, which typically are still functional during server hardware upgrades.
In the case of both commercial aircraft and servers, less really can be more (or at least better) in some situations. Eliminating excess can make the whole system simpler and more efficient.
To learn more, please visit the LSI® Shared Storage Solutions web page: http://www.lsi.com/solutions/Pages/SharedStorage.aspx
One of the big challenges I see so many IT managers struggling with is how to deal with the nearly exponential growth of data that must be stored, accessed and protected, on IT budgets that are flat or growing far more slowly than storage volumes.
I’ve found that it doesn’t seem to matter if it is a departmental or small business datacenter, or a hyperscale datacenter with many thousands of servers. The data growth continues to outpace the budgets.
At LSI we call this disparity between IT budget growth and data growth the “data deluge gap.”
Of course, smaller datacenters have different issues than the hyperscale datacenters. However, no matter the datacenter size, concerns generally center on TCO. This, of course, includes both CapEx and OpEx for the storage systems.
It’s a good feeling to know that we are tackling these datacenter growth and operations issues head-on for many different environments – large and small.
LSI has developed and is starting to provide a new shared DAS (sDAS) architecture that supports the sharing of storage across multiple servers. We call it the LSI® Syncro™ architecture, and it really is the next step in the evolution of DAS. Our Syncro solutions deliver increased uptime, help reduce overall costs, increase agility, and are designed for ease of deployment. The fact that the Syncro architecture is built on our proven MegaRAID® technology means that our customers can trust that it will work in all types of environments.
Syncro architecture is a very exciting new capability that addresses storage and data protection needs for numerous datacenter environments. Our first product, Syncro MX-B, is targeted at hyperscale datacenter environments including Web 2.0 and cloud. I will be blogging about that offering in the near future. We will soon be announcing details on our Syncro CS product line, previously known as High Availability DAS, for small and medium businesses and I will blog about what it can mean for our customers and users.
Both of these initial versions of the Syncro architecture are very exciting, and I really like to watch how datacenter managers react when they find out about these game-changing capabilities.
We say that “with the LSI Syncro architecture you take DAS out of the box and make it sharable and scalable. The LSI Syncro architecture helps make your storage Simple. Smart. On.” Our tag line for Syncro is “The Smarter Way to ON.™” It really is.
To learn more, please visit the LSI Shared Storage Solutions web page: http://www.lsi.com/solutions/Pages/SharedStorage.aspx