Software-defined datacenters (SDDC) and software-defined storage (SDS) are big movements in the industry right now. Just read the trade press or attend any conference and you’ll see that – it’s a big deal. We’re seeing for-pay vendors providing solutions, as well as strong ecosystems evolving around open source solutions. It’s not surprising why – there is a need for enterprises to deploy large scale compute clusters, and that takes either deep expertise that’s very rare, or orchestration tools that have not existed in the past. It’s the “necessity being the mother of invention” thing…
So datacenters are being forced to deploy large-scale clusters to handle the scale of compute needed, and the amount of data that is being captured, analyzed and stored. As an industry then, we’re being forced to simplify applications as well as the management and deployment of these large scale clusters. That’s great for datacenters. It’s even better that we’re figuring out how to provide those expanded resources and manage them for less money, and with fewer people to manage them. (well, it’s probably good for everyone but the sys admins…)
These new technologies are the key enabler. This blog, the second in my three-part series (based on interesting questions I was asked by CEO & CIO, a Chinese business magazine) examines how SDDC and SDS are helping enterprises get more out of their datacenter gear. You can read part 1 here.
CEO & CIO: What are your views on software-defined storage? What’s the development roadmap of LSI in achieving software-defined storage?
We see SDS as one of a number of vital changes underway in the datacenter. SDS promises to span some or all of file, object, key-value and block in order to pool resources and to simplify the infrastructure required in a datacenter, as well as to smooth the migration to object or key-value storage over time. Great examples of these SDS solutions are: Ceph, Swift, Cinder, Gluster, VSAN / VVOLs …. The model brings great benefits in datacenter management, resource pooling and allocation and usability. The main problem is performance – and by that I do not mean extreme performance. I mean poor performance that damages TCO, reduces efficiency of infrastructure and increases costs. Much worse than you would get otherwise. These solutions work, but compromise resource efficiency. Many require flash integrated in the system to simply maintain existing performance. However, this is a permanent change in how storage is used and deployed, and it’s a good change.
While block is what underlies most storage and will continue to for some time, the system and application level view is changing. We view SDS as having great synergy with LSI’s architectural direction – shared DAS infrastructure and ability to add “above the block” capability like quality of service (QoS), direct key/value hardware, etc, and bring improved performance and resource efficiency. Together, SDS + LSI innovation = resource pooling and allocation, including flash and cool/cold storage, management and virtual machine (VM) agility, performance and resource efficiency.
As a result, there has been tremendous interest from SDS vendors to work with us, to demonstrate prototype systems, and to make solutions better. We are working with many SDS partners to provide complete solutions. This is not a one-size-fits-all world, so there will be several solutions. Those solutions are not ready yet, but they’re coming, and will probably displace the older file and block storage systems we know and love.
CEO & CIO: Industry giants such as Intel have outlined their visions for software-defined datacenters. Chinese Internet giants have also put forward similar plans. What views does LSI have on software-defined datacenter?
If you view Abhi’s (our CEO) AIS keynote, you’ll see we believe this is a critical part of the future datacenter. But just one critical part. Interestingly, we had Intel present as well during AIS.
SDDC creates a critical control plane for the datacenter. It is the software abstraction model that enables resource pooling. Resource pooling of compute, storage and network, with memory in the near future. It enables the automation and allocation of tasks and resources in the datacenter. The leading models are VMware® SDDC and OpenStack® software, but there are others that are important too. They’re just a little less public right now. Anyway – it’s way too early to predict which will be dominant. Just like SDS, SDDC exchanges simplified control and abstraction for performance and efficiency. As a result, it’s not a very useful concept, at least not at hyperscale levels, without hardware that really, truly supports and enables it. As the datacenter has changed from a compute-centric model to a dataflow model, the storage and network and, soon, memory become very important. They dictate the useful work that can be gotten from the datacenter.
I believe we are, as an industry, at the start of the hardware transition to support these. We are building hardware solutions for storage and network that are being designed into products today. We are working very closely with three of the largest datacenters in the US, and two in China to build not just the SDDC, but the pooled hardware infrastructure that is needed to make it work.
It’s critical to understand that SDDC solutions really work, but often the performance and efficiency is – well – terrible. That’s been the evolution in computer science and computer architecture since the beginning. You raise the abstraction level, which simplifies development and support, but either causes poor performance or requires more hardware capability that is architected to support those abstractions.
As a result, it’s really difficult to talk about SDDCs without a rack-scale architecture to support them. So we are working closely with the key SDDC software solutions/vendors, even the ones I didn’t list, to integrate and optimize the solutions to make the SDDC actually work. We have been working very closely with VMware and the OpenStack community, and we are changing the way the software plane interacts with the pooled resources. Again, there has been so much interest in our shared DAS, incorporating flash in the same architecture and management, and our Axxia® SDN control plane processor for networks.
I talk about rack-scale architectures to support SDS in the second half of this keynote and in my blog “China: A lot of talk about resource pooling, a better name for disaggregation.”
Summary: So I believe SDS is a big movement, it’s a good thing, and it’s here to stay. But… the performance is poor today. Very poor. That’s where we come in, with hardware that enables SDS and not only makes performance acceptable, but helps make it excellent, and improves efficiency and cost too. And SDDC is also a massive movement that will define the future datacenter. But it is intertwined with the rack-level concepts of pooling or disaggregation to make it really compelling. Again – that’s where we come in.
These were good questions that were interesting to answer. I hope it’s interesting to you too. I’ll post some more soon about how the Chinese Internet giants differ from other customers, and about forward-looking technologies.
Tags: AIS, Axxia, CEO & CIO Magazine, Ceph, China, Cinder, cold storage, cool storage, datacenter, direct attached storage, disaggregation, ecosystem, Gluster, hyperscale datacenter, key-value storage, object storage, OpenStack, pooling, QoS, quality of service, rack scale architecture, SDDC, SDS, shared DAS, software-defined datacenter, software-defined storage, Swift, virtual machine, VM, VMware, VSAN, VVOL
I remember in the mid-1990s the question of how many minutes away from a diversion airport a two-engine passenger jet should be allowed to fly in the event of an engine failure. Staying in the air long enough is one of those high-availability functions that really matters. In the case of the Boeing 777, it was the first aircraft to enter service with a 180-minute extended operations certification (ETOPS)1. This meant that longer over-water and remote terrain routes were immediately possible.
The question was “can a two-engine passenger aircraft be as safe as a four engine aircraft for long haul flights?” The short answer is yes. Reducing the points of failure from four engines to two, while meeting strict maintenance requirements and maintaining redundant systems, reduces the probability of a failure. The 777 and many other aircraft have proven to be safe for these longer flights. Recently, the 777 has received FAA approval for a 330-minute ETOPS rating2, which allows airlines to offer routes that are longer, straighter and more economical.
What does this have to do with a datacenter? It turns out that some hyperscale datacenters house hundreds of thousands of servers, each with its own boot drive. Each of these boot drives is a potential point of failure, which can drive up acquisition and operating costs and the odds of a breakdown. Datacenter managers need to control CapEx, so for the sheer volume of server boot drives they commonly use the lowest cost 2.5-inch notebook SATA hard drives. The problem is that these commodity hard drives tend to fail more often. This is not a huge issue with only a few servers. But in a datacenter with 200,000 servers, LSI has found through internal research that, on average, 40 to 200 drives fail per week! (2.5″ hard drive, ~2.5 to 4-year lifespan, which equates to a conservative 5% failure rate/year).
Traditionally, a hyperscale datacenter has a sea of racks filled with servers. LSI approximates that, in the majority of large datacenters, at least 60% of the servers (Web servers, database servers, etc.) use a boot drive requiring no more than 40GB of storage capacity since it performs only boot-up and journaling or logging. For higher reliability, the key is to consolidate these low-capacity drives, virtually speaking. With our Syncro™ MX-B Rack Boot Appliance, we can consolidate the boot drives for 24 or 48 of these servers into a single mirrored array (using LSI MegaRAID technology), which makes 40GB of virtual disk space available to each server.
Combining all these boot drives with fewer larger drives that are mirrored helps reduce total cost of ownership (TCO) and improves reliability, availability and serviceability. If a rack boot appliance drive fails, an alert is sent to the IT operator. The operator then simply replaces the failed drive, and the appliance automatically copies the disk image from the working drive. The upshot is that operations are simplified, OpEx is reduced, and there is usually no downtime.
Syncro MX-B not only improves reliability by reducing failure points; it also significantly reduces power requirements (up to 40% less in the 24-port version, up to 60% less in the 48-port version) – a good thing for the corporate utility bill and climate change. This, in turn, reduces cooling requirements, and helps make hardware upgrades less costly. With the boot drives disaggregated from the servers, there’s no need to simultaneously upgrade the drives, which typically are still functional during server hardware upgrades.
In the case of both commercial aircraft and servers, less really can be more (or at least better) in some situations. Eliminating excess can make the whole system simpler and more efficient.
To learn more, please visit the LSI® Shared Storage Solutions web page: http://www.lsi.com/solutions/Pages/SharedStorage.aspx
One of the big challenges that I see so many IT managers struggling with is how are they supposed to deal with the almost exponential growth of data that has to be stored, accessed, and protected, with IT budgets that are flat or growing at rates lower than the nonstop increases in storage volumes.
I’ve found that it doesn’t seem to matter if it is a departmental or small business datacenter, or a hyperscale datacenter with many thousands of servers. The data growth continues to outpace the budgets.
At LSI we call this disparity between the IT budget and the needs growth the “data deluge gap.”
Of course, smaller datacenters have different issues than the hyperscale datacenters. However, no matter the datacenter size, concerns generally center on TCO. This, of course, includes both CapEx and OpEx for the storage systems.
It’s a good feeling to know that we are tackling these datacenter growth and operations issues head-on for many different environments – large and small.
LSI has developed and is starting to provide a new shared DAS (sDAS) architecture that supports the sharing of storage across multiple servers. We call it the LSI® SyncroTM architecture and it really is the next step in the evolution of DAS. Our Syncro solutions deliver increased uptime, help to reduce overall costs, increase agility, and are designed for ease of deployment. The fact that the Syncro architecture is built on our proven MegaRAID® technology means that our customers can trust that it will work in all types of environments.
Syncro architecture is a very exciting new capability that addresses storage and data protection needs for numerous datacenter environments. Our first product, Syncro MX-B, is targeted at hyperscale datacenter environments including Web 2.0 and cloud. I will be blogging about that offering in the near future. We will soon be announcing details on our Syncro CS product line, previously known as High Availability DAS, for small and medium businesses and I will blog about what it can mean for our customers and users.
Both of these initial versions of the Syncro architecture can be very exciting and I really like to watch how datacenter managers react when they find about these game-changing capabilities.
We say that “with the LSI Syncro architecture you take DAS out of the box and make it sharable and scalable. The LSI Syncro architecture helps make your storage Simple. Smart. On.” Our tag line for Syncro is “The Smarter Way to ON.™” It really is.
To learn more, please visit the LSI Shared Storage Solutions web page: http://www.lsi.com/solutions/Pages/SharedStorage.aspx