Deploying a mix of datacenter resources in a preconfigured server – compute, storage, network, and memory – in a way that they are fixed, can’t be tuned to a use case, and must be replaced entirely for an upgrade is how the IT industry has been working for years. Each server is an island.
This is an inefficient path when you deploy more than a few servers. That’s why there is an architectural movement in hyperscale datacenters (and it’s sure to be emulated by enterprise in a few years) to “disaggregate” – or, the term I prefer, “pool” – these resources. That allows deployments to be “configured” in the field, creating tailored platforms depending on needs. And it enables more efficient life-cycle management of subsystems. Ultimately this enables more work/$, and that’s almost everyone’s goal. One hyperscale CTO I know told me over dinner he views this pooling and allocating as “hardware-based virtualization,” which is sort of true.
In this AIS interview I talk about the concept, the rational, and show how costly forklift upgrades will be behind us once this small movement becomes common practice.
Keeping up with the flood of global data with network acceleration
It’s hardly a secret that the growth of PC, mobile, and intelligent media devices and their related applications worldwide is exploding. It also no secret that they are driving an equivalent increase in global network traffic. The mystery for many datacenter network managers is how to keep up, as the flood of data traffic saturates networks, drags down network performance, and frustrates users with waiting.
Here, LSI’s Troy Bailey discusses how networks will become smarter to deliver the data you need, in the priority you need, when you need it.
I am sitting in the terminal waiting for my flight home from – yes, you guessed it – China. I am definitely racking up frequent flier miles this year.
This trip ended up centering on resource pooling in the datacenter. Sure, you might hear a lot about disaggregation, but the consensus seems to be: that’s the wrong name (unless you happen to make standalone servers). For anyone else, it’s about a much more flexible infrastructure, simplified platforms, better lifecycle management, and higher efficiency. I call it “resource pooling,” which is descriptive, but others simply call it rack scale architecture.
It’s been a long week, but very interesting. I was asked to keynote at the SACC conference (Systems Architect Conference China) in Beijing. It was also a great chance to meet 1-on-1 with the CTOs and chief architects from the big datacenters, and visit for a few hours with other acquaintances. I even had the chance to have dinner with the CEO /CIO China Magazine editor in chief, and CIO’s from around Beijing. As always in life, if you’re willing to listen, you can learn a lot. And I did.
Thinking on disaggregation aligns
With CTOs, there was a lot of discussion about disaggregation in the datacenter. There is a lot of aligned thinking on the topic, and it’s one of those occasions where you had to laugh because I think anyone of the CTOs keynoting could have given anyone else’s presentation. So what’s the big deal? Resource pooling and rack scale architecture.
I’ll use this trip as an excuse to dig a little deeper into my view on what this means.
First – you need to understand where these large datacenters are in their evolution. They usually have 4 to 6 platforms and2 or 3 generations of each in the datacenter. That can be 18 different platforms to manage, maintain, and tune. Worse – they have to plan 6 to 9 months in advance to deploy equipment. If you guess wrong, you’ve got a bunch of useless equipment, and you spent a bunch of money – the size of mistake that will get you fired… And even if you get it right, you’re left with the problem – Do I upgrade servers when the CPU is new? Or at, say, 18 months? Or do I wait until the biggest cost item – the drives – need to be replaced in 4 or 5 years? That’s difficult math. So resource pooling is about lifecycle management of different types of components and sub-systems. You can optimally replace each resource on its own schedule.
Increasing resource utilization and efficiency
But it’s also about resource utilization and efficiency. Datacenters have multiple platforms because each platform needs a different configuration of resources. I use the term configuration on purpose. If you have storage in your server, it’s in some standard configuration – say, 6 3 TByte drives or 18 raw TBytes. Do you use all that capacity? Or do you leave some space so databases can grow? Of course you leave empty space. You might not even have any use for that much storage in that particular server – maybe you just use half the capacity. After all, it’s a standard configuration. What about disk bandwidth? Can your Hadoop node saturate 6 drives? Probably. It could probably use 12 or maybe even 24. But sorry – it’s a standard configuration. What about latency-sensitive databases? Sure, I can plug a PCIe card in, but I only have 1.6 TByte PCIe cards as my standard configuration. My database is 1.8 TBytes and growing. Sorry – you have to refactor and put on 2 servers. Or my database is only 1 TByte. I’m wasting 600 GBytes of really expensive resource.
For network resources – the standard configuration gets maybe exactly 1 10GE port. You need more? Can’t have it. You don’t need that much? Sorry – wasted bandwidth capacity. What about standard memory? You either waste DRAM you don’t use, or you starve for more DRAM you can’t get.
But if I have pools of rack scale resources that I can allocate to a standard compute platform – well – that’s a different story. I can configure exactly the amount of network bandwidth, memory, flash high- performance storage, and disk bulk storage. I can even add more configured storage if a database grows, instead of being forced to refactor a database into shards across multiple standard configurations.
Pooling resources = simplified operations
So the desire to pool resources is really as much about simplified operations as anything else. I can have standardized modules that are all “the same” to manage, but can be resource configured into a well-tailored platform that can even change over time.
But pooling is also about accommodating how the application architectures have changed, and how much more important dataflow is than compute for so much of the datacenter. As a result there is a lot of uncertainty about how parts of these rack scale architectures and interconnect will evolve, even as there is a lot of certainty that they will evolve, and they will include pooled resource “modules.” Whatever the overall case, we’re pretty sure we understand how the storage will evolve. And at a high level, that’s what I presented in my keynote. (Hey – I’m not going to publicly share all our magic!)
One storage architecture of pooled resources at the rack scale level. One storage architecture that combines boot management, flash storage for performance, and disk storage for efficient bandwidth and capacity. And those resources can be allocated however and whenever the datacenter manager needs them. And the existing software model doesn’t need to change. Existing apps, OS’s, file systems, and drivers are all supported, meaning a change to pooled resource rack scale deployments is de-risked dramatically. Overall, this one architecture simplifies the number of platforms, simplifies the management of platforms, utilizes the resources very efficiently, and simplifies image and boot management. I’m pretty sure it even reduces datacenter-level CapEx. I know it dramatically reduces OpEx.
Yea – I know what you’re thinking – it’s awesome ! (That’s what you thought – right?)
Oh – what about those CIO meetings? Well, there is tremendous pressure to not buy American IT equipment in China because of all the news from the Snowden NSA leaks. As most of the CIO’s pointed out, though, in today’s global sourcing market, it’s pretty hard to not buy US IT equipment. So they’re feeling a bit trapped. In a no-risk profession, I suspect that means they just won’t buy anything for a year or so and hope it blows over.
But in general, yep, I think this trip was centered on resource pooling in the datacenter. Sure, you might hear about disaggregation, but there’s a lot of agreement that’s the wrong name. It’s much more about resource pooling for flexible infrastructure, simplified platforms, better lifecycle management, and higher efficiency. And we aim to be right in the middle. Literally.
I was lucky enough to get together for dinner and beer with old friends a few weeks ago. Between the 4 of us, we’ve been involved in or responsible for a lot of stuff you use every day, or at least know about.
Supercomputers, minicomputers, PCs, Macs, Newton, smart phones, game consoles, automotive engine controllers and safety systems, secure passport chips, DRAM interfaces, netbooks, and a bunch of processor architectures: Alpha, PowerPC, Sparc, MIPS, StrongARM/XScale, x86 64-bit, and a bunch of other ones you haven’t heard of (um – most of those are mine, like TriCore). Basically if you drive a European car, travel internationally, use the Internet , if you play video games, or use a smart phone, well… you’re welcome.
Why do I tell you this? Well – first I’m name dropping – I’m always stunned I can call these guys friends and be their peers. But more importantly, we’ve all been in this industry as architects for about 30 years. Of course our talk went to what’s going on today. And we all agree that we’ve never seen more changes – inflexions – than the raft unfolding right now. Maybe its pressure from the recession, or maybe un-naturally pent up need for change in the ecosystem, but change there is.
Changes in who drives innovation, what’s needed, the companies on top and on bottom at every point in the food chain, who competes with whom, how workloads have changed from compute to dataflow, software has moved to opensource, how abstracted code is now from processor architecture, how individual and enterprise customers have been revolting against the “old” ways, old vendors, old business models, and what the architectures look like, how processors communicate, and how systems are purchased, and what fundamental system architectures look like. But not much besides that…
Ok – so if you’re an architect, that’s as exciting as it gets (you hear it in my voice – right ?), and it makes for a lot of opportunities to innovate and create new or changed businesses. Because innovation is so often at the intersection of changing ways of doing things. We’re at a point where the changes are definitely not done yet. We’re just at the start. (OK – now try to imagine a really animated 4-way conversation over beers at the Britannia Arms in Cupertino… Yea – exciting.)
I’m going to focus on just one sliver of the market – but it’s important to me – and that’s enterprise IT. I think the changes are as much about business models as technology.
Hyperscale datacenters drive innovation
I’ll start in a strange place. Hyperscale datacenters (think social media, search, etc.) and the scale of deployment changes the optimization point. Most of us starting to get comfortable with rack as the new purchase quantum. And some of us are comfortable with the pod or container as the new purchase quantum. But the hyperscale dataenters work more at the datacenter as the quantum. By looking at it that way, they can trade off the cost of power, real estate, bent sheet metal, network bandwidth, disk drives, flash, processor type and quantity, memory amount, where work gets done, and what applications are optimized for. In other words, we shifted from looking at local optima to looking for global optima. I don’t know about you, but when I took operations research in university, I learned there was an unbelievable difference between the two – and global optima was the one you wanted…
Hyperscale datacenters buy enough (top 6 are probably more than 10% of the market today) that 1) they need to determine what they deploy very carefully on their own, and 2) vendors work hard to give them what they need.
That means innovation used to be driven by OEMs, but now it’s driven by hyperscale datacenters and it’s driven hard. That global optimum? It’s work/$ spent. That’s global work, and global spend. It’s OK to spend more, even way more on one thing if over-all you get more done for the $’s you spend.
That’s why the 3 biggest consumers of flash in servers are Facebook, Google, and Apple, with some of the others not far behind. You want stuff, they want to provide it, and flash makes it happen efficiently. So efficiently they can often give that service away for free.
Hyperscale datacenters have started to publish their cost metrics, and open up their architectures (like OpenCompute), and open up their software (like Hadoop and derivatives). More to the point, services like Amazon have put a very clear $ value on services. And it’s shockingly low.
Enterprises are paying attention
Enterprises have looked at those numbers. Hard. That’s catalyzed a customer revolt against the old way of doing things – the old way of buy and billing. OEMs and ISVs are creating lots of value for enterprise, but not that much. They’ve been innovating around “stickiness” and “lock-in” (yea – those really are industry terms) for too long, while hyperscale datacenters have been focused on getting stuff done efficiently. The money they save per unit just means they can deploy more units and provide better services.
That revolt is manifesting itself in 2 ways. The first is seen in the quarterly reports of OEMs and ISVs. Rumors of IBM selling its X-series to Lenovo, Dell going private, Oracle trying to shift business, HP talking of the “new style of IT”… The second is enterprises are looking to emulate hyperscale datacenters as much as possible, and deploy private cloud infrastructure. And often as not, those will be running some of the same open source applications and file systems as the big hyperscale datacenters use.
Where are the hyperscale datacenters leading them? It’s a big list of changes, and they’re all over the place.
But they’re also looking at a few different things. For example, global name space NAS file systems. Personally? I think this one’s a mistake. I like the idea of file systems/object stores, but the network interconnect seems like a bottleneck. Storage traffic is shared with network traffic, creates some network spine bottlenecks, creates consistency performance bottlenecks between the NAS heads, and – let’s face it – people usually skimp on the number of 10GE ports on the server and in the top of rack switch. A typical SAS storage card now has 8 x 12G ports – that’s 96G of bandwidth. Will servers have 10 x 10G ports? Yea. I didn’t think so either.
Anyway – all this is not academic. One Wall Street bank shared with me that – hold your breath – it could save 70% of its spend going this route. It was shocked. I wasn’t shocked, because at first blush this seems absurd – not possible. That’s how I reacted. I laughed. But… The systems are simpler and less costly to make. There is simply less there to make or ship than OEMs force into the machines for uniqueness and “value.” They are purchased from much lower margin manufacturers. They have massively reduced maintenance costs (there’s less to service, and, well, no OEM service contracts). And also important – some of the incredibly expensive software licenses are flipped to open source equivalents. Net savings of 70%. Easy. Stop laughing.
Disaggregation: Or in other words, Pooled Resources
But probably the most important trend from all of this is what server manufacturers are calling “disaggregation” (hey – you’re ripping apart my server!) but architects are more descriptively calling pooled resources.
First – the intent of disaggregation is not to rip the parts of a server to pieces to get lowest pricing on the components. No. If you’re buying by the rack anyway – why not package so you can put like with like. Each part has its own life cycle after all. CPUs are 18 months. DRAM is several years. Flash might be 3 years. Disks can be 5 to 7 years. Networks are 5 to 10 years. Power supplies are… forever? Why not replace each on its own natural failure/upgrade cycle? Why not make enclosures appropriate to the technology they hold? Disk drives need solid vibration-free mechanical enclosures of heavy metal. Processors need strong cooling. Flash wants to run hot. DRAM cool.
Second – pooling allows really efficient use of resources. Systems need slush resources. What happens to a systems that uses 100% of physical memory? It slows down a lot. If a database runs out of storage? It blue screens. If you don’t have enough network bandwidth? The result is, every server is over provisioned for its task. Extra DRAM, extra network bandwidth, extra flash, extra disk drive spindles.. If you have 1,000 nodes you can easily strand TBytes of DRAM, TBytes of flash, a TByte/s of network bandwidth of wasted capacity, and all that always burning power. Worse, if you plan wrong and deploy servers with too little disk or flash or DRAM, there’s not much you can do about it. Now think 10,000 or 100,000 nodes… Ouch.
If you pool those things across 30 to 100 servers, you can allocate as needed to individual servers. Just as importantly, you can configure systems logically, not physically. That means you don’t have to be perfect in planning ahead what configurations and how many of each you’ll need. You have sub-assemblies you slap into a rack, and hook up by configuration scripts, and get efficient resource allocation that can change over time. You need a lot of storage? A little? Higher performance flash? Extra network bandwidth? Just configure them.
That’s a big deal.
And of course, this sets the stage for immense pooled main memory – once the next generation non-volatile memories are ready – probably starting around 2015.
You can’t underestimate the operational problems associated with different platforms at scale. Many hyperscale datacenters today have around 6 platforms. If you think they are rolling out new versions of those before old ones are retired they often have 3 generations of each. That’s 18 distinct platforms, with multiple software revisions of each. That starts to get crazy when you may have 200,000 to 400,000 servers to manage and maintain in a lights out environment. Pooling resources and allocating them in the field goes a huge way to simplifying operations.
Alternate Processor Architecture
It didn’t always used to be Intel x86. There was a time when Intel was an upstart in the server business. It was Power, MIPs, Alpha, SPARC… (and before that IBM mainframes and minis, etc). Each of the changes was brought on by changing the cost structure. Mainframes got displaced by multi-processor RISC, which gave way to x86.
Today, we have Oracle saying they’re getting out of x86 commodity servers and doubling down on SPARC. IBM is selling off its x86 business and doubling down on Power (hey – don’t confuse that with PowerPC – which started as an architectural cut-down of Power – I was there…). And of course there is a rash of 64-bit ARM server SOCs coming – with HP and Dell already dabbling in it. What’s important to realize is that all of these offerings are focusing on the platform architecture, and how applications really perform in total, not just the processor.
Let me warp up with an email thread cut/paste from a smart friend – Wayne Nation. I think he summed up some of what’s going on well, in a sobering way most people don’t even consider.
“Does this remind you of a time, long ago, when the market was exploding with companies that started to make servers out of those cheap little desktop x86 CPUs? What is different this time? Cost reduction and disaggregation? No, cost and disagg are important still, but not new.
A new CPU architecture? No, x86 was “new” before. ARM promises to reduce cost, as did Intel.
Disaggregation enables hyperscale datacenters to leverage vanity-free, but consistent delivery will determine the winning supplier. There is the potential for another Intel to rise from these other companies. “