I started working years ago to engage large datacenters, learn what their problems are and try to craft solutions for their problems. It’s taken years, but we engaged them, learned, changed how we thought about storage and began creating solutions that are being deployed at scale.
We’ve started to do the same with the Chinese Internet giants. They’re growing at an incredible rate. They have similar problems, but it’s surprising how different their solution approaches are. Each one is unique. And we’re constantly learning from these guys.
So to wrap up the blog series on my interview with CIO & CEO magazine, here are the last two questions to explain a bit more.
CEO & CIO: Please use examples to tell the stories about the forward-looking technologies and architectures that LSI has jointly developed with Internet giants.
While our host bus adapters (HBAs) and MegaRAID® solutions have been part of the hyperscale Internet companies’ infrastructure since the beginning, we have only recently worked very closely with them to drive joint innovation. In 2009 I led the first LSI engagement with what we then called “mega datacenters.” It took a while to understand what they were doing and why. By 2010 we realized there were specialized needs, and began to imagine new hardware products that worked with these datacenters. Out of this work came the realization that flash was important for efficiency and capability, and the “invention” of LSI® Nytro™ product portfolio. (More are in the pipeline). We have worked closely with hyperscale datacenters to evolve and tune these solutions, to where Nytro products have become the backbone of their main revenue platforms. Facebook has been a vitally important partner in evolving our Nytro platform – teaching us what was truly needed, and now much of their infrastructure runs on LSI products. These same products are a good fit for other hyperscale customers, and we are slowly winning many of the large ones.
Looking forward, we are partnered with several Internet giants in the U.S. and China to work on cold storage solutions, and more importantly shared DAS (Distributed DAS: D-DAS) solutions. We have been demonstrating prototypes. These solutions enable pooled architectures and rack scale architecture, and can be made to work tightly with software-defined datacenters (SDDCs). They simplify management and resource allocation – making task deployment more efficient and easier. Shared DAS solutions increase infrastructure efficiency and improves lifecycle management of components. And they have the potential to radically improve application performance and infrastructure costs.
Looking further into the future, we see even more radical changes in silicon supporting transport protocols and storage models, and in rack scale architectures supporting storage and pooled memory. And cold storage is a huge though, some would say, boring problem that we are also focused on – storing lots of data for free and using no power to do it… but I really can’t talk about any of that.
CEO & CIO: LSI maintains good contact with big Internet companies in China. What are the biggest differences between dealing with these Internet enterprises and dealing with traditional partners?
Yes, we have a very good relationship with large Chinese Internet companies. In fact, I will be visiting Tencent, Alibaba and Baidu in a few weeks. One of the CTOs I would like to say is a friend. That is, we have fun talking together about the future.
These meetings have evolved. The first meetings LSI had about two years ago were sales calls, or support for OEM storage solutions. These accomplished very little. Once we began visiting as architects speaking to architects, real dialogs began. Our CEO has been spending time in China meeting with these Internet companies both to learn, and to make it clear that they are important to us, and we want a chance to solve their problems. But the most interesting conversations have been the architectural ones. There have been very clear changes in the two years I have traveled within China – from standard enterprise to hyperscale architectures.
We’ve received fascinating feedback on architecture, use, application profiles, platforms, problems and goals. We have strong engagement with the U.S. Internet giants. At the highest level, the Chinese Internet companies have similar problems and goals. But the details quickly diverge because of revenue per user, resources, power availability, datacenter ownership and Internet company age. The use of flash is very different.
The Chinese Internet giants are at an amazing change point. Most are ready for explosive growth of infrastructure and deployment of cloud services. Most are changing from standard OEM systems and architectures to self-designed hyperscale systems after experimenting with Scorpio and microserver deployments. Several, like JD.com (an Amazon-like company) are moving from hosted to self-built infrastructure. And there seems to be a general realization that the datacenter has changed from a compute-centric model to a dataflow model, where storage and network dictate how much work gets done more than the CPU does. These giants are leveraging their experience and capability to move very quickly, and in a few cases are working to create true pooled rack level architectures much like Facebook and Google have started in the U.S. In fact, Baidu is similar to Facebook in this approach, but is different in its longer term goals for the architecture.
The Chinese companies are amazingly diverse, even within one datacenter, and arguments on architectural direction are raging within these Internet giants – it’s healthy and exciting. However, the innovations that are coming are similar to those developed by large U.S. Internet companies. Personally I have found these Internet companies much more exciting and satisfying to work with than traditional OEMs. The speed and cadence of advancement, the recognition of problems and their importance, the focus on efficiency and optimization have been much more exciting. And the youthful mentality and view to problems, without being burdened by “the way we’ve always done this” has been wonderful.
Also see these blogs of mine over the past year, where you can read more about some of these changes:
“Postcard from Shenzhen: China’s hyperscale datacenter growth, mixed with a more traditional approach”
“China in the clouds, again”
“China: A lot of talk about resource pooling, a better name for disaggregation”
Or see them (and others) all here.
Summary: So it’s taken years, but we engaged U.S. Internet giants, learned about their problems, changed how we thought about storage and began creating solutions that are now being deployed at scale. And we’re constantly learning from these guys. Constantly, because their problems are constantly changing.
We’ve now started to do the same with the Chinese Internet giants. They have similar problems, and will need similar solutions, but they are not the same. And just like the U.S. Internet giants, each one is unique.
Tags: Alibaba, Amazon, Baidu, CEO & CIO Magazine, China, cloud services, cold storage, D-DAS, DAS, datacenter, datacenter ecosystem, direct attached storage, distributed DAS, Facebook, flash, flash storage, Google, HBA, host bus adapter, hyperscale datacenter, Internet, JD.com, MegaRAID, OEM, original equipment manufacturer, Scorpio, Tencent
It’s the start of the new year, and it’s traditional to make predictions – right? But predicting the future of the datacenter has been hard lately. There have been and continue to be so many changes in flight that possibilities spin off in different directions. Fractured visions through a kaleidoscope. Changes are happening in the businesses behind datacenters, the scale, the tasks and what is possible to accomplish, the value being monetized, and the architectures and technologies to enable all of these.
A few months ago I was asked to describe the datacenter in 2020 for some product planning purposes. Dave Vellante of Wikibon & John Furrier of SiliconANGLE asked me a similar question a few weeks ago. 2020 is out there – almost 7 years. It’s not easy to look into the crystal ball that far and figure out what the world will look like then, especially when we are in the midst of those tremendous changes. For some context I had to think back 7 years – what was the datacenter like then, and how profound have the changes been over the past 7 years?
And 7 years ago, our forefathers…
It was a very different world. Facebook barely existed, and had just barely passed the “university only” membership. Google was using Velcro, Amazon didn’t have its services, cloud was a non-existent term. In fact DAS (direct attach storage) was on the decline because everyone was moving to SAN/NAS. 10GE networking was in the future (1GE was still in growth mode). Linux was not nearly as widely accepted in enterprise – Amazon was in the vanguard of making it usable at scale (with Werner Vogels saying “it’s terrible, but it’s free, as in free beer”). Servers were individual – no “PODs,” and VMware was not standard practice yet. SATA drives were nowhere in datacenters.
An enterprise disk drive topped out at around 200GB in capacity. Nobody used the term petabyte. People, including me, were just starting to think about flash in datacenters, and it was several years later that solutions became available. Big data did not even exist. Not as a term or as a technology, definitely not Hadoop or graph search. In fact, Google’s seminal paper on MapReduce had just been published, and it would become the inspiration for Hadoop – something that would take many years before Yahoo picked it up and helped make it real.
Analytics were statistical and slow, and you had to be very explicitly looking for something. Advertising on the web was a modest business. Cold storage was tape or MAID, not vast pools of cheap disks in the cloud at absurdly low price points. None of the Chinese web-cloud guys existed… In truth, at LSI we had not even started looking at or getting to know the web datacenter guys. We assumed they just bought from OEMs…
No one streamed mainstream media – TV and movies – and there were no tablets to stream them to. YouTube had just been purchased by Google. Blu-ray was just getting started and competing with HD-DVD (which I foolishly bought 7 years ago), and integrated GPS’s in your car were a high-tech growth area. The iPhone or Android had not launched, Danger’s Sidekick was the cool phone, flip phones were mainstream, there was no App store or the billions of sales associated with that, and a mobile web browser was virtually useless.
Dell, IBM, and HP were the only real server companies that mattered, and the whole industry revolved around them, as well as EMC and NetApp for storage. Cisco, Lenovo and Huawei were not server vendors. And Sun was still Sun.
7 years from now
So – 7 years from now? That’s hard to predict, so take this with a grain of salt… There are many ways things could play out, especially when global legal, privacy, energy, hazardous waste recycling, and data retention requirements come into play, not to mention random chaos and invention along the way.
Compute-centric to dataflow-centric
Major applications are changing (have changed) from compute-centric to dataflow architectures. That is big data. The result will probably be a decline in the influence of processor vendors, and the increased focus on storage, network and memory, and optimized rack-level architectures. A handful of hyperscale datacenters are leading the way, and dragging the rest of us along. These types of solutions are already being deployed in big enterprise for specialized use cases, and their adoption will only increase with time. In 7 years, the main deployment model will echo what hyperscale datacenters are doing today: disaggregated racks of compute, memory and storage resources.
The datacenter is now being viewed as a profit growth enabler, rather than a cost center. That implies more compute = more revenue. That changes the investment profile and the expectations for IT. It will not be enough for enterprise IT departments to minimize change and risk because then they would be slowing revenue growth.
Customers and vendors
We are in the early stages of a customer revolt. Whether it’s deserved or not is immaterial, though I believe it’s partially deserved. Large customers have decided (and I’m doing broad brush strokes here) that OEMs are charging them too much and adding “features” that add no value and burn power, that the service contracts are excessively expensive and that there is very poor management interoperability among OEM offerings – on purpose to maintain vendor lockin. The cost structures of public cloud platforms like Amazon are proof there is some merit to the argument. Management tools don’t scale well, and require a lot of admin intervention. ISVs are seen as no better. Sure the platforms and apps are valuable and critical, but they’re really expensive too, and in a few cases, open source solutions actually scale better (though ISVs are catching up quickly).
The result? We’re seeing a push to use whitebox solutions that are interoperable and simple. Open source solutions – both software and hardware – are gaining traction in spite of their problems. Just witness the latest Open Compute Summit and the adoption rate of Hadoop and OpenStack. In fact many large enterprises have a policy that’s pretty much – any new application needs to be written for open source platforms on scale-out infrastructure.
Those 3 OEMs are struggling. Dell, HP and IBM are selling more servers, but at a lower revenue. Or in the case of IBM – selling the business. They are trying to upsell storage systems to offset those lost margins, and they are trying to innovate and vertically integrate to compensate for the changes. In contrast we’re seeing a rapid increase planned from self-built, self-architected hyperscale datacenters, especially in China. To be fair – those pressures on price and supplier revenue are not necessarily good for our industry. As well, there are newer entrants like Huawei and Cisco taking a noticeable chunk of the market, as well as an impending growth of ISV and 3rd party full rack “shrink wrapped” systems. Everybody is joining the party.
Storage, cold storage and storage-class memory
Stepping further out on the limb, I believe (but who really knows) that by 2020 storage as we know is no longer shipping. SMB is hollowed out to the cloud – that is – why would any small business use anything but cloud services? The costs are too compelling. Cloud storage is stratified into 3 levels: storage-class memory, flash/NVM and cool/cold bulk disk storage. Cold storage is going to be a very, very important area. You need to save that data, but spend zero power, and zero $ on storing it. Just look at some of the radical ideas like Facebook’s Blu-ray jukebox to address that, which was masterminded by a guy I really like – Gio Coglitore – and I am very glad is getting some rightful attention. (http://www.wired.com/wiredenterprise/2014/02/facebook-robots/)
I believe that pooled storage class memory is inevitable and will disrupt high-performance flash storage, probably beginning in 2016. My processor architect friends and I have been daydreaming about this since 2005. That disruption’s OK, because flash use will continue to grow, even as disk use grows. There is just too much data. I’ve seen one massive vendor’s data showing average servers are adding something like 0.2 hard disks per year and 0.1 SSDs per year – and that’s for the average server including diskless nodes that are usually the most common in hyperscale datacenters. So growth in spite of disruption and capacity growth.
Data will be pooled, and connected by fabric as distributed objects or key/value pairs, with erasure coding. In fact, Object store (key/value – whatever) may have “obsoleted” block storage. And the need for these larger objects will probably also obsolete file as we’re used to it. Sure disk drives may still be block based, though key/value gives rise to all sorts of interesting opportunities to support variable size structures, obscure small fault domains, and variable encryption/compression without wasting space on disk platters. I even suspect that disk drives as we know them will be morphing into cold store specialty products that physically look entirely different and are made from different materials – for a lot of reasons. 15K drives will be history, and 10K drives may too. In fact 2” drives may not make sense anymore as the laptop drive and 15K drive disappear and performance and density are satisfied by flash.
Enterprise becomes private cloud that is very similar structurally to hyperscale, but is simply in an internal facility. And SAN/NAS products as we know them will be starting on the long end of the tail as legacy support products. Sure new network based storage models are about to emerge, but they’re different and more aligned to key/value.
Rack-scale architectures will have taken over clustered deployments. That means pooled resources. Processing will be pools of single socket SoC servers enabling massive clusters, rather than lots of 2- socket servers. These SoCs might even be mobile device SoCs at some point or at least derived from that – the economics of scale and fast cadence of consumer SoCs will make that interesting, maybe even inevitable. After all, the current Apple A7 in the iphone 5S is a dual core, 64-bit V8 ARM at 1.4GHz and the whole iPhone costs as much as mainstream server processor chips. In a few years, an 8 or 16 core equivalent at 1.5GHz or 2GHz is not hard to imagine, and the cost structure should be excellent.
Rapidly evolving open source applications will have morphed into eventually consistent dataflow tasks. Or they will be emerging in-memory applications working on vast data structures in the pooled storage class memory at the rack or larger scale, which will add tremendous monetary value to businesses. Whatever the evolutionary paths – the challenge for the next 10 years is optimizing dataflow as the amount used continues to exponentially grow. After all – data has value in aggregate, so why would you throw anything away, even as the amount we generate increases?
Clusters will be autonomous. Really autonomous. As in a new term I love: “emergent.” It’s when you can start using big data analytics to monitor the datacenter, and make workload/management and data placement decisions in real time, automatically, and the datacenter begins to take on un-predicted characteristics. Deployment will be autonomous too. Power on a pod of resources, and it just starts working. Google does that already.
Layer 2 datacenter network switches will either be disappearing or will have migrated to a radically different location in the rack hierarchy. There are many ways this can evolve. I’m not sure which one(s) will dominate, but I know it will look different. And it will have different bandwidth. 100G moving to 400G interconnect fabric over fiber.
So there you have it. Guaranteed correct…
Different applications and dataflow, different architectures, different processors, different storage, different fabrics. Probably even a re-alignment of vendors.
Predicting the future of the datacenter has not been easy. There have been, and are so many changes happening. The businesses behind them. The scale, the tasks and what is possible to accomplish, the value being monetized, and the architectures and technologies to enable all of these. But at least we have some idea what’s ahead. And it’s pretty different, and exciting.
Tags: 10 gigabit ethernet, 2020, Amazon, Apple, China, Cisco, cloud storage, cold storage, datacenter, Dell, EMC, Facebook, flash, Google, Hadoop, HP, Huawei, hyperscale datacenter, IBM, iPhone, kaleidoscope, Lenovo, NAS, NetApp, non-volatile memory, NVM, Open Compute, OpenStack, rack scale architecture, SAN, SoC, Sun, VMware, YouTube
You might be surprised to find out how big the infrastructure for cloud and Web 2.0 is. It is mind-blowing. Microsoft has acknowledged packing more than 1 million servers into its datacenters, and by some accounts that is fewer than Google’s massive server count but a bit more than Amazon.
Facebook’s server count is said to have skyrocketed from 30,000 in 2012 to 180,000 just this past August, serving 900 million plus users. And the social media giant is even putting its considerable weight behind the Open Compute effort to make servers fit better in a rack and draw less power. The list of mega infrastructures also includes Tencent, Baidu and Alibaba and the roster goes on and on.
Even more jaw-dropping is that almost 99.9% of these hyperscale infrastructures are built with servers featuring direct-attached storage. That’s right – they do the computing and store the data. In other words, no special, dedicated storage gear. Yes, your Facebook photos, your Skydrive personal cloud and all the content you use for entertainment, on-demand video and gaming data are stored inside the server.
Direct-attached storage reigns supreme
Everything in these infrastructures – compute and storage – is built out of x-86 based servers with storage inside. What’s more, growth of direct-attached storage is many folds bigger than any other storage deployments in IT. Rising deployments of cloud, or cloud-like, architectures are behind much of this expansion.
The prevalence of direct-attached storage is not unique to hyperscale deployments. Large IT organizations are looking to reap the rewards of creating similar on-premise infrastructures. The benefits are impressive: Build one kind of infrastructure (server racks), host anything you want (any of your properties), and scale if you need to very easily. TCO is much less than infrastructures relying on network storage or SANs.
With direct-attached you no longer need dedicated appliances for your database tier, your email tier, your analytics tier, your EDA tier. All of that can be hosted on scalable, share-nothing infrastructure. And just as with hyperscale, the storage is all in the server. No SAN storage required.
Open Compute, OpenStack and software-defined storage drive DAS growth
Open Compute is part of the picture. A recent Open Compute show I attended was mostly sponsored by hyperscale customers/suppliers. Many big-bank IT folks attended. Open Compute isn’t the only initiative driving growing deployments of direct-attached storage. So is software-defined storage and OpenStack. Big application vendors such as Oracle, Microsoft, VMware and SAP are also on board, providing solutions that support server-based storage/compute platforms that are easy and cost-effective to deploy, maintain and scale and need no external storage (or SAN including all-flash arrays).
So if you are a network-storage or SAN manufacturer, you have to be doing some serious thinking (many have already) about how you’re going to catch and ride this huge wave of growth.
Tags: Alibaba, Amazon, Baidu, cloud computing, DAS, direct attached storage, enterprise, enterprise IT, Google, hyperscale, Microsoft, Open Compute, OpenStack, Oracle, SAP, Tencent, VMware
You may have noticed I’m interested in Open Compute. What you may not know is I’m also really interested in OpenStack. You’re either wondering what the heck I’m talking about or nodding your head. I think these two movements are co-dependent. Sure they can and will exist independently, but I think the success of each is tied to the other. In other words, I think they are two sides of the same coin.
Why is this on my mind? Well – I’m the lucky guy who gets to moderate a panel at LSI’s AIS conference, with the COO of Open Compute, and the founder of OpenStack. More on that later. First, I guess I should describe my view of the two. The people running these open-source efforts probably have a different view. We’ll find that out during the panel.
I view Open Compute as the very first viable open-source hardware initiative that general business will be able to use. It’s not just about saving money for rack-scale deployments. It’s about having interoperable, multi-source systems that have known, customer-malleable – even completely customized and unique – characteristics including management. It also promises to reduce OpEx costs.
Ready for Prime Time?
But the truth is Open Compute is not ready for prime time yet. Facebook developed almost all the designs for its own use and gifted them to Open Compute, and they are mostly one or two generations old. And somewhere between 2,000 and 10,000 Open Compute servers have shipped. That’s all. But, it’s a start.
More importantly though, it’s still just hardware. There is still a need to deploy and manage the hardware, as well as distribute tasks, and load balance a cluster of Open Compute infrastructure. That’s a very specialized capability, and there really aren’t that many people who can do that. And the hardware is so bare bones – with specialized enclosures, cooling, etc – that it’s pretty hard to deploy small amounts. You really want to deploy at scale – thousands. If you’re deploying a few servers, Open Compute probably isn’t for you for quite some time.
I view OpenStack in a similar way. It’s also not ready for prime time. OpenStack is an orchestration layer for the datacenter. You hear about the “software defined datacenter.” Well, this is it – at least one version. It pools the resources (compute, object and block storage, network, and memory at some time in the future), presents them, allows them to be managed in a semi-automatic way, and automates deployment of tasks on the scaled infrastructure. Sure there are some large-scale deployments. But it’s still pretty tough to deploy at large scale. That’s because it needs to be tuned and tailored to specific hardware. In fact, the biggest datacenters in the world mostly use their own orchestration layer. So that means today OpenStack is really better at smaller deployments, like 50, 100 or 200 server nodes.
The synergy – 2 sides of the same coin
You’ll probably start to see the synergy. Open Compute needs management and deployment. OpenStack prefers known homogenous hardware or else it’s not so easy to deploy. So there is a natural synergy between the two. It’s interesting too that some individuals are working on both… Ultimately, the two Open initiatives will meet in the big, but not-too-big (many hundreds to small thousands of servers) deployments in the next few years.
And then of course there is the complexity of the interaction of for-profit companies and open-source designs and distributions. Companies are trying to add to the open standards. Sometimes to the betterment of standards, but sometimes in irrelevant ways. Several OEMs are jumping in to mature and support OpenStack. And many ODMs are working to make Open Compute more mature. And some companies are trying to accelerate the maturity and adoption of the technologies in pre-configured solutions or appliances. What’s even more interesting are the large customers – guys like Wall Street banks – that are working to make them both useful for deployment at scale. These won’t be the only way scaled systems are deployed, but they’re going to become very common platforms for scale-out or grid infrastructure for utility computing.
Here is how I charted the ecosystem last spring. There’s not a lot of direct interaction between the two, and I know there are a lot of players missing. Frankly, it’s getting crazy complex. There has been an explosion of players, and I’ve run out of space, so I’ve just not gotten around to updating it. (If anyone engaged in these ecosystems wants to update it and send me a copy – I’d be much obliged! Maybe you guys at Nebula ? ;-)).
An AIS keynote panel – What?
Which brings me back to that keynote panel at AIS. Every year LSI has a conference that’s by invitation only (sorry). It’s become a pretty big deal. We have some very high-profile keynotes from industry leaders. There is a fantastic tech showcase of LSI products, partner and ecosystem company’s products, and a good mix of proof of concepts, prototypes and what-if products. And there are a lot of breakout sessions on industry topics, trends and solutions. Last year I personally escorted an IBM fellow, Google VPs, Facebook architects, bank VPs, Amazon execs, flash company execs, several CTOs, some industry analysts, database and transactional company execs…
It’s a great place to meet and interact with peers if you’re involved in the datacenter, network or cellular infrastructure businesses. One of the keynotes is actually a panel of 2. The COO of Open Compute, Cole Crawford, and the co-founder of OpenStack, Chris Kemp (who is also the founder and CSO of Nebula). Both of them are very smart, experienced and articulate, and deeply involved in these movements. It should be a really engaging, interesting keynote panel, and I’m lucky enough to have a front-row seat. I’ll be the moderator, and I’m already working on questions. If there is something specific you would like asked, let me know, and I’ll try to accommodate you.
You can see more here.
Yea – I’m very interested in Open Compute and OpenStack. I think these two movements are co-dependent. And I think they are already changing our industry – even before they are ready for real large-scale deployment. Sure they can and will exist independently, but I think the success of each is tied to the other. The people running these open-source efforts might have a different view. Luckily, we’ll get to find out what they think next month… And I’m lucky enough to have a front row seat.
Optimizing the work per dollar spent is a high priority in datacenters around the world. But there aren’t many ways to accomplish that. I’d argue that integrating flash into the storage system drives the best – sometimes most profound – improvement in the cost of getting work done.
Yea, I know work/$ is a US-centric metric, but replace the $ with your favorite currency. The principle remains the same.
I had the chance to talk with one of the execs who’s responsible for Google’s infrastructure last week. He talked about how his fundamental job was improving performance/$. I asked about that, and he explained “performance” as how much work an application could get done. I asked if work/$ at the application was the same, and he agreed – yes – pretty much.
You remember as a kid that you brought along a big brother as authoritative backup? OK – so my big brother Google and I agree – you should be trying to optimize your work/$. Why? Well – it could be to spend less, or to do more with the same spend, or do things you could never do before, or simply to cope with the non-linear expansion in IT demands even as budgets are shrinking. Hey – that’s the definition of improving work/$… (And as a bonus, if you do it right, you’ll have a positive green impact that is bound to be worth brownie points.)
Here’s the point. Processors are no longer scaling the same – sure, there are more threads, but not all applications can use all those threads. Systems are becoming harder to balance for efficiency. And often storage is the bottleneck. Especially for any application built on a database. So sure – you can get 5% or 10% gain, or even in the extreme 100% gain in application work done by a server if you’re willing to pay enough and upgrade all aspects of the server: processors, memory, network… But it’s almost impossible to increase the work of a server or application by 200%, 300% or 400% – for any money.
I’m going to explain how and why you can do that, and what you get back in work/$. So much back that you’ll probably be spending less and getting more done. And I’m going to explain how even for the risk-averse, you can avoid risk and get the improvements.
More work/$ from general-purpose DAS servers and large databases
Let me start with a customer. It’s a bank, and it likes databases. A lot. And it likes large databases even more. So much so that it needs disks to hold the entire database. Using an early version of an LSI Nytro™ MegaRAID® card, it got 6x the work from the same individual node and database license. You can read that as 600% if you want. It’s big. To be fair – that early version had much more flash than our current products, and was much more expensive. Our current products give much closer to 3x-4x improvement. Again, you can think of that as 300%-400%. Again, slap a Nytro MegaRAID into your server and it’s going to do the work of 3 to 4 servers. I just did a web search and, depending on configuration, Nytro MegaRAIDs are $1,800 to $2,800 online. I don’t know about you, but I would have a hard time buying 2 to 3 configured servers + software licenses for that little, but that’s the net effect of this solution. It’s not about faster (although you get that). It’s about getting more work/$.
But you also want to feel safe – that you’re absolutely minimizing risk. OK. Nytro MegaRAID is a MegaRAID card. That’s overwhelmingly the most common RAID controller in the world, and it’s used by 9 of the top 10 OEMs, and protects 10’s to 100‘s of millions of disks every day. The Nytro version adds private flash caching in the card and stores hot reads and writes there. Writes to the cache use a RAID 1 pair. So if a flash module dies, you’re protected. If the flash blocks or chip die wear out, the bad blocks are removed from the cache pool, and the cache shrinks by that much, but everything keeps operating – it’s not like a normal LUN that can’t change size. What’s more, flash blocks usually finally wear out during the erase cycle – so no data is lost. And as a bonus, you can eliminate the traditional battery most RAID cards use – the embedded flash covers that – so no more annual battery service needed. This is a solution that will continue to improve work/$ for years and years, all the while getting 3x-4x the work from that server.
More work/$ from SAN-attached servers (without actually touching the SAN)
That example was great – but you don’t use DAS systems. Instead, you use a big iron SAN. (OK, not all SANs are big iron, but I like the sound of that expression.) There are a few ways to improve the work from servers attached to SANs. The easiest of course is to upgrade the SAN head, usually with a flash-based cache in the SAN controller. This works, and sometimes is “good enough” to cover needs for a year or two. However, the server still needs to reach across the SAN to access data, and it’s still forced to interact with other servers’ IO streams in deeper queues. That puts a hard limit on the possible gains.
Nytro XD caches hot data in the server. It works with virtual machines. It intercepts storage traffic at the block layer – the same place LSI’s drivers have always been. If the data isn’t hot, and isn’t cached, it simply passes the traffic through to the SAN. I say this so you understand – it doesn’t actually touch the SAN. No risk there. More importantly, the hot storage traffic never has to be squeezed through the SAN fabric, and it doesn’t get queued in the SAN head. In other words, it makes the storage really, really fast.
We’ve typically found work from a server can increase 5x to 10x, and that’s been verified by independent reviewers. What’s more, the Nytro XD solution only costs around 4x the price of a high-end SAN NIC. It’s not cheap, but it’s way cheaper than upgrading your SAN arrays, it’s way cheaper than buying more servers, and it’s proven to enable you to get far more work from your existing infrastructure. When you need to get more work – way more work – from your SAN, this is a really cost-effective approach. Seriously – how else would you get 5x-10x more work from your existing servers and software licenses?
More work/$ from databases
A lot of hyperscale datacenters are built around databases of a finite size. That may be 1, 2 or even 4 TBytes. If you use Apple’s online services for iTunes or iCloud, or if you use Facebook, you’re using this kind of infrastructure.
If your datacenter has a database that can fit within a few TBytes (or less), you can use the same approach. Move the entire LUN into a Nytro WarpDrive® card, and you will get 10x the work from your server and database software. It makes such a difference that some architects argue Facebook and Apple cloud services would never have been possible without this type of solution. I don’t know, but they’re probably right. You can buy a Nytro WarpDrive for as little as a low-end server. I mean low end. But it will give you the work of 10. If you have a fixed-size database, you owe it to yourself to look into this one.
More work/$ from virtualized and VDI (Virtual Desktop) systems
Virtual machines are installed on a lot of servers, for very good reason. They help improve the work/$ in the datacenter by reducing the number of servers needed and thereby reducing management, maintenance and power costs. But what if they could be made even more efficient?
Wall Street banks have benchmarked virtual desktops. They found that Nytro products drive these results: support of 2x the virtual desktops, 33% improvement in boot time during boot storms, and 33% lower cost per virtual desktop. In a more general application mix, Nytro increases work per server 2x-4x. And it also gives 2x performance for virtual storage appliances.
While that’s not as great as 10x the work, it’s still a real work/$ value that’s hard to ignore. And it’s the same reliable MegaRAID infrastructure that’s the backbone of enterprise DAS storage.
A real example from our own datacenter
Finally – a great example of getting far more work/$ was an experiment our CIO Bruce Decock did. We use a lot of servers to fuel our chip-design business. We tape out a lot of very big leading-edge process chips every year. Hundreds. And that takes an unbelievable amount of processing to get what we call “design closure” – that is, a workable chip that will meet performance requirements and yield. We use a tool called PrimeTime that figures out timing for every signal on the chip across different silicon process points and operating conditions. There are 10’s to 100’s of millions of signals. And we run every active design – 10’s to 100’s of chips – each night so we can see how close we’re getting, and we make multiple runs per chip. That’s a lot of computation… The thing is, electronic CAD has been designed to try not to use storage or it will never finish – just /tmp space, but CAD does use huge amounts of memory for the data structures, and that means swap space on the order of TBytes. These CAD tools usually don’t need to run faster. They run overnight and results are ready when the engineers come in the next day. These are impressive machines: 384G or 768G of DRAM and 32 threads. How do you improve work/$ in that situation? What did Bruce do?
He put LSI Nytro WarpDrives in the servers and pointed /tmp at the WarpDrives. Yep. Pretty complex. I don’t think he even had to install new drivers. The drivers are already in the latest OS distributions. Anyway – like I said – complex.
The result? WarpDrive allowed the machines to fully use the CPU and memory with no I/O contention. With WarpDrive, the PrimeTime jobs for static timing closure of a typical design could be done on 15 vs. 40 machines. That’s each Nytro node doing 260% of the work vs. a normal node and license. Remember – those are expensive machines (have you priced 768G of DRAM and do you know how much specialized electronic design CAD licenses are?) So the point wasn’t to execute faster. That’s not necessary. The point is to use fewer servers to do the work. In this case we could do 11 runs per server per night instead of just 4. A single chip design needs more than 150 runs in one night.
To be clear, the Nytro WarpDrives are a lot less expensive than the servers they displace. And the savings go beyond that – less power and cooling. Lower maintenance. Less admin time and overhead. Fewer Licenses. That’s definitely improved work/$ for years to come. Those Nytro cards are part of our standard flow, and they should probably be part of every chip company’s design flow.
So you can improve work/$ no matter the application, no matter your storage model, and no matter how risk-averse you are.
Optimizing the work per dollar spent is a high – maybe the highest – priority in datacenters around the world. And just to be clear – Google agrees with me. There aren’t many ways to accomplish that improvement, and almost no ways to dramatically improve it. I’d argue that integrating flash into the storage system is the best – sometimes most profound – improvement in the cost of getting work done. Not so much the performance, but the actual work done for the money spent. And it ripples through the datacenter, from original CapEx, to licenses, maintenance, admin overhead, power and cooling, and floor space for years. That’s a pretty good deal. You should look into it.
For those of you who are interested, I already wrote about flash in these posts:
What are the driving forces behind going diskless?
LSI is green – no foolin’
Tags: Bruce Decock, DAS, datacenter, direct attached storage, enterprise IT, flash, Google, hyperscale datacenter, Nytro MegaRAID, Nytro WarpDrive, Nytro XD, PrimeTime, RAID, SAN, server storage, storage area network, VDI, virtual desktop infrastructure, work per dollar
I was lucky enough to get together for dinner and beer with old friends a few weeks ago. Between the 4 of us, we’ve been involved in or responsible for a lot of stuff you use every day, or at least know about.
Supercomputers, minicomputers, PCs, Macs, Newton, smart phones, game consoles, automotive engine controllers and safety systems, secure passport chips, DRAM interfaces, netbooks, and a bunch of processor architectures: Alpha, PowerPC, Sparc, MIPS, StrongARM/XScale, x86 64-bit, and a bunch of other ones you haven’t heard of (um – most of those are mine, like TriCore). Basically if you drive a European car, travel internationally, use the Internet , if you play video games, or use a smart phone, well… you’re welcome.
Why do I tell you this? Well – first I’m name dropping – I’m always stunned I can call these guys friends and be their peers. But more importantly, we’ve all been in this industry as architects for about 30 years. Of course our talk went to what’s going on today. And we all agree that we’ve never seen more changes – inflexions – than the raft unfolding right now. Maybe its pressure from the recession, or maybe un-naturally pent up need for change in the ecosystem, but change there is.
Changes in who drives innovation, what’s needed, the companies on top and on bottom at every point in the food chain, who competes with whom, how workloads have changed from compute to dataflow, software has moved to opensource, how abstracted code is now from processor architecture, how individual and enterprise customers have been revolting against the “old” ways, old vendors, old business models, and what the architectures look like, how processors communicate, and how systems are purchased, and what fundamental system architectures look like. But not much besides that…
Ok – so if you’re an architect, that’s as exciting as it gets (you hear it in my voice – right ?), and it makes for a lot of opportunities to innovate and create new or changed businesses. Because innovation is so often at the intersection of changing ways of doing things. We’re at a point where the changes are definitely not done yet. We’re just at the start. (OK – now try to imagine a really animated 4-way conversation over beers at the Britannia Arms in Cupertino… Yea – exciting.)
I’m going to focus on just one sliver of the market – but it’s important to me – and that’s enterprise IT. I think the changes are as much about business models as technology.
Hyperscale datacenters drive innovation
I’ll start in a strange place. Hyperscale datacenters (think social media, search, etc.) and the scale of deployment changes the optimization point. Most of us starting to get comfortable with rack as the new purchase quantum. And some of us are comfortable with the pod or container as the new purchase quantum. But the hyperscale dataenters work more at the datacenter as the quantum. By looking at it that way, they can trade off the cost of power, real estate, bent sheet metal, network bandwidth, disk drives, flash, processor type and quantity, memory amount, where work gets done, and what applications are optimized for. In other words, we shifted from looking at local optima to looking for global optima. I don’t know about you, but when I took operations research in university, I learned there was an unbelievable difference between the two – and global optima was the one you wanted…
Hyperscale datacenters buy enough (top 6 are probably more than 10% of the market today) that 1) they need to determine what they deploy very carefully on their own, and 2) vendors work hard to give them what they need.
That means innovation used to be driven by OEMs, but now it’s driven by hyperscale datacenters and it’s driven hard. That global optimum? It’s work/$ spent. That’s global work, and global spend. It’s OK to spend more, even way more on one thing if over-all you get more done for the $’s you spend.
That’s why the 3 biggest consumers of flash in servers are Facebook, Google, and Apple, with some of the others not far behind. You want stuff, they want to provide it, and flash makes it happen efficiently. So efficiently they can often give that service away for free.
Hyperscale datacenters have started to publish their cost metrics, and open up their architectures (like OpenCompute), and open up their software (like Hadoop and derivatives). More to the point, services like Amazon have put a very clear $ value on services. And it’s shockingly low.
Enterprises are paying attention
Enterprises have looked at those numbers. Hard. That’s catalyzed a customer revolt against the old way of doing things – the old way of buy and billing. OEMs and ISVs are creating lots of value for enterprise, but not that much. They’ve been innovating around “stickiness” and “lock-in” (yea – those really are industry terms) for too long, while hyperscale datacenters have been focused on getting stuff done efficiently. The money they save per unit just means they can deploy more units and provide better services.
That revolt is manifesting itself in 2 ways. The first is seen in the quarterly reports of OEMs and ISVs. Rumors of IBM selling its X-series to Lenovo, Dell going private, Oracle trying to shift business, HP talking of the “new style of IT”… The second is enterprises are looking to emulate hyperscale datacenters as much as possible, and deploy private cloud infrastructure. And often as not, those will be running some of the same open source applications and file systems as the big hyperscale datacenters use.
Where are the hyperscale datacenters leading them? It’s a big list of changes, and they’re all over the place.
But they’re also looking at a few different things. For example, global name space NAS file systems. Personally? I think this one’s a mistake. I like the idea of file systems/object stores, but the network interconnect seems like a bottleneck. Storage traffic is shared with network traffic, creates some network spine bottlenecks, creates consistency performance bottlenecks between the NAS heads, and – let’s face it – people usually skimp on the number of 10GE ports on the server and in the top of rack switch. A typical SAS storage card now has 8 x 12G ports – that’s 96G of bandwidth. Will servers have 10 x 10G ports? Yea. I didn’t think so either.
Anyway – all this is not academic. One Wall Street bank shared with me that – hold your breath – it could save 70% of its spend going this route. It was shocked. I wasn’t shocked, because at first blush this seems absurd – not possible. That’s how I reacted. I laughed. But… The systems are simpler and less costly to make. There is simply less there to make or ship than OEMs force into the machines for uniqueness and “value.” They are purchased from much lower margin manufacturers. They have massively reduced maintenance costs (there’s less to service, and, well, no OEM service contracts). And also important – some of the incredibly expensive software licenses are flipped to open source equivalents. Net savings of 70%. Easy. Stop laughing.
Disaggregation: Or in other words, Pooled Resources
But probably the most important trend from all of this is what server manufacturers are calling “disaggregation” (hey – you’re ripping apart my server!) but architects are more descriptively calling pooled resources.
First – the intent of disaggregation is not to rip the parts of a server to pieces to get lowest pricing on the components. No. If you’re buying by the rack anyway – why not package so you can put like with like. Each part has its own life cycle after all. CPUs are 18 months. DRAM is several years. Flash might be 3 years. Disks can be 5 to 7 years. Networks are 5 to 10 years. Power supplies are… forever? Why not replace each on its own natural failure/upgrade cycle? Why not make enclosures appropriate to the technology they hold? Disk drives need solid vibration-free mechanical enclosures of heavy metal. Processors need strong cooling. Flash wants to run hot. DRAM cool.
Second – pooling allows really efficient use of resources. Systems need slush resources. What happens to a systems that uses 100% of physical memory? It slows down a lot. If a database runs out of storage? It blue screens. If you don’t have enough network bandwidth? The result is, every server is over provisioned for its task. Extra DRAM, extra network bandwidth, extra flash, extra disk drive spindles.. If you have 1,000 nodes you can easily strand TBytes of DRAM, TBytes of flash, a TByte/s of network bandwidth of wasted capacity, and all that always burning power. Worse, if you plan wrong and deploy servers with too little disk or flash or DRAM, there’s not much you can do about it. Now think 10,000 or 100,000 nodes… Ouch.
If you pool those things across 30 to 100 servers, you can allocate as needed to individual servers. Just as importantly, you can configure systems logically, not physically. That means you don’t have to be perfect in planning ahead what configurations and how many of each you’ll need. You have sub-assemblies you slap into a rack, and hook up by configuration scripts, and get efficient resource allocation that can change over time. You need a lot of storage? A little? Higher performance flash? Extra network bandwidth? Just configure them.
That’s a big deal.
And of course, this sets the stage for immense pooled main memory – once the next generation non-volatile memories are ready – probably starting around 2015.
You can’t underestimate the operational problems associated with different platforms at scale. Many hyperscale datacenters today have around 6 platforms. If you think they are rolling out new versions of those before old ones are retired they often have 3 generations of each. That’s 18 distinct platforms, with multiple software revisions of each. That starts to get crazy when you may have 200,000 to 400,000 servers to manage and maintain in a lights out environment. Pooling resources and allocating them in the field goes a huge way to simplifying operations.
Alternate Processor Architecture
It didn’t always used to be Intel x86. There was a time when Intel was an upstart in the server business. It was Power, MIPs, Alpha, SPARC… (and before that IBM mainframes and minis, etc). Each of the changes was brought on by changing the cost structure. Mainframes got displaced by multi-processor RISC, which gave way to x86.
Today, we have Oracle saying they’re getting out of x86 commodity servers and doubling down on SPARC. IBM is selling off its x86 business and doubling down on Power (hey – don’t confuse that with PowerPC – which started as an architectural cut-down of Power – I was there…). And of course there is a rash of 64-bit ARM server SOCs coming – with HP and Dell already dabbling in it. What’s important to realize is that all of these offerings are focusing on the platform architecture, and how applications really perform in total, not just the processor.
Let me warp up with an email thread cut/paste from a smart friend – Wayne Nation. I think he summed up some of what’s going on well, in a sobering way most people don’t even consider.
“Does this remind you of a time, long ago, when the market was exploding with companies that started to make servers out of those cheap little desktop x86 CPUs? What is different this time? Cost reduction and disaggregation? No, cost and disagg are important still, but not new.
A new CPU architecture? No, x86 was “new” before. ARM promises to reduce cost, as did Intel.
Disaggregation enables hyperscale datacenters to leverage vanity-free, but consistent delivery will determine the winning supplier. There is the potential for another Intel to rise from these other companies. “
I’ve been travelling to China quite a bit over the last year or so. I’m sitting in Shenzhen right now (If you know Chinese internet companies, you’ll know who I’m visiting). The growth is staggering. I’ve had a bit of a trains, planes, automobiles experience this trip, and that’s exposed me to parts of China I never would have seen otherwise. Just to accommodate sheer population growth and the modest increase in wealth, there is construction everywhere – a press of people and energy, constant traffic jams, unending urban centers, and most everything is new. Very new. It must be exciting to be part of that explosive growth. What a market. I mean – come on – there are 1.3 billion potential users in China.
The amazing thing for me is the rapid growth of hyperscale datacenters in China, which is truly exponential. Their infrastructure growth has been 200%-300% CAGR for the past few years. It’s also fantastic walking into a building in China, say Baidu, and feeling very much at home – just like you walked into Facebook or Google. It’s the same young vibe, energy, and ambition to change how the world does things. And it’s also the same pleasure – talking to architects who are super-sharp, have few technical prejudices, and have very little vanity – just a will to get to business and solve problems. Polite, but blunt. We’re lucky that they recognize LSI as a leader, and are willing to spend time to listen to our ideas, and to give us theirs.
Even their infrastructure has a similar feel to the US hyperscale datacenters. The same only different. ;-)
A lot of these guys are growing revenue at 50% per year, several getting 50% gross margin. Those are nice numbers in any country. One has $100’s of billions in revenue. And they’re starting to push out of China. So far their pushes into Japan have not gone well, but other countries should be better. They all have unique business models. “We” in the US like to say things like “Alibaba is the Chinese eBay” or “Sina Weibo is the Chinese Twitter”…. But that’s not true – they all have more hybrid business models, unique, and so their datacenter goals, revenue and growth have a slightly different profile. And there are some very cool services that simply are not available elsewhere. (You listening Apple®, Google®, Twitter®, Facebook®?) But they are all expanding their services, products and user base. Interestingly, there is very little public cloud in China. So there are no real equivalents to Amazon’s services or Microsoft’s Azure. I have heard about current development of that kind of model with the government as initial customer. We’ll see how that goes.
100’s of thousands of servers. They’re not the scale of Google, but they sure are the scale of Facebook, Amazon, Microsoft…. It’s a serious market for an outfit like LSI. Really it’s a very similar scale now to the US market. Close to 1 million servers installed among the main 4 players, and exabytes of data (we’ve blown past mere petabytes). Interestingly, they still use many co-location facilities, but that will change. More important – they’re all planning to probably double their infrastructure in the next 1-2 years – they have to – their growth rates are crazy.
Often 5 or 6 distinct platforms, just like the US hyperscale datacenters. Database platforms, storage platforms, analytics platforms, archival platforms, web server platforms…. But they tend to be a little more like a rack of traditional servers that enterprise buys with integrated disk bays, still a lot of 1G Ethernet, and they are still mostly from established OEMs. In fact I just ran into one OEM’s American GM, who I happen to know, in Tencent’s offices today. The typical servers have 12 HDDs in drive bays, though they are starting to look at SSDs as part of the storage platform. They do use PCIe® flash cards in some platforms, but the performance requirements are not as extreme as you might imagine. Reasonably low latency and consistent latency are the premium they are looking for from these flash cards – not maximum IOPs or bandwidth – very similar to their American counterparts. I think hyperscale datacenters are sophisticated in understanding what they need from flash, and not requiring more than that. Enterprise could learn a thing or two.
Some server platforms have RAIDed HDDs, but most are direct map drives using a high availability (HA) layer across the server center – Hadoop® HDFS or self-developed Hadoop like platforms. Some have also started to deploy microserver archival “bit buckets.” A small ARM® SoC with 4 HDDs totaling 12 TBytes of storage, giving densities like 72 TBytes of file storage in 2U of rack. While I can only find about 5,000 of those in China that are the first generation experiments, it’s the first of a growing wave of archival solutions based on lower performance ARM servers. The feedback is clear – they’re not perfect yet, but the writing is on the wall. (If you’re wondering about the math, that’s 5,000 x 12 TBytes = 60 Petabytes….)
Yes, it’s important, but maybe more than we’re used to. It’s harder to get licenses for power in China. So it’s really important to stay within the envelope of power your datacenter has. You simply can’t get more. That means they have to deploy solutions that do more in the same power profile, especially as they move out of co-located datacenters into private ones. Annually, 50% more users supported, more storage capacity, more performance, more services, all in the same power. That’s not so easy. I would expect solar power in their future, just as Apple has done.
Here’s where it gets interesting. They are developing a cousin to OpenCompute that’s called Scorpio. It’s Tencent, Alibaba, Baidu, and China Telecom so far driving the standard. The goals are similar to OpenCompute, but more aligned to standardized sub-systems that can be co-mingled from multiple vendors. There is some harmonization and coordination between OpenCompute and Scorpio, and in fact the Scorpio companies are members of OpenCompute. But where OpenCompute is trying to change the complete architecture of scale-out clusters, Scorpio is much more pragmatic – some would say less ambitious. They’ve finished version 1 and rolled out about 200 racks as a “test case” to learn from. Baidu was the guinea pig. That’s around 6,000 servers. They weren’t expecting more from version 1. They’re trying to learn. They’ve made mistakes, learned a lot, and are working on version 2.
Even if it’s not exciting, it will have an impact because of the sheer size of deployments these guys are getting ready to roll out in the next few years. They see the progression as 1) they were using standard equipment, 2) they’re experimenting and learning from trial runs of Scorpio versions 1 and 2, and then they’ll work on 3) new architectures that are efficient and powerful, and different.
Information is pretty sketchy if you are not one of the member companies or one of their direct vendors. We were just invited to join Scorpio by one of the founders, and would be the first group outside of China to do so. If that all works out, I’ll have a much better idea of the details, and hopefully can influence the standards to be better for these hyperscale datacenter applications. Between OpenCompute and Scorpio we’ll be seeing a major shift in the industry – a shift that will undoubtedly be disturbing to a lot of current players. It makes me nervous, even though I’m excited about it. One thing is sure – just as the server market volume is migrating from traditional enterprise to hyperscale datacenter (25-30% of the server market and growing quickly), we’re starting to see a migration to Chinese hyperscale datacenters from US-based ones. They have to grow just to stay still. I mean – come on – there are 1.3 billion potential users in China….
Tags: Alibaba, Amazon, Apple, ARM, Baidu, China, China Telecom, datacenter, Facebook, Google, Hadoop, hard disk drive, HDD, hyperscale, Microsoft, OpenCompute, Scorpio, Shenzhen, Sina Weibo, solid state drive, SSD, Tencent, Twitter