Open Compute and OpenStack are changing the datacenter world that we know and love. I thought they were having an impact: changing OEM and ODM products, changing what we expect from our vendors, changing the interoperability of managing infrastructure from different vendors, changing our ability to deploy and manage grid and scale-out infrastructure, and changing how quickly and at how high a level we can innovate. I was wrong. It's happening much more quickly than I thought.
On November 20-21 we hosted LSI AIS 2013. As I mentioned in a previous post, I was lucky enough to moderate a panel about Open Compute and OpenStack – "the perfect storm." Truthfully? It felt more like sitting with two friends talking about our industry over beer. I hope to pick up that conversation again someday.
The panelists were awesome: Cole Crawford of Open Compute and Chris Kemp of OpenStack. These guys are not only influential; they have been involved from the very start of these two initiatives, and are in many ways key drivers of both movements. These are impressive, passionate guys who really are changing the world. There aren't too many of us who can claim that. It was an engaging hour that I learned quite a bit from, and I think the audience did too. I wanted to share from my notes what I took away from that panel. I think you'll be interested.
Goals and Vision: two "open source" initiatives
There were a few motivations behind Open Compute, and the initiative set out to improve on each of them.
The goal, then, is for the first time to work backwards from the workload and create open-source hardware and infrastructure that is openly available and designed from the start for large scale-out deployments. The idea is to drive high efficiency in cost, materials use and energy consumption. More work/$.
One surprising thing that came up – LSI is in every current contribution to Open Compute.
OpenStack layers services that describe abstractions of compute, networking and storage. LSI products tend to sit at that lowest level of abstraction, where there is now a wave of innovation. OpenStack had similar fragmentation issues to deal with, and its goals run along similar lines, including a certain amount of compatibility with Amazon's cloud services. Chris's point was that Amazon is incredibly innovative and a lot of enterprises should use it, but OpenStack enables both service providers and private clouds to compete with Amazon, and it allows unique innovation to evolve on top of it.
OpenStack and Open Compute are not products. They are "standards" or platform architectures, with companies using those standards to innovate on top of them. The idea is for one company to innovate on another's improvements – everybody building on each other's work. A huge brain trust. The goal is to create a competitive ecosystem, enable a rapid pace of innovation, and deliver large-scale, inexpensive infrastructure that can be managed by a small team of people, and managed like a single server, to solve massive-scale problems.
Here's their thought. Hardware is a supply chain management game + services. Open Compute is an opportunity for anyone to supply that infrastructure. And today, OEMs are killer at that. But maybe ODMs can be too. Open Compute allows innovation on top of the basic interoperable platforms. OpenStack enables a framework for innovation on top as well: security, reliability, storage, network, performance. It becomes the enabler for innovation, and it provides an "easy" way for startups to plug into a large, vibrant ecosystem. And for customers – someone said it's "exa data without exadollars"…
As a result, the argument is that this should be good for OEMs and ISVs, help create a more innovative ecosystem, and enable more infrastructure capacity to create new and better services. I'm not convinced that will happen yet, but it's a laudable goal, and frankly that promise is part of what is appealing to LSI.
Open Compute and OpenStack are "peanut butter and jelly"
Ok – if you're outside of the US, that may not mean much to you. But if you've lived in the US, you know that means they fit perfectly together and make something much greater than their humble selves.
Graham Weston, Chairman of the Rackspace Board, was the one who called these two "peanut butter and jelly."
Cole and Chris both felt the initiatives are co-enabling, and probably co-travelers too. Sure, they can and will deploy independently, but OpenStack enables the management of large-scale clusters, which really is not easy. Open Compute enables lower-cost, large-scale, manageable clusters to be deployed. Together? Large-scale clusters that can be installed and deployed more affordably and easily, without hiring a cadre of rare experts.
Personally? I still think they are both a bit short of being ready for "prime time" – or broad deployment – but Cole and Chris gave me really valid arguments to show me I'm wrong. I guess we'll see.
US or global vision?
I asked if these are US-centric or global visions. There were no qualms – these are global visions. This is just the 3rd anniversary of OpenStack, but even so, there are OpenStack organizations in more than 100 countries, 750 active contributors, and large-scale deployments in datacenters that you probably use every day – especially in China and the US. Companies like PayPal, Yahoo, Rackspace, Baidu, Sina Weibo, Alibaba and JD, and government agencies and HPC clusters like CERN, NASA, and China Defense.
Open Compute is even younger – about 2 years old. (I remember – I was invited to the launch.) Even so, most of Facebook's infrastructure runs on Open Compute. Two Wall Street banks have deployed large clusters, with more coming, and Riot Games, which uses Open Compute infrastructure, drives 3% of the global network traffic with League of Legends. (A complete aside – one of my favorite bands to work out with did a lot of that game's music, and the live music at the League of Legends competition a few months ago: http://www.youtube.com/watch?v=mWU4QvC09uM – not for everyone, but I like it.)
Both Cole and Chris emailed me more data after the fact on who is using these initiatives. I have to say – they are right. It really has taken off globally, especially OpenStack in the fast-paced Chinese market this year.
Book: 4th Paradigm – A tribute to computer science researcher Jim Gray
Cole and Chris mentioned a book during the panel discussion. A book I had frankly never heard of. It's called the 4th Paradigm. It was a series of papers dedicated to researcher Jim Gray, who was a quiet but towering figure whom I believe I met once at Microsoft Research. The book was put together by Gordon Bell, someone who I have met, and have profound respect for. And there are mentions of people, places, and things that have been woven through my (long) career. I think I would sum up its thesis in a quote from Jim Gray near the start of the book:
"We have to do better producing tools to support the whole research cycle – from data capture and data curation to data analysis and data visualization."
This is stunningly similar to the very useful big data framework we have been using recently at LSI: "capture, hold, analyze"… I guess we should have added visualize, but that doesn't have too much to do with LSI's business.
As an aside, I would recommend this book for the background and inspiration in why we as an industry are trying to solve many of these computer science problems, and how transformational the impact might be. I mean really transformational in the world around us, what we know, what we can do, and how quickly we can do it – which is tightly related to our CEO's keynote and the vision video at AIS.
Demos at AIS: "peanut butter and jelly" – and bread?
Ok – I'm struggling for an analogy. We had an awesome demo at AIS that Chris and Cole pointed out during the panel. It was originally built using Nebula's TOR appliance, Open Compute hardware, and LSI's storage magic to make it complete. The three pieces coming together. Tasty. The Open Compute hardware was swapped out at the last minute (for safety – those boxes were meant for the datacenter, not a showcase in a hotel with tipsy techies) for systems generously supplied by Supermicro.
I don't think the proto was close to any one of our visions, but even as it stood, it inspired a lot of people, and would make a great product. A short rack of servers, with pooled storage in the rack, OpenStack orchestrating the point-and-click spawning and tear-down of dynamically sized LUNs of different characteristics under the Cinder presentation layer, and deployment of tasks or VMs on them.
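To make that concrete, here is roughly what those point-and-click actions map to underneath. This is only a minimal sketch using today's openstacksdk Python bindings, not the actual demo code; the cloud name, volume type and server name are hypothetical placeholders.

```python
# Sketch of the Cinder operations behind the demo's point-and-click UI:
# spawn a dynamically sized LUN of a chosen type, present it to a node,
# then tear it down. Assumes a clouds.yaml entry "demo-cloud" and a
# volume type "fast-pooled" backed by the rack's pooled storage.
import openstack

conn = openstack.connect(cloud="demo-cloud")           # hypothetical cloud name

# "Spawn" a 500 GB LUN with the characteristics of the chosen volume type.
vol = conn.create_volume(size=500, name="demo-lun",
                         volume_type="fast-pooled",     # hypothetical type
                         wait=True)

# Present it to a compute node / VM.
server = conn.get_server("demo-node-01")                # hypothetical server
conn.attach_volume(server, vol, wait=True)

# ...run the workload...

# Tear it back down and return the capacity to the pool.
conn.detach_volume(server, vol, wait=True)
conn.delete_volume(vol.id, wait=True)
```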
We're working on completing our joint vision. I think the industry will be very impressed when they see it. Chris thinks people will be stunned, and the industry will be changed.
Catalyzing the market… The future may be closer than we think…
Ultimately, this is all about economics. We're in the middle of an unprecedented bifurcation in IT use. On one hand we're running existing apps on new, dense enterprise hardware, using VMs to layer many applications on few servers. On the other, we're investing in applications to run at scale across inexpensive clusters of commodity hardware. This has spawned a split in IT vendor business units, product lines and offerings, and sometimes even in IT infrastructure management in the datacenter.
New applications and services need more infrastructure, and are getting more expensive to power, cool, purchase and run. And there is pressure to transform the datacenter from a cost center into a profit center. As these innovations start, more companies will need scale infrastructure, arguably Open Compute, and then will need an OpenStack framework to deploy it quickly.
What's this mean? With a combination of big data and mobile device services driving economic value, we may be at the point where these clusters start to become mainstream. As an industry we're already seeing a slight decline in traditional IT equipment sales and rapid growth in scale-out infrastructure sales. If that continues, then OpenStack and Open Compute are a natural fit. The deployment rate uptick in life sciences, oil and gas, and financials this year – really anywhere there is large-scale Hadoop, big data or analytics – may be the start of that growth curve. But both Chris and Cole felt it would probably take 5 years to truly take off.
Time to Wrap Up
I asked Chris and Cole for audience takeaways. Theirs were pretty simple, though possibly controversial in an industry like ours.
Hardware vendors should think about their products: how they interface, what abstractions they present, and how they fit into the ecosystem. These new ecosystems should allow them to plug in easily. For example, storage under Cinder can be quickly and easily morphed – that's what we did with our demo.
We should be designing new software to run on distributed scale-out systems in clouds. Chris went on to say their code name was "Maestro" because it orchestrates, as in a symphony, bringing things together in a beautiful way. He said "make instruments for the artists out there." The brain trust. Look for their brushstrokes.
Innovate in the open, and leverage the open initiatives that are available to accelerate innovation and efficiency.
On your next IT purchase, try an RFP with an Open Compute vendor. Cole said you might be surprised. Worst case, you may get a better deal from your existing vendor.
So, Open Compute and OpenStack are changing the datacenter world that we know and love. I thought these were having a quick impact, changing OEM and ODM products, changing what we expect from our vendors, changing the interoperability of managing infrastructure from different vendors, changing our ability to deploy and manage grid and scale-out infrastructure, and changing how quickly and at how high a level we can innovate. I was wrong. It's happening much more quickly than even I thought.
Tags: AIS, Alibaba, Amazon, Baidu, big data, CERN, China, China Defense, Chris Kemp, Cole Crawford, datacenter, Facebook, Hadoop, HPC, IT infrastructure, JD, Jim Gray, NASA, Nebula, Networking, Open Compute, OpenStack, PayPal, Rackspace, Riot Games, scale-out cluster, Sina Weibo, Storage, Supermicro, Yahoo
You may have noticed I'm interested in Open Compute. What you may not know is I'm also really interested in OpenStack. You're either wondering what the heck I'm talking about or nodding your head. I think these two movements are co-dependent. Sure they can and will exist independently, but I think the success of each is tied to the other. In other words, I think they are two sides of the same coin.
Why is this on my mind? Well – I'm the lucky guy who gets to moderate a panel at LSI's AIS conference with the COO of Open Compute and the founder of OpenStack. More on that later. First, I guess I should describe my view of the two. The people running these open-source efforts probably have a different view. We'll find that out during the panel.
I view Open Compute as the very first viable open-source hardware initiative that general business will be able to use. It's not just about saving money for rack-scale deployments. It's about having interoperable, multi-source systems that have known, customer-malleable – even completely customized and unique – characteristics, including management. It also promises to reduce OpEx.
Ready for Prime Time?
But the truth is Open Compute is not ready for prime time yet. Facebook developed almost all the designs for its own use and gifted them to Open Compute, and they are mostly one or two generations old. And somewhere between 2,000 and 10,000 Open Compute servers have shipped. That's all. But it's a start.
More importantly though, it's still just hardware. There is still a need to deploy and manage the hardware, as well as distribute tasks and load-balance a cluster of Open Compute infrastructure. That's a very specialized capability, and there really aren't that many people who can do that. And the hardware is so bare-bones – with specialized enclosures, cooling, etc. – that it's pretty hard to deploy small amounts. You really want to deploy at scale – thousands. If you're deploying a few servers, Open Compute probably isn't for you for quite some time.
I view OpenStack in a similar way. It's also not ready for prime time. OpenStack is an orchestration layer for the datacenter. You hear about the "software-defined datacenter." Well, this is it – at least one version. It pools the resources (compute, object and block storage, network, and memory at some time in the future), presents them, allows them to be managed in a semi-automatic way, and automates deployment of tasks on the scaled infrastructure. Sure, there are some large-scale deployments. But it's still pretty tough to deploy at large scale. That's because it needs to be tuned and tailored to specific hardware. In fact, the biggest datacenters in the world mostly use their own orchestration layer. So that means today OpenStack is really better at smaller deployments, like 50, 100 or 200 server nodes.
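For a sense of what "pools the resources, presents them, and automates deployment" looks like from a user's seat, here is a minimal, hypothetical sketch using the openstacksdk Python bindings. The cloud, image, flavor and network names are placeholders I made up, not anything from a real deployment.

```python
# Sketch: ask the orchestration layer for a compute instance plus block
# storage, without caring which physical box or disk shelf supplies them.
import openstack

conn = openstack.connect(cloud="private-cloud")         # hypothetical clouds.yaml entry

# Pull a VM out of the pooled compute capacity.
server = conn.create_server(
    name="analytics-worker-01",
    image="ubuntu-server",                              # hypothetical image name
    flavor="m1.large",                                  # hypothetical flavor
    network="tenant-net",                               # hypothetical network
    wait=True,
)

# Pull 200 GB out of the pooled block storage (Cinder) and attach it.
vol = conn.create_volume(size=200, name="analytics-scratch", wait=True)
conn.attach_volume(server, vol, wait=True)

print(f"{server.name} is up with a {vol.size} GB volume attached")
```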
The synergy – 2 sides of the same coin
You'll probably start to see the synergy. Open Compute needs management and deployment. OpenStack prefers known, homogeneous hardware or else it's not so easy to deploy. So there is a natural synergy between the two. It's interesting too that some individuals are working on both… Ultimately, the two Open initiatives will meet in the big, but not-too-big (many hundreds to small thousands of servers) deployments in the next few years.
And then of course there is the complexity of the interaction between for-profit companies and open-source designs and distributions. Companies are trying to add to the open standards – sometimes to the betterment of the standards, but sometimes in irrelevant ways. Several OEMs are jumping in to mature and support OpenStack. And many ODMs are working to make Open Compute more mature. And some companies are trying to accelerate the maturity and adoption of the technologies in pre-configured solutions or appliances. What's even more interesting are the large customers – guys like Wall Street banks – that are working to make them both useful for deployment at scale. These won't be the only way scaled systems are deployed, but they're going to become very common platforms for scale-out or grid infrastructure for utility computing.
Here is how I charted the ecosystem last spring. There's not a lot of direct interaction between the two, and I know there are a lot of players missing. Frankly, it's getting crazy complex. There has been an explosion of players, and I've run out of space, so I've just not gotten around to updating it. (If anyone engaged in these ecosystems wants to update it and send me a copy – I'd be much obliged! Maybe you guys at Nebula?)
An AIS keynote panel – What?
Which brings me back to that keynote panel at AIS. Every year LSI has a conference that's by invitation only (sorry). It's become a pretty big deal. We have some very high-profile keynotes from industry leaders. There is a fantastic tech showcase of LSI products, partner and ecosystem companies' products, and a good mix of proof of concepts, prototypes and what-if products. And there are a lot of breakout sessions on industry topics, trends and solutions. Last year I personally escorted an IBM fellow, Google VPs, Facebook architects, bank VPs, Amazon execs, flash company execs, several CTOs, some industry analysts, database and transactional company execs…
It's a great place to meet and interact with peers if you're involved in the datacenter, network or cellular infrastructure businesses. One of the keynotes is actually a panel of two: the COO of Open Compute, Cole Crawford, and the co-founder of OpenStack, Chris Kemp (who is also the founder and CSO of Nebula). Both of them are very smart, experienced and articulate, and deeply involved in these movements. It should be a really engaging, interesting keynote panel, and I'm lucky enough to have a front-row seat. I'll be the moderator, and I'm already working on questions. If there is something specific you would like asked, let me know, and I'll try to accommodate you.
You can see more here.
Yea – I'm very interested in Open Compute and OpenStack. I think these two movements are co-dependent. And I think they are already changing our industry – even before they are ready for real large-scale deployment. Sure they can and will exist independently, but I think the success of each is tied to the other. The people running these open-source efforts might have a different view. Luckily, we'll get to find out what they think next month… And I'm lucky enough to have a front-row seat.
I was lucky enough to get together for dinner and beer with old friends a few weeks ago. Between the 4 of us, we've been involved in or responsible for a lot of stuff you use every day, or at least know about.
Supercomputers, minicomputers, PCs, Macs, Newton, smart phones, game consoles, automotive engine controllers and safety systems, secure passport chips, DRAM interfaces, netbooks, and a bunch of processor architectures: Alpha, PowerPC, SPARC, MIPS, StrongARM/XScale, x86 64-bit, and a bunch of other ones you haven't heard of (um – most of those are mine, like TriCore). Basically, if you drive a European car, travel internationally, use the Internet, play video games, or use a smart phone, well… you're welcome.
Why do I tell you this? Well – first, I'm name dropping – I'm always stunned I can call these guys friends and be their peers. But more importantly, we've all been in this industry as architects for about 30 years. Of course our talk went to what's going on today. And we all agree that we've never seen more changes – inflections – than the raft unfolding right now. Maybe it's pressure from the recession, or maybe an unnaturally pent-up need for change in the ecosystem, but change there is.
Changes in who drives innovation, what's needed, which companies are on top and on bottom at every point in the food chain, who competes with whom, how workloads have shifted from compute to dataflow, how software has moved to open source, how abstracted code now is from processor architecture, how individual and enterprise customers have been revolting against the "old" ways, old vendors and old business models, how processors communicate, how systems are purchased, and what fundamental system architectures look like. But not much besides that…
Ok – so if you’re an architect, thatâ€™s as exciting as it gets (you hear it in my voice â€“ right ?), and it makes for a lot of opportunities to innovate and create new or changed businesses. Because innovation is so often at the intersection of changing ways of doing things. We’re at a point where the changes are definitely not done yet. We’re just at the start. (OK â€“ now try to imagine a really animated 4-way conversation over beers at the Britannia Arms in Cupertinoâ€¦ Yea â€“ exciting.)
I'm going to focus on just one sliver of the market – but it's important to me – and that's enterprise IT. I think the changes are as much about business models as technology.
I'll start in a strange place. Hyperscale datacenters (think social media, search, etc.) and the scale of their deployments change the optimization point. Most of us are starting to get comfortable with the rack as the new purchase quantum. And some of us are comfortable with the pod or container as the new purchase quantum. But the hyperscale datacenters work more with the datacenter as the quantum. By looking at it that way, they can trade off the cost of power, real estate, bent sheet metal, network bandwidth, disk drives, flash, processor type and quantity, memory amount, where work gets done, and what applications are optimized for. In other words, we shifted from looking at local optima to looking for global optima. I don't know about you, but when I took operations research in university, I learned there was an unbelievable difference between the two – and global optima was the one you wanted…
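A toy way to see the difference: optimize each component's price in isolation and you get one answer; optimize total work per dollar across the whole facility and you often get another. The sketch below uses made-up numbers purely for illustration – it just ranks configurations on work/$ rather than on parts cost.

```python
# Toy global-optimum search: pick the rack configuration that maximizes
# work per dollar for the whole facility, not the cheapest parts list.
# All numbers are invented for illustration only.

configs = {
    # name: (capex_per_rack_usd, power_kw_per_rack, jobs_per_hour_per_rack)
    "cheap_parts": (80_000, 12.0, 900),
    "flash_heavy": (140_000, 10.0, 2_200),
    "balanced":    (110_000, 11.0, 1_600),
}

YEARS = 3
HOURS = YEARS * 365 * 24
USD_PER_KWH = 0.10          # invented electricity price
FACILITY_OVERHEAD = 1.5     # invented PUE-like multiplier

def work_per_dollar(capex, power_kw, jobs_per_hour):
    opex = power_kw * FACILITY_OVERHEAD * HOURS * USD_PER_KWH
    total_jobs = jobs_per_hour * HOURS
    return total_jobs / (capex + opex)

for name, cfg in configs.items():
    print(f"{name:12s} {work_per_dollar(*cfg):8.2f} jobs/$")

best = max(configs, key=lambda n: work_per_dollar(*configs[n]))
print("global optimum on work/$:", best)
```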
Hyperscale datacenters buy enough (the top 6 probably account for more than 10% of the market today) that 1) they need to determine very carefully, on their own, what they deploy, and 2) vendors work hard to give them what they need.
That means innovation used to be driven by OEMs, but now it's driven by hyperscale datacenters, and it's driven hard. That global optimum? It's work/$ spent. That's global work, and global spend. It's OK to spend more, even way more, on one thing if overall you get more done for the dollars you spend.
That's why the 3 biggest consumers of flash in servers are Facebook, Google, and Apple, with some of the others not far behind. You want stuff, they want to provide it, and flash makes it happen efficiently. So efficiently they can often give that service away for free.
Hyperscale datacenters have started to publish their cost metrics, open up their architectures (like Open Compute), and open up their software (like Hadoop and derivatives). More to the point, services like Amazon have put a very clear dollar value on services. And it's shockingly low.
Enterprises have looked at those numbers. Hard. That's catalyzed a customer revolt against the old way of doing things – the old way of buying and billing. OEMs and ISVs are creating lots of value for enterprise, but not that much. They've been innovating around "stickiness" and "lock-in" (yea – those really are industry terms) for too long, while hyperscale datacenters have been focused on getting stuff done efficiently. The money they save per unit just means they can deploy more units and provide better services.
That revolt is manifesting itself in 2 ways. The first is seen in the quarterly reports of OEMs and ISVs: rumors of IBM selling its x86 server business to Lenovo, Dell going private, Oracle trying to shift business, HP talking of the "new style of IT"… The second is that enterprises are looking to emulate hyperscale datacenters as much as possible, and deploy private cloud infrastructure. And as often as not, those will be running some of the same open-source applications and file systems as the big hyperscale datacenters use.
Where are the hyperscale datacenters leading them? It's a big list of changes, and they're all over the place.
But they're also looking at a few different things. For example, global-namespace NAS file systems. Personally? I think this one's a mistake. I like the idea of file systems/object stores, but the network interconnect seems like a bottleneck. Storage traffic is shared with network traffic, which creates network spine bottlenecks and consistency performance bottlenecks between the NAS heads, and – let's face it – people usually skimp on the number of 10GE ports on the server and in the top-of-rack switch. A typical SAS storage card now has 8 x 12G ports – that's 96G of bandwidth. Will servers have 10 x 10G ports? Yea. I didn't think so either.
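The back-of-the-envelope arithmetic behind that point, written out. The port counts are the ones from the paragraph above; everything else is just multiplication, with protocol overhead ignored.

```python
# Compare local SAS storage bandwidth with plausible network uplinks.
# 12 Gb/s SAS lanes and 10 Gb/s Ethernet line rates; overhead ignored.

sas_ports, sas_gbps = 8, 12          # typical SAS storage card from the text
sas_total = sas_ports * sas_gbps     # 96 Gb/s of local storage bandwidth

for eth_ports in (2, 4, 10):         # how many 10GE ports a server might get
    eth_total = eth_ports * 10
    verdict = "enough" if eth_total >= sas_total else "bottleneck"
    print(f"{eth_ports:2d} x 10GE = {eth_total:3d} Gb/s "
          f"vs {sas_total} Gb/s local SAS ({verdict})")
```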
Anyway – all this is not academic. One Wall Street bank shared with me that – hold your breath – it could save 70% of its spend going this route. It was shocked. I wasn't shocked, because at first blush this seems absurd – not possible. That's how I reacted. I laughed. But… the systems are simpler and less costly to make. There is simply less there to make or ship than OEMs force into the machines for uniqueness and "value." They are purchased from much lower-margin manufacturers. They have massively reduced maintenance costs (there's less to service, and, well, no OEM service contracts). And also important – some of the incredibly expensive software licenses are flipped to open-source equivalents. Net savings of 70%. Easy. Stop laughing.
Disaggregation: Or in other words, Pooled Resources
But probably the most important trend from all of this is what server manufacturers are calling "disaggregation" (hey – you're ripping apart my server!) but architects are more descriptively calling pooled resources.
First – the intent of disaggregation is not to rip the parts of a server to pieces to get the lowest pricing on the components. No. If you're buying by the rack anyway – why not package so you can put like with like? Each part has its own life cycle, after all. CPUs are 18 months. DRAM is several years. Flash might be 3 years. Disks can be 5 to 7 years. Networks are 5 to 10 years. Power supplies are… forever? Why not replace each on its own natural failure/upgrade cycle? Why not make enclosures appropriate to the technology they hold? Disk drives need solid, vibration-free mechanical enclosures of heavy metal. Processors need strong cooling. Flash wants to run hot. DRAM cool.
Second – pooling allows really efficient use of resources. Systems need slush resources. What happens to a system that uses 100% of physical memory? It slows down a lot. If a database runs out of storage? It blue-screens. If you don't have enough network bandwidth? The result is that every server is over-provisioned for its task: extra DRAM, extra network bandwidth, extra flash, extra disk drive spindles… If you have 1,000 nodes you can easily strand TBytes of DRAM, TBytes of flash, and a TByte/s of network bandwidth of wasted capacity, all of it always burning power. Worse, if you plan wrong and deploy servers with too little disk or flash or DRAM, there's not much you can do about it. Now think 10,000 or 100,000 nodes… Ouch.
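Here is the stranded-capacity arithmetic as a quick sketch. The per-node slack figures are invented purely to illustrate the shape of the problem, not measurements from any real fleet.

```python
# How much capacity gets stranded when every node is over-provisioned
# "just in case." Per-node slack figures below are illustrative guesses.

nodes = 1_000
slack_per_node = {
    "DRAM (GB)":       16,      # unused headroom per node
    "flash (GB)":      200,
    "disk (GB)":       2_000,
    "network (Gb/s)":  5,
}

for resource, slack in slack_per_node.items():
    total = slack * nodes
    unit = resource[resource.find("(") + 1 : -1]
    print(f"{resource:16s}: {total:>10,} {unit} stranded across {nodes:,} nodes")

# 16 GB x 1,000 nodes = 16 TB of idle DRAM; 2 TB x 1,000 = 2 PB of idle disk.
# Multiply by 10x or 100x the nodes and the waste (and its power draw) scales too.
```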
If you pool those things across 30 to 100 servers, you can allocate as needed to individual servers. Just as importantly, you can configure systems logically, not physically. That means you don't have to be perfect in planning ahead what configurations and how many of each you'll need. You have sub-assemblies you slap into a rack, and hook up by configuration scripts, and get efficient resource allocation that can change over time. You need a lot of storage? A little? Higher-performance flash? Extra network bandwidth? Just configure them.
Thatâ€™s a big deal.
And of course, this sets the stage for immense pooled main memory – once the next-generation non-volatile memories are ready – probably starting around 2015.
You shouldn't underestimate the operational problems associated with different platforms at scale. Many hyperscale datacenters today have around 6 platforms. If you consider that they roll out new versions of those before old ones are retired, they often have 3 generations of each. That's 18 distinct platforms, with multiple software revisions of each. That starts to get crazy when you may have 200,000 to 400,000 servers to manage and maintain in a lights-out environment. Pooling resources and allocating them in the field goes a huge way toward simplifying operations.
Alternate Processor Architecture
It wasn't always Intel x86. There was a time when Intel was an upstart in the server business. It was Power, MIPS, Alpha, SPARC… (and before that, IBM mainframes and minis, etc.). Each of the changes was brought on by a changing cost structure. Mainframes got displaced by multi-processor RISC, which gave way to x86.
Today, we have Oracle saying they're getting out of x86 commodity servers and doubling down on SPARC. IBM is selling off its x86 business and doubling down on Power (hey – don't confuse that with PowerPC, which started as an architectural cut-down of Power – I was there…). And of course there is a rash of 64-bit ARM server SoCs coming, with HP and Dell already dabbling in it. What's important to realize is that all of these offerings are focusing on the platform architecture, and how applications really perform in total, not just the processor.
Let me wrap up with an email thread cut/paste from a smart friend – Wayne Nation. I think he summed up some of what's going on well, in a sobering way most people don't even consider.
"Does this remind you of a time, long ago, when the market was exploding with companies that started to make servers out of those cheap little desktop x86 CPUs? What is different this time? Cost reduction and disaggregation? No, cost and disagg are important still, but not new.
A new CPU architecture? No, x86 was "new" before. ARM promises to reduce cost, as did Intel.
Disaggregation enables hyperscale datacenters to leverage vanity-free, but consistent delivery will determine the winning supplier. There is the potential for another Intel to rise from these other companies."
I've been travelling to China quite a bit over the last year or so. I'm sitting in Shenzhen right now (if you know Chinese internet companies, you'll know who I'm visiting). The growth is staggering. I've had a bit of a trains, planes and automobiles experience this trip, and that's exposed me to parts of China I never would have seen otherwise. Just to accommodate sheer population growth and the modest increase in wealth, there is construction everywhere – a press of people and energy, constant traffic jams, unending urban centers, and most everything is new. Very new. It must be exciting to be part of that explosive growth. What a market. I mean – come on – there are 1.3 billion potential users in China.
The amazing thing for me is the rapid growth of hyperscale datacenters in China, which is truly exponential. Their infrastructure growth has been 200%-300% CAGR for the past few years. It's also fantastic walking into a building in China, say Baidu, and feeling very much at home – just like you walked into Facebook or Google. It's the same young vibe, energy, and ambition to change how the world does things. And it's also the same pleasure – talking to architects who are super-sharp, have few technical prejudices, and have very little vanity – just a will to get to business and solve problems. Polite, but blunt. We're lucky that they recognize LSI as a leader, and are willing to spend time to listen to our ideas, and to give us theirs.
Even their infrastructure has a similar feel to the US hyperscale datacenters. The same, only different. ;-)
A lot of these guys are growing revenue at 50% per year, with several getting 50% gross margin. Those are nice numbers in any country. One has $100s of billions in revenue. And they're starting to push out of China. So far their pushes into Japan have not gone well, but other countries should be better. They all have unique business models. "We" in the US like to say things like "Alibaba is the Chinese eBay" or "Sina Weibo is the Chinese Twitter"… But that's not true – they all have more hybrid, unique business models, and so their datacenter goals, revenue and growth have a slightly different profile. And there are some very cool services that simply are not available elsewhere. (You listening, Apple®, Google®, Twitter®, Facebook®?) But they are all expanding their services, products and user base. Interestingly, there is very little public cloud in China, so there are no real equivalents to Amazon's services or Microsoft's Azure. I have heard about current development of that kind of model with the government as the initial customer. We'll see how that goes.
Hundreds of thousands of servers. They're not the scale of Google, but they sure are the scale of Facebook, Amazon, Microsoft… It's a serious market for an outfit like LSI. Really, it's a very similar scale now to the US market: close to 1 million servers installed among the main 4 players, and exabytes of data (we've blown past mere petabytes). Interestingly, they still use many co-location facilities, but that will change. More important – they're all planning to probably double their infrastructure in the next 1-2 years. They have to – their growth rates are crazy.
Often 5 or 6 distinct platforms, just like the US hyperscale datacenters: database platforms, storage platforms, analytics platforms, archival platforms, web server platforms… But they tend to be a little more like the rack of traditional servers that enterprise buys, with integrated disk bays, still a lot of 1G Ethernet, and they are still mostly from established OEMs. In fact I just ran into one OEM's American GM, whom I happen to know, in Tencent's offices today. The typical servers have 12 HDDs in drive bays, though they are starting to look at SSDs as part of the storage platform. They do use PCIe® flash cards in some platforms, but the performance requirements are not as extreme as you might imagine. Reasonably low latency and consistent latency are the premium they are looking for from these flash cards – not maximum IOPs or bandwidth – very similar to their American counterparts. I think hyperscale datacenters are sophisticated in understanding what they need from flash, and not requiring more than that. Enterprise could learn a thing or two.
Some server platforms have RAIDed HDDs, but most are direct-mapped drives using a high-availability (HA) layer across the cluster – Hadoop® HDFS or self-developed Hadoop-like platforms. Some have also started to deploy microserver archival "bit buckets": a small ARM® SoC with 4 HDDs totaling 12 TBytes of storage, giving densities like 72 TBytes of file storage in 2U of rack. While I can only find about 5,000 of those in China, and they are first-generation experiments, it's the first of a growing wave of archival solutions based on lower-performance ARM servers. The feedback is clear – they're not perfect yet, but the writing is on the wall. (If you're wondering about the math, that's 5,000 x 12 TBytes = 60 petabytes…)
What about power? Yes, it's important, but maybe more than we're used to. It's harder to get licenses for power in China, so it's really important to stay within the power envelope your datacenter has. You simply can't get more. That means they have to deploy solutions that do more in the same power profile, especially as they move out of co-located datacenters into private ones. Annually: 50% more users supported, more storage capacity, more performance, more services, all in the same power. That's not so easy. I would expect solar power in their future, just as Apple has done.
Here's where it gets interesting. They are developing a cousin to Open Compute called Scorpio. It's Tencent, Alibaba, Baidu, and China Telecom driving the standard so far. The goals are similar to Open Compute, but more aligned to standardized sub-systems that can be co-mingled from multiple vendors. There is some harmonization and coordination between Open Compute and Scorpio, and in fact the Scorpio companies are members of Open Compute. But where Open Compute is trying to change the complete architecture of scale-out clusters, Scorpio is much more pragmatic – some would say less ambitious. They've finished version 1 and rolled out about 200 racks as a "test case" to learn from. Baidu was the guinea pig. That's around 6,000 servers. They weren't expecting more from version 1. They're trying to learn. They've made mistakes, learned a lot, and are working on version 2.
Even if it's not exciting, it will have an impact because of the sheer size of the deployments these guys are getting ready to roll out in the next few years. They see the progression as: 1) they were using standard equipment; 2) they're experimenting and learning from trial runs of Scorpio versions 1 and 2; and 3) they'll work on new architectures that are efficient, powerful, and different.
Information is pretty sketchy if you are not one of the member companies or one of their direct vendors. We were just invited to join Scorpio by one of the founders, and would be the first group outside of China to do so. If that all works out, I'll have a much better idea of the details, and hopefully can influence the standards to be better for these hyperscale datacenter applications. Between Open Compute and Scorpio we'll be seeing a major shift in the industry – a shift that will undoubtedly be disturbing to a lot of current players. It makes me nervous, even though I'm excited about it. One thing is sure – just as server market volume is migrating from traditional enterprise to hyperscale datacenters (25-30% of the server market and growing quickly), we're starting to see a migration to Chinese hyperscale datacenters from US-based ones. They have to grow just to stay still. I mean – come on – there are 1.3 billion potential users in China…
Tags: Alibaba, Amazon, Apple, ARM, Baidu, China, China Telecom, datacenter, Facebook, Google, Hadoop, hard disk drive, HDD, hyperscale, Microsoft, OpenCompute, Scorpio, Shenzhen, Sina Weibo, solid state drive, SSD, Tencent, Twitter