You might be surprised to find out how big the infrastructure for cloud and Web 2.0 is. It is mind-blowing. Microsoft has acknowledged packing more than 1 million servers into its datacenters, and by some accounts that is fewer than Google’s massive server count but a bit more than Amazon’s.
Facebook’s server count is said to have skyrocketed from 30,000 in 2012 to 180,000 just this past August, serving 900 million plus users. And the social media giant is even putting its considerable weight behind the Open Compute effort to make servers fit better in a rack and draw less power. The list of mega infrastructures also includes Tencent, Baidu and Alibaba, and the roster goes on and on.
Even more jaw-dropping is that almost 99.9% of these hyperscale infrastructures are built with servers featuring direct-attached storage. That’s right – they do the computing and store the data. In other words, no special, dedicated storage gear. Yes, your Facebook photos, your SkyDrive personal cloud and all the content you use for entertainment, on-demand video and gaming data are stored inside the server.
Direct-attached storage reigns supreme
Everything in these infrastructures – compute and storage – is built out of x86-based servers with storage inside. What’s more, direct-attached storage is growing many times faster than any other type of storage deployment in IT. Rising deployments of cloud, or cloud-like, architectures are behind much of this expansion.
The prevalence of direct-attached storage is not unique to hyperscale deployments. Large IT organizations are looking to reap the rewards of creating similar on-premises infrastructures. The benefits are impressive: build one kind of infrastructure (server racks), host anything you want (any of your properties), and scale very easily if you need to. TCO is much lower than for infrastructures relying on network storage or SANs.
With direct-attached storage, you no longer need dedicated appliances for your database tier, your email tier, your analytics tier or your EDA tier. All of that can be hosted on scalable, shared-nothing infrastructure. And just as with hyperscale, the storage is all in the server. No SAN storage required.
Open Compute, OpenStack and software-defined storage drive DAS growth
Open Compute is part of the picture. A recent Open Compute show I attended was mostly sponsored by hyperscale customers and suppliers, and many big-bank IT folks attended. But Open Compute isn’t the only initiative driving deployments of direct-attached storage; so are software-defined storage and OpenStack. Big application vendors such as Oracle, Microsoft, VMware and SAP are also on board, providing solutions that support server-based storage/compute platforms that are easy and cost-effective to deploy, maintain and scale, and that need no external storage (or SAN, including all-flash arrays).
So if you are a network-storage or SAN manufacturer, you have to be doing some serious thinking (many have already) about how you’re going to catch and ride this huge wave of growth.
Tags: Alibaba, Amazon, Baidu, cloud computing, DAS, direct attached storage, enterprise, enterprise IT, Google, hyperscale, Microsoft, Open Compute, OpenStack, Oracle, SAP, Tencent, VMware
I am sitting in the terminal waiting for my flight home from – yes, you guessed it – China. I am definitely racking up frequent flier miles this year.
This trip ended up centering on resource pooling in the datacenter. Sure, you might hear a lot about disaggregation, but the consensus seems to be: that’s the wrong name (unless you happen to make standalone servers). For anyone else, it’s about a much more flexible infrastructure, simplified platforms, better lifecycle management, and higher efficiency. I call it “resource pooling,” which is descriptive, but others simply call it rack scale architecture.
It’s been a long week, but a very interesting one. I was asked to keynote at the SACC conference (Systems Architect Conference China) in Beijing. It was also a great chance to meet one-on-one with the CTOs and chief architects from the big datacenters, and to visit for a few hours with other acquaintances. I even had the chance to have dinner with the editor in chief of CEO/CIO China magazine and with CIOs from around Beijing. As always in life, if you’re willing to listen, you can learn a lot. And I did.
Thinking on disaggregation aligns
With the CTOs, there was a lot of discussion about disaggregation in the datacenter. There is a lot of aligned thinking on the topic, and it’s one of those occasions where you have to laugh, because I think any one of the CTOs keynoting could have given anyone else’s presentation. So what’s the big deal? Resource pooling and rack scale architecture.
I’ll use this trip as an excuse to dig a little deeper into my view on what this means.
First, you need to understand where these large datacenters are in their evolution. They usually have 4 to 6 platforms and 2 or 3 generations of each in the datacenter. That can be 18 different platforms to manage, maintain, and tune. Worse, they have to plan 6 to 9 months in advance to deploy equipment. If you guess wrong, you’ve got a bunch of useless equipment and you’ve spent a bunch of money – the size of mistake that will get you fired… And even if you get it right, you’re left with the problem: Do I upgrade servers when the CPU is new? Or at, say, 18 months? Or do I wait until the biggest cost item, the drives, needs to be replaced in 4 or 5 years? That’s difficult math. So resource pooling is about lifecycle management of different types of components and sub-systems. You can optimally replace each resource on its own schedule.
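To make the lifecycle argument concrete, here is a minimal Python sketch. The 60-month planning horizon and 48-month drive life are assumptions (the text only says drives last 4 or 5 years), and the function names are hypothetical. It counts how often a drive slot is replaced when drives ride along with an 18-month server refresh, versus when they are replaced on their own schedule.

```python
import math

def platform_variants(platforms: int, generations: int) -> int:
    # Each platform type may be live in several hardware generations at once.
    return platforms * generations

def replacements(horizon_months: int, interval_months: int) -> int:
    # Replacements at interval, 2*interval, ... strictly inside the horizon
    # (the initial install at month 0 is not counted).
    return math.ceil(horizon_months / interval_months) - 1

HORIZON = 60      # assumed 5-year planning window
CPU_REFRESH = 18  # refresh compute at ~18 months, as in the text
DRIVE_LIFE = 48   # assumed 4-year drive life (text says 4 or 5 years)

# Coupled: drives sit inside the server, so they are swapped whenever the server is.
coupled_drive_swaps = replacements(HORIZON, CPU_REFRESH)
# Pooled: drives are replaced on their own schedule.
pooled_drive_swaps = replacements(HORIZON, DRIVE_LIFE)

print(platform_variants(6, 3))                  # 18
print(coupled_drive_swaps, pooled_drive_swaps)  # 3 1
```

Under these assumptions, the most expensive component is bought three extra times in five years when it is tied to the CPU refresh, versus once when it lives in its own pool.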
Increasing resource utilization and efficiency
But it’s also about resource utilization and efficiency. Datacenters have multiple platforms because each platform needs a different configuration of resources. I use the term configuration on purpose. If you have storage in your server, it’s in some standard configuration: say, six 3 TByte drives, or 18 raw TBytes. Do you use all that capacity? Or do you leave some space so databases can grow? Of course you leave empty space. You might not even have any use for that much storage in that particular server; maybe you just use half the capacity. After all, it’s a standard configuration. What about disk bandwidth? Can your Hadoop node saturate 6 drives? Probably. It could probably use 12 or maybe even 24. But sorry – it’s a standard configuration. What about latency-sensitive databases? Sure, I can plug a PCIe card in, but I only have 1.6 TByte PCIe cards as my standard configuration. My database is 1.8 TBytes and growing. Sorry – you have to refactor and put it on 2 servers. Or my database is only 1 TByte. I’m wasting 600 GBytes of really expensive resource.
For network resources, the standard configuration gets maybe exactly one 10GE port. You need more? Can’t have it. You don’t need that much? Sorry – wasted bandwidth capacity. What about standard memory? You either waste DRAM you don’t use, or you starve for more DRAM you can’t get.
But if I have pools of rack scale resources that I can allocate to a standard compute platform – well, that’s a different story. I can configure exactly the amount of network bandwidth, memory, high-performance flash storage, and bulk disk storage. I can even add more configured storage if a database grows, instead of being forced to refactor the database into shards across multiple standard configurations.
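The stranded-capacity arithmetic above, as a small Python sketch. The helper names are hypothetical; the numbers (six 3 TByte drives, a 1.6 TByte PCIe card, 1.8 and 1 TByte databases) come from the text, and decimal units (1 TByte = 1,000 GBytes) are assumed.

```python
import math

def stranded(provisioned: float, used: float) -> float:
    # Capacity bought as part of a fixed configuration but left idle.
    return max(provisioned - used, 0.0)

def shards_needed(dataset_tb: float, per_server_tb: float) -> int:
    # How many standard configurations a dataset must be split across.
    return math.ceil(dataset_tb / per_server_tb)

DRIVES, DRIVE_TB = 6, 3.0
raw_tb = DRIVES * DRIVE_TB  # 18 TB raw in the standard server
PCIE_CARD_TB = 1.6          # the only flash card size in the standard config

print(raw_tb)                                     # 18.0
print(stranded(raw_tb, raw_tb / 2))               # 9.0 TB idle if you use half
print(shards_needed(1.8, PCIE_CARD_TB))           # 2 -> refactor onto 2 servers
print(round(stranded(PCIE_CARD_TB, 1.0) * 1000))  # 600 GB of expensive flash idle
```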
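A minimal sketch of what allocating from a rack-level pool might look like, in Python. `RackPool`, its resource names and its capacities are all hypothetical, chosen only to illustrate the idea; this is not LSI’s design or any real rack scale API. Amounts are whole GBytes and port counts so the arithmetic stays exact.

```python
from dataclasses import dataclass, field

@dataclass
class RackPool:
    """Hypothetical rack-level pool of shareable resources."""
    free: dict                                       # resource name -> units still unallocated
    allocations: dict = field(default_factory=dict)  # node -> {resource: units}

    def allocate(self, node: str, **request: int) -> None:
        # Validate the whole request first so a failed call changes nothing.
        for res, amount in request.items():
            if self.free.get(res, 0) < amount:
                raise ValueError(f"pool has only {self.free.get(res, 0)} {res} left")
        node_alloc = self.allocations.setdefault(node, {})
        for res, amount in request.items():
            self.free[res] = self.free.get(res, 0) - amount
            node_alloc[res] = node_alloc.get(res, 0) + amount

pool = RackPool(free={"flash_gb": 40_000, "disk_gb": 400_000, "nic_10ge": 32})
pool.allocate("db1", flash_gb=1_800, nic_10ge=2)  # exactly what the database needs today
pool.allocate("hadoop1", disk_gb=72_000, nic_10ge=1)
pool.allocate("db1", flash_gb=400)                # the database grew: add flash, no resharding
print(pool.allocations["db1"])                    # {'flash_gb': 2200, 'nic_10ge': 2}
print(pool.free["flash_gb"], pool.free["nic_10ge"])  # 37800 29
```

The design choice worth noting is that a request is checked in full before anything is deducted, so a failed allocation leaves the pool unchanged.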
Pooling resources = simplified operations
So the desire to pool resources is really as much about simplified operations as anything else. I can have standardized modules that are all “the same” to manage, but that can be resource-configured into a well-tailored platform that can even change over time.
But pooling is also about accommodating how application architectures have changed, and how much more important dataflow is than compute for so much of the datacenter. As a result, there is a lot of uncertainty about how parts of these rack scale architectures and interconnects will evolve, even as there is a lot of certainty that they will evolve, and that they will include pooled resource “modules.” Whatever the overall case, we’re pretty sure we understand how the storage will evolve. And at a high level, that’s what I presented in my keynote. (Hey – I’m not going to publicly share all our magic!)
One storage architecture of pooled resources at the rack scale level. One storage architecture that combines boot management, flash storage for performance, and disk storage for efficient bandwidth and capacity. And those resources can be allocated however and whenever the datacenter manager needs them. And the existing software model doesn’t need to change. Existing apps, OSs, file systems, and drivers are all supported, meaning a move to pooled resource rack scale deployments is dramatically de-risked. Overall, this one architecture reduces the number of platforms, simplifies the management of platforms, utilizes the resources very efficiently, and simplifies image and boot management. I’m pretty sure it even reduces datacenter-level CapEx. I know it dramatically reduces OpEx.
Yeah – I know what you’re thinking – it’s awesome! (That’s what you thought – right?)
Oh – what about those CIO meetings? Well, there is tremendous pressure not to buy American IT equipment in China because of all the news from the Snowden NSA leaks. As most of the CIOs pointed out, though, in today’s global sourcing market, it’s pretty hard not to buy US IT equipment. So they’re feeling a bit trapped. In a risk-averse profession, I suspect that means they just won’t buy anything for a year or so and hope it blows over.
But in general, yep, I think this trip was centered on resource pooling in the datacenter. Sure, you might hear about disaggregation, but there’s a lot of agreement that’s the wrong name. It’s much more about resource pooling for flexible infrastructure, simplified platforms, better lifecycle management, and higher efficiency. And we aim to be right in the middle. Literally.
I’ve just been to China. Again. It’s only been a few months since I was last there.
I was lucky enough to attend the 5th China Cloud Computing Conference at the China National Convention Center in Beijing. You probably have not heard of it, but it’s an impressive conference. It’s “the one” for the cloud computing industry. It was a unique view for me – more of an inside-out view of the industry. Everyone who’s anyone in China’s cloud industry was there. Our CEO, Abhi Talwalkar, had been invited to keynote the conference, so I tagged along.
First, the air was really hazy, but I don’t think the locals considered it that bad. The US consulate iPhone app said the particulates were in the very unhealthy range. Imagine looking across the street. Sure, you can see the building there, but the next one? Not so much. Look up. Can you see past the 10th floor? No, not really. The building disappears into the smog. That’s what it was like at the China National Convention Center, which is part of the same Olympics complex as the famous Bird’s Nest stadium: http://www.cnccchina.com/en/Venues/Traffic.aspx
I had a fantastic chance to catch up with a university friend who has been living in Beijing since the ’90s and is now a venture capitalist. It’s amazing how almost 30 years can disappear and you pick up where you left off. He sure knows how to live. I was picked up in his private limo and whisked off to a very well-known restaurant across the city, where we had a private room and a private waitress. We even had some exotic, special dishes that needed to be ordered at least a day in advance. Wow. But we broke Chinese tradition and had imported beer in honor of our Canadian education.
Sizing up China’s cloud infrastructure
The most unusual meeting I attended was an invitation-only session – the Sino-American roundtable on cloud computing. There were just about 40 people in a room – half from the US, half from China. Mostly what I learned is that the cloud infrastructure in China is fragmented, and probably sub-scale. And it’s like that for a reason. It was difficult to understand at first, but I think I’ve made sense of it.
I started asking friends and consultants why, and got some interesting answers. Essentially, different regional governments are trying to capture the cloud “industry” in their locality, so they promote activity, and they promote the creation of new tools and infrastructure for that. Why reuse something that’s open source and works if you don’t have to and you can create high-tech jobs? (That’s sarcasm, by the way.) Many technologists I spoke with felt this will hold them back, and that they are probably 3-5 years behind the US. As well, each government-run industry specifies the datacenter and infrastructure needed to be a supplier or ecosystem partner with it, and each is different. The national train system has a different cloud infrastructure from the agriculture department, and from the shipping authority, etc. If you do business with them (that is, if you are part of their ecosystem of vendors), then you use their infrastructure. It all spells fragmentation and sub-scale. In contrast, the Web 2.0 / social media companies seem to be doing just fine.
Baidu was also showing off its open rack. It’s an embodiment of the Scorpio V1 standard, which was jointly developed with Tencent, Alibaba and China Telecom. Baidu views this as a first experiment, and is looking forward to V2, which will be a much more mature system.
I was also lucky to have personal meetings with general managers, chief architects and de facto CTOs of the biggest cloud companies in China. What did I learn? They are all at an inflection point. Many of the key technologists have experience at American Web 2.0 companies, so they’re able to evolve quickly, leveraging their industry knowledge. They’re all working to build or grow their own datacenters, their own infrastructure. And they’re aggressively expanding products, not just users, so they’re getting a compound growth rate.
Here’s a little of what I learned. In general, there is a trend to try to simplify infrastructure, harmonize divergent platforms, and deploy more infrastructure by spending less on each unit. (In general, they don’t make as much per user as American companies, but they have more users.) As a result, they are more cost-focused than US companies. And they are starting to put more emphasis on operational simplicity in general. As one GM described it to me: “Yes, techs are inexpensive in China for maintenance, but more often than not they make mistakes that impact operations.” So we (LSI) will be focusing more on simplifying management and maintenance for them.
Baidu’s biggest Hadoop cluster is 20k nodes. I believe that’s as big as Yahoo’s – and Yahoo is where Hadoop originated. Baidu has a unique use profile for flash; it’s not like the hyperscale datacenters in the US. But Baidu is starting to consume a lot. Like most other hyperscale datacenters, it is working on storage erasure coding across servers, racks and datacenters, and it is trying to make a unified namespace across everything. One of its main interests is architecture at the datacenter level: harmonizing the various platforms and looking for the optimum at the datacenter level. In general, Baidu is very proud of the advances it has made, it has real confidence in its vision and route forward, and from what I heard, its architectural ambitions are big.
JD.com (which used to be 360buy.com) is the largest direct e-commerce company in China and (only) had about $10 billion (US) in revenue last year, with a 100% CAGR. As the GM there said, its growth has to slow sometime, or in 5 years it’ll be the biggest company in the world. I think it is the closest equivalent to Amazon out there, and it has similar ambitions. It is in the process of transforming to a self-built, self-managed datacenter infrastructure. It is a company I am going to keep my eyes on.
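Erasure coding stores computed parity instead of full copies of the data. The simplest possible instance is a single XOR parity shard (the RAID-5 idea), which survives the loss of any one shard. The Python sketch below assumes equal-length shards and hypothetical shard contents; it is purely illustrative. Production systems usually use Reed-Solomon-style codes with several parity shards, and nothing here describes Baidu’s actual scheme.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length buffers.
    return bytes(x ^ y for x, y in zip(a, b))

def encode(shards: list[bytes]) -> bytes:
    # Single parity shard: XOR of all data shards.
    return reduce(xor_bytes, shards)

def recover(surviving: list[bytes], parity: bytes) -> bytes:
    # XOR of the parity with every surviving shard yields the one missing shard.
    return reduce(xor_bytes, surviving, parity)

data = [b"rack", b"row1", b"dc-2"]  # hypothetical shards held on three servers
parity = encode(data)               # stored on a fourth server
lost = data.pop(1)                  # one server (or rack) fails
assert recover(data, parity) == lost
```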
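For a sense of why that growth has to slow: compounding at 100% a year doubles revenue annually. A quick Python check using the roughly $10 billion figure from the text (this is arithmetic under a constant-CAGR assumption, not a forecast):

```python
def project(value: float, cagr: float, years: int) -> float:
    # Compound growth: value * (1 + cagr) ** years
    return value * (1 + cagr) ** years

revenue_bn = 10.0  # ~US$10 billion last year, per the text
print(project(revenue_bn, 1.00, 5))  # 320.0 (billion US$)
```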
Tencent is expanding into some interesting new businesses. Sure, people know about the Tencent cloud services that the Chinese government will be using, but Tencent also has some interesting and unique cloud services coming. Let’s just say even I am interested in using them. And of course, while Tencent is already the largest Web 2.0 company in China, its new services promise to push it to new scale and new markets.
Extra! Extra! Read all about it …
And then there was press. I had a very enjoyable conversation with Yuan Shaolong, editor at WatchStor, that I think ran well over time. Amazingly, we discovered we have the same favorite band, even half a world away from each other. The results are here. I’m not sure whether Google Translate garbled a few things or there was some miscommunication, but in general I think most of the basics are right: http://translate.google.com/translate?hl=en&sl=zh-CN&u=http://tech.watchstor.com/storage-module-144394.htm&prev=/search%3Fq%3Drobert%2Bober%2BLSI%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26biw%3D1346%26bih%3D619
I just keep learning new things every time I go to China. I suspect it has as much to do with how quickly things are changing as with new stuff to learn. So I expect it won’t be too long until I go to China again…
Tags: Abhi Talwalkar, Alibaba, Amazon, Baidu, China, China Cloud Computing Conference, China National Convention Center, China Telecom, datacenter, Hadoop, hyperscale, JD.com, WatchStor, web 2.0, Yahoo
I’ve been travelling to China quite a bit over the last year or so. I’m sitting in Shenzhen right now (if you know Chinese internet companies, you’ll know who I’m visiting). The growth is staggering. I’ve had a bit of a trains, planes and automobiles experience this trip, and that’s exposed me to parts of China I never would have seen otherwise. Just to accommodate sheer population growth and the modest increase in wealth, there is construction everywhere – a press of people and energy, constant traffic jams, unending urban centers, and most everything is new. Very new. It must be exciting to be part of that explosive growth. What a market. I mean – come on – there are 1.3 billion potential users in China.
The amazing thing for me is the rapid growth of hyperscale datacenters in China, which is truly exponential. Their infrastructure has grown at a 200%-300% CAGR for the past few years. It’s also fantastic walking into a building in China, say Baidu, and feeling very much at home – just like you walked into Facebook or Google. It’s the same young vibe, energy, and ambition to change how the world does things. And it’s also the same pleasure – talking to architects who are super-sharp, have few technical prejudices, and have very little vanity – just a will to get to business and solve problems. Polite, but blunt. We’re lucky that they recognize LSI as a leader, and are willing to spend time to listen to our ideas, and to give us theirs.
Even their infrastructure has a similar feel to the US hyperscale datacenters. The same only different. ;-)
A lot of these guys are growing revenue at 50% per year, several getting 50% gross margin. Those are nice numbers in any country. One has $100s of billions in revenue. And they’re starting to push out of China. So far their pushes into Japan have not gone well, but other countries should be better. They all have unique business models. “We” in the US like to say things like “Alibaba is the Chinese eBay” or “Sina Weibo is the Chinese Twitter”… But that’s not true – they all have more hybrid, unique business models, and so their datacenter goals, revenue and growth have a slightly different profile. And there are some very cool services that simply are not available elsewhere. (You listening, Apple®, Google®, Twitter®, Facebook®?) But they are all expanding their services, products and user base. Interestingly, there is very little public cloud in China, so there are no real equivalents to Amazon’s services or Microsoft’s Azure. I have heard about current development of that kind of model, with the government as the initial customer. We’ll see how that goes.
Hundreds of thousands of servers. They’re not the scale of Google, but they sure are the scale of Facebook, Amazon, Microsoft… It’s a serious market for an outfit like LSI. Really, it’s a very similar scale now to the US market: close to 1 million servers installed among the main 4 players, and exabytes of data (we’ve blown past mere petabytes). Interestingly, they still use many co-location facilities, but that will change. More important, they’re all planning to probably double their infrastructure in the next 1-2 years. They have to; their growth rates are crazy.
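What does doubling in 1-2 years imply? A short Python sketch solving (1 + g)^years = 2 for the annual growth rate g, under the simplifying assumption of steady compound growth:

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    # Solve (1 + g) ** years == multiple for g.
    return multiple ** (1 / years) - 1

for years in (1, 2):
    print(years, round(implied_annual_growth(2.0, years) * 100, 1))
# 1 100.0
# 2 41.4
```

So even the slower end of that plan means roughly 41% more infrastructure every year.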
Often 5 or 6 distinct platforms, just like the US hyperscale datacenters: database platforms, storage platforms, analytics platforms, archival platforms, web server platforms… But they tend to be a little more like a rack of traditional servers that enterprise buys, with integrated disk bays, still a lot of 1G Ethernet, and they are still mostly from established OEMs. In fact, I just ran into one OEM’s American GM, who I happen to know, in Tencent’s offices today. The typical servers have 12 HDDs in drive bays, though they are starting to look at SSDs as part of the storage platform. They do use PCIe® flash cards in some platforms, but the performance requirements are not as extreme as you might imagine. Reasonably low latency and consistent latency are the premium they are looking for from these flash cards – not maximum IOPS or bandwidth – very similar to their American counterparts. I think hyperscale datacenters are sophisticated in understanding what they need from flash, and not requiring more than that. Enterprise could learn a thing or two.
Some server platforms have RAIDed HDDs, but most are direct-mapped drives using a high availability (HA) layer across the server center – Hadoop® HDFS or self-developed Hadoop-like platforms. Some have also started to deploy microserver archival “bit buckets”: a small ARM® SoC with 4 HDDs totaling 12 TBytes of storage, giving densities like 72 TBytes of file storage in 2U of rack. While I can only find about 5,000 of those in China, and they are first-generation experiments, it’s the first of a growing wave of archival solutions based on lower-performance ARM servers. The feedback is clear – they’re not perfect yet, but the writing is on the wall. (If you’re wondering about the math, that’s 5,000 x 12 TBytes = 60 Petabytes…)
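The density math above, spelled out in Python. The 3 TByte-per-drive figure is derived from the text (4 HDDs totaling 12 TBytes), and decimal units are assumed (1 PByte = 1,000 TBytes).

```python
DRIVES_PER_NODE = 4
DRIVE_TB = 3                            # 12 TB / 4 drives
NODE_TB = DRIVES_PER_NODE * DRIVE_TB    # 12 TB per microserver
TB_PER_2U = 72
nodes_per_2u = TB_PER_2U // NODE_TB     # 6 microservers per 2U
fleet_pb = 5_000 * NODE_TB / 1_000      # about 5,000 nodes deployed
print(NODE_TB, nodes_per_2u, fleet_pb)  # 12 6 60.0
```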
Power is important, maybe even more so than we’re used to. It’s harder to get licenses for power in China, so it’s really important to stay within the power envelope your datacenter has. You simply can’t get more. That means they have to deploy solutions that do more in the same power profile, especially as they move out of co-located datacenters into private ones. Annually: 50% more users supported, more storage capacity, more performance, more services, all in the same power. That’s not so easy. I would expect solar power in their future, just as Apple has done.
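Supporting 50% more users every year inside a fixed power envelope means power per user has to fall to 1/1.5 of the previous year’s level. A small Python sketch, assuming total power stays exactly flat:

```python
def power_per_user_factor(annual_user_growth: float, years: int) -> float:
    # With total power fixed, power per user shrinks by the growth factor each year.
    return 1 / (1 + annual_user_growth) ** years

for y in (1, 2, 3):
    print(y, round(power_per_user_factor(0.5, y), 3))
# 1 0.667
# 2 0.444
# 3 0.296
```

In other words, after three years each user has to be served with under 30% of the power it took at the start.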
Here’s where it gets interesting. They are developing a cousin to OpenCompute that’s called Scorpio. Tencent, Alibaba, Baidu, and China Telecom are driving the standard so far. The goals are similar to OpenCompute’s, but more aligned to standardized sub-systems that can be co-mingled from multiple vendors. There is some harmonization and coordination between OpenCompute and Scorpio, and in fact the Scorpio companies are members of OpenCompute. But where OpenCompute is trying to change the complete architecture of scale-out clusters, Scorpio is much more pragmatic – some would say less ambitious. They’ve finished version 1 and rolled out about 200 racks as a “test case” to learn from. Baidu was the guinea pig. That’s around 6,000 servers. They weren’t expecting more from version 1. They’re trying to learn. They’ve made mistakes, learned a lot, and are working on version 2.
Even if it’s not exciting, it will have an impact because of the sheer size of deployments these guys are getting ready to roll out in the next few years. They see the progression as 1) they were using standard equipment, 2) they’re experimenting and learning from trial runs of Scorpio versions 1 and 2, and then they’ll work on 3) new architectures that are efficient and powerful, and different.
Information is pretty sketchy if you are not one of the member companies or one of their direct vendors. We were just invited to join Scorpio by one of the founders, and would be the first group outside of China to do so. If that all works out, I’ll have a much better idea of the details, and hopefully can influence the standards to be better for these hyperscale datacenter applications. Between OpenCompute and Scorpio, we’ll be seeing a major shift in the industry – a shift that will undoubtedly be disturbing to a lot of current players. It makes me nervous, even though I’m excited about it. One thing is sure: just as server market volume is migrating from traditional enterprise to hyperscale datacenters (25-30% of the server market and growing quickly), we’re starting to see a migration to Chinese hyperscale datacenters from US-based ones. They have to grow just to stay still. I mean – come on – there are 1.3 billion potential users in China…
Tags: Alibaba, Amazon, Apple, ARM, Baidu, China, China Telecom, datacenter, Facebook, Google, Hadoop, hard disk drive, HDD, hyperscale, Microsoft, OpenCompute, Scorpio, Shenzhen, Sina Weibo, solid state drive, SSD, Tencent, Twitter