I started working years ago to engage large datacenters, learn what their problems are, and craft solutions for them. It’s taken years, but we engaged them, learned, changed how we thought about storage and began creating solutions that are being deployed at scale.
We’ve started to do the same with the Chinese Internet giants. They’re growing at an incredible rate. They have similar problems, but it’s surprising how different their solution approaches are. Each one is unique. And we’re constantly learning from these guys.
So to wrap up this blog series on my interview with CEO & CIO magazine, here are the last two questions, with a bit more explanation.
CEO & CIO: Please use examples to tell the stories about the forward-looking technologies and architectures that LSI has jointly developed with Internet giants.
While our host bus adapters (HBAs) and MegaRAID® solutions have been part of the hyperscale Internet companies’ infrastructure since the beginning, we have only recently begun working very closely with them to drive joint innovation. In 2009 I led the first LSI engagement with what we then called “mega datacenters.” It took a while to understand what they were doing and why. By 2010 we realized there were specialized needs, and began to imagine new hardware products built for these datacenters. Out of this work came the realization that flash was important for efficiency and capability, and the “invention” of the LSI® Nytro™ product portfolio (more products are in the pipeline). We have worked closely with hyperscale datacenters to evolve and tune these solutions, to the point where Nytro products have become the backbone of their main revenue platforms. Facebook has been a vitally important partner in evolving our Nytro platform – teaching us what was truly needed – and now much of their infrastructure runs on LSI products. These same products are a good fit for other hyperscale customers, and we are steadily winning many of the large ones.
Looking forward, we are partnering with several Internet giants in the U.S. and China on cold storage solutions and, more importantly, shared DAS (distributed DAS, or D-DAS) solutions, and we have been demonstrating prototypes. These solutions enable pooled and rack-scale architectures, and can be made to work tightly with software-defined datacenters (SDDCs). They simplify management and resource allocation, making task deployment more efficient and easier. Shared DAS solutions increase infrastructure efficiency and improve lifecycle management of components. And they have the potential to radically improve application performance and infrastructure costs.
Looking further into the future, we see even more radical changes in silicon supporting transport protocols and storage models, and in rack-scale architectures supporting storage and pooled memory. And cold storage is a huge, though some would say boring, problem that we are also focused on – storing lots of data for free and using no power to do it… but I really can’t talk about any of that.
CEO & CIO: LSI maintains good contact with big Internet companies in China. What are the biggest differences between dealing with these Internet enterprises and dealing with traditional partners?
Yes, we have a very good relationship with the large Chinese Internet companies. In fact, I will be visiting Tencent, Alibaba and Baidu in a few weeks. One of the CTOs, I would even say, is a friend – that is, we have fun talking together about the future.
These meetings have evolved. The first meetings LSI had, about two years ago, were sales calls or support for OEM storage solutions, and they accomplished very little. Once we began visiting as architects speaking to architects, real dialogs began. Our CEO has been spending time in China meeting with these Internet companies, both to learn and to make it clear that they are important to us and that we want a chance to solve their problems. But the most interesting conversations have been the architectural ones. There have been very clear changes in the two years I have traveled within China – from standard enterprise to hyperscale architectures.
We’ve received fascinating feedback on architecture, use, application profiles, platforms, problems and goals. We have strong engagement with the U.S. Internet giants. At the highest level, the Chinese Internet companies have similar problems and goals. But the details quickly diverge because of revenue per user, resources, power availability, datacenter ownership and Internet company age. The use of flash is very different.
The Chinese Internet giants are at an amazing change point. Most are ready for explosive growth of infrastructure and deployment of cloud services. Most are moving from standard OEM systems and architectures to self-designed hyperscale systems after experimenting with Scorpio and microserver deployments. Several, like JD.com (an Amazon-like company), are moving from hosted to self-built infrastructure. And there seems to be a general realization that the datacenter has changed from a compute-centric model to a dataflow model, where storage and network dictate how much work gets done more than the CPU does. These giants are leveraging their experience and capability to move very quickly, and in a few cases are working to create true pooled rack-level architectures much like Facebook and Google have started building in the U.S. In fact, Baidu is similar to Facebook in this approach, but differs in its longer-term goals for the architecture.
The Chinese companies are amazingly diverse, even within one datacenter, and arguments over architectural direction are raging within these Internet giants – it’s healthy and exciting. That said, the innovations that are coming are similar to those developed by the large U.S. Internet companies. Personally, I have found these Internet companies much more exciting and satisfying to work with than traditional OEMs. The speed and cadence of advancement, the recognition of problems and their importance, and the focus on efficiency and optimization have all been much more exciting. And the youthful mentality and fresh view of problems, unburdened by “the way we’ve always done this,” has been wonderful.
Also see these blogs of mine over the past year, where you can read more about some of these changes:
“Postcard from Shenzhen: China’s hyperscale datacenter growth, mixed with a more traditional approach”
“China in the clouds, again”
“China: A lot of talk about resource pooling, a better name for disaggregation”
Or see them (and others) all here.
Summary: So it’s taken years, but we engaged U.S. Internet giants, learned about their problems, changed how we thought about storage and began creating solutions that are now being deployed at scale. And we’re constantly learning from these guys. Constantly, because their problems are constantly changing.
We’ve now started to do the same with the Chinese Internet giants. They have similar problems, and will need similar solutions, but they are not the same. And just like the U.S. Internet giants, each one is unique.
Tags: Alibaba, Amazon, Baidu, CEO & CIO Magazine, China, cloud services, cold storage, D-DAS, DAS, datacenter, datacenter ecosystem, direct attached storage, distributed DAS, Facebook, flash, flash storage, Google, HBA, host bus adapter, hyperscale datacenter, Internet, JD.com, MegaRAID, OEM, original equipment manufacturer, Scorpio, Tencent
Software-defined datacenters (SDDC) and software-defined storage (SDS) are big movements in the industry right now. Just read the trade press or attend any conference and you’ll see that – it’s a big deal. We’re seeing for-pay vendors providing solutions, as well as strong ecosystems evolving around open source solutions. It’s not surprising – enterprises need to deploy large-scale compute clusters, and that takes either deep expertise that’s very rare, or orchestration tools that have not existed in the past. It’s the “necessity is the mother of invention” thing…
So datacenters are being forced to deploy large-scale clusters to handle the scale of compute needed, and the amount of data that is being captured, analyzed and stored. As an industry, then, we’re being forced to simplify applications as well as the management and deployment of these large-scale clusters. That’s great for datacenters. It’s even better that we’re figuring out how to provide those expanded resources, and manage them, for less money and with fewer people. (Well, it’s probably good for everyone but the sys admins…)
These new technologies are the key enabler. This blog, the second in my three-part series (based on interesting questions I was asked by CEO & CIO, a Chinese business magazine), examines how SDDC and SDS are helping enterprises get more out of their datacenter gear. You can read part 1 here.
CEO & CIO: What are your views on software-defined storage? What’s the development roadmap of LSI in achieving software-defined storage?
We see SDS as one of a number of vital changes underway in the datacenter. SDS promises to span some or all of file, object, key-value and block storage in order to pool resources, to simplify the infrastructure required in a datacenter, and to smooth the migration to object or key-value storage over time. Great examples of these SDS solutions are Ceph, Swift, Cinder, Gluster, and VSAN/VVOLs. The model brings great benefits in datacenter management, resource pooling and allocation, and usability. The main problem is performance – and by that I do not mean extreme performance. I mean poor performance that damages TCO, reduces the efficiency of infrastructure and increases costs – much worse than you would get otherwise. These solutions work, but compromise resource efficiency; many require flash integrated in the system simply to maintain existing performance. However, this is a permanent change in how storage is used and deployed, and it’s a good change.
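To make the model concrete, here is a minimal sketch of the kind of object interface an SDS layer such as Ceph exposes, using the python-rados binding; the config path and pool name are assumptions for the example, not a recommended deployment.

```python
# Minimal sketch of object storage through Ceph's librados (python-rados).
# The config path and pool name are illustrative assumptions.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumed config path
cluster.connect()
try:
    ioctx = cluster.open_ioctx('example-pool')  # hypothetical pool name
    try:
        # The SDS layer decides placement and replication; the application
        # just names objects in a pool.
        ioctx.write_full('user-42/profile', b'{"name": "example"}')
        print(ioctx.read('user-42/profile'))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```

Every call like this traverses several software layers (placement, replication, networking) before it touches a device, which is exactly where the performance tax described above comes from.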
While block is what underlies most storage, and will continue to for some time, the system- and application-level view is changing. We view SDS as having great synergy with LSI’s architectural direction – shared DAS infrastructure and the ability to add “above the block” capability like quality of service (QoS), direct key/value hardware, etc. – bringing improved performance and resource efficiency. Together, SDS + LSI innovation = resource pooling and allocation (including flash and cool/cold storage), management and virtual machine (VM) agility, performance, and resource efficiency.
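As a small illustration of the “above the block” QoS idea, here is a hedged sketch of a token-bucket rate limiter of the kind a storage layer could use to cap per-tenant IOPS; the class and parameter names are my own for the example, not an LSI or SDS API.

```python
# Illustrative token-bucket IOPS limiter, the classic mechanism behind
# per-tenant storage QoS. All names and numbers here are hypothetical.
import time

class TokenBucket:
    def __init__(self, rate_iops: float, burst: float):
        self.rate = rate_iops          # tokens (I/Os) replenished per second
        self.capacity = burst          # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True                # admit this I/O
        return False                   # caller should queue or throttle

# Usage: cap a tenant at 500 IOPS with bursts of up to 100 requests.
bucket = TokenBucket(rate_iops=500, burst=100)
if bucket.allow():
    pass  # submit the I/O
```

Implementing this kind of policy below the software stack, rather than as yet another software layer, is the sort of efficiency gain the answer above is pointing at.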
As a result, there has been tremendous interest from SDS vendors to work with us, to demonstrate prototype systems, and to make solutions better. We are working with many SDS partners to provide complete solutions. This is not a one-size-fits-all world, so there will be several solutions. Those solutions are not ready yet, but they’re coming, and will probably displace the older file and block storage systems we know and love.
CEO & CIO: Industry giants such as Intel have outlined their visions for software-defined datacenters. Chinese Internet giants have also put forward similar plans. What views does LSI have on software-defined datacenter?
If you view the AIS keynote, you’ll see we believe this is a critical part of the future datacenter. But just one critical part. Interestingly, we had Intel present as well during AIS.
SDDC creates a critical control plane for the datacenter. It is the software abstraction model that enables resource pooling – pooling of compute, storage and network, with memory to follow in the near future – and it enables the automation and allocation of tasks and resources in the datacenter. The leading models are VMware® SDDC and OpenStack® software, but there are others that are important too; they’re just a little less public right now. Anyway, it’s way too early to predict which will be dominant. Just like SDS, SDDC trades performance and efficiency for simplified control and abstraction. As a result, it’s not a very useful concept, at least not at hyperscale levels, without hardware that really, truly supports and enables it. As the datacenter has changed from a compute-centric model to a dataflow model, the storage and network – and, soon, memory – become very important. They dictate how much useful work can be gotten from the datacenter.
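As a small illustration of what that control plane looks like from the user’s side, here is a hedged sketch using the openstacksdk Python client to allocate a compute resource from the pool; the cloud name, image, flavor and network IDs are placeholders for the example.

```python
# Sketch of allocating a pooled resource through an SDDC control plane,
# using the openstacksdk client. The cloud name and all UUIDs below are
# placeholders, not real values.
import openstack

conn = openstack.connect(cloud='example-cloud')  # assumed clouds.yaml entry

server = conn.compute.create_server(
    name='worker-01',
    image_id='IMAGE_UUID',                 # placeholder
    flavor_id='FLAVOR_UUID',               # placeholder
    networks=[{"uuid": 'NETWORK_UUID'}],   # placeholder
)
server = conn.compute.wait_for_server(server)
# The control plane picked the physical host from the pool; the caller
# never named hardware. That abstraction is the point, and also the
# source of the performance cost discussed below.
print(server.status)
```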
I believe we are, as an industry, at the start of the hardware transition to support these. We are building hardware solutions for storage and network that are being designed into products today. We are working very closely with three of the largest datacenters in the U.S. and two in China to build not just the SDDC but the pooled hardware infrastructure needed to make it work.
It’s critical to understand that SDDC solutions really work, but often the performance and efficiency are – well – terrible. That has been the pattern in computer science and computer architecture since the beginning: you raise the abstraction level, which simplifies development and support, but you either accept poor performance or require hardware with more capability, architected to support those abstractions.
As a result, it’s really difficult to talk about SDDCs without a rack-scale architecture to support them. So we are working closely with the key SDDC software solutions and vendors, even the ones I didn’t list, to integrate and optimize the solutions and make the SDDC actually work. We have been working very closely with VMware and the OpenStack community, and we are changing the way the software plane interacts with the pooled resources. Again, there has been tremendous interest in our shared DAS, which incorporates flash into the same architecture and management, and in our Axxia® SDN control plane processor for networks.
I talk about rack-scale architectures to support SDS in the second half of this keynote and in my blog “China: A lot of talk about resource pooling, a better name for disaggregation.”
Summary: So I believe SDS is a big movement, it’s a good thing, and it’s here to stay. But… the performance is poor today. Very poor. That’s where we come in, with hardware that enables SDS and not only makes performance acceptable, but helps make it excellent, and improves efficiency and cost too. And SDDC is also a massive movement that will define the future datacenter. But it is intertwined with the rack-level concepts of pooling or disaggregation to make it really compelling. Again – that’s where we come in.
These were good questions that were interesting to answer. I hope they’re interesting to you too. I’ll post more soon about how the Chinese Internet giants differ from other customers, and about forward-looking technologies.
Tags: AIS, Axxia, CEO & CIO Magazine, Ceph, China, Cinder, cold storage, cool storage, datacenter, direct attached storage, disaggregation, ecosystem, Gluster, hyperscale datacenter, key-value storage, object storage, OpenStack, pooling, QoS, quality of service, rack scale architecture, SDDC, SDS, shared DAS, software-defined datacenter, software-defined storage, Swift, virtual machine, VM, VMware, VSAN, VVOL
I was asked some interesting questions recently by CEO & CIO, a Chinese business magazine. The questions ranged from how Chinese Internet giants like Alibaba, Baidu and Tencent differ from other customers and what leading technologies big Internet companies have created to questions about emerging technologies such as software-defined storage (SDS) and software-defined datacenters (SDDC) and changes in the ecosystem of datacenter hardware, software and service providers. These were great questions. Sometimes you need the press or someone outside the industry to ask a question that makes you step back and think about what’s going on.
I thought you might be interested, so this blog, the first of a three-part series covering the interview, shares details of the first two questions.
CEO & CIO: In recent years, Internet companies have built ultra large-scale datacenters. Compared with traditional enterprises, they also take the lead in developing datacenter technology. From an industry perspective, what are the three leading technologies of ultra large-scale Internet datacenters, in your opinion? Please describe them.
There are so many innovations and important contributions to the industry from these hyperscale datacenters – in hardware, software and mechanical engineering – that choosing three is difficult. While I would prefer to choose hardware innovations, I would suggest the following three, as they have changed our world and our industry, and are now changing our hardware and businesses:
Autonomous behavior and orchestration
An architect at Microsoft once told me, “If we had to hire admins for our datacenter in a normal enterprise way, we would hire all the IT admins in the world, and still not have enough.” There are now around 1 million servers in Microsoft datacenters. Hyperscale datacenters have had to develop autonomous, self-managing, sometimes self-deploying datacenter infrastructure simply to be able to expand. They are pioneering datacenter technology for scale – innovating, learning by trial and error, and evolving their practices to drive more work/$. Their practices are specialized, but they are beginning to be emulated by the broader IT industry. OpenStack is the best example of how that specialized knowledge and capability is being packaged and deployed broadly across the industry. At LSI, we’re working with both hyperscale datacenters and orchestration solutions to make better autonomous infrastructure.
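As a toy sketch of the autonomous pattern described above – monitor, detect, remediate, with no human in the loop – consider the following; every function, name and threshold here is hypothetical.

```python
# Toy autonomous-remediation loop: the pattern that lets hyperscale
# operators handle hardware failure without filing a ticket. All names,
# probes and thresholds here are invented for illustration.
import random
import time

def check_health(node: str) -> bool:
    """Stand-in for a real probe (heartbeat, SMART data, service checks)."""
    return random.random() > 0.01  # simulate a 1% failure rate per check

def drain_and_reimage(node: str) -> None:
    """Stand-in for remediation: migrate work off, then reprovision."""
    print(f"remediating {node}")

fleet = [f"node-{i:04d}" for i in range(4)]  # placeholder inventory

for _ in range(3):  # a real loop would run forever
    for node in fleet:
        if not check_health(node):
            # No admin intervenes: the system repairs itself and the
            # datacenter-level HA layer absorbs the temporary loss.
            drain_and_reimage(node)
    time.sleep(1)  # illustrative poll interval
```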
High availability at datacenter level vs. machine level
As systems get bigger they have more components and more modes of failure, and maintaining reliability becomes more complex and expensive. As storage is used more, and more aggressively, drives fail more often – they are simply being worked harder. And yet there is continued pressure to reduce costs and complexity. By the time hyperscale datacenters had evolved to massive scale – hundreds of thousands of servers in multiple datacenters – they had created solutions for absolute reliability, even as individual systems got less expensive, less complex and much less reliable. This is what has enabled the very low cost structures of the cloud, and made it a reliable resource.
These solutions are well timed, too, as more enterprise organizations need to maintain on-premises data across multiple datacenters with absolute reliability. The traditional view that a single server requires 99.999% reliability is giving way to a more pragmatic view of maintaining high reliability at the macro level – across the entire datacenter. This approach accepts the failure of individual systems and components even as it maintains datacenter-level reliability. Of course, there are currently operational issues with this approach. LSI has been working with hyperscale datacenters and OEMs to engineer improved operational efficiency and resilience, and to minimize the impact of individual component failures, while still relying on the datacenter high-availability (HA) layer for reliability.
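A quick back-of-the-envelope illustration of why this trade works (the numbers are my own assumptions, not measured figures): if a single cheap server is only 99% available, three independent replicas in separate failure domains already beat a single “five-nines” machine.

```python
# Back-of-the-envelope availability math for datacenter-level HA.
# The 99% single-server figure is an illustrative assumption.
single = 0.99        # availability of one inexpensive server
replicas = 3         # independent copies in separate failure domains

# Data is unavailable only if every replica is down at the same time.
pooled = 1 - (1 - single) ** replicas
print(f"{pooled:.6f}")  # prints 0.999999, i.e. "six nines" from 99% parts
```

The independence assumption is what failure-domain engineering exists to protect; correlated failures (power, network, software) are exactly the operational issues mentioned above.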
Big data

It’s such an overused term. It’s difficult to believe the term barely existed a few years ago. The gift of Hadoop® to the industry – an open-source effort to replicate Google® MapReduce and the Google File System – has truly changed our world, unbelievably quickly. Today, Hadoop and the other big data applications enable search, analytics, advertising, peta-scale reliable file systems, genomics research and more – even services like Apple® Siri run on Hadoop. Big data has changed the concept of analytics from statistical sampling to analysis of all the data. And it has already enabled breakthroughs and changes in research, where relationships and patterns are discovered empirically rather than derived from theory.
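For readers who haven’t met MapReduce, here is the canonical word-count example boiled down to a single-machine Python sketch – a toy version of the programming model that Hadoop distributes across thousands of nodes.

```python
# Toy, single-process sketch of the MapReduce model behind Hadoop:
# map emits (key, value) pairs, shuffle groups them by key, and reduce
# folds each group. Hadoop runs this same pattern across thousands of nodes.
from collections import defaultdict

documents = ["big data big deal", "data changes research"]

# Map: emit (word, 1) for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the emitted pairs by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: fold each group into a final count.
counts = {word: sum(values) for word, values in groups.items()}
print(counts)  # {'big': 2, 'data': 2, 'deal': 1, 'changes': 1, 'research': 1}
```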
Overall, I think big data has been one of the most transformational technologies of this century. Big data has shifted the focus from compute to storage as the primary enabler in the datacenter. Our embedded hard disk controllers, SAS (Serial Attached SCSI) host bus adapters and RAID controllers have been at the heart of this evolution. The next evolutionary step in big data is the broad adoption of graph analysis, which works on the relationships between data, not just the data itself.
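As a tiny illustration of what analyzing relationships means in practice, here is a sketch of breadth-first traversal over an adjacency list – the basic primitive beneath graph analytics such as friend-of-friend or influence queries. The graph itself is invented for the example.

```python
# Minimal graph traversal (BFS) over an adjacency list: the primitive
# beneath graph analytics such as friend-of-friend queries.
# The graph data is invented for illustration.
from collections import deque

graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}

def reachable_within(graph: dict, start: str, hops: int) -> set:
    """Return every node reachable from `start` in at most `hops` edges."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

print(reachable_within(graph, "alice", 2))  # {'alice', 'bob', 'carol', 'dave'}
```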
CEO & CIO: Due to cloud computing, mobile connectivity and big data, the traditional IT ecosystem or industrial chain is changing. What are the three most important changes in LSI’s current cooperation with the ecosystem chain? How does LSI see the changes in the various links of the traditional ecosystem chain? What new links are worth attention? Please give some examples.
Cloud computing, and the explosion of data driven by mobile devices and media, has changed and continues to change our industry and its ecosystem contributors dramatically. It’s true the enterprise market (customers, OEMs, technology, applications and use cases) has been pretty stable for 10-20 years, but as cloud computing has become a significant portion of the server market, it has increasingly affected ecosystem suppliers like LSI.
Timing: It’s no longer enough to follow Intel’s tick-tock product roadmap. Development cycles for datacenter solutions used to be 3 to 5 years, but these cycles are becoming shorter. Now, demand for solutions is closer to 6 months – forcing hardware vendors to plan and execute to far tighter development cycles. Hyperscale datacenters also need to be able to expand resources very quickly, as customer demand dictates. As a result they incorporate new architectures, solutions and specifications out of cycle with the traditional Intel roadmap changes. This has also disrupted the ecosystem.
End customers: Hyperscale datacenters now have real purchasing power in the ecosystem, with single purchase orders sometimes amounting to 5% of the server market. While OEMs are still incredibly important, they are not driving large-scale deployments or innovating and evolving nearly as fast. The result is more hyperscale design-win opportunities for component or sub-system vendors that offer something unique or a real solution to an important problem. This may also shift profit pools away from OEMs to strong, nimble technology innovators. It also has the potential to reduce overall profit pools for the whole ecosystem, which is a potential threat to the speed of innovation and re-investment.
New players: Traditionally, a few OEMs and ISVs globally have owned most of the datacenter market. However, the supply chain of the hyperscale cloud companies has changed that. Leading datacenters have architected, specified or even built (in Google’s case) their own infrastructure, though many large cloud datacenters have been equipped with hyperscale-specific systems from Dell and HP. But more and more systems built exactly to datacenter specifications are coming from suppliers like Quanta. Newer network suppliers like Arista have increased market share. Some new hyperscale solution vendors have emerged, like Nebula. And software has shifted to open source, sometimes supported for pay by companies copying the Red Hat® Linux model – companies like Cloudera, Mirantis and UnitedStack. Personally, I am still waiting for the first third-party hardware service company, emulating the Linux support-and-service model, to appear.
Open initiatives: Yes, we’ve seen Hadoop and its derivatives deployed everywhere now – even in traditional industries like oil and gas, pharmacology, genomics, etc. And we’ve seen open-source alternatives to traditional databases being deployed, like Cassandra. But now we’re seeing new initiatives like Open Compute and OpenStack. Sure, these are helpful to hyperscale datacenters, but they are also enabling smaller companies and universities to deploy hyperscale-like infrastructure and get the same kind of automated control, efficiency and cost structures that hyperscale datacenters enjoy. (Of course they don’t get fully there on any front, but it’s a lot closer.) This trend has the potential to hurt OEM and ISV business models and markets, and to establish new entrants – even as we see Quanta, TYAN, Foxconn, Wistron and others tentatively entering the broader market through these open initiatives.
New architectures and new algorithms: There is a clear movement toward pooled resources (also called rack-scale architecture, or disaggregated servers). Developing pooled-resource solutions has become a partnership between core IP providers like Intel and LSI and the largest hyperscale datacenter architects. Traditionally, new architectures were driven by OEMs, but that is no longer so true. We are seeing new technologies emerge to enable these rack-scale architectures (RSA) – technologies like silicon photonics, pooled storage and software-defined networks (SDN) – and we will soon see pooled main memory and new nonvolatile main memories in the rack.
We are also seeing the first new processor architectures about to enter the datacenter: ARM 64 for cool/cold storage and the web tier, and OpenPower P8 for high-power processing – multithreaded, multi-issue, pooled-memory processing monsters. This is exciting to watch. There is also emerging interest in application acceleration: general-purpose computing on graphics processing units (GPGPU), regular expression (regex) processors, live-stream analytics, etc. And we are seeing the first generation of graph analysis deployed at massive scale in real time.
Innovation: The pace of innovation appears to be accelerating, although maybe I’m just getting older. But the easy gains are done. On one hand, datacenters need exponentially more compute and storage, and they need to operate 10x to 1,000x more quickly. On the other, memory, processor cores, disks and flash technologies are getting no faster. The only way to fill that gap is through innovation. So it’s no surprise there are lots of interesting things happening at OEMs and ISVs, at chip and solution companies, in the open source community and at startups. This is what makes it such an interesting time and industry.
Consumption shifts: We are seeing a decline in laptop and personal computer shipments, a drop that naturally is reducing storage demand in those markets. Laptops are also seeing a shift to SSD from HDD. This has been good for LSI, as our footprint in laptop HDDs had been small, but our presence in laptop SSDs is very strong. Smart phones and tablets are driving more cloud content, traffic and reliance on cloud storage. We have seen a dramatic increase in large HDDs for cloud storage, a trend that seems to be picking up speed, and we believe the cloud HDD market will be very healthy and will see the emergence of new, cloud-specific HDDs that are radically different and specifically designed for cool and cold storage.
There is also an explosion of SSD and PCIe flash cards in cloud computing for databases, caches, low-latency access and virtual machine (VM) enablement. Many applications that we take for granted would not be possible without these extreme low-latency, high-capacity flash products. But very few companies can make a viable storage system from flash at an acceptable cost, opening up an opportunity for many startups to experiment with different solutions.
Summary: So I believe the biggest hyperscale innovations are autonomous behavior and orchestration, HA at the datacenter level vs. machine level, and big data. These are radically changing the whole industry. And what are those changes for our industry and ecosystem? You name it: timing, end customers, new players, open initiatives, new architectures and algorithms, innovation, and consumption patterns. All that’s staying the same are legacy products and solutions.
These were great questions. Sometimes you need the press or someone outside the industry to ask a question that makes you step back and think about what’s going on.
Tags: Alibaba, Apple Siri, Arista, ARM 64, Baidu, big data, Cassandra, CEO & CIO Magazine, China, cloud storage, Cloudera, cold storage, cool storage, datacenter, datacenter ecosystem, Dell, flash, Foxconn, Google File System, Google MapReduce, Hadoop, hard disk drive, HDD, high availability, HP, hyperscale datacenter, Intel, Internet, latency, Microsoft, Mirantis, Nebula, OEM, Open Compute, OpenPower P8, OpenStack, original equipment manufacturer, Quanta, rack scale, RAID, Red Hat Linux, SAS, SDDC, SDN, SDS, Serial Attached SCSI, software-defined datacenter, software-defined networks, software-defined storage, solid state drive, SSD, Tencent, TYAN, UnitedStack, virtual machine, VM, Wistron