I was asked some interesting questions recently by CEO & CIO, a Chinese business magazine. The questions ranged from how Chinese Internet giants like Alibaba, Baidu and Tencent differ from other customers and what leading technologies big Internet companies have created to questions about emerging technologies such as software-defined storage (SDS) and software-defined datacenters (SDDC) and changes in the ecosystem of datacenter hardware, software and service providers. These were great questions. Sometimes you need the press or someone outside the industry to ask a question that makes you step back and think about what’s going on.
I thought you might be interested, so this blog post, the first of a three-part series covering the interview, shares details of the first two questions.
CEO & CIO: In recent years, Internet companies have built ultra large-scale datacenters. Compared with traditional enterprises, they also take the lead in developing datacenter technology. From an industry perspective, what are the three leading technologies of ultra large-scale Internet data centers in your opinion? Please describe them.
There are so many innovations and important contributions to the industry from these hyperscale datacenters in hardware, software and mechanical engineering. To choose three is difficult. While I would prefer to choose hardware innovations as their big ones, I would suggest the following as they have changed our world and our industry and are changing our hardware and businesses:
Autonomous behavior and orchestration
An architect at Microsoft once told me, “If we had to hire admins for our datacenter in a normal enterprise way, we would hire all the IT admins in the world, and still not have enough.” There are now around 1 million servers in Microsoft datacenters. Hyperscale datacenters have had to develop autonomous, self-managing, sometimes self-deploying datacenter infrastructure simply to expand. They are pioneering datacenter technology for scale – innovating, learning by trial and error, and evolving their practices to drive more work/$. Their practices are specialized but beginning to be emulated by the broader IT industry. OpenStack is the best example of how that specialized knowledge and capability is being packaged and deployed broadly in the industry. At LSI, we’re working with both hyperscale and orchestration solutions to make better autonomous infrastructure.
High availability at datacenter level vs. machine level
As systems get bigger, they have more components and more modes of failure, and keeping them reliable becomes more complex and expensive. As storage is used more, and more aggressively, drives fail more often, simply because they are being used more. And yet there is continued pressure to reduce costs and complexity. By the time hyperscale datacenters had evolved to massive scale – hundreds of thousands of servers in multiple datacenters – they had created solutions for absolute reliability, even as individual systems got less expensive, less complex and much less reliable. This is what has enabled the very low cost structures of the cloud, and made it a reliable resource.
These solutions are well timed too, as more enterprise organizations need to maintain on-premises data across multiple datacenters with absolute reliability. The traditional view that a single server requires 99.999% reliability is giving way to a more pragmatic view of maintaining high reliability at the macro level – across the entire datacenter. This approach accepts the failure of individual systems and components even as it maintains data center level reliability. Of course – there are currently operational issues with this approach. LSI has been working with hyperscale datacenters and OEMs to engineer improved operational efficiency and resilience, and minimized impact of individual component failure, while still relying on the datacenter high-availability (HA) layer for reliability.
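To make the arithmetic behind that shift concrete, here is a minimal sketch. It assumes machines fail independently of one another, which real datacenters only approximate (shared power, racks and networks create correlated failures), and the 99% per-machine figure is an illustrative number, not one taken from the interview. The function names are hypothetical.

```python
def replicated_availability(node_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is reachable.

    Assumption: each copy lives on a different machine and machines
    fail independently of one another.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # All copies must be down at the same time for the data to be unavailable.
    return 1.0 - (1.0 - node_availability) ** replicas


def replicas_needed(node_availability: float, target: float, limit: int = 64) -> int:
    """Smallest replica count whose combined availability meets `target`."""
    if not 0.0 < node_availability <= 1.0:
        raise ValueError("node_availability must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    for count in range(1, limit + 1):
        if replicated_availability(node_availability, count) >= target:
            return count
    raise ValueError("target not reachable within the replica limit")


if __name__ == "__main__":
    # Illustrative figures only: cheap machines that are up 99% of the time.
    for count in (1, 2, 3):
        print(count, replicated_availability(0.99, count))
    print("replicas for five nines:", replicas_needed(0.99, 0.99999))
```

Under these assumptions, a handful of inexpensive, individually unreliable machines can match or exceed the availability of a single five-nines server, which is the trade the datacenter-level HA layer makes.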
Big data
It’s such an overused term. It’s difficult to believe the term barely existed a few years ago. The gift of Hadoop® to the industry – an open source attempt to copy Google® MapReduce and Google File System – has truly changed our world unbelievably quickly. Today, Hadoop and the other big data applications enable search, analytics, advertising, peta-scale reliable file systems, genomics research and more – even services like Apple® Siri run on Hadoop. Big data has changed the concept of analytics from statistical sampling to analysis of all data. And it has already enabled breakthroughs and changes in research, where relationships and patterns are looked for empirically, rather than based on theories.
Overall, I think big data has been one of the most transformational technologies this century. Big data has changed the focus from compute to storage as the primary enabler in the datacenter. Our embedded hard disk controllers, SAS (Serial Attached SCSI) host bus adaptors and RAID controllers have been at the heart of this evolution. The next evolutionary step in big data is the broad adoption of graph analysis, which integrates the relationship of data, not just the data itself.
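For readers who have not seen the MapReduce model that Hadoop copies, here is a minimal single-process sketch of the idea using a word count. It is only an illustration of the map, shuffle and reduce steps; it does not use Hadoop or any Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over every document, not a sample."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big storage"]))
```

In a real cluster the map and reduce steps run in parallel on many machines next to the data they process. This sketch runs them in one process purely to show the shape of the computation.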
CEO & CIO: Due to cloud computing, mobile connectivity and big data, the traditional IT ecosystem or industrial chain is changing. What are the three most important changes in LSI’s current cooperation with the ecosystem chain? How does LSI see the changes in the various links of the traditional ecosystem chain? What new links are worth attention? Please give some examples.
Cloud computing and the explosion of data driven by mobile devices and media have changed, and continue to change, our industry and ecosystem contributors dramatically. It’s true the enterprise market (customers, OEMs, technology, applications and use cases) has been pretty stable for 10-20 years, but as cloud computing has become a significant portion of the server market, it has increasingly affected ecosystem suppliers like LSI.
Timing: It’s no longer enough to follow Intel’s tick-tock product roadmap. Development cycles for datacenter solutions used to be 3 to 5 years. But these cycles are becoming shorter. Now, demand for solutions is closer to 6 months – forcing hardware vendors to plan and execute to far tighter development cycles. Hyperscale datacenters also need to be able to expand resources very quickly, as customer demand dictates. As a result they incorporate new architectures, solutions and specifications out of cycle with the traditional Intel roadmap changes. This has also disrupted the ecosystem.
End customers: Hyperscale datacenters now have purchasing power in the ecosystem, with single purchase orders sometimes amounting to 5% of the server market. While OEMs still are incredibly important, they are not driving large-scale deployments or innovating and evolving nearly as fast. The result is more hyperscale design-win opportunities for component or sub-system vendors if they offer something unique or a real solution to an important problem. This also may shift profit pools away from OEMs to strong, nimble technology solution innovators. It also has the potential to reduce overall profit pools for the whole ecosystem, which is a potential threat to innovation speed and re-investment.
New players: Traditionally, a few OEMs and ISVs globally have owned most of the datacenter market. However, the supply chain of the hyperscale cloud companies has changed that. Leading datacenters have architected, specified or even built (in Google’s case) their own infrastructure, though many large cloud datacenters have been equipped with hyperscale-specific systems from Dell and HP. But more and more systems built exactly to datacenter specifications are coming from suppliers like Quanta. Newer network suppliers like Arista have increased market share. Some new hyperscale solution vendors have emerged, like Nebula. And software has shifted to open source, sometimes supported for-pay by companies copying the Red Hat® Linux model – companies like Cloudera, Mirantis or United Stack. Personally, I am still waiting for the first third-party hardware service emulating a Linux support and service company to appear.
Open initiatives: Yes, we’ve seen Hadoop and its derivatives deployed everywhere now – even in traditional industries like oil and gas, pharmacology, genomics, etc. And we’ve seen the emergence of open-source alternatives to traditional databases being deployed, like Cassandra. But now we’re seeing new initiatives like Open Compute and OpenStack. Sure, these are helpful to hyperscale datacenters, but they are also enabling smaller companies and universities to deploy hyperscale-like infrastructure and get the same kind of automated control, efficiency and cost structures that hyperscale datacenters enjoy. (Of course they don’t get fully there on any front, but it’s a lot closer.) This trend has the potential to hurt OEM and ISV business models and markets and establish new entrants – even as we see Quanta, TYAN, Foxconn, Wistron and others tentatively entering the broader market through these open initiatives.
New architectures and new algorithms: There is a clear movement toward pooled resources (or rack scale architecture, or disaggregated servers). Developing pooled resource solutions has become a partnership between core IP providers like Intel and LSI with the largest hyperscale datacenter architects. Traditionally new architectures were driven by OEMs, but that is not so true anymore. We are seeing new technologies emerge to enable these rack-scale architectures (RSA) – technologies like silicon photonics, pooled storage, software-defined networks (SDN), and we will soon see pooled main memory and new nonvolatile main memories in the rack.
We are also seeing the first tries at new processor architectures about to enter the datacenter: ARM 64 for cool/cold storage and the web tier, and OpenPower P8 for high power processing – multithreaded, multi-issue, pooled memory processing monsters. This is exciting to watch. There is also an emerging interest in application acceleration: general-purpose computing on graphics processing units (GPGPU), regular expression (regex) processors, live stream analytics, etc. We are also seeing the first generation of graph analysis deployed at massive scale in real time.
Innovation: The pace of innovation appears to be accelerating, although maybe I’m just getting older. But the easy gains are done. On one hand, datacenters need exponentially more compute and storage, and they need to operate 10x to 1000x more quickly. On the other, memory, processor cores, disks and flash technologies are getting no faster. The only way to fill that gap is through innovation. So it’s no surprise there are lots of interesting things happening at OEMs and ISVs, chip and solution companies, as well as open source community and startups. This is what makes it such an interesting time and industry.
Consumption shifts: We are seeing a decline in laptop and personal computer shipments, a drop that naturally is reducing storage demand in those markets. Laptops are also seeing a shift to SSD from HDD. This has been good for LSI, as our footprint in laptop HDDs had been small, but our presence in laptop SSDs is very strong. Smart phones and tablets are driving more cloud content, traffic and reliance on cloud storage. We have seen a dramatic increase in large HDDs for cloud storage, a trend that seems to be picking up speed, and we believe the cloud HDD market will be very healthy and will see the emergence of new, cloud-specific HDDs that are radically different and specifically designed for cool and cold storage.
There is also an explosion of SSD and PCIe flash cards in cloud computing for databases, caches, low-latency access and virtual machine (VM) enablement. Many applications that we take for granted would not be possible without these extreme low-latency, high-capacity flash products. But very few companies can make a viable storage system from flash at an acceptable cost, opening up an opportunity for many startups to experiment with different solutions.
Summary: So I believe the biggest hyperscale innovations are autonomous behavior and orchestration, HA at the datacenter level vs. machine level, and big data. These are radically changing the whole industry. And what are those changes for our industry and ecosystem? You name it: timing, end customers, new players, open initiatives, new architectures and algorithms, innovation, and consumption patterns. All that’s staying the same are legacy products and solutions.
These were great questions. Sometimes you need the press or someone outside the industry to ask a question that makes you step back and think about what’s going on. Great questions.
Tags: Alibaba, Apple Siri, Arista, ARM 64, Baidu, big data, Casandra, CEO & CIO Magazine, China, cloud storage, Cloudera, cold storage, cool storage, datacenter, datacenter ecosystem, Dell, flash, Foxconn, Google File System, Google MapReduce, Hadoop, hard disk drive, HDD, high availability, HP, hyperscale datacenter, Intel, Internet, latency, Microsoft, Mirantis, Nebula, Open Compute, OpenPower P8, OpenStack, Quanta, rack scale, RAID, Redhat Linux, SAS, SDDC, SDN, SDS, Serial Attached SCSI, software-defined datacenter, software-defined networks, software-defined storage, solid state drive, SSD, Tencent, TYAN, United Stack, virtual machine, VM, Wistron
High Availability (HA) systems traditionally have been confined to large datacenters because of their high cost and the difficulty of scaling down clustered servers and shared storage arrays to support smaller environments such as Small Office/Home Office (SOHO) and Remote Office/Branch Office (ROBO).
Microsoft and LSI are changing that.
As part of Windows Server® 2012, Microsoft and LSI collaborated on the development of the innovation called Cluster in a Box (CiB). With CiB, HA systems are now available for SOHO and ROBO applications. At AIS, we’re demonstrating our Syncro® CS High Availability controller in a clustered server system.
Our demo shows how Syncro CS, with its easy-to-deploy yet powerful automatic failover, helps protect and provide cost-effective continuous access and availability to your valuable data. The solution now supports both Linux® and Windows® OS environments.
Last June, we launched Syncro CS solutions with a demo at the Microsoft® Tech Ed Conference. The demo featured a Syncro CS discrete server cluster using two servers, two Syncro CS controllers and a JBOD. Each server was loaded with Windows Server 2012 in a cluster. Syncro CS controllers enabled the shared storage. The entire system was interconnected with a “backbone communications” system provided through a SAS interface. Each server was running Windows Server 2012 Hyper-V with a virtual machine (VM) housing a Counterstrike 1.6 server. Basically, the Counterstrike server was built into a Syncro CS High Availability server cluster. Clients accessed the game with Microsoft Surface Tablets.
When one of the servers was turned off, the automatic failover engaged and the tablet users never were aware of the “failure.”
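The behavior the demo shows can be sketched as a toy model: two servers, one set of shared storage, and a rule that the surviving server takes over when the active one goes down. This is only an illustration of the general failover idea. It is not how Syncro CS or Windows Server failover clustering is implemented, and every class and method name here is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One server in the cluster (hypothetical model, not a real API)."""
    name: str
    healthy: bool = True


@dataclass
class TwoNodeCluster:
    """Two servers that both see the same shared storage."""
    nodes: list[Node]
    shared_storage: dict[str, str] = field(default_factory=dict)
    active_index: int = 0

    def _ensure_active(self) -> Node:
        """Return a healthy node, failing over if the active one is down."""
        if not self.nodes[self.active_index].healthy:
            for index, node in enumerate(self.nodes):
                if node.healthy:
                    self.active_index = index
                    break
            else:
                raise RuntimeError("no healthy node available")
        return self.nodes[self.active_index]

    def write(self, key: str, value: str) -> str:
        """Store a value and report which server handled the request."""
        node = self._ensure_active()
        self.shared_storage[key] = value
        return node.name

    def read(self, key: str) -> tuple[str, str]:
        """Fetch a value and report which server handled the request."""
        node = self._ensure_active()
        return node.name, self.shared_storage[key]


if __name__ == "__main__":
    cluster = TwoNodeCluster([Node("server-a"), Node("server-b")])
    print(cluster.write("score", "10"))   # handled by server-a
    cluster.nodes[0].healthy = False      # "turn off" the first server
    print(cluster.read("score"))          # served by server-b from shared storage
```

Because the data lives in storage that both servers can reach, the client still gets its answer after the first server is turned off. Only the server that answers changes.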
Our AIS demo uses RHEL 6.4 Linux as the native OS running the KVM hypervisor with a Windows Server 2012 VM. Microsoft Surface Tablets access the Counterstrike server housed in the Windows VM.
The demonstration highlights the option for administrators to use Linux as the native operating system for each server while running Windows applications in HA architectures.
At AIS, we’re also excited about our panel discussion “Delivering a Paradigm of High Availability” featuring several industry experts and thought leaders. On the panel are Michael Steineke, VP Information Technology, Edgenet; Trenton Baker, VP Business Development, DataOn; Gene Lee, CEO, EchoStreams; John Loveall, Principal Program Manager, Microsoft Windows Server, Microsoft Corporation; Greg Kleiman, Director Strategy, Storage Business Unit, Red Hat; and Rick Reisner, Product Line Director, Datacenter Solutions Group, LSI.
The panel will discuss market needs for HA storage, offer their perspectives on product deployment, and discuss potential future HA use cases and product developments.
In preparation for the development of Windows Server® 2012, Microsoft polled customers and found that features that make high availability easier to configure and more affordable are critical. Little wonder. The features are pennies from heaven to the vast universe of smaller IT shops that often have found traditional high-availability solutions too expensive and difficult to install and maintain.
In a recent video, John Loveall, principal program manager for the Windows Server Division of Microsoft, discusses how Microsoft® Windows Server 2012 and the LSI® Syncro™ CS solution can make it easier for organizations of all sizes to deploy high availability.
While large organizations remain a vital proving ground for new breeds of computer gear, Loveall sees small businesses, branch offices and private cloud environments using high-availability systems as a window into the future of server technology.
When I am out on the road in Europe, visiting customers and partners, one common theme that comes up on a daily basis is that high-availability systems are essential to nearly all businesses regardless of size or industry. Sadly, all too often we see what can happen when systems running business-critical applications such as transaction processing, Web servers or electronic commerce are not accessible – potentially lost revenue and lost productivity, leading to dramatically downward-spiralling customer satisfaction.
To reduce this risk, the industry focus has been on achieving the best level of high availability, and for the enterprise market segment this has often meant installing and running storage area network (SAN) solutions. SANs can offer users a complete package – scalability, performance, centralised management and the all-important uptime or high availability.
Drawbacks of SAN
But for all its positives, the SAN also has its downsides. To ensure continuous application availability, server clustering and shared-node connections that build redundancy into a cluster and eliminate single points of failure are crucial. The solution is not only extremely complex, it can have a hefty price tag, amounting to tens of thousands of dollars, and can be hard for many smaller to medium-sized businesses to afford.
When considering budgets and storage needs, many businesses have shied away from investing in a SAN and opted for a far simpler direct attached storage (DAS) solution – mainly because it can be far easier to implement and considerably cheaper. Historically, however, the biggest problem with this was that DAS could not offer high availability, and recovery from a server or storage failure could take several hours or even days.
Combining the simplicity of DAS with the high availability of SAN storage
As businesses work to reduce storage costs, simplify deployment, and increase agility and uptime in the face of massive data growth, storage architects are often looking for a way to combine the best of both worlds: the simplicity of DAS storage and the high availability of SAN storage. The goal for many is to create a system that is not only cheaper than a regular SAN but also offers full redundancy, reduces management complexity and keeps the business running if a server goes down.
LSI has pioneered an HA-DAS solution, Syncro™ CS, that costs approximately 30% less than traditional HA entry-level SAN solutions, depending on the solution/configuration. It reduces complexity by providing fully redundant, shared-node storage and application failover, without requiring storage networking hardware. Syncro CS solutions are also designed to reduce latency compared to SAN-based solutions, helping to accelerate storage I/O performance and speed applications.
The good news for businesses that rely on DAS is that they now have an option, Syncro CS, to upgrade their DAS infrastructure more easily and help achieve high availability, with easier management and lower cost. The result is a much simpler failover solution that provides more affordable business continuity and reduces downtime.