Open Compute and OpenStack are changing the datacenter world that we know and love. I thought they were having impact. Changing our OEMs and ODM products, changing what we expect from our vendors, changing the interoperability of managing infrastructure from different vendors. Changing our ability to deploy and manage grid and scale-out infrastructure. And changing how quickly and at what high level we can be innovating. I was wrong. It’s happening much more quickly than I thought.
On November 20-21 we hosted LSI AIS 2013. As I mentioned in a previous post, I was lucky enough to moderate a panel about Open Compute and OpenStack – “the perfect storm.” Truthfully? It felt more like sitting with two friends talking about our industry over beer. I hope to pick up that conversation again someday.
The panelists were awesome: Cole Crawford of Open Compute and Chris Kemp of OpenStack. These guys are not only influential. They have been involved from the very start of these two initiatives, and are in many ways key drivers of both movements. These are impressive, passionate guys who really are changing the world. There aren’t too many of us who can claim that. It was an engaging hour that I learned quite a bit from, and I think the audience did too. I wanted to share from my notes what I took away from that panel. I think you’ll be interested.
Goals and Vision: two “open source” initiatives
There were a few motivations behind Open Compute, and the goal was to improve these things.
The goal then, for the first time, is to work backwards from workload and create open source hardware and infrastructure that is openly available and designed from the start for large scale-out deployments. The idea is to drive high efficiency in cost, materials use and energy consumption. More work/$.
One surprising thing that came up – LSI is in every current contribution in Open Compute.
OpenStack layers services that describe abstractions of compute, networking and storage. LSI products tend to sit at that lowest level of abstraction, where there is now a wave of innovation. OpenStack had similar fragmentation issues to deal with and its goals are something like:
There is a certain amount of compatibility with Amazon’s cloud services. Chris’s point was that Amazon is incredibly innovative and a lot of enterprises should use it, but OpenStack enables both service providers and private clouds to compete with Amazon, and it allows unique innovation to evolve on top of it.
OpenStack and Open Compute are not products. They are “standards” or platform architectures, with companies using those standards to innovate on top of them. The idea is for one company to innovate on another’s improvements – everybody building on each other’s work. A huge brain trust. The goal is to create a competitive ecosystem with a rapid pace of innovation, and to enable large-scale, inexpensive infrastructure that a small team of people can manage like a single server to solve massive-scale problems.
Here’s their thought. Hardware is a supply chain management game + services. Open Compute is an opportunity for anyone to supply that infrastructure. And today, OEMs are killer at that. But maybe ODMs can be too. Open Compute allows innovation on top of the basic interoperable platforms. OpenStack enables a framework for innovation on top as well: security, reliability, storage, network, performance. It becomes the enabler for innovation, and it provides an “easy” way for startups to plug into a large, vibrant ecosystem. And for customers – someone said it’s “exa data without exadollar”…
As a result, the argument is this should be good for OEMs and ISVs, and help create a more innovative ecosystem and should also enable more infrastructure capacity to create new and better services. I’m not convinced that will happen yet, but it’s a laudable goal, and frankly that promise is part of what is appealing to LSI.
Open Compute and OpenStack are “peanut butter and jelly”
Ok – if you’re outside of the US, that may not mean much to you. But if you’ve lived in the US, you know that means they fit perfectly, and make something much greater together than their humble selves.
Graham Weston, Chairman of the Rackspace Board, was the one who called these two “peanut butter and jelly.”
Cole and Chris both felt the initiatives are co-enabling, and probably co-travelers too. Sure they can and will deploy independently, but OpenStack enables the management of large scale clusters, which really is not easy. Open Compute enables lower cost large-scale manageable clusters to be deployed. Together? Large-scale clusters that can be installed and deployed more affordably, and easily without hiring a cadre of rare experts.
Personally? I still think they are both a bit short of being ready for “prime time” – or broad deployment, but Cole and Chris gave me really valid arguments to show me I’m wrong. I guess we’ll see.
US or global vision?
I asked if these are US-centric or global visions. There were no qualms – these are global visions. This is just the 3rd anniversary of OpenStack, but even so, there are OpenStack organizations in more than 100 countries, 750 active contributors, and large-scale deployments in datacenters that you probably use every day – especially in China and the US. Companies like PayPal and Yahoo, Rackspace, Baidu, Sina Weibo, Alibaba, JD, and government agencies and HPC clusters like CERN, NASA, and China Defense.
Open Compute is even younger – about 2 years old. (I remember – I was invited to the launch). Even so, most of Facebook’s infrastructure runs on Open Compute. Two Wall Street banks have deployed large clusters, with more coming, and Riot Games, which uses Open Compute infrastructure, drives 3% of the global network traffic with League of Legends. (A complete aside – one of my favorite bands to work out with did a lot of that game’s music, and the live music at the League of Legends competition a few months ago: http://www.youtube.com/watch?v=mWU4QvC09uM – not for everyone, but I like it.)
Both Cole and Chris emailed me more data after the fact on who is using these initiatives. I have to say – they are right. It really has taken off globally, especially OpenStack in the fast-paced Chinese market this year.
Book: 4th Paradigm – A tribute to computer science researcher Jim Gray
Cole and Chris mentioned a book during the panel discussion. A book I had frankly never heard of. It’s called the 4th Paradigm. It was a series of papers dedicated to researcher Jim Gray, who was a quiet but towering figure that I believe I met once at Microsoft Research. The book was put together by Gordon Bell, someone who I have met, and have profound respect for. And there are mentions of people, places, and things that have been woven through my (long) career. I think I would sum up its thesis in a quote from Jim Gray near the start of the book:
“We have to do better producing tools to support the whole research cycle – from data capture and data curation to data analysis and data visualization.”
This is stunningly similar to the very useful big data framework we have been using recently at LSI: ”capture, hold, analyze”… I guess we should have added visualize, but that doesn’t have too much to do with LSI’s business.
As an aside, I would recommend this book for the background and inspiration in why we as an industry are trying to solve many of these computer science problems, and how transformational the impact might be. I mean really transformational in the world around us, what we know, what we can do, and how quickly we can do it – which is tightly related to our CEO’s keynote and the vision video at AIS.
Demos at AIS: “peanut butter and jelly” - and bread?
Ok – I’m struggling for analogy. We had an awesome demo at AIS that Chris and Cole pointed out during the panel. It was originally built using Nebula’s TOR appliance, Open Compute hardware, and LSI’s storage magic to make it complete. The three pieces coming together. Tasty. The Open Compute hardware was swapped out at the last minute (for safety: those boxes were meant for the datacenter, not a showcase in a hotel with tipsy techies), and the replacement servers were generously supplied by Supermicro.
I don’t think the proto was close to any one of our visions, but even as it stood, it inspired a lot of people, and would make a great product. A short rack of servers, with pooled storage in the rack, OpenStack orchestrating the point and click spawning and tear down of dynamically sized LUNs of different characteristics under the Cinder presentation layer, and deployment of tasks or VMs on them.
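To make the idea of carving dynamically sized volumes out of pooled rack storage a little more concrete, here is a minimal toy model. It is only a sketch: it is not the Cinder API and not how the demo was built, and the `StoragePool` class, its method names and the tier labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Toy model of pooled rack storage (hypothetical, not the Cinder API)."""
    capacity_gb: int
    volumes: dict = field(default_factory=dict)  # name -> (size_gb, tier)

    @property
    def free_gb(self) -> int:
        # Free space is whatever has not been handed out as volumes.
        return self.capacity_gb - sum(size for size, _ in self.volumes.values())

    def create_volume(self, name: str, size_gb: int, tier: str) -> None:
        """Carve a volume of the requested size and tier out of the pool."""
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        if size_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB free, {size_gb} GB requested")
        self.volumes[name] = (size_gb, tier)

    def delete_volume(self, name: str) -> None:
        """Tear a volume down and return its space to the pool."""
        del self.volumes[name]


if __name__ == "__main__":
    pool = StoragePool(capacity_gb=1000)
    pool.create_volume("database", 200, tier="flash")   # small, fast volume
    pool.create_volume("archive", 500, tier="hdd")      # large, cheap volume
    print("free after create:", pool.free_gb, "GB")
    pool.delete_volume("archive")
    print("free after teardown:", pool.free_gb, "GB")
```

In a real deployment the orchestration layer would make these calls on the user's behalf; the point here is only that volumes of different sizes and characteristics come and go against one shared pool.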
We’re working on completing our joint vision. I think the industry will be very impressed when they see it. Chris thinks people will be stunned, and the industry will be changed.
Catalyzing the market… The future may be closer than we think…
Ultimately, this is all about economics. We’re in the middle of an unprecedented bifurcation in IT use. On one hand we’re running existing apps on new, dense enterprise hardware using VMs to layer many applications on few servers. On the other, we’re investing in applications to run at scale across inexpensive clusters of commodity hardware. This has spawned a split in IT vendor business units, product lines and offerings, and sometimes even IT infrastructure management in the datacenter.
New applications and services need more infrastructure, and are getting more expensive to power, cool, purchase and run. And there is pressure to transform the datacenter from a cost center into a profit center. As these innovations start, more companies will need scale infrastructure, arguably Open Compute, and then will need an OpenStack framework to deploy it quickly.
What’s this mean? With a combination of big data and mobile device services driving economic value, we may be at the point where these clusters start to become mainstream. As an industry we’re already seeing a slight decline in traditional IT equipment sales and a rapid growth in scale-out infrastructure sales. If that continues, then OpenStack and Open Compute are a natural fit. The deployment rate uptick in life sciences, oil and gas, financials this year – really anywhere there is large-scale Hadoop, big data or analytics – may be the start of that growth curve. But both Chris and Cole felt it would probably take 5 years to truly take off.
Time to Wrap Up
I asked Chris and Cole for audience takeaways. Theirs were pretty simple, though possibly controversial in an industry like ours.
Hardware vendors should think about products and how they interface and what abstractions they present and how they fit into the ecosystem. These new ecosystems should allow them to easily plug in. For example, storage under Cinder can be quickly and easily morphed – that’s what we did with our demo.
We should be designing new software to run on distributed scale-out systems in clouds. Chris went on to say their code name was “Maestro” because it orchestrates like in a symphony, bringing things together in a beautiful way. He said “make instruments for the artists out there.” The brain trust. Look for their brushstrokes.
Innovate in the open, and leverage the open initiatives that are available to accelerate innovation and efficiency.
On your next IT purchase, try an RFP with an Open Compute vendor. Cole said you might be surprised. Worst case, you may get a better deal from your existing vendor.
So, Open Compute and OpenStack are changing the datacenter world that we know and love. I thought these were having a quick impact, changing our OEMs and ODM products, changing what we expect from our vendors, changing the interoperability of managing infrastructure from different vendors, changing our ability to deploy and manage grid and scale-out infrastructure, and changing how quickly and at what high level we can be innovating. I was wrong. It’s happening much more quickly than even I thought.
Tags: AIS, Alibaba, Amazon, Baidu, big data, CERN, China, China Defense, Chris Kemp, Cole Crawford, datacenter, Facebook, Hadoop, HPC, IT infrastructure, JD, Jim Gray, NASA, Nebula, Networking, Open Compute, OpenStack, PayPal, Rackspace, Riot Games, scale-out cluster, Sina Weibo, Storage, Supermicro, Yahoo
Last week at LSI’s annual Accelerating Innovation Summit (AIS) the company took the wraps off a vision that should lead its technical direction for the next few years.
In his keynote, LSI CEO Abhi Talwalkar shared a video of three situations as they might evolve in the future:
I’ll focus on just one of these to show how LSI expects the future to develop. In the bicycle accident scenario, a businessman falls to the ground while riding a bicycle in a foreign country. Security cameras that have been upgraded to understand what they see notify an emergency services agency which sends an ambulance to the scene. The paramedic performs a retinal scan on the victim, using it to retrieve his medical records, including his DNA sequence, from the web.
The businessman’s wearable body monitoring system also communicates with the paramedic’s instruments to share his vital signs. All of this information is used by cloud-based computers to determine a course of action which, in the video, requires an injection that has been custom-tuned to the victim’s current situation, his medical history, and his genetic makeup.
That’s a pretty tall order, and it will require several advances in the state of the art, but LSI is using this and other scenarios to work with its clients and translate this vision into the products of the future.
What are the key requirements to make this happen? Talwalkar told the audience that we need to create a society that is supported by preventive, predictive and assisted analytics to move in a direction where the general welfare is assisted by all that the Internet and advanced computing have to offer. Since data is growing at an exponential rate, he argued that this will require the instant retrieval of interlinked data objects at scale. Everything that is key to solving the task must be immediately available, and must be quickly analyzed to provide a solution to the problem at hand. The key will be the ability to process interlinked pieces of data that have not been previously structured to handle any particular situation.
To achieve this we will need larger-scale computing resources than are currently available, all closely interconnected and all operating at very high speeds. LSI hopes to tap into these needs through its strengths in networking and communications chips for communications, its HDD and server and storage connectivity array chips and boards for large-scale data, and its flash controller memory and PCIe SSD expertise for high performance.
LSI brought to AIS several of the customers and partners it is working with to develop these technologies. Speakers from Intel, Microsoft, IBM, Toshiba, Ericsson and others showed how they are working with LSI’s various technologies to improve the performance of their own systems. On the exhibition floor, booths from LSI and many of its clients demonstrated new technologies that performed everything from high-speed stock market analysis to fast flash management.
It’s pretty exciting to see a company that has a clear vision of its future and is committed to moving its entire ecosystem ahead to make that happen and help companies manage their business more effectively during what LSI calls the “Datacentric Era.” LSI has certainly put a lot of effort into creating a vision and determining where its talents can be brought to bear to improve our lives in the future.
Tags: Abhi Talwalkar, AIS, chips, communications, connectivity, data, Datacentric Era, Ericsson, flash, flash memory, hard disk drive, HDD, IBM, Intel, large-scale data, Microsoft, Networking, server, Storage, Toshiba
The problem with multicore processors isn’t that they have a lot of cores. I hope my IC designer colleagues don’t jump me when I say that having more than one core on a chip is a simple matter of cut and paste. The tricky part is getting all those cores to work together – a coordinated, efficient effort is key. After all, if it were enough for the cores to work independently, we would just use multiple single-core processors. To be sure, the devil is in the details of connecting cores and managing how they share resources.
A key value of a multicore processor is using the processing muscle of additional cores – all working on a problem at the same time – to accelerate system performance. Basically, two heads are better than one. And 16 are even better. That is if they don’t get in each other’s way. When multiple cores are working on one job, they need to deftly hand off information to each other and to other on-chip resources like memory and I/O. Managing and streamlining the movement of all that information to minimize delays can require complex traffic management. If one core or another resource becomes a bottleneck, the entire performance benefit of multiple cores can be lost.
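One rough way to see how a bottleneck erodes the benefit of extra cores is Amdahl's law. The post does not name it, but it captures the same idea: whatever part of the job is serialized on a shared resource caps the overall speedup. The sketch below assumes a hypothetical job with a given serialized fraction; the fractions are illustrative, not measurements of any real processor.

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Ideal speedup on `cores` cores when `serial_fraction` of the work
    cannot run in parallel (for example, it waits on a shared resource)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions only; real workloads have to be measured.
    for serial in (0.0, 0.05, 0.25, 0.50):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(n, serial):.2f}x" for n in (2, 4, 16)
        )
        print(f"serialized {serial:.0%} -> {row}")
```

With nothing serialized, 16 cores give the ideal 16x. If half the work queues behind a single bottleneck, the same 16 cores deliver less than 2x, which is the sense in which most of the multicore benefit can be lost.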
The challenge of cache coherence
Another complexity of coordinating multiple cores is cache coherence – the process of ensuring the consistency of data stored in each processor’s cache memory. Processors store frequently accessed information in this small, fast memory so they don’t have to access it again and again from slower storage such as main memory or disks. For example, if a core is running an application for ordering products online, it might load the inventory record for a particular product from disk into cache, modify it, and then write it back to disk when the transaction is complete.
The rub arises when more than one core caches the same data. If two cores were running the online ordering application, they might both cache the same inventory record. Both cores might then execute a transaction to sell the last unit of that product and not detect that the product is sold out. In a system with coherent cache, when one core makes any changes to cached data, all other cores storing the same data are notified that their cache is outdated, prompting an update for consistency. Tracking all cached data and making sure it is coherent is a formidable effort requiring highly sophisticated cache management.
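The inventory example can be reproduced with a small toy simulation. This is a sketch under loud assumptions: the `Core` and `Memory` classes are hypothetical, the caches are write-through, and coherence is modeled as a simple invalidate-on-write broadcast. Real processors use hardware protocols (MESI and its relatives, with snooping or directories) that are far more involved.

```python
class Memory:
    """Shared backing store (stands in for main memory or disk)."""

    def __init__(self):
        self.data = {}


class Core:
    """Toy core with a private cache; `bus` lists the cores kept coherent."""

    def __init__(self, name, memory, bus=None):
        self.name = name
        self.memory = memory
        self.cache = {}
        self.bus = bus
        if bus is not None:
            bus.append(self)

    def read(self, key):
        if key not in self.cache:            # miss: fetch from shared memory
            self.cache[key] = self.memory.data[key]
        return self.cache[key]               # hit: may be stale if incoherent

    def write(self, key, value):
        self.cache[key] = value
        self.memory.data[key] = value        # write-through, for simplicity
        if self.bus is not None:
            for other in self.bus:
                if other is not self:
                    other.cache.pop(key, None)   # invalidate stale copies


def sell_last_unit(core, item):
    """Sell one unit if the core believes stock remains."""
    stock = core.read(item)
    if stock > 0:
        core.write(item, stock - 1)
        return True
    return False


def run(coherent):
    memory = Memory()
    memory.data["widget"] = 1                # one unit left in inventory
    bus = [] if coherent else None
    core_a = Core("A", memory, bus)
    core_b = Core("B", memory, bus)
    core_a.read("widget")                    # both cores cache the record
    core_b.read("widget")
    return sell_last_unit(core_a, "widget"), sell_last_unit(core_b, "widget")


if __name__ == "__main__":
    print("without coherence:", run(coherent=False))
    print("with coherence:   ", run(coherent=True))
```

Without invalidation both sales succeed and the last unit is sold twice. With invalidation, core B's stale copy is discarded, its next read misses and fetches the updated count, and the second sale is refused.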
A third challenge in getting multicore design right is choosing the number and type of cores. Networking system workloads consist of varying tasks. Some are large complex tasks that require powerful general-purpose cores running complex programs. Others are very simple, quick tasks that are executed millions of times a second and are best handled by specialized compute engines. And of course there are tasks that fall between these extremes. Getting the right number and mix of compute engines requires detailed understanding of the applications the multicore processor will be used in. Too many cores and the processor consumes too much power. Too few of one type of core and the others sit idle wasting cost and, again, power.
Striking the right balance of interconnect, cache coherence and cores
The problem with multicore processors is getting the right combination of interconnect, cache coherence and number and type of cores. LSI’s latest solution to the multicore challenge for enterprise networking is the Axxia® 4500 family of processors. For general-purpose processing, the Axxia 4500 features up to 4 ARM® Cortex™-A15 cores that deliver high performance and power efficiency in a standard Linux programming environment. For special-purpose packet processing, the new chips offer up to 50Gb/s packet processing and acceleration engines for security encryption, deep packet inspection, traffic management and other networking functions. Connecting all these compute resources is the ARM CoreLink CCN-504 interconnect with integrated cache coherence and quality of service technologies for efficient on-chip communications.
The staggering growth of smart phones, tablets and other mobile devices is sending a massive flood of data through today’s mobile networks. Compared to just a few years ago, we are all producing and consuming far more videos, photos, multimedia and other digital content, and spending more time in immersive and interactive applications such as video and other games – all from handheld devices.
Think of mobile, and you think remote – using a handheld when you’re out and about. But according to the Cisco® VNI Mobile Forecast 2013, while 75% of all videos today are viewed on mobile devices, by 2017 46% of mobile video will be consumed indoors (at home, at the office, at the mall and elsewhere). With the widespread implementation of IEEE® 802.11 WiFi on mobile devices, much of that indoor video traffic will be routed through fixed broadband pipes.
Unlike residential indoor solutions, enterprise and public area access infrastructures – for outdoor connections – are much more diverse and complicated. For example, the current access layer architectures include Layer 2/3 wiring closet switches and WiFi access points, as shown below. Mobile service providers are currently seeking architectures that enable them to take advantage of both indoor enterprise and public area access infrastructure. These architectures must integrate seamlessly with existing mobile infrastructures and require no investment in additional access equipment by service providers in order for them to provide a consistent, quality experience for end users indoors and outdoors. For their part, mobile service providers must:
The following figure shows the three possible paths that mobile service providers wanting to offer indoor enterprise/public area access can take. Approach 1 is ideal for enterprises trying to improve coverage in particular areas of a corporate campus. Approaches 2 and 3 not only provide uniform coverage across the campus but also support differentiating capabilities such as the allocation of application and mobility-centric radio spectrum across WiFi and cellular frequencies. A key factor to consider when evaluating these approaches is the extent to which equipment ownership is split between the enterprise and the mobile service provider. Approaches 2 and 3 increase capital expenditures for the operator because of the radio heads and small cells that need to be deployed across the enterprise or public campus. At AIS, LSI is demonstrating approach 2.
The last but certainly not least important consideration between approaches 2 and 3 is whether these indoor/outdoor small cells employ self-organizing network (SON) techniques. For service providers, the small cells ideally would be self-organizing and the macro cells serve any additional management functions. The advantage of approach 3 is that it offloads more of the macro cell traffic and makes various campus small cells self-organizing, significantly reducing operational costs for the service provider.
The United Nations finding that mobile broadband subscriptions are surging in developing countries, reported by The New York Times on Sept. 26, is no surprise. Equally unsurprising, the growing number of users, density of users and increasing bandwidth needs of applications likely are continuing to strain existing wireless networks and per-user bandwidths not only in developing countries but worldwide.
But rising pressure on bandwidth, coupled with increasingly data-intensive applications, isn’t the whole story. Minimizing end-to-end latency – from user to network base station and back again – is crucial in enabling banking, e-commerce, enterprise and other important business applications. Why? The greater the latency, the more sluggish a website feels and the more likely visitors are to lose interest. A connection may have plenty of throughput over a period of time, but response time determines the user experience.
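A back-of-the-envelope model shows why throughput alone does not fix responsiveness. The sketch below assumes a deliberately simple formula, response time = round trips × round-trip latency + payload ÷ bandwidth, and all the numbers (page size, link speed, latencies, number of round trips) are hypothetical values chosen for illustration.

```python
def response_time_s(payload_bytes: int, bandwidth_bps: int,
                    rtt_s: float, round_trips: int = 1) -> float:
    """Rough response time: latency paid once per round trip, plus the
    time to push the payload through the link at the given bandwidth."""
    transfer_s = payload_bytes * 8 / bandwidth_bps
    return round_trips * rtt_s + transfer_s


if __name__ == "__main__":
    page = 50_000            # hypothetical 50 KB response
    link = 20_000_000        # hypothetical 20 Mb/s connection
    trips = 4                # hypothetical handshakes and requests

    low = response_time_s(page, link, rtt_s=0.050, round_trips=trips)
    high = response_time_s(page, link, rtt_s=0.300, round_trips=trips)
    fat = response_time_s(page, 2 * link, rtt_s=0.300, round_trips=trips)

    print(f"50 ms latency, 20 Mb/s:  {low:.2f} s")
    print(f"300 ms latency, 20 Mb/s: {high:.2f} s")
    print(f"300 ms latency, 40 Mb/s: {fat:.2f} s")
```

Under these assumptions the high-latency connection takes roughly 1.22 s against 0.22 s for the low-latency one, and doubling its bandwidth saves only about a hundredth of a second. The response time is dominated by latency, not throughput.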
The bandwidth-per-user and end-to-end network latency constraints are bound to drive changes both to the front haul and backhaul access networks. LTE and WiFi seem to be clear winners for the front haul network (replacing wired LAN technologies). On the backhaul, given the capacity needs, wired and wireless networks are bound to converge but will likely offer many options that will continue to co-exist like LTE, Fiber, Cable, xDSL and Microwave.
For our part, LSI has deep experience building mission-critical networks for service providers and datacenters – an expertise that has been brought to bear on the development of LSI® Axxia® networking solutions. These smart chips help solve the latency problem by enabling reliable, deterministic network performance to, ultimately, quicken response times and improve the user experience.
And that, after all, is just what network providers and users are after as mobile devices continue to support more applications and rising performance expectations worldwide.
No, you are not about to read some Luddite rant about how smart phones are destroying our society. I love smart phones and most of you do too. It’s remarkable how quickly we have gone from arguing over the definition of a smart phone to not being able to live without them. In fact, the rapid adoption of smart phones has led to the problem I am going to talk about: smart phones can overwhelm dumb wireless networks.
Many of the networks that carry the wireless data to and from our smart phones are built with chips that were designed before Apple released the first iPhone® in June of 2007. It takes a year or two to get a new semiconductor chip designed and built. Then another year or two for network equipment manufacturers to get their products into the market. By the time that new equipment has been deployed into networks around the world, five or six years have passed since chip designers decided what features their networking chips would have.
Even the latest 4G networks are built with chips that were designed before Apple invited everyone to store their music libraries in the cloud and before Vine enabled every kid with a phone to create and distribute videos. Today’s networks were not designed with these wireless data applications in mind and they are struggling to keep up.
Making dumb networks smarter
The problem is proving hard to solve because data traffic is growing faster than the obvious ways to cope with it. Network operators can’t simply deliver more network capacity. Available spectrum is limited as is the capital to invest in expanded networks. The seemingly inevitable improvements in technology performance aren’t enough to solve the problem either. Demand for data traffic is growing faster than Moore’s law can answer. Doing more of the same thing or doing the same thing faster isn’t enough. Networking companies need to figure out new ways to handle data. We need to make dumb networks smarter.
When I say “dumb networks,” I am referring to the fact that most of the existing wireless data networks were designed to move a packet of data from point A to point B in a reasonably short time. That’s a fine approach when wireless networks can easily carry occasional stock updates and photo uploads from a few million early adopters. But now, when 90% of handset sales are iPhones or Android® phones, the networks have become overwhelmed with data. Treating data packets with equal importance – whether they are part of a VOIP phone call, business critical data or the 40 thousandth download of a cute panda video – doesn’t make sense anymore.
Prioritizing data for higher speed
As networks get smarter, they will be able to triage data – for example, identifying voice packets to maintain call quality. Smart networks will know if the same video has been downloaded 5 times in the last minute, and will store it locally to speed the next download. Smart networks will know if a business user has contracted for a guaranteed level of service and prioritize those packets accordingly. Smart networks will know if an application update can wait until times of the day when the volume of network traffic is lower. Smart networks will know if a flow of packets contains virus software that could damage your phone or the network itself.
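The triage idea can be sketched as a priority queue. This is only an illustration: the traffic classes, their ordering and the `TriageQueue` class are hypothetical, and real networking silicon classifies and schedules packets in hardware with far richer policies.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical traffic classes; a lower number is served first.
PRIORITY = {"voice": 0, "business_sla": 1, "default": 2, "app_update": 3}


@dataclass(order=True)
class QueuedPacket:
    priority: int
    seq: int                              # arrival order breaks ties (FIFO)
    kind: str = field(compare=False)
    payload: str = field(compare=False)


class TriageQueue:
    """Serve packets by class priority, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def enqueue(self, kind: str, payload: str) -> None:
        # Unknown traffic is treated as best-effort "default".
        priority = PRIORITY.get(kind, PRIORITY["default"])
        heapq.heappush(
            self._heap, QueuedPacket(priority, next(self._seq), kind, payload)
        )

    def dequeue(self) -> QueuedPacket:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = TriageQueue()
    queue.enqueue("panda_video", "cute panda clip")      # falls into "default"
    queue.enqueue("app_update", "game patch")
    queue.enqueue("voice", "VOIP frame")
    queue.enqueue("business_sla", "contracted business data")
    while queue:
        packet = queue.dequeue()
        print(packet.kind, "->", packet.payload)
```

A dumb network would forward these four packets in arrival order. Here the voice frame goes first, the contracted business data second, the video third, and the application update waits until last.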
To be smart about the data being transported, networks need a higher level of real-time analytical intelligence. We are now seeing the introduction of networking chips and equipment designed in the era of the smart phone. Networks are now gaining the ability to distinguish the nature of the data contained in a packet and to make smart decisions about the way the data is delivered. Networks are, in a word, becoming smarter – better able to manage the crush of data coursing through them every day. Smart networks may soon be able to stand up to smartphones, and perhaps even outwit them.