Big data: it’s the buzzword of the year and it’s generating a lot of attention. An incalculable number of articles fervently repeat the words “variety, velocity and volume,” citing click streams, RFID tags, email, surveillance cameras, Twitter® feeds, Facebook® posts, Flickr® images, blog musings, YouTube® videos, cellular texting, healthcare monitoring … (gasps for air). We have become a society that sweats buckets of data every day (the latest estimates are approximately 34GB per person every 24 hours) and businesses are scrambling to capture all this information to learn more about us.
Save every scrap of data!
“Save all your data” has become the new business mantra, because data – no matter how seemingly meaningless it appears – contains information, and information provides insight, and improved insight makes for better decision-making, and better decision-making leads to a more efficient and profitable business.
Okay, so we get why we save data, but if the electronic bit bucket costs become prohibitive, big data could turn into its own worst enemy, undermining the value of mining data. While Hadoop® software is an excellent (and cost-free) tool for storing and analyzing data, most organizations use a multitude of applications in conjunction with Hadoop to create a system for data ingest, analytics, data cleansing and record management. Several Hadoop vendors (Cloudera, MapR, Hortonworks, Intel, IBM, Pivotal) offer bundled software packages that ease integration and installation of these applications.
Installing a Hadoop cluster to manage big data can be a chore
With the demand for data scientists growing, the challenge can become finding the right talent to help build and manage a big data infrastructure. A case in point: Installing a Hadoop cluster involves more than just installing the Hadoop software. Here is the sequence of steps:
Setup, from bare bones to a simple 15-node cluster, can take weeks to months including planning, research, installation and integration. It’s no small job.
Appliances simplify Hadoop cluster deployments
Enter appliances: low-cost, pre-validated, easy-to-deploy “bricks.” According to a Gartner forecast (Forecast: Data Center Hardware Spending to Support Big Data Projects, Worldwide 2013), appliance spending for big data projects will grow from 0.9% of hardware spending in 2012 to 9.3% by 2017. I have found myself inside a swirl of new big data appliance projects all designed to provide highly integrated systems with easy support and fully tested integration. An appliance is a great turnkey solution for companies that can’t (or don’t wish to) employ a hardware and software installation team: Simply pick up the box from the shipping area, unpack it and start analyzing data within minutes. In addition, many companies are just beginning to dabble in Hadoop, and appliances can be an easy, cost-effective way to demonstrate the value of Hadoop before making a larger investment.
While Hadoop is commonplace in the big data infrastructure, the use models can be quite varied. I’ve heard my fair share of highly connected big data engineers attempt to identify core categories for Hadoop deployments, and they generally fall into one of four categories:
Finding the right appliance for you
While appliances lower the barrier to entry to Hadoop clusters, their designs and costs are as varied as their use cases. Some appliances build in the flexibility of cloud services, while others focus on integration of applications components and reducing service level agreements (SLAs). Still others focus primarily on low cost storage. And while some appliances are just hardware (although they are validated designs), they still require a separate software agreement and installation via a third-party vendor.
In general, pricing is usually quoted either by capacity ($/TB), or per node or rack depending on the vendor and product. Licensing can significantly increase overall costs, with annual maintenance costs (software subscription and support) and license renewals adding to the cost of doing business. The good news is that, with so many appliances to choose from, any organization can find one that enables it to design a cluster that fits its budget, operating costs and value expectations.
I was asked some interesting questions recently by CEO & CIO, a Chinese business magazine. The questions ranged from how Chinese Internet giants like Alibaba, Baidu and Tencent differ from other customers and what leading technologies big Internet companies have created to questions about emerging technologies such as software-defined storage (SDS) and software-defined datacenters (SDDC) and changes in the ecosystem of datacenter hardware, software and service providers. These were great questions. Sometimes you need the press or someone outside the industry to ask a question that makes you step back and think about what’s going on.
I thought you might be interested, so this blog, the first of a 3-part series covering the interview, shares details of the first two questions.
CEO & CIO: In recent years, Internet companies have built ultra large-scale datacenters. Compared with traditional enterprises, they also take the lead in developing datacenter technology. From an industry perspective, what are the three leading technologies of ultra large-scale Internet data centers in your opinion? Please describe them.
There are so many innovations and important contributions to the industry from these hyperscale datacenters in hardware, software and mechanical engineering. To choose three is difficult. While I would prefer to choose hardware innovations as their big ones, I would suggest the following as they have changed our world and our industry and are changing our hardware and businesses:
Autonomous behavior and orchestration
An architect at Microsoft once told me, “If we had to hire admins for our datacenter in a normal enterprise way, we would hire all the IT admins in the world, and still not have enough.” There are now around 1 million servers in Microsoft datacenters. Hyperscale datacenters have had to develop autonomous, self-managing, sometimes self-deploying datacenter infrastructure simply to expand. They are pioneering datacenter technology for scale – innovating, learning by trial and error, and evolving their practices to drive more work/$. Their practices are specialized but beginning to be emulated by the broader IT industry. OpenStack is the best example of how that specialized knowledge and capability is being packaged and deployed broadly in the industry. At LSI, we’re working with both hyperscale and orchestration solutions to make better autonomous infrastructure.
High availability at datacenter level vs. machine level
As systems get bigger they have more components and more modes of failure, and keeping them reliable gets more complex and expensive. As storage is used more, and more aggressively, drives tend to fail. They are simply being used more. And yet there is continued pressure to reduce costs and complexity. By the time hyperscale datacenters had evolved to massive scale – hundreds of thousands of servers in multiple datacenters – they had created solutions for absolute reliability, even as individual systems got less expensive, less complex and much less reliable. This is what has enabled the very low cost structures of the cloud, and made it a reliable resource.
These solutions are well timed too, as more enterprise organizations need to maintain on-premises data across multiple datacenters with absolute reliability. The traditional view that a single server requires 99.999% reliability is giving way to a more pragmatic view of maintaining high reliability at the macro level – across the entire datacenter. This approach accepts the failure of individual systems and components even as it maintains data center level reliability. Of course – there are currently operational issues with this approach. LSI has been working with hyperscale datacenters and OEMs to engineer improved operational efficiency and resilience, and minimized impact of individual component failure, while still relying on the datacenter high-availability (HA) layer for reliability.
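The arithmetic behind that shift can be sketched in a few lines. This is a minimal illustration, not a model of any real datacenter: it assumes machine failures are independent (real failures are often correlated through shared racks, power or software), and the function names are hypothetical. Under that assumption, three copies of the data on machines that are each only 99% available already beat a single "five nines" server.

```python
def combined_availability(machine_availability: float, replicas: int) -> float:
    """Chance that at least one of `replicas` copies is reachable.

    Assumption (not from the article): machines fail independently.
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    # All replicas must be down at once for the data to be unavailable.
    return 1.0 - (1.0 - machine_availability) ** replicas


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    for replicas in (1, 2, 3):
        a = combined_availability(0.99, replicas)
        print(replicas, "replica(s):", a, "availability,",
              round(downtime_minutes_per_year(a), 3), "min/year down")
```

The design point is that reliability moves from the component to the layer above it: cheaper machines are acceptable as long as the replication layer can absorb their failures.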
Big data
It’s such an overused term. It’s difficult to believe the term barely existed a few years ago. The gift of Hadoop® to the industry – an open source attempt to copy Google® MapReduce and Google File System – has truly changed our world unbelievably quickly. Today, Hadoop and the other big data applications enable search, analytics, advertising, peta-scale reliable file systems, genomics research and more – even services like Apple® Siri run on Hadoop. Big data has changed the concept of analytics from statistical sampling to analysis of all data. And it has already enabled breakthroughs and changes in research, where relationships and patterns are looked for empirically, rather than based on theories.
Overall, I think big data has been one of the most transformational technologies this century. Big data has changed the focus from compute to storage as the primary enabler in the datacenter. Our embedded hard disk controllers, SAS (Serial Attached SCSI) host bus adaptors and RAID controllers have been at the heart of this evolution. The next evolutionary step in big data is the broad adoption of graph analysis, which integrates the relationship of data, not just the data itself.
CEO & CIO: Due to cloud computing, mobile connectivity and big data, the traditional IT ecosystem or industrial chain is changing. What are the three most important changes in LSI’s current cooperation with the ecosystem chain? How does LSI see the changes in the various links of the traditional ecosystem chain? What new links are worth attention? Please give some examples.
Cloud computing and the explosion of data driven by mobile devices and media have changed, and continue to change, our industry and ecosystem contributors dramatically. It’s true the enterprise market (customers, OEMs, technology, applications and use cases) has been pretty stable for 10-20 years, but as cloud computing has become a significant portion of the server market, it has increasingly affected ecosystem suppliers like LSI.
Timing: It’s no longer enough to follow Intel’s tick-tock product roadmap. Development cycles for datacenter solutions used to be 3 to 5 years. But these cycles are becoming shorter. Now, demand for solutions is closer to 6 months – forcing hardware vendors to plan and execute to far tighter development cycles. Hyperscale datacenters also need to be able to expand resources very quickly, as customer demand dictates. As a result they incorporate new architectures, solutions and specifications out of cycle with the traditional Intel roadmap changes. This has also disrupted the ecosystem.
End customers: Hyperscale datacenters now have purchasing power in the ecosystem, with single purchase orders sometimes amounting to 5% of the server market. While OEMs still are incredibly important, they are not driving large-scale deployments or innovating and evolving nearly as fast. The result is more hyperscale design-win opportunities for component or sub-system vendors if they offer something unique or a real solution to an important problem. This also may shift profit pools away from OEMs to strong, nimble technology solution innovators. It also has the potential to reduce overall profit pools for the whole ecosystem, which is a potential threat to innovation speed and re-investment.
New players: Traditionally, a few OEMs and ISVs globally have owned most of the datacenter market. However, the supply chain of the hyperscale cloud companies has changed that. Leading datacenters have architected, specified or even built (in Google’s case) their own infrastructure, though many large cloud datacenters have been equipped with hyperscale-specific systems from Dell and HP. But more and more systems built exactly to datacenter specifications are coming from suppliers like Quanta. Newer network suppliers like Arista have increased market share. Some new hyperscale solution vendors have emerged, like Nebula. And software has shifted to open source, sometimes supported for-pay by companies copying the Red Hat® Linux model – companies like Cloudera, Mirantis or United Stack. Personally, I am still waiting for the first 3rd-party hardware service emulating a Linux support and service company to appear.
Open initiatives: Yes, we’ve seen Hadoop and its derivatives deployed everywhere now – even in traditional industries like oil and gas, pharmacology, genomics, etc. And we’ve seen the emergence of open-source alternatives to traditional databases being deployed, like Cassandra. But now we’re seeing new initiatives like Open Compute and OpenStack. Sure, these are helpful to hyperscale datacenters, but they are also enabling smaller companies and universities to deploy hyperscale-like infrastructure and get the same kind of automated control, efficiency and cost structures that hyperscale datacenters enjoy. (Of course they don’t get fully there on any front, but it’s a lot closer). This trend has the potential to hurt OEM and ISV business models and markets and establish new entrants – even as we see Quanta, TYAN, Foxconn, Wistron and others tentatively entering the broader market through these open initiatives.
New architectures and new algorithms: There is a clear movement toward pooled resources (or rack scale architecture, or disaggregated servers). Developing pooled resource solutions has become a partnership between core IP providers like Intel and LSI with the largest hyperscale datacenter architects. Traditionally new architectures were driven by OEMs, but that is not so true anymore. We are seeing new technologies emerge to enable these rack-scale architectures (RSA) – technologies like silicon photonics, pooled storage, software-defined networks (SDN), and we will soon see pooled main memory and new nonvolatile main memories in the rack.
We are also seeing the first tries at new processor architectures about to enter the datacenter: ARM 64 for cool/cold storage and web tier and OpenPower P8 for high power processing – multithreaded, multi-issue, pooled memory processing monsters. This is exciting to watch. There is also an emerging interest in application acceleration: general-purpose computing on graphics processing units (GPGPUs), regular expression (regex) processors, live stream analytics, etc. We are also seeing the first generation of graph analysis deployed at massive scale in real time.
Innovation: The pace of innovation appears to be accelerating, although maybe I’m just getting older. But the easy gains are done. On one hand, datacenters need exponentially more compute and storage, and they need to operate 10x to 1000x more quickly. On the other, memory, processor cores, disks and flash technologies are getting no faster. The only way to fill that gap is through innovation. So it’s no surprise there are lots of interesting things happening at OEMs and ISVs, chip and solution companies, as well as open source community and startups. This is what makes it such an interesting time and industry.
Consumption shifts: We are seeing a decline in laptop and personal computer shipments, a drop that naturally is reducing storage demand in those markets. Laptops are also seeing a shift to SSD from HDD. This has been good for LSI, as our footprint in laptop HDDs had been small, but our presence in laptop SSDs is very strong. Smart phones and tablets are driving more cloud content, traffic and reliance on cloud storage. We have seen a dramatic increase in large HDDs for cloud storage, a trend that seems to be picking up speed, and we believe the cloud HDD market will be very healthy and will see the emergence of new, cloud-specific HDDs that are radically different and specifically designed for cool and cold storage.
There is also an explosion of SSD and PCIe flash cards in cloud computing for databases, caches, low-latency access and virtual machine (VM) enablement. Many applications that we take for granted would not be possible without these extreme low-latency, high-capacity flash products. But very few companies can make a viable storage system from flash at an acceptable cost, opening up an opportunity for many startups to experiment with different solutions.
Summary: So I believe the biggest hyperscale innovations are autonomous behavior and orchestration, HA at the datacenter level vs. machine level, and big data. These are radically changing the whole industry. And what are those changes for our industry and ecosystem? You name it: timing, end customers, new players, open initiatives, new architectures and algorithms, innovation, and consumption patterns. All that’s staying the same are legacy products and solutions.
These were great questions. Sometimes you need the press or someone outside the industry to ask a question that makes you step back and think about what’s going on. Great questions.
Hadoop has grown from an identity-challenged adolescent, a budding technology unsure of which use cases to call its own, to a fairly mature young adult with its most recent release of Hadoop® 2.0. Apache™ Hadoop® was introduced in 2007 with the primary intent to provide MapReduce-based batch processing for big data. While the original Hadoop certainly has made a big impact on how we use big data, it also had its limitations, chief among them:
YARN beefs up Hadoop for big data
Hadoop 2.0 overcomes these shortcomings. Apache’s newest software introduced the workload manager YARN (Yet Another Resource Negotiator) to replace the original MapReduce framework. YARN provides a better structure for running applications in Hadoop, making it more of a big data operating system. In the new framework, system resources are monitored by Node Managers and Application Masters. And instead of using slots, resources are dynamically allocated based on containers – cluster resources such as memory and processing times.
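To illustrate the difference between fixed slots and container-based allocation, here is a toy sketch. It is not YARN's scheduler (YARN ships its own pluggable schedulers), and the `Node` class, `allocate_containers` function and first-fit policy are all hypothetical names and choices made for illustration. The only idea it tries to show is that each request is sized by the resources it asks for, here memory in MB, instead of consuming one uniform slot.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A worker node with a fixed memory budget (hypothetical model)."""
    name: str
    memory_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.memory_mb - self.used_mb


def allocate_containers(
    nodes: List[Node], requests_mb: List[int]
) -> List[Tuple[int, Optional[str]]]:
    """First-fit placement: each request lands on the first node with room.

    Returns (request_mb, node_name) pairs; node_name is None when no node
    currently has enough free memory for that request.
    """
    placements: List[Tuple[int, Optional[str]]] = []
    for request in requests_mb:
        for node in nodes:
            if node.free_mb() >= request:
                node.used_mb += request  # reserve exactly what was asked for
                placements.append((request, node.name))
                break
        else:
            placements.append((request, None))  # would wait in a real system
    return placements


if __name__ == "__main__":
    cluster = [Node("n1", 4096), Node("n2", 4096)]
    for request, where in allocate_containers(cluster, [3072, 2048, 1024, 2048, 512]):
        print(request, "MB ->", where)
```

With fixed 2GB slots, the 3GB request above could not run at all and the 1GB request would waste half a slot; sizing containers to the request avoids both problems.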
While Hadoop still supports MapReduce, it’s now an add-on feature. Make no mistake: YARN is a game changer for Hadoop, allowing any distributed application to work within the Hadoop architecture. Many applications have already done this – HBase, Giraph, Storm and Tez just to name a few. With YARN providing more of an operating system layer for the Hadoop architecture, the use cases are limitless. Going forward, Hadoop may very well lay the foundation for more than just analytical batch jobs, enabling greater scalability and lower cost storage to add more oxygen for the growth of relational database management systems, data warehousing and cold storage.
Automated failover and almost limitless node scalability
With Hadoop 2.0 and the new HDFS 2 features, NameNode high availability with automated failover is a standard feature – almost guaranteeing uninterrupted service to the cluster. In addition, cluster Federation, a way of carving up the NameNode’s namespace, provides almost limitless node scalability.
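One way to picture "carving up the namespace" is a table that maps path prefixes to the NameNode responsible for them. The sketch below is a simplified illustration under that assumption, not HDFS code: the mount table contents, host names and the `resolve_namenode` function are hypothetical. Each NameNode only has to hold metadata for its own slice, which is why adding NameNodes lets the namespace keep growing.

```python
from typing import Dict

# Hypothetical mount table: path prefix -> NameNode that owns that subtree.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "namenode-a",
    "/data": "namenode-b",
    "/data/logs": "namenode-c",
}


def resolve_namenode(path: str, mount_table: Dict[str, str]) -> str:
    """Return the NameNode for `path` using the longest matching prefix."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/database" is not under "/data".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no NameNode is mounted for {path!r}")
    return mount_table[best]


if __name__ == "__main__":
    for p in ("/user/alice/report.csv", "/data/raw/part-0", "/data/logs/2014/01.log"):
        print(p, "->", resolve_namenode(p, MOUNT_TABLE))
```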
Other Hadoop 2.0 features include HDFS snapshots that allow point-in-time recovery of data, and enhanced security features that help ensure government compliance and authentication in multi-tenant clusters.
The ability to run so many parallel applications on top of YARN has given rise to a wide range of application data access patterns including streaming sequential for typical batch operations and low latency random for interactive queries. To accommodate this new, dizzying array of patterns, evolving datacenter infrastructures for big data will need to take advantage of a variety of hardware including spinning media, SSDs and various volatile and non-volatile memory architectures. Features such as HDFS-2832 and HDFS-4949 will give users the benefits of non-homogenous data hierarchies to help ensure the highest performance for applications such as real-time analytics processing or extract, transform and load (ETL) operations.
Hadoop 2.0 is easy to come by. Apache released its first general-availability version of Hadoop 2.0, called Hadoop 2.2, in mid-October, and within days Hortonworks released its Hortonworks Data Platform 2. Cloudera has been beta testing its CDH 5 version since November 2013, and MapR last week announced plans to release a YARN-based version in March.
Big data: more growth, greater efficiencies
The growing momentum around YARN and HDFS 2.0 promises to drive more growth and greater efficiencies in big data as more companies and open source projects build applications and toolsets that fuel more innovation. The broad availability of these tools will enable organizations of all sizes to derive deeper insight, enhance their competitiveness and efficiency and, ultimately, improve their profitability from the staggering amount of data available to them.
It’s the start of the new year, and it’s traditional to make predictions – right? But predicting the future of the datacenter has been hard lately. There have been and continue to be so many changes in flight that possibilities spin off in different directions. Fractured visions through a kaleidoscope. Changes are happening in the businesses behind datacenters, the scale, the tasks and what is possible to accomplish, the value being monetized, and the architectures and technologies to enable all of these.
A few months ago I was asked to describe the datacenter in 2020 for some product planning purposes. Dave Vellante of Wikibon & John Furrier of SiliconANGLE asked me a similar question a few weeks ago. 2020 is out there – almost 7 years. It’s not easy to look into the crystal ball that far and figure out what the world will look like then, especially when we are in the midst of those tremendous changes. For some context I had to think back 7 years – what was the datacenter like then, and how profound have the changes been over the past 7 years?
And 7 years ago, our forefathers…
It was a very different world. Facebook barely existed, and had just barely passed the “university only” membership. Google was using Velcro, Amazon didn’t have its services, cloud was a non-existent term. In fact DAS (direct attach storage) was on the decline because everyone was moving to SAN/NAS. 10GE networking was in the future (1GE was still in growth mode). Linux was not nearly as widely accepted in enterprise – Amazon was in the vanguard of making it usable at scale (with Werner Vogels saying “it’s terrible, but it’s free, as in free beer”). Servers were individual – no “PODs,” and VMware was not standard practice yet. SATA drives were nowhere in datacenters.
An enterprise disk drive topped out at around 200GB in capacity. Nobody used the term petabyte. People, including me, were just starting to think about flash in datacenters, and it was several years later that solutions became available. Big data did not even exist. Not as a term or as a technology, definitely not Hadoop or graph search. In fact, Google’s seminal paper on MapReduce had just been published, and it would become the inspiration for Hadoop – something that would take many years before Yahoo picked it up and helped make it real.
Analytics were statistical and slow, and you had to be very explicitly looking for something. Advertising on the web was a modest business. Cold storage was tape or MAID, not vast pools of cheap disks in the cloud at absurdly low price points. None of the Chinese web-cloud guys existed… In truth, at LSI we had not even started looking at or getting to know the web datacenter guys. We assumed they just bought from OEMs…
No one streamed mainstream media – TV and movies – and there were no tablets to stream them to. YouTube had just been purchased by Google. Blu-ray was just getting started and competing with HD-DVD (which I foolishly bought 7 years ago), and integrated GPS’s in your car were a high-tech growth area. The iPhone or Android had not launched, Danger’s Sidekick was the cool phone, flip phones were mainstream, there was no App store or the billions of sales associated with that, and a mobile web browser was virtually useless.
Dell, IBM, and HP were the only real server companies that mattered, and the whole industry revolved around them, as well as EMC and NetApp for storage. Cisco, Lenovo and Huawei were not server vendors. And Sun was still Sun.
7 years from now
So – 7 years from now? That’s hard to predict, so take this with a grain of salt… There are many ways things could play out, especially when global legal, privacy, energy, hazardous waste recycling, and data retention requirements come into play, not to mention random chaos and invention along the way.
Compute-centric to dataflow-centric
Major applications are changing (have changed) from compute-centric to dataflow architectures. That is big data. The result will probably be a decline in the influence of processor vendors, and the increased focus on storage, network and memory, and optimized rack-level architectures. A handful of hyperscale datacenters are leading the way, and dragging the rest of us along. These types of solutions are already being deployed in big enterprise for specialized use cases, and their adoption will only increase with time. In 7 years, the main deployment model will echo what hyperscale datacenters are doing today: disaggregated racks of compute, memory and storage resources.
The datacenter is now being viewed as a profit growth enabler, rather than a cost center. That implies more compute = more revenue. That changes the investment profile and the expectations for IT. It will not be enough for enterprise IT departments to minimize change and risk because then they would be slowing revenue growth.
Customers and vendors
We are in the early stages of a customer revolt. Whether it’s deserved or not is immaterial, though I believe it’s partially deserved. Large customers have decided (and I’m doing broad brush strokes here) that OEMs are charging them too much and adding “features” that add no value and burn power, that the service contracts are excessively expensive and that there is very poor management interoperability among OEM offerings – on purpose to maintain vendor lock-in. The cost structures of public cloud platforms like Amazon are proof there is some merit to the argument. Management tools don’t scale well, and require a lot of admin intervention. ISVs are seen as no better. Sure the platforms and apps are valuable and critical, but they’re really expensive too, and in a few cases, open source solutions actually scale better (though ISVs are catching up quickly).
The result? We’re seeing a push to use whitebox solutions that are interoperable and simple. Open source solutions – both software and hardware – are gaining traction in spite of their problems. Just witness the latest Open Compute Summit and the adoption rate of Hadoop and OpenStack. In fact many large enterprises have a policy that’s pretty much – any new application needs to be written for open source platforms on scale-out infrastructure.
Those 3 OEMs are struggling. Dell, HP and IBM are selling more servers, but at a lower revenue. Or in the case of IBM – selling the business. They are trying to upsell storage systems to offset those lost margins, and they are trying to innovate and vertically integrate to compensate for the changes. In contrast we’re seeing a rapid increase planned from self-built, self-architected hyperscale datacenters, especially in China. To be fair – those pressures on price and supplier revenue are not necessarily good for our industry. As well, there are newer entrants like Huawei and Cisco taking a noticeable chunk of the market, as well as an impending growth of ISV and 3rd party full rack “shrink wrapped” systems. Everybody is joining the party.
Storage, cold storage and storage-class memory
Stepping further out on the limb, I believe (but who really knows) that by 2020 storage as we know it is no longer shipping. SMB is hollowed out to the cloud – that is – why would any small business use anything but cloud services? The costs are too compelling. Cloud storage is stratified into 3 levels: storage-class memory, flash/NVM and cool/cold bulk disk storage. Cold storage is going to be a very, very important area. You need to save that data, but spend zero power, and zero $ on storing it. Just look at some of the radical ideas like Facebook’s Blu-ray jukebox to address that, which was masterminded by a guy I really like – Gio Coglitore – and I am very glad is getting some rightful attention. (http://www.wired.com/wiredenterprise/2014/02/facebook-robots/)
I believe that pooled storage class memory is inevitable and will disrupt high-performance flash storage, probably beginning in 2016. My processor architect friends and I have been daydreaming about this since 2005. That disruption’s OK, because flash use will continue to grow, even as disk use grows. There is just too much data. I’ve seen one massive vendor’s data showing average servers are adding something like 0.2 hard disks per year and 0.1 SSDs per year – and that’s for the average server including diskless nodes that are usually the most common in hyperscale datacenters. So growth in spite of disruption and capacity growth.
Data will be pooled, and connected by fabric as distributed objects or key/value pairs, with erasure coding. In fact, Object store (key/value – whatever) may have “obsoleted” block storage. And the need for these larger objects will probably also obsolete file as we’re used to it. Sure disk drives may still be block based, though key/value gives rise to all sorts of interesting opportunities to support variable size structures, obscure small fault domains, and variable encryption/compression without wasting space on disk platters. I even suspect that disk drives as we know them will be morphing into cold store specialty products that physically look entirely different and are made from different materials – for a lot of reasons. 15K drives will be history, and 10K drives may too. In fact 2” drives may not make sense anymore as the laptop drive and 15K drive disappear and performance and density are satisfied by flash.
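For readers unfamiliar with erasure coding, here is a deliberately simple sketch of the idea applied to a key/value object. It uses a single XOR parity fragment, which can rebuild only one lost fragment; production object stores typically use stronger codes such as Reed-Solomon that survive several losses. The function names and the fragment layout are assumptions made for illustration only.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(value: bytes, k: int) -> List[bytes]:
    """Split `value` into k zero-padded data fragments plus one parity fragment."""
    frag_len = -(-len(value) // k)  # ceiling division
    padded = value.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = fragments[0]
    for fragment in fragments[1:]:
        parity = xor_bytes(parity, fragment)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost fragment")
    if not missing:
        return list(fragments)
    survivors = [f for f in fragments if f is not None]
    rebuilt = survivors[0]
    for fragment in survivors[1:]:
        rebuilt = xor_bytes(rebuilt, fragment)
    repaired = list(fragments)
    repaired[missing[0]] = rebuilt
    return repaired


def decode(fragments: List[bytes], original_len: int) -> bytes:
    """Join the data fragments (all but the last) and strip the padding."""
    return b"".join(fragments[:-1])[:original_len]


if __name__ == "__main__":
    value = b"cold storage object"
    stored = encode(value, k=4)      # imagine 5 fragments on 5 different drives
    damaged = list(stored)
    damaged[2] = None                # one drive fails
    print(decode(recover(damaged), len(value)))
```

Here four data fragments plus one parity fragment cost 25% extra capacity, versus 200% extra for keeping three full copies, which is the kind of saving that makes erasure coding attractive for bulk and cold data.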
Enterprise becomes private cloud that is very similar structurally to hyperscale, but is simply in an internal facility. And SAN/NAS products as we know them will be starting on the long end of the tail as legacy support products. Sure new network based storage models are about to emerge, but they’re different and more aligned to key/value.
Rack-scale architectures will have taken over clustered deployments. That means pooled resources. Processing will be pools of single socket SoC servers enabling massive clusters, rather than lots of 2-socket servers. These SoCs might even be mobile device SoCs at some point or at least derived from that – the economics of scale and fast cadence of consumer SoCs will make that interesting, maybe even inevitable. After all, the current Apple A7 in the iPhone 5S is a dual core, 64-bit ARMv8 at 1.4GHz and the whole iPhone costs as much as mainstream server processor chips. In a few years, an 8 or 16 core equivalent at 1.5GHz or 2GHz is not hard to imagine, and the cost structure should be excellent.
Rapidly evolving open source applications will have morphed into eventually consistent dataflow tasks. Or they will be emerging in-memory applications working on vast data structures in the pooled storage class memory at the rack or larger scale, which will add tremendous monetary value to businesses. Whatever the evolutionary paths – the challenge for the next 10 years is optimizing dataflow as the amount used continues to exponentially grow. After all – data has value in aggregate, so why would you throw anything away, even as the amount we generate increases?
Clusters will be autonomous. Really autonomous. As in a new term I love: “emergent.” It’s when you can start using big data analytics to monitor the datacenter, and make workload/management and data placement decisions in real time, automatically, and the datacenter begins to take on unpredicted characteristics. Deployment will be autonomous too. Power on a pod of resources, and it just starts working. Google does that already.
Layer 2 datacenter network switches will either be disappearing or will have migrated to a radically different location in the rack hierarchy. There are many ways this can evolve. I’m not sure which one(s) will dominate, but I know it will look different. And it will have different bandwidth. 100G moving to 400G interconnect fabric over fiber.
So there you have it. Guaranteed correct…
Different applications and dataflow, different architectures, different processors, different storage, different fabrics. Probably even a re-alignment of vendors.
Predicting the future of the datacenter has not been easy. There have been, and are so many changes happening. The businesses behind them. The scale, the tasks and what is possible to accomplish, the value being monetized, and the architectures and technologies to enable all of these. But at least we have some idea what’s ahead. And it’s pretty different, and exciting.
Tags: 10 gigabit ethernet, 2020, Amazon, Apple, China, Cisco, cloud storage, cold storage, datacenter, Dell, EMC, Facebook, flash, Google, Hadoop, HP, Huawei, hyperscale datacenter, IBM, iPhone, kaleidoscope, Lenovo, NAS, NetApp, non-volatile memory, NVM, Open Compute, OpenStack, rack scale architecture, SAN, SoC, Sun, VMware, YouTube
Open Compute and OpenStack are changing the datacenter world that we know and love. I thought they were having impact. Changing our OEMs and ODM products, changing what we expect from our vendors, changing the interoperability of managing infrastructure from different vendors. Changing our ability to deploy and manage grid and scale-out infrastructure. And changing how quickly and at what high level we can be innovating. I was wrong. It’s happening much more quickly than I thought.
On November 20-21 we hosted LSI AIS 2013. As I mentioned in a previous post, I was lucky enough to moderate a panel about Open Compute and OpenStack – “the perfect storm.” Truthfully? It felt more like sitting with two friends talking about our industry over beer. I hope to pick up that conversation again someday.
The panelists were awesome: Cole Crawford of Open Compute and Chris Kemp of OpenStack. These guys are not only influential. They have been involved from the very start of these two initiatives, and are in many ways key drivers of both movements. These are impressive, passionate guys who really are changing the world. There aren’t too many of us who can claim that. It was an engaging hour that I learned quite a bit from, and I think the audience did too. I wanted to share from my notes what I took away from that panel. I think you’ll be interested.
Goals and Vision: two “open source” initiatives
There were a few motivations behind Open Compute, and the goal was to improve these things.
The goal then, for the first time, is to work backwards from workload and create open source hardware and infrastructure that is openly available and designed from the start for large scale-out deployments. The idea is to drive high efficiency in cost, materials use and energy consumption. More work/$.
One surprising thing that came up – LSI is in every current contribution in Open Compute.
OpenStack layers services that describe abstractions of computer networking and storage. LSI products tend to sit at that lowest level of abstraction, where there is now a wave of innovation. OpenStack had similar fragmentation issues to deal with and its goals are something like:
There is a certain amount of compatibility with Amazon’s cloud services. Chris’s point was that Amazon is incredibly innovative and a lot of enterprises should use it, but OpenStack enables both service providers and private clouds to compete with Amazon, and it allows unique innovation to evolve on top of it.
OpenStack and Open Compute are not products. They are “standards” or platform architectures, with companies using those standards to innovate on top of them. The idea is for one company to innovate on another’s improvements – everybody building on each other’s work. A huge brain trust. The goal is to create a competitive ecosystem and enable a rapid pace of innovation, and enable large-scale, inexpensive infrastructure that can be managed by a small team of people, and can be managed like a single server to solve massive scale problems.
Here’s their thought. Hardware is a supply chain management game + services. Open Compute is an opportunity for anyone to supply that infrastructure. And today, OEMs are killer at that. But maybe ODMs can be too. Open Compute allows innovation on top of the basic interoperable platforms. OpenStack enables a framework for innovation on top as well: security, reliability, storage, network, performance. It becomes the enabler for innovation, and it provides an “easy” way for startups to plug into a large, vibrant ecosystem. And for customers – someone said it’s “exa data without exadollar”…
As a result, the argument is this should be good for OEMs and ISVs, and help create a more innovative ecosystem and should also enable more infrastructure capacity to create new and better services. I’m not convinced that will happen yet, but it’s a laudable goal, and frankly that promise is part of what is appealing to LSI.
Open Compute and OpenStack are “peanut butter and jelly”
Ok – if you’re outside of the US, that may not mean much to you. But if you’ve lived in the US, you know that means they fit perfectly, and make something much greater together than their humble selves.
Graham Weston, Chairman of the Rackspace Board, was the one who called these two “peanut butter and jelly.”
Cole and Chris both felt the initiatives are co-enabling, and probably co-travelers too. Sure they can and will deploy independently, but OpenStack enables the management of large scale clusters, which really is not easy. Open Compute enables lower cost large-scale manageable clusters to be deployed. Together? Large-scale clusters that can be installed and deployed more affordably, and easily without hiring a cadre of rare experts.
Personally? I still think they are both a bit short of being ready for “prime time” – or broad deployment, but Cole and Chris gave me really valid arguments to show me I’m wrong. I guess we’ll see.
US or global vision?
I asked if these are US-centric or global visions. There were no qualms – these are global visions. This is just the 3rd anniversary of OpenStack, but even so, there are OpenStack organizations in more than 100 countries, 750 active contributors, and large-scale deployments in datacenters that you probably use every day – especially in China and the US. Companies like PayPal and Yahoo, Rackspace, Baidu, Sina Weibo, Alibaba, JD, and government agencies and HPC clusters like CERN, NASA, and China Defense.
Open Compute is even younger – about 2 years old. (I remember – I was invited to the launch). Even so, most of Facebook’s infrastructure runs on Open Compute. Two Wall Street banks have deployed large clusters, with more coming, and Riot Games, which uses Open Compute infrastructure, drives 3% of the global network traffic with League of Legends. (A complete aside – one of my favorite bands to workout with did a lot of that game’s music, and the live music at the League of Legends competition a few months ago: http://www.youtube.com/watch?v=mWU4QvC09uM – not for everyone, but I like it.)
Both Cole and Chris emailed me more data after the fact on who is using these initiatives. I have to say – they are right. It really has taken off globally, especially OpenStack in the fast-paced Chinese market this year.
Book: 4th Paradigm – A tribute to computer science researcher Jim Gray
Cole and Chris mentioned a book during the panel discussion. A book I had frankly never heard of. It’s called the 4th Paradigm. It is a series of papers dedicated to researcher Jim Gray, who was a quiet but towering figure that I believe I met once at Microsoft Research. The book was put together by Gordon Bell, someone whom I have met, and have profound respect for. And there are mentions of people, places, and things that have been woven through my (long) career. I think I would sum up its thesis in a quote from Jim Gray near the start of the book:
“We have to do better producing tools to support the whole research cycle – from data capture and data curation to data analysis and data visualization.”
This is stunningly similar to the very useful big data framework we have been using recently at LSI: “capture, hold, analyze”… I guess we should have added visualize, but that doesn’t have too much to do with LSI’s business.
As an aside, I would recommend this book for the background and inspiration in why we as an industry are trying to solve many of these computer science problems, and how transformational the impact might be. I mean really transformational in the world around us, what we know, what we can do, and how quickly we can do it – which is tightly related to our CEO’s keynote and the vision video at AIS.
Demos at AIS: “peanut butter and jelly” - and bread?
Ok – I’m struggling for analogy. We had an awesome demo at AIS that Chris and Cole pointed out during the panel. It was originally built using Nebula’s TOR appliance, Open Compute hardware, and LSI’s storage magic to make it complete. The three pieces coming together. Tasty. The Open Compute hardware was swapped out last minute (for safety, those boxes were meant for the datacenter – not the showcase in a hotel with tipsy techies) and the replacements were generously supplied by Supermicro.
I don’t think the proto was close to any one of our visions, but even as it stood, it inspired a lot of people, and would make a great product. A short rack of servers, with pooled storage in the rack, OpenStack orchestrating the point and click spawning and tear down of dynamically sized LUNs of different characteristics under the Cinder presentation layer, and deployment of tasks or VMs on them.
We’re working on completing our joint vision. I think the industry will be very impressed when they see it. Chris thinks people will be stunned, and the industry will be changed.
Catalyzing the market… The future may be closer than we think…
Ultimately, this is all about economics. We’re in the middle of an unprecedented bifurcation in IT use. On one hand we’re running existing apps on new, dense enterprise hardware using VMs to layer many applications on few servers. On the other, we’re investing in applications to run at scale across inexpensive clusters of commodity hardware. This has spawned a split in IT vendor business units, product lines and offerings, and sometimes even IT infrastructure management in the datacenter.
New applications and services are needing more infrastructure, and are getting more expensive to power, cool, purchase, run. And there is pressure to transform the datacenter from a cost center into a profit center. As these innovations start, more companies will need scale infrastructure, arguably Open Compute, and then will need an OpenStack framework to deploy it quickly.
What’s this mean? With a combination of big data and mobile device services driving economic value, we may be at the point where these clusters start to become mainstream. As an industry we’re already seeing a slight decline in traditional IT equipment sales and a rapid growth in scale-out infrastructure sales. If that continues, then OpenStack and Open Compute are a natural fit. The deployment rate uptick in life sciences, oil and gas, financials this year – really anywhere there is large-scale Hadoop, big data or analytics – may be the start of that growth curve. But both Chris and Cole felt it would probably take 5 years to truly take off.
Time to Wrap Up
I asked Chris and Cole for audience takeaways. Theirs were pretty simple, though possibly controversial in an industry like ours.
Hardware vendors should think about products and how they interface and what abstractions they present and how they fit into the ecosystem. These new ecosystems should allow them to easily plug in. For example, storage under Cinder can be quickly and easily morphed – that’s what we did with our demo.
We should be designing new software to run on distributed scale-out systems in clouds. Chris went on to say their code name was “Maestro” because it orchestrates like in a symphony, bringing things together in a beautiful way. He said “make instruments for the artists out there.” The brain trust. Look for their brushstrokes.
Innovate in the open, and leverage the open initiatives that are available to accelerate innovation and efficiency.
On your next IT purchase, try an RFP with an Open Compute vendor. Cole said you might be surprised. Worst case, you may get a better deal from your existing vendor.
So, Open Compute and OpenStack are changing the datacenter world that we know and love. I thought these were having a quick impact, changing our OEMs and ODM products, changing what we expect from our vendors, changing the interoperability of managing infrastructure from different vendors, changing our ability to deploy and manage grid and scale-out infrastructure, and changing how quickly and at what high level we can be innovating. I was wrong. It’s happening much more quickly than even I thought.
Tags: AIS, Alibaba, Amazon, Baidu, big data, CERN, China, China Defense, Chris Kemp, Cole Crawford, datacenter, Facebook, Hadoop, HPC, IT infrastructure, JD, Jim Gray, NASA, Nebula, Networking, Open Compute, OpenStack, PayPal, Rackspace, Riot Games, scale-out cluster, Sina Weibo, Storage, Supermicro, Yahoo
Pushing your enterprise cluster solution to deliver the highest performance at the lowest cost is key in architecting scale-out datacenters. Administrators must expand their storage to keep pace with their compute power as capacity and processing demands grow.
Beyond price and capacity, storage resources must also deliver enough bandwidth to support these growing demands. Without enough I/O bandwidth, connected servers and users can bottleneck, requiring sophisticated storage tuning to maintain reasonable performance. By using direct attached storage (DAS) server architectures, IT administrators can reduce the complexities and performance latencies associated with storage area networks (SANs). Now, with LSI 12Gb/s SAS or MegaRAID® technology, or both, connected to 12Gb/s SAS expander-based storage enclosures, administrators can leverage the DataBolt™ technology to clear I/O bandwidth bottlenecks. The result: better overall resource utilization, while preserving legacy drive investments. Typically a slower end device would step down the entire 12Gb/s SAS storage subsystem to 6Gb/s SAS speeds. How does Databolt technology overcome this? Well, without diving too deep into the nuts and bolts, intelligence in the expander buffers data and then transfers it out to the drives at 6Gb/s speeds in order to match the bandwidth between faster hosts and slower SAS or SATA devices.
So for this demonstration at AIS, we are showcasing two Hadoop Distributed File System (HDFS) servers. Each server houses the newly shipping MegaRAID 9361-8i 12Gb/s SAS RAID controller connected to a drive enclosure featuring a 12Gb/s SAS expander and 32 6Gb/s SAS hard drives. One has a DataBolt-enabled configuration, while the other has DataBolt disabled.
For the benchmarks, we ran DFSIO, which simulates MapReduce workloads and is typically used to detect network performance bottlenecks and tune hardware configurations as well as overall I/O performance.
The primary goal of the DFSIO benchmarks is to saturate storage arrays with random read workloads in order to ensure maximum performance of a cluster configuration. Our tests resulted in MapReduce jobs completing faster in 12Gb/s mode, and overall throughput increased by 25%.
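For a rough feel of why the host link speed matters in this setup, here is a back-of-the-envelope sketch of the theoretical link ceilings between the controller and the expander. It only computes upper bounds from the line rate and the 8b/10b line coding that 6Gb/s and 12Gb/s SAS use. The assumption that all eight controller ports are wired to the expander is mine, and the numbers say nothing about what the 32 drives or the Hadoop software can actually sustain, which is why the measured gain is well short of the 2x link difference.

```python
def lane_mb_per_s(line_rate_gbps: int) -> float:
    """Usable payload bandwidth of one SAS lane in MB/s.

    8b/10b coding carries 8 payload bits in every 10 line bits,
    and there are 8 bits per byte.
    """
    return line_rate_gbps * 1000 * 8 / 10 / 8


def host_bandwidth(lanes: int, line_rate_gbps: int) -> float:
    """Aggregate payload ceiling across all lanes, in MB/s."""
    return lanes * lane_mb_per_s(line_rate_gbps)


LANES = 8  # assumption: all eight ports of the 8-port controller reach the expander

stepped_down = host_bandwidth(LANES, 6)   # whole subsystem negotiated down to 6Gb/s
buffered = host_bandwidth(LANES, 12)      # host side stays at 12Gb/s

print(f"6Gb/s host link ceiling:  {stepped_down:.0f} MB/s")
print(f"12Gb/s host link ceiling: {buffered:.0f} MB/s")
```

With these assumptions the ceiling doubles from 4800 MB/s to 9600 MB/s on the host side, while each drive link still runs at 6Gb/s.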
Every year I diligently get in line for my annual flu (or more technically accurate “seasonal influenza”) shot. I’m not particularly fond of needles, but I have seen what the flu can do and how many die each year from this seasonal virus.
When you get the flu shot – or, now, the nasal mist – you are trusting a lot of people that what you are taking will actually help protect you. According to the CDC (Centers for Disease Control and Prevention), there are three types of influenza virus (A, B and C), and of those three types, two cause the seasonal epidemics we suffer through each year.
Not to get too technical, but I learned that the A strain is further classified by 2 proteins, and its subtypes are given code names like H1N1, H3N2 and H5N1. They can even be updated by year if there is a change in them. An example of this was in 2009, when the H1N1 became the 2009 H1N1. So where we may just call it H1N1, the World Health Organization has a whole taxonomy to describe a seasonal influenza strain.
This taxonomy includes:
As you can see, it can really get complicated quickly. If you would like to go deeper, you can read more about this here. While much of this information seems pretty arcane to the lay reader, you quickly can see that the sheer volume of information collected, stored and analyzed to combat seasonal influenza is a great example of big data.
In the US, once the CDC sifts through this data – using big data analytics tools – it uses its findings to determine what strains might affect the US and build a flu shot to combat those strains. During the 2012/2013 season, the predominant virus was Influenza A (H3N2), though some influenza B viruses contained a dash of influenza A (H1N1) pdm09 (pH1N1). (See the full report here.)
In addition to identifying dominant viruses, the CDC also uses big data to track the spread and potential effect on the population. Reviewing information from prior outbreaks, population data, and even weather patterns, the CDC uses big data analytics to quickly estimate and attempt to determine where viruses might hit first, hardest and longest so that a targeted vaccine can be produced in sufficient quantities, in the required timeframe and even for the right geography. The faster and more accurately this can be done, the more people can get this potentially life saving vaccine before the virus travels to their area.
As I stated in my previous blog post, the Hadoop® architecture is a great tool for efficiently storing and processing the growing amount of data worldwide, but Hadoop is only as good as the processing and storage performance that supports it. As with weather predictions, the more data you can quickly and efficiently analyze, the greater the likelihood of an accurate prediction. When it comes to weather and flu vaccines, these predictions can help save lives. In my final blog post in this series, I will explore how big data helps the fashion industry.
Whether in medical, weather or other fields that leverage big data technologies, the use of Hadoop for high levels of speed and accuracy in big data analysis requires computers with application acceleration. One such tool is LSI® Nytro™ Application Acceleration. You can go to TheSmarterWayToFaster™ for more information on the Nytro product family.
Part two of this three-part series continues to examine some of the diverse and potentially life-saving uses of big data in our everyday lives. It also explores how expanded data access and higher processing and storage speed can help optimize big data application performance.
We all watch the local weather and wonder how forecasters predict (or in some cases mis-predict) the future of weather. While they may not all agree on the forecast, they do agree that the more current and historical data you have, the better your ability to predict what might happen over the next hours, days and weeks.
A term used to describe this growing amount of information is Big Data, and more and more of it leverages Hadoop, a flexible architecture that provides the analysis tools and scalability required to comb through and utilize all available data. When recently talking to a US-based meteorologist (the technical name for a degreed weather forecaster), I learned that meteorologists rely on many different weather models from various sources to help create their forecasts.
Weather spawns downpour of Big Data
These models collect massive amounts of weather information from around the world. Using this information, computers then run billions of calculations to mimic the motion of weather patterns in the Earth’s dynamic atmosphere and produce forecasts for any given location over time. It was interesting to learn that not all weather models are equal.
While weather modeling websites worldwide collect this atmospheric data and provide it to meteorologists, the European community is seen as having the most accurate information. When I asked why, I learned that European weather modeling sites have some of the fastest computer hardware and technology, enabling them to analyze more data faster, which produces better overall forecasts. The US weather professional I spoke with tends to use these European sites as part of his analysis, and when European models conflict with those from US sites, he often leans toward the European data.
His use of the European weather modeling sites points to the value of fast, accurate analysis of Big Data. It also underscores the implications of vast amounts of data overwhelming the ability of the compute and storage resources available to process it. An accurate and timely weather forecast is critical and a bad or missed forecast can have terrible and even deadly consequences.
A case in point: Hurricane Sandy
In this article on Hurricane Sandy forecast speed and accuracy, you can see how removing just one source of data can dramatically reduce the accuracy of predicting a critical event such as where a hurricane will make landfall. To be sure, the more data you can store and the faster you can process it for analysis, the greater your potential competitive advantage, even in the vaunted halls of meteorological analysis and prediction.
The Hadoop® architecture is a great tool for efficiently storing and processing the growing amount of data worldwide, but Hadoop is only as good as the processing and storage performance that supports it. This gets interesting as you think about and explore the ripple effect of accurate or inaccurate forecasting in many areas. In my next blog post I will explore one of those – flu vaccines.
Whether in meteorology or other fields that leverage Big Data technologies, the use of Hadoop for high levels of speed and accuracy in Big Data analysis requires computers with application acceleration. One such tool is LSI® Nytro™ Application Acceleration. You can go to TheSmarterWayToFaster™ for more information on the Nytro product family.
This three-part series examines some of the diverse uses of Big Data in our everyday lives. It also explores how expanded data access and higher processing and storage speed can help optimize Big Data application performance.
Tags: application acceleration, big data, European weather modeling, flash, flash storage, Hadoop, Hurricane Sandy, meteorology, Nytro, processing performance, storage performance, weather modeling
I’ve just been to China. Again. It’s only been a few months since I was last there.
I was lucky enough to attend the 5th China Cloud Computing Conference at the China National Convention Center in Beijing. You probably have not heard of it, but it’s an impressive conference. It’s “the one” for the cloud computing industry. It was a unique view for me – more of an inside-out view of the industry. Everyone who’s anyone in China’s cloud industry was there. Our CEO, Abhi Talwalkar, had been invited to keynote the conference, so I tagged along.
First, the air was really hazy, but I don’t think the locals considered it that bad. The US consulate iPhone app said the particulates were in the very unhealthy range. Imagine looking across the street. Sure, you can see the building there, but the next one? Not so much. Look up. Can you see past the 10th floor? No, not really. The building disappears into the smog. That’s what it was like at the China National Convention Center, which is part of the same Olympics complex as the famous Bird’s Nest stadium: http://www.cnccchina.com/en/Venues/Traffic.aspx
I had a fantastic chance to catch up with a university friend, who has been living in Beijing since the 90’s, and is now a venture capitalist. It’s amazing how almost 30 years can disappear and you pick up where you left off. He sure knows how to live. I was picked up in his private limo, whisked off to a very well-known restaurant across the city, where we had a private room and private waitress. We even had some exotic, special dishes that needed to be ordered at least a day in advance. Wow. But we broke Chinese tradition and had imported beer in honor of our Canadian education.
Sizing up China’s cloud infrastructure
The most unusual meeting I attended was an invitation-only session – the Sino-American roundtable on cloud computing. There were just about 40 people in a room – half from the US, half from China. Mostly what I learned is that the cloud infrastructure in China is fragmented, and probably sub-scale. And it’s like that for a reason. It was difficult to understand at first, but I think I’ve made sense of it.
I started asking why to friends and consultants and got some interesting answers. Essentially different regional governments are trying to capture the cloud “industry” in their locality, so they promote activity, and they promote creation of new tools and infrastructure for that. Why reuse something that’s open source and works if you don’t have to and you can create high-tech jobs? (That’s sarcasm, by the way.) Many technologists I spoke with felt this will hold them back, and that they are probably 3-5 years behind the US. As well, each government-run industry specifies the datacenter and infrastructure needed to be a supplier or ecosystem partner with them, and each is different. The national train system has a different cloud infrastructure from the agriculture department, and from the shipping authority, etc… and if you do business with them – that is you are part of their ecosystem of vendors, then you use their infrastructure. It all spells fragmentation and sub-scale. In contrast, the Web 2.0 / social media companies seem to be doing just fine.
Baidu was also showing off its open rack. It’s an embodiment of the Scorpio V1 standard, which was jointly developed with Tencent, Alibaba and China Telecom. It views this as a first experiment, and is looking forward to V2, which will be a much more mature system.
I was also lucky to have personal meetings with general managers, chief architects and effective CTOs of the biggest cloud companies in China. What did I learn? They are all at an inflexion point. Many of the key technologists have experience at American Web 2.0 companies, so they’re able to evolve quickly, leveraging their industry knowledge. They’re all working to build or grow their own datacenters, their own infrastructure. And they’re aggressively expanding products, not just users, so they’re getting a compound growth rate.
Here’s a little of what I learned. In general, there is a trend to try and simplify infrastructure, harmonize divergent platforms, and deploy more infrastructure by spending less on each unit. (In general, they don’t make as much per user as American companies, but they have more users). As a result they are more cost-focused than US companies. And they are starting to put more emphasis on operational simplicity in general. As one GM described it to me – “Yes, techs are inexpensive in China for maintenance, but more often than not they make mistakes that impact operations.” So we (LSI) will be focussing more on simplifying management and maintenance for them.
Baidu’s biggest Hadoop cluster is 20k nodes. I believe that’s as big as Yahoo’s – and Yahoo is the originator of Hadoop. Baidu has a unique use profile for flash – it’s not like the hyperscale datacenters in the US. But Baidu is starting to consume a lot. Like most other hyperscale datacenters, it is working on storage erasure coding across servers, racks and datacenters, and it is trying to make a unified namespace across everything. One of its main interests is architecture at datacenter level, harmonizing the various platforms and looking for the optimum at the datacenter level. In general, Baidu is very proud of the advances it has made, and it has real confidence in its vision and route forward, and from what I heard, its architectural ambitions are big.
JD.com (which used to be 360buy.com) is the largest direct ecommerce company in China and (only) had about $10 billion (US) in revenue last year, with 100% CAGR growth. As the GM there said, its growth has to slow sometime, or in 5 years it’ll be the biggest company in the world. I think it is the closest equivalent to Amazon there is out there, and they have similar ambitions. They are in the process of transforming to a self-built, self-managed datacenter infrastructure. It is a company I am going to keep my eyes on.
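The GM’s quip is easy to check with compound growth arithmetic. This is a small sketch using only the two figures above (about $10 billion in revenue, doubling every year). The function name `project_revenue` is just for illustration, and holding a constant 100% rate for five years is of course the very assumption the GM was poking fun at.

```python
def project_revenue(start_billion: float, annual_growth: float, years: int) -> list:
    """Revenue at the end of each year, compounding at a constant annual rate."""
    return [start_billion * (1 + annual_growth) ** y for y in range(years + 1)]


# $10B today, growing 100% per year, projected five years out.
for year, revenue in enumerate(project_revenue(10, 1.0, 5)):
    print(f"year {year}: ${revenue:.0f}B")
```

Five doublings turn $10 billion into $320 billion, which is why the growth rate "has to slow sometime."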
Tencent is expanding into some interesting new businesses. Sure, people know about the Tencent cloud services that the Chinese government will be using, but Tencent also has some interesting and unique cloud services coming. Let’s just say even I am interested in using them. And of course, while Tencent is already the largest Web 2.0 company in China, its new services promise to push it to new scale and new markets.
Extra! Extra! Read all about it …
And then there was press. I had a very enjoyable conversation with Yuan Shaolong, editor at WatchStor, that I think ran way over. Amazingly – we discovered we have the same favorite band, even half a world away from each other. The results are here, though I’m not sure if Google translate messed a few things up, or if there was some miscommunication, but in general, I think most of the basics are right: http://translate.google.com/translate?hl=en&sl=zh-CN&u=http://tech.watchstor.com/storage-module-144394.htm&prev=/search%3Fq%3Drobert%2Bober%2BLSI%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26biw%3D1346%26bih%3D619
I just keep learning new things every time I go to China. I suspect it has as much to do with how quickly things are changing as new stuff to learn. So I expect it won’t be too long until I go to China, again…
Tags: Abhi Talwalkar, Alibaba, Amazon, Baidu, China, China Cloud Computing Conference, China National Convention Center, China Telecom, datacenter, Hadoop, hyperscale, JD.com, WatchStor, web 2.0, Yahoo
I was lucky enough to get together for dinner and beer with old friends a few weeks ago. Between the 4 of us, we’ve been involved in or responsible for a lot of stuff you use every day, or at least know about.
Supercomputers, minicomputers, PCs, Macs, Newton, smart phones, game consoles, automotive engine controllers and safety systems, secure passport chips, DRAM interfaces, netbooks, and a bunch of processor architectures: Alpha, PowerPC, SPARC, MIPS, StrongARM/XScale, x86 64-bit, and a bunch of other ones you haven’t heard of (um – most of those are mine, like TriCore). Basically if you drive a European car, travel internationally, use the Internet, play video games, or use a smart phone, well… you’re welcome.
Why do I tell you this? Well – first I’m name dropping – I’m always stunned I can call these guys friends and be their peers. But more importantly, we’ve all been in this industry as architects for about 30 years. Of course our talk went to what’s going on today. And we all agree that we’ve never seen more changes – inflexions – than the raft unfolding right now. Maybe it’s pressure from the recession, or maybe an unnaturally pent-up need for change in the ecosystem, but change there is.
Changes in who drives innovation, what’s needed, the companies on top and on bottom at every point in the food chain, who competes with whom, how workloads have changed from compute to dataflow, how software has moved to open source, how abstracted code is now from processor architecture, how individual and enterprise customers have been revolting against the “old” ways, old vendors and old business models, how processors communicate, how systems are purchased, and what fundamental system architectures look like. But not much besides that…
Ok – so if you’re an architect, that’s as exciting as it gets (you hear it in my voice – right?), and it makes for a lot of opportunities to innovate and create new or changed businesses. Because innovation so often happens at the intersection of changing ways of doing things. We’re at a point where the changes are definitely not done yet. We’re just at the start. (OK – now try to imagine a really animated 4-way conversation over beers at the Britannia Arms in Cupertino… Yea – exciting.)
I’m going to focus on just one sliver of the market – but it’s important to me – and that’s enterprise IT. I think the changes are as much about business models as technology.
Hyperscale datacenters drive innovation
I’ll start in a strange place. Hyperscale datacenters (think social media, search, etc.) and their scale of deployment change the optimization point. Most of us are starting to get comfortable with the rack as the new purchase quantum. And some of us are comfortable with the pod or container as the new purchase quantum. But the hyperscale datacenters work more with the datacenter as the quantum. By looking at it that way, they can trade off the cost of power, real estate, bent sheet metal, network bandwidth, disk drives, flash, processor type and quantity, memory amount, where work gets done, and what applications are optimized for. In other words, we shifted from looking at local optima to looking for global optima. I don’t know about you, but when I took operations research in university, I learned there was an unbelievable difference between the two – and the global optimum was the one you wanted…
Hyperscale datacenters buy enough (top 6 are probably more than 10% of the market today) that 1) they need to determine what they deploy very carefully on their own, and 2) vendors work hard to give them what they need.
That means innovation used to be driven by OEMs, but now it’s driven by hyperscale datacenters and it’s driven hard. That global optimum? It’s work/$ spent. That’s global work, and global spend. It’s OK to spend more, even way more, on one thing if overall you get more done for the $ you spend.
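To make the work/$ idea concrete, here is a toy sketch in Python. Every number in it is invented for illustration (the two server configurations, their costs, their "work units" and the budget are all assumptions, not figures from any real datacenter). It only shows how picking the cheapest box (a local optimum) can lose to picking the best work per dollar (the global one).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    name: str
    cost_usd: int    # hypothetical purchase cost per server
    work_units: int  # hypothetical work done per server per unit of time


def cheapest(configs):
    """Local view: minimize the price of each box."""
    return min(configs, key=lambda c: c.cost_usd)


def best_work_per_dollar(configs):
    """Global view: maximize work done per dollar spent."""
    return max(configs, key=lambda c: c.work_units / c.cost_usd)


def total_work(config, budget_usd):
    """Work delivered when the whole budget buys one configuration."""
    return (budget_usd // config.cost_usd) * config.work_units


# Invented example data: flash costs more per server but does more work.
CONFIGS = [
    ServerConfig("disk-only", cost_usd=2000, work_units=100),
    ServerConfig("with-flash", cost_usd=3000, work_units=250),
]
BUDGET_USD = 600_000

local_choice = cheapest(CONFIGS)
global_choice = best_work_per_dollar(CONFIGS)

for label, choice in (("local", local_choice), ("global", global_choice)):
    print(f"{label}: {choice.name}, total work = {total_work(choice, BUDGET_USD)}")
```

With these made-up numbers the more expensive server wins: fewer boxes, more work for the same spend.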
That’s why the 3 biggest consumers of flash in servers are Facebook, Google, and Apple, with some of the others not far behind. You want stuff, they want to provide it, and flash makes it happen efficiently. So efficiently they can often give that service away for free.
Hyperscale datacenters have started to publish their cost metrics, and open up their architectures (like OpenCompute), and open up their software (like Hadoop and derivatives). More to the point, services like Amazon have put a very clear $ value on services. And it’s shockingly low.
Enterprises are paying attention
Enterprises have looked at those numbers. Hard. That’s catalyzed a customer revolt against the old way of doing things – the old way of buying and billing. OEMs and ISVs are creating lots of value for enterprise, but not that much. They’ve been innovating around “stickiness” and “lock-in” (yea – those really are industry terms) for too long, while hyperscale datacenters have been focused on getting stuff done efficiently. The money they save per unit just means they can deploy more units and provide better services.
That revolt is manifesting itself in 2 ways. The first is seen in the quarterly reports of OEMs and ISVs. Rumors of IBM selling its X-series to Lenovo, Dell going private, Oracle trying to shift business, HP talking of the “new style of IT”… The second is enterprises are looking to emulate hyperscale datacenters as much as possible, and deploy private cloud infrastructure. And often as not, those will be running some of the same open source applications and file systems as the big hyperscale datacenters use.
Where are the hyperscale datacenters leading them? It’s a big list of changes, and they’re all over the place.
But they’re also looking at a few different things. For example, global name space NAS file systems. Personally? I think this one’s a mistake. I like the idea of file systems/object stores, but the network interconnect seems like a bottleneck. Storage traffic is shared with network traffic, creates some network spine bottlenecks, creates consistency performance bottlenecks between the NAS heads, and – let’s face it – people usually skimp on the number of 10GE ports on the server and in the top of rack switch. A typical SAS storage card now has 8 x 12G ports – that’s 96G of bandwidth. Will servers have 10 x 10G ports? Yea. I didn’t think so either.
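The port arithmetic above is easy to check. This is a back-of-the-envelope sketch in Python that uses raw line rates only; it ignores encoding and protocol overhead, which is my simplifying assumption, and the function name is just for illustration.

```python
import math


def ethernet_ports_to_match(sas_ports, sas_gbps_per_port, eth_gbps_per_port):
    """Return (total SAS bandwidth in Gb/s, Ethernet ports needed to match it).

    Uses raw line rates only; real throughput is lower on both sides.
    """
    sas_total_gbps = sas_ports * sas_gbps_per_port
    ports_needed = math.ceil(sas_total_gbps / eth_gbps_per_port)
    return sas_total_gbps, ports_needed


# Figures from the text: an 8 x 12G SAS card versus 10GE ports.
sas_total_gbps, ports_needed = ethernet_ports_to_match(8, 12, 10)
print(f"SAS card: {sas_total_gbps}G, needs {ports_needed} x 10GE ports to match")
```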
Anyway – all this is not academic. One Wall Street bank shared with me that – hold your breath – it could save 70% of its spend going this route. It was shocked. I wasn’t shocked, because at first blush this seems absurd – not possible. That’s how I reacted. I laughed. But… The systems are simpler and less costly to make. There is simply less there to make or ship than OEMs force into the machines for uniqueness and “value.” They are purchased from much lower margin manufacturers. They have massively reduced maintenance costs (there’s less to service, and, well, no OEM service contracts). And also important – some of the incredibly expensive software licenses are flipped to open source equivalents. Net savings of 70%. Easy. Stop laughing.
Disaggregation: Or in other words, Pooled Resources
But probably the most important trend from all of this is what server manufacturers are calling “disaggregation” (hey – you’re ripping apart my server!) but architects are more descriptively calling pooled resources.
First – the intent of disaggregation is not to rip the parts of a server to pieces to get lowest pricing on the components. No. If you’re buying by the rack anyway – why not package so you can put like with like. Each part has its own life cycle after all. CPUs are 18 months. DRAM is several years. Flash might be 3 years. Disks can be 5 to 7 years. Networks are 5 to 10 years. Power supplies are… forever? Why not replace each on its own natural failure/upgrade cycle? Why not make enclosures appropriate to the technology they hold? Disk drives need solid vibration-free mechanical enclosures of heavy metal. Processors need strong cooling. Flash wants to run hot. DRAM cool.
Second – pooling allows really efficient use of resources. Systems need slush resources. What happens to a system that uses 100% of physical memory? It slows down a lot. If a database runs out of storage? It blue screens. If you don’t have enough network bandwidth? The result is, every server is over-provisioned for its task. Extra DRAM, extra network bandwidth, extra flash, extra disk drive spindles… If you have 1,000 nodes you can easily strand TBytes of DRAM, TBytes of flash, and a TByte/s of network bandwidth as wasted capacity, all of it always burning power. Worse, if you plan wrong and deploy servers with too little disk or flash or DRAM, there’s not much you can do about it. Now think 10,000 or 100,000 nodes… Ouch.
If you pool those things across 30 to 100 servers, you can allocate as needed to individual servers. Just as importantly, you can configure systems logically, not physically. That means you don’t have to be perfect in planning ahead what configurations and how many of each you’ll need. You have sub-assemblies you slap into a rack, and hook up by configuration scripts, and get efficient resource allocation that can change over time. You need a lot of storage? A little? Higher performance flash? Extra network bandwidth? Just configure them.
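Here is a small Python sketch of why pooling strands less capacity. The demand table is invented (4 hypothetical servers, DRAM demand in GB over 5 time slots), and it assumes each standalone server is sized for its own peak while a pool is sized for the peak of the combined demand. Real sizing would add headroom on top of both.

```python
# Invented DRAM demand in GB: one row per server, one column per time slot.
DEMAND_GB = [
    [64, 32, 16, 32, 16],
    [16, 64, 32, 16, 32],
    [32, 16, 64, 16, 16],
    [16, 32, 16, 64, 32],
]


def per_server_capacity(demand):
    """Each server carries enough DRAM for its own worst case."""
    return sum(max(series) for series in demand)


def pooled_capacity(demand):
    """A shared pool only needs to cover the worst combined moment."""
    return max(sum(slot) for slot in zip(*demand))


dedicated_gb = per_server_capacity(DEMAND_GB)
pooled_gb = pooled_capacity(DEMAND_GB)
stranded_gb = dedicated_gb - pooled_gb

print(f"dedicated: {dedicated_gb} GB, pooled: {pooled_gb} GB, "
      f"stranded without pooling: {stranded_gb} GB")
```

The gap exists because the servers in this toy data do not all peak at the same time. If every server peaked in the same slot, pooling would save nothing on capacity, though you would still get the logical-configuration benefit.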
That’s a big deal.
And of course, this sets the stage for immense pooled main memory – once the next generation non-volatile memories are ready – probably starting around 2015.
Don’t underestimate the operational problems associated with different platforms at scale. Many hyperscale datacenters today have around 6 platforms. When you consider that they roll out new versions of those before old ones are retired, they often have 3 generations of each. That’s 18 distinct platforms, with multiple software revisions of each. That starts to get crazy when you may have 200,000 to 400,000 servers to manage and maintain in a lights-out environment. Pooling resources and allocating them in the field goes a huge way to simplifying operations.
Alternate Processor Architecture
It wasn’t always Intel x86. There was a time when Intel was an upstart in the server business. It was Power, MIPS, Alpha, SPARC… (and before that IBM mainframes and minis, etc). Each of the changes was brought on by changing the cost structure. Mainframes got displaced by multi-processor RISC, which gave way to x86.
Today, we have Oracle saying they’re getting out of x86 commodity servers and doubling down on SPARC. IBM is selling off its x86 business and doubling down on Power (hey – don’t confuse that with PowerPC – which started as an architectural cut-down of Power – I was there…). And of course there is a rash of 64-bit ARM server SOCs coming – with HP and Dell already dabbling in it. What’s important to realize is that all of these offerings are focusing on the platform architecture, and how applications really perform in total, not just the processor.
Let me wrap up with an email thread cut/paste from a smart friend – Wayne Nation. I think he summed up some of what’s going on well, in a sobering way most people don’t even consider.
“Does this remind you of a time, long ago, when the market was exploding with companies that started to make servers out of those cheap little desktop x86 CPUs? What is different this time? Cost reduction and disaggregation? No, cost and disagg are important still, but not new.
A new CPU architecture? No, x86 was “new” before. ARM promises to reduce cost, as did Intel.
Disaggregation enables hyperscale datacenters to leverage vanity-free, but consistent delivery will determine the winning supplier. There is the potential for another Intel to rise from these other companies.”