Big data, it’s the buzz word of the year and it’s generating a lot of attention. An incalculable number of articles fervently repeat the words “variety, velocity and volume,” citing click streams, RFID tags, email, surveillance cameras, Twitter® feeds, Facebook® posts, Flickr® images, blog musings, YouTube® videos, cellular texting, healthcare monitoring …. (gasps for air). We have become a society that sweats buckets of data every day (the latest estimates are approximately 34GB per person every 24 hours) and businesses are scrambling to capture all this information to learn more about us.
Save every scrap of data!
“Save all your data” has become the new business mantra, because data – no matter how seemingly meaningless it appears – contains information, and information provides insight, and improved insight makes for better decision-making, and better decision-making leads to a more efficient and profitable business.
Okay, so we get why we save data, but if the electronic bit bucket costs become prohibitive, big data could turn into its own worst enemy, undermining the value of mining data. While Hadoop® software is an excellent (and cost-free) tool for storing and analyzing data, most organizations use a multitude of applications in conjunction with Hadoop to create a system for data ingest, analytics, data cleansing and record management. Several Hadoop vendors (Cloudera, MapR, Hortonworks, Intel, IBM, Pivotal) offer bundled software packages that ease integration and installation of these applications.
Installing a Hadoop cluster to manage big data can be a chore
With the demand for data scientists growing, the challenge can become finding the right talent to help build and manage a big data infrastructure. A case in point: Installing a Hadoop cluster involves more than just installing the Hadoop software. Here is the sequence of steps:
Setup, from bare bones to a simple 15-node cluster, can take weeks to months including planning, research, installation and integration. It’s no small job.
Appliances simplify Hadoop cluster deployments
Enter appliances: low-cost, pre-validated, easy-to-deploy “bricks.” According to a Gartner forecast (Forecast: Data Center Hardware Spending to Support Big Data Projects, Worldwide 2013), appliance spending for big data projects will grow from 0.9% of hardware spending in 2012 to 9.3% by 2017. I have found myself inside a swirl of new big data appliance projects all designed to provide highly integrated systems with easy support and fully tested integration. An appliance is a great turnkey solution for companies that can’t (or don’t wish to) employ a hardware and software installation team: Simply pick up the box from the shipping area, unpack it and start analyzing data within minutes. In addition, many companies are just beginning to dabble in Hadoop, and appliances can be an easy, cost-effective way to demonstrate the value of Hadoop before making a larger investment.
While Hadoop is commonplace in the big data infrastructure, the use models can be quite varied. I’ve heard my fair share of highly connected big data engineers attempt to identify core categories for Hadoop deployments, and they generally fall into one of four categories:
Finding the right appliance for you
While appliances lower the barrier to entry to Hadoop clusters, their designs and costs are as varied as their use cases. Some appliances build in the flexibility of cloud services, while others focus on integration of applications components and reducing service level agreements (SLAs). Still others focus primarily on low cost storage. And while some appliances are just hardware (although they are validated designs), they still require a separate software agreement and installation via a third-party vendor.
In general, pricing is usually quoted either by capacity ($/TB), or per node or rack depending on the vendor and product. Licensing can significantly increase overall costs, with annual maintenance costs (software subscription and support) and license renewals adding to the cost of doing business. The good news is that, with so many appliances to choose from, any organization can find one that enables it to design a cluster that fits its budget, operating costs and value expectations.
Tags: analytics, appliance, big data, cloud services, Cloudera, cluster, data mining, data sequencing, data storage, database applications, database management systems, DBMS, Facebook, Flickr, Gartner, Hadoop, high availability, Hortonworks, IBM, image processing, Intel, JobTracker, Kerberos, MapR, NameNode, Pivotal, Secure Shell, service level agreement, SLA, ssh, Twitter, web crawler, workflow processing, YouTube, ZooKeeper