How did he do that?
Growing up, I watched a little TV. Okay, a lot of TV as I did not have my DVR or iPad and a man who would one day occupy the White House as VP had not yet invented the Internet. Of the many shows I watched, MacGyver was one of my favorites. He would take ordinary objects and use them to solve complicated problems in a way no one could have imagined. Out of all the things he used, his trusty Swiss army knife was the most awesome. With all its blades, tools and accessories, it could solve multiple problems at the same time. It was easy to use, did not take up a lot of space and was very cost-effective.
Nytro MegaRAID – the Swiss Army knife of server storage
LSI has its own multi-function, get-yourself-out-of-a-fix workhorse – the Nytro MegaRAID® card, part of the Nytro product family. It combines caching intelligence, RAID protection and flash on a single PCIe® card to accelerate applications, so it can be deployed to solve problems across a broad number of applications.
A feature for every challenge!
The Nytro MegaRAID card is built on the same trusted technology as the MegaRAID cards deployed in datacenters worldwide. That means, it is enterprise architected and hardened and datacenter tested. Its Swiss Army knife-like features include, as I mentioned, on-board flash storage that can be configured to monitor the flow of data from an application to the attached RAID protected storage, intelligently identify hot, or the most frequently accessed, data, and automatically move a copy of that data to the flash storage to accelerate applications. The next time the application needs that data, the information is fetched from flash, not the much slower traditional hard disk drive (HDD) storage.
Hard drives can lead to slowdowns in another way, too, when the mechanics wear out and fail. When they do, your storage (and application) performance can dramatically decrease – in a RAID storage environment, this is called degraded mode. The good news is that the Nytro MegaRAID card stores much of an application’s frequently used data in its intelligent flash based cache, boosting the performance of a connected HDD in degrade mode by as much as 10x, depending on the configuration. The Swiss Army knife follow-on benefit is that when you replace the failed drive, Nytro MegaRAID speeds RAID storage rebuilds by as much as 4x. RAID rebuilds add to IT admin time, and IT time is money, so that’s money you get to keep in your pocket.
The Nytro MegaRAID card also can be configured so you can use half of its onboard flash as a pair of mirrored boot drives. In big data environments, this mirroring frees up two boot drives for use as data storage to help increase your server storage density (aka available storage capacity), often significantly, while dramatically improving boot time. What’s more, that same flash can be deployed instead as primary storage to complement your secondary HDD storage with higher speeds, providing a superfast repository for key files like virtual desktop infrastructure (VDI) golden images or key database log files.
One MacGyver Swiss Army knife, one Nytro MegaRAID card – both easy-to-use solutions for a number of complex problems.
Tags: application acceleration, big data, data protection, database, flash, flash card, flash-based cache, hard disk drive, HDD, MacGyver, Nytro MegaRAID card, PCIe card, RAID, server storage, Swiss Army knife, VDI, virtual desktop infrastructure
I was asked some interesting questions recently by CEO & CIO, a Chinese business magazine. The questions ranged from how Chinese Internet giants like Alibaba, Baidu and Tencent differ from other customers and what leading technologies big Internet companies have created to questions about emerging technologies such as software-defined storage (SDS) and software-defined datacenters (SDDC) and changes in the ecosystem of datacenter hardware, software and service providers. These were great questions. Sometimes you need the press or someone outside the industry to ask a question that makes you step back and think about what’s going on.
I thought you might interested, so this blog, the first of a 3-part series covering the interview, shares details of the first two questions.
CEO & CIO: In recent years, Internet companies have built ultra large-scale datacenters. Compared with traditional enterprises, they also take the lead in developing datacenter technology. From an industry perspective, what are the three leading technologies of ultra large-scale Internet data centers in your opinion? Please describe them.
There are so many innovations and important contributions to the industry from these hyperscale datacenters in hardware, software and mechanical engineering. To choose three is difficult. While I would prefer to choose hardware innovations as their big ones, I would suggest the following as they have changed our world and our industry and are changing our hardware and businesses:
Autonomous behavior and orchestration
An architect at Microsoft once told me, “If we had to hire admins for our datacenter in a normal enterprise way, we would hire all the IT admins in the world, and still not have enough.” There are now around 1 million servers in Microsoft datacenters. Hyperscale datacenters have had to develop autonomous, self-managing, sometimes self-deploying datacenter infrastructure simply to expand. They are pioneering datacenter technology for scale – innovating, learning by trial and error, and evolving their practices to drive more work/$. Their practices are specialized but beginning to be emulated by the broader IT industry. OpenStack is the best example of how that specialized knowledge and capability is being packaged and deployed broadly in the industry. At LSI, we’re working with both hyperscale and orchestration solutions to make better autonomous infrastructure.
High availability at datacenter level vs. machine level
As systems get bigger they have more components, more modes of failure and they get more complex and expensive to maintain reliability. As storage is used more, and more aggressively, drives tend to fail. They are simply being used more. And yet there is continued pressure to reduce costs and complexity. By the time hyperscale datacenters had evolved to massive scale – 100’s of thousands of servers in multiple datacenters – they had created solutions for absolute reliability, even as individual systems got less expensive, less complex and much less reliable. This is what has enabled the very low cost structures of the cloud, and made it a reliable resource.
These solutions are well timed too, as more enterprise organizations need to maintain on-premises data across multiple datacenters with absolute reliability. The traditional view that a single server requires 99.999% reliability is giving way to a more pragmatic view of maintaining high reliability at the macro level – across the entire datacenter. This approach accepts the failure of individual systems and components even as it maintains data center level reliability. Of course – there are currently operational issues with this approach. LSI has been working with hyperscale datacenters and OEMs to engineer improved operational efficiency and resilience, and minimized impact of individual component failure, while still relying on the datacenter high-availability (HA) layer for reliability.
It’s such an overused term. It’s difficult to believe the term barely existed a few years ago. The gift of Hadoop® to the industry – an open source attempt to copy Google® MapReduce and Google File System – has truly changed our world unbelievably quickly. Today, Hadoop and the other big data applications enable search, analytics, advertising, peta-scale reliable file systems, genomics research and more – even services like Apple® Siri run on Hadoop. Big data has changed the concept of analytics from statistical sampling to analysis of all data. And it has already enabled breakthroughs and changes in research, where relationships and patterns are looked for empirically, rather than based on theories.
Overall, I think big data has been one of the most transformational technologies this century. Big data has changed the focus from compute to storage as the primary enabler in the datacenter. Our embedded hard disk controllers, SAS (Serial Attached SCSI) host bus adaptors and RAID controllers have been at the heart of this evolution. The next evolutionary step in big data is the broad adoption of graph analysis, which integrates the relationship of data, not just the data itself.
CEO & CIO: Due to cloud computing, mobile connectivity and big data, the traditional IT ecosystem or industrial chain is changing. What are the three most important changes in LSI’s current cooperation with the ecosystem chain? How does LSI see the changes in the various links of the traditional ecosystem chain? What new links are worth attention? Please give some examples.
Cloud computing and the explosion of data driven by mobile devices and media has and continues to change our industry and ecosystem contributors dramatically. It’s true the enterprise market (customers, OEMs, technology, applications and use cases) has been pretty stable for 10-20 years, but as cloud computing has become a significant portion of the server market, it has increasingly affected ecosystem suppliers like LSI.
Timing: It’s no longer enough to follow Intel’s ticktock product roadmap. Development cycles for datacenter solutions used to be 3 to 5 years. But these cycles are becoming shorter. Now, demand for solutions is closer to 6 months – forcing hardware vendors to plan and execute to far tighter development cycles. Hyperscale datacenters also need to be able to expand resources very quickly, as customer demand dictates. As a result they incorporate new architectures, solutions and specifications out of cycle with the traditional Intel roadmap changes. This has also disrupted the ecosystem.
End customers: Hyperscale datacenters now have purchasing power in the ecosystem, with single purchase orders sometimes amounting to 5% of the server market. While OEMs still are incredibly important, they are not driving large-scale deployments or innovating and evolving nearly as fast. The result is more hyperscale design-win opportunities for component or sub-system vendors if they offer something unique or a real solution to an important problem. This also may shift profit pools away from OEMs to strong, nimble technology solution innovators. It also has the potential to reduce overall profit pools for the whole ecosystem, which is a potential threat to innovation speed and re-investment.
New players: Traditionally, a few OEMs and ISVs globally have owned most of the datacenter market. However, the supply chain of the hyperscale cloud companies has changed that. Leading datacenters have architected, specified or even built (in Google’s case) their own infrastructure, though many large cloud datacenters have been equipped with hyperscale-specific systems from Dell and HP. But more and more systems built exactly to datacenter specifications are coming from suppliers like Quanta. Newer network suppliers like Arista have increased market share. Some new hyperscale solution vendors have emerged, like Nebula. And software has shifted to open source, sometimes supported for-pay by companies copying the Redhat® Linux model – companies like Cloudera, Mirantis or United Stack. Personally, I am still waiting for the first 3rd-party hardware service emulating a Linux support and service company to appear.
Open initiatives: Yes, we’ve seen Hadoop and its derivatives deployed everywhere now – even in traditional industries like oil and gas, pharmacology, genomics, etc. And we’ve seen the emergence of open-source alternatives to traditional databases being deployed, like Casandra. But now we’re seeing new initiatives like Open Compute and OpenStack. Sure these are helpful to hyperscale datacenters, but they are also enabling smaller companies and universities to deploy hyperscale-like infrastructure and get the same kind of automated control, efficiency and cost structures that hyperscale datacenters enjoy. (Of course they don’t get fully there on any front, but it’s a lot closer). This trend has the potential to hurt OEM and ISV business models and markets and establish new entrants – even as we see Quanta, TYAN, Foxconn, Wistron and others tentatively entering the broader market through these open initiatives.
New architectures and new algorithms: There is a clear movement toward pooled resources (or rack scale architecture, or disaggregated servers). Developing pooled resource solutions has become a partnership between core IP providers like Intel and LSI with the largest hyperscale datacenter architects. Traditionally new architectures were driven by OEMs, but that is not so true anymore. We are seeing new technologies emerge to enable these rack-scale architectures (RSA) – technologies like silicon photonics, pooled storage, software-defined networks (SDN), and we will soon see pooled main memory and new nonvolatile main memories in the rack.
We are also seeing the first tries at new processor architectures about to enter the datacenter: ARM 64 for cool/cold storage and web tier and OpenPower P8 for high power processing – multithreaded, multi-issue, pooled memory processing monsters. This is exciting to watch. There is also an emerging interest in application acceleration: general-purposing computing on graphics processing units (GPGPUs), regular expression processors (regex) live stream analytics, etc. We are also seeing the first generation of graph analysis deployed at massive scale in real time.
Innovation: The pace of innovation appears to be accelerating, although maybe I’m just getting older. But the easy gains are done. On one hand, datacenters need exponentially more compute and storage, and they need to operate 10x to 1000x more quickly. On the other, memory, processor cores, disks and flash technologies are getting no faster. The only way to fill that gap is through innovation. So it’s no surprise there are lots of interesting things happening at OEMs and ISVs, chip and solution companies, as well as open source community and startups. This is what makes it such an interesting time and industry.
Consumption shifts: We are seeing a decline in laptop and personal computer shipments, a drop that naturally is reducing storage demand in those markets. Laptops are also seeing a shift to SSD from HDD. This has been good for LSI, as our footprint in laptop HDDs had been small, but our presence in laptop SSDs is very strong. Smart phones and tablets are driving more cloud content, traffic and reliance on cloud storage. We have seen a dramatic increase in large HDDs for cloud storage, a trend that seems to be picking up speed, and we believe the cloud HDD market will be very healthy and will see the emergence of new, cloud-specific HDDs that are radically different and specifically designed for cool and cold storage.
There is also an explosion of SSD and PCIe flash cards in cloud computing for databases, caches, low-latency access and virtual machine (VM) enablement. Many applications that we take for granted would not be possible without these extreme low-latency, high-capacity flash products. But very few companies can make a viable storage system from flash at an acceptable cost, opening up an opportunity for many startups to experiment with different solutions.
Summary: So I believe the biggest hyperscale innovations are autonomous behavior and orchestration, HA at the datacenter level vs. machine level, and big data. These are radically changing the whole industry. And what are those changes for our industry and ecosystem? You name it: timing, end customers, new players, open initiatives, new architectures and algorithms, innovation, and consumption patterns. All that’s staying the same are legacy products and solutions.
These were great questions. Sometimes you need the press or someone outside the industry to ask a question that makes you step back and think about what’s going on. Great questions.
Tags: Alibaba, Apple Siri, Arista, ARM 64, Baidu, big data, Casandra, CEO & CIO Magazine, China, cloud storage, Cloudera, cold storage, cool storage, datacenter, datacenter ecosystem, Dell, flash, Foxconn, Google File System, Google MapReduce, Hadoop, hard disk drive, HDD, high availability, HP, hyperscale datacenter, Intel, Internet, latency, Microsoft, Mirantis, Nebula, Open Compute, OpenPower P8, OpenStack, Quanta, rack scale, RAID, Redhat Linux, SAS, SDDC, SDN, SDS, Serial Attached SCSI, software-defined datacenter, software-defined networks, software-defined storage, solid state drive, SSD, Tencent, TYAN, United Stack, virtual machine, VM, Wistron
Customer dilemma: I just purchased PCIe® flash cards to increase performance of my enterprise applications that run on Linux® and Unix®. How do I set them up to get the best performance?
Good question. I wish there were a simple answer but each environment is different. There is no cookie-cutter configuration that fits all, though a few questions will reveal how the PCIe flash cards should be configured for optimum performance.
Most of the popular relational and non-relational databases run on many different operating systems. I will be describing Linux-specific configurations, but most of them should also work with Unix systems that are supported by the PCIe flash card vendor. I’m a database guy, but these same principals and techniques that I’ll be covering apply to other applications like mail servers, web servers, application servers and, of course, databases.
Aligning PCIe flash devices
The most important step to perform on each PCIe flash card is to create a partition that is aligned on a specific boundary (such as 4k or 8k) so each read and write to the flash device will require only one physical input/output (IO) operation. If the card is not partitioned on such a boundary, then reads and writes will span the sector groups, which doubles the IO latency for each read or write request.
To align a partition, I use the sfdisk command to start a partition on a 1M boundary (sector 2048). Aligning to a 1M boundary resolves the dependency to align to a 4k, 8k, or even a 64k boundary. But before I do this, I need to know how I am going to use this device. Will this be a standalone partition? Part of a logical volume? Or part of a RAID group?
Which one is best?
If I were deploying the PCIe flash device for database caching (for example, the Oracle database has provided this caching functionality for years using the Database Smart Flash Cache feature, and Facebook created the open source Flashcache used in MySQL databases), I would use a single-partitioned PCIe flash card if I knew the capacity would meet my needs now and over the next 5 years. If I selected this configuration, the sfdisk command to create the partition would be:
echo “2048,,” | sfdisk –uS /dev/sdX –force
This single partitioning is also required with the Oracle® Automatic Storage Management system (ASM). Oracle has provided ASM for many years and I will go over how to use this storage feature in Part 3 of this series.
If I need to deploy multiple PCIe flash cards for database caching, I would create Logical Volume Manager (LVM) over all the flash devices to simplify administration. The sfdisk command to create a partition for each PCIe flash card would be:
echo “2048,,8e” | sfdisk –uS /dev/sdX –force
“8e” is the system partition type for creating a logical volume.
Neither of these solutions needs fault tolerance since they will be used for write-thru caching. My recent blog “How to optimize PCIe flash cards – a new approach to creating logical volumes” covers this process in detail.
If I want to use the PCIe flash card for persisting data, I would need to make the PCIe flash cards fault tolerant, using two or more cards to build the RAID array and eliminate any single point of failure. There are a number of ways to create a RAID over multiple PCIe flash cards, two of which are:
But what type of RAID setup is best to use?
Oracle coined the term S.A.M.E. – Stripe And Mirror Everything – in 1999 and popularized the practice, which many database administrators (DBA) and storage administrators have followed ever since. I follow this practice and suggest you do the same.
First, you need to determine how these cards will be accessed:
In database deployments, your choice is usually among online transaction processing (OLTP) applications like airline and hotel reservation systems and corporate financial or enterprise resource planning (ERP) applications, or data warehouse/data mining/data analytics applications, or a mix of both environments. OLTP applications involve small random reads and writes as well as many sequential writes for log files. Data warehouse/data mining/data analytics applications involve mostly large sequential reads with very few sequential log writes.
Before setting up one or many PCIe flash cards in a RAID array either using LVM on RAID or creating a RAID array using MDADM, you need to know the access pattern of the IO, capacity requirements and budget. These requirements will dictate which RAID level will work best for your environment and fit your budget.
I would pick either a RAID 1/RAID 10 configuration (mirroring without striping, or striping and mirroring respectively), or RAID 5 (striping with parity). RAID 1/RAID 10 costs more but delivers the best performance, whereas RAID 5 costs less but imposes a significant write penalty.
Optimizing OLTP application performance
To optimize performance of an OLTP application, I would implement either a RAID 1 or RAID 10 array. If I were budget constrained, or implementing a data warehouse application, I would use a RAID 5 array. Normally a RAID 5 array will produce a higher throughput (megabits per second) appropriate for a data warehouse/data mining application.
In a nutshell, knowing how to tune the configuration to the application is key to reaping the best performance.
For either RAID array, you need to create an aligned partition using sfdisk:
echo “2048,,fd” | sfdisk –uS /dev/sdX –force
“fd” is the system identifier for a Linux RAID auto device.
Keep in mind that it is not mandatory to create a partition for LVMs or RAID arrays. Instead, you can assign RAW devices. It’s important to remember to align the sectors if combining RAW and partitioned devices or just creating a basic partition. It’s sound practice to always create an aligned partition when using PCIe flash cards.
At this point, aligned partitions have been created and are now ready to be used in LVMs or RAID arrays. Instructions for creating these are on the web or in Linux/Unix reference manuals. Here are a couple of websites that go over the process of creating LVM, RAID, or LVM on RAID:
Specifying a stripe width value
Also remember that, when creating LVMs with striping or RAID arrays, you’ll need to specify a stripe width value. Many years ago, Oracle and EMC conducted a number studies on this and concluded that a 1M stripe width performed the best as long as the database IO request was equal to or less than 1M. When implementing Oracle ASM, Oracle’s standard is to use 1M allocation units, which matches its coarse striping size of 1M.
Part 2 of this series will describe how to create RAW devices or file systems.
Part 3 of this series will describe how to use Oracle ASM when deploying PCIe flash cards.
Part 4 of this series will describe how to persist assignment to dynamically changing NWD/NMR devices.
Tags: ASM, automatic storage management, data analytics, data mining, data warehouse, Database Smart Cache, EMC, enterprise resource planning, ERP, Facebook, flash storage, Flashcache, Linux, logical volume, Logical Volume Manager, LVM, MDADM, multiple device administration, MySQL, non-relational database, OLTP, online transaction processing, Oracle, partition, PCI Express, PCIe flash, performance, RAID, RAW, relational database, SAME, sector, Stripe and Mirror Everything, Unix
Most users have no idea that reading electronic information from a data storage medium like a hard disk drive (HDD) or solid state drive (SSD) is plagued with read errors. For this reason error correction codes (ECC) are used to fix the random bit errors that arise during the reading process before the incorrect data is returned to the user. But the error correction codes can only handle so many errors at one time. If data errors exceed the ECC limits, the data goes uncorrected and is lost forever. More recent ECC algorithms like the LSI SHIELD error correction technology go a lot farther to protect user data than prior solutions.
What happens to the data when the ECC fails?
If the ECC fails, only a backup protection mechanism will recover the data. There are three alternatives. First, users should always back up their critical data since ECC failure and other threats can destroy data or render it inaccessible such as natural disasters (earthquakes, tornadoes, flooding etc.) that cause heavy damage to buildings and their contents, lightning overloading that can burn up a computer without adequate electrical protection, and of course computer theft. Any backup system should be either automated or at least run consistently if it is manual. Industry reports cite that less than 10% of computer users back up their data. That is not very comforting.
The second solution is to employ a RAID (Redundant Array of Independent Disks) array that uses multiple storage devices with one or more of the drives acting as a parity device to provide redundancy. That way if one drive fails, the redundant drive provides enough parity information to restore the original data. This type of system is very common in enterprise environments—a work computer—but hardly used in home systems or laptop PC.
Is the third solution simple, automatic, and operable in a single-drive environment?
Yes. Yes. And Yes. LSI® SandForce® flash and SSD controllers have a feature called RAISE™ data protection that meets all of these needs. Introduced in 2009 with the first SandForce controller, RAISE technology stands for Redundant Array of Independent Silicon Elements. It sounds like RAID, and acts something like RAID, but protects data using a single drive. With RAISE technology, the individual flash die act like the drives in a RAID array. The original RAISE level 1 technology protects against single page and block failures in the flash. These types of failures are beyond the protection of the ECC, but RAISE technology can recover the data.
With the introduction of the SF3700 this month, RAISE technology now offers more flexibility to deliver greater data protection. With the original RAISE level 1, the space of a full flash die had to be allocated solely to protect user data. In small-capacity configurations, like 64GB, RAISE level 1 required too much over provisioning and therefore had to be disabled or, with RAISE left on, its available capacity reduced to 60GB or 55GB. With a new enhancement to the SF3700, no such tradeoff is necessary. The new Fractional RAISE option for this first level of protection uses only a small portion of a die to protect user data in even the smallest configurations and preserve over provisioning (OP). This is important because, as I explained in my blog titled Gassing up your SSD, the more space you allocate for OP, the lower the write amplification, which translates to higher performance during writes and longer endurance of the flash memory.
Stronger data protection with RAISE level 2
A new RAISE level 2 capability offers even stronger data protection, safeguarding against multiple, simultaneous page and block failures, as well as a full die failure. If a die fails, the SandForce controller recovers the user data. RAISE level 2 includes Auto-Reallocation that can be set up to automatically redistribute and protect user data in the event of a subsequent die failure. Because the option to protect against a second die failure would reduce the available OP area, the RAISE level 2 feature can be set up to simply drop back to RAISE level 1 protection without sacrificing any OP space. .
Another new capability is an additional (9th) flash channel that enables the manufacturer to populate an extra flash package with one die that enables full RAISE level 1 protection while maintaining maximum user data capacity such as 64GB, 128GB, 256GB, etc. Without the 9th channel option, the SSD capacity would be forced to sacrifice a few GBs of capacity (reducing available user capacity to 60GB, 120GB, 240GB, respectively) because RAISE requires extra storage space.
Although all these new features cannot protect against the would-be thief or catastrophic drive failures from electrical surges or natural disasters, the probability of those events is much lower than a simple ECC failure. That’s why you would be best suited to have an SSD with RAISE technology to automatically protect against the more common ECC failures and then make a backup copy of your system at least periodically to protect your data against those far more serious events.
Many of you may have heard of a poem written by Robert Fulgham 25 years ago called “All I Really Need to Know I Learned in Kindergarten.” In it he provides such pearls of wisdom like “Play fair,” “Clean up your own mess,” “Don’t take things that aren’t yours” and “Flush.” By now you’re wondering what any of this has to do with storage technology. Well the #1 item on the kindergarten knowledge list is “Share Everything.” And from my perspective that includes DAS (direct-attached storage).
Sharable DAS has been a primary topic of discussion at this year’s annual LSI Accelerating Innovation Summit (AIS). During one keynote session I proposed a continuum of data sharing, spanning from traditional server-based DAS to traditional external NAS and SAN with multiple points in between – including external DAS, simple pooled storage, advanced pooled storage, shared storage and HA (high-availability) shared storage. Each step along the continuum adds incremental features and value, giving datacenter architects the latitude to choose – and pay for – only the level of sharing absolutely required, and no more. This level of choice is being very warmly received by the market as storage requirements vary widely among Web-cloud, private cloud, traditional enterprise, and SMB configurations and applications.
Sharable DAS pools storage for operational benefits and efficiencies
Sharable DAS, with its inherent storage resource pooling, offers a number of operational benefits and efficiencies when applied at the rack level:
LSI rolls out proof-of-concept Rack Scale architecture using sharable DAS
In addition to just talking about sharable DAS at AIS, we also rolled out a proof-of-concept Rack Scale architecture employing sharable DAS. In it we configured 20 servers with 12Gb/s SAS RAID controllers, a prototype 40-port 12Gb/s SAS switch (that’s 160 12Gb/s SAS lanes) and 10 JBODs with 12Gb/s SAS for a total of 200 disk drives – all in a single rack. The drives were configured as a single storage resource pool with our media sharing (ability to spread volumes across multiple disk drives and aggregate disk drive bandwidth) and distributed RAID (ability to disperse data protection across multiple disk drives) features. This configuration pools the server storage into a single resource, delivering substantial, tangible performance and availability improvements, when compared to 20 stand-alone servers. In particular, the configuration:
I’m sure you’ll agree with me that Rack Scale architecture with sharable DAS is clearly a major step forward in providing a wide range of storage solutions under a single architecture. This in turn provides a multitude of operational efficiencies and performance benefits, giving datacenter architects wide latitude to employ what is needed – and only what is needed.
Now that we’ve tackled the #1 item on the kindergarten learning list, maybe I’ll set my sights on another item, like “Take a nap every afternoon.”
Optimizing the work per dollar spent is a high priority in datacenters around the world. But there aren’t many ways to accomplish that. I’d argue that integrating flash into the storage system drives the best – sometimes most profound – improvement in the cost of getting work done.
Yea, I know work/$ is a US-centric metric, but replace the $ with your favorite currency. The principle remains the same.
I had the chance to talk with one of the execs who’s responsible for Google’s infrastructure last week. He talked about how his fundamental job was improving performance/$. I asked about that, and he explained “performance” as how much work an application could get done. I asked if work/$ at the application was the same, and he agreed – yes – pretty much.
You remember as a kid that you brought along a big brother as authoritative backup? OK – so my big brother Google and I agree – you should be trying to optimize your work/$. Why? Well – it could be to spend less, or to do more with the same spend, or do things you could never do before, or simply to cope with the non-linear expansion in IT demands even as budgets are shrinking. Hey – that’s the definition of improving work/$… (And as a bonus, if you do it right, you’ll have a positive green impact that is bound to be worth brownie points.)
Here’s the point. Processors are no longer scaling the same – sure, there are more threads, but not all applications can use all those threads. Systems are becoming harder to balance for efficiency. And often storage is the bottleneck. Especially for any application built on a database. So sure – you can get 5% or 10% gain, or even in the extreme 100% gain in application work done by a server if you’re willing to pay enough and upgrade all aspects of the server: processors, memory, network… But it’s almost impossible to increase the work of a server or application by 200%, 300% or 400% – for any money.
I’m going to explain how and why you can do that, and what you get back in work/$. So much back that you’ll probably be spending less and getting more done. And I’m going to explain how even for the risk-averse, you can avoid risk and get the improvements.
More work/$ from general-purpose DAS servers and large databases
Let me start with a customer. It’s a bank, and it likes databases. A lot. And it likes large databases even more. So much so that it needs disks to hold the entire database. Using an early version of an LSI Nytro™ MegaRAID® card, it got 6x the work from the same individual node and database license. You can read that as 600% if you want. It’s big. To be fair – that early version had much more flash than our current products, and was much more expensive. Our current products give much closer to 3x-4x improvement. Again, you can think of that as 300%-400%. Again, slap a Nytro MegaRAID into your server and it’s going to do the work of 3 to 4 servers. I just did a web search and, depending on configuration, Nytro MegaRAIDs are $1,800 to $2,800 online. I don’t know about you, but I would have a hard time buying 2 to 3 configured servers + software licenses for that little, but that’s the net effect of this solution. It’s not about faster (although you get that). It’s about getting more work/$.
But you also want to feel safe – that you’re absolutely minimizing risk. OK. Nytro MegaRAID is a MegaRAID card. That’s overwhelmingly the most common RAID controller in the world, and it’s used by 9 of the top 10 OEMs, and protects 10’s to 100‘s of millions of disks every day. The Nytro version adds private flash caching in the card and stores hot reads and writes there. Writes to the cache use a RAID 1 pair. So if a flash module dies, you’re protected. If the flash blocks or chip die wear out, the bad blocks are removed from the cache pool, and the cache shrinks by that much, but everything keeps operating – it’s not like a normal LUN that can’t change size. What’s more, flash blocks usually finally wear out during the erase cycle – so no data is lost. And as a bonus, you can eliminate the traditional battery most RAID cards use – the embedded flash covers that – so no more annual battery service needed. This is a solution that will continue to improve work/$ for years and years, all the while getting 3x-4x the work from that server.
More work/$ from SAN-attached servers (without actually touching the SAN)
That example was great – but you don’t use DAS systems. Instead, you use a big iron SAN. (OK, not all SANs are big iron, but I like the sound of that expression.) There are a few ways to improve the work from servers attached to SANs. The easiest of course is to upgrade the SAN head, usually with a flash-based cache in the SAN controller. This works, and sometimes is “good enough” to cover needs for a year or two. However, the server still needs to reach across the SAN to access data, and it’s still forced to interact with other servers’ IO streams in deeper queues. That puts a hard limit on the possible gains.
Nytro XD caches hot data in the server. It works with virtual machines. It intercepts storage traffic at the block layer – the same place LSI’s drivers have always been. If the data isn’t hot, and isn’t cached, it simply passes the traffic through to the SAN. I say this so you understand – it doesn’t actually touch the SAN. No risk there. More importantly, the hot storage traffic never has to be squeezed through the SAN fabric, and it doesn’t get queued in the SAN head. In other words, it makes the storage really, really fast.
We’ve typically found work from a server can increase 5x to 10x, and that’s been verified by independent reviewers. What’s more, the Nytro XD solution only costs around 4x the price of a high-end SAN NIC. It’s not cheap, but it’s way cheaper than upgrading your SAN arrays, it’s way cheaper than buying more servers, and it’s proven to enable you to get far more work from your existing infrastructure. When you need to get more work – way more work – from your SAN, this is a really cost-effective approach. Seriously – how else would you get 5x-10x more work from your existing servers and software licenses?
More work/$ from databases
A lot of hyperscale datacenters are built around databases of a finite size. That may be 1, 2 or even 4 TBytes. If you use Apple’s online services for iTunes or iCloud, or if you use Facebook, you’re using this kind of infrastructure.
If your datacenter has a database that can fit within a few TBytes (or less), you can use the same approach. Move the entire LUN into a Nytro WarpDrive® card, and you will get 10x the work from your server and database software. It makes such a difference that some architects argue Facebook and Apple cloud services would never have been possible without this type of solution. I don’t know, but they’re probably right. You can buy a Nytro WarpDrive for as little as a low-end server. I mean low end. But it will give you the work of 10. If you have a fixed-size database, you owe it to yourself to look into this one.
More work/$ from virtualized and VDI (Virtual Desktop) systems
Virtual machines are installed on a lot of servers, for very good reason. They help improve the work/$ in the datacenter by reducing the number of servers needed and thereby reducing management, maintenance and power costs. But what if they could be made even more efficient?
Wall Street banks have benchmarked virtual desktops. They found that Nytro products drive these results: support of 2x the virtual desktops, 33% improvement in boot time during boot storms, and 33% lower cost per virtual desktop. In a more general application mix, Nytro increases work per server 2x-4x. And it also gives 2x performance for virtual storage appliances.
While that’s not as great as 10x the work, it’s still a real work/$ value that’s hard to ignore. And it’s the same reliable MegaRAID infrastructure that’s the backbone of enterprise DAS storage.
A real example from our own datacenter
Finally – a great example of getting far more work/$ was an experiment our CIO Bruce Decock did. We use a lot of servers to fuel our chip-design business. We tape out a lot of very big leading-edge process chips every year. Hundreds. And that takes an unbelievable amount of processing to get what we call “design closure” – that is, a workable chip that will meet performance requirements and yield. We use a tool called PrimeTime that figures out timing for every signal on the chip across different silicon process points and operating conditions. There are 10’s to 100’s of millions of signals. And we run every active design – 10’s to 100’s of chips – each night so we can see how close we’re getting, and we make multiple runs per chip. That’s a lot of computation… The thing is, electronic CAD has been designed to try not to use storage or it will never finish – just /tmp space, but CAD does use huge amounts of memory for the data structures, and that means swap space on the order of TBytes. These CAD tools usually don’t need to run faster. They run overnight and results are ready when the engineers come in the next day. These are impressive machines: 384G or 768G of DRAM and 32 threads. How do you improve work/$ in that situation? What did Bruce do?
He put LSI Nytro WarpDrives in the servers and pointed /tmp at the WarpDrives. Yep. Pretty complex. I don’t think he even had to install new drivers. The drivers are already in the latest OS distributions. Anyway – like I said – complex.
The result? WarpDrive allowed the machines to fully use the CPU and memory with no I/O contention. With WarpDrive, the PrimeTime jobs for static timing closure of a typical design could be done on 15 vs. 40 machines. That’s each Nytro node doing 260% of the work vs. a normal node and license. Remember – those are expensive machines (have you priced 768G of DRAM and do you know how much specialized electronic design CAD licenses are?) So the point wasn’t to execute faster. That’s not necessary. The point is to use fewer servers to do the work. In this case we could do 11 runs per server per night instead of just 4. A single chip design needs more than 150 runs in one night.
To be clear, the Nytro WarpDrives are a lot less expensive than the servers they displace. And the savings go beyond that – less power and cooling. Lower maintenance. Less admin time and overhead. Fewer Licenses. That’s definitely improved work/$ for years to come. Those Nytro cards are part of our standard flow, and they should probably be part of every chip company’s design flow.
So you can improve work/$ no matter the application, no matter your storage model, and no matter how risk-averse you are.
Optimizing the work per dollar spent is a high – maybe the highest – priority in datacenters around the world. And just to be clear – Google agrees with me. There aren’t many ways to accomplish that improvement, and almost no ways to dramatically improve it. I’d argue that integrating flash into the storage system is the best – sometimes most profound – improvement in the cost of getting work done. Not so much the performance, but the actual work done for the money spent. And it ripples through the datacenter, from original CapEx, to licenses, maintenance, admin overhead, power and cooling, and floor space for years. That’s a pretty good deal. You should look into it.
For those of you who are interested, I already wrote about flash in these posts:
What are the driving forces behind going diskless?
LSI is green – no foolin’
Tags: Bruce Decock, DAS, datacenter, direct attached storage, enterprise IT, flash, Google, hyperscale datacenter, Nytro MegaRAID, Nytro WarpDrive, Nytro XD, PrimeTime, RAID, SAN, server storage, storage area network, VDI, virtual desktop infrastructure, work per dollar
Lenovo is whopping big. The planet’s second largest PC maker, the sixth largest server vendor and China’s top server supplier.
So when a big gun like Lenovo recognizes us with its Technology Innovation award for our 12G SAS technology, we love to talk about it. The lofty honor came at the recent Lenovo Supplier Conference in Hefei, China.
Hefei is big too. As recently at the mid-1930’s, Hefei was a quiet market town of only about 30,000. Today, it’s home to more than 7 million people spread across 4,300 square miles. No matter how you cut it, that’s explosive growth – and no less dizzying than the global seam-splitting growth that Lenovo is helping companies worldwide manage with its leading servers.
For more than a decade, LSI has been the SAS/RAID strategic partner for Lenovo and in 2009 it chose LSI as its exclusive SAS/RAID vendor. The reason: Our ability to provide enterprise class and industry-leading SAS/RAID solutions. Lenovo says it better.
“In 2012, Lenovo began to sharpen its focus on the enterprise server business with the goal of becoming a tier-1 server in the global market,” said Jack Xing, senior sales manager in China. ”To support this strategy, the company realized the importance of selecting a trusted and innovative SAS/RAID partner, which is why it has turned to LSI exclusively for its 12G SAS technology.”
Trust. Innovation. High compliments from Lenovo, a major engine of technology innovation in one of the world’s fastest-growing economies. It’s dizzying, even heady. You can see why we love to talk about it.
I’m reminded that when I do what I do best and don’t try to be all things to all people, I get much more accomplished. Interestingly, I’ve found that the same approach applies to server storage system controllers – and to the home PC I use for photo editing.
The question many of us face is whether it’s best to use an integrated or discrete solution. Think digital television. Do you want a TV with an integrated DVD player, or do you prefer a feature-rich, dedicated player that you can upgrade and replace independent of the TV? I’ve pondered a similar question many times when considering my PC: Do I use a motherboard with an integrated graphics controller or go with a discrete graphics adapter card.
If I look only at initial costs and am satisfied with the performance of my display for day-to-day computing activities, I could go with the integrated controller, something that many consumers do. But my needs aren’t that simple. I need multiple displays, higher screen resolution, higher display system performance, and the ability to upgrade and tune the graphics to my applications. To do these things, I go with a separate discrete graphics controller card.
Hardware RAID delivers enterprise-class data protection and features
In the datacenter, IT architects often face the choice between hardware RAID, a discrete solution, and software RAID, hardware RAID’s integrated counterpart. Hardware RAID offers enterprise-class robustness and features, such as higher performance without operating systems (OS) and application interference, particularly in compute-intensive RAID 5 and RAID 6 application environments. Also, hardware-based RAID can help optimize the performance and scalability of the SAS protocol. Sure, the build of materials (BOM) costs with hardware RAID are higher when a RAID on Chip or IOC component enters the mix, but these purpose-built solutions are designed to deliver performance and flexibility unmatched by most software RAID solutions.
Enterprise-hardened RAID solutions that protect data, manage and deliver high availability can scale up and down because they are based on RAID-on-chip (ROC) solutions, and they are designed to provide a consistent experience and boot across OS’s and BIOS.
One of the biggest differences between hardware and software RAID is in data protection. For example, if the OS shuts down in the middle of a write, once it is back up the OS can’t recognize whether the write was compromised or failed because the RAID cache was from host memory. A hardware RAID solution holds the write data in separate, non-volatile cache and completes the write when the system comes back online. Even more subtly, the CPU and storage cache are offloaded from the host memory, freeing up resources for application performance.
Software RAID cost rises as features added
For software RAID to deliver write cache and advanced features, a non-volatile write cache via battery or flash backup schemes needs to be added, and suddenly the BOM costs are similar or higher than the more flexible hardware RAID solution.
In the end, LSI enterprise hardware RAID solutions bring many features and capabilities that simply cannot exist in a software RAID on-load environment. To be sure, an enterprise server is no PC or TV, but the choice between a discrete and integrated solution, whether in consumer electronics or storage server technology, is of a kindred sort. I always feel gratified when we can help one of our customers make the best choice.
For more information about our enterprise RAID solutions please visit us at http://www.lsi.com/solutions/Pages/enterpriseRAID.aspx