I often think about green, environmental impact, and what we’re doing to the environment. One major reason I became an engineer was to leave the world a little better than when I arrived. I’ve gotten sidetracked a few times, but I’ve tried to help, even if just a little.
The good people in LSI’s EHS (Environment, Health & Safety) asked me a question the other day about carbon footprint, energy impact, and materials use. Which got me thinking … OK – I know most people in LSI don’t really think of ourselves as a “green tech” company. But we are – really. No foolin’. We are having a big impact on the global power consumption and material consumption of the IT industry. And I mean that in a good way.
There are many ways to look at this, both from what we enable datacenters to do, to what we enable integrators to do, all the way to hard-core technology improvements and massive changes in what it’s possible to do.
Back in 2008 I got to speak at the AlwaysOn GoingGreen conference. (I was lucky enough to be just after Elon Musk – he’s a lot more famous now with Tesla doing so well.)
http://www.smartplanet.com/video/making-the-case-for-green-it/305467 (at 2:09 in video)
IT consumes massive amounts of energy
The massive deployment of IT equipment, and all the ancillary metal, plastic, wiring, etc. that goes with it, consumes energy as it’s being shipped and moved halfway around the world, and, more importantly, then gets scrapped out quickly. This has been a concern for me for quite a while. I mean – think about that. As an industry we are generating about 9 million servers a year, and about 3 million go into hyperscale datacenters. Many of those are scrapped on a 2, 3 or 4 year cycle – so in steady state, maybe 1 million to 2 million a year are scrapped. Worse – there is amazing use of energy by that many servers (even as they have advanced the state of the art unbelievably since 2008). And frankly, you and I are responsible for using all that power. Did you know thousands of servers are activated every time you make a Google® query from your phone?
I want to take a look at the basic silicon improvements we make, the impact of disk architecture improvements, SSDs, system improvements, efficiency improvements, and also where we’re going in the near future with eliminating scrap in hard drives and batteries. In reality, it’s the massive pressure on work/$ that has made us optimize everything. We have to do much more work at a lower cost, and because a lot of that cost is the energy and material that go into the products, our hand is forced. But the result is a real, profound impact on our carbon footprint that we should be proud of.
Sure we have a general silicon roadmap where each node enables reduced power, even as some standards and improvements actually increase individual device power. For example, our transition from 28nm semi process to 14nm FinFET can literally cut the power consumption of a chip in half. But that’s small potatoes.
How about Ethernet? It’s everywhere – right? Did you know servers often have 4 ethernet ports, and that there are a matching 4 ports on a network switch? LSI pioneered something called Energy Efficient Ethernet (EEE). We’re also one of the biggest manufacturers of Ethernet PHYs – the part that drives the cable – and we come standard in everything from personal computers to servers to enterprise switches. The savings are hard to estimate, because they depend very much on how much traffic there is, but you can realistically save Watts per interface link, and there are often 256 links in a rack. 500 Watts per rack is no joke, and in some datacenters it adds up to 1 or 2 MegaWatts.
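As a rough sanity check on the Ethernet figures above, the short sketch below works backwards from the quoted 500 Watts per rack. The per-link saving and the rack counts it prints are derived values (assumptions of this sketch), not numbers taken from the text.

```python
# Figures quoted in the text.
LINKS_PER_RACK = 256
SAVING_PER_RACK_W = 500

# Derived: the average saving each link would have to contribute.
saving_per_link_w = SAVING_PER_RACK_W / LINKS_PER_RACK


def racks_for_saving(megawatts):
    """Number of racks needed for the per-rack saving to add up to `megawatts`."""
    return megawatts * 1_000_000 / SAVING_PER_RACK_W


racks_1mw = racks_for_saving(1)
racks_2mw = racks_for_saving(2)

print(f"Implied saving per link: {saving_per_link_w:.2f} W")
print(f"Racks needed for 1 MW: {racks_1mw:.0f}, for 2 MW: {racks_2mw:.0f}")
```

In other words, the 1 to 2 MegaWatt figure corresponds to a facility with a few thousand racks, assuming every rack sees the full saving.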
How about something a little bigger and more specific? Hard disk drives. Did you know a typical hyperscale datacenter has between 1 million and 1.5 million disk drives? Each one of those consumes about 9 Watts, and most have 2 TBytes of capacity. So for easy math, 1 million drives is about 9 MegaWatts (!?) and about 2 Exabytes of capacity (remember – data is often replicated 3 or more times). Data capacities in these facilities need to grow about 50% per year. So if we did nothing, we would need to go from 1 million drives to 1.5 million drives: 9 MegaWatts goes to 13.5 MegaWatts. Wow! Instead – our high linearity, low noise PA and read channel designs are allowing drives to go to 4 TBytes per drive. (Sure the chip itself may use slightly more power, but that’s not the point, what it enables is a profound difference.) So to get that 50% increase in capacity we could actually reduce the number of drives deployed, with a net savings of 6.75 MegaWatts. Consider that an average US home, with air conditioning, uses 1 kiloWatt. That’s almost 7,000 homes. In reality – they won’t get deployed that way – but it will still be a huge savings. Instead of buying another 0.5 million drives they would buy 0.25 million drives with a net savings of 2.2 MegaWatts. That’s still HUGE! (way to go, guys!) How many datacenters are doing that? Dozens. So that’s easily 20 or 30 MegaWatts globally. Did I say we saved them money too? A lot of money.
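The drive math above is easy to lose track of, so here is a minimal sketch that reproduces it. All inputs are the round numbers from the text; the variable names and the split into "full swap" and "incremental" scenarios are just labels chosen for this illustration.

```python
# Round numbers quoted in the text.
WATTS_PER_DRIVE = 9
DRIVES_TODAY = 1_000_000
TB_PER_DRIVE_OLD = 2
TB_PER_DRIVE_NEW = 4
GROWTH = 0.50            # capacity growth per year
HOME_WATTS = 1_000       # average US home with air conditioning


def megawatts(drives):
    """Power drawn by a population of drives, in MegaWatts."""
    return drives * WATTS_PER_DRIVE / 1e6


capacity_today_tb = DRIVES_TODAY * TB_PER_DRIVE_OLD
capacity_needed_tb = capacity_today_tb * (1 + GROWTH)
extra_tb = capacity_needed_tb - capacity_today_tb

# "Do nothing": keep buying 2 TB drives.
drives_do_nothing = capacity_needed_tb / TB_PER_DRIVE_OLD
# Idealized case: the whole fleet is 4 TB drives.
drives_full_swap = capacity_needed_tb / TB_PER_DRIVE_NEW
saving_full_swap_mw = megawatts(drives_do_nothing) - megawatts(drives_full_swap)

# Realistic case: keep the existing fleet, buy only the growth as 4 TB drives.
saving_incremental_mw = (megawatts(extra_tb / TB_PER_DRIVE_OLD)
                         - megawatts(extra_tb / TB_PER_DRIVE_NEW))

homes_equivalent = saving_full_swap_mw * 1e6 / HOME_WATTS

print(f"Do nothing: {megawatts(drives_do_nothing):.2f} MW")
print(f"Full swap saves {saving_full_swap_mw:.2f} MW (~{homes_equivalent:.0f} homes)")
print(f"Incremental purchase saves {saving_incremental_mw:.2f} MW")
```

The incremental case is the 2.25 MegaWatts that the text rounds to 2.2.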
SSDs sip power to help improve energy profile
SSDs don’t always get the credit they deserve. Yes, they really are fast, and they are awesome in your laptop, but they also end up being much lower power than hard drives. Our controllers were in about half the flash solutions shipped last year. Think tens of millions. If you just assume they were all laptop SSDs (at least half were not) then that’s another 20 MegaWatts in savings.
Did you know that in a traditional datacenter, about 30% of the power going into the building is used for air conditioning? It doesn’t actually get used on the IT equipment at all, but is used to remove the heat that the IT equipment generates. We design our solutions so they can accommodate 40C ambient inlet air (that’s a little over 100F… hot). What that means is that the 30% of power used for the air conditioners disappears. Gone. That’s not theoretical either. Most of the large social media, search engine, web shopping, and web portal companies are using our solutions this way. That’s a 30% reduction in the power of storage solutions globally. Again, it’s MegaWatts in savings. And mega money savings too.
But let’s really get to the big hitters: improved work per server. Yep – we do that. In fact adding a Nytro™ MegaRAID® solution will almost always give you 4x the work out of a server. It’s a slam dunk if you’re running a database. You heard me – 1 server doing the work that it previously took 4 servers to do. Not only is that a huge savings in dollars (especially if you pay for software licenses!) but it’s a massive savings in power. You can replace 4 servers with 1, saving at least 900 Watts, and that lone server that’s left is actually dissipating less power too, because it’s actively using fewer HDDs, and using flash for most traffic instead. If you go a step further and use Nytro WarpDrive Flash cards in the servers, you can get much more – 6 to 8 times the work. (Yes, sometimes up to 10x, but let’s not get too excited.) If you think that’s just theoretical again, check your Facebook® account, or download something from iTunes®. Those two services are the biggest users of PCIe® flash in the world. Why? It works cost effectively. And in case you haven’t noticed, those two companies like to make money, not spend it. So again, we’re talking about MegaWatts of savings. Arguably on the order of 150 MegaWatts. Yeah – that’s pretty theoretical, because they couldn’t really do the same work otherwise, but still, if you had to do the work in a traditional way, it would be around that.
It’s hard to be more precise than giving round numbers at these massive scales, but the numbers are definitely in the right zone. I can say with a straight face we save the world 10’s, and maybe even 100’s of MegaWatts per year. But no one sees that, and not many people even think about it. Still – I’d say LSI is a green hero.
Hey – we’re not done by a long shot. Let’s just look at scrap. If you read my earlier post on false disk failure, you’ll see some scary numbers. (http://blog.lsi.com/what-is-false-disk-failure-and-why-is-it-a-problem/ ) A normal hyperscale datacenter can expect 40-60 disks per day to be mistakenly scrapped out. That’s around 20,000 disk drives a year that should not have been scrapped, from just one web company. Think of the material waste, shipping waste, manufacturing waste, and eWaste issues. Wow – all for nothing. We’re working on solutions to that. And batteries. Ugly, eWaste, recycle only, heavy metal batteries. They are necessary for RAID protected storage systems. And much of the world’s data is protected that way – the battery is needed to save meta-data and transient writes in the event of a power failure, or server failure. We ship millions a year. (Sorry, mother earth). But we’re working diligently to make that a thing of the past. And that will also result in big savings for datacenters in both materials and recycling costs.
Can we do more? Sure. I know I am trying to get us the core technologies that will help reduce power consumption, raise capability and performance, and reduce waste. But we’ll never be done with that march of technology. (Which is a good thing if engineering is your career…)
I still often think about green, environmental impact, and what we’re doing to the environment. And I guess in my own small way, I am leaving the world a little better than when I arrived. And I think we at LSI should at least take a moment and pat ourselves on the back for that. You have to celebrate the small victories, you know? Even as the fight goes on.
I want to warn you, there is some thick background information here first. But don’t worry. I’ll get to the meat of the topic and that’s this: Ultimately, I think that PCIe® cards will evolve to more external, rack-level, pooled flash solutions, without sacrificing all their great attributes today. This is just my opinion, but other leaders in flash are going down this path too…
I’ve been working on enterprise flash storage since 2007 – mulling over how to make it work. Endurance, capacity, cost, performance have all been concerns that have been grappled with. Of course the flash is changing too as the nodes change: 60nm, 50nm, 35nm, 24nm, 20nm… and single level cell (SLC) to multi level cell (MLC) to triple level cell (TLC) and all the variants of these “trimmed” for specific use cases. The spec “endurance” has gone from 1 million program/erase cycles (PE) to 3,000, and in some cases 500.
It’s worth pointing out that almost all the “magic” that has been developed around flash was already scoped out in 2007. It just takes a while for a whole new industry to mature. Individual die capacity increased, meaning fewer die are needed for a solution – and that means less parallel bandwidth for data transfer… And the “requirement” for state-of-the-art single operation write latency has fallen well below the write latency of the flash itself. (What the ?? Yeah – talk about that later in some other blog. But flash is ~1500 µs write latency, where state of the art flash cards are ~50 µs.) When I describe the state of technology it sounds pretty pessimistic. I’m not. We’ve overcome a lot.
We built our first PCIe card solution at LSI in 2009. It wasn’t perfect, but it was better than anything else out there in many ways. We’ve learned a lot in the years since – both from making them, and from dealing with customers and users – about our own solutions and our competitors’. We’re lucky to be an important player in storage, so in general the big OEMs, large enterprises and the hyperscale datacenters all want to talk with us – not just about what we have or can sell, but what we could have and what we could do. They’re generous enough to share what works and what doesn’t. What the values of solutions are and what the pitfalls are too. Honestly? It’s the hyperscale datacenters in the lead both practically and in vision.
If you haven’t nodded off to sleep yet, that’s a long-winded way of saying – things have changed fast, and, boy, we’ve learned a lot in just a few years.
Most important thing we’ve learned…
Most importantly, we’ve learned it’s latency that matters. No one is pushing the IOPs limits of flash, and no one is pushing the bandwidth limits of flash. But they sure are pushing the latency limits.
PCIe cards are great, but…
We’ve gotten lots of feedback, and one of the biggest things we’ve learned is – PCIe flash cards are awesome. They radically change performance profiles of most applications, especially databases, allowing servers to run efficiently and the actual work done by each server to multiply 4x to 10x (and in a few extreme cases 100x). So the feedback we get from large users is “PCIe cards are fantastic. We’re so thankful they came along. But…” There’s always a “but,” right??
It tends to be a pretty long list of frustrations, and they differ depending on the type of datacenter using them. We’re not the only ones hearing it. To be clear, none of these are stopping people from deploying PCIe flash… the attraction is just too compelling. But the problems are real, and they have real implications, and the market is asking for real solutions.
Of course, everyone wants these fixed without affecting single operation latency, or increasing cost, etc. That’s what we’re here for though – right? Solve the impossible?
A quick summary is in order. It’s not looking good. For a given solution, flash is getting less reliable, there is less bandwidth available at capacity because there are fewer die, we’re driving latency way below the actual write latency of flash, and we’re not satisfied with the best solutions we have for all the reasons above.
If you think these through enough, you start to consider one basic path. It also turns out we’re not the only ones realizing this. Where will PCIe flash solutions evolve over the next 2, 3, 4 years? The basic goals are:
One easy answer would be – that’s a flash SAN or NAS. But that’s not the answer. Not many customers want a flash SAN or NAS – not for their new infrastructure, but more importantly, all the data is at the wrong end of the straw. The poor server is left sucking hard. Remember – this is flash, and people use flash for latency. Today these SAN type of flash devices have 4x-10x worse latency than PCIe cards. Ouch. You have to suck the data through a relatively low bandwidth interconnect, after passing through both the storage and network stacks. And there is interaction between the I/O threads of various servers and applications – you have to wait in line for that resource. It’s true there is a lot of startup energy in this space. It seems to make sense if you’re a startup, because SAN/NAS is what people use today, and there’s lots of money spent in that market today. However, it’s not what the market is asking for.
Another easy answer is NVMe SSDs. Right? Everyone wants them – right? Well, OEMs at least. Front bay PCIe SSDs (HDD form factor or NVMe – lots of names) that crowd out your disk drive bays. But they don’t fix the problems. The extra mechanicals and form factor are more expensive, and just make replacing the cards every 5 years a few minutes faster. Wow. With NVMe SSDs, you can fit fewer HDDs – not good. They also provide uniformly bad cooling, and hard limit power to 9W or 25W per device. But to protect the storage in these devices, you need to have enough of them that you can RAID or otherwise protect. Once you have enough of those for protection, they give you awesome capacity, IOPs and bandwidth, too much in fact, but that’s not what applications need – they need low latency for the working set of data.
What do I think the PCIe replacement solutions in the near future will look like? You need to pool the flash across servers (to optimize bandwidth and resource usage, and allocate appropriate capacity). You need to protect against failures/errors and limit the span of failure, commit writes at very low latency (lower than native flash) and maintain low latency, bottleneck-free physical links to each server… To me that implies:
That means the performance looks exactly as if each server had multiple PCIe cards. But the capacity and bandwidth resources are shared, and systems can remain resilient. So ultimately, I think that PCIe cards will evolve to more external, rack level, pooled flash solutions, without sacrificing all their great attributes today. This is just my opinion, but as I say – other leaders in flash are going down this path too…
What’s your opinion?
It may sound crazy, but hard disk drives (HDDs) do not have a delete command. Now we all know HDDs have a fixed capacity, so over time the older data must somehow get removed, right? Actually it is not removed, but overwritten. The operating system (OS) uses a reference table to track the locations (addresses) of all data on the HDD. This table tells the OS which spots on the HDD are used and which are free. When the OS or a user deletes a file from the system, the OS simply marks the corresponding spot in the table as free, making it available to store new data.
The HDD is told nothing about this change, and it does not need to know since it would not do anything with that information. When the OS is ready to store new data in that location, it just sends the data to the HDD and tells it to write to that spot, directly overwriting the prior data. It is simple and efficient, and no delete command is required.
However, with the advent of NAND flash-based solid state drives (SSDs) a new problem emerged. In my blog, Gassing up your SSD, I explain how NAND flash memory pages cannot be directly overwritten with new data, but must first be erased at the block level through a process called garbage collection (GC). I further describe how the SSD uses non-user space in the flash memory (over provisioning or OP) to improve performance and longevity of the SSD. In addition, any user space not consumed by the user becomes what we call dynamic over provisioning – dynamic because it changes as the amount of stored data changes.
When less data is stored by the user, the amount of dynamic OP increases, further improving performance and endurance. The problem I alluded to earlier is caused by the lack of a delete command. Without a delete command, every SSD will eventually fill up with data, both valid and invalid, eliminating any dynamic OP. The result would be the lowest possible performance at that factory OP level. So unlike HDDs, SSDs need to know what data is invalid in order to provide optimum performance and endurance.
Keeping your SSD TRIM
A number of years ago, the storage industry got together and developed a solution between the OS and the SSD by creating a new SATA command called TRIM. It is not a command that forces the SSD to immediately erase data like some people believe. Actually the TRIM command can be thought of as a message from the OS about what previously used addresses on the SSD are no longer holding valid data. The SSD takes those addresses and updates its own internal map of its flash memory to mark those locations as invalid. With this information, the SSD no longer moves that invalid data during the GC process, eliminating wasted time rewriting invalid data to new flash pages. It also reduces the number of write cycles on the flash, increasing the SSD’s endurance. Another benefit of the TRIM command is that more space is available for dynamic OP.
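To make the difference concrete, here is a toy sketch of the bookkeeping described above. The `ToySSD` and `ToyOS` classes are hypothetical and wildly simplified (a real drive tracks flash pages and blocks, not just a set of addresses), but they show why a delete that only updates the OS table leaves the SSD copying stale data during GC.

```python
class ToySSD:
    """Hypothetical model: the drive only tracks which logical addresses it believes are valid."""

    def __init__(self):
        self.valid_lbas = set()

    def write(self, lba):
        self.valid_lbas.add(lba)

    def trim(self, lbas):
        # TRIM is a hint, not an erase: the addresses are only marked invalid.
        self.valid_lbas.difference_update(lbas)

    def pages_gc_must_copy(self):
        # Garbage collection relocates every page the drive still thinks is valid.
        return len(self.valid_lbas)


class ToyOS:
    """Hypothetical OS that tracks files in its own reference table."""

    def __init__(self, drive, sends_trim):
        self.drive = drive
        self.sends_trim = sends_trim
        self.file_table = {}  # file name -> list of logical addresses

    def save(self, name, lbas):
        self.file_table[name] = list(lbas)
        for lba in self.file_table[name]:
            self.drive.write(lba)

    def delete(self, name):
        freed = self.file_table.pop(name)  # the OS just marks the space as free
        if self.sends_trim:
            self.drive.trim(freed)         # and, with TRIM, also tells the SSD


def pages_copied_after_delete(sends_trim):
    drive = ToySSD()
    toy_os = ToyOS(drive, sends_trim)
    toy_os.save("keep.db", range(0, 4))
    toy_os.save("old.log", range(4, 8))
    toy_os.delete("old.log")
    return drive.pages_gc_must_copy()


without_trim = pages_copied_after_delete(sends_trim=False)
with_trim = pages_copied_after_delete(sends_trim=True)
print(f"Pages GC must copy without TRIM: {without_trim}, with TRIM: {with_trim}")
```

Without TRIM the drive keeps treating the deleted file's four pages as valid and carries them along forever; with TRIM it only has to preserve the four pages that still matter.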
Today, most current operating systems and SSDs support TRIM, and all SandForce Driven™ member SSDs have always supported TRIM. Note that most RAID environments do not support TRIM, although some RAID 0 configurations have claimed to support it. I have presented on this topic in detail previously. You can view the presentation in full here. In my next blog I will explain how there may be an alternate solution using SandForce Driven member SSDs.
The term global warming can be very polarizing in a conversation and both sides of the argument have mountains of material that support or discredit the overall situation. The most devout believers in global warming point to the average temperature increases in the Earth’s atmosphere over the last 100+ years. They maintain the rise is primarily caused by increased greenhouse gases from humans burning fossil fuels and deforestation.
The opposition generally agrees with the measured increase in temperature over that time, but claims that increase is part of a natural cycle of the planet and not something humans can significantly impact one way or another. The US Energy Information Administration estimates that 90% of the world’s marketed energy consumption is from non-renewable energy sources like fossil fuels. Our internet-driven lives run through datacenters that are well-known to consume large quantities of power. No matter which side of the global warming argument you support, most people agree that wasting power is not a good long-term position. Therefore, if the power consumed by datacenters can be reduced, especially as we live in an increasingly digitized world, this would benefit all mankind.
When we look at the most power-hungry components of a datacenter, we find mainly server and storage systems. However, people sometimes forget that those systems require cooling to counteract the heat generated. But the cooling itself consumes even more energy. So anything that can store data more efficiently and quickly will reduce both the initial energy consumption and the energy to cool those systems. As datacenters demand faster data storage, they are shifting to solid state drives (SSDs). SSDs generally provide higher performance per watt of power consumed over hard disk drives, but there is still more that can be done.
Reducing data to help turn down the heat
The good news is that there’s a way to reduce the amount of data that reaches the flash memory of the SSD. The unique DuraWrite™ technology found in all LSI® SandForce® flash controllers reduces the amount of data written to the flash memory to cut the time it takes to complete the writes and therefore reduce power consumption, below levels of other SSD technologies. That, in turn, reduces the cooling needed to further reduce overall power consumption. Now this data reduction is “loss-less,” meaning 100% of what is saved is returned to the host, unlike MPEG, JPEG, and MP3 files, which tolerate some amount of data loss to reduce file sizes.
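The idea of lossless data reduction can be illustrated with a few lines of code. To be clear, this is not how DuraWrite works internally (that is not described here); the sketch simply uses Python's standard `zlib` compression as a stand-in, and the 4,096-byte page size and the sample log data are assumptions chosen for illustration.

```python
import zlib

PAGE_SIZE = 4096  # assumed flash page size in bytes


def pages_needed(num_bytes):
    """Flash pages required to hold num_bytes (ceiling division)."""
    return -(-num_bytes // PAGE_SIZE)


# Highly repetitive sample data, standing in for what the host sends.
host_data = b"timestamp=2013-01-01 status=OK latency_us=50\n" * 2000

reduced = zlib.compress(host_data)    # what would actually reach the flash
restored = zlib.decompress(reduced)   # what the host gets back on a read

pages_raw = pages_needed(len(host_data))
pages_reduced = pages_needed(len(reduced))

print(f"Lossless: {restored == host_data}")
print(f"Pages written without reduction: {pages_raw}, with reduction: {pages_reduced}")
```

How much is saved depends entirely on the data: repetitive text like this shrinks dramatically, while data that is already compressed or encrypted barely shrinks at all.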
Today you can find many datacenters already using SandForce Driven SSDs and LSI Nytro™ application acceleration products (which use DuraWrite technology as well). When we start to see datacenters deploying these flash storage products by the millions, you will certainly be able to measure the reduction in power consumed by datacenters. Unfortunately, LSI will not be able to claim it stopped global warming, but at least we, and our customers, can say we did something to help defer the end result.
Have you ever run out of gas in your car? Do you often risk running your gas tank dry? Hopefully you are more cautious than that and you start searching for a gas station when you get down to a ¼ tank. You do this because you want plenty of cushion in case something comes up that prevents you from getting to a station before it is too late.
The reason most people stretch their tank is to maximize travel between station visits. The downside to pushing the envelope to “E” is you can end up stranded with a dead vehicle waiting for AAA to bring you some gas.
Now most people know you don’t put gas in a solid state drive (SSD), but the pros and cons associated with how much you leave in the “tank” are very relevant to SSDs.
To understand how these two seemingly unrelated technologies are similar, we first need to drill into some technical SSD details. To start, SSDs act, and often look, like traditional hard disk drives (HDDs), but they do not record data in the same way. SSDs today typically use NAND flash memory to store data and a flash controller to connect the memory with the host computer. The flash controller can write a page of data (often 4,096 bytes) directly to the flash memory, but cannot overwrite the same page of data without first erasing it. The erase cycle cannot expunge only a single page. Instead, it erases a whole block of data (usually 128 pages). Because the stored data is sometimes updated randomly across the flash, the erase cycle for NAND flash requires a process called garbage collection.
Garbage collection is just dumping the trash
Garbage collection starts when a flash block is full of data, usually a mix of valid (good) and invalid (older, replaced) data. The invalid data must be tossed out to make room for new data, so the flash controller copies the valid data of a flash block to a previously erased block, and skips copying the invalid data of that block. The final step is to erase the original whole block, preparing it for new data to be written.
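The copy-then-erase sequence described above can be sketched in a few lines. This is a simplified illustration, not a real controller: the `garbage_collect` function, the tuple representation of a page, and the pattern of invalid pages are all made up for the example, and the target block is assumed to be fully erased already.

```python
PAGES_PER_BLOCK = 128  # the usual block size mentioned earlier


def garbage_collect(source, target):
    """Copy valid pages from `source` into the erased `target`, then erase `source`.

    A page is a (payload, is_valid) tuple; None marks an erased page.
    Returns the number of pages that had to be copied.
    """
    copied = 0
    for page in source:
        if page is not None and page[1]:
            target[copied] = page
            copied += 1
    # Flash can only be erased a whole block at a time.
    source[:] = [None] * len(source)
    return copied


# A full block in which every fourth page has been replaced by newer data elsewhere.
full_block = [(f"data-{i}", i % 4 != 0) for i in range(PAGES_PER_BLOCK)]
erased_block = [None] * PAGES_PER_BLOCK

pages_copied = garbage_collect(full_block, erased_block)
pages_free_in_target = erased_block.count(None)

print(f"Copied {pages_copied} valid pages, {pages_free_in_target} pages now free for new data")
```

Note that only the 32 invalid pages were actually reclaimed, yet the controller had to rewrite 96 pages to get them.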
Before and during garbage collection, some data – valid data copied during garbage collection and the (typically) multiple copies of the invalid data – is in two or more locations at once. Because the controller rewrites valid data that the host never asked it to rewrite, the flash memory absorbs more writes than the host issued, a phenomenon known as write amplification. To store this extra data not counted by the operating system, the flash controller needs some spare capacity beyond what the operating system knows. This is called over-provisioning (OP), and it is a critical part of every NAND flash-based SSD.
Over-provisioning is like the gas that remains in your tank
While every SSD has some amount of OP, some will have more or less than others. The amount of OP varies depending on trade-offs made between total storage capacity and benefits in performance and endurance. The less OP allocated in an SSD, the more information a user can store. This is like the driver who will take their tank of gas clear down to near-empty just to maximize the total number of miles between station visits.
What many SSD users don’t realize is there are major benefits to NOT stretching this OP area too thin. When you allocate more space for OP, you achieve a lower write amplification, which translates to a higher performance during writes and longer endurance of the flash memory. This is like the driver who is more cautious and visits the gas station more often to enable greater flexibility in selecting a more cost-effective station, and allows for last-minute deviations in travel plans that end up burning more fuel than originally anticipated.
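The link between OP and write amplification can be seen in a small simulation. What follows is a toy model under stated assumptions, not the behavior of any particular controller: it uses a tiny drive (128 blocks of 32 pages), a simple "greedy" garbage collector that always cleans the block with the fewest valid pages, and uniformly random overwrites. The function name and all parameters are invented for this sketch.

```python
import random


def simulate_write_amplification(op_fraction, blocks=128, pages_per_block=32, seed=1):
    """Toy greedy-GC flash model; returns flash page writes per host page write."""
    physical_pages = blocks * pages_per_block
    user_pages = int(physical_pages / (1 + op_fraction))  # OP = (physical - user) / user
    rng = random.Random(seed)

    block_lbas = [[] for _ in range(blocks)]  # LBA programmed into each page slot
    valid = [0] * blocks                      # still-valid pages per block
    where = {}                                # LBA -> (block, slot) of its current copy
    free = list(range(1, blocks))             # erased blocks; block 0 starts as active
    state = {"active": 0, "flash_writes": 0}

    def program(lba):
        """Write one page to flash, invalidating any older copy of the same LBA."""
        if len(block_lbas[state["active"]]) == pages_per_block:
            state["active"] = free.pop()
        if lba in where:
            valid[where[lba][0]] -= 1
        block = state["active"]
        where[lba] = (block, len(block_lbas[block]))
        block_lbas[block].append(lba)
        valid[block] += 1
        state["flash_writes"] += 1

    def collect_one_block():
        """Relocate the valid pages of the emptiest full block, then erase it."""
        candidates = [b for b in range(blocks) if b != state["active"] and b not in free]
        victim = min(candidates, key=lambda b: valid[b])
        for slot, lba in enumerate(list(block_lbas[victim])):
            if where.get(lba) == (victim, slot):
                program(lba)
        block_lbas[victim] = []
        valid[victim] = 0
        free.append(victim)

    def host_write(lba):
        program(lba)
        while len(free) < 2:                  # keep a small reserve of erased blocks
            collect_one_block()

    for lba in range(user_pages):             # fill the drive once
        host_write(lba)
    for _ in range(2 * user_pages):           # random overwrites to warm up
        host_write(rng.randrange(user_pages))
    state["flash_writes"] = 0
    measured = 2 * user_pages
    for _ in range(measured):                 # measured phase
        host_write(rng.randrange(user_pages))
    return state["flash_writes"] / measured


results = {op: simulate_write_amplification(op) for op in (0.07, 0.28, 0.50)}
for op, wa in results.items():
    print(f"OP {op:.0%}: write amplification of about {wa:.2f}")
```

The exact numbers depend on the workload and the collector, so treat them as illustrative only. The trend is the point: the more spare area the controller has, the emptier its victim blocks are when it cleans them, and the fewer pages it has to rewrite.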
The choice is yours
Most SSD users do not realize they have full control of how much OP is configured in their SSD. So even if you buy an SSD with “0%” OP, you can dedicate some of the user space back to OP for the SSD.
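As a rough illustration, OP is commonly expressed as the spare capacity divided by the user capacity. The sketch below assumes a hypothetical drive with 128 GiB of raw flash sold as "128 GB", and a user who then partitions only 100 GB of it; these capacities are made up, and the common explanation that "0%" OP still leaves the binary-versus-decimal gigabyte difference as spare area is an assumption of this example.

```python
GIB = 2**30   # binary gigabyte, how flash is built
GB = 10**9    # decimal gigabyte, how capacity is advertised


def over_provisioning_pct(physical_bytes, user_bytes):
    """OP as a percentage of user capacity: (physical - user) / user."""
    return (physical_bytes - user_bytes) / user_bytes * 100


physical = 128 * GIB      # hypothetical raw flash on the drive
advertised = 128 * GB     # hypothetical "0%" OP drive as sold
partitioned = 100 * GB    # user chooses to leave 28 GB unused

built_in_op = over_provisioning_pct(physical, advertised)
effective_op = over_provisioning_pct(physical, partitioned)

print(f"OP as sold: {built_in_op:.2f}%")
print(f"OP with only 100 GB partitioned: {effective_op:.2f}%")
```

One caveat: the unused space only acts as OP if the SSD knows it holds no valid data, meaning those addresses were never written or have since been trimmed.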
A more detailed presentation of how OP works and what 0% OP really means was presented at the Flash Memory Summit 2012 and can be viewed with this link for your convenience: http://www.lsi.com/downloads/Public/Flash%20Storage%20Processors/LSI_PRS_FMS2012_TE21_Smith.pdf
It pays to be the cautious driver who fills the gas tank long before you get to empty. When it comes to both performance and endurance, your SSD will cover a lot more ground if you treat the over-provisioning space the same way – keeping more in reserve.
I’ve been travelling to China quite a bit over the last year or so. I’m sitting in Shenzhen right now (If you know Chinese internet companies, you’ll know who I’m visiting). The growth is staggering. I’ve had a bit of a trains, planes, automobiles experience this trip, and that’s exposed me to parts of China I never would have seen otherwise. Just to accommodate sheer population growth and the modest increase in wealth, there is construction everywhere – a press of people and energy, constant traffic jams, unending urban centers, and most everything is new. Very new. It must be exciting to be part of that explosive growth. What a market. I mean – come on – there are 1.3 billion potential users in China.
The amazing thing for me is the rapid growth of hyperscale datacenters in China, which is truly exponential. Their infrastructure growth has been 200%-300% CAGR for the past few years. It’s also fantastic walking into a building in China, say Baidu, and feeling very much at home – just like you walked into Facebook or Google. It’s the same young vibe, energy, and ambition to change how the world does things. And it’s also the same pleasure – talking to architects who are super-sharp, have few technical prejudices, and have very little vanity – just a will to get to business and solve problems. Polite, but blunt. We’re lucky that they recognize LSI as a leader, and are willing to spend time to listen to our ideas, and to give us theirs.
Even their infrastructure has a similar feel to the US hyperscale datacenters. The same only different. ;-)
A lot of these guys are growing revenue at 50% per year, several getting 50% gross margin. Those are nice numbers in any country. One has $100’s of billions in revenue. And they’re starting to push out of China. So far their pushes into Japan have not gone well, but other countries should be better. They all have unique business models. “We” in the US like to say things like “Alibaba is the Chinese eBay” or “Sina Weibo is the Chinese Twitter”…. But that’s not true – they all have more hybrid business models, unique, and so their datacenter goals, revenue and growth have a slightly different profile. And there are some very cool services that simply are not available elsewhere. (You listening Apple®, Google®, Twitter®, Facebook®?) But they are all expanding their services, products and user base. Interestingly, there is very little public cloud in China. So there are no real equivalents to Amazon’s services or Microsoft’s Azure. I have heard about current development of that kind of model with the government as initial customer. We’ll see how that goes.
100’s of thousands of servers. They’re not the scale of Google, but they sure are the scale of Facebook, Amazon, Microsoft…. It’s a serious market for an outfit like LSI. Really it’s a very similar scale now to the US market. Close to 1 million servers installed among the main 4 players, and exabytes of data (we’ve blown past mere petabytes). Interestingly, they still use many co-location facilities, but that will change. More important – they’re all planning to probably double their infrastructure in the next 1-2 years – they have to – their growth rates are crazy.
Often 5 or 6 distinct platforms, just like the US hyperscale datacenters. Database platforms, storage platforms, analytics platforms, archival platforms, web server platforms…. But they tend to be a little more like a rack of traditional servers that enterprise buys with integrated disk bays, still a lot of 1G Ethernet, and they are still mostly from established OEMs. In fact I just ran into one OEM’s American GM, who I happen to know, in Tencent’s offices today. The typical servers have 12 HDDs in drive bays, though they are starting to look at SSDs as part of the storage platform. They do use PCIe® flash cards in some platforms, but the performance requirements are not as extreme as you might imagine. Reasonably low latency and consistent latency are the premium they are looking for from these flash cards – not maximum IOPs or bandwidth – very similar to their American counterparts. I think hyperscale datacenters are sophisticated in understanding what they need from flash, and not requiring more than that. Enterprise could learn a thing or two.
Some server platforms have RAIDed HDDs, but most are direct map drives using a high availability (HA) layer across the server center – Hadoop® HDFS or self-developed Hadoop-like platforms. Some have also started to deploy microserver archival “bit buckets.” A small ARM® SoC with 4 HDDs totaling 12 TBytes of storage, giving densities like 72 TBytes of file storage in 2U of rack. While I can only find about 5,000 of those in China, all of them first generation experiments, it’s the first of a growing wave of archival solutions based on lower performance ARM servers. The feedback is clear – they’re not perfect yet, but the writing is on the wall. (If you’re wondering about the math, that’s 5,000 x 12 TBytes = 60 Petabytes….)
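For anyone who wants to check the microserver numbers, here is the arithmetic spelled out. The inputs are the figures quoted above; the per-drive capacity and the number of microservers per 2U are derived values, not something stated in the text.

```python
# Figures quoted in the text.
TB_PER_MICROSERVER = 12
HDDS_PER_MICROSERVER = 4
TB_PER_2U = 72
DEPLOYED_MICROSERVERS = 5_000

# Derived values.
tb_per_hdd = TB_PER_MICROSERVER / HDDS_PER_MICROSERVER
microservers_per_2u = TB_PER_2U // TB_PER_MICROSERVER
hdds_per_2u = microservers_per_2u * HDDS_PER_MICROSERVER
total_petabytes = DEPLOYED_MICROSERVERS * TB_PER_MICROSERVER / 1000  # decimal PB

print(f"{tb_per_hdd:.0f} TB per HDD, {microservers_per_2u} microservers "
      f"({hdds_per_2u} HDDs) per 2U, {total_petabytes:.0f} PB deployed")
```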
Yes, it’s important, but maybe more than we’re used to. It’s harder to get licenses for power in China. So it’s really important to stay within the envelope of power your datacenter has. You simply can’t get more. That means they have to deploy solutions that do more in the same power profile, especially as they move out of co-located datacenters into private ones. Annually, 50% more users supported, more storage capacity, more performance, more services, all in the same power. That’s not so easy. I would expect solar power in their future, just as Apple has done.
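To put a number on "not so easy", here is a rough sketch of what a fixed power envelope implies. It assumes the power budget stays completely flat and that 50% more users means 50% more work, which are both simplifications of mine rather than anything the datacenters told me.

```python
def required_efficiency_gain(annual_growth: float, years: int) -> float:
    """Factor by which work-per-watt must improve after `years` of compounding
    growth, assuming the power budget does not grow at all."""
    return (1.0 + annual_growth) ** years


# 50% annual growth is the figure quoted above; the 1-4 year horizon is illustrative.
for years in range(1, 5):
    gain = required_efficiency_gain(0.5, years)
    print(f"after {years} year(s): {gain:.2f}x work per watt")
```

Under those assumptions the same watts have to do more than three times the work after three years, which is why efficiency per watt dominates the platform decisions.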
Here’s where it gets interesting. They are developing a cousin to OpenCompute that’s called Scorpio. It’s Tencent, Alibaba, Baidu, and China Telecom so far driving the standard. The goals are similar to OpenCompute, but more aligned to standardized sub-systems that can be co-mingled from multiple vendors. There is some harmonization and coordination between OpenCompute and Scorpio, and in fact the Scorpio companies are members of OpenCompute. But where OpenCompute is trying to change the complete architecture of scale-out clusters, Scorpio is much more pragmatic – some would say less ambitious. They’ve finished version 1 and rolled out about 200 racks as a “test case” to learn from. Baidu was the guinea pig. That’s around 6,000 servers. They weren’t expecting more from version 1. They’re trying to learn. They’ve made mistakes, learned a lot, and are working on version 2.
Even if it’s not exciting, it will have an impact because of the sheer size of deployments these guys are getting ready to roll out in the next few years. They see the progression as 1) they were using standard equipment, 2) they’re experimenting and learning from trial runs of Scorpio versions 1 and 2, and then they’ll work on 3) new architectures that are efficient and powerful, and different.
Information is pretty sketchy if you are not one of the member companies or one of their direct vendors. We were just invited to join Scorpio by one of the founders, and would be the first group outside of China to do so. If that all works out, I’ll have a much better idea of the details, and hopefully can influence the standards to be better for these hyperscale datacenter applications. Between OpenCompute and Scorpio we’ll be seeing a major shift in the industry – a shift that will undoubtedly be disturbing to a lot of current players. It makes me nervous, even though I’m excited about it. One thing is sure – just as the server market volume is migrating from traditional enterprise to hyperscale datacenter (25-30% of the server market and growing quickly), we’re starting to see a migration to Chinese hyperscale datacenters from US-based ones. They have to grow just to stay still. I mean – come on – there are 1.3 billion potential users in China….
There’s no need to wait for higher speed. Server builders can take advantage of 12Gb/s SAS now. And this is even as HDD and SSD makers continue to tweak, tune and otherwise prepare their 12Gb/s SAS products for market. The next generation of 12Gb/s SAS without supporting drives? What gives?
It’s simple. LSI is already producing 12Gb/s ROC and IOC solutions, meaning that customers can take advantage of 12Gb/s SAS performance today with currently shipping systems and storage. As for the numbers, LSI 12Gb/s SAS enables performance increases of up to 45% in throughput and up to 58% in IOPS when compared to 6Gb/s SAS.
True, 12Gb/s SAS isn’t a Big Bang Disruption in storage systems; rather it’s an evolutionary change, but a big step forward. It may not be clear why it matters so much, so I want to briefly explain. In the latest-generation PCIe 3 systems, 6Gb/s SAS is the bottleneck that prevents systems from achieving full PCIe 3 throughput of 6,400 MB/s.
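A back-of-the-envelope sketch shows where the bottleneck comes from. It assumes an 8-lane SAS controller (the post does not state a lane count, so that is my assumption) and 8b/10b line encoding, and it ignores protocol overhead, so the results are upper bounds rather than measured throughput. The 6,400 MB/s target is the figure quoted above.

```python
PCIE3_TARGET_MB_S = 6400  # PCIe 3 throughput figure quoted in the post


def sas_lane_mb_per_s(line_rate_gbps: float) -> float:
    """Payload bandwidth of one SAS lane in MB/s, assuming 8b/10b encoding
    (10 bits on the wire carry 8 bits of data)."""
    payload_bits_per_s = line_rate_gbps * 1e9 * 8 / 10
    return payload_bits_per_s / 8 / 1e6  # bits -> bytes -> MB


def controller_mb_per_s(line_rate_gbps: float, lanes: int = 8) -> float:
    """Aggregate bandwidth for a controller; the 8-lane default is an assumption."""
    return sas_lane_mb_per_s(line_rate_gbps) * lanes


for rate in (6, 12):
    total = controller_mb_per_s(rate)
    verdict = "covers" if total >= PCIE3_TARGET_MB_S else "falls short of"
    print(f"{rate}Gb/s SAS x8: {total:.0f} MB/s, {verdict} {PCIE3_TARGET_MB_S} MB/s")
```

With those assumptions, eight 6Gb/s lanes top out below the PCIe 3 figure, while eight 12Gb/s lanes leave headroom above it.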
With 12Gb/s SAS, customers will be able to take full advantage of the performance of PCIe 3 systems. Earlier this month at the CeBIT computer expo in Hanover, Germany, we announced that we are the first to ship production-level 12Gb/s SAS ROC (RAID on Chip) and IOC (I/O Controller) solutions to OEM customers. This convergence of new technologies and the expansion of existing capabilities create significant improvements for datacenters of all kinds.
At CeBIT, we demonstrated our 12Gb/s SAS solutions with the unique DataBolt™ feature and how, with DataBolt, systems with 6Gb/s SAS HDDs can achieve 12Gb/s SAS performance.
DataBolt uses bandwidth aggregation to create throughput performance acceleration. Most importantly, customers don’t have to wait for the next inflection in drive design to get the highest possible performance and connectivity.
Most people fully understand that electronics are useless without power, but what happens when devices lose power in the middle of operating? That answer is highly dependent on a number of variables including what type of electronic device is in question.
For solid state drives (SSDs) the answer depends on factors such as whether an uninterruptible power supply (UPS) is connected, what controller or flash processor is used, the design of the power circuit of the SSD, and the type of memory. If an SSD is in the middle of a write operation to the flash memory and power to the SSD is disconnected, many bad things could happen if the right safety measures are not in place. Many users do not think about all the non-user-initiated operations the SSD may be performing, like background garbage collection, that could be under way when the power fails. Without the correct protection, in most cases data will be corrupted.
According to the Nielsen company, 108.4 million viewers were tuned into the 2013 Super Bowl in New Orleans, only to be shocked to witness the power go down for 34 minutes in the middle of the game. If power can be lost during an event as high-profile as this, it can happen just about anywhere.
Inside the New Orleans Superdome stadium operations and broadcast server rooms
Enterprise computing environments typically have multiple servers with data connections and lots of storage. Over the past few years, a larger percentage of the storage is kept on SSDs for the very active or “hot” data. This greatly improves data access time and reduces overall latency of the system. Enterprise servers are often connected to UPS systems that supply the connected devices with temporary power during a power failure.
Usually this is enough power to support uninterrupted system operations until power is restored, or at least until technicians can complete their current work and shut down safely. However, there are many times when UPS systems are not deployed or fail to operate properly themselves. In those cases, the server will experience a power failure as abrupt as if someone had yanked the power cord from the wall socket.
The LSI® SandForce® flash controllers are at the heart of many popular SSDs sold today. The flash controller connects the host computer with the flash memory to store user data in fast non-volatile memory. The SandForce flash controllers are specifically engineered to operate in different environments, and the SF-2500/2600 FSPs (flash storage processors) are designed to provide the high level of data integrity required for enterprise applications. In the area of power failure protection, they include a combination of firmware (FW) and hardware circuitry that monitors the power coming into the SSD. In the event of a power failure or even a brown-out, the flash controller is alerted to the situation, and hold-up capacitors in the SSD provide the necessary power and time for the controller to complete pending writes to the flash memory. This same circuit is also designed to prevent the risk of lower page corruption with multi-level cell (MLC) memory.
Watch out for SSD solutions that provide backup capacitors, but lack the necessary support circuitry and special firmware to ensure the data is fully committed to the flash memory before the power runs out. Even if these other special circuits are present, only truly enterprise SSDs that are meticulously designed and tested to withstand power failures are up to the task of storing and protecting highly critical data.
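To see why capacitors alone are not enough, it helps to sketch the energy budget. The function below uses the standard capacitor energy formula, E = ½C(V_start² − V_min²), to estimate how long a hold-up capacitor can carry a load. Every number in the example is hypothetical and chosen only for illustration. None of it describes the actual SandForce circuit or firmware.

```python
def holdup_time_ms(capacitance_f: float, v_start: float, v_min: float, load_w: float) -> float:
    """Time in ms a capacitor can power `load_w` while discharging from
    v_start down to v_min, the lowest voltage the circuit can still use."""
    usable_energy_j = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return usable_energy_j / load_w * 1000


def can_flush(pending_bytes: int, write_mb_per_s: float, holdup_ms: float) -> bool:
    """True if the pending writes can reach flash before the power runs out."""
    flush_ms = pending_bytes / (write_mb_per_s * 1e6) * 1000
    return flush_ms <= holdup_ms


# Hypothetical values: 1,000 uF of hold-up capacitance, 5 V rail usable
# down to 3 V, 2 W load, 100 MB/s sustained flash write rate.
budget_ms = holdup_time_ms(0.001, 5.0, 3.0, 2.0)
print(f"hold-up budget: {budget_ms:.1f} ms")
print("256 KiB pending:", can_flush(256 * 1024, 100.0, budget_ms))
print("1 MiB pending:", can_flush(1024 * 1024, 100.0, budget_ms))
```

With these made-up numbers the budget is only a few milliseconds, so the firmware has to know exactly how much data is in flight and keep it within what the capacitors can cover. That is the support the paragraph above warns may be missing.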
In the control room and down on the field
The usage patterns of non-enterprise systems like notebooks and ultrabooks call for a different power failure support mechanism. Realize that when you have a notebook or ultrabook system, you have a built-in mini-UPS system. A power outage from the wall socket has no impact on the system until the battery gets low. At that point the operating system will tell the computer to shut down, and that will be ample warning for the SSD to safely shut down and ensure the integrity of the data. But what if the operating system locks up and does not warn the SSD, or the system is an AC-powered desktop without a battery?
The LSI SF-2100/2200 FSPs are purpose-built for these client environments and operate with the assumption that power could disappear at any point in time. They use special FW techniques so that even without a battery present, as is the case with desktop systems, they greatly limit the potential for data loss.
The naked facts
It should be clear that the answer to the original question is highly dependent upon the flash controller at the heart of the SSD. Without the critical features discussed above, which are designed into the LSI SandForce flash controllers, it is very possible to lose data during a power failure. The LSI SandForce flash controllers are engineered to withstand power failures like the one that hit New Orleans at the Super Bowl, but don’t expect them to fix wardrobe malfunctions.
Merging the working cultures of two different companies can be a very complex task. In my past experience with these situations I have not typically found the result to be highly positive for the employees of the incoming company.
As one example, the acquiring company may tell the employees they will be able to keep their startup environment and mentality, but within one or two quarters nearly all of those attributes are eliminated and the employees become just more cogs in the bigger machine. After that, the drive and creativity that the startup mentality fostered often disappear.
Year two and still happily married
In the first week of 2012, LSI completed the acquisition of SandForce, further expanding its growing coverage of flash memory technology IP. SandForce had grown significantly since its emergence from stealth mode in 2009 to become a leading provider of flash controllers for enterprise, cloud and client solid state storage solutions.
The SandForce team was kept whole and became the Flash Components Division of LSI. The team even continues to reside in the same building it had occupied the prior year, further supporting the internal feeling of its original startup mentality. Most of the original culture of SandForce was kept intact, and that made the transition into the larger LSI Corporation much easier for most people. Because the changes that did come were spread out over a longer period of time, they were easier for the team to digest with minimal disruption to their daily flow.
For the business side of the acquisition, there has been significant upside as a result of the merger. Over the last 14 months, both companies have invested significant time and resources to leverage the SandForce® flash controller technology across the company. LSI had already designed a high-end enterprise solution using the SandForce technology in its PCIe-based Nytro™ product line. With both companies now under one umbrella, the engineering teams are free to develop advanced capabilities between the products to enable deeper integration that will result in greater customer benefits.
Greater efficiency … for the sake of customers
As a single larger company, LSI is now able to redistribute engineering and support resources as needed to better align with the quick expansion of flash memory storage solutions for its customers. It is also much easier to ensure a higher level of interoperability between related products and solutions. Many of the customers already purchasing LSI products can use the same sales and support teams already in place to access and incorporate a larger set of solutions from LSI.
Enterprise storage manufacturers have millions if not billions of dollars of revenue and their reputation at stake when they select new and emerging technologies like flash memory to provide storage for their customers. There is always a level of concern when these companies work with smaller startup organizations. The high-technology industry is full of companies that had great technology but, for one reason or another, could not sustain the business and fell into oblivion. The acquisition of SandForce by LSI adds the support and confidence of a multi-billion dollar company to help assuage any possible concerns of those enterprise manufacturers.
Personally, as one of the early SandForce employees, I have found the acquisition to be very beneficial to everyone involved including employees, customers, and end users. I look forward to further advancements we will make to the flash industry as we continue to drive flash memory into the storage industry.