I want to warn you, there is some thick background information here first. But don’t worry. I’ll get to the meat of the topic and that’s this: Ultimately, I think that PCIe® cards will evolve to more external, rack-level, pooled flash solutions, without sacrificing all their great attributes today. This is just my opinion, but other leaders in flash are going down this path too…
I’ve been working on enterprise flash storage since 2007 – mulling over how to make it work. Endurance, capacity, cost, performance have all been concerns that have been grappled with. Of course the flash is changing too as the nodes change: 60nm, 50nm, 35nm, 24nm, 20nm… and single level cell (SLC) to multi level cell (MLC) to triple level cell (TLC) and all the variants of these “trimmed” for specific use cases. The spec “endurance” has gone from 1 million program/erase cycles (PE) to 3,000, and in some cases 500.
It’s worth pointing out that almost all the “magic” that has been developed around flash was already scoped out in 2007. It just takes a while for a whole new industry to mature. Individual die capacity increased, meaning fewer die are needed for a solution – and that means less parallel bandwidth for data transfer… And the “requirement” for state-of-the-art single operation write latency has fallen well below the write latency of the flash itself. (What the ?? Yea – talk about that later in some other blog. But flash is ~1500uS write latency, where state of the art flash cards are ~50uS.) When I describe the state of technology it sounds pretty pessimistic. I’m not. We’ve overcome a lot.
We built our first PCIe card solution at LSI in 2009. It wasn’t perfect, but it was better than anything else out there in many ways. We’ve learned a lot in the years since – both from making them, and from dealing with customer and users – about our own solutions and our competitors. We’re lucky to be an important player in storage, so in general the big OEMs, large enterprises and the hyperscale datacenters all want to talk with us – not just about what we have or can sell, but what we could have and what we could do. They’re generous enough to share what works and what doesn’t. What the values of solutions are and what the pitfalls are too. Honestly? It’s the hyperscale datacenters in the lead both practically and in vision.
If you haven’t nodded off to sleep yet, that’s a long-winded way of saying – things have changed fast, and, boy, we’ve learned a lot in just a few years.
Most important thing we’ve learned…
Most importantly, we’ve learned it’s latency that matters. No one is pushing the IOPs limits of flash, and no one is pushing the bandwidth limits of flash. But they sure are pushing the latency limits.
PCIe cards are great, but…
We’ve gotten lots of feedback, and one of the biggest things we’ve learned is – PCIe flash cards are awesome. They radically change performance profiles of most applications, especially databases allowing servers to run efficiently and actual work done by that server to multiply 4x to 10x (and in a few extreme cases 100x). So the feedback we get from large users is “PCIe cards are fantastic. We’re so thankful they came along. But…” There’s always a “but,” right??
It tends to be a pretty long list of frustrations, and they differ depending on the type of datacenter using them. We’re not the only ones hearing it. To be clear, none of these are stopping people from deploying PCIe flash… the attraction is just too compelling. But the problems are real, and they have real implications, and the market is asking for real solutions.
Of course, everyone wants these fixed without affecting single operation latency, or increasing cost, etc. That’s what we’re here for though – right? Solve the impossible?
A quick summary is in order. It’s not looking good. For a given solution, flash is getting less reliable, there is less bandwidth available at capacity because there are fewer die, we’re driving latency way below the actual write latency of flash, and we’re not satisfied with the best solutions we have for all the reasons above.
If you think these through enough, you start to consider one basic path. It also turns out we’re not the only ones realizing this. Where will PCIe flash solutions evolve over the next 2, 3, 4 years? The basic goals are:
One easy answer would be – that’s a flash SAN or NAS. But that’s not the answer. Not many customers want a flash SAN or NAS – not for their new infrastructure, but more importantly, all the data is at the wrong end of the straw. The poor server is left sucking hard. Remember – this is flash, and people use flash for latency. Today these SAN type of flash devices have 4x-10x worse latency than PCIe cards. Ouch. You have to suck the data through a relatively low bandwidth interconnect, after passing through both the storage and network stacks. And there is interaction between the I/O threads of various servers and applications – you have to wait in line for that resource. It’s true there is a lot of startup energy in this space. It seems to make sense if you’re a startup, because SAN/NAS is what people use today, and there’s lots of money spent in that market today. However, it’s not what the market is asking for.
Another easy answer is NVMe SSDs. Right? Everyone wants them – right? Well, OEMs at least. Front bay PCIe SSDs (HDD form factor or NVMe – lots of names) that crowd out your disk drive bays. But they don’t fix the problems. The extra mechanicals and form factor are more expensive, and just make replacing the cards every 5 years a few minutes faster. Wow. With NVME SSDs, you can fit fewer HDDs – not good. They also provide uniformly bad cooling, and hard limit power to 9W or 25W per device. But to protect the storage in these devices, you need to have enough of them that you can RAID or otherwise protect. Once you have enough of those for protection, they give you awesome capacity, IOPs and bandwidth, too much in fact, but that’s not what applications need – they need low latency for the working set of data.
What do I think the PCIe replacement solutions in the near future will look like? You need to pool the flash across servers (to optimize bandwidth and resource usage, and allocate appropriate capacity). You need to protect against failures/errors and limit the span of failure, commit writes at very low latency (lower than native flash) and maintain low latency, bottleneck-free physical links to each server… To me that implies:
That means the performance looks exactly as if each server had multiple PCIe cards. But the capacity and bandwidth resources are shared, and systems can remain resilient. So ultimately, I think that PCIe cards will evolve to more external, rack level, pooled flash solutions, without sacrificing all their great attributes today. This is just my opinion, but as I say – other leaders in flash are going down this path too…
What’s your opinion?
Tags: DAS, datacenter, direct attached storage, enterprise IT, flash, hard disk drive, HDD, hyperscale, latency, NAS, network attached storage, NVMe, PCIe, SAN, solid state drive, SSD, storage area network