The lifeblood of any online retailer is the speed of its IT infrastructure. Shoppers aren’t infinitely patient. Sluggish infrastructure performance can make shoppers wait precious seconds longer than they can stand, sending them fleeing to other sites for a faster purchase. Our federal government’s halting rollout of the Health Insurance Marketplace website is a glaring example of what can happen when IT infrastructure isn’t solid. A few bad user experiences that go viral can be damaging enough. Tens of thousands can be crippling.
In hyperscale datacenters, any number of problems, including network issues, insufficient scaling and inconsistent management, can undermine end users’ experience. But one that hits home for me is the impact of slow storage on the performance of databases, where the data sits. With the database at the heart of all those online transactions, retailers can ill afford to have their tier of database servers operating at anything less than peak performance.
Slow storage undermines database performance
Typically, Web 2.0 and e-commerce companies run relational databases (RDBs) on these massive server-centric infrastructures. (Take a look at my blog last week to get a feel for the size of these hyperscale datacenter infrastructures.) If you are running that many servers to support millions of users, you are likely using an open-source RDB such as MySQL or one of its variants. Keep in mind that Oracle 11gR2 likely retails around $30K per core but MySQL is free. But the performance of both, and of most other relational databases, suffers immensely when transactions are retrieving data from storage (or disk). You can only throw so much RAM and CPU power at the performance problem … sooner rather than later you have to deal with slow storage.
Almost everyone in the industry – Web 2.0, cloud, hyperscale and other providers of massive database infrastructures – is lining up to solve this problem the best way they can. How? By deploying flash as the sole storage for database servers and applications. But is low-latency flash enough? For sheer performance it beats rotational disk hands down. But … even flash storage has its limitations, most notably when you are trying to drive ultra-low latencies for write IOs. Most IO accesses by RDBs, which do the transactional processing, are a mix of reads and writes to the storage. Specifically, the mix is 70% reads to 30% writes. These are also typically low queue-depth (q-depth) accesses (less than 4). It is those writes that can really slow things down.
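To see why a 30% share of writes can dominate, here is a minimal sketch of the arithmetic. The 70/30 mix comes from the text above; the read and write latencies are hypothetical values picked only for illustration, not measurements of any device.

```python
def mean_io_latency_us(read_us, write_us, read_fraction=0.70):
    """Average latency of one IO for a given read/write mix."""
    return read_fraction * read_us + (1.0 - read_fraction) * write_us


def write_share_of_io_time(read_us, write_us, read_fraction=0.70):
    """Fraction of total IO time that is spent waiting on writes."""
    write_time = (1.0 - read_fraction) * write_us
    return write_time / mean_io_latency_us(read_us, write_us, read_fraction)


# Hypothetical latencies (assumptions, not vendor figures).
READ_US = 100.0
WRITE_US = 1000.0

if __name__ == "__main__":
    avg = mean_io_latency_us(READ_US, WRITE_US)
    share = write_share_of_io_time(READ_US, WRITE_US)
    print(f"average IO latency: {avg:.0f} us")
    print(f"share of IO time spent on writes: {share:.0%}")
```

With these assumed numbers, writes are only 30% of the IOs but account for about 81% of the time spent waiting on storage, which is why shaving write latency pays off so quickly.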
PCIe flash reduces write latencies
The good news is that the right PCIe flash technology in the mix can solve the slowdowns. Some interesting PCIe flash technologies designed to tackle this latency problem are on display at AIS this week. DRAM and in particular NVDRAM are being deployed as a tier in front of flash to really tackle those nasty write latencies.
Among other demos, we’re showing how a Nytro™ 6000 series PCIe flash card helps solve MySQL database performance issues. The typical response time for a small data read (this is what the database will see for a database IO) from an HDD is 5ms. Flash-based devices such as the Nytro WarpDrive® card can complete the same read in less than 50μs on average during testing, an improvement of at least two orders of magnitude in response time. This response time translates to getting many more transactions out of the same infrastructure – but with less space (flash is denser) and a lot less power (flash consumes far less power than HDDs).
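As a rough sketch of what those response times mean for throughput, the snippet below applies Little's Law (throughput = outstanding IOs / latency) to the 5ms HDD read and the 50μs flash read quoted above. The queue depth of 2 is an assumption, chosen to match the low q-depth access pattern described earlier; real devices will not follow this idealized model exactly.

```python
def speedup(slow_latency_s, fast_latency_s):
    """How many times faster the low-latency device responds."""
    return slow_latency_s / fast_latency_s


def iops_at_queue_depth(latency_s, queue_depth):
    """Little's Law: sustained IOs per second = outstanding IOs / latency."""
    return queue_depth / latency_s


HDD_READ_S = 5e-3      # 5 ms, from the text
FLASH_READ_S = 50e-6   # 50 us upper bound, from the text
QUEUE_DEPTH = 2        # assumption: a low q-depth database workload

if __name__ == "__main__":
    print(f"speedup: {speedup(HDD_READ_S, FLASH_READ_S):.0f}x")
    print(f"HDD IOPS:   {iops_at_queue_depth(HDD_READ_S, QUEUE_DEPTH):,.0f}")
    print(f"flash IOPS: {iops_at_queue_depth(FLASH_READ_S, QUEUE_DEPTH):,.0f}")
```

At a fixed low queue depth, the only way to get more IOs per second is lower latency, so the 100x latency gap carries straight through to the transaction rate in this model.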
We’re also showing the Nytro 7000 series PCIe flash cards. They reach even lower write latencies than the 6000 series at very low q-depths. The 7000 series cards also provide DRAM buffering while maintaining data integrity even in the event of a power loss.
For online retailers and other businesses, higher database speeds mean more than just faster transactions. They can help keep those cash registers ringing.
Tags: AIS, database, DRAM, e-commerce, flash, flash memory, hard disk drive, HDD, hyperscale datacenter, latency, MySQL, NVDRAM, Nytro 6000, Nytro 7000, Nytro WarpDrive, Oracle, PCIe flash, relational database, storage latency, web 2.0, write latency
The United Nations finding that mobile broadband subscriptions are surging in developing countries, reported by The New York Times on Sept. 26, is no surprise. Equally unsurprising, the growing number of users, density of users and increasing bandwidth needs of applications likely are continuing to strain existing wireless networks and per-user bandwidths not only in developing countries but worldwide.
But rising pressure on bandwidth, coupled with increasingly data-intensive applications, isn’t the whole story. Minimizing end-to-end latency – from user to network base station and back again – is crucial in enabling banking, e-commerce, enterprise and other important business applications. Why? The greater the latency, the more sluggish a website feels and the more likely visitors are to lose interest. A connection may have plenty of throughput over a period of time, but response time determines the user experience.
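A simplified model makes the distinction between throughput and response time concrete. It treats a page load as a number of sequential round trips plus the time to transfer the payload. Every number below (round trips, payload size, link speeds, round-trip times) is a hypothetical value for illustration, and real page loads overlap requests in ways this sketch ignores.

```python
def response_time_s(round_trips, rtt_s, payload_bytes, bandwidth_bps):
    """Idealized response time: latency cost plus transfer cost."""
    latency_cost = round_trips * rtt_s
    transfer_cost = payload_bytes * 8 / bandwidth_bps
    return latency_cost + transfer_cost


# Hypothetical page: 20 sequential round trips, 500 KB of data.
ROUND_TRIPS = 20
PAYLOAD_BYTES = 500_000

if __name__ == "__main__":
    baseline = response_time_s(ROUND_TRIPS, 0.200, PAYLOAD_BYTES, 10e6)
    more_bandwidth = response_time_s(ROUND_TRIPS, 0.200, PAYLOAD_BYTES, 100e6)
    less_latency = response_time_s(ROUND_TRIPS, 0.050, PAYLOAD_BYTES, 10e6)
    print(f"baseline (10 Mbit/s, 200 ms RTT): {baseline:.2f} s")
    print(f"10x the bandwidth:                {more_bandwidth:.2f} s")
    print(f"one quarter the latency:          {less_latency:.2f} s")
```

Under these assumptions, multiplying bandwidth by ten barely moves the response time, while cutting round-trip latency to a quarter cuts it by roughly two thirds.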
The bandwidth-per-user and end-to-end network latency constraints are bound to drive changes both to the front haul and backhaul access networks. LTE and WiFi seem to be clear winners for the front haul network (replacing wired LAN technologies). On the backhaul, given the capacity needs, wired and wireless networks are bound to converge but will likely offer many options that will continue to co-exist like LTE, Fiber, Cable, xDSL and Microwave.
For our part, LSI has deep experience building mission-critical networks for service providers and datacenters – an expertise that has been brought to bear on the development of LSI® Axxia® networking solutions. These smart chips help solve the latency problem by enabling reliable, deterministic network performance to, ultimately, quicken response times and improve the user experience.
And that, after all, is just what network providers and users are after as mobile devices continue to support more applications and rising performance expectations worldwide.
I want to warn you, there is some thick background information here first. But don’t worry. I’ll get to the meat of the topic and that’s this: Ultimately, I think that PCIe® cards will evolve to more external, rack-level, pooled flash solutions, without sacrificing all their great attributes today. This is just my opinion, but other leaders in flash are going down this path too…
I’ve been working on enterprise flash storage since 2007 – mulling over how to make it work. Endurance, capacity, cost and performance have all been concerns to grapple with. Of course the flash is changing too as the nodes change: 60nm, 50nm, 35nm, 24nm, 20nm… and single level cell (SLC) to multi level cell (MLC) to triple level cell (TLC) and all the variants of these “trimmed” for specific use cases. The spec “endurance” has gone from 1 million program/erase cycles (PE) to 3,000, and in some cases 500.
It’s worth pointing out that almost all the “magic” that has been developed around flash was already scoped out in 2007. It just takes a while for a whole new industry to mature. Individual die capacity increased, meaning fewer die are needed for a solution – and that means less parallel bandwidth for data transfer… And the “requirement” for state-of-the-art single operation write latency has fallen well below the write latency of the flash itself. (What the ?? Yeah – I’ll talk about that later in some other blog. But flash is ~1,500μs write latency, where state-of-the-art flash cards are ~50μs.) When I describe the state of technology it sounds pretty pessimistic. I’m not. We’ve overcome a lot.
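To put the gap between those two numbers in perspective, here is a back-of-the-envelope sketch. It takes the ~1,500μs flash write latency and ~50μs card-level write latency quoted above and asks how many flash program operations would have to be in flight at once for the flash to keep up with back-to-back writes acknowledged at card speed. This is an idealized model of my own for illustration; it ignores page sizes, buffering policy, garbage collection and everything else a real controller does.

```python
import math


def parallel_programs_needed(flash_write_us, ack_latency_us):
    """Overlapping flash writes needed to sustain one acknowledged write
    every ack_latency_us when each flash write takes flash_write_us."""
    return math.ceil(flash_write_us / ack_latency_us)


FLASH_WRITE_US = 1500  # native flash write latency, from the text
CARD_WRITE_US = 50     # state-of-the-art card write latency, from the text

if __name__ == "__main__":
    needed = parallel_programs_needed(FLASH_WRITE_US, CARD_WRITE_US)
    print(f"flash writes that must overlap: {needed}")
```

In this model the flash has to overlap on the order of 30 writes to sustain that pace, which is why having fewer die (and so less parallelism) for a given capacity makes the latency target harder to hit.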
We built our first PCIe card solution at LSI in 2009. It wasn’t perfect, but it was better than anything else out there in many ways. We’ve learned a lot in the years since – both from making them, and from dealing with customers and users – both of our own solutions and our competitors’. We’re lucky to be an important player in storage, so in general the big OEMs, large enterprises and the hyperscale datacenters all want to talk with us – not just about what we have or can sell, but what we could have and what we could do. They’re generous enough to share what works and what doesn’t, what the values of solutions are and what the pitfalls are too. Honestly? It’s the hyperscale datacenters in the lead both practically and in vision.
If you haven’t nodded off to sleep yet, that’s a long-winded way of saying – things have changed fast, and, boy, we’ve learned a lot in just a few years.
Most important thing we’ve learned…
Most importantly, we’ve learned it’s latency that matters. No one is pushing the IOPS limits of flash, and no one is pushing the bandwidth limits of flash. But they sure are pushing the latency limits.
PCIe cards are great, but…
We’ve gotten lots of feedback, and one of the biggest things we’ve learned is – PCIe flash cards are awesome. They radically change the performance profiles of most applications, especially databases, allowing servers to run efficiently and the actual work done by each server to multiply 4x to 10x (and in a few extreme cases 100x). So the feedback we get from large users is “PCIe cards are fantastic. We’re so thankful they came along. But…” There’s always a “but,” right??
It tends to be a pretty long list of frustrations, and they differ depending on the type of datacenter using them. We’re not the only ones hearing it. To be clear, none of these are stopping people from deploying PCIe flash… the attraction is just too compelling. But the problems are real, and they have real implications, and the market is asking for real solutions.
Of course, everyone wants these fixed without affecting single operation latency, or increasing cost, etc. That’s what we’re here for though – right? Solve the impossible?
A quick summary is in order. It’s not looking good. For a given solution, flash is getting less reliable, there is less bandwidth available at capacity because there are fewer die, we’re driving latency way below the actual write latency of flash, and we’re not satisfied with the best solutions we have for all the reasons above.
If you think these through enough, you start to consider one basic path. It also turns out we’re not the only ones realizing this. Where will PCIe flash solutions evolve over the next 2, 3, 4 years? The basic goals are:
One easy answer would be – that’s a flash SAN or NAS. But that’s not the answer. Not many customers want a flash SAN or NAS – not for their new infrastructure, but more importantly, all the data is at the wrong end of the straw. The poor server is left sucking hard. Remember – this is flash, and people use flash for latency. Today these SAN-type flash devices have 4x-10x worse latency than PCIe cards. Ouch. You have to suck the data through a relatively low-bandwidth interconnect, after passing through both the storage and network stacks. And there is interaction between the I/O threads of various servers and applications – you have to wait in line for that resource. It’s true there is a lot of startup energy in this space. It seems to make sense if you’re a startup, because SAN/NAS is what people use today, and there’s lots of money spent in that market today. However, it’s not what the market is asking for.
Another easy answer is NVMe SSDs. Right? Everyone wants them – right? Well, OEMs at least. Front bay PCIe SSDs (HDD form factor or NVMe – lots of names) that crowd out your disk drive bays. But they don’t fix the problems. The extra mechanicals and form factor are more expensive, and just make replacing the cards every 5 years a few minutes faster. Wow. With NVMe SSDs, you can fit fewer HDDs – not good. They also provide uniformly bad cooling, and hard-limit power to 9W or 25W per device. But to protect the storage in these devices, you need to have enough of them that you can RAID or otherwise protect. Once you have enough of those for protection, they give you awesome capacity, IOPS and bandwidth, too much in fact, but that’s not what applications need – they need low latency for the working set of data.
What do I think the PCIe replacement solutions in the near future will look like? You need to pool the flash across servers (to optimize bandwidth and resource usage, and allocate appropriate capacity). You need to protect against failures/errors and limit the span of failure, commit writes at very low latency (lower than native flash) and maintain low latency, bottleneck-free physical links to each server… To me that implies:
That means the performance looks exactly as if each server had multiple PCIe cards. But the capacity and bandwidth resources are shared, and systems can remain resilient. So ultimately, I think that PCIe cards will evolve to more external, rack-level, pooled flash solutions, without sacrificing all their great attributes today. This is just my opinion, but as I say – other leaders in flash are going down this path too…
What’s your opinion?
Tags: DAS, datacenter, direct attached storage, enterprise IT, flash, hard disk drive, HDD, hyperscale, latency, NAS, network attached storage, NVMe, PCIe, SAN, solid state drive, SSD, storage area network