Mad Scientist in his Flash Analytics Lab

For many years LSI has known the importance of truly understanding the complexities of interfacing with NAND flash memory to optimize its performance and lifetime. For that reason, LSI created a group focused on characterizing NAND flash behavior as it interfaces with LSI flash controllers. I recently spoke to LSI's expert in this area, Bill Hunt, Engineering Director of Flash Analytics at LSI, to better understand what his group produces for LSI and how that translates into better solutions for our customers.

Q: Is all NAND flash created equal?
Bill: Definitely not. NAND flash specs, performance and ratings not only vary from vendor to vendor, they also vary between process geometries, between models within the same NAND family, and over the production life – especially during early production ramp. Also, NAND vendors intentionally create unequal models of the same part to address their different markets, like client and enterprise. Understanding the difference between NAND types is critical to building a robust solution.

Q: How does NAND vary from vendor to vendor?
Bill: There are really two levels of difference among NAND vendors: differences that result from different architectures, and differences that remain when vendors share an architecture. NAND vendors with completely different designs and fab processes have many differences in their NAND specifications, including pin-outs, power requirements, block and page layouts, addressing schemes, timing specifications, commands, and read recovery procedures. I could go on.

Some NAND vendors have common designs and fab processes. But even these devices can have significant operational differences for each vendor. Each device can have unique features enabled with different device trims (editor's note: manufacturing settings), commands, diagnostics, and read recovery steps. Even using standard interfaces like ONFI and Toggle doesn't guarantee common operations. Each vendor has its own interpretation and implementation of these standards.

Q: How does NAND vary from generation to generation?
Bill: Shrinking the process geometry requires a new device architecture. The new architecture drives changes to the operation and specification of a NAND device. The greatest changes are driven by NAND capacity increases. For example, the size and layout of the planes, blocks, and pages have to be modified to deal with the new architecture and increased capacity. Since the NAND cells are smaller and closer together, the error handling capability also has to be increased. The error-correcting code (ECC) requirements and resulting spare areas increase. The NAND also has to change to deal with increased bad block rates. The data rate and performance of each generation must also improve to keep up with what users are asking for. This drives changes to the interface timing specifications and adds new feature sets. In general, NAND endurance gets worse with shrinking geometries, so it is critical to understand the changes each new generation brings in order to develop more powerful and effective ECC algorithms.
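To illustrate why a rising raw bit error rate (RBER) pushes ECC requirements up, consider a codeword of n bits protected by a code that can correct up to t bit errors. The sketch below assumes bit errors are independent with probability p, a simplification since real NAND errors cluster and depend on retention and read disturb. Under that assumption, a codeword is uncorrectable when more than t bits flip, which is a binomial tail probability. The function name `codeword_failure_probability` and all parameter values are hypothetical and are not LSI's ECC design.

```python
import math


def codeword_failure_probability(n_bits: int, t: int, rber: float) -> float:
    """Probability that more than t of n_bits are in error, assuming each bit
    flips independently with probability rber (a simplifying model)."""
    if not 0.0 <= rber <= 1.0:
        raise ValueError("rber must be in [0, 1]")
    if t >= n_bits or rber == 0.0:
        return 0.0  # the code can correct every possible error pattern
    if rber == 1.0:
        return 1.0  # every bit is wrong, so more than t bits fail
    log_p = math.log(rber)
    log_q = math.log1p(-rber)
    total = 0.0
    # Sum the binomial tail P(k > t) in log space to avoid overflow in C(n, k).
    for k in range(t + 1, n_bits + 1):
        log_term = (math.lgamma(n_bits + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_bits - k + 1)
                    + k * log_p + (n_bits - k) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)
```

For example, comparing `codeword_failure_probability(9216, 40, 2e-3)` with `codeword_failure_probability(9216, 60, 2e-3)` for a hypothetical 9216-bit codeword shows how much a larger correction capability t (and therefore more parity in the spare area) lowers the uncorrectable rate at the same RBER.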

Q: Does LSI have any dedicated facility to evaluate NAND from different suppliers?
Bill: Yes, LSI’s Flash Analytics lab is dedicated to evaluating and characterizing NAND flash that will be used with LSI flash controllers.

Graph showing results from testing NAND flash memory
Q: What kinds of testing does LSI do in the Flash Analytics lab?
Bill: The Flash Analytics lab has two main functions. First, we integrate NAND devices into solid-state drives (SSDs) with LSI SandForce controllers to ensure they work well together. Second, we characterize NAND devices to see how the NAND flash performs and operates over the lifetime of the device. We do this in various operational modes. It is critical to understand the behavior of the raw NAND to design and develop solutions with the reliability and performance demanded by the market.

Q: Does LSI test flash memory beyond their rated lifetimes?
Bill: Yes. NAND vendors do not always share their own characterization results beyond their rated endurance limit, so we gather that data ourselves. Typically we run program-erase cycles on devices until the raw bit error rate becomes very poor or a catastrophic error occurs. We also test beyond other specified limits, such as retention and read disturb limits. Understanding what happens to flash as it ages gives us valuable information on how devices might fail in real-world scenarios.
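A cycling test of the kind Bill describes can be sketched as a loop that erases and programs a block, periodically reads it back, and measures RBER as flipped bits divided by bits read. This is a minimal sketch under loud assumptions: `ToyBlock` is a hypothetical simulated block whose error probability simply grows linearly with wear, and `cycle_until_worn`, its thresholds, and its checkpoint interval are illustrative choices, not LSI's lab procedure or any real tester API.

```python
import random


def count_bit_errors(written: bytes, read: bytes) -> int:
    """Count bits that differ between the data written and the data read back."""
    if len(written) != len(read):
        raise ValueError("buffers must be the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(written, read))


class ToyBlock:
    """Hypothetical stand-in for one NAND block. Its bit-flip probability grows
    linearly with program-erase cycles; this is NOT a real device model."""

    def __init__(self, base_rber=1e-6, wear_per_cycle=1e-7, seed=0):
        self.cycles = 0
        self.base_rber = base_rber
        self.wear_per_cycle = wear_per_cycle
        self._rng = random.Random(seed)
        self._data = b""

    def erase(self) -> None:
        self.cycles += 1  # count one program-erase cycle per erase
        self._data = b""

    def program(self, data: bytes) -> None:
        self._data = data

    def read(self) -> bytes:
        # Flip each bit independently with the current wear-dependent probability.
        p = min(1.0, self.base_rber + self.wear_per_cycle * self.cycles)
        out = bytearray(self._data)
        for i in range(len(out)):
            for bit in range(8):
                if self._rng.random() < p:
                    out[i] ^= 1 << bit
        return bytes(out)


def cycle_until_worn(block, page_size=4096, rber_limit=1e-3,
                     max_cycles=100_000, checkpoint=1_000, seed=1):
    """Erase/program repeatedly, measuring RBER every `checkpoint` cycles.
    Stops when RBER reaches `rber_limit` or `max_cycles` is reached (a real
    harness would also stop on a program/erase failure status).
    Returns a list of (cycle, measured_rber) points."""
    rng = random.Random(seed)
    history = []
    while block.cycles < max_cycles:
        block.erase()
        pattern = rng.randbytes(page_size)  # fresh random data each cycle (Python 3.9+)
        block.program(pattern)
        if block.cycles % checkpoint:
            continue  # only read back and score at checkpoints
        rber = count_bit_errors(pattern, block.read()) / (8 * page_size)
        history.append((block.cycles, rber))
        if rber >= rber_limit:
            break
    return history
```

For example, `cycle_until_worn(ToyBlock(seed=3), page_size=512, max_cycles=20_000, checkpoint=500)` returns the points of an RBER-versus-P/E-cycle curve for the toy block. With real hardware, the erase, program, and read calls would go through whatever interface the test equipment provides.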

Q: What type of data is generated from all the tests that are conducted?
Bill: We generate a characterization report for each device we test. This report compares our results to the vendor specifications, including graphs of error rates vs. program-erase cycles for different retention limits and error correction limits. The report also evaluates the effects from read disturb over endurance and retention lifetimes. Other sections include an analysis of the physical location of errors, and read recovery effectiveness. We also evaluate the impact to performance over the life of the drive.
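One simple form of the physical-location analysis mentioned above is to aggregate read-back error counts by page and flag pages that are noticeably weaker than their neighbors. The sketch below is hypothetical: the input format (pairs of page index and bit-error count), the function names, and the "more than `factor` times the mean" rule are illustrative assumptions rather than the lab's actual method.

```python
from collections import defaultdict
from statistics import mean


def per_page_rber(samples, page_bits):
    """Average RBER per page from (page_index, bit_errors) read-back samples."""
    errors = defaultdict(int)
    reads = defaultdict(int)
    for page, bit_errors in samples:
        errors[page] += bit_errors
        reads[page] += 1
    return {page: errors[page] / (reads[page] * page_bits) for page in errors}


def weak_pages(rber_by_page, factor=2.0):
    """Pages whose RBER exceeds `factor` times the mean RBER across all pages."""
    if not rber_by_page:
        return []
    avg = mean(list(rber_by_page.values()))
    return sorted(page for page, rber in rber_by_page.items() if rber > factor * avg)
```

For example, `weak_pages(per_page_rber([(0, 1), (0, 1), (1, 10), (2, 1)], page_bits=100))` flags page 1, whose error rate is well above the block average. Comparing against the mean keeps the check relative, so it still highlights outliers as the whole block wears.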

Q: What does LSI do with this data?
Bill: First, we use it to validate the flash vendor specifications. Second, we use it to design and optimize our LSI SandForce flash controller designs. In particular, we use the data to optimize our error recovery and SHIELD technology. We also use it to evaluate possible trade-offs: for example, to trade performance for increasing the endurance of the NAND and extending its life. Last, we share information with customers whenever possible. The goal of collecting this data is to develop the most advanced ECC possible to increase SSD reliability, endurance, and performance.

Q: Is LSI able to generate better products because we collect this data?
Bill: The information we gather in the LSI Flash Analytics lab has certainly helped improve our products. Our testing has improved quality by verifying that NAND parts meet the vendor specs. When we show our data to the NAND vendors, they are more motivated to share their detailed data with us. Our lab is also equipped to run specific tests to help diagnose problems seen with our products during qualification and production. For example, we have run tests to evaluate read recovery issues and physical location stress. We also gather raw data during our characterization testing that is used by our product architecture team. The raw data is fed into simulation models and used to optimize our flash channel and SHIELD technology. In a nutshell, our improved understanding of flash memory helps us build better flash controllers – which helps our customers build better SSDs.

Q: Does LSI work closely with NAND vendors on this analysis?
Bill: Yes, we have regular meetings with all of the NAND flash vendors that our flash controllers support. We work closely with NAND vendors to make sure we have the latest information. We not only obtain the latest roadmaps, datasheets and application notes, but we also get clarifications about flash operations, quality and performance. We share the characterization data we collect and get their insight into our results. We also keep the NAND vendors informed about our controller roadmap and features so they can ensure their products track with ours.

Mastering NAND flash memory is critical to flash controller development success
Any NAND flash memory controller developer would be remiss if its engineers did not perform in-depth testing and characterization to better understand this very complex technology. Also, any company that supports more than one flash vendor must understand the differences between manufacturers so it can design and adapt its controller to support the widest selection of NAND flash, giving its customers the greatest flexibility.
