I was asked some interesting questions recently by CEO & CIO, a Chinese business magazine. The questions ranged from how Chinese Internet giants like Alibaba, Baidu and Tencent differ from other customers and what leading technologies big Internet companies have created to questions about emerging technologies such as software-defined storage (SDS) and software-defined datacenters (SDDC) and changes in the ecosystem of datacenter hardware, software and service providers. These were great questions. Sometimes you need the press or someone outside the industry to ask a question that makes you step back and think about what’s going on.
I thought you might be interested, so this blog, the first in a 3-part series covering the interview, shares details of the first two questions.
CEO & CIO: In recent years, Internet companies have built ultra large-scale datacenters. Compared with traditional enterprises, they also take the lead in developing datacenter technology. From an industry perspective, what are the three leading technologies of ultra large-scale Internet data centers in your opinion? Please describe them.
There are so many innovations and important contributions to the industry from these hyperscale datacenters in hardware, software and mechanical engineering that choosing just three is difficult. While my personal bias is toward their hardware innovations, I would suggest the following three because they have changed our world and our industry, and are now changing our hardware and businesses:
Autonomous behavior and orchestration
An architect at Microsoft once told me, “If we had to hire admins for our datacenter in a normal enterprise way, we would hire all the IT admins in the world, and still not have enough.” There are now around 1 million servers in Microsoft datacenters. Hyperscale datacenters have had to develop autonomous, self-managing, sometimes self-deploying datacenter infrastructure simply to expand. They are pioneering datacenter technology for scale – innovating, learning by trial and error, and evolving their practices to drive more work/$. Their practices are specialized but beginning to be emulated by the broader IT industry. OpenStack is the best example of how that specialized knowledge and capability is being packaged and deployed broadly in the industry. At LSI, we’re working with both hyperscale datacenters and orchestration solutions to make better autonomous infrastructure.
High availability at datacenter level vs. machine level
As systems get bigger they have more components and more modes of failure, and keeping them reliable gets more complex and expensive. As storage is used more, and more aggressively, drives fail more often – they are simply being worked harder. And yet there is continued pressure to reduce costs and complexity. By the time hyperscale datacenters had evolved to massive scale – hundreds of thousands of servers in multiple datacenters – they had created solutions for absolute reliability, even as individual systems got less expensive, less complex and much less reliable. This is what has enabled the very low cost structures of the cloud, and made it a reliable resource.
These solutions are well timed, too, as more enterprise organizations need to maintain on-premises data across multiple datacenters with absolute reliability. The traditional view that a single server requires 99.999% reliability is giving way to a more pragmatic view of maintaining high reliability at the macro level – across the entire datacenter. This approach accepts the failure of individual systems and components even as it maintains datacenter-level reliability. Of course – there are currently operational issues with this approach. LSI has been working with hyperscale datacenters and OEMs to engineer improved operational efficiency and resilience, and to minimize the impact of individual component failures, while still relying on the datacenter high-availability (HA) layer for reliability.
Big data
It’s such an overused term. It’s difficult to believe the term barely existed a few years ago. The gift of Hadoop® to the industry – an open source attempt to replicate Google® MapReduce and the Google File System – has truly changed our world, and unbelievably quickly. Today, Hadoop and the other big data applications enable search, analytics, advertising, peta-scale reliable file systems, genomics research and more – even services like Apple® Siri run on Hadoop. Big data has changed the concept of analytics from statistical sampling to analysis of all data. And it has already enabled breakthroughs and changes in research, where relationships and patterns are looked for empirically, rather than derived from theories.
Overall, I think big data has been one of the most transformational technologies this century. Big data has changed the focus from compute to storage as the primary enabler in the datacenter. Our embedded hard disk controllers, SAS (Serial Attached SCSI) host bus adaptors and RAID controllers have been at the heart of this evolution. The next evolutionary step in big data is the broad adoption of graph analysis, which integrates the relationship of data, not just the data itself.
CEO & CIO: Due to cloud computing, mobile connectivity and big data, the traditional IT ecosystem or industrial chain is changing. What are the three most important changes in LSI’s current cooperation with the ecosystem chain? How does LSI see the changes in the various links of the traditional ecosystem chain? What new links are worth attention? Please give some examples.
Cloud computing and the explosion of data driven by mobile devices and media have changed, and continue to change, our industry and ecosystem contributors dramatically. It’s true the enterprise market (customers, OEMs, technology, applications and use cases) has been pretty stable for 10-20 years, but as cloud computing has become a significant portion of the server market, it has increasingly affected ecosystem suppliers like LSI.
Timing: It’s no longer enough to follow Intel’s tick-tock product roadmap. Development cycles for datacenter solutions used to be 3 to 5 years. But these cycles are becoming shorter. Now, demand for solutions is closer to 6 months – forcing hardware vendors to plan and execute to far tighter development cycles. Hyperscale datacenters also need to be able to expand resources very quickly, as customer demand dictates. As a result they incorporate new architectures, solutions and specifications out of cycle with the traditional Intel roadmap changes. This has also disrupted the ecosystem.
End customers: Hyperscale datacenters now have purchasing power in the ecosystem, with single purchase orders sometimes amounting to 5% of the server market. While OEMs are still incredibly important, they are not driving large-scale deployments or innovating and evolving nearly as fast. The result is more hyperscale design-win opportunities for component or sub-system vendors if they offer something unique or a real solution to an important problem. This also may shift profit pools away from OEMs to strong, nimble technology solution innovators. It also has the potential to reduce overall profit pools for the whole ecosystem, which is a potential threat to innovation speed and re-investment.
New players: Traditionally, a few OEMs and ISVs globally have owned most of the datacenter market. However, the supply chain of the hyperscale cloud companies has changed that. Leading datacenters have architected, specified or even built (in Google’s case) their own infrastructure, though many large cloud datacenters have been equipped with hyperscale-specific systems from Dell and HP. But more and more systems built exactly to datacenter specifications are coming from suppliers like Quanta. Newer network suppliers like Arista have increased market share. Some new hyperscale solution vendors have emerged, like Nebula. And software has shifted to open source, sometimes supported for-pay by companies copying the Red Hat® Linux model – companies like Cloudera, Mirantis or UnitedStack. Personally, I am still waiting for the first 3rd-party hardware support and service company emulating the Linux model to appear.
Open initiatives: Yes, we’ve seen Hadoop and its derivatives deployed everywhere now – even in traditional industries like oil and gas, pharmacology, genomics, etc. And we’ve seen the emergence of open-source alternatives to traditional databases being deployed, like Cassandra. But now we’re seeing new initiatives like Open Compute and OpenStack. Sure these are helpful to hyperscale datacenters, but they are also enabling smaller companies and universities to deploy hyperscale-like infrastructure and get the same kind of automated control, efficiency and cost structures that hyperscale datacenters enjoy. (Of course they don’t get all the way there on any front, but it’s a lot closer.) This trend has the potential to hurt OEM and ISV business models and markets and establish new entrants – even as we see Quanta, TYAN, Foxconn, Wistron and others tentatively entering the broader market through these open initiatives.
New architectures and new algorithms: There is a clear movement toward pooled resources (or rack-scale architecture, or disaggregated servers). Developing pooled resource solutions has become a partnership between core IP providers like Intel and LSI and the largest hyperscale datacenter architects. Traditionally new architectures were driven by OEMs, but that is not so true anymore. We are seeing new technologies emerge to enable these rack-scale architectures (RSA) – technologies like silicon photonics, pooled storage and software-defined networks (SDN) – and we will soon see pooled main memory and new nonvolatile main memories in the rack.
We are also seeing the first tries at new processor architectures about to enter the datacenter: ARM 64 for cool/cold storage and the web tier, and OpenPower P8 for high-power processing – multithreaded, multi-issue, pooled-memory processing monsters. This is exciting to watch. There is also an emerging interest in application acceleration: general-purpose computing on graphics processing units (GPGPU), regular expression (regex) processors for live stream analytics, etc. We are also seeing the first generation of graph analysis deployed at massive scale in real time.
Innovation: The pace of innovation appears to be accelerating, although maybe I’m just getting older. But the easy gains are done. On one hand, datacenters need exponentially more compute and storage, and they need to operate 10x to 1000x more quickly. On the other, memory, processor cores, disks and flash technologies are getting no faster. The only way to fill that gap is through innovation. So it’s no surprise there are lots of interesting things happening at OEMs and ISVs, chip and solution companies, as well as in the open source community and at startups. This is what makes it such an interesting time and industry.
Consumption shifts: We are seeing a decline in laptop and personal computer shipments, a drop that naturally is reducing storage demand in those markets. Laptops are also seeing a shift to SSD from HDD. This has been good for LSI, as our footprint in laptop HDDs had been small, but our presence in laptop SSDs is very strong. Smart phones and tablets are driving more cloud content, traffic and reliance on cloud storage. We have seen a dramatic increase in large HDDs for cloud storage, a trend that seems to be picking up speed, and we believe the cloud HDD market will be very healthy and will see the emergence of new, cloud-specific HDDs that are radically different and specifically designed for cool and cold storage.
There is also an explosion of SSD and PCIe flash cards in cloud computing for databases, caches, low-latency access and virtual machine (VM) enablement. Many applications that we take for granted would not be possible without these extreme low-latency, high-capacity flash products. But very few companies can make a viable storage system from flash at an acceptable cost, opening up an opportunity for many startups to experiment with different solutions.
Summary: So I believe the biggest hyperscale innovations are autonomous behavior and orchestration, HA at the datacenter level vs. machine level, and big data. These are radically changing the whole industry. And what are those changes for our industry and ecosystem? You name it: timing, end customers, new players, open initiatives, new architectures and algorithms, innovation, and consumption patterns. All that’s staying the same are legacy products and solutions.
These really were great questions – the kind that make you step back and take stock of what’s going on in our industry.
Tags: Alibaba, Apple Siri, Arista, ARM 64, Baidu, big data, Cassandra, CEO & CIO Magazine, China, cloud storage, Cloudera, cold storage, cool storage, datacenter, datacenter ecosystem, Dell, flash, Foxconn, Google File System, Google MapReduce, Hadoop, hard disk drive, HDD, high availability, HP, hyperscale datacenter, Intel, Internet, latency, Microsoft, Mirantis, Nebula, Open Compute, OpenPower P8, OpenStack, Quanta, rack scale, RAID, Red Hat Linux, SAS, SDDC, SDN, SDS, Serial Attached SCSI, software-defined datacenter, software-defined networks, software-defined storage, solid state drive, SSD, Tencent, TYAN, UnitedStack, virtual machine, VM, Wistron
I was recently speaking to a customer about data reduction technology and I remembered a conversation I had with my mother when I was a teenager. She used to complain about how chaotic my bedroom looked, and one time I told her I was “illustrating the second law of thermodynamics” for my physics class. I was referring to the mess and the tendency of things to evolve towards the state of maximum entropy, or randomness. I have to admit I only used that line once with my mom because it pissed her off and she likened me to an intelligent donkey.
I never expected those early lessons in theoretical physics to be useful in the real world, but as it turns out entropy can be a significant factor in determining solid state drive (SSD) performance. When an SSD employs data reduction technology, the degree of entropy or randomness in the data stream becomes inversely related to endurance and performance—the lower the data entropy, the higher the endurance and performance of the SSD.
Entropy affects data reduction
In this context I am defining entropy as the degree of randomness in data stored by an SSD. Theoretically, minimal or nonexistent entropy would be characterized by data bits of all ones or all zeros, and maximum entropy by a completely random series of ones and zeros. In practice, the entropy of what we often call real-world data falls somewhere in between these two extremes. Today we have hardware engines and software algorithms that can perform deduplication, string substitution and other advanced procedures that can reduce files to a fraction of their original size with no loss of information. The greater the predictability of data – that is, the lower the entropy – the more it can be reduced. In fact, some data can be reduced by 95% or more!
Files such as documents, presentations and email generally contain repeated data patterns with low randomness, so are readily reducible. In contrast, video files (which are usually compressed) and encrypted files (which are inherently random) are poor candidates for data reduction.
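If you want a rough feel for the entropy of your own data, one quick (and admittedly crude) check on a Linux system is to compare a file’s size before and after compression. The file name below is just a placeholder – try it on a log or text file, then on a video or encrypted archive:

stat -c %s server.log            # original size in bytes
gzip -c server.log | wc -c       # compressed size in bytes

A text or log file typically shrinks dramatically (low entropy), while a compressed video or encrypted file barely shrinks at all (high entropy).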
A reminder is in order not to confuse random data with random I/O. Random (and sequential) I/Os describe the way data is accessed from the storage media. The mix of random vs. sequential I/Os also influences performance, but in a different way than entropy does, as described in my blog “Teasing out the lies in SSD benchmarking.”
Why data reduction matters in an SSD
The NAND flash memory inside SSDs is very sensitive to the cumulative amount of data written to it. The more data written to flash, the shorter the SSD’s service life and the sooner its performance will degrade. Writing less data, therefore, means better endurance and performance. You can read more about this topic in my two blogs “Can data reduction technology substitute for TRIM” and “Write Amplification – Part 2.”
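To put rough numbers on that relationship (these figures are purely illustrative, not a specification of any particular drive), consider a 256GB SSD built from flash rated for 3,000 program/erase cycles:

256GB x 3,000 cycles = 768TB of total writes the flash can absorb
At a write amplification of 2, that budget covers about 768TB / 2 = 384TB of host writes
If data reduction drops the effective write amplification to 0.5, the same flash absorbs roughly 768TB / 0.5 = 1.5PB of host writes

That is a 4x swing in endurance from the same flash, which is why writing less data matters so much.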
Real-world examples in client computing
Take an encrypted text document. The file started out as mostly text with some background formatting data. All things considered, the original text file is fairly simple and organized. The encryption, by design, turns the data into almost completely random gibberish with almost no predictability to the file. The original text file, then, has low entropy and the encrypted file high entropy.
Intel Labs examined entropy in the context of compressibility as background research to support its Intel SSD 520 Series. The following chart summarizes Intel’s findings for the kinds of data commonly found on client storage drives, and the amount of compression that might be achieved:
According to Intel, “75% of the file types observed can be typically compressed 60% or more.” Granted, the kinds of files found on drives vary widely according to the type of user. Home systems might contain more compressed audio and video, for example – poor candidates, as we mentioned, for data reduction. But after examining hundreds of systems from a wide range of environments, LSI estimates that the entropy of typical user data averages about 50%, suggesting that many users would see at least a moderate improvement in performance and endurance from data reduction because most data can be reduced before it is written to the SSD.
Real-world examples in the enterprise
Enterprise IT managers might be surprised at the extent to which data reduction technology can increase workload performance. While gauging the level of improvement with any precision would require data-specific benchmarking, sample data can provide useful insights. LSI examined the entropy of various data types, shown in the chart below. I found the high reducibility of the Oracle® database file very surprising because I had previously been told by database engineers that I should expect higher entropy. I later came to understand these enterprise databases are designed for speed, not capacity optimization. Therefore it is faster to store the data in its raw form rather than use a software compression application to compress and decompress the database on the fly and slow it down.
Putting it all together
The chief goals of PC and laptop users and IT managers have long been, and remain, to maximize the performance and lifespan of storage devices – SSDs and HDDs – at a competitive price point. The challenge for SSD users is to find a device that delivers on all three fronts. LSI® SandForce® DuraWrite™ technology helps give SSD users exactly what they want. By reducing the amount of data written to flash memory, DuraWrite increases SSD endurance and performance without additional cost – even if it doesn’t help organize your teenager’s bedroom.
It’s the start of the new year, and it’s traditional to make predictions – right? But predicting the future of the datacenter has been hard lately. There have been and continue to be so many changes in flight that possibilities spin off in different directions. Fractured visions through a kaleidoscope. Changes are happening in the businesses behind datacenters, the scale, the tasks and what is possible to accomplish, the value being monetized, and the architectures and technologies to enable all of these.
A few months ago I was asked to describe the datacenter in 2020 for some product planning purposes. Dave Vellante of Wikibon & John Furrier of SiliconANGLE asked me a similar question a few weeks ago. 2020 is out there – almost 7 years. It’s not easy to look into the crystal ball that far and figure out what the world will look like then, especially when we are in the midst of those tremendous changes. For some context I had to think back 7 years – what was the datacenter like then, and how profound have the changes been over the past 7 years?
And 7 years ago, our forefathers…
It was a very different world. Facebook barely existed, having just opened membership beyond universities. Google was using Velcro, Amazon didn’t have its services, cloud was a non-existent term. In fact DAS (direct attach storage) was on the decline because everyone was moving to SAN/NAS. 10GE networking was in the future (1GE was still in growth mode). Linux was not nearly as widely accepted in the enterprise – Amazon was in the vanguard of making it usable at scale (with Werner Vogels saying “it’s terrible, but it’s free, as in free beer”). Servers were individual – no “PODs” – and VMware was not yet standard practice. SATA drives were nowhere in datacenters.
An enterprise disk drive topped out at around 200GB in capacity. Nobody used the term petabyte. People, including me, were just starting to think about flash in datacenters, and it was several years later that solutions became available. Big data did not even exist. Not as a term or as a technology, definitely not Hadoop or graph search. In fact, Google’s seminal paper on MapReduce had just been published, and it would become the inspiration for Hadoop – something that would take many years before Yahoo picked it up and helped make it real.
Analytics were statistical and slow, and you had to be very explicitly looking for something. Advertising on the web was a modest business. Cold storage was tape or MAID, not vast pools of cheap disks in the cloud at absurdly low price points. None of the Chinese web-cloud guys existed… In truth, at LSI we had not even started looking at or getting to know the web datacenter guys. We assumed they just bought from OEMs…
No one streamed mainstream media – TV and movies – and there were no tablets to stream them to. YouTube had just been purchased by Google. Blu-ray was just getting started and competing with HD-DVD (which I foolishly bought 7 years ago), and integrated GPS units in your car were a high-tech growth area. Neither the iPhone nor Android had launched, Danger’s Sidekick was the cool phone, flip phones were mainstream, there was no App Store or the billions in sales associated with it, and a mobile web browser was virtually useless.
Dell, IBM, and HP were the only real server companies that mattered, and the whole industry revolved around them, as well as EMC and NetApp for storage. Cisco, Lenovo and Huawei were not server vendors. And Sun was still Sun.
7 years from now
So – 7 years from now? That’s hard to predict, so take this with a grain of salt… There are many ways things could play out, especially when global legal, privacy, energy, hazardous waste recycling, and data retention requirements come into play, not to mention random chaos and invention along the way.
Compute-centric to dataflow-centric
Major applications are changing (have changed) from compute-centric to dataflow architectures. That is big data. The result will probably be a decline in the influence of processor vendors and an increased focus on storage, network and memory, and on optimized rack-level architectures. A handful of hyperscale datacenters are leading the way, and dragging the rest of us along. These types of solutions are already being deployed in big enterprises for specialized use cases, and their adoption will only increase with time. In 7 years, the main deployment model will echo what hyperscale datacenters are doing today: disaggregated racks of compute, memory and storage resources.
The datacenter is now being viewed as a profit growth enabler, rather than a cost center. That implies more compute = more revenue. That changes the investment profile and the expectations for IT. It will not be enough for enterprise IT departments to minimize change and risk because then they would be slowing revenue growth.
Customers and vendors
We are in the early stages of a customer revolt. Whether it’s deserved or not is immaterial, though I believe it’s partially deserved. Large customers have decided (and I’m painting with broad brush strokes here) that OEMs are charging them too much and adding “features” that add no value and burn power, that service contracts are excessively expensive, and that there is very poor management interoperability among OEM offerings – on purpose, to maintain vendor lock-in. The cost structures of public cloud platforms like Amazon are proof there is some merit to the argument. Management tools don’t scale well, and require a lot of admin intervention. ISVs are seen as no better. Sure the platforms and apps are valuable and critical, but they’re really expensive too, and in a few cases, open source solutions actually scale better (though ISVs are catching up quickly).
The result? We’re seeing a push to use whitebox solutions that are interoperable and simple. Open source solutions – both software and hardware – are gaining traction in spite of their problems. Just witness the latest Open Compute Summit and the adoption rate of Hadoop and OpenStack. In fact many large enterprises have a policy that’s pretty much – any new application needs to be written for open source platforms on scale-out infrastructure.
Those 3 OEMs are struggling. Dell, HP and IBM are selling more servers, but at lower revenue – or, in the case of IBM, selling off the business. They are trying to upsell storage systems to offset those lost margins, and they are trying to innovate and vertically integrate to compensate for the changes. In contrast, we’re seeing a rapid planned increase in self-built, self-architected hyperscale datacenters, especially in China. To be fair, those pressures on price and supplier revenue are not necessarily good for our industry. As well, there are newer entrants like Huawei and Cisco taking a noticeable chunk of the market, and an impending growth of ISV and 3rd-party full-rack “shrink wrapped” systems. Everybody is joining the party.
Storage, cold storage and storage-class memory
Stepping further out on the limb, I believe (but who really knows) that by 2020 storage as we know it will no longer be shipping. SMB is hollowed out to the cloud – that is, why would any small business use anything but cloud services? The costs are too compelling. Cloud storage is stratified into 3 levels: storage-class memory, flash/NVM and cool/cold bulk disk storage. Cold storage is going to be a very, very important area. You need to save that data, but spend zero power, and zero $, on storing it. Just look at some of the radical ideas like Facebook’s Blu-ray jukebox to address that, which was masterminded by a guy I really like – Gio Coglitore – and which I am very glad is getting some rightful attention. (http://www.wired.com/wiredenterprise/2014/02/facebook-robots/)
I believe that pooled storage class memory is inevitable and will disrupt high-performance flash storage, probably beginning in 2016. My processor architect friends and I have been daydreaming about this since 2005. That disruption’s OK, because flash use will continue to grow, even as disk use grows. There is just too much data. I’ve seen one massive vendor’s data showing average servers are adding something like 0.2 hard disks per year and 0.1 SSDs per year – and that’s for the average server including diskless nodes that are usually the most common in hyperscale datacenters. So growth in spite of disruption and capacity growth.
Data will be pooled, and connected by fabric as distributed objects or key/value pairs, with erasure coding. In fact, object store (key/value – whatever) may have “obsoleted” block storage. And the need for these larger objects will probably also obsolete file storage as we’re used to it. Sure, disk drives may still be block based, though key/value gives rise to all sorts of interesting opportunities to support variable-size structures, obscure small fault domains, and variable encryption/compression without wasting space on disk platters. I even suspect that disk drives as we know them will be morphing into cold-store specialty products that physically look entirely different and are made from different materials – for a lot of reasons. 15K drives will be history, and 10K drives may be too. In fact 2.5" drives may not make sense anymore as the laptop drive and the 15K drive disappear and performance and density are satisfied by flash.
Enterprise becomes private cloud that is structurally very similar to hyperscale, but simply housed in an internal facility. And SAN/NAS products as we know them will be starting down the long tail as legacy support products. Sure, new network-based storage models are about to emerge, but they’re different and more aligned to key/value.
Rack-scale architectures will have taken over clustered deployments. That means pooled resources. Processing will be pools of single-socket SoC servers enabling massive clusters, rather than lots of 2-socket servers. These SoCs might even be mobile device SoCs at some point, or at least derived from them – the economics of scale and fast cadence of consumer SoCs will make that interesting, maybe even inevitable. After all, the current Apple A7 in the iPhone 5S is a dual-core, 64-bit ARMv8 at 1.4GHz, and the whole iPhone costs as much as mainstream server processor chips. In a few years, an 8- or 16-core equivalent at 1.5GHz or 2GHz is not hard to imagine, and the cost structure should be excellent.
Rapidly evolving open source applications will have morphed into eventually consistent dataflow tasks. Or they will be emerging in-memory applications working on vast data structures in the pooled storage class memory at the rack or larger scale, which will add tremendous monetary value to businesses. Whatever the evolutionary path, the challenge for the next 10 years is optimizing dataflow as the amount of data in use continues to grow exponentially. After all, data has value in aggregate, so why would you throw anything away, even as the amount we generate increases?
Clusters will be autonomous. Really autonomous. As in a new term I love: “emergent.” It’s when you can start using big data analytics to monitor the datacenter, and make workload management and data placement decisions in real time, automatically, and the datacenter begins to take on unpredicted characteristics. Deployment will be autonomous too. Power on a pod of resources, and it just starts working. Google does that already.
Layer 2 datacenter network switches will either be disappearing or will have migrated to a radically different location in the rack hierarchy. There are many ways this can evolve. I’m not sure which one(s) will dominate, but I know it will look different. And it will have different bandwidth. 100G moving to 400G interconnect fabric over fiber.
So there you have it. Guaranteed correct…
Different applications and dataflow, different architectures, different processors, different storage, different fabrics. Probably even a re-alignment of vendors.
Predicting the future of the datacenter has not been easy. There have been, and still are, so many changes happening. The businesses behind them. The scale, the tasks and what is possible to accomplish, the value being monetized, and the architectures and technologies to enable all of these. But at least we have some idea what’s ahead. And it’s pretty different, and exciting.
Tags: 10 gigabit ethernet, 2020, Amazon, Apple, China, Cisco, cloud storage, cold storage, datacenter, Dell, EMC, Facebook, flash, Google, Hadoop, HP, Huawei, hyperscale datacenter, IBM, iPhone, kaleidoscope, Lenovo, NAS, NetApp, non-volatile memory, NVM, Open Compute, OpenStack, rack scale architecture, SAN, SoC, Sun, VMware, YouTube
A major reason enterprise customers see high latency and poorer than expected performance when implementing flash technology is that the flash partition is not aligned on a sector boundary that allows the flash device to access its data efficiently. When creating a logical volume (LVM), things can get even more complicated. Proper partition alignment is critical to performance when implementing flash in your enterprise.
An aligned partition is one that starts on a sector number evenly divisible by the flash device’s natural boundary – 4KB, 8KB and so on. With standard 512-byte sectors, that means aligned input/output (I/O) operations start at a sector number divisible by 8 for 4KB alignment, by 16 for 8KB alignment, and so forth, up to sector 2048 for 1MB alignment.
If a flash partition is unaligned – its I/O operations start at a sector number not divisible by eight – the device will perform two I/Os over adjacent blocks instead of one. These extra I/Os degrade performance of the flash device. In our testing, we have seen up to 4x performance gains by properly aligning the flash device.
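If you want to check alignment on an existing Linux system, a couple of quick commands will show the starting sector of each partition (assuming 512-byte logical sectors; /dev/sdX and sdX1 are placeholders for your device and partition):

sfdisk -d /dev/sdX               # dump the partition table; check the "start=" values
cat /sys/block/sdX/sdX1/start    # starting sector of the first partition

A starting sector evenly divisible by 8 indicates 4KB alignment; a value of 2048 (or a multiple of it) indicates the 1MB alignment recommended below.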
Out with the old … in with the new
There are many articles, websites and Linux system administrator best-practice documents describing how to create a logical volume (LVM) – an abstraction of a number of flash devices into a single storage volume that enables dynamic volume resizing and makes it easier to replace, re-partition and back up individual devices in Linux. However, most of these practices were developed before the advent of PCIe® flash devices. I have worked with customers who have used these old practices for creating LVMs, and some of them are seeing very poor performance when implementing flash in their environments.
My conversations with customers and the documents I’ve read on creating LVMs have revealed that the first step in creating an LVM – creating a physical volume (PV) – needs refinement. The reason is that the PV create process can use a raw device, a partitioned device, or a mix. I would suggest getting into the habit of aligning all flash devices on a physical sector boundary so that all PVs are aligned. The PV command is typically specified as either “pvcreate /dev/sdX,” which allocates the whole device (non-partitioned) to the PV, or “pvcreate /dev/sdX1,” which uses a partition to create the PV. If the PV is created using a mix of raw devices and partitioned devices, or multiple partitioned devices, is there alignment over all the PVs? Maybe! Maybe not! That’s the problem!
Aligning for higher speed
I recommend a new approach to creating LVMs when using flash technology. My suggestion is to align each of the flash devices on a 1MB boundary before creating the PV. Here are the steps to help make sure that you are using boundary-aligned devices when creating an LVM:
echo "2048,,8e" | sfdisk -uS /dev/sdX --force
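The line above writes a single partition of type 8e (Linux LVM) to the device, starting at sector 2048 – a 1MB boundary. Repeat it for each flash device in the volume, then build the LVM on top of the aligned partitions. The commands below are a minimal sketch of that flow; the volume group and logical volume names are just examples, and the 1MB data-alignment flag is an extra safeguard on top of the already-aligned partitions:

pvcreate --dataalignment 1m /dev/sdX1 /dev/sdY1     # create physical volumes on the aligned partitions
vgcreate vg_flash /dev/sdX1 /dev/sdY1               # group them into a single volume group
lvcreate -l 100%FREE -n lv_flash vg_flash           # carve the volume group into one logical volume
mkfs.ext4 /dev/vg_flash/lv_flash                    # create a filesystem on the logical volume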
Implementing flash in the enterprise is a great way to produce low latencies while providing high IOPS and throughput. By following these steps, you will successfully set up an LVM over multiple flash devices that are aligned on a proper boundary to get the best performance.
A customer recently asked me if the SF3700, our latest flash controller, supports SATA Express and fired away with a bunch of other questions about the standard. The depth of his curiosity suggested a broader need for education on the basics of the standard.
To help me with the following overview of SATA Express, I recruited Sumit Puri, Sr. Director of Strategic Marketing for the Flash Components Division at LSI (SandForce). Sumit is a longtime contributor to many storage standards bodies and has been working with SATA-IO – the group responsible for SATA Express – for many years. He has first-hand knowledge of SATA-IO’s work.
Here are his insights into some of the fundamentals of SATA Express.
What is SATA Express?
Sumit: There’s quite a bit of confusion in the industry about what SATA Express defines. In simple terms, SATA Express is a specification for a new connector type that enables the routing of both PCIe® and SATA signals. SATA Express is not a command or signaling protocol. It should really be thought of as a connector that mates with legacy SATA cables and new PCIe cables.
Why was SATA Express created?
Sumit: SATA Express was developed to help smooth the transition from the legacy SATA interface to the new PCIe interface. SATA Express gives system vendors a common connector that supports both traditional SATA and PCIe signaling and helps OEMs streamline connector inventory and reduce related costs.
What is the protocol used in SATA Express?
Sumit: One of the misconceptions about SATA Express is that it’s a protocol specification. Rather, as I mentioned, it’s a mechanical specification for a connector and the matching cabling. Protocols that support SATA Express include SATA, AHCI and NVMe.
What are the form factors for SATA Express?
Sumit: SATA Express defines connectors for both a 2.5” drive and the host system. SATA Express connects the drive and system using SATA cables or the newly defined PCIe cables.
What connector configuration is used for SATA Express?
Sumit: Because SATA Express supports both SATA and PCIe signaling as well as the legacy SATA connectors, there are multiple configuration options available to motherboard and device manufacturers. The image below shows plug (a), which is built for attaching to a PCIe device. Socket (b) would be part of a cable assembly for receiving plug (a) or a standard SATA plug, and socket (c) would mount to a backplane or motherboard for receiving plug (a) or a standard SATA plug. The last two connectors are a mating pair designed to enable cabling (e) to connect to motherboards (d).
When will hosts begin supporting SATA Express?
Sumit: We expect systems to begin using SATA Express connectors early this year. They will primarily be deployed in desktop environments, which require cabling. In contrast, we expect limited use of SATA Express in notebook and other portable systems that are moving to cableless card-edge connector designs like the recently minted M.2 form factor. We also expect to see scant use of SATA Express in enterprise backplanes. Enterprise customers will likely transition to other connectors that support higher speed PCIe signaling like the SFF-8639, a new connector that was originally included in the SATA Express specification but has since been removed.
Will LSI support SATA Express?
Sumit: Absolutely. Our SF3700 flash controller will be fully compatible with the newly defined SATA Express connector and support either SATA or PCIe. Our current SF-2000 SATA flash controllers support SATA cabling used on SATA Express, but not PCIe.
Will LSI also support SRIS?
Sumit: PCIe devices enabled with SRIS (Separate Refclk Independent SSC) can self-clock, so they need no reference clock from the host, allowing system builders to use lower-cost PCIe cables. SRIS is an important cost-saving feature for cabling that supports PCIe signaling; it applies to cabled connections rather than card-edge connector designs. Today the SF3700 supports PCIe connectivity, and LSI will support SRIS in future releases of the SF3700 and other products.
Why is it called SATA Express?
Sumit: SATA Express blends the names of the two interfaces it supports – SATA and PCI Express – and captures the hybrid nature of the physical interconnect. The name reflects the ability of the legacy SATA connector ecosystem to support higher PCIe data rates and simplify the transition to PCIe devices. SATA Express can pull double duty, supporting both PCIe and SATA signaling in the same motherboard socket. The same SATA Express socket accepts both traditional SATA and new PCIe cables and links to either a legacy SATA or SATA Express device connector.
How fast can SATA Express run?
Sumit: The PCIe interface defines the top SATA Express speed. A PCIe Gen2 x2 device supports up to 900 MB/s of throughput, and a PCIe Gen3 x2 device up to 1800 MB/s – both significantly higher than the roughly 550 MB/s speed ceiling of today’s SATA 6Gb/s devices.
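As a quick back-of-the-envelope check on those figures (my own arithmetic, not part of the SATA-IO specification):

PCIe Gen2: 5GT/s per lane x 8b/10b encoding = 500 MB/s per lane, so an x2 link offers ~1000 MB/s raw, or roughly 900 MB/s after protocol overhead
PCIe Gen3: 8GT/s per lane x 128b/130b encoding ≈ 985 MB/s per lane, so an x2 link offers ~1970 MB/s raw, or roughly 1800 MB/s after protocol overhead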
Is SATA Express similar to M.2?
Sumit: There are two key similarities. Both support SATA and PCIe on the same host connector, and both are designed to help transition from SATA to PCIe over time.
SATA Express delivers the future of connector speeds today
SATA Express was born of the stuff of all great inventions: necessity. The challenge SATA-IO faced in doubling SATA 6Gb/s speeds was herculean. The undertaking would have been too time-consuming to deliver the next-generation connection speeds that PCIe already provides, and too involved, requiring an overhaul of the SATA standard. Even in the brightest scenario, the effort would have produced a power guzzler at a time when greater power efficiency is a must for system builders. SATA-IO found a better path, an elegant bridge to PCIe speeds in the form of SATA Express.
Solid state drive (SSD) makers have introduced many new layout form factors that are not possible with hard disk drives (HDDs). My blog Size matters: Everything you need to know about SSD form factors talks about the many current SSD form factors, but I gave the new M.2 form factor only a glimpse. The specification and its history merit a deeper look.
A few years ago the PCI Special Interest Group (PCI-SIG), teaming with the Serial ATA International Organization (SATA-IO), started to develop a new form factor standard to replace Mini-PCIe and mSATA, since specifications from both of these organizations are required to build SATA M.2 SSDs. The new layout and connector would be used for applications including WiFi, WWAN, USB, PCIe and SATA, with SSD implementations using either PCIe or SATA interfaces. The groups set out to create a narrower connector that supports higher data rates, a lower profile and boards of varying lengths to accommodate a variety of very small notebook computers.
This new form factor also aimed to support micro servers and similar high-density systems by enabling the deployment of dozens of M.2 boards. Unique notches in the edge connector known as “keys” would be used to differentiate the wide array of products using the M.2 connector and prevent the insertion of incompatible products into the wrong socket.
The name change
Initially the M.2 form factor was called Next Generation Form Factor, or NGFF for short. NGFF was designed to follow the dimensional specifications of M.2, a separate specification that the PCI-SIG was defining at the time. Soon after NGFF was announced, confusion between the two dimensionally identical form factors reigned, prompting the renaming of NGFF to M.2. Many people in the industry have been slow to adopt the new M.2 name, and you often see articles that describe these solutions as “M.2, formerly known as NGFF.”
In the world of connectors or sockets, a “key” prevents the insertion of a connector into an incompatible socket to ensure the proper mating of connectors and sockets. The M.2 specification has defined 11 key configurations, seven for use sometime in the future. A socket can only have one key, but the plug-in cards can have keyways cut for multiple keys if they support those socket types. Of the four defined keys available for current use, two support SSDs. Key ID B (pins 12-19) gives PCIe SSDs up to two lanes of connectivity and key ID M (pins 59-66) provides PCIe SSDs with up to four lanes of connectivity. Both can accommodate SATA devices. All of the key patterns are uniquely configured so that the card cannot be flipped over and inserted incorrectly.
Unfortunately these keys alone do not tell the user enough about an SSD to help in the selection of replacement or upgrade drives. For example, a computer with an M.2 socket supporting PCIe x2 features a B key so that no M.2 board requiring PCIe x4 (M key) can fit. However, even though a SATA M.2 card with a B key will physically fit in the same socket, the host will not recognize it because the motherboard’s socket carries only PCIe signals. Given this signal incompatibility, users need to carefully read other socket specifications, either printed on the motherboard or included in the system configuration information, to see whether the socket is PCIe or SATA.
The profile and lengths
Pin spacing on the M.2 card connector is denser than in prior connector specifications, enabling a narrower board and thinner, lighter mobile computing systems. What’s more, the M.2 specification defines a module with components populating only one side of the board, leaving enough space between the main system board and the module for other components. The number of flash chips used by an SSD varies with storage capacity: the lower the capacity requirement, the shorter the module can be, leaving system manufacturers more space for other components.
It’s all in the name
When I hear people call this specification by the name “M.2, formerly known as NGFF,” I cannot help but think about the time when the rock artist Prince changed his name to an unpronounceable symbol and everyone was stuck calling him The Artist Formerly Known as Prince. In his case I believe he was going for the publicity of the confusion.
As for the renaming of NGFF to M.2, I really don’t think that was the goal. In fact I believe it was intended to simplify brand identity by eliminating a second name for the same specification. No matter what we call this new form factor, it appears destined to thrive in both the mobile computing and high-density server markets.
The term “form factor” is used in the computer industry to describe the shape and size of computer components, like drives, motherboards and power supplies. When hard disk drives (HDDs) initially made their way into microprocessor-based computers, they used magnetic platters up to 8 inches in diameter. Because that was the largest single component inside the HDD, it defined the minimum width of the HDD housing—the metal box around the guts of the drive.
The height was dictated by the number of platters stacked on the motor (about 14 for the largest configurations). Over time the standard magnetic platter diameter shrank, which allowed the HDD width to decrease as well. The computer industry used the platter diameter to describe HDD form factors, and those contours shrank over the years. The 8” HDDs for datacenter storage and desktop PCs shed size to 5.25” and then to today’s 3.5”, and laptop HDDs, starting at 2.5”, are now as small as 1.8”.
What defines an SSD form factor?
When solid state drives (SSDs) first started replacing HDDs, they had to fit into computer chassis or laptop drive bays (mounting location) built for HDDs, so they had to conform to HDD dimensions. The two SSDs shown below are form factor identical twins—without the outer casing—to 1.8” and 2.5” HDDs. The SSDs also use standard SATA connectors, but note that the SATA connector for 1.8” devices is narrower than the 2.5” devices to accommodate the smaller width.
However, there’s no requirement for the SSD to match the shape of a typical HDD form factor. In fact some of the early SSDs slid into the high-speed PCIe slots inside the computer chassis, not into the drive bays. A PCIe® SSD card solution resembles an add-in graphic card and installs the same way in the PCIe slot since the physical interface is PCIe.
The largest component of an SSD is a flash memory chip so, depending on how many flash chips are used, manufacturers have virtually limitless options in defining dimensions. JEDEC (Joint Electron Device Engineering Council) defines technical standards for the electronics industry, including SSD form factors. JEDEC defined the MO-297 standard, which establishes parameters for the layout, connector locations and dimensions of 54mm x 39mm Serial ATA (SATA) SSDs, so they can use the same connector as standard 2.5” HDDs but fit into a much smaller space.
The most important element of an SSD form factor is the interface connector, the conduit to the host computer. In the early days of SSDs, that connector was typically the same SATA connector used with HDDs. But over time the width of some SSDs became smaller than the SATA connector itself, driving the need for new connectors.
Card edge connectors – the part of a computer board that plugs into a computer – emerged to enable smaller designs and to further reduce manufacturing and component costs by requiring the installation of only a single female socket on the host as a receptor for the edge of the SSD’s printed circuit board. (The original 2.5” and 1.8” SSD SATA connector required both a male and female plastic connector to mate the SSD to the computer).
With standardization of these connectors critical to ensuring interoperability among different manufacturers, a few organizations have defined standards for these new connectors. JEDEC defined the MO-300 (50.8mm x 29.85mm), which uses a mini-SATA (mSATA) connector, the same physical connector as mini PCI Express, although the two are not electrically compatible. SSD manufacturers have used that same mSATA edge connector and board width, but customized the length to accommodate more flash chips for higher capacity SSDs.
In 2012 a new, even smaller form factor was introduced as Next Generation Form Factor (NGFF), but was later renamed to M.2. The M.2 standard defines a long list of optional board sizes, and the connector supports both SATA and PCIe electrical interfaces. The keyways or notches on the connector can help determine the interface and number of PCIe lanes possible to the board. However, that gets into more details than we have space to cover here, so we will save that for a future blog.
Apple® MacBook Air® and some MacBook Pro systems use an SSD with a connector and dimensions that closely resemble those of the M.2 form factor. In fact Apple MacBook systems have used a number of different connectors and interfaces for their SSDs over the years. Apple used a custom connector with SATA signals from 2010 through 2012 and in 2013 switched to a custom connector with PCIe signals.
In some cases, standard SSD form factor configurations are not an option, so SSD manufacturers have taken it upon themselves to create custom board and interface configurations that meet those less typical needs.
And finally there’s the ubiquitous USB-based connection. While USB flash drives have been around for nearly a decade, many people do not realize the performance of these devices can vary by 10 to 20 times. Typically a USB flash drive is used to make data portable—replacing the old floppy disk. In those cases the speed of the device is not critical since it is used infrequently.
Now with the high-speed USB 3 interface, a SATA-to-USB 3 bridge chip, and a high-performance flash controller like the LSI® SandForce® controller, these external devices can operate as a primary system SSD, performing as fast as a standard SSD inside the system. The primary advantages of these SSDs are removability and transportability combined with high-speed operation.
If there’s one constant in life, it’s demand for ever smaller storage form factors that prompt changes in circuit layout, connector position and, of course, dimensions. New connectors proposed for future generations of storage devices like the SFF-8639 specification will enable multiple interfaces and data path channels on the same connector. While the SFF-8639 does not technically define the device to which it connects, the connector itself is rather large, so the form factor of the SSD will need to be big enough to hold the connector. That’s why the primary SFF-8639 market is datacenters that use back-plane connectors and racks of storage devices. A similar connector – like SFF-8639, very large and built to support multiple data paths – is the SATA Express connector. I will save the details of that connector for an upcoming blog.
The sky’s the limit for SSD shapes and sizes. Without a spinning platter inside a box, designers can let their imaginations run wild. Creative people in the industry will continue to find new applications for SSDs that were previously restricted by the internal components of HDDs. That creativity and flexibility will take on growing importance as we continue to press datacenters and consumer electronics to do more with less, reminding us that size does in fact matter.
In the spirit of Christmas and the holidays, I thought it would be appropriate for a slight change from the usual blog entry today. In this case you need to think about the classic song “The Twelve Days of Christmas.” I promise not to dive into the origins of the song and the meaning behind it, but I will provide an alternate version of the words to think about when you are with your family singing Christmas carols.
Rather than write out all the 12 separate choruses, I decided to start with the final chorus on the 12th day.
Sung to the tune of “The 12 Days of Christmas”
On the 12th day of Christmas my true love gave to me
Twelve second boot times,
Eleven data reduction benefits,
Ten nm-class flash,
Nine flash channels,
Eight flash chips,
Seven percent over provisioning,
Six gigabits per second,
Five golden edge connectors,
Four PCIe lanes,
Third generation controller,
Two interfaces in one package,
And a SandForce Driven SSD.
Maybe if you were good this year, Santa will bring you a SandForce Driven SSD to revitalize your computer too! Merry Christmas and happy holidays to all!
Most users have no idea that reading electronic information from a data storage medium like a hard disk drive (HDD) or solid state drive (SSD) is plagued with read errors. For this reason error correction codes (ECC) are used to fix the random bit errors that arise during the reading process before the incorrect data is returned to the user. But the error correction codes can only handle so many errors at one time. If data errors exceed the ECC limits, the data goes uncorrected and is lost forever. More recent ECC algorithms like the LSI SHIELD error correction technology go a lot farther to protect user data than prior solutions.
What happens to the data when the ECC fails?
If the ECC fails, only a backup protection mechanism will recover the data. There are three alternatives. First, users should always back up their critical data, since ECC failure is not the only threat that can destroy data or render it inaccessible: natural disasters (earthquakes, tornadoes, flooding, etc.) can cause heavy damage to buildings and their contents, lightning can burn up a computer that lacks adequate electrical protection, and of course computers get stolen. Any backup system should be either automated or at least run consistently if it is manual. Industry reports cite that less than 10% of computer users back up their data. That is not very comforting.
The second solution is to employ a RAID (Redundant Array of Independent Disks) array that uses multiple storage devices with one or more of the drives acting as a parity device to provide redundancy. That way if one drive fails, the redundant drive provides enough parity information to restore the original data. This type of system is very common in enterprise environments—a work computer—but hardly used in home systems or laptop PCs.
Is the third solution simple, automatic, and operable in a single-drive environment?
Yes. Yes. And Yes. LSI® SandForce® flash and SSD controllers have a feature called RAISE™ data protection that meets all of these needs. Introduced in 2009 with the first SandForce controller, RAISE technology stands for Redundant Array of Independent Silicon Elements. It sounds like RAID, and acts something like RAID, but protects data using a single drive. With RAISE technology, the individual flash die act like the drives in a RAID array. The original RAISE level 1 technology protects against single page and block failures in the flash. These types of failures are beyond the protection of the ECC, but RAISE technology can recover the data.
With the introduction of the SF3700 this month, RAISE technology now offers more flexibility to deliver greater data protection. With the original RAISE level 1, the space of a full flash die had to be allocated solely to protect user data. In small-capacity configurations like 64GB, RAISE level 1 required too much over provisioning, so it either had to be disabled or, with RAISE left on, the available capacity had to be reduced to 60GB or 55GB. With a new enhancement to the SF3700, no such tradeoff is necessary. The new Fractional RAISE option for this first level of protection uses only a small portion of a die to protect user data in even the smallest configurations, preserving over provisioning (OP). This is important because, as I explained in my blog titled Gassing up your SSD, the more space you allocate for OP, the lower the write amplification, which translates to higher performance during writes and longer endurance of the flash memory.
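To see why those few gigabytes matter, here is a purely illustrative calculation (actual capacities vary by product): 64GiB of raw flash is about 68.7GB in decimal terms. Presented to the user as a 60GB drive, that leaves roughly (68.7 - 60) / 60 ≈ 14% over provisioning; shrink the user capacity to 55GB and the OP rises to about 25%. The point of Fractional RAISE is that the protection data no longer consumes a whole die out of that pool, so small drives keep both their stated capacity and a healthy OP margin.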
Stronger data protection with RAISE level 2
A new RAISE level 2 capability offers even stronger data protection, safeguarding against multiple, simultaneous page and block failures, as well as a full die failure. If a die fails, the SandForce controller recovers the user data. RAISE level 2 includes Auto-Reallocation, which can be set up to automatically redistribute and protect user data in the event of a subsequent die failure. Because the option to protect against a second die failure would reduce the available OP area, the RAISE level 2 feature can instead be set up to simply drop back to RAISE level 1 protection without sacrificing any OP space.
Another new capability is an additional (9th) flash channel that lets the manufacturer populate an extra flash package with one die, providing full RAISE level 1 protection while maintaining maximum user data capacity such as 64GB, 128GB, 256GB, etc. Without the 9th channel option, the SSD would be forced to sacrifice a few GB of capacity (reducing available user capacity to 60GB, 120GB, 240GB, respectively) because RAISE requires extra storage space.
Although all these new features cannot protect against the would-be thief or catastrophic drive failures from electrical surges or natural disasters, the probability of those events is much lower than that of a simple ECC failure. That’s why you are best served by an SSD with RAISE technology to automatically protect against the more common ECC failures, along with at least a periodic backup copy of your system to protect your data against those far more serious events.
Last week at LSI’s annual Accelerating Innovation Summit (AIS) the company took the wraps off a vision that should lead its technical direction for the next few years.
In his keynote, LSI CEO Abhi Talwalkar shared a video depicting three situations as they might evolve in the future.
I’ll focus on just one of these to show how LSI expects the future to develop. In the bicycle accident scenario, a businessman falls to the ground while riding a bicycle in a foreign country. Security cameras that have been upgraded to understand what they see notify an emergency services agency which sends an ambulance to the scene. The paramedic performs a retinal scan on the victim, using it to retrieve his medical records, including his DNA sequence, from the web.
The businessman’s wearable body monitoring system also communicates with the paramedic’s instruments to share his vital signs. All of this information is used by cloud-based computers to determine a course of action which, in the video, requires an injection that has been custom-tuned to the victim’s current situation, his medical history, and his genetic makeup.
That’s a pretty tall order, and it will require several advances in the state of the art, but LSI is using this and other scenarios to work with its clients and translate this vision into the products of the future.
What are the key requirements to make this happen? Talwalkar told the audience that we need to create a society that is supported by preventive, predictive and assisted analytics to move in a direction where the general welfare is assisted by all that the Internet and advanced computing have to offer. Since data is growing at an exponential rate, he argued that this will require the instant retrieval of interlinked data objects at scale. Everything that is key to solving the task must be immediately available, and must be quickly analyzed to provide a solution to the problem at hand. The key will be the ability to process interlinked pieces of data that have not been previously structured to handle any particular situation.
To achieve this we will need larger-scale computing resources than are currently available, all closely interconnected and all operating at very high speeds. LSI hopes to tap into these needs through its strengths in networking and communications chips for the communications side, its HDD, server and storage connectivity and array chips and boards for large-scale data, and its flash memory controller and PCIe SSD expertise for high performance.
LSI brought to AIS several of the customers and partners it is working with to develop these technologies. Speakers from Intel, Microsoft, IBM, Toshiba, Ericsson and others showed how they are working with LSI’s various technologies to improve the performance of their own systems. On the exhibition floor, booths from LSI and many of its clients demonstrated new technologies that performed everything from high-speed stock market analysis to fast flash management.
It’s pretty exciting to see a company that has a clear vision of its future and is committed to moving its entire ecosystem ahead to make that happen and help companies manage their business more effectively during what LSI calls the “Datacentric Era.” LSI has certainly put a lot of effort into creating a vision and determining where its talents can be brought to bear to improve our lives in the future.
Tags: Abhi Talwalkar, AIS, chips, communications, connectivity, data, Datacentric Era, Ericsson, flash, flash memory, hard disk drive, HDD, IBM, Intel, large-scale data, Microsoft, Networking, server, Storage, Toshiba