The problem with multicore processors isn’t that they have a lot of cores. I hope my IC designer colleagues don’t jump me when I say that having more than one core on a chip is a simple matter of cut and paste. The tricky part is getting all those cores to work together – a coordinated, efficient effort is key. After all, if it were enough for the cores to work independently, we would just use multiple single-core processors. To be sure, the devil is in the details of connecting cores and managing how they share resources.
A key value of a multicore processor is using the processing muscle of additional cores – all working on a problem at the same time – to accelerate system performance. Basically, two heads are better than one. And 16 are even better. That is, if they don’t get in each other’s way. When multiple cores are working on one job, they need to deftly hand off information to each other and to other on-chip resources like memory and I/O. Managing and streamlining the movement of all that information to minimize delays can require complex traffic management. If one core or another resource becomes a bottleneck, the entire performance benefit of multiple cores can be lost.
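Just how much a bottleneck erodes the benefit of extra cores can be made concrete with Amdahl’s law: if a fraction of the work is serialized on one contended core or resource, the maximum speedup on n cores is 1 / (serial + (1 - serial) / n). A minimal sketch in Python (the 10% figure is purely illustrative, not a claim about any particular workload):

```python
# Amdahl's law: the speedup ceiling when a fraction `serial` of the
# work cannot be parallelized (e.g. a contended core, memory, or I/O).
def speedup(n_cores, serial):
    return 1.0 / (serial + (1.0 - serial) / n_cores)

# With just 10% of the work serialized, 16 cores deliver far less
# than a 16x speedup -- the bottleneck eats most of the benefit.
print(round(speedup(16, 0.10), 2))  # prints 6.4
```

With no serialized work at all, the same formula gives the ideal 16x; the gap between the two numbers is exactly the performance lost to the bottleneck.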
The challenge of cache coherence
Another complexity of coordinating multiple cores is cache coherence – the process of ensuring the consistency of data stored in each core’s cache memory. Cores store frequently accessed information in this small, fast memory so they don’t have to fetch it again and again from slower storage such as main memory or disks. For example, if a core is running an application for ordering products online, it might load the inventory record for a particular product from disk into cache, modify it, and then write it back to disk when the transaction is complete.
The rub arises when more than one core caches the same data. If two cores were running the online ordering application, they might both cache the same inventory record. Both cores might then execute a transaction to sell the last unit of that product, and neither would detect that the product is sold out. In a system with coherent caches, when one core changes cached data, all other cores holding the same data are notified that their copy is outdated, prompting an update for consistency. Tracking all cached data and keeping it coherent is a formidable effort requiring highly sophisticated cache management.
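The invalidation mechanism described above can be sketched in a few lines of Python. This is a toy model, not LSI’s actual protocol: the `Bus` and `Cache` classes and the write-through policy are simplifying assumptions made for illustration.

```python
# Toy sketch of invalidation-based cache coherence (illustrative only;
# real protocols such as MESI track per-line states in hardware).

class Bus:
    """Broadcast medium connecting the per-core caches."""
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def invalidate(self, addr, writer):
        # Notify every other cache that its copy of addr is stale.
        for c in self.caches:
            if c is not writer:
                c.lines.pop(addr, None)

class Cache:
    def __init__(self, bus, memory):
        self.lines = {}        # addr -> cached value
        self.memory = memory
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:             # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.invalidate(addr, writer=self)  # keep other cores coherent
        self.lines[addr] = value
        self.memory[addr] = value               # write-through for simplicity

memory = {"inventory": 1}                      # one unit left in stock
bus = Bus()
core0, core1 = Cache(bus, memory), Cache(bus, memory)

core0.read("inventory")         # both cores cache the same record
core1.read("inventory")
core0.write("inventory", 0)     # core 0 sells the last unit
print(core1.read("inventory"))  # core 1's stale copy was invalidated: prints 0
```

Without the `invalidate` broadcast, core 1 would still see a cached value of 1 and happily sell a unit that no longer exists – exactly the sold-out scenario described above.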
The challenge of choosing the right cores
A third challenge in getting multicore design right is choosing the number and type of cores. Networking system workloads consist of varying tasks. Some are large, complex tasks that require powerful general-purpose cores running sophisticated programs. Others are very simple, quick tasks that execute millions of times a second and are best handled by specialized compute engines. And of course there are tasks that fall between these extremes. Getting the right number and mix of compute engines requires a detailed understanding of the applications the multicore processor will be used in. Too many cores, and the processor consumes too much power. Too few of one type of core, and the others sit idle, wasting cost and, again, power.
Striking the right balance of interconnect, cache coherence and cores
The problem with multicore processors is getting the right combination of interconnect, cache coherence and number and type of cores. LSI’s latest solution to the multicore challenge for enterprise networking is the Axxia® 4500 family of processors. For general-purpose processing, the Axxia 4500 features up to 4 ARM® Cortex™-A15 cores that deliver high performance and power efficiency in a standard Linux programming environment. For special-purpose packet processing, the new chips offer up to 50Gb/s of packet processing plus acceleration engines for security encryption, deep packet inspection, traffic management and other networking functions. Connecting all these compute resources is the ARM CoreLink™ CCN-504 interconnect, with integrated cache coherence and quality-of-service technologies for efficient on-chip communications.