At its GTC event in April 2021, NVIDIA announced Grace, the company’s first CPU targeted at data center operations. This announcement represents a major challenge to Intel’s leadership in the data center CPU business as artificial intelligence (AI)-oriented chipsets continue to reshape the data center market.
NVIDIA targets Intel’s data center CPU stronghold
Intel has been under siege in the data center chip market with the rising prominence of NVIDIA’s GPUs in AI applications eroding the company’s hegemony. However, Intel’s predicament now has grown even more precarious with news that NVIDIA is opening a new front in the data center chip war.
At its annual GTC event last month, NVIDIA announced its first data center CPU: the Arm-based Grace. Grace is designed for computing applications that use massive amounts of data, AI in particular. Promising to exceed the performance of today’s state-of-the-art servers by a factor of 10, Grace is aimed squarely at Intel’s highly profitable business of selling high-end Xeon CPUs. For Intel, this represents the most direct threat yet to emerge from the AI revolution, an even more fundamental menace than the rise of graphics processing units (GPUs).
Intel’s Xeons once were the largely uncontested kings of the data center, with the company’s x86 chips lying at the heart of the world’s most powerful data center servers.
With data-center workloads increasingly shifting into the AI realm, AI is becoming infused in every data center application, including the cloud, according to the Cloud and Data Center Research Practice of Omdia. As a result, GPUs have become the new capital chip of the data center, acting as coprocessors for x86 servers to deliver a level of AI performance not possible with CPUs.
In contrast, Intel’s Xeon CPUs are increasingly dedicated to general-purpose application workloads. Consequently, available money and developer mindshare in the data center AI market has shifted away from Intel CPUs and toward NVIDIA GPUs.
Intel is certainly aware of the risks involved in the shift of data center workloads toward AI. The company has taken steps to enhance its microprocessor lines with features designed to accelerate deep learning. For example, the company’s Xeon CPUs include a feature called Deep Learning Boost, which accelerates inference tasks using Vector Neural Network Instructions (VNNI).
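To make the feature concrete: VNNI fuses the multiply-and-accumulate steps of low-precision inference into a single instruction. The sketch below is only an illustration of the arithmetic such an instruction performs, not Intel’s implementation; the function name is hypothetical, and a real VNNI instruction operates on wide vector registers rather than NumPy arrays.

```python
import numpy as np

def int8_dot_accumulate(acc, activations, weights):
    """Illustrative VNNI-style operation: multiply 8-bit activations by
    8-bit weights and accumulate the products into a 32-bit integer,
    which avoids overflow without separate widening steps."""
    # Widen to int32 before multiplying, as the hardware does internally.
    return acc + np.sum(activations.astype(np.int32) * weights.astype(np.int32))

acc = np.int32(0)
activations = np.array([100, 200, 50, 255], dtype=np.uint8)  # e.g., quantized inputs
weights = np.array([3, -2, 1, 4], dtype=np.int8)             # e.g., quantized weights
acc = int8_dot_accumulate(acc, activations, weights)
print(acc)  # 100*3 + 200*(-2) + 50*1 + 255*4 = 970
```

Performing this in int8 rather than float32 is what lets a CPU quadruple its throughput per vector register on inference workloads.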
However, the expected arrival of NVIDIA’s Grace represents a direct assault on Intel’s Xeon business itself, with potential ramifications for Intel’s most profitable CPU products.
NVIDIA adopts a third processor architecture
NVIDIA views Grace as a core product for the future, sitting alongside its other major data center AI-oriented chip lines: the GPU and the data processing unit (DPU) used for managing and processing data.
Used in combination, the three devices (the GPU, the DPU, and Grace) represent a new computer architecture designed to handle the demands of today’s AI-oriented data centers. In his GTC keynote, NVIDIA CEO Jensen Huang described the data center as “the new unit of computing.”

Just as CPU virtualization drove the integration of the entire data center, Omdia expects GPU virtualization, a major theme at GTC, whether in the sense of sharing a GPU among multiple workloads or splitting a workload across multiple GPUs, to do the same for GPU-accelerated workloads. The release of Grace thus will help NVIDIA attain its vision of transforming the data center into a system that acts like a single, massive computer, a concept formulated as long ago as 2004 by Google Senior Vice President Urs Hölzle in the classic paper The Data Center as a Computer.

“This Arm-based CPU gives us the third foundational technology for computing and the ability to rearchitect every aspect of the data center for AI,” Huang said. “Our datacenter roadmap is now a rhythm consisting of three chips: CPU, GPU, and DPU.”
Huang noted that new versions of each chip will be released at two-year intervals, with a likely “kicker” in between. According to a roadmap shown by Huang, 2022 will bring the release of new versions of the company’s Ampere GPU and Bluefield DPU, while the first Grace CPU is expected in 2023. In 2024, attention will return to GPUs and DPUs with the release of the follow-on generations of Ampere and Bluefield. In 2025, NVIDIA expects to introduce the next generation of the Grace CPU.
The Grace CPU is designed to boost server performance partly through an improved memory communications link, NVIDIA’s NVLink. Along with a revised server design using multiple CPUs and multiple links to memory, NVLink can accelerate memory performance and increase the amount of memory available to GPUs. The processor also supports the fifth generation of low-power double-data-rate (LPDDR5) memory, a faster variety of DRAM than the fourth generation of double-data-rate (DDR4) commonly employed in today’s data center servers.
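The bandwidth gap between those memory generations is simple arithmetic. The following sketch assumes illustrative per-channel transfer rates of 3200 MT/s for DDR4 and 6400 MT/s for LPDDR5 on a standard 64-bit channel; actual server configurations and speed grades vary.

```python
def peak_bandwidth_gbps(megatransfers_per_sec, bus_width_bits):
    """Peak channel bandwidth in GB/s: MT/s times bytes per transfer,
    divided by 1000 to convert MB/s to GB/s."""
    return megatransfers_per_sec * (bus_width_bits / 8) / 1000

ddr4 = peak_bandwidth_gbps(3200, 64)    # 25.6 GB/s per channel
lpddr5 = peak_bandwidth_gbps(6400, 64)  # 51.2 GB/s per channel
print(ddr4, lpddr5)
```

Under these assumptions, a single LPDDR5 channel delivers roughly twice the peak bandwidth of a DDR4 channel at lower power, which is the trade NVIDIA is making for Grace.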
One secret of Grace’s performance is the CPU’s distinct architecture compared with that of typical PC-derived servers. Grace employs a design resembling a modified Harvard architecture, which uses separate storage and signal pathways for instructions and data to achieve higher performance. In line with this approach, Grace connects to memory-linked GPUs and to DRAM via separate pathways, using NVLink cross-connects between the combined storage-processing nodes. At the level of the whole system, this can be read as an architecture that sends instructions over the cross-connect mesh while moving data from storage to the CPU and GPU over the much shorter local path.
This approach differs from the architecture used in conventional servers, where instructions and data share the same pathway. The aim is to bring processing close to the data, rather than bringing data to the processor, an aim demonstrated both in Grace and in the Bluefield DPU’s integration of Arm CPU cores into network interface cards. NVIDIA’s chief architect for CUDA, Stephen Jones, gave a talk during GTC in which he argued that the main challenge in processing parallel workloads such as deep learning was overcoming the input/output (I/O) bottleneck and maximizing both the use of the GPU and the work done per clock cycle.
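Jones’s point can be made concrete with a back-of-envelope roofline-style check. The machine numbers below (300 TFLOP/s of compute, 2 TB/s of memory bandwidth) are hypothetical figures at the rough scale of a modern data center accelerator, not a specific NVIDIA part.

```python
# Hypothetical machine parameters for illustration only.
PEAK_FLOPS = 300e12  # peak compute, FLOP/s
PEAK_BW = 2e12       # peak memory bandwidth, bytes/s

def is_io_bound(flops, bytes_moved):
    """A kernel is I/O-bound when its arithmetic intensity (FLOP per byte
    moved) falls below the machine balance point (peak FLOP/s divided by
    peak bytes/s), so the memory system, not the compute units, sets its
    speed limit."""
    intensity = flops / bytes_moved
    balance = PEAK_FLOPS / PEAK_BW  # 150 FLOP/byte for these parameters
    return intensity < balance

# Example: adding two 1M-element float32 vectors costs 1 FLOP per element
# but moves 12 bytes per element (two reads and one write).
print(is_io_bound(1e6, 12e6))  # True: ~0.083 FLOP/byte, far below 150
```

Most deep learning kernels other than large matrix multiplications sit well below the balance point, which is why moving data faster (NVLink, LPDDR5, near-data processing) pays off more than adding raw FLOPS.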
Uneasy lies the head that wears a crown
Grace is targeted at competing with x86 CPUs, with Huang touting the superiority of the chip’s Arm architecture over other microprocessor architectures in areas including energy efficiency.
With Arm’s superior energy efficiency and greater bang for the buck, server vendors are likely to be receptive to the idea of using the architecture in place of x86. Because of their advantages, Arm devices are already making extensive inroads into territory traditionally occupied by Intel’s x86 processors. Apple now has largely completed the transition of its MacBook line from Intel CPUs to its Arm-based M1 system on chip (SoC). Microsoft also has reportedly commissioned the development of custom Arm SoCs from Qualcomm for its Surface PCs.
More importantly in the data center context, Amazon Web Services is progressively moving more of its workloads onto its in-house Arm-based Graviton2 CPUs. Driven by the demands of AI, data center power consumption is soaring, with these facilities expected to account for 15% of global electricity output by 2025, up from 2% in 2020, according to Applied Materials. Data center operators are seeking lower-power alternatives to Intel’s energy-hungry CPUs. It is worth noting, though, that AWS’s decision was explicitly driven by performance rather than energy savings.
For Intel, this represents a major competitive threat. Intel remains the top player in the global markets for chips, microprocessors, and data center CPUs. Despite briefly losing the top position in the global chip market in 2017 and 2018, Intel has since reestablished a tight grip on global semiconductor leadership.
In 2019, Intel held a 16.5% share of the global semiconductor business—its highest portion in the last 20 years, according to the Omdia Competitive Landscaping Tool. The company’s 16.1% share of global semiconductor revenue in 2020 is well above the company’s 14.4% average market share from 2001 through 2020, as presented in the figure below.
However, this dominant position is at risk as AI continues to reshape the semiconductor and data center markets where Intel derives so much revenue. NVIDIA believes that AI represents a fundamental shift in these businesses that will allow it to reshuffle the status quo and erode Intel’s leadership position.
The battle over AI chipsets is shaping the fate of the semiconductor industry—and so far, much of this battle is being fought in territories now occupied by Intel. For Intel, which built its data center chip leadership during a previous era of computing, this battle presents more downside risk than upside opportunity.
However, with Grace not expected to ship until 2023, Intel still has time to get its AI house in order. The company now is developing its next-generation Xeon CPU, dubbed Sapphire Rapids, which is expected to further boost AI performance beyond Intel’s current third-generation Xeon processors. Intel may also have time to boost the performance of its fledgling Xe GPU line to make it more competitive in the AI realm.
Intel’s annual share of global semiconductor market revenue
Jonathan Cassell, Principal Analyst, Advanced Computing
Alexander Harrowell, Senior Analyst, Advanced Computing, AI and IoT