With the annual Hot Chips conference taking place this week, many of the industry’s largest chip design firms are at the show, talking about their latest and/or upcoming products. For Intel it’s a case of the latter, as the company is at Hot Chips to talk about its next generation of Xeon processors, Granite Rapids and Sierra Forest, which are set to launch in 2024. Intel has previously revealed these processors on its data center roadmap (most recently updated in March of this year), and for Hot Chips the company is offering more technical detail on the chips and their shared platform.
While there is no such thing as an “unimportant” generation of Intel Xeon processors, Granite Rapids and Sierra Forest promise to be one of Intel’s most significant updates to the Xeon Scalable hardware platform yet, thanks to the introduction of area-efficient E-cores. Already a mainstay of Intel’s consumer processors since the 12th generation Core (Alder Lake), E-cores will finally come to Intel’s server platform with the next generation of Xeon Scalable. Unlike the consumer parts, where both core types are mixed in a single chip, Intel is going for a purely homogeneous strategy here, giving us the all P-core Granite Rapids and the all E-core Sierra Forest.
As Intel’s first E-core Xeon Scalable chip for data center use, Sierra Forest is arguably the more important of the two chips. Fittingly, it is Intel’s lead vehicle for the EUV-based Intel 3 process node, and it is the first of the two Xeons to come out. According to the company, it remains on track for an H1’2024 release. Meanwhile, Granite Rapids will be “shortly” behind that, on the same Intel 3 process node.
Since Intel is set to deliver two rather different Xeons in a single generation, one of the big elements of the next-generation Xeon Scalable platform is that both processors will share the same platform. That means the same socket(s), the same memory, the same chiplet-based design philosophy, the same firmware, etc. And while there are still differences, especially when it comes to AVX-512 support, Intel is trying to make these chips as interchangeable as possible.
As announced by Intel back in 2022, both Granite and Sierra are chiplet-based designs, relying on a mix of compute chiplets and I/O chiplets that are stitched together using Intel’s active EMIB bridge technology. While this isn’t Intel’s first dance with chiplets in the Xeon space (XCC Sapphire Rapids takes that honor), it is a distinct evolution of the chiplet design, using separate compute and I/O chiplets rather than stitching together otherwise “complete” Xeon chiplets. Among other things, this means that Granite and Sierra can share a common I/O chiplet (built on the Intel 7 process), and from a manufacturing standpoint, whether a Xeon is Granite or Sierra is “merely” a matter of which type of compute chiplet is placed down.
It’s worth noting here that Intel is confirming for the first time that its next-generation Xeon Scalable platform is getting self-booting capabilities, making it a true SoC. With Intel placing all of the I/O features required for operation within the I/O chiplets, no external chipset (or FPGA) is needed to operate these processors. This brings Intel’s Xeon lineup closer in functionality to AMD’s EPYC lineup, which has been similarly self-booting for a while now.
In all, the next-generation Xeon Scalable platform will support up to 12 memory channels, scaling with the number and capabilities of the compute dies present. As previously revealed by Intel, this platform will be the first to support the new Multiplexer Combined Ranks (MCR) DIMM, which essentially gangs up two sets/ranks of memory chips in order to double the effective bandwidth to and from the DIMM. With the combination of higher memory bus speeds and more memory channels overall, Intel says the platform can deliver up to 2.8 times the bandwidth of current Sapphire Rapids Xeons.
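As a rough sanity check on that 2.8x figure, theoretical peak DRAM bandwidth is simply channels times transfer rate times bus width. The sketch below assumes 8 channels of DDR5-4800 for Sapphire Rapids and 12 channels of MCR DIMMs at 8800 MT/s for the new platform; those memory speeds are assumptions for illustration and are not stated in Intel's Hot Chips disclosures as summarized here.

```python
def peak_bandwidth_gbs(channels: int, transfer_rate_mts: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal), assuming a 64-bit data bus per channel."""
    return channels * transfer_rate_mts * bus_bytes / 1000

# Assumed configurations (illustrative, not taken from the article):
sapphire_rapids = peak_bandwidth_gbs(channels=8, transfer_rate_mts=4800)
next_gen_mcr = peak_bandwidth_gbs(channels=12, transfer_rate_mts=8800)

ratio = next_gen_mcr / sapphire_rapids
print(f"{sapphire_rapids:.1f} GB/s -> {next_gen_mcr:.1f} GB/s ({ratio:.2f}x)")
# prints: 307.2 GB/s -> 844.8 GB/s (2.75x)
```

Under these assumptions the ratio lands at 2.75x, which is in line with the "up to 2.8 times" claim once rounded.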
As for I/O, a maximum-configuration Xeon will be able to offer up to 136 lanes of general-purpose I/O, plus up to 6 UPI links (144 lanes in total) for multi-socket connectivity. On the protocol side, the platform supports PCIe 5.0 (why not PCIe 6.0? We’re told the timing didn’t work out), as well as the newer CXL 2.0 standard. As is traditional for Intel’s big-core Xeons, Granite Rapids chips will be able to scale up to 8 sockets in total. Sierra Forest, on the other hand, will only be able to scale up to 2 sockets, owing to the number of CPU cores in play as well as the different use cases that Intel expects from its customers.
Along with details on the shared platform, Intel is also providing for the first time a high-level overview of the architectures used in the E-cores and P-cores. As has been the case for many generations of Xeon processors now, Intel is leveraging the same basic CPU core architecture that goes into its consumer parts. So Granite and Sierra can be thought of as a deconstructed Meteor Lake processor, with Granite getting the Redwood Cove P-cores and Sierra getting the Crestmont E-cores.
As mentioned earlier, this is Intel’s first foray into offering E-cores for the Xeon market. For Intel, this has meant tuning its E-core design for data center workloads, rather than the consumer-focused workloads that defined the previous generation E-core design.
While not going too deep into the architecture itself, Intel is revealing that Crestmont offers a 6-wide instruction decode path as well as an 8-wide retirement backend. While not as beefy as Intel’s P-cores, the E-core is by no means a lightweight core, and Intel’s design decisions reflect that. Still, it is designed to be far more efficient in terms of both die area and energy consumption than the P-cores that will go into Granite.
The L1 instruction cache (I-cache) for Crestmont will be 64 KB, the same size as on Gracemont. Meanwhile, new to the E-core lineup with Crestmont, the cores can be packaged into either 2-core or 4-core clusters, unlike today’s Gracemont, which is only available as a 4-core cluster. This is essentially how Intel will adjust the ratio of L2 cache to CPU cores: with 4MB of shared L2 regardless of configuration, a 2-core cluster gives each core twice as much L2 as it would otherwise get. This gives Intel another knob to turn for chip performance; customers who need a somewhat higher-performing Sierra design (rather than simply maxing out the number of CPU cores) can instead get fewer cores, with the higher performance coming from the effectively larger L2 cache.
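The cluster arithmetic is simple enough to spell out. The following is a minimal sketch using only the figures above (4MB of shared L2 per cluster, 2 or 4 cores per cluster); the function and constant names are made up for illustration.

```python
SHARED_L2_MB = 4  # shared L2 per Crestmont cluster, regardless of cluster size

def l2_per_core_mb(cores_per_cluster: int) -> float:
    """Average L2 capacity available to each core in a cluster, in MB."""
    if cores_per_cluster not in (2, 4):
        raise ValueError("Crestmont clusters are described as 2-core or 4-core only")
    return SHARED_L2_MB / cores_per_cluster

for cores in (4, 2):
    print(f"{cores}-core cluster: {l2_per_core_mb(cores):.0f} MB of L2 per core")
# prints:
# 4-core cluster: 1 MB of L2 per core
# 2-core cluster: 2 MB of L2 per core
```

Since the L2 is shared, a single busy core can of course use more than its average share; the per-core figure is only a way to compare the two cluster configurations.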
Finally for Sierra/Crestmont, the chip will offer as close to instruction parity with Granite Rapids as possible. This means support for the BF16 data type, as well as support for various instruction sets such as AVX-IFMA and AVX-DOT-PROD-INT8. The only things you won’t find here, besides the AMX matrix engine, is support for AVX-512; Intel’s widest vector format is not part of Crestmont’s feature set. Ultimately, AVX10 will help to take care of this problem, but for now this is as close as Intel can get to parity between the two processors.
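For software that has to run on both chips, the practical consequence is runtime dispatch on CPU features. Below is a minimal sketch of that idea for Linux, parsing the `flags` line of `/proc/cpuinfo`. The flag names `amx_tile`, `avx512f` and `avx2` are the ones the Linux kernel reports; the kernel labels returned by `pick_kernel` are hypothetical, and the expectation that Sierra Forest reports `avx2` but not `avx512f` is an assumption based on the description above.

```python
from pathlib import Path

def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def pick_kernel(flags: set[str]) -> str:
    """Choose the widest code path the CPU supports (labels are hypothetical)."""
    if "amx_tile" in flags:
        return "amx"
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"

# Illustrative inputs, not real dumps from either processor:
p_core_like = "flags\t\t: fpu sse2 avx2 avx512f amx_tile"
e_core_like = "flags\t\t: fpu sse2 avx2"
print(pick_kernel(cpu_flags(p_core_like)))  # amx
print(pick_kernel(cpu_flags(e_core_like)))  # avx2

cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():  # only present on Linux
    print("this machine:", pick_kernel(cpu_flags(cpuinfo.read_text())))
```

Production code would more typically rely on compiler intrinsics or a library's own dispatcher, but the shape of the decision is the same.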
Meanwhile, for Granite Rapids we have the Redwood Cove P-core. The traditional heart of a Xeon processor, Redwood/Granite isn’t as big a change for Intel as Sierra Forest is. But that doesn’t mean Intel is sitting idly by.
In terms of microarchitecture, Redwood Cove gets the same 64KB I-cache as we saw on Crestmont, which, unlike on the E-core, is twice the capacity of its predecessor’s. It’s rare for Intel to touch L1 cache capacity (due to the need to balance hit rates against latency), so this is a notable change, and it will be interesting to see the ramifications once Intel talks more about the architecture.
But most notably here, Intel has managed to reduce the latency for floating-point multiplication, bringing it down from 4/5 cycles to just 3 cycles. Basic instruction latency improvements like this are rare, so they’re always welcome to see.
Other than that, the remaining highlights of the Redwood Cove microarchitecture are branch prediction and prefetching, which are typical optimization targets for Intel. Anything the company can do to improve branch prediction (and reduce the cost of the rare mispredictions) tends to pay relatively large dividends in terms of performance.
More applicable to the Xeon family in particular, the AMX matrix engine for Redwood Cove gains FP16 support. FP16 isn’t used as heavily as the already supported BF16 and INT8, but it improves AMX’s overall flexibility.
Memory encryption support is also being improved. Granite Rapids’ flavor of Redwood Cove will support 2048 256-bit memory keys, up from 128 keys on Sapphire Rapids. Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP) are also getting some enhancements here, with Intel extending them to control what goes into the L2 cache, instead of just the LLC/L3 cache as in previous implementations.
Ultimately, it goes without saying that Intel believes it is well positioned for 2024 and beyond with the upcoming Xeons. By improving performance on the high-end P-core Xeons, while introducing E-core Xeons for customers who just need lots of lighter CPU cores, Intel believes it can address the entire market with two CPU core types that share one common platform.
While it’s still too early to talk about individual SKUs for Granite Rapids and Sierra Forest, Intel has told us that overall core counts are going up. Granite Rapids parts will offer more CPU cores than Sapphire Rapids (up from 60 for SPR XCC), and of course, at 144 cores, Sierra will offer even more. However, it’s worth noting that Intel won’t be segmenting the two CPU lines by core count; Sierra Forest will be available in smaller core counts, too (unlike AMD’s EPYC Zen4c Bergamo chips). This reflects the different performance capabilities of the P-cores and E-cores, and Intel is no doubt looking to fully embrace the scalability that comes from using chiplets.
And while Sierra Forest will already reach 144 CPU cores, Intel also made an interesting comment in our pre-briefing that it could have gone higher with the core count of its first E-core Xeon Scalable processor. But the company decided to prioritize per-core performance a bit more, resulting in the chips and core counts coming out next year.
Above all (and perhaps letting marketing take the wheel a little too long here for Hot Chips), Intel is hammering home the fact that its next-generation Xeon processors remain on track for a 2024 launch. Needless to say, Intel is only now recovering from the significant delays in Sapphire Rapids (and the knock-on effect on Emerald Rapids), so the company is keen to reassure customers that Granite Rapids and Sierra Forest are where Intel’s timing gets back on track. Between past Xeon delays and the long time it took to bring an E-core Xeon Scalable chip to market, Intel hasn’t dominated the data center market the way it once did, so Granite Rapids and Sierra Forest will mark an important inflection point for Intel’s data center offerings going forward.