Ahead of the official announcement of AMD’s Ryzen 7000 Series Desktop Processors, Angstronomics previews some key microarchitectural details and what this means for cooling and competitiveness.
Zen 4 Microarchitecture
AMD calls its Zen 4 CPU core design an “incremental update” over Zen 3. This is reflected in the CPUID: products using either core generation share Family 19h. The next major revision arrives in 2024 with Zen 5 (Family 1Ah).
Transistor Count
TSMC N5 Core Complex Die (CCD)
Codename “Durango”
8 Zen 4 “Persephone” Cores
8MB L2 + 32MB L3 Cache
6.57 Billion Transistors
Each 8-core 5nm Zen 4 CCD is made up of 6.57 billion transistors. That is 58% more transistors than the previous 7nm Zen 3 CCD (4.15B), even though the die is over 10% smaller. As a result, the average transistor density across the whole die goes from below 50 MTr/mm² to over 90 MTr/mm².
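The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section (6.57B, 4.15B, and the 50 and 90 MTr/mm² density bounds). The die areas it derives are implied bounds rather than measured values, and the variable names are purely illustrative.

```python
# Transistor counts quoted above, in millions of transistors (MTr).
ZEN4_CCD_MTR = 6570.0
ZEN3_CCD_MTR = 4150.0

# Density bounds quoted above, in MTr/mm^2.
ZEN3_DENSITY_UPPER = 50.0  # Zen 3 CCD is "below 50"
ZEN4_DENSITY_LOWER = 90.0  # Zen 4 CCD is "over 90"

# Relative increase in transistor count from Zen 3 to Zen 4.
increase_pct = (ZEN4_CCD_MTR / ZEN3_CCD_MTR - 1.0) * 100.0

# density = transistors / area, so area = transistors / density.
# A density below 50 implies a Zen 3 area above this value;
# a density above 90 implies a Zen 4 area below this value.
zen3_area_min_mm2 = ZEN3_CCD_MTR / ZEN3_DENSITY_UPPER
zen4_area_max_mm2 = ZEN4_CCD_MTR / ZEN4_DENSITY_LOWER

# Smallest die shrink that is consistent with both density bounds.
min_shrink_pct = (1.0 - zen4_area_max_mm2 / zen3_area_min_mm2) * 100.0

print(f"Transistor increase: {increase_pct:.1f}%")
print(f"Implied Zen 3 CCD area: > {zen3_area_min_mm2:.1f} mm^2")
print(f"Implied Zen 4 CCD area: < {zen4_area_max_mm2:.1f} mm^2")
print(f"Implied die shrink: > {min_shrink_pct:.1f}%")
```

The implied shrink of roughly 12% or more agrees with the "over 10% smaller" claim, so the three quoted figures are mutually consistent.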
Core Density
Because cache area scales less than logic area in the move from 7nm to 5nm, cache now takes up a higher percentage of total die area. The result is that a 5nm Zen 4 core with 1MB L2 takes less area than a 7nm Zen 3 core with 0.5MB L2, while its transistor count more than doubles. This means that the average transistor density of the logic area has also more than doubled.
Transistor Spend
Zen 4 has spent its transistor budget in two main areas: modifying the Floating Point pipeline to handle AVX-512 instructions using the same 256b datapath as Zen 3, and improving the Front-End of the core to feed the execution engines better, since the Front-End was more of a bottleneck in Zen 3. The AVX-512 instruction support level is closer to Intel’s Sunny Cove than Golden Cove, with AVX-512F but no VP2INTERSECT. The Front-End now has a significantly larger micro-op cache with over 6K entries vs 4K on Zen 3. Funnily enough, that means the micro-op cache is 50% bigger than the L1 instruction cache in terms of transistor budget. Of course, it is much more energy efficient to stay within the micro-op cache than to go out to L1i or L2.
Consequences
The doubling of transistor density in the heat-generating sections of the die has a major impact on the cooling of the chip. Cache has a comparatively low heat output per area, while logic is where the hotspots are located. Due to thermal limits at the logic hotspots, there is increasing use of “dark silicon”, meaning a lower percentage of transistors can be switching and producing heat at any one time.
Cooling Challenge
Ryzen 7000 Desktop will therefore represent the toughest cooling challenge seen thus far. The die can approach 2 Watts/mm² if your cooler is up to the task. The smaller heatspreader area of the Ryzen 7000 package design only makes things worse. We discussed the cooling difficulty and why the heatspreader is smaller in our Computex piece in May.
Due to the cooling difficulty and higher power limits on Ryzen 7000 Desktop, the philosophy has changed to follow a more laptop-like approach. Instead of fixing a clockspeed or power limit, the processor is now temperature limited. This means maximizing performance for a given cooling capability: the processor adjusts itself to stay within the temperature limit. It also means that performance will vary more significantly with cooling ability.
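To illustrate why a temperature-limited processor performs differently under different coolers, here is a toy steady-state model. It is only a sketch: it assumes die temperature equals ambient plus power times a single thermal resistance, and every number in it (temperature limit, ambient, power cap, cooler resistances) is a hypothetical placeholder, not an AMD specification. The function name `sustained_power_w` is likewise made up for this example and does not describe AMD's actual boost algorithm.

```python
def sustained_power_w(t_limit_c, t_ambient_c, r_th_c_per_w, power_cap_w):
    """Highest steady-state power that keeps the die at or below t_limit_c.

    Toy model (an assumption): die temperature = ambient + power * r_th.
    """
    thermal_headroom_w = (t_limit_c - t_ambient_c) / r_th_c_per_w
    # Never exceed the platform power cap, and never go below zero.
    return min(power_cap_w, max(0.0, thermal_headroom_w))


# Hypothetical placeholder values, not AMD specifications.
T_LIMIT_C = 90.0
T_AMBIENT_C = 25.0
POWER_CAP_W = 200.0

# Lower thermal resistance means a better cooler (hypothetical values).
coolers = {"basic air": 0.50, "large air": 0.40, "liquid": 0.30}

for name, r_th in coolers.items():
    power = sustained_power_w(T_LIMIT_C, T_AMBIENT_C, r_th, POWER_CAP_W)
    print(f"{name}: {power:.1f} W sustained")
```

In this model the weaker coolers are limited by temperature and sustain less power, while the best cooler runs into the power cap instead. That is the sense in which performance now scales with cooling ability.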
Competitiveness
AMD’s client desktop processors remain the weakest part of its line-up when compared with Intel. Outright performance for the money lags behind, and the chips are much more difficult to cool than Intel’s offerings. Ryzen 7000’s 5nm process significantly increases production costs and risks eroding margins, as it is difficult to increase ASPs commensurately in such a competitive environment. Mandatory DDR5 on the new AM5 platform will also have some impact on uptake due to a much higher base platform cost.
Makes for a more realistic and level-headed take on the 7000-series compared to the giddy hyperbole and hype spouted by some Techtubers. This is the first I've heard that a) Ryzen 7000 will be a serious challenge to cool (at the high end at least) and b) production costs for it are significantly higher. This is key because, with the return of Intel, increasing ASP is not an option.
Any word yet on the iGPU? It's not supposed to be anything special, enough for basic desktop and video streaming/conferencing, but I'm curious which arch it is, ALUs/speed, and what kind of output formats and bandwidth we're looking at.