This is a project I’ve been wanting to work on for a long time – since it became clear that AMD’s brand new Navi architecture would feature in the next-gen consoles, in fact. From PS4 and Xbox One, through the enhanced consoles and up to the reveal of Google Stadia, graphics power has been measured by a somewhat arbitrary unit: the teraflop. And let’s be clear: how many teraflops the new consoles have remains a preoccupation for many observers, eager to get some idea of what PlayStation 5 or Project Scarlett may deliver up against the hardware of today. But perhaps the focus needs to shift and maybe we need to take a closer look at the new AMD Navi architecture itself. Put simply, a teraflop of Navi compute should produce much faster game performance than an old-school GCN equivalent – but can we quantify that?
Testing Navi – and its teraflops – sounds like a relatively simple task. You’d start by tracking down graphics cards across the last seven years of AMD history, starting all the way back at GCN 1.0, the architectural foundation of the GPUs found in the current generation of consoles. From there, we’d equalise shader count, core clocks and memory bandwidth across the various GCN iterations and stack them up against a similarly specced Navi. After completing a thorough range of benchmarks, we’d have a progression of AMD performance improvements from the dawn of GCN right up to the brand new RDNA products – and at the end of it, maybe we’d get some idea of how a GCN 1.0 teraflop compares against an RDNA 1.0 equivalent.
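For reference, the teraflop figure is itself simple arithmetic: shader count, multiplied by two operations per clock, multiplied by clock speed. The sketch below is illustrative only. It assumes 64 shaders per compute unit (true of both GCN and RDNA parts), the function names are my own, and the 1000MHz clock is a hypothetical round number rather than a setting used in the testing.

```python
SHADERS_PER_CU = 64   # stream processors per compute unit on GCN and RDNA
OPS_PER_CLOCK = 2     # a fused multiply-add counts as two floating-point ops


def teraflops(compute_units: int, clock_mhz: float) -> float:
    """Peak FP32 throughput in teraflops for a given CU count and clock."""
    ops_per_second = compute_units * SHADERS_PER_CU * OPS_PER_CLOCK * clock_mhz * 1e6
    return ops_per_second / 1e12


def matching_clock_mhz(target_tflops: float, compute_units: int) -> float:
    """Clock (MHz) a GPU with this many CUs needs to hit a teraflop target."""
    return target_tflops * 1e12 / (compute_units * SHADERS_PER_CU * OPS_PER_CLOCK * 1e6)


if __name__ == "__main__":
    # Hypothetical example: a 32 CU part at 1000MHz, and the clock a
    # 36 CU part would need to land on the same paper figure.
    baseline = teraflops(32, 1000)
    print(f"32 CUs at 1000MHz: {baseline:.3f} TFLOPs")
    print(f"36 CUs need {matching_clock_mhz(baseline, 36):.1f}MHz to match")
```

Two GPUs can land on an identical number here and still perform very differently in games, which is the gap this testing sets out to measure.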
Unfortunately, carrying out this procedure is far from simple, because equalising frequencies, compute units and memory bandwidth across every generation is close to impossible. The GCN era began with Tahiti, a 32 compute unit GPU, while Navi’s lowest-end offering has 36 CUs. Further complicating matters is that Navi’s GDDR6 VRAM offers a vast 448GB/s of bandwidth – way beyond the limits of any comparable GCN part, with no obvious means of underclocking it. However, a tip from the brilliant Steve Burke at Gamers Nexus pointed me towards MorePowerTool, which I found can underclock the memory to 256GB/s – the upper end of GDDR5’s capabilities on prior GCN products. With that hurdle overcome, some mathematical shenanigans can get us to where we need to be, as this table demonstrates.
We can’t compare GCN 1.0 to RDNA 1.0 directly, but we can do the next best thing. The original Graphics Core Next silicon, codenamed Tahiti, is represented here by the Radeon R9 280X with 32 compute units. Its 384-bit memory interface tops out at 288GB/s of bandwidth and can easily be underclocked to 256GB/s. Moving on to the evergreen Polaris architecture, the Radeon RX 570 has the same CU count, and its RAM can be overclocked to 256GB/s. The plan is starting to come together – we can compare GCN 1.0 and GCN 4.0 directly.
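The bandwidth matching works the same way on paper: peak bandwidth is bus width in bytes multiplied by the per-pin data rate. Here is a rough sketch of that arithmetic. The 384-bit bus and the 288GB/s, 448GB/s and 256GB/s figures come from the text above; the 256-bit bus widths for the RX 570 and Navi are taken from public spec sheets and are an assumption here, as are the function names.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def data_rate_for(target_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbps) needed to reach a target bandwidth."""
    return target_gb_s * 8 / bus_width_bits


if __name__ == "__main__":
    target = 256  # GB/s, the common bandwidth all cards are tuned to

    # Bus widths in bits: 384 is from the article; the two 256-bit
    # entries are assumed from public spec sheets.
    cards = {"R9 280X": 384, "RX 570": 256, "Navi": 256}

    for name, bus_width in cards.items():
        rate = data_rate_for(target, bus_width)
        print(f"{name}: {bus_width}-bit bus needs {rate:.2f}Gbps per pin")
```

The wider 384-bit bus on Tahiti has to be slowed down to reach the target, while the narrower bus on Polaris has to be pushed up to it, which matches the underclock and overclock described above.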