NVIDIA GeForce RTX 4090 could be the first Gaming Graphics Card to Break Past 100 TFLOPs with AD102 GPU

Rumors regarding the next-gen NVIDIA GeForce RTX 4090 series indicate that the AD102-powered graphics card might be the first gaming GPU to break the 100 TFLOPs barrier.

Presently, the NVIDIA GeForce RTX 3090 Ti delivers the highest compute performance among all gaming GPUs out there, hitting anywhere between 40 and 45 TFLOPs of FP32 (single-precision) GPU compute. But the next-generation GPUs arrive later this year, and GPU performance is going to take a big jump.

According to rumors from Kopite7kimi and Greymon55, the next-generation graphics cards, not only from NVIDIA but also from AMD, are anticipated to reach the 100 TFLOPs mark. This would be a gigantic milestone in the consumer GPU market, which has already seen a significant jump in both performance and power consumption with the current generation of graphics cards. We went directly from 275W being the limit to 350-400W becoming the standard. The RTX 3090 Ti is already consuming over 500W of power. The next-generation GPUs are going to be even more power-hungry beasts, but if the compute numbers are anything to go by, then we already know one reason why the upcoming graphics cards are going to consume that much power.

As per the information, NVIDIA’s Ada Lovelace GPUs, mainly the AD102 chip, have made a significant breakthrough on TSMC’s 4N process node. The current estimates are that AMD and NVIDIA will have nearly identical boost speeds of around 2.8-3.0 GHz, compared to the earlier 2.2-2.4 GHz clock speed rumors. NVIDIA in particular is going to combine a total of 18,432 cores with 96 MB of L2 cache and a 384-bit bus interface. The cores are spread across a total of 144 SMs, arranged in a 12 GPC die layout with 6 TPCs per GPC and 2 SMs per TPC.
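The layout figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the rumored numbers; the function and constant names are made up for illustration, and the cores-per-SM value is derived from the rumored totals rather than stated by the sources.

```python
# Rumored AD102 layout figures, taken from the paragraph above.
GPCS = 12              # graphics processing clusters on the die
TPCS_PER_GPC = 6       # texture processing clusters per GPC
SMS_PER_TPC = 2        # SMs per TPC
TOTAL_CORES = 18_432   # rumored FP32 core count


def total_sms(gpcs: int, tpcs_per_gpc: int, sms_per_tpc: int) -> int:
    """Multiply out the cluster hierarchy to get the SM count."""
    return gpcs * tpcs_per_gpc * sms_per_tpc


sms = total_sms(GPCS, TPCS_PER_GPC, SMS_PER_TPC)

# Derived value (not stated in the rumor): cores per SM implied by the totals.
cores_per_sm = TOTAL_CORES // sms

print(f"TPCs: {GPCS * TPCS_PER_GPC}, SMs: {sms}, cores per SM: {cores_per_sm}")
```

The 72 TPCs and 144 SMs that fall out of this match the rumored totals, and the core count divides evenly into 128 cores per SM.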

At a theoretical clock speed of 2.8 GHz, you get up to 103 TFLOPs of compute performance, and the rumors are indicating even higher boost clocks. Now, these do sound like peak clocks, similar to AMD’s peak frequencies, which are higher than the average ‘Game’ clock. A 100+ TFLOPs compute performance means more than double the horsepower of the 3090 Ti flagship. We should keep in mind that compute performance doesn’t necessarily translate into overall gaming performance, but even so, it will be a massive upgrade for gaming PCs and an 8.5x increase over the current fastest console, the Xbox Series X.
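For readers who want to see where the 103 TFLOPs figure comes from, here is a minimal sketch of the usual peak-FP32 estimate: cores times two floating-point operations per clock (one fused multiply-add) times clock speed. The helper name `fp32_tflops` is hypothetical, and the 12.15 TFLOPs value for the Xbox Series X is an outside figure that the article does not state, so treat it as an assumption.

```python
def fp32_tflops(cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPs.

    Assumes each core retires one fused multiply-add (2 FLOPs) per clock,
    which is the convention behind headline TFLOPs numbers.
    """
    gflops = cores * flops_per_clock * clock_ghz  # cores x FLOPs/clock x GHz
    return gflops / 1000.0


# Rumored AD102 configuration from the article.
ad102 = fp32_tflops(18_432, 2.8)

# Assumption: commonly quoted Xbox Series X figure, not given in the article.
XBOX_SERIES_X_TFLOPS = 12.15

print(f"AD102 @ 2.8 GHz: {ad102:.1f} TFLOPs")
print(f"vs Xbox Series X: {ad102 / XBOX_SERIES_X_TFLOPS:.1f}x")
# The article puts the RTX 3090 Ti at 40 to 45 TFLOPs.
print(f"vs RTX 3090 Ti: {ad102 / 45:.2f}x to {ad102 / 40:.2f}x")
```

These are theoretical peaks at a sustained boost clock, so they say little about real-world frame rates.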

Figure: FP32 compute horsepower comparisons (image source: wccftech)

At the end of the day, we are bound to see PC hardware, especially graphics cards, get more and more powerful, but it will be incredible to see all that power put to good use running next-gen games, mainly 8K titles with ray tracing enabled and future graphical effects.

Upcoming Flagship AMD, Intel, NVIDIA GPU Specs (Preliminary)

| Codename | Ada Lovelace | RDNA 3 | Battlemage |
|---|---|---|---|
| Flagship SKU | GeForce RTX 4090 Series | Radeon RX 7900 Series | Arc B900 Series |
| GPU Process | TSMC 4N | TSMC 5nm + TSMC 6nm | TSMC 5nm? |
| GPU Package | Monolithic | MCD (Multi-Chiplet Die) | MCM (Multi-Chiplet Module) |
| GPU Dies | Mono x 1 | 2 x GCD + 4 x MCD + 1 x IOD | Quad-Tile (tGPU) |
| GPU Mega Clusters | 12 GPCs (Graphics Processing Clusters) | 6 Shader Engines | 10 Render Slices |
| GPU Super Clusters | 72 TPCs (Texture Processing Clusters) | 30 WGPs (Per MCD); 60 WGPs (In Total) | 40 Xe-Cores (Per Tile); 160 Xe-Cores (In Total) |
| GPU Clusters | 144 Streaming Multiprocessors (SM) | 120 Compute Units (CU); 240 Compute Units (In Total) | 1280 Xe VE (Per Tile); 5120 Xe VE (In Total) |
| Cores (Per Die) | 18432 CUDA Cores | 7680 SPs (Per GCD); 15360 SPs (In Total) | 20480 ALUs (In Total) |
| Peak Clock | ~2.85 GHz | ~3.0 GHz | TBD |
| FP32 Compute | ~105 TFLOPs | ~92 TFLOPs | TBD |
| Memory Capacity | 24 GB | 32 GB | TBD |
| Memory Bus | 384-bit | 256-bit | TBD |
| Memory Speeds | ~21 Gbps | ~18 Gbps | TBD |
| Cache Subsystems | 96 MB L2 Cache | 512 MB (Infinity Cache) | TBD |
| Launch | Q4 2022 | Q4 2022 | 2023 |

