New details of the GeForce RTX 40
A new batch of leaks gives us fresh data that goes deeper into the configuration of the AD102 core, the top-of-the-line GPU that NVIDIA will use in the Ada Lovelace generation.
In theory, the AD102 GPU will have 12 GPCs (Graphics Processing Clusters), 6 TPCs (Texture Processing Clusters) per GPC, and two SMs (Streaming Multiprocessors) per TPC. Each SM would have 128 shaders (FP32 units), 192 FP32+INT32 units in total, 64 warps, 2,048 threads, and 192 KB of L1 cache. The chip would also have 32 ROPs per active GPC and 96 MB of L2 cache.
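The chip-wide totals implied by these per-cluster figures can be derived with simple multiplication. The following is a minimal sketch that uses only the leaked numbers quoted above, plus one assumption that is not in the leak: a warp size of 32 threads, which is the standard value on NVIDIA GPUs. The function and constant names are purely illustrative.

```python
# Leaked AD102 figures as quoted above (unconfirmed rumor).
GPCS = 12
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
FP32_PER_SM = 128
WARPS_PER_SM = 64
ROPS_PER_GPC = 32

# Assumption: 32 threads per warp, the standard NVIDIA warp size.
THREADS_PER_WARP = 32


def ad102_totals():
    """Derive chip-wide totals from the per-cluster figures."""
    sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
    return {
        "sms": sms,
        "fp32_shaders": sms * FP32_PER_SM,
        "threads_per_sm": WARPS_PER_SM * THREADS_PER_WARP,
        "rops": GPCS * ROPS_PER_GPC,
    }


if __name__ == "__main__":
    for name, value in ad102_totals().items():
        print(f"{name}: {value:,}")
```

Under these inputs, a full chip works out to 144 SMs and 18,432 FP32 shaders, and the 64 warps per SM match the 2,048 threads quoted in the leak.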
If the information is accurate, we can draw some important conclusions from this data. If it uses a full AD102 GPU, the top model of the GeForce RTX 40 series would have about 70% more GPCs (12 versus 7) than the GA102 chip, the top consumer GPU of the Ampere series, which was used on the GeForce RTX 3090 Ti.
It would also have a different configuration of FP32 and INT32 units, with 128 and 64 per SM, respectively, for the 192 combined units mentioned above. According to the leak, this equates to a 50% increase over the GA102 core.
Other significant differences from the GA102 core include a 33% increase in warps per SM, a 33% increase in threads per SM, a 50% increase in L1 cache, a 16-fold increase in L2 cache (from 6 MB to 96 MB), and a 100% increase in ROPs (rasterization units) per GPC. The latter is particularly intriguing because, if confirmed, it would mean that the GeForce RTX 4090 might have a massive 384 ROPs. The RTX 3090 Ti, for comparison, has 112 ROPs.
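To check the generational comparison, the percentage increases can be recomputed from raw figures. This is a rough sketch, not an official comparison: the AD102 values are the leaked ones, while the GA102 baseline values (7 GPCs, 48 warps and 1,536 threads per SM, 128 KB of L1 per SM, 6 MB of L2, 16 ROPs per GPC) are not given in the text and are assumed here from NVIDIA's published Ampere specifications. Only the 112 ROPs figure comes from the article itself.

```python
def percent_increase(new, old):
    """Return the increase from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Leaked, unconfirmed AD102 figures from the article.
AD102 = {
    "gpcs": 12,
    "warps_per_sm": 64,
    "threads_per_sm": 2048,
    "l1_kb_per_sm": 192,
    "l2_mb": 96,
    "rops_per_gpc": 32,
    "rops_total": 384,
}

# Assumed GA102 baseline (full chip), taken from public Ampere specs.
GA102 = {
    "gpcs": 7,
    "warps_per_sm": 48,
    "threads_per_sm": 1536,
    "l1_kb_per_sm": 128,
    "l2_mb": 6,
    "rops_per_gpc": 16,
    "rops_total": 112,
}


def compare():
    """Percentage increase of each AD102 figure over GA102."""
    return {key: percent_increase(AD102[key], GA102[key]) for key in AD102}


if __name__ == "__main__":
    for key, pct in compare().items():
        print(f"{key}: +{pct:.0f}%")
```

With these assumed baselines, the doubling of ROPs applies per GPC. Because the GPC count also rises, the total ROP count would grow by considerably more than 100%.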
GeForce RTX 40 vs. GeForce RTX 30
Based on the information we have so far, we can put together a direct list of the changes and improvements that the GeForce RTX 40 would bring compared to the GeForce RTX 30. Keep in mind that this list is built on rumors and leaks, which means that some of the information below may not be accurate.
- There has been a significant increase in the number of active shaders (up to 18,432 in the AD102 core).
- The number of rasterization units per GPC has been doubled, which should result in a significant improvement in rasterization performance. These units are in charge of "assembling" all of the work done by the other GPU elements to form the image that will be sent to the frame buffer and displayed on the monitor.
- L1 and L2 cache sizes have been increased.
- 4th generation tensor cores and 3rd generation RT cores, which could increase AI, DLSS, and ray tracing performance.
- In principle, the upgraded 5nm node (N4) could help boost efficiency and working frequencies.
With all of this in mind, it seems clear to us that the generational leap we will see with the GeForce RTX 40 will be massive, both in rasterization and in ray tracing, though we suspect it will be especially large in the latter.
In terms of power consumption, a significant rise has also been rumored, particularly in the most powerful models of the GeForce RTX 40 series, although we believe these rumors should be treated with caution given some unreasonably high estimates.