NVIDIA's 1.4 Billion Transistor GPU: GT200 Arrives as the GeForce GTX 280 & 260
by Anand Lal Shimpi & Derek Wilson on June 16, 2008 9:00 AM EST - Posted in GPUs
Lots More Compute, a Leetle More Texturing
NVIDIA's GT200 GPU delivers a significant increase in computational power thanks to its 240 streaming processors, up from 128 in the previous G80 design. As a result, the chip's transistor count has grown tremendously over the previous-generation architecture (1.4 billion, up from 681 million in G80).
The increase in GT200's compute power is not mirrored, however, by an equal increase in texture processing power. On the previous page we outlined how the Texture/Processing Clusters went from two Shader Multiprocessors to three, and how there are now a total of ten TPCs in the chip, up from eight in the GeForce 8800 GTX.
In the original G80 core, used in the GeForce 8800 GTX, NVIDIA's texture block looked like this:
In each block you had 4 texture address units and 8 texture filtering units.
With the move to G92, used in the GeForce 8800 GT, 8800 GTS 512 and 9800 GTX, NVIDIA doubled the number of texture address units and achieved a 1:1 ratio of address/filtering units:
With GT200 in the GeForce GTX 280/260, NVIDIA kept the address-to-filtering ratio at 1:1 but increased the ratio of SPs to texture processors:
In the G92 design you had 8 address and 8 filtering units per TPC (that is, per 16 streaming processors); in GT200 you have the same 8 address and 8 filtering units, but for a larger TPC with 24 SPs.
Here's how the specs stand up across the generations:
NVIDIA Architecture Comparison | G80 | G92 | GT200
Streaming Processors per TPC | 16 | 16 | 24 |
Texture Address Units per TPC | 4 | 8 | 8 |
Texture Filtering Units per TPC | 8 | 8 | 8 |
Total SPs | 128 | 128 | 240 |
Total Texture Address Units | 32 | 64 | 80 |
Total Texture Filtering Units | 64 | 64 | 80 |
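As a quick sanity check, the totals in the table can be reproduced by multiplying the per-TPC figures by the number of TPCs. The sketch below assumes 8 TPCs for G80 and 10 for GT200 (both stated earlier in the article) and 8 for G92 (inferred from 128 SPs at 16 per TPC); the dictionary layout and the names ARCHS and totals are purely illustrative.

```python
# Per-TPC unit counts taken from the table above.
# TPC counts: G80 and GT200 are from the article text; G92 is inferred
# from its 128 total SPs at 16 SPs per TPC (an assumption of this sketch).
ARCHS = {
    "G80":   {"tpcs": 8,  "sps": 16, "addr": 4, "filt": 8},
    "G92":   {"tpcs": 8,  "sps": 16, "addr": 8, "filt": 8},
    "GT200": {"tpcs": 10, "sps": 24, "addr": 8, "filt": 8},
}

def totals(arch):
    """Return chip-wide unit counts: per-TPC count times number of TPCs."""
    a = ARCHS[arch]
    return {unit: a["tpcs"] * a[unit] for unit in ("sps", "addr", "filt")}

for name in ARCHS:
    print(name, totals(name))
```

The output matches the "Total" rows of the table: 128/32/64 for G80, 128/64/64 for G92 and 240/80/80 for GT200.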
For an 87.5% increase in compute, there's a mere 25% increase in texture processing power. This ratio echoes what NVIDIA has been preaching for years: that games are running more complex shaders and are not as bound by texture processing as they were in years prior. If this weren't true, we'd see closer to a 25% increase in performance of GT200 over G80 at the same clock, rather than something much greater.
It also means that GT200's performance advantage over G80 or G92 based architectures (e.g. GeForce 9800 GTX) will be determined largely by how computationally bound the games we're testing are.
The shift toward compute over texturing seen in GT200 has been evident in NVIDIA architectures for years now, dating back to the ill-fated GeForce FX. NVIDIA sacrificed memory bandwidth on the GeForce FX, equipping it with a narrow 128-bit memory bus (compared to ATI's 256-bit interface on the Radeon 9700 Pro) and instead focused on building a much more powerful compute engine. Unfortunately, the bet was the wrong one to make at the time and the GeForce FX was hardly competitive (for more reasons than just a lack of memory bandwidth), but today we're dealing with a very different world. Complex shader programs run on each pixel on the screen and there's a definite need for more compute power in today's GPUs.
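The 87.5% and 25% figures quoted above fall straight out of the unit totals in the table. A minimal sketch of the arithmetic, using the G80 and GT200 totals for SPs and texture filtering units (the function name pct_increase is just an illustrative helper):

```python
def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0

# Totals from the comparison table: G80 -> GT200.
compute = pct_increase(128, 240)   # streaming processors
texture = pct_increase(64, 80)     # texture filtering units

# SPs per texture filtering unit, before and after.
ratio_g80 = 128 / 64
ratio_gt200 = 240 / 80

print(compute, texture)        # 87.5 25.0
print(ratio_g80, ratio_gt200)  # 2.0 3.0
```

Put another way, G80 had two SPs for every texture filtering unit and GT200 has three, which is the shift toward compute that the text describes.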
An Increase in Rasterization Throughput
In addition to the 25% increase in GT200's texture processing capabilities, NVIDIA added two more ROP partitions to the GPU. The GeForce 8800 GTX had six ROP partitions, each capable of outputting a maximum of 4 pixels per clock; GT200 has eight.
With eight ROP partitions, GT200 can now output a maximum of 32 pixels per clock, up from 24 pixels per clock in the GeForce 8800 GTX.
The pixel blend rate on G80/G92 was half-speed, meaning that while you could output 24 pixels per clock, you could only blend 12 pixels per clock. Thanks to the 65nm shrink and redesign, GT200 can now output and blend pixels at full speed - that's 32 pixels per clock for each.
The end result is a non-linear performance improvement in everything from anti-aliasing and fire effects to shadows on GT200. It's an evolutionary change, but that really does sum up many of the enhancements of GT200 over G80/G92.
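The per-clock ROP figures in this section can be worked out from the partition counts. The sketch below assumes 4 pixels per clock per partition (as stated for the 8800 GTX) and models the half-speed blend rate as a divisor of 2; the function and parameter names are hypothetical, and real fill rates would also depend on clock speeds, which are not covered here.

```python
PIXELS_PER_PARTITION = 4  # maximum pixels output per clock, per ROP partition

def rop_rates(partitions, blend_divisor):
    """Return (output, blend) pixels per clock.

    blend_divisor is 2 for half-speed blending and 1 for full-speed.
    """
    output = partitions * PIXELS_PER_PARTITION
    blend = output // blend_divisor
    return output, blend

print("8800 GTX:", rop_rates(6, 2))  # (24, 12)
print("GT200:   ", rop_rates(8, 1))  # (32, 32)
```

So while peak pixel output rises by a third (24 to 32), peak blend throughput per clock rises from 12 to 32, which is where the larger gains in blend-heavy workloads come from.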
108 Comments
tkrushing - Wednesday, June 18, 2008 - link
Say what you want about this guy but this is partially true which is why AMD/ATI is in the position they have been. They are slowly climbing out of that hole they've been in though. Would have been nice to see 4870x2 hit the market first. As we know competition = less prices for everyone!
hk690 - Tuesday, June 17, 2008 - link
I would love to kick you hard in the face, breaking it. Then I'd cut your stomach open with a chainsaw, exposing your intestines. Then I'd cut your windpipe in two with a boxcutter. Then I'd tie you to the back of a pickup truck, and drag you, until your useless fucking corpse was torn to a million fucking useless, bloody, and gory pieces.
Hopefully you'll get what's coming to you. Fucking bitch
http://www.youtube.com/watch?v=XNAFUpDTy3M
I wish you a truly painful, bloody, gory and agonizing death, cunt
7Enigma - Wednesday, June 18, 2008 - link
Anand, I'm all for free speech and such, but this guy is going a bit far. I read these articles at work frequently and once the dreaded C-word is used I'm paranoid I'm being watched.
Mr Roboto - Thursday, June 19, 2008 - link
I thought those comments would be deleted already. I'm sure no one cares if they are. I don't know what that person is so mad about.
hk690 - Tuesday, June 17, 2008 - link
Die painfully okay? Prefearbly by getting crushed to death in a garbage compactor, by getting your face cut to ribbons with a pocketknife, your head cracked open with a baseball bat, your stomach sliced open and your entrails spilled out, and your eyeballs ripped out of their sockets. Fucking bitch
Mr Roboto - Wednesday, June 18, 2008 - link
Ouch.. Looks like you hit a nerve with AMD\ATI's marketing team!
bobsmith1492 - Monday, June 16, 2008 - link
The main benefit from the 280 is the reduced power at idle! If I read the graph right, the 9800 takes ~150W more than the 280 at idle. Since that's where computers spend the majority of their time, depending on how much you game, that can be a significant cost.
kilkennycat - Monday, June 16, 2008 - link
Maybe you should look at the GT200 series from the point of view of nvidia's GPGPU customers - the academic researchers, technology companies requiring fast number-crunching available on the desktop, the professionals in graphics-effects and computer animation - not necessarily real-time, but as quick as possible... The CUDA-using crew. The Tesla initiative. This is an explosively-expanding and highly profitable business for nVidia - far more profitable per unit than any home desktop graphics application. An in-depth analysis by Anandtech of what the GT200 architecture brings to these markets over and above the current G8xx/G9xx architecture would be highly appreciated. I have a very strong suspicion that sales of the GT2xx series to the (ultra-rich) home user who has to have the latest and greatest graphics card is just another way of paying the development bills and not the true focus for this particular architecture or product line.
nVidia is strongly rumored to be working on the true 2nd-gen Dx10.x product family, to be introduced early next year. Considering the size of the GTX280 silicon, I would expect them to transition the 65nm GTX280 GPU to either TSMC's 45nm or 55nm process before the end of 2008 to prove out the process with this size of device, then in 2009 introduce their true 2nd-gen GPU/GPGPU family on this latter process. A variant on the Intel "tick-tock" process strategy.
strikeback03 - Tuesday, June 17, 2008 - link
But look at the primary audience of this site. Whatever nvidia's intentions are for the GT280, I'm guessing more people here are interested in gaming than in subsidizing research.
Wirmish - Tuesday, June 17, 2008 - link
"...requiring fast number-crunching available on the desktop..."
GTX 260 = 715 GFLOPS
GTX 280 = 933 GFLOPS
HD 4850 = 1000 GFLOPS
HD 4870 = 1200 GFLOPS
4870 X2 = 2400 GFLOPS
Take a look here: http://tinyurl.com/5jwym5