NVIDIA's 1.4 Billion Transistor GPU: GT200 Arrives as the GeForce GTX 280 & 260
by Anand Lal Shimpi & Derek Wilson on June 16, 2008 9:00 AM EST - Posted in GPUs
Building NVIDIA's GT200
Here's a Streaming Processor, which NVIDIA calls an SP:
NVIDIA calls an individual SP a single processing core, which is actually true. It is a fully pipelined, single-issue, in-order microprocessor complete with two ALUs and an FPU. An SP doesn't have any cache, so it's not particularly great at anything other than cranking through tons of mathematical operations. Since an SP spends most of its time working on pixel or vertex data, the fact that it doesn't have a cache doesn't really matter. Aside from the name similarity, one NVIDIA SP is a lot like a very simplified version of an SPE in the Cell microprocessor (or maybe the SPE is like a really simple version of one of NVIDIA's SMs, which we'll get to in a minute). While a single SPE in Cell has seven execution units, a single NVIDIA SP only has three.
By itself an SP is fairly useless, but NVIDIA builds GPUs, and if you add up enough of these little monsters you can start to get something productive, given that graphics rendering is a highly parallelizable task.
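To put that in concrete terms, here's a minimal CUDA sketch (our own illustration, not NVIDIA's code; the kernel name and numbers are made up) of the kind of work these cores are built for. Each thread does a trivial bit of math on one pixel; the chip earns its keep by keeping thousands of these threads in flight at once, not by making any single one of them fast.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: one thread per pixel, trivial math per thread.
// No caches needed -- just lots of independent multiplies running in parallel.
__global__ void scale_pixels(float *pixels, float gain, int count)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;   // global pixel index
    if (idx < count)
        pixels[idx] *= gain;                           // a single multiply per pixel
}

int main()
{
    const int count = 1 << 20;                         // ~1M pixels
    float *d_pixels;
    cudaMalloc(&d_pixels, count * sizeof(float));
    cudaMemset(d_pixels, 0, count * sizeof(float));

    scale_pixels<<<(count + 255) / 256, 256>>>(d_pixels, 1.5f, count);  // 256 threads per block
    cudaDeviceSynchronize();

    cudaFree(d_pixels);
    return 0;
}
```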
Here's a Streaming Multiprocessor, which NVIDIA abbreviates as SM:
An SM is an array of SPs, eight to be specific, along with two more processors called Special Function Units (SFUs). Each SFU has four FP multiply units, which are used for transcendental operations (e.g., sine, cosine) and interpolation, the latter being used in some of the calculations for things like anisotropic texture filtering. Although NVIDIA isn't specific in saying so, we assume that each SFU is also a fully pipelined, single-issue, in-order microprocessor. There's an MT issue unit that dispatches instructions to all of the SPs and SFUs in the group.
In addition to the processor cores in an SM, there's a very small instruction cache, a read-only data cache and a 16KB read/write shared memory. These cache sizes are kept purposefully small because, unlike on a conventional desktop microprocessor, the datasets we're trying to cache here are small. Each SP ends up working on an individual pixel, and despite the move to 32-bit floating point values, there's only so much data associated with a single pixel. The 16KB memory is akin to Cell's local stores in that it's not a cache, but a software-managed data store so that latency is always predictable. With this many cores in a single SM, control and predictability are very important to making the whole thing work efficiently.
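This 16KB store is what CUDA exposes to programmers as __shared__ memory: one allocation per thread block resident on an SM, with software deciding exactly what lives there and for how long, so latency stays deterministic. As a hedged sketch of how it gets used (the filter and tile size are our own example, assuming a launch with TILE threads per block):

```cuda
#define TILE 256

// Illustrative use of the per-SM software-managed store, exposed in CUDA as
// __shared__ memory. The block stages its slice of the input once, then every
// thread reads its neighbours from on-chip storage at predictable latency
// instead of going back out to DRAM. This tile uses ~1KB of the 16KB available.
__global__ void box_blur_1d(const float *in, float *out, int n)
{
    __shared__ float tile[TILE + 2];                   // TILE elements plus a one-element halo per side

    int gid = blockIdx.x * blockDim.x + threadIdx.x;   // global element index
    int lid = threadIdx.x + 1;                         // local index, offset past the left halo

    tile[lid] = (gid < n) ? in[gid] : 0.0f;            // stage this thread's element (zero-pad past the end)
    if (threadIdx.x == 0)                              // left halo
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)                 // right halo
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
    __syncthreads();                                   // wait until the whole tile is staged

    if (gid < n)
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
}

// Host-side launch (error checking omitted):
//   box_blur_1d<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n);
```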
Take one more step back and you've got a Texture/Processor Cluster (TPC):
The G80/G92 TPC (left) vs. the GT200 TPC (right)
NVIDIA purposefully designed its GPU architecture to be modular, so a single TPC can be made up of any number of SMs. In the G80 architecture it was made up of two SMs, but with the GT200 architecture it now has three SMs.
The components of the TPC, however, haven't changed; a TPC is made up of SMs, some control logic and a texture block. Remember that an SM is a total of 8 SPs and 2 SFUs, so that brings the total up to 24 SPs and 6 SFUs (must...not...type...STFU) per cluster in GT200 (up from 16 SPs and 4 SFUs in G80). The texture block includes texture addressing and filtering logic as well as an L1 texture cache.
The modular theme continues with the Streaming Processor Array (SPA) that is composed of a number of TPCs:
The GT200 SPA, that's 240 SPs in there if you want to count them
In G80 the SPA was made up of 8 TPCs, but with GT200 we've moved up to 10. Note that each TPC now has 3 SMs vs. 2, so the overall processing power of GT200 has increased by 87.5% over G80.
And here's G80/G92, only 128 SPs thanks to two SMs per TPC and 8 TPCs
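If you want to check the math behind those two captions, the counts fall straight out of the hierarchy; here's a quick host-side sanity check (our own snippet, nothing official):

```cuda
#include <cstdio>

// Unit counts implied by the hierarchy described above: SPA -> TPC -> SM -> SP.
int main()
{
    const int sps_per_sm = 8;
    const int g80_tpcs = 8,    g80_sms_per_tpc = 2;
    const int gt200_tpcs = 10, gt200_sms_per_tpc = 3;

    const int g80_sps   = g80_tpcs   * g80_sms_per_tpc   * sps_per_sm;   //  8 x 2 x 8 = 128
    const int gt200_sps = gt200_tpcs * gt200_sms_per_tpc * sps_per_sm;   // 10 x 3 x 8 = 240

    printf("G80: %d SPs, GT200: %d SPs, increase: %.1f%%\n",
           g80_sps, gt200_sps, 100.0 * gt200_sps / g80_sps - 100.0);     // 87.5% more SPs
    return 0;
}
```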
At the front end of the GPU we've got schedulers and control logic to distribute workloads to the entire array of processing cores. At the other end we've got L2 texture caches and rasterization processors that handle final filtering and output of data to the frame buffer.
The culmination of all of this is that the new GT200 GPU, the heart of the GeForce GTX 280 and 260, features 240 SPs, 160KB of local memory, an even smaller amount of cache and is built on TSMC's 65nm process using 1.4 billion transistors.
1.4 Billion Transistors. It wants vertex data. Really bad.
754 Million Transistors
There are more transistors in this chip than there are people in China, and it's the largest, most compute-dense chip we've ever reviewed.
108 Comments
Anand Lal Shimpi - Monday, June 16, 2008 - link
Thanks for the heads up, you're right about G92 only having 4 ROPs, I've corrected the image and references in the article. I also clarified the GeForce FX statement, it definitely fell behind for more reasons than just memory bandwidth, but the point was that NVIDIA has been trying to go down this path for a while now.

Take care,
Anand
mczak - Monday, June 16, 2008 - link
Thanks for correcting. Still, the paragraph about the FX is a bit odd imho. Lack of bandwidth really was the least of its problems; it was a too-complicated core with actually lots of texturing power, and it sacrificed raw compute power for more programmability in the compute core (which was its biggest problem).

Arbie - Monday, June 16, 2008 - link
I appreciate the in-depth look at the architecture, but what really matters to me are graphics performance, heat, and noise. You addressed the card's idle power dissipation but only in full-system terms, which masks a lot. Will it really draw 25W in idle under WinXP?

And this highly detailed review does not even mention noise! That's very disappointing. I'm ready to buy this card, but Tom's finds their samples terribly noisy. I was hoping and expecting Anandtech to talk about this.
Arbie
Anand Lal Shimpi - Monday, June 16, 2008 - link
I've updated the article with some thoughts on noise. It's definitely loud under load, not GeForce FX loud but the fan does move a lot of air. It's the loudest thing in my office by far once you get the GPU temps high enough.

From the updated article:
"Cooling NVIDIA's hottest card isn't easy and you can definitely hear the beast moving air. At idle, the GPU is as quiet as any other high-end NVIDIA GPU. Under load, as the GTX 280 heats up the fan spins faster and moves much more air, which quickly becomes audible. It's not GeForce FX annoying, but it's not as quiet as other high-end NVIDIA GPUs; then again, there are 1.4 billion transistors switching in there. If you have a silent PC, the GTX 280 will definitely un-silence it and put out enough heat to make the rest of your fans work harder. If you're used to a GeForce 8800 GTX, GTS or GT, the noise will bother you. The problem is that returning to idle from gaming for a couple of hours results in a fan that doesn't want to spin down as low as when you first turned your machine on.
While it's impressive that NVIDIA built this chip on a 65nm process, it desperately needs to move to 55nm."
Mr Roboto - Monday, June 16, 2008 - link
I agree with what Darkryft said about wanting a card that absolutely, without a doubt, stomps the 8800GTX. So far that hasn't happened as the GX2 and GT200 hardly do either. The only thing they proved with the G90 and G92 is that they know how to cut costs.

Well thanks for making me feel like such a smart consumer as it's going on 2 years with my 8800GTX and it still owns 90% of the games I play.
P.S. It looks like Nvidia has quietly discontinued the 8800GTX as it's no longer on major retail sites.
Rev1 - Monday, June 16, 2008 - link
Ya the 640 8800 gts also. No Sli for me lol.

wiper - Monday, June 16, 2008 - link
What about noise? Other reviews show mixed data. One says it's another dustblower, others say the noise level is ok.

Zak - Monday, June 16, 2008 - link
First thing though, don't rely entirely on spell checker :)) Page 4 "Derek Gets Technical": "borrowing terminology from weaving was cleaver" - I believe you meant "clever"?

As darkryft pointed out:
"In my opinion, for $650, I want to see some f-ing God-like performance."
Why would anyone pay $650 for this? Ugh? This is probably THE disappointment of the year:(((
Z.
js01 - Monday, June 16, 2008 - link
On techpowerups review it seemed to pull much bigger numbers but they were using xp sp2.

http://www.techpowerup.com/reviews/Point_Of_View/G...
NickelPlate - Monday, June 16, 2008 - link
Pfft, title says it all. Let's hope that driver updates widen the gap between previous high end products. Otherwise, I'll pass on this one.