NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10
by Anand Lal Shimpi & Derek Wilson on November 8, 2006 6:01 PM EST - Posted in GPUs
Branching
In order to talk generally about SPs and their capabilities, all the vertices, primitives, pixel components, etc. to be processed are referred to as threads. This way we can look at each SP as handling its own thread no matter what type of data is being processed. G80 is able to sustain "thousands" of threads at a time, but the actual number of threads that can be active at any given time is not disclosed. While all SPs can handle any type of thread, SPs that share resources must be running the same type of thread at any given time. In this way, each block of 16 SPs can be running one type of shader program on 16 threads. This indicates something about branch granularity as well. For vertex shaders, branch granularity is 16 vertices. For pixel shaders, branch granularity is 32 pixels (arranged in pairs of blocks of 4x4 pixels).
Branch granularity defines how many threads must follow the same path through the data. When a group of 32 pixel threads all take the same branch, there is no problem. If even one thread must take a path that is different from the others, all 32 threads must evaluate both sides of the branch; the branch condition then determines which result each individual thread keeps and which it discards. It's easy to see that the optimum granularity is 1 thread, as no unnecessary work would be done. The way resources are allocated and instructions are issued to grouped SPs doesn't currently allow any finer-grained branching. Here's a chart that addresses branch granularity:
GPU | Branch Granularity
NVIDIA NV4x | ~1K pixels
NVIDIA G70 | ~256 pixels
ATI R580 | 48 pixels
NVIDIA G80 | 16 vertices / 32 pixels
Clearly G80 has the advantage here, as smaller groups of pixels are less likely to take different directions through a branch. This gives programmers the ability to integrate branching into their code more easily without taking a massive performance hit. If programmers are able to incorporate more branches, shader code can become more general purpose and we will see many more effects make their way into games. Now that G80 has caught up to ATI in terms of potential branch performance, we hope developers will take the reality of more complex code seriously.
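To make the cost of divergence concrete, here is a minimal C++ sketch of the both-paths-then-discard behavior described above. This models the general SIMD branching idea, not NVIDIA's actual scheduler; only the group size of 32 comes from the article, and the shader functions and inputs are invented for the example.

```cpp
// Sketch of SIMD branch divergence (a mock-up, not NVIDIA's hardware).
// If every thread in a group agrees on a branch, only one path runs; if even
// one thread disagrees, the whole group runs both paths and each thread keeps
// only the result of the path it actually wanted.
#include <cstdio>
#include <vector>

// Stand-ins for the two sides of a branch in a shader.
float cheapPath(float x)     { return x * 0.5f; }
float expensivePath(float x) { return x * x + 1.0f; }

// Run one group of threads; returns the total number of path evaluations.
int runGroup(const std::vector<float>& in,
             const std::vector<bool>& takeExpensive,
             std::vector<float>& out) {
    bool anyTrue = false, anyFalse = false;
    for (bool b : takeExpensive) { if (b) anyTrue = true; else anyFalse = true; }

    int evaluations = 0;
    for (size_t i = 0; i < in.size(); ++i) {
        float cheap = 0.0f, costly = 0.0f;
        if (anyFalse) { cheap  = cheapPath(in[i]);     ++evaluations; }
        if (anyTrue)  { costly = expensivePath(in[i]); ++evaluations; }
        out[i] = takeExpensive[i] ? costly : cheap;  // keep one result, discard the other
    }
    return evaluations;
}

int main() {
    const int groupSize = 32;                     // G80's pixel branch granularity
    std::vector<float> in(groupSize, 2.0f), out(groupSize);

    std::vector<bool> coherent(groupSize, true);  // all 32 pixels take the same branch
    std::vector<bool> divergent(groupSize, true);
    divergent[0] = false;                         // a single pixel disagrees

    std::printf("coherent group:  %d evaluations\n", runGroup(in, coherent,  out)); // 32
    std::printf("divergent group: %d evaluations\n", runGroup(in, divergent, out)); // 64
    return 0;
}
```

The single disagreeing pixel that doubles the work for a 32-thread group here would double the work for roughly a thousand threads at NV4x-style granularity, which is why the smaller number matters.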
Early-Z, Memory Interface
NVIDIA has added hardware for Early-Z to G80, following their existing Z-Cull hardware, which removes regions of pixels completely occluded by other geometry. Early-Z is a more fine-grained occlusion culling method that looks at a fragment's calculated Z value before it hits the pixel pipeline. Z-Cull doesn't look at per-fragment Z values but uses a Z value based on geometry. While Z-Cull can get rid of large blocks of data, it has trouble handling surfaces that are only partially occluded, as well as intersecting surfaces. Looking at individual depth values per pixel can keep unnecessary fragments from heading down the pipeline only to be thrown out when the ROPs get to them.
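As a rough illustration of the difference, the C++ sketch below tests each fragment's depth before running its pixel shader, rather than after the way a late test at the ROPs would. It is a software mock-up of the concept only; the fragments and the shader stand-in are invented for the example, and in practice early rejection also has to be disabled when a shader writes its own depth.

```cpp
// Sketch of per-fragment early depth testing (a mock-up, not G80's hardware).
// Fragments that fail the depth test are rejected before any shading work is
// spent on them, instead of being shaded and then discarded at the ROPs.
#include <cstdio>
#include <cfloat>
#include <vector>

struct Fragment { int x, y; float z; };

float runPixelShader(const Fragment&) { return 1.0f; }  // stand-in for expensive shading

int main() {
    const int width = 4, height = 4;                         // tiny 4x4 render target
    std::vector<float> depthBuffer(width * height, FLT_MAX); // smaller z = closer

    // Two fragments landing on the same pixel; the second is behind the first.
    std::vector<Fragment> fragments = { {1, 1, 0.2f}, {1, 1, 0.8f} };

    int shaded = 0, rejectedEarly = 0;
    for (const Fragment& f : fragments) {
        float& stored = depthBuffer[f.y * width + f.x];
        if (f.z >= stored) {      // early test: reject before shading
            ++rejectedEarly;
            continue;
        }
        runPixelShader(f);        // only fragments that survive pay for shading
        ++shaded;
        stored = f.z;
    }
    std::printf("shaded %d fragment(s), rejected %d before shading\n", shaded, rejectedEarly);
    return 0;
}
```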
The memory interface has been dramatically redesigned to support the access patterns of all of G80's independent stream processors. Given the theme of increasing granularity within G80, it's no surprise that we are now seeing 5 and 6 channels of GDDR rather than the 2 or 4 channels we have been used to for the past few years. The 8800 GTX will have a 384-bit bus (6 x 64-bit channels), while the 8800 GTS will have a 320-bit wide connection to DRAM (5 x 64-bit channels). We would love to delve further into the details of G80's new memory interface, but NVIDIA isn't discussing this aspect of their hardware.
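Even without the interface details, the published bus widths allow some back-of-the-envelope math. The short C++ sketch below computes peak bandwidth from bus width and effective data rate; the 1800 and 1600 MT/s memory speeds are our assumptions for illustration (typical GDDR3 rates of the period), not figures taken from this article.

```cpp
// Back-of-the-envelope peak bandwidth from bus width and memory data rate.
// Bus widths come from the article; the data rates are assumed for illustration.
#include <cstdio>

double peakBandwidthGBs(int busWidthBits, double dataRateMTs) {
    // bytes per transfer * million transfers per second -> GB/s
    return (busWidthBits / 8.0) * dataRateMTs * 1e6 / 1e9;
}

int main() {
    std::printf("8800 GTX (384-bit @ 1800 MT/s): %.1f GB/s\n", peakBandwidthGBs(384, 1800.0)); // ~86.4
    std::printf("8800 GTS (320-bit @ 1600 MT/s): %.1f GB/s\n", peakBandwidthGBs(320, 1600.0)); // ~64.0
    return 0;
}
```

The extra 64-bit channel is what separates the GTX's wider bus from the GTS's, independent of whatever memory speeds the boards ship with.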
111 Comments
Sunrise089 - Thursday, November 9, 2006 - link
Then I suppose he's in the market to part with an ugly old high-end CRT. I'd love to buy it from him. Seriously.
JarredWalton - Thursday, November 9, 2006 - link
You want an older 21" Cornerstone CRT? It's a beast, but you can have it for the cost of shipping (which unfortunately would probably be ~$50). I'd also sell my Samsung 997DF 19" CRT for about $50, and maybe an NEC FE991-SB for $50 (which unfortunately has a scratch from my daughter in the anti-glare coating). If anyone lives in the Olympia, WA area, you know how to contact me (I hope). I'd rather someone come by to pick up any of these CRTs rather than shipping, as I don't think I have the original boxes.
DerekWilson - Thursday, November 9, 2006 - link
lol next thing you know links to ebay auctions are gonna start showing up in our articles :-)
yyrkoon - Thursday, November 9, 2006 - link
lol, I've got a 21" techtronics I'll sell for $200 usd, plus shipping ;) Hasn't been used since I purchased my Viewsonic VA1912wb (well, been used very little).
imaheadcase - Wednesday, November 8, 2006 - link
can't stand AA benchmarks myself :)
Question: Do you have any info on what kind of card nvidia is releasing this feb? Is it something in between these 2 cards or something even lower?
Im looking for a $300ish g80! :D
flexy - Wednesday, November 8, 2006 - link
if ANYTHING counts, then it's how those high-end cards perform WITH their various AA settings.... the power of those cards (and the money spent on them :) translated RIGHT into ---> IMAGE QUALITY/PERFORMANCE.
Please don't tell me you would get a G80 but do NOT care about AA, this does NOT make any sense...sorry...
I am especially impressed reading that transparency AA has such a SMALL performance impact. What game engine did you test this on?
On the older ATI cards (and am I right that T.A. is the same as "adaptive antialiasing"?)...this feature (depending on game engine) is the FPS killer....eg. with games like Oblivion (WHERE ARE THE GOTHIC 3 BENCHMARKS BTW? :)...game engines with much vegetation etc.
Enable transparency AA and see all those trees, grass etc. without jaggies.
imaheadcase - Thursday, November 9, 2006 - link
Well lots of people don't care for AA. Even if I had this card I would not use it. I visually see NO difference with it on or off. It's personal taste. I don't even see "jaggies" on my older 9700 PRO card.
flexy - Thursday, November 9, 2006 - link
you sure are talking about ANTIALIASING???
What resolutions do you run? Not that my CRT can even handle more than 1600x1200..but even w/ 1600 I get VERY prominent jaggies if I don't run AA.
I made it a habit to run at least 4xAA in ANY game, and some engines (hl2:source engine) etc. run extremely well with 4xAA; even 6xAA is very playable, at least with HL2.
The very recent games, namely NWN2 and G3, don't support AA now; playing at 1280x1024 it looks utterly horrible! If you say you don't see jags in ANY resolution under 1600..very hard to believe
imaheadcase - Thursday, November 9, 2006 - link
Yes I'm talking about antialiasing. I normally play BF2 and oblivion at 1024x768 (9700 pro remember). Fact is most people won't see them unless someone points them out. The brain is still better at rendering stuff the way you want to see it vs hardware :)
flexy - Thursday, November 9, 2006 - link
ok..but then it's also a performance problem. If it doesn't bother you, well ok. I also have to settle w/ the fact that many RECENT games are even unable to do AA..however I wish they would.
But once I get an 8800 I will do &&&& to get the most out of IQ, AA, AF, transparency/texture AF, you name it. ALONE also for the reason that I would need a super-high-end monitor first to even run resolutions like 2000xsomething...and as long as I have a lame 19" CRT and CANNOT even go over 1600 (99.99% of games even running everything at 1280x or 1360x) I will use all the power to get the best possible IQ at those low resolutions.
Also..looking at the benchmarks..it's NOT that you lose any real gaming experience since THOSE monster cards are made for exactly this...eg. running oblivion with all those settings at MAX AND AA on and HDR...and you are still in VERY reasonable FPS ranges.