NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10
by Anand Lal Shimpi & Derek Wilson on November 8, 2006 6:01 PM EST. Posted in GPUs
Branching
In order to talk generally about SPs and their capabilities, all the vertices, primitives, pixel components, etc. to be processed are referred to as threads. This way we can look at each SP as handling its own thread no matter what type of data is being processed. G80 is able to sustain "thousands" of threads at a time, but the actual number of threads that can be active at any given time is not disclosed. While all SPs can handle any type of thread, SPs that share resources must be running the same type of thread at any given time. In this way, each block of 16 SPs can be running one type of shader program on 16 threads. This indicates something about branch granularity as well. For vertex shaders, branch granularity is 16 vertices. For pixel shaders, branch granularity is 32 pixels (arranged in pairs of blocks of 4x4 pixels).
Branch granularity defines how many threads must follow the same path through the code. When a group of 32 pixel threads all take the same branch, we don't have a problem. If even one thread must take a different path from the others, all 32 threads must evaluate both paths following the branch. The branch condition then determines which result each individual thread keeps and which it discards. It's easy to see that the optimum granularity is 1 thread, as no unnecessary work would be done. The way resources are allocated and the way instructions are run on grouped SPs currently doesn't allow more fine-grained branching. Here's a chart that addresses branch granularity:
GPU | Branch Granularity |
NVIDIA NV4x | ~1K pixels |
NVIDIA G70 | ~256 pixels |
ATI R580 | 48 pixels |
NVIDIA G80 | 16 vertices / 32 pixels |
Clearly G80 has the advantage here, as it's less likely that smaller groups of pixels will take different directions through a branch. This gives programmers the ability to more easily integrate branching into their code without getting a massive performance hit. If programmers are able to incorporate more branches, shader code can become more general purpose and we will see many more effects make their way into games. Now that G80 has caught up to ATI in terms of potential branch performance, we hope developers will take the reality of more complex code seriously.
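To make the cost of coarse branch granularity concrete, here is a minimal toy model in Python. It is a sketch under loud assumptions, not a description of how any of these GPUs actually schedule work: threads are laid out in one dimension rather than in 2D pixel blocks, both sides of the branch cost the same, and the workload (a branch outcome that flips every 64 pixels) is hypothetical. The function name `path_evaluations` is ours.

```python
def path_evaluations(mask, granularity):
    """Count per-thread path evaluations for a list of branch outcomes.

    mask[i] is True if thread i takes the branch. Threads are grouped into
    batches of `granularity`; a batch whose threads disagree must evaluate
    both paths (assumed to cost the same), doubling its work.
    """
    total = 0
    for start in range(0, len(mask), granularity):
        batch = mask[start:start + granularity]
        diverged = any(batch) and not all(batch)
        total += len(batch) * (2 if diverged else 1)
    return total


# Hypothetical workload: 3072 pixel threads, branch outcome flips every 64 pixels.
mask = [(i // 64) % 2 == 0 for i in range(3072)]

# Granularities taken from the chart above; "ideal" is per-thread branching.
for name, granularity in [("NV4x", 1024), ("G70", 256), ("R580", 48),
                          ("G80", 32), ("ideal", 1)]:
    print(name, granularity, path_evaluations(mask, granularity))
```

In this particular workload the branch boundaries happen to line up with 32-pixel batches, which flatters the smallest granularity; with less regular data the finer-grained hardware would still diverge on some batches, just on fewer threads at a time.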
Early-Z, Memory Interface
NVIDIA has added hardware for Early-Z to G80, in addition to its existing Z-Cull hardware, which removes regions of pixels completely occluded by other geometry. Early-Z is a more fine-grained occlusion culling method that looks at the calculated Z value of a fragment before it hits the pixel pipeline. Z-Cull doesn't look at per-fragment Z values, but uses a Z value based on geometry. While Z-Cull can get rid of large blocks of data, it has issues handling surfaces that are only partially occluded or that intersect. Looking at individual depth values per pixel helps keep unnecessary fragments from heading down the pipeline only to be thrown out when the ROPs get to them.
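The benefit of testing depth before shading can be sketched with a small Python model. This is only an illustration of the general idea, not NVIDIA's hardware design: the function name `shade_count`, the dictionary depth buffer, and the convention that a smaller Z is closer to the viewer are all our assumptions.

```python
def shade_count(fragments, early_z):
    """Return how many fragments pay the pixel shading cost.

    fragments: iterable of (x, y, z) tuples in submission order.
    early_z:   if True, the depth test runs before shading and occluded
               fragments are skipped; if False, every fragment is shaded
               and the depth test only decides what gets written.
    Assumes smaller z means closer to the viewer.
    """
    depth = {}  # (x, y) -> closest z seen so far
    shaded = 0
    for x, y, z in fragments:
        visible = z < depth.get((x, y), float("inf"))
        if early_z and not visible:
            continue  # rejected before reaching the pixel pipeline
        shaded += 1  # shading cost is paid here
        if visible:
            depth[(x, y)] = z
    return shaded


# Hypothetical fragments: the second is hidden behind the first,
# the fourth lands in front of the third.
fragments = [(0, 0, 0.2), (0, 0, 0.5), (1, 0, 0.7), (1, 0, 0.3)]
print("late Z: ", shade_count(fragments, early_z=False))
print("early Z:", shade_count(fragments, early_z=True))
```

Note that the saving depends on submission order: a fragment can only be rejected early if something closer has already been drawn at that pixel.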
The memory interface has been dramatically redesigned to support the access patterns of all of G80's independent stream processors. Given the theme of increasing granularity within G80, it's no surprise that we are now seeing 5 and 6 channels of GDDR rather than the 2 or 4 channels we have been used to for the past few years. The 8800 GTX will have a 384-bit bus (6 x 64-bit channels), while the 8800 GTS will have a 320-bit-wide connection to DRAM (5 x 64-bit channels). We would love to delve further into the details of G80's new memory interface, but NVIDIA isn't discussing the details of this aspect of their hardware.
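The bus widths quoted above follow directly from the channel counts, and the same arithmetic gives a peak bandwidth once a data rate is known. The Python sketch below shows the calculation; the helper names are ours, and the 1e9 transfers per second used in the example is a placeholder chosen for round numbers, not a memory clock taken from this page.

```python
CHANNEL_WIDTH_BITS = 64  # per-channel width given in the article


def bus_width_bits(channels):
    """Total memory bus width for a given number of 64-bit channels."""
    return channels * CHANNEL_WIDTH_BITS


def peak_bandwidth_gb_per_s(channels, transfers_per_second):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits(channels) / 8
    return bytes_per_transfer * transfers_per_second / 1e9


for card, channels in [("8800 GTX", 6), ("8800 GTS", 5)]:
    print(card, bus_width_bits(channels), "bits")

# Placeholder data rate (hypothetical, not from the article):
print(peak_bandwidth_gb_per_s(6, 1e9), "GB/s at 1e9 transfers/s")
```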
111 Comments
yyrkoon - Thursday, November 9, 2006 - link
If you're using Firefox, get and install the extension "flashblock". Just did this myself today, tired of all the *animated* ads bothering me while reading articles.
Sorry AT guys, but we've had this discussion before, and it's really annoying.
JarredWalton - Thursday, November 9, 2006 - link
Do you want us to be able to continue as a site? Because ads support us. Anyway, his problem is related to not seeing images, so your comment about blocking ads via flashblock is completely off topic.
yyrkoon - Thursday, November 9, 2006 - link
Of course I want you guys to continue on as a site, just wish it were possible without annoying flashing ads in a section where I'm trying to concentrate on the article.
As for the off topic part, yeah, my bad, I misread the full post (bad habit). Feel free to edit or remove that post of mine :)
archcommus - Thursday, November 9, 2006 - link
What browser are you using?
falc0ne - Thursday, November 9, 2006 - link
Firefox 2.0
JarredWalton - Thursday, November 9, 2006 - link
If Firefox, I know there's an option to block images not on the originating website. In this case, images come from image.anandtech.com while the article is on www.anandtech.com, so that may be the cause of your problems. IE7 and other browsers might have something similar, though I haven't ever looked. Other than that, perhaps some firewall or ad blocking software is to blame - it might be getting false positives?
archcommus - Thursday, November 9, 2006 - link
Wow to Anandtech - another amazing, incredibly in-depth article. It is so obvious this site is run by dedicated professionals who have degrees in these fields versus most other review sites where the authors just take pictures of the product and run some benches. Articles like this keep the AT reader base very very strong.
Also wow to the G80, obviously an amazing card. My question: is 450W the PSU requirement for the GTX only or for both the GTX and GTS? I ask because I currently have a 400W PSU and am wondering if it will be sufficient for next-gen DX10 class hardware, and I know I would not be buying the highest model card. I also only have one HDD and one optical drive in my system.
Yet another wow goes out to the R&D monetary investment - $475 million! It's amazing that that amount is even acceptable to nVidia, I can't believe the sales of such a high end, enthusiast-targeted card are great enough to warrant that.
JarredWalton - Thursday, November 9, 2006 - link
Sales of the lower end parts which will be based off G80 are what make it worthwhile, I would guess. As for PSU, I think that 450W is for the GTX, and more is probably a safe bet (550W would be in line with a high-end system these days, although 400W ought to suffice if it's a good quality 400W). You can see that the GTX tops out at just under 300W average system power draw with an X6800, so if you use an E6600 and don't overclock, a decent 400W ought to work. The GTX tops out around 260W average with the X6800, so theoretically even a decent 350W will work fine. Just remember to upgrade the PSU if you ever add other components.
photoguy99 - Thursday, November 9, 2006 - link
I just wanted to second that thought -
AT articles have incredible quality and depth at this point - you guys are doing great work.
It's actually getting embarrassing for some of your competing sites. I browsed the Tom's article and it had so much fluff and retread I had to stop.
Please don't forget the effort is noticed and appreciated.
shabby - Wednesday, November 8, 2006 - link
It wasn't mentioned in the review, but what's the purpose of the 2nd SLI connector?