NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10

Name: NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10
Item: NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10
Author: Anand Lal Shimpi & Derek Wilson

by Anand Lal Shimpi & Derek Wilson on November 8, 2006 6:01 PM EST

Posted in
GPUs

111 Comments | Add A Comment

111 Comments

Branching

In order to talk generally about SPs and their capabilities, all the vertices, primitives, pixel components, etc. to be processed are referred to as threads. This way we can look at each SP as handling its own thread no matter what type of data is being processed. G80 is able to sustain "thousands" of threads at a time, but the actual number of threads that can be active at any given time is not disclosed. While all SPs can handle any type of thread, SPs that share resources must be running the same type of thread at any given time. In this way, each block of 16 SPs can be running one type of shader program on 16 threads. This indicates something about branch granularity as well. For vertex shaders, branch granularity is 16 vertices. For pixel shaders, branch granularity is 32 pixels (arranged in pairs of blocks of 4x4 pixels).

Branch granularity defines how many threads must follow the same path through data. When a group of 32 pixel threads all take the same branch, we don't have a problem. If even one thread must take a path that is different from the others, all 32 threads must be evaluated with both paths following the branch. The branch then defines what result each individual thread will keep and which it will discard. It's easy to see that optimum granularity is 1 thread, as no unnecessary work would be done. The way resources are allocated and the way instructions are run on SPs grouped together currently doesn't allow any more fine-grained branching. Here's a chart that address branch granularity:

GPU	Branch Granularity
NVIDIA NV4x	~1K pixels
NVIDIA G70	~256 pixels
ATI R580	48 pixels
NVIDIA G80	16 vertex 32 pixels

Clearly G80 has the advantage here, as it's less likely that smaller groups of pixels will take different directions through a branch. This gives programmers the ability to more easily integrate branching into their code without getting a massive performance hit. If programmers are able to incorporate more branches, shader code can become more general purpose and we will see many more effects make their way into games. Now that G80 has caught up to ATI in terms of potential branch performance, we hope developers will take the reality of more complex code seriously.

Early-Z, Memory Interface

NVIDIA has added hardware for Early-Z to G80, after their current Z-Cull hardware which removes regions of pixels completely occluded by other geometry. Early-Z is a more fine-grained occlusion culling method that looks at a calculated Z value of a fragment before it hits the pixel pipeline. Z-Cull doesn't look at per fragment Z values, but uses a Z value based on geometry. While Z-Cull can get rid of large blocks of data it has issues handling surfaces that are only partially occluded or intersecting surfaces. Looking at individual depth values per pixel can help remove unnecessary fragments from heading down the pipeline only to be thrown out when the ROPs get to them.

The memory interface has been dramatically redesigned to support the access patterns of all of G80's independent stream processors. Given the theme of increasing granularity within G80 it's no surprise that we are now seeing 5 and 6 channels of GDDR rather than the 2 or 4 channels we have been used to for the past few years. 8800 GTX will have a 384 bit bus (6 x 64-bit channels), while the 8800 GTS will have a 320 bit wide connection to DRAM (5 x 64-bit channels). We would love to delve further into the details of G80's new memory interface, but NVIDIA isn't discussing the details of this aspect of their hardware.

Digging deeper into the shader core General Purpose Processing with G80

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

111 Comments

View All Comments

haris - Thursday, November 9, 2006 - link
You must have missed the article they published the very next day http://www.theinquirer.net/default.aspx?article=35...">here. saying they goofed.
Araemo - Thursday, November 9, 2006 - link
Yes I did - thanks.

I wish they would have updated the original post to note the mistake, as it is still easily accessible via google. ;) (And the 'we goofed' post is only shown when you drill down for more results)
Araemo - Thursday, November 9, 2006 - link
In all the AA comparison photos of the power lines, with the dome in the background - why does the dome look washed out in the G80 images? Is that a driver glitch? I'm only on page 12, so if you explain it after that.. well, I'll get it eventually.. ;) But is that just a driver glitch, or is it an IQ problem with the G80 implementation of AA?
bobsmith1492 - Thursday, November 9, 2006 - link
Gamma-correcting AA sucks.
Araemo - Thursday, November 9, 2006 - link
That glitch still exists whether or not gamma-correcting AA is enabled or disabled, so that isn't it.
iwodo - Thursday, November 9, 2006 - link
I want to know if these power hungry monster have any power saving features?
I mean what happen if i am using Windows only most of the time? Afterall CPU have much better power management when they are idle or doing little work. Will i have to pay extra electricity bill simply becoz i am a cascual gamer with a power - hungry/ ful GPU ?

Another question pop up my mind was with CUDA would it now be possible for thrid party to program a H.264 Decoder running on GPU? Sounds good to me:D
DerekWilson - Thursday, November 9, 2006 - link
oh man ... I can't believe I didn't think about that ... video decoder would be very cool.
Pirks - Friday, November 10, 2006 - link
decoder is not interesting, but the mpeg4 asp/avc ENCODER on the G80 GPU... man I can't imagine AVC or ASP encoding IN REAL TIME... wow, just wooowww
I'm holding my breath here
Igi - Thursday, November 9, 2006 - link
Great article. The only thing I would like to see in a follow up article is performance comparison in CAD/CAM applications (Solidworks, ProEngineer,...).

BTW, how noisy are new cards in comparison to 7900GTX and others (in idle and under load)?
JarredWalton - Thursday, November 9, 2006 - link
I thought it was stated somewhere that they are as loud (or quiet if you prefer) as the 7900 GTX. So really not bad at all, considering the performance offered.

NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10

Branching

Early-Z, Memory Interface

Post Your Comment

111 Comments

View All Comments

haris - Thursday, November 9, 2006 - link

Araemo - Thursday, November 9, 2006 - link

Araemo - Thursday, November 9, 2006 - link

bobsmith1492 - Thursday, November 9, 2006 - link

Araemo - Thursday, November 9, 2006 - link

iwodo - Thursday, November 9, 2006 - link

DerekWilson - Thursday, November 9, 2006 - link

Pirks - Friday, November 10, 2006 - link

Igi - Thursday, November 9, 2006 - link

JarredWalton - Thursday, November 9, 2006 - link

Log in

Don't have an account? Sign up now