NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10
by Anand Lal Shimpi & Derek Wilson on November 8, 2006 6:01 PM EST- Posted in
- GPUs
What is CSAA?
Taking another step forward in antialiasing quality and performance, NVIDIA is introducing Coverage Sample Antialiasing with G80. Coverage Sample AA is an evolutionary step forward in AA technology designed to improve how accurately the hardware is able to determine the area of a pixel covered by any given surface. CSAA can be thought of as extending MSAA. NVIDIA is calling all of their AA modes CSAA, even though common AA modes (2x, 4x, and now 8x (8xQ to NVIDIA)) are performed exactly the same way MSAA would be performed.
To enable modes that more accurately represent each polygon's coverage of a pixel, NVIDIA has introduced an "Enhance the application" option in their driver. This option will allow you to enable a desired MSAA mode in a game (either 4x or 8x) and then "enhance" it by enabling 8x, 16x, or 16xQ CSAA. This will make the 4xAA requested in the game look like 8xAA or 16xAA. Enhancing 8x to 16xQ gives the effect of 16xMSAA without the huge performance impact that would be associated with such a setting.
To understand how it comes together, lets take a quick look at fragments and the evolution of AA.
We usually refer to fragments as pixels for simplicity sake (and because Microsoft decided to use the term pixel shader rather than fragment shader in DirectX), but it helps to understand what the difference between a pixel and a fragment is when talking about AA methods. A pixel is simply a colored dot on the screen (or stored in a frame buffer). The different pieces of data that go into determining the color of a particular pixel are called fragments. For example, if 2 triangles cover the area of a single pixel, both will be processed as fragments. Texture look ups will be done for each at the pixel center, and a color and depth will be determined, and any of this data can be manipulated by a fragment (pixel) shader. Without AA (and ignoring blending, transparency, etc...), only the fragment that is nearest the viewer and covers the pixel center will determine the color of the pixel. Antialiasing techniques are used to make the final pixel color reflect an accurate blend of the colors that cover a pixel.
A sub-pixel can be thought of as a zoomed in look at the area a pixel covers, so for example instead of a single pixel it can be viewed as a 10x10 grid of sub-pixels. Current popular FSAA (full screen AA) methods use the calculated colors of multiple sub-pixels that fall within the area of a pixel rather than just the pixel center to determine the final color. Super Sample AA takes each of these sub-pixels through the entire pipeline to determine texture and pixel shader output at each location. This is very accurate, but wastes lots of processing power without providing a proportional benefit. This is because sub-pixels that fall on the same surface don't usually end up with very different colors. MSAA only looks at one textured/shaded sample point per fragment. The colors of the sub-pixels on a polygon are the same as the color at the center of the pixel, but each sub-pixel gets its own depth value. When two polygons cover the same pixel, we can end up with different colored sub-pixels. Blending these colors proportionally results in properly antialiased polygon edges.
CSAA extends MSAA by decoupling color and depth values from the positions of the sample points within a pixel. Color values are determined at the pixel center, and color and depth data are stored in a buffer. The extension of this in CSAA comes in that we can look at more sample points in the pixel than we store color/Z data for. Under NVIDIA's 16x CSAA, four color values are stored, but the fragment coverage information for each of 16 sample points is retained. These coverage sample points are able to reference the appropriate color/Z data stored for the polygon that covers them.
While NVIDIA couldn't go into much detail on the technology behind CSAA, we can extrapolate what's going on behind the scenes in order to make this happen. For each triangle that covers a pixel, each CSAA sample point gets a boolean value that indicates whether or not it is covered by the triangle. Color/Z data for the fragment are stored in a buffer for that pixel. For this whole thing to work, each CSAA sample point must also know what color in the buffer to indicate. If we assume position is predefined, the most storage that would be needed for each CSAA point is 4 bits (one boolean coverage value plus 3bits to index 8 color/Z values). The color and Z data will be significantly larger than 8 bytes per pixel, especially for floating point color data, so the memory footprint shouldn't be much larger than MSAA.
As fragments are sent out of the pixel shader, sub-pixel data is updated based on depth tests, and coverage samples and color/Z data will be updated as necessary. When the scene is ready to be drawn, the coverage sample points and color/Z data will be used to determine the color of a pixel based on each fragment that influenced it.
So what are the downsides? We have less depth information inside the pixel, but in most cases this isn't as important as color information. We do need to know depth at different sub-pixel positions in order to handle intersecting polygons, but doing this with a different level of detail than color information shouldn't have a big impact on quality.
The other drawback is that algorithms that require stencil/Z data at sub-pixel locations will not work correctly with CSAA in modes where there are more coverage samples than colors stored. In these cases, like with the stencil shadows used in FEAR, only the coverage samples located where color values are taken are used. This effectively reverts these algorithms to MSAA quality levels. CSAA will still be applied to polygon edges, and stencil algorithms will still work with the decreased level of antialiasing applied.
At a basic level, CSAA can provide more accurate coverage information for a pixel without the storage requirements of MSAA. This not only gives gamers an option to enable higher quality AA, but the option to enable higher quality AA without a large performance impact. While the explanation of how it does this may be overly complex, here's a simple table to help convey what's going on:
111 Comments
View All Comments
haris - Thursday, November 9, 2006 - link
You must have missed the article they published the very next day http://www.theinquirer.net/default.aspx?article=35...">here. saying they goofed.Araemo - Thursday, November 9, 2006 - link
Yes I did - thanks.I wish they would have updated the original post to note the mistake, as it is still easily accessible via google. ;) (And the 'we goofed' post is only shown when you drill down for more results)
Araemo - Thursday, November 9, 2006 - link
In all the AA comparison photos of the power lines, with the dome in the background - why does the dome look washed out in the G80 images? Is that a driver glitch? I'm only on page 12, so if you explain it after that.. well, I'll get it eventually.. ;) But is that just a driver glitch, or is it an IQ problem with the G80 implementation of AA?bobsmith1492 - Thursday, November 9, 2006 - link
Gamma-correcting AA sucks.Araemo - Thursday, November 9, 2006 - link
That glitch still exists whether or not gamma-correcting AA is enabled or disabled, so that isn't it.iwodo - Thursday, November 9, 2006 - link
I want to know if these power hungry monster have any power saving features?I mean what happen if i am using Windows only most of the time? Afterall CPU have much better power management when they are idle or doing little work. Will i have to pay extra electricity bill simply becoz i am a cascual gamer with a power - hungry/ ful GPU ?
Another question pop up my mind was with CUDA would it now be possible for thrid party to program a H.264 Decoder running on GPU? Sounds good to me:D
DerekWilson - Thursday, November 9, 2006 - link
oh man ... I can't believe I didn't think about that ... video decoder would be very cool.Pirks - Friday, November 10, 2006 - link
decoder is not interesting, but the mpeg4 asp/avc ENCODER on the G80 GPU... man I can't imagine AVC or ASP encoding IN REAL TIME... wow, just wooowwwI'm holding my breath here
Igi - Thursday, November 9, 2006 - link
Great article. The only thing I would like to see in a follow up article is performance comparison in CAD/CAM applications (Solidworks, ProEngineer,...).BTW, how noisy are new cards in comparison to 7900GTX and others (in idle and under load)?
JarredWalton - Thursday, November 9, 2006 - link
I thought it was stated somewhere that they are as loud (or quiet if you prefer) as the 7900 GTX. So really not bad at all, considering the performance offered.