NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10
by Anand Lal Shimpi & Derek Wilson on November 8, 2006 6:01 PM EST
Posted in GPUs
What is CSAA?
Taking another step forward in antialiasing quality and performance, NVIDIA is introducing Coverage Sample Antialiasing with G80. Coverage Sample AA is an evolutionary step forward in AA technology designed to improve how accurately the hardware is able to determine the area of a pixel covered by any given surface. CSAA can be thought of as extending MSAA. NVIDIA is calling all of their AA modes CSAA, even though common AA modes (2x, 4x, and now 8x (8xQ to NVIDIA)) are performed exactly the same way MSAA would be performed.
To enable modes that more accurately represent each polygon's coverage of a pixel, NVIDIA has introduced an "Enhance the application" option in their driver. This option will allow you to enable a desired MSAA mode in a game (either 4x or 8x) and then "enhance" it by enabling 8x, 16x, or 16xQ CSAA. This will make the 4xAA requested in the game look like 8xAA or 16xAA. Enhancing 8x to 16xQ gives the effect of 16xMSAA without the huge performance impact that would be associated with such a setting.
To understand how it comes together, let's take a quick look at fragments and the evolution of AA.
We usually refer to fragments as pixels for simplicity's sake (and because Microsoft decided to use the term pixel shader rather than fragment shader in DirectX), but it helps to understand the difference between a pixel and a fragment when talking about AA methods. A pixel is simply a colored dot on the screen (or stored in a frame buffer). The different pieces of data that go into determining the color of a particular pixel are called fragments. For example, if 2 triangles cover the area of a single pixel, both will be processed as fragments. Texture lookups will be done for each at the pixel center, a color and depth will be determined, and any of this data can be manipulated by a fragment (pixel) shader. Without AA (and ignoring blending, transparency, etc.), only the fragment that is nearest the viewer and covers the pixel center will determine the color of the pixel. Antialiasing techniques are used to make the final pixel color reflect an accurate blend of the colors that cover a pixel.
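To make the no-AA case concrete, here is a minimal Python sketch of the rule described above: the nearest fragment that covers the pixel center wins outright. The Fragment type, its field names, and pixel_color_no_aa are our own illustrative names, not part of any API, and blending and transparency are ignored just as in the text.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    color: tuple          # (r, g, b), each component 0.0 to 1.0
    depth: float          # smaller value = nearer the viewer (assumed convention)
    covers_center: bool   # does the triangle cover the pixel center?

def pixel_color_no_aa(fragments, background=(0.0, 0.0, 0.0)):
    """Return the color of the nearest fragment covering the pixel center."""
    candidates = [f for f in fragments if f.covers_center]
    if not candidates:
        return background
    return min(candidates, key=lambda f: f.depth).color

# A near triangle that only clips the corner of the pixel is ignored entirely;
# the far triangle covering the center decides the color.
near_edge = Fragment(color=(1.0, 0.0, 0.0), depth=0.2, covers_center=False)
far_full = Fragment(color=(0.0, 0.0, 1.0), depth=0.8, covers_center=True)
print(pixel_color_no_aa([near_edge, far_full]))  # (0.0, 0.0, 1.0)
```

The all-or-nothing result is exactly what produces jagged edges: the red triangle contributes nothing even though it covers part of the pixel.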
A sub-pixel can be thought of as a zoomed in look at the area a pixel covers, so for example instead of a single pixel it can be viewed as a 10x10 grid of sub-pixels. Current popular FSAA (full screen AA) methods use the calculated colors of multiple sub-pixels that fall within the area of a pixel rather than just the pixel center to determine the final color. Super Sample AA takes each of these sub-pixels through the entire pipeline to determine texture and pixel shader output at each location. This is very accurate, but wastes lots of processing power without providing a proportional benefit. This is because sub-pixels that fall on the same surface don't usually end up with very different colors. MSAA only looks at one textured/shaded sample point per fragment. The colors of the sub-pixels on a polygon are the same as the color at the center of the pixel, but each sub-pixel gets its own depth value. When two polygons cover the same pixel, we can end up with different colored sub-pixels. Blending these colors proportionally results in properly antialiased polygon edges.
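The MSAA behavior described above can be sketched in a few lines of Python. This is a simplified model and not how any particular GPU implements it: each fragment carries one shaded color, a per-sample coverage list, and a per-sample depth list, and the final color is a plain average of the sample colors. The function name, argument layout, and the box-filter resolve are our assumptions.

```python
def msaa_resolve(fragments, num_samples, background=(0.0, 0.0, 0.0)):
    """Resolve one pixel under a simplified MSAA model.

    fragments: list of (color, coverage, depths) where
      color    is the single shaded (r, g, b) for the fragment,
      coverage is a list of num_samples bools,
      depths   is a list of num_samples floats (smaller = nearer).
    """
    sample_color = [background] * num_samples
    sample_depth = [float("inf")] * num_samples
    for color, coverage, depths in fragments:
        for i in range(num_samples):
            # Each sub-pixel gets its own depth test, but shares the fragment's color.
            if coverage[i] and depths[i] < sample_depth[i]:
                sample_color[i] = color
                sample_depth[i] = depths[i]
    # Blend proportionally: average the colors of all sub-pixel samples.
    return tuple(sum(c[k] for c in sample_color) / num_samples for k in range(3))

# 4x MSAA: a red triangle covers two of four samples in front of a blue one.
red = ((1.0, 0.0, 0.0), [True, True, False, False], [0.5] * 4)
blue = ((0.0, 0.0, 1.0), [True, True, True, True], [0.8] * 4)
print(msaa_resolve([blue, red], num_samples=4))  # (0.5, 0.0, 0.5)
```

Super sampling would differ only in that the color would be shaded separately for every sample rather than once per fragment, which is where the extra cost comes from.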
CSAA extends MSAA by decoupling color and depth values from the positions of the sample points within a pixel. Color values are determined at the pixel center, and color and depth data are stored in a buffer. The extension of this in CSAA comes in that we can look at more sample points in the pixel than we store color/Z data for. Under NVIDIA's 16x CSAA, four color values are stored, but the fragment coverage information for each of 16 sample points is retained. These coverage sample points are able to reference the appropriate color/Z data stored for the polygon that covers them.
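As a rough illustration of that decoupling, the Python sketch below models a pixel as a short list of stored color/Z "slots" plus a longer list of coverage samples, each of which references a slot. NVIDIA has not published the actual layout, so the names (slots, coverage_refs, csaa_resolve) and the simple averaging resolve are assumptions made purely for illustration.

```python
def csaa_resolve(slots, coverage_refs, background=(0.0, 0.0, 0.0)):
    """Resolve one pixel from stored colors and coverage references.

    slots:         list of (color, depth) pairs; four in the 16x mode described above.
    coverage_refs: one entry per coverage sample; a slot index, or None if uncovered.
    """
    total = [0.0, 0.0, 0.0]
    for ref in coverage_refs:
        color = background if ref is None else slots[ref][0]
        for k in range(3):
            total[k] += color[k]
    n = len(coverage_refs)
    return tuple(t / n for t in total)

# Two stored colors, sixteen coverage samples: red covers 5 of 16, blue the rest.
slots = [((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 0.8)]
coverage_refs = [0] * 5 + [1] * 11
print(csaa_resolve(slots, coverage_refs))  # (0.3125, 0.0, 0.6875)
```

With only four MSAA samples, the red triangle's share of the pixel could only be expressed in steps of 1/4; sixteen coverage samples allow steps of 1/16 while still storing just a handful of colors.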
While NVIDIA couldn't go into much detail on the technology behind CSAA, we can extrapolate what's going on behind the scenes in order to make this happen. For each triangle that covers a pixel, each CSAA sample point gets a boolean value that indicates whether or not it is covered by the triangle. Color/Z data for the fragment are stored in a buffer for that pixel. For this whole thing to work, each CSAA sample point must also know which color in the buffer to reference. If we assume position is predefined, the most storage that would be needed for each CSAA point is 4 bits (one boolean coverage value plus 3 bits to index 8 color/Z values). With 16 sample points, that works out to 8 bytes of coverage data per pixel. The color and Z data will be significantly larger than 8 bytes per pixel, especially for floating point color data, so the memory footprint shouldn't be much larger than that of MSAA storing the same number of color/Z values.
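The storage estimate is easy to check with a little arithmetic. In the Python sketch below, the 4 bits per coverage sample come from the reasoning above; the 4 bytes of color and 4 bytes of Z/stencil per stored sample are our own assumptions (a common 32-bit color and 32-bit depth/stencil setup), and 16x MSAA is a hypothetical mode included only for comparison.

```python
def index_bits(num_slots):
    """Bits needed to index num_slots stored color/Z values."""
    return (num_slots - 1).bit_length()

def coverage_bits_per_pixel(num_coverage_samples, num_slots):
    # One boolean coverage bit plus a slot index for every coverage sample.
    return num_coverage_samples * (1 + index_bits(num_slots))

def color_z_bytes_per_pixel(num_stored, color_bytes=4, z_bytes=4):
    # Assumed sizes: 4 bytes of color and 4 bytes of Z/stencil per stored sample.
    return num_stored * (color_bytes + z_bytes)

# Worst case from the text: 16 coverage samples, 3-bit index into 8 color/Z values.
coverage_bytes = coverage_bits_per_pixel(16, 8) // 8   # 16 * 4 bits = 64 bits

msaa_4x = color_z_bytes_per_pixel(4)                    # 4 stored color/Z values
csaa_16x = color_z_bytes_per_pixel(4) + coverage_bytes  # same colors plus coverage
msaa_16x = color_z_bytes_per_pixel(16)                  # hypothetical, for comparison

print(coverage_bytes, msaa_4x, csaa_16x, msaa_16x)      # 8 32 40 128
```

Under these assumed sizes, 16x CSAA costs 8 bytes per pixel more than 4x MSAA, against 96 bytes more for a true 16-sample mode, and the gap only widens with floating point color formats.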
As fragments are sent out of the pixel shader, sub-pixel data is updated based on depth tests, and coverage samples and color/Z data will be updated as necessary. When the scene is ready to be drawn, the coverage sample points and color/Z data will be used to determine the color of a pixel based on each fragment that influenced it.
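Since the hardware details are not public, the following Python sketch is only our guess at how such an update could proceed. It assumes a single depth value per stored fragment (rather than per coverage sample), frees color/Z slots that no coverage sample references any longer, and simply raises an error when more than four slots would be needed, because we do not know how real hardware handles that case. All names here are hypothetical.

```python
MAX_SLOTS = 4       # color/Z values stored per pixel (16x CSAA, per the text)
NUM_COVERAGE = 16   # coverage sample points per pixel

def new_pixel():
    return {"slots": [], "refs": [None] * NUM_COVERAGE}

def apply_fragment(pixel, color, depth, coverage):
    """Update a pixel with one fragment; coverage is a list of NUM_COVERAGE bools."""
    slots, refs = pixel["slots"], pixel["refs"]
    # Depth test each covered sample against the depth of the slot it references.
    won = [i for i in range(NUM_COVERAGE)
           if coverage[i] and (refs[i] is None or depth < slots[refs[i]][1])]
    if not won:
        return  # fragment is hidden at every sample it covers
    kept = [None if i in won else refs[i] for i in range(NUM_COVERAGE)]
    live = sorted({r for r in kept if r is not None})  # slots still referenced
    if len(live) + 1 > MAX_SLOTS:
        raise OverflowError("more fragments than color/Z slots (not modeled here)")
    remap = {old: new for new, old in enumerate(live)}
    new_index = len(live)
    pixel["slots"] = [slots[old] for old in live] + [(color, depth)]
    pixel["refs"] = [new_index if i in won
                     else (None if kept[i] is None else remap[kept[i]])
                     for i in range(NUM_COVERAGE)]

def resolve(pixel, background=(0.0, 0.0, 0.0)):
    colors = [background if r is None else pixel["slots"][r][0] for r in pixel["refs"]]
    return tuple(sum(c[k] for c in colors) / NUM_COVERAGE for k in range(3))

pixel = new_pixel()
apply_fragment(pixel, (0.0, 0.0, 1.0), 0.8, [True] * 16)               # blue, far
apply_fragment(pixel, (1.0, 0.0, 0.0), 0.5, [True] * 5 + [False] * 11)  # red, near
print(resolve(pixel))  # (0.3125, 0.0, 0.6875)
```

Using one depth per stored fragment is where this model gives up sub-pixel depth precision, which ties directly into the tradeoffs discussed next.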
So what are the downsides? We have less depth information inside the pixel, but in most cases this isn't as important as color information. We do need to know depth at different sub-pixel positions in order to handle intersecting polygons, but doing this with a different level of detail than color information shouldn't have a big impact on quality.
The other drawback is that algorithms that require stencil/Z data at sub-pixel locations will not work correctly with CSAA in modes where there are more coverage samples than colors stored. In these cases, like with the stencil shadows used in FEAR, only the coverage samples located where color values are taken are used. This effectively reverts these algorithms to MSAA quality levels. CSAA will still be applied to polygon edges, and stencil algorithms will still work with the decreased level of antialiasing applied.
At a basic level, CSAA can provide more accurate coverage information for a pixel without the storage requirements of MSAA. This not only gives gamers an option to enable higher quality AA, but the option to enable higher quality AA without a large performance impact. While the explanation of how it does this may be overly complex, here's a simple table to help convey what's going on:
111 Comments
DerekWilson - Thursday, November 9, 2006 - link
i'm sure there was a lot buried in there ... sorry if it wasn't easy to find.
8800 gtx and gts are both no louder than 7900 gtx. 1950 xtx still takes the cake for loudest graphics card around by a long shot -- especially after it heats up in a game.
crystal clear - Thursday, November 9, 2006 - link
My comments in Daily Tech on this subject:
More "G80" Derivatives in February
RE: More info would be nice
By crystal clear on 11/8/2006 8:03:43 AM, Rating: 2
If you link VISTA - SANTA ROSA platform - Core2DUO (merom) CPU line up (T7300, 7500, 7700 models) then a matching Graphics card to complete the link.
So a G80 for laptops/notebooks?
The pairing of Intel's Santa Rosa platform with Vista in 2Q 07 is the next big thing for the first tier notebook manufacturers & all they need is a matching G80 for this setup.
Unquote-
Nvidia currently caters to Desktop requirements/needs with the new G80 releases, wonder how the notebook/server versions will be, with Vista of course.
yyrkoon - Thursday, November 9, 2006 - link
Virtual memory is probably a good thing for most cases, but in the graphics arena, this *could* potentially make for sloppy/bad coding practices. Knowing a lot of game devs (some of whom actually work for well known companies), I've heard them from time to time complain about maxing a 16x PCI-E pipe. What I'm trying to say here is that while it would be a good thing to never run out of texture memory, system memory, and definitely the swap disk, cannot hold a candle to the memory bandwidth that most video cards are capable of. End result is that you definitely *will* get a performance hit. All this, and we already know the memory bandwidth capabilities of modern PCs; suffice it to say, the most we'll see from current systems is what? 12-13K GB/s? Even a 7800GS can do roughly 35 GB/s on card. A 7600GT? 22GB/s?
Still, I think DirectX 10 is a very good thing, and as I didn't read the whole article, perhaps I missed a little? Reason being, I've been reading about DirectX 10 since April, and a friend of mine was privy to some of this information after an interview with ATI.
http://www.gamedev.net/reference/programming/featu...
saratoga - Thursday, November 9, 2006 - link
I don't know how they threading really works, but it's quite possible VM support is required in order to allow multiple threads to run without stepping all over each other.
saratoga - Thursday, November 9, 2006 - link
Sorry, should read "I don't know how THEIR threading works"
falc0ne - Thursday, November 9, 2006 - link
I don't know what the problem is, but I'm really unable to see the images within the latest articles from Anand... Can anyone give me a suggestion? What might be the cause of that?
The thing is I'm really, really interested in these articles and I need to see those images. Thanks
yyrkoon - Thursday, November 9, 2006 - link
Oh, er, then in the options tab of Firefox (tools->options->content), check the "load images" check box ;)
falc0ne - Thursday, November 9, 2006 - link
well... it would've been simple but I'm afraid it's not that... It might be the AdBlock extension for Firefox, other than that I have nooo ideeea... Well, I will use the IE tab option instead and load the pages using IE 7. Thanks anyway :)
yyrkoon - Thursday, November 9, 2006 - link
Checked the exceptions list? I know that Firefox makes it really simple to block images from a site (to the point of being too easy).
JarredWalton - Thursday, November 9, 2006 - link
If you've got AdBlock on Firefox, press Ctrl+Shift+A and you can see what it's blocking. If it blocks the images.anandtech.com stuff, you can then see which RegEx isn't working right and edit that.