Better Image Quality: CSAA & TMAA
NVIDIA’s next big trick for image quality is that they’ve revised Coverage Sample Anti-Aliasing. CSAA, which was originally introduced with the G80, is a lightweight method of better determining how much of a polygon actually covers a pixel. By merely testing polygon coverage and storing the results, the ROP can get more information without the expense of fetching and storing additional color and Z data as done with a regular sample under MSAA. The quality improvement isn’t as pronounced as just using more multisamples, but coverage samples are much, much cheaper.
32x CSAA sampling pattern
For the G80 and GT200, CSAA could only test polygon edges. That’s great for resolving aliasing at polygon edges, but it doesn’t solve other kinds of aliasing. In particular, GF100 will be waging a war on billboards – flat geometry that uses textures with transparency to simulate what would otherwise require complex geometry. Fences, leaves, and patches of grass in fields are three very common uses of billboards, as they are “minor” visual effects that would be very expensive to do with real geometry, and would benefit little from the quality improvement.
Since billboards are faking geometry, regular MSAA techniques do not remove the aliasing within the billboard. To resolve that DX10 introduced alpha to coverage functionality, which allows MSAA to anti-alias the fake geometry by using the alpha mask as a coverage mask for the MSAA process. The end result of this process is that the GPU creates varying levels of transparency around the fake geometry, so that it blends better with its surroundings.
It’s a great technique, but it wasn’t done all that well by the G80 and GT200. In order to determine the level of transparency to use on an alpha to coverage sampled pixel, the anti-aliasing hardware on those GPUs used MSAA samples to test the coverage. With up to 8 samples (8xQ MSAA mode), the hardware could only compute 9 levels of transparency, which isn’t nearly enough to establish a smooth gradient. The result was that while alpha to coverage testing allowed for some anti-aliasing of billboards, the result wasn’t great. The only way to achieve really good results was to use super-sampling on billboards through Transparency Super-Sample Anti-Aliasing, which was ridiculously expensive given that when billboards are used, they usually cover most of the screen.
For GF100, NVIDIA has made two tweaks to CSAA. First, additional CSAA modes have been unlocked – GF100 can do up to 24 coverage samples per pixel as opposed 16. The second change is that the CSAA hardware can now participate in alpha to coverage testing, a natural extension of CSAA’s coverage testing capabilities. With this ability CSAA can test the coverage of the fake geometry in a billboard along with MSAA samples, allowing the anti-aliasing hardware to fetch up to 32 samples per pixel. This gives the hardware the ability to compute 33 levels of transparency, which while not perfect allows for much smoother gradients.
The example NVIDIA has given us for this is a pair of screenshots taken from a field in Age of Conan, a DX10 game. The first screenshot is from a GT200 based video card running the game with NVIDIA’s 16xQ anti-aliasing mode, which is composed of 8 MSAA samples and 8 CSAA samples. Since the GT200 can’t do alpha to coverage testing using the CSAA samples, the resulting grass blades are only blended with 9 levels of transparency based on the 8 MSAA samples, giving them a dithered look.
Age of Conan grass, GT200 16x AA
The second screenshot is from GF100 running in NVIDIA’s new 32x anti-aliasing mode, which is composed of 8 MSAA samples and 24 CSAA samples. Here the CSAA and MSAA samples can be used in alpha to coverage, giving the hardware 32 samples from which to compute 33 levels of transparency. The result is that the blades of grass are still somewhat banded, but overall much smoother than what the GT200 produced. Bear in mind that since 8x MSAA is faster on the GF100 than it was GT200, and CSAA has very little overhead in comparison (NVIDIA estimates 32x has 93% of the performance of 8xQ), the entire process should be faster on GF100 even if it were running at the same speeds as GT200. Image quality improved, and at the same time the performance improved too.
Age of Conan grass, GF100 32x AA
The ability to use CSAA on billboards left us with a question however: isn’t this what Transparency Anti-Aliasing was for? The answer as it turns out is both yes and no.
Transparency Anti-Aliasing was introduced on the G70 (GeForce 7800GTX) and was intended to help remove aliasing on billboards, exactly what NVIDIA is doing today with MSAA. The difference is that while DX10 has alpha to coverage, DX9 does not – and DX9 was all there was when G70 was released. Transparency Multi-Sample Anti-Aliasing (TMAA) as implemented today is effectively a shader replacement routine to make up for what DX9 lacks. With it, DX9 games can have alpha to coverage testing done on their billboards in spite of DX9 not having this feature, allowing for image quality improvements on games still using DX9. Under DX10 TMAA is superseded by alpha to coverage in the API, but TMAA is still alive and well due to the large number of older games using DX9 and the large number of games yet to come that will still use DX9.
Because TMAA is functionally just enabling alpha to coverage on DX9 games, all of the changes we just mentioned to the CSAA hardware filter down to TMAA. This is excellent news, as TMAA has delivered lackluster results in the past – it was better than nothing, but only Transparency Super-Sample Anti-Aliasing (TSAA) really fixed billboard aliasing, and only at a high cost. Ultimately this means that a number of cases in the past where only TSAA was suitable are suddenly opened up to using the much faster TMAA, in essence making good billboard anti-aliasing finally affordable on newer DX9 games on NVIDIA hardware.
As a consequence of this change, TMAA’s tendency to have fake geometry on billboards pop in and out of existence is also solved. Here we have a set of screenshots from Left 4 Dead 2 showcasing this in action. The GF100 with TMAA generates softer edges on the vertical bars in this picture, which is what stops the popping from the GT200.
Left 4 Dead 2: TMAA on GT200
Left 4 Dead 2: TMAA on GF100
115 Comments
View All Comments
FlyTexas - Monday, January 18, 2010 - link
I have a feeling that nVidia is taking the long road here...The past 6 months have been painful for nVidia, however I think they are looking way ahead. At its core, the 5000 series from AMD is really just a supersized 4000 series. Not a bad thing, but nothing new either (DX11 is nice, but that'll be awhile, and multiple monitors are still rare).
Games have all looked the same for years now. CPU and GPU power have gone WAY up in the past 5 years, but too much is still developed for DX9 (X360/PS3 partly to blame, as is Vista's poor adoption), and I suspect that even the 5000 series is really still designed around DX9 and games meant for it with a few "enhancements".
This new chip seems designed for DX11 and much higher detailed graphics. Polygon counts can go up with this, the number of new details can really shine, but only once games are designed from scratch for it. From that point, the 6 month wait isn't a big deal, it'll be another few years before games are really designed from scratch for DX11 ONLY. Otherwise you have DX9 games with a few "enhancements" that don't add to gameplay.
It seems like we are really skipping DX10 here, partly due to Vista's poor adoption, partly due to XP not being able to use DX10. With Windows 7 being a success and DX11 backported to Vista, I think in the next 2-3 years you'll finally see most games come out that really require Vista/7 because they will require DX10/11.
Of course, my 260GTX still runs everything I throw at it, so until games get more complex or something else changes, I see no reason to upgrade. I thought about a 5870 as an upgrade, but why? Everything already runs fast enough, what does it get me other than some headroom? If I was still on a 8800GT, it would make sense, but I'd rather wait for nVidia to launch so the prices come down.
PorscheRacer - Tuesday, January 19, 2010 - link
Well then there's the fact ATI designed their 2000 series (and 3000 and 4000 series) to comply with the full DirectX 10 specification. NVIDIA didn't have the chips required for this spec, and talked Microsoft into castrating DX10 by only adding in a few things. Tessellation was notably left out. ATI wsa hung out to dry on performanec and features wasted on die. They finalyl got DX10.1 later on but the damage was done.Sure people complained about Vista, mostly gamers as games ran slower, but I wonder how those games would have been if DX10 was run at the full spec (which was marginally lower the DX11 today)?
Scali - Wednesday, January 27, 2010 - link
I think you need to read this, and reconsider your statement:http://scalibq.spaces.live.com/blog/cns">http://scalibq.spaces.live.com/blog/cns!663AD9A4F9CB0661!194.entry
jimhsu - Monday, January 18, 2010 - link
I made this post in another forum, but I think it's relevant here:---
Yes, I'm beginning to see this [games becoming less GPU limited and more CPU limited] with more mainstream games (to repeat, Crysis is NOT a mainstream game). FLOP wise, a high end video card (i.e. 5970 at 5 TFLOP) is something like 100 TIMES the performance of a high end CPU (i7 at 50 GFLOPS).
In comparison, during the 2004 days, we had GPUs like the 6800 Ultra (54 GFLOP) and P4's (6 GFLOP) (historical data here: http://forum.beyond3d.com/showthread.php?t=51677)">http://forum.beyond3d.com/showthread.php?t=51677). That's 9X the performance. We've gone from 9X to 100X the performance in a matter of 5 years. No wonder few modern games are actually pushing modern GPUs (requiring people who want to "get the most" out of their high powered GPUs to go for multiple screens, insane AA/AF, insane detail settings, complex shaders, etc)
I know this is a horrible comparison, but still - it gives you an idea of the imbalance in performance. This kind of reminds me of the whole hard drive capacity vs. transfer rate argument. Today's 2 TB monsters are actually not much faster than the few GB drives at the turn of the millennium (and even less so latency wise).
Personally, I think the days of GPU bound (for mainstream discrete GPU computing) closed when Nvidia's 8 series launched (the 8800GTX is perhaps the longest-lived video card ever made). And in general, when the industry adopted programmable compute units (aka DirectX 10).
AznBoi36 - Tuesday, January 19, 2010 - link
Actually the Radeon 9700/9800 Pro had a pretty long life too. The 9700 Pro I bought in 2002/2003 had lasted me all the way to early 2007, which was when I then bought a 8800GTS 640mb. 4 years is pretty good. It could have lasted longer, but then I was itching for a new platform and needed to get a PCI-Express card (the Radeon was AGP).RJohnson - Monday, January 18, 2010 - link
Sorry you lost all credibility when you tried to spin this bullsh*#t "Today's 2 TB monsters are actually not much faster than the few GB drives at the turn of the millennium"Go try and run your new rig off one of those old drives, come back and post your results in 2 hours when your system finally boots.
jimhsu - Monday, January 18, 2010 - link
A fun chart. Note the performance disparity.http://i65.photobucket.com/albums/h204/killer-ra/V...">http://i65.photobucket.com/albums/h204/...Game%20S...
jimhsu - Monday, January 18, 2010 - link
Disclosure: I'm still on a 8800 GTS 512, and I am in no pressure to upgrade right now. While a 58xx would be nice to have, on a single monitor I really have no need to upgrade. I may look into going i7 though.dentatus - Monday, January 18, 2010 - link
If something works well for you then there is no real reason (or need) to upgrade.I still run an 8800 ultra, it still runs many games well on a 22 inch monitor. The GT200 was really only a 50% boost over the 8 series on average. For comparison, I bought a second hand ultra for $60, transplanted both of them into an i7 based system and this really produced a significant boost over a GTX285 in the games I liked; about 25% more performance- roughly equivalent to HD5850, albeit not always as smooth.
It would be good to upgrade to a single GPU that is more than double the performance of this kind of setup. But a HD5800 series card is not in that league, and it remains to be seen if the GF100 is.
dentatus - Monday, January 18, 2010 - link
I agree this chip does seem designed around new or upcoming features. Many architectural shortcomings from the GT200 chip seem to be addressed and worked around getting usable performance (like tesselation) for new API features.Anyway to be pragmatic about things, nvidias history leaves much to be desired; performance promised and performance delivered is very variable. HardOCP mentioned the 5800 Ultra launch as a con, there is also th G80 launch on the flip side.
A GPU's theoretical performance and the expectations hanging around it are nothing to make choices by, wait for the real proof. Anyone recall the launch of the 'monstrous' 2900XT? A toothless beast that one.