ATI's New Leader in Graphics Performance: The Radeon X1900 Series
by Derek Wilson & Josh Venning on January 24, 2006 12:00 PM EST. Posted in GPUs
R580 Architecture
The architecture itself is not that different from the R520 series. A couple of tweaks found their way into the GPU, but these consist mainly of the same improvements made to the RV515 and RV530 over the R520 thanks to their longer lead time (all three parts arrived at nearly the same time only because a bug delayed the R520 by a few months). For a quick look at what's under the hood, here's the R520 and R580 vertex pipeline:
and the internals of each pixel quad:
The real feature of interest is the ability to load and filter 4 texture addresses from a single-channel texture map. Textures which describe color generally have four components at every location in the texture, and normally the hardware will load an address from a texture map, split the 4 channels and filter them independently. In cases where single-channel textures are used (ATI likes to use the example of a shadow map), the R520 will look up the appropriate address and filter the single channel (letting the hardware's ability to filter 3 other components go to waste). With what ATI calls its Fetch4 feature, the R580 is capable of loading 3 other adjacent single-channel values from the texture and filtering these at the same time. This effectively loads and filters four times the texture data when working with single-channel formats. Traditional color textures, or textures describing vector fields (which make use of more than one channel per position in the texture), will not see any performance improvement, but for some soft shadowing algorithms the performance increase could be significant.
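To make the Fetch4 idea concrete, here is a minimal Python sketch of a 2x2 shadow-map lookup done two ways: four separate single-channel fetches versus one fetch that returns four adjacent values. This is only a conceptual model, not ATI's hardware or any real graphics API. The function names, the clamp-to-edge addressing, and the choice of percentage-closer filtering as the soft shadowing algorithm are all assumptions made for illustration.

```python
# Conceptual sketch only: models the idea behind Fetch4, not ATI's hardware
# or any real API. All names here are hypothetical.

def fetch1(tex, x, y):
    """One fetch returning one single-channel value (clamped to the edge)."""
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def fetch4(tex, x, y):
    """One fetch returning the 2x2 block of single-channel values at (x, y)."""
    return [fetch1(tex, x + dx, y + dy) for dy in (0, 1) for dx in (0, 1)]

def pcf_one_at_a_time(shadow_map, x, y, depth):
    """2x2 percentage-closer filter built from four separate fetches."""
    lit = 0
    fetches = 0
    for dy in (0, 1):
        for dx in (0, 1):
            lit += depth <= fetch1(shadow_map, x + dx, y + dy)
            fetches += 1
    return lit / 4, fetches

def pcf_fetch4(shadow_map, x, y, depth):
    """The same filter, but all four samples arrive from a single fetch."""
    samples = fetch4(shadow_map, x, y)
    lit = sum(depth <= s for s in samples)
    return lit / 4, 1

# Tiny single-channel shadow map (stored depths) and a fragment depth of 0.5.
shadow_map = [[0.2, 0.8],
              [0.9, 0.4]]
slow = pcf_one_at_a_time(shadow_map, 0, 0, 0.5)
fast = pcf_fetch4(shadow_map, 0, 0, 0.5)
```

Both paths produce the same shadow term; the only thing that changes in this model is the number of texture fetches needed to gather the four samples, which is where the gain for single-channel formats would come from.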
That's really the big news in feature changes for this part. The actual meat of the R580 comes in something Tim Allen could get behind with a nice series of manly grunts: More power. More power in the form of a 384 million transistor 90nm chip that can push 12 quads (48 pixels) worth of data around at a blisteringly fast 650MHz. Why build something different when you can just triple the hardware?
To be fair, it's not a straight tripling of everything, and it works out to look more like 4 X1600 parts than 3 X1800 parts. The proportions match what we see in the current midrange part: all you need for efficient processing of current games is a three-to-one ratio of pixel pipelines to render backends or texture units. When the X1000 series initially launched, we looked at the X1800 as a part that had as much crammed into it as possible, while the X1600 was a little more balanced. Focusing on pixel horsepower makes more efficient use of texture and render units when processing complex and interesting shader programs. If a shader program does more math than texture loads, we don't need enough hardware to load a texture every single clock cycle for every pixel when we can queue requests up and aggregate them in order to keep available resources busy more consistently. Since the hardware already has to hide the latency of texture loads (even going to local video memory isn't instantaneous yet), the machinery for handling this situation is already in place.
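The ratio argument can be checked with a little arithmetic. The Python sketch below is a toy model, not a description of how the hardware schedules work. The X1900 counts follow the article (48 pixel pipelines, texture units unchanged from the X1800), while the X1600 and X1800 unit counts are commonly quoted specifications that the article does not state, so treat them as assumptions. The model also assumes one math op per pipeline per clock, one texture op per texture unit per clock, and perfect overlap of math and texture work.

```python
# Toy model of the math-to-texture balance. X1600/X1800 unit counts are
# commonly quoted specs, not figures from the article (assumption).
PARTS = {
    "X1600": {"pixel_pipes": 12, "texture_units": 4},
    "X1800": {"pixel_pipes": 16, "texture_units": 16},
    "X1900": {"pixel_pipes": 48, "texture_units": 16},
}

def math_to_texture_ratio(part):
    """Pixel pipelines per texture unit."""
    p = PARTS[part]
    return p["pixel_pipes"] / p["texture_units"]

def clocks_per_batch(part, math_ops, tex_ops, pixels=48):
    """Clocks to shade `pixels` pixels when math and texture work overlap
    perfectly, so the slower of the two sets the pace (assumed model)."""
    p = PARTS[part]
    math_clocks = pixels * math_ops / p["pixel_pipes"]
    tex_clocks = pixels * tex_ops / p["texture_units"]
    return max(math_clocks, tex_clocks)

for name in PARTS:
    print(name, math_to_texture_ratio(name),
          clocks_per_batch(name, math_ops=3, tex_ops=1),
          clocks_per_batch(name, math_ops=1, tex_ops=1))
```

Under these assumptions, a shader with three math ops per texture load keeps both the pixel pipelines and the texture units of the X1900 fully busy, and it finishes in a third of the clocks the X1800 needs. A shader with one math op per texture load is texture-bound on the X1900, so in this model the extra pixel pipelines buy nothing over the X1800.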
Other than keeping the number of texture and render units the same as the X1800 (giving the X1900 the same ratio of math to texture/fill rate power as the X1600), there isn't much else to say about the new design. Yes, they increased the number of registers in proportion to the increase in pixel power. Yes, they increased the width of the dispatch unit to compensate for the added load. Unfortunately, ATI declined to let us post the HDL code for their shader pipeline, citing some ridiculous notion that their intellectual property has value. But we can forgive them for that.
This handy comparison page will have to do for now.
120 Comments
photoguy99 - Tuesday, January 24, 2006 - link
Why do the editors keep implying the power of cards is "getting ahead" of games when it's actually not even close?
- 1600x1200 monitors are pretty affordable
- 8xAA does look better than 4xAA
- It's nice to play games with a minimum frame rate of 50-60
Yes, these are high end desires, but the X1900XT can't even meet these needs despite its great power.
Let's face it - the power of cards could double tomorrow and still be put to good use.
mi1stormilst - Tuesday, January 24, 2006 - link
Well said, well said my friend... We need to stop being so impressed by so very little. When games look like REAL LIFE does, with lots of colors, shading, no jagged edges (unless it's from the knife I just plunged into your eye) lol, you get the picture.
poohbear - Tuesday, January 24, 2006 - link
technology moves forward at a slower pace than that mates. U expect every vid card to be a 9700pro?! right. there has to be a pace the developers can follow.
photoguy99 - Wednesday, January 25, 2006 - link
I think we are agreeing with you - the article authors keep implying they have to struggle to push these cards to their limit because they are getting so powerful so fast.
To your point, I do agree it's moving forward slow - relative to what people can make use of.
For example 90% of Office users can not make use of a faster CPU.
However 90% of gamers could make use of a faster GPU.
So even though GPU performance is doubling faster than CPU performance they should keep it up because we can and will use every ounce of it.
Powermoloch - Tuesday, January 24, 2006 - link
It is great to see that ATi is doing their part right ;)
photoguy99 - Tuesday, January 24, 2006 - link
When DX10 is released with Vista it seems like this card would be like having SM2.0 - you're behind the curve again.
Yea, I know there is always something better around the corner - and I don't recommend waiting if you want a great card now.
But I'm sure some people would like to know.
Spoelie - Thursday, January 26, 2006 - link
Not at all, I do not see DX10 arriving before Vista near the end of this year. If it does arrive earlier, it will not make any splash whatsoever on game development before that. Even so, you cannot be 'behind' if your only competitor is still at SM3.0 as well. As far as I can tell, there will be no HARD architectural changes in G71/7900 - they might improve tidbits here and there, like support for AA while doing HDR rendering, but that will be about the full extent of changes.
DigitalFreak - Tuesday, January 24, 2006 - link
True, but I'm betting it will be quite a while before we see any DX10 games. I would suspect that the R620/G80 will be DX10 parts.
timmiser - Tuesday, January 24, 2006 - link
I expect that Microsoft's Flight Simulator X will be the first DX10 game.
hwhacker - Tuesday, January 24, 2006 - link
Question to Derek (or whomever):
Perhaps I interpreted something wrong, but is it correct that you're saying X1900 is more of a 12x4 technology (because of fetch4) than the 16x3 we always thought? If so, that would make it A LOT more like Xenos, and perhaps R600, which makes sense, if I recall their ALU setup correctly (Xenos is 16x4, one for stall, so effective 16x3). R520 was 16x1, so... I gotta ask... Does this mean a 16x4 is imminent, or am I just reading the information incorrectly?
If that's true, ATi really did mess with the definition of a pipeline.
I can hear the rumours now...R590 with 16 QUADS, 16 ROPs, 16 TMUs, and 64 pixel processors...Oh yeah, and GDDR4 (on a 80nm process.) You heard it here first. ;)