ATI's Radeon 8500 & 7500: A Preview
by Anand Lal Shimpi on August 14, 2001 2:54 AM EST - Posted in GPUs
The Chip
The R200 is a 0.15-micron chip that consists of 60 million transistors. Just as NVIDIA claimed during the introduction of the GeForce3 (57 million transistors), ATI is quick to point out that this makes the chip more "complex" than a Pentium III processor. To satisfy the CPU designers out there: just because a chip is made up of millions of transistors doesn't mean that it's more complex or more powerful than another. Remember that cache is one of the most transistor-hungry things you can put on a CPU, but a CPU with a lot of cache is not necessarily very powerful. That said, there is a reason why the R200 features exactly twice as many transistors as the Rage6C/R100 core it is replacing, but we'll get to that later.
The 0.15-micron core is clocked at 250MHz, a full 37% higher than the 183MHz clock of its 0.18-micron predecessor. Unlike the R100 core, the R200 features four rendering pipelines (instead of two), giving it a 1 Gigapixel/s fill rate vs. the 366 Megapixels/s fill rate of the R100.
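The pixel fill rate figures above follow from simple arithmetic. As a rough sketch, assuming each rendering pipeline outputs one pixel per clock (the function and variable names here are illustrative, not anything from ATI):

```python
def pixel_fill_rate_mpixels(core_mhz, pipelines):
    """Peak pixel fill rate in megapixels/s, assuming one pixel per pipeline per clock."""
    return core_mhz * pipelines

# Clock speeds and pipeline counts as quoted in the text.
r100_fill = pixel_fill_rate_mpixels(183, 2)   # original Radeon (R100)
r200_fill = pixel_fill_rate_mpixels(250, 4)   # R200

# Core clock increase of the R200 over the R100, in percent.
clock_gain_pct = (250 / 183 - 1) * 100

print(r100_fill, r200_fill, round(clock_gain_pct))
```

These are theoretical peaks; they say nothing about what the cards sustain in real games.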
One of the major "features" ATI touted with the original Radeon was its three texture units per pipeline. This meant that the original Radeon could apply three textures to a single pixel in a single pass and in a single clock cycle. However, a lack of developer support took a lot of the wind out of the feature. The only game we ever tested that could take advantage of it was Serious Sam, courtesy of Croteam's eagerness to implement as many features in their engine as possible. Because of this, the R200 only has two texture units per pipeline, which is identical to what the GeForce3 offers.
To clear up the confusion, the R200 cannot apply six textures in a single clock; it doesn't have six texture units. It can, however, apply six textures in a single pass, which will serve the R200 very well in future games. To quote John Carmack on the upcoming Doom 3 game:
"The standard lighting model in DOOM, with all features enabled, but no custom shaders, takes five passes on a GF1/2 or Radeon, either two or three passes on a GF3, and should be possible in a clear + single pass on ATI's new part."
Being able to apply more textures in a single pass is much more important in this case than being able to apply three textures per pixel every clock. This is akin to the transition from rendering one texture per pass to rendering two per pass in the earlier days of 3D acceleration.
Although fill rate numbers are very misleading, the end result of this is a 2 Gigatexel/s fill rate for the R200 compared to a ~1.1 Gigatexel/s fill rate for the original Radeon. These numbers are misleading because memory bandwidth limitations almost always prevent the cards from reaching these fill rates.
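The texel fill rates can be reproduced the same way as the pixel rates. This is a minimal sketch, assuming every texture unit in every pipeline delivers one texel per clock; the helper name is made up for illustration:

```python
def texel_fill_rate_mtexels(core_mhz, pipelines, texture_units_per_pipeline):
    """Peak texel fill rate in megatexels/s, assuming one texel per texture unit per clock."""
    return core_mhz * pipelines * texture_units_per_pipeline

# Original Radeon: 183MHz, two pipelines, three texture units each.
r100_texels = texel_fill_rate_mtexels(183, 2, 3)
# R200: 250MHz, four pipelines, two texture units each.
r200_texels = texel_fill_rate_mtexels(250, 4, 2)

print(r100_texels, r200_texels)
```

Dropping from three texture units to two per pipeline still nearly doubles the peak, because the pipeline count doubles and the clock rises.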
In order to cope with the increased fill rate abilities of the chip, ATI had to increase the memory clock as well. Again, contrary to popular belief, the R200 does not implement a drastically different memory controller from the original Radeon. The chip still implements a single channel 128-bit DDR memory bus (some of ATI's leaked presentations were misleading in their listing of a dual channel 128-bit memory bus). This is virtually identical to the 128-bit memory controller in the original Radeon in that it does not make use of any crossbar-like features such as those of the GeForce3. The only real difference is that the fetch size has been increased from 128 bits to 256 bits. Since the controller interfaces with DDR SDRAM, 256 bits of data are transferred every clock.
After much internal testing, ATI realized that they had overestimated the need for granularity in memory accesses on the original Radeon. Only fetching 128 bits of data at a time actually offered lower performance than a single 256-bit request. ATI attributes this to the nature of their pixel cache; unfortunately, we could not find out much information about it. This is the exact opposite of NVIDIA's GeForce3, which benefits from smaller memory accesses (32-bit accesses across each of the four independent memory controllers). The only likely conclusion here is that the GeForce3 has a smaller pixel cache or one that is better suited for smaller data fetches. This just goes to show you how different the two architectures are. The memory controller runs the memory at 275MHz DDR, which offers 8.8GB/s of peak memory bandwidth.
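The 8.8GB/s figure follows from the bus width and the DDR clock. Here is a small sketch of the calculation, assuming 1GB = 10^9 bytes and two transfers per clock for DDR; the function name is illustrative, and the 183MHz value is the original Radeon's memory clock:

```python
def peak_bandwidth_gb_s(mem_mhz, bus_bits, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (1GB = 10**9 bytes; DDR moves data twice per clock)."""
    bytes_per_clock = (bus_bits // 8) * transfers_per_clock
    return mem_mhz * 1e6 * bytes_per_clock / 1e9

# R200: 128-bit DDR bus at 275MHz, i.e. 256 bits (32 bytes) per clock.
r200_bw = peak_bandwidth_gb_s(275, 128)
# Original Radeon: same bus width at 183MHz DDR, for comparison.
r100_bw = peak_bandwidth_gb_s(183, 128)

print(round(r200_bw, 2), round(r100_bw, 2))
```

As with fill rate, this is a theoretical peak rather than sustained throughput.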
An important upgrade that the R200 gets is an improved implementation of ATI's bandwidth saving technology: HyperZ. We'll talk about this later.
The second chip being announced today is the RV200. In spite of the name, you should think of the RV200 as a 0.15-micron Radeon because, essentially, that's what it is. The RV200 has the same features as the original Radeon with two changes: the memory controller from the R200 and the display engine from the RV100 (Radeon VE). The memory controller from the R200 gives the RV200 the 256-bit memory accesses and nothing more; it's still a 128-bit wide DDR memory interface. The RV100's display engine gives the RV200 HydraVision support, which is ATI's dual display solution. HydraVision support is also present on the R200 core.
The 0.15-micron manufacturing process gives the RV200 some clock headroom over the original Radeon. Instead of running at a synchronous 183/183MHz DDR (core/memory), the RV200 operates at 270/230MHz DDR (core/memory). If you remember the Radeon SE that you heard about all over the web, the RV200 is basically that part under a different card name.