The Last Bout of ‘03 – NVIDIA’s GeForce FX 5700 Ultra
by Derek Wilson on October 23, 2003 9:30 AM EST, posted in GPUs
Architecture
There was a great deal of talk about why architectural decisions were made, but we will concern ourselves more with what exists rather than why this path was chosen. Every architecture will have its advantages and disadvantages, but understanding what lies beneath is a necessary part of the equation for developers to create efficient code for any architecture.
The first thing of note is NVIDIA's confirmation that 3dcenter.de did a very good job of wading through the patents that cover the NV3x architecture. We will be going into the block diagram of the shader/texture core in this description, but we won't be able to take quite as technical a look at the architecture as 3dcenter. Right now, we are more interested in bringing you the scoop on how the NV36 gets its speed.
For our architecture coverage, we will jump right into the block diagram of the Shader/Texture core on NV35:
As we can see from this diagram, the architecture is very complex. The shader/texture core operates on "quads" (in a SIMD manner). These quads enter the pipeline via the gatekeeper, which manages which ones need to go through the pipe next. This includes quads that have come back for a second pass through the shader.
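To make the quad idea concrete, here is a minimal Python sketch. It assumes a quad is a 2x2 block of neighboring pixels, which is the usual meaning of the term in graphics hardware but is not spelled out above. The function names `split_into_quads` and `run_on_quad` are purely illustrative, and the sketch models only the grouping and the "same operation on every pixel" idea, not the gatekeeper or the real pipeline.

```python
def split_into_quads(width, height):
    """Group pixel coordinates into 2x2 quads (assumed quad shape).

    At an odd edge the quad is returned with fewer than four pixels;
    this is a simplification made for the sketch.
    """
    quads = []
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            quad = [
                (x + dx, y + dy)
                for dy in (0, 1)
                for dx in (0, 1)
                if x + dx < width and y + dy < height
            ]
            quads.append(quad)
    return quads


def run_on_quad(quad, op):
    """Apply one operation to every pixel of a quad, SIMD-style."""
    return [op(pixel) for pixel in quad]
```

For example, `split_into_quads(4, 4)` yields four quads, and `run_on_quad` applies a single function to all pixels of one of them in lockstep.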
What happens in the center of this pipeline depends on the shader code being run or the texturing operations being done on the current set of quads. There are a few restrictions on what can go on in here that go beyond simply the precision of the data. For instance, NV35 has a maximum of 32 registers (fewer if higher precision is used), the core texture unit can apply at most two textures to a quad every clock cycle, and the shader and combiners cannot all read the same register at the same time. There are also limits on the number of triangles and quads that can be in flight at a time. These things have made it necessary for developers to pay more attention to what they are doing with their code than just writing code that produces the desired mathematical result. Of course, NVIDIA is going to try to make this less of a task through their compiler technology (which we will get to in a second).
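As a rough illustration of the texturing restriction, the following Python sketch computes a lower bound on the clocks needed to texture one quad, using only the "at most two textures per quad per clock" figure quoted above. The function name `min_texture_clocks` is hypothetical, and the real count will depend on everything else happening in the pipeline.

```python
import math

# From the article: at most two textures per quad per clock cycle.
MAX_TEXTURES_PER_CLOCK = 2


def min_texture_clocks(num_textures):
    """Lower bound on clocks needed to apply num_textures to one quad."""
    if num_textures < 0:
        raise ValueError("num_textures must be non-negative")
    return math.ceil(num_textures / MAX_TEXTURES_PER_CLOCK)
```

Under this simple model, a shader that samples five textures needs at least three clocks of texture work per quad, no matter how cheap its math is.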
Let us examine why the 5700 Ultra is able to pull out the performance increases we will be exploring shortly. Looking at the combiner stage of the block diagram, we can see that it can either run two combiners per clock or complete two math operations per clock. This is the same as on NV31, with a very important exception: pre-NV35 architectures implement the combiner in fx12 (12-bit integer), while NV35, NV36, and NV38 all have combiners that operate in full fp32 precision. This allows two more floating point operations to be done per clock cycle and is a very large factor in the increase in performance we have seen when we step up from NV30 to NV35 and from NV31 to NV36. In the end, the 5700 Ultra is a reflection of the performance delta between NV30 and NV38 for the midrange cards.
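The difference between an fx12 and an fp32 combiner can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats fx12 as signed fixed point with 10 fractional bits covering roughly [-2, 2), which is our reading of the format and not something stated above, and the names `quantize_fx12` and `combiner_fp_ops` are hypothetical. The second function simply restates the two-operations-per-clock figure from the text.

```python
# Assumption: fx12 is signed fixed point with 10 fractional bits,
# covering roughly [-2, 2). This layout is not given in the article.
FX12_FRACTION_BITS = 10
FX12_SCALE = 1 << FX12_FRACTION_BITS
FX12_MIN = -2.0
FX12_MAX = 2.0 - 1.0 / FX12_SCALE


def quantize_fx12(value):
    """Clamp and round a value to the assumed fx12 grid."""
    clamped = max(FX12_MIN, min(FX12_MAX, value))
    return round(clamped * FX12_SCALE) / FX12_SCALE


def combiner_fp_ops(clocks, fp32_combiner):
    """Floating point operations the combiner stage adds over some clocks.

    Two per clock with an fp32 combiner (NV35, NV36, NV38 per the
    article), none when the combiner works in fx12.
    """
    return 2 * clocks if fp32_combiner else 0
```

Small values such as 0.0004 collapse to zero and anything above the range is clamped, which is why an fx12 combiner cannot stand in for floating point shader math, while an fp32 combiner can take on that work.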
If you want to take a deeper look at this technology, the previously mentioned 3dcenter article is a good place to start. From here, we will touch on NVIDIA's Unified Compiler technology and explain how NVIDIA plans on making code run as efficiently as possible on their hardware with less hand optimization.
114 Comments
Anonymous User - Friday, October 24, 2003
these anonymous forusm are always a hoot.
Anonymous User - Friday, October 24, 2003
Derek takes it in the pooper
Anonymous User - Friday, October 24, 2003
#62 making 60k a year is still below the threshhold of being able to spend money on whatever you want and not giving a f&5k....if you made 1mil a year I highly doubt you wouldn't drop the $500 on the best card without thinking twice. So don't call other's dumb for buying video cards...maybe that's how they want to spend their money....If you saved some trips to the "Blue Oyster" I'm sure you'd have a $500 card as well.
Anonymous User - Friday, October 24, 2003
The message is damn clear, nvidia is using DDR2 memory to fill in the performance gaps.. Nvidia shuckhs!
Anonymous User - Friday, October 24, 2003
doesnt anon mean something in french?
Live - Friday, October 24, 2003
Anon postings should be disabled. If people dont have the energy to register the energy awarded to there post is likely to be the same minimal amount.
Anonymous User - Friday, October 24, 2003
#64, that makes perfect sense, just don't visit AnandTech. After all, it's not like you've just given them a page impression. lol
Seriously, AnandTech will never lose readers or respect as long as they keep doing what they're doing. The critics here that break down every minute detail about what this review did "wrong" aren't gamers. If they were, they would realize that the IQ "differences" are so minuscule it's like trying to argue that nForce2 is incredibly faster than KT600, when the reality is that nForce2's attractiveness comes from its superior sound (APU), overclockability, and stability, most certainly not its “earth shattering” performance. nForce2’s better performance is simply a bonus to any half-intelligent hardware enthusiast, not its main selling point.
Anonymous User - Friday, October 24, 2003
watchu' talkin'bout willis?!
Anonymous User - Friday, October 24, 2003
Look, some of us see that these reviews seem to no longer reflect reality. What to do? Quit visiting the site, quit giving AT page impressions. Find reviews elsewhere; god knows there are enough other hardware sites to choose from.
Anonymous User - Friday, October 24, 2003
stop crying about the IQ. as #62 said "ESPECIALLY fps games where constant movement makes it almost impossible to notice the IQ differences". i would add - the difference between fx5950u and radeon 9800XT.
i spent about 1/3 of the last 10 years playing games. i can call myself a GAMER. i want to play my games at at least 55-60 FPS and nothing else matters. i got radeon 9600pro. that's what i can affort. if fx5600u was faster i would've got it instead. brand doesn't matter if i got 60FPS at 1024x768.