NVIDIA's nForce2 Part II: Diving Deeper
by Anand Lal Shimpi on October 21, 2002 4:05 PM EST - Posted in CPUs
DASP Take Two
On paper, one of the most attractive features of the original nForce was its Dynamic Adaptive Speculative Pre-Processor (DASP). We explained the idea behind DASP in our original nForce2 piece:
"As you will remember from our nForce Computer 2001 Preview, NVIDIA's DASP acts much like the hardware prefetch logic found on Pentium 4s and Athlon XP processors. The logic makes educated guesses about future memory accesses based on where in main memory data was recently accessed from as well as how frequently it was accessed in the past. After making these guesses the logic pre-fetches the data it thinks will be requested into its buffer; should the data be required by the CPU then access latency is reduced by tens of nanoseconds by not requiring a memory access. If the data is never requested by the CPU then it will eventually get replaced in the DASP buffer by other pre-fetched data without incurring a performance hit or gain."
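The behavior described in that quote can be illustrated with a toy model. What follows is a minimal sketch, not NVIDIA's design: it assumes a simple next-line stream detector in place of DASP's unpublished heuristics, and the names `StreamPrefetcher`, `LINE_SIZE`, `buffer_lines` and `depth` are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per memory line (assumed for illustration)


class StreamPrefetcher:
    """Toy sequential-stream prefetcher with a small buffer.

    Hypothetical model: the real DASP heuristics are not public.
    """

    def __init__(self, buffer_lines=8, depth=2):
        self.buffer = OrderedDict()  # prefetched line numbers, oldest first
        self.buffer_lines = buffer_lines
        self.depth = depth           # how many lines ahead to prefetch
        self.last_line = None
        self.hits = 0
        self.misses = 0

    def _prefetch(self, line):
        if line in self.buffer:
            self.buffer.move_to_end(line)  # refresh an existing entry
            return
        if len(self.buffer) >= self.buffer_lines:
            # Unused data is simply replaced by newer prefetches.
            self.buffer.popitem(last=False)
        self.buffer[line] = True

    def access(self, address):
        """Serve one CPU read; return True if the buffer already held it."""
        line = address // LINE_SIZE
        hit = line in self.buffer
        if hit:
            self.hits += 1
            del self.buffer[line]  # data is handed to the CPU
        else:
            self.misses += 1
        # Educated guess: two consecutive lines suggest a stream.
        if self.last_line is not None and line == self.last_line + 1:
            for ahead in range(1, self.depth + 1):
                self._prefetch(line + ahead)
        self.last_line = line
        return hit


pf = StreamPrefetcher()
for addr in range(0, 10 * LINE_SIZE, LINE_SIZE):
    pf.access(addr)
print(pf.hits, pf.misses)
```

In this sequential walk the first two reads miss while the stream is being detected, and the remaining eight are served from the buffer. A scattered access pattern never triggers the detector, so it sees neither a gain nor a loss from the buffer itself.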
Unfortunately, the DASP in the original nForce had a handful of problems, the two most significant being that:
1) Prefetches got in the way of "real work", meaning they took bandwidth away from the CPU when it needed it, and
2) The latency reduction resulting from a correctly predicted prefetch wasn't as high as it could have been.
With the second-generation DASP in nForce2, NVIDIA learned from their mistakes with the original nForce and improved things considerably.
The nForce2 DASP now has better prefetching intelligence, allowing it to predict more accurately which data will be used next. The improvement comes partially from a better prefetching algorithm for detecting when to prefetch data streams.
Even more important to the prefetching intelligence, however, is that the memory arbiter has an improved algorithm for dealing with both DASP prefetches and memory requests coming from the CPU. The improvements in the way the memory arbiter handles these two types of memory requests result in nForce2 doing a better job of keeping prefetches out of the way of "real work" (memory requests coming from the CPU).
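One way to picture an arbiter that keeps prefetches out of the way of "real work" is a strict-priority scheme, sketched below. NVIDIA has not described the actual arbitration policy, so strict priority is an assumption here, and `MemoryArbiter`, `submit` and `next_request` are hypothetical names.

```python
from collections import deque


class MemoryArbiter:
    """Toy arbiter: CPU (demand) requests always go before prefetches.

    Assumed policy for illustration; not the documented nForce2 algorithm.
    """

    def __init__(self):
        self.demand = deque()    # requests coming from the CPU
        self.prefetch = deque()  # speculative requests from the prefetcher

    def submit(self, address, is_prefetch=False):
        queue = self.prefetch if is_prefetch else self.demand
        queue.append(address)

    def next_request(self):
        """Pick the request to issue next, or None if both queues are empty."""
        if self.demand:
            return ("demand", self.demand.popleft())
        if self.prefetch:
            return ("prefetch", self.prefetch.popleft())
        return None


arbiter = MemoryArbiter()
arbiter.submit(0x100, is_prefetch=True)
arbiter.submit(0x200)
arbiter.submit(0x140, is_prefetch=True)
arbiter.submit(0x240)
while (request := arbiter.next_request()) is not None:
    print(request[0], hex(request[1]))
```

Even though a prefetch was queued first, both CPU requests are issued before either prefetch, so speculative traffic only uses bandwidth the CPU is not asking for. A real arbiter would likely also need to stop prefetches from starving indefinitely, which this sketch ignores.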
The second improvement NVIDIA made to their DASP comes from sheer optimization of the nForce2 IGP/SPP's internal datapaths. The result of these datapath-level optimizations (based on data collected from nForce performance tests) is a significantly larger reduction in latency when the DASP correctly prefetches data into its cache.
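A simple way to see how the two improvements interact is the usual expected-latency model. The symbols are our own assumptions and do not come from NVIDIA: $h$ is the fraction of memory requests served from the DASP buffer, $L_{\text{hit}}$ is the latency of such a request, and $L_{\text{mem}}$ is the latency of a full trip to main memory.

$$
\bar{L} = h \, L_{\text{hit}} + (1 - h) \, L_{\text{mem}}
$$

Under this model, better prediction raises $h$ while the datapath optimizations lower $L_{\text{hit}}$, and both changes reduce the average latency $\bar{L}$. The model ignores any delay that prefetch traffic adds to CPU requests, which is the effect the improved arbiter is meant to keep small.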
The combination of these two improvements to NVIDIA's DASP results in very competitive performance and, in some cases, a significant performance boost when using DualDDR. NVIDIA attributes the extremely large performance gains in SPECviewperf to the improved DASP. But if SPECviewperf were the only situation in which the second-generation DASP improved performance, it wouldn't be all that useful, so where else will the new DASP increase performance?
Unfortunately, most of the situations where DASP and DualDDR will really make a performance difference are difficult to quantify; it's a problem we, as well as NVIDIA, have had a tough time solving. SPECviewperf is a good example of a situation where you're bound by the ability of the memory controller to fulfill requests, which is where nForce2's dual memory controllers can come in handy.
The chipset's success in SPECviewperf and the NVIDIA architecture behind it lead us to believe that nForce2 (and its successors) would do very well in workstation and server applications. We're currently putting our theories to the test in the server arena, but the results were not ready in time for publication in this article; who knows, there may even be a Part III in the works if things turn out right.
What's even more interesting is that NVIDIA's improved DASP would seem to be the perfect companion to Intel's Hyper-Threading technology. In situations where there isn't a lot of locality between concurrently executing threads, the number of memory requests will increase. The more memory requests there are, the more likely DASP is to generate positive results, and the potential for the crossbar-based memory controllers to shine increases as well. Only time will tell how long it will be before nForce meets Intel outside of the Xbox, but in theory it would be a great fit.