AMD's 65nm Preview Part 2 - The Plot Thickens (Updated with Information from AMD)
by Anand Lal Shimpi on December 21, 2006 12:12 AM EST - Posted in CPUs
Brisbane Performance Issues Demystified: Higher Latencies to Blame
As you'll remember from Part 1, for some reason, our 65nm Athlon 64 X2 5000+ performed slower than our 90nm part. We had contacted AMD before publication of the article but didn't receive a response until after we were well underway with Part 2. AMD's explanation for the reduced performance? Higher memory latencies.
We wanted to find out exactly how much higher, so we turned to CPU-Z's latency benchmark for a quick indication of how things had changed.
CPU | CPU-Z Latency (8192KB, 128-byte) |
AMD Athlon 64 X2 5000+ (65nm) | 122 cycles (46.92 ns) |
AMD Athlon 64 X2 5000+ (90nm) | 121 cycles (46.54 ns) |
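The nanosecond figures in the table follow directly from the cycle counts and the core clock. As a quick sanity check, here is a minimal Python sketch of that conversion. The 2.6GHz clock is our assumption (it is the value the table's own numbers imply), and the function name is purely illustrative.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds.

    One cycle lasts 1 / clock_ghz nanoseconds, so the latency in ns
    is simply cycles / clock_ghz.
    """
    return cycles / clock_ghz


# Assumed core clock for both 5000+ parts, inferred from the table above.
CLOCK_GHZ = 2.6

# Cycle counts as reported by CPU-Z in the table above.
results = {
    "Athlon 64 X2 5000+ (65nm)": 122,
    "Athlon 64 X2 5000+ (90nm)": 121,
}

for cpu, cycles in results.items():
    print(f"{cpu}: {cycles} cycles = {cycles_to_ns(cycles, CLOCK_GHZ):.2f} ns")

# The gap between the two parts is a single cycle.
print(f"Difference: {cycles_to_ns(1, CLOCK_GHZ):.2f} ns")
```

At the same clock speed, a one-cycle difference works out to well under half a nanosecond.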
A single-cycle increase in memory access latency, or roughly 0.4ns, is a slight increase but not enough to cause the sort of performance deltas we saw in Quake 4 and Half-Life 2; something else was amiss. Luckily, another metric reported by CPU-Z's latency test helped us understand the cause of the poor performance: L2 cache access latency.
CPU | CPU-Z L2 Cache Latency | ScienceMark 2.0 L2 Cache Latency |
AMD Athlon 64 X2 5000+ (65nm) | 20 cycles | 20 cycles |
AMD Athlon 64 X2 5000+ (90nm) | 12 cycles | 12 cycles |
Updated - 1/5/07: Although AMD previously did not mention any issues with our findings, we were contacted today and informed that the latency information both ScienceMark and CPU-Z produced is incorrect. The Brisbane core's L2 latency should be 14 cycles, up from 12 cycles and not 20 cycles. This would help explain the relatively low impact on application performance that we've seen across the board. We are still waiting to hear back from AMD on a handful of other issues regarding Brisbane and will update you as soon as we have more information.
The original K8 core, in both 130nm and 90nm flavors, had a 12-cycle L2 cache. With Brisbane, as reported by both CPU-Z and ScienceMark, 65nm K8 now has a 20-cycle L2 cache. Generally speaking you move to a higher latency cache if you're planning on introducing a larger cache size, but a quick glance at AMD's roadmaps doesn't show anything larger than a 1MB L2 per core for the next year. The argument for higher clock speeds isn't valid either as the highest clock speed on AMD's roadmaps thus far is only 3.2GHz.
Luckily the performance impact of the higher latency L2 cache isn't noticeable in all applications, thanks to the K8's on-die memory controller, but make no mistake - the new core is slower. We couldn't figure out why AMD made the change and with most of our key AMD contacts on vacation due to the holidays, we still have no official response on the matter. Rest assured that if/when we learn more we will let you know.
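To get a feel for why a slower L2 hurts some applications more than others, the textbook average memory access time (AMAT) model is a useful back-of-the-envelope tool. The Python sketch below is not how AMD or the benchmark tools measure anything; it is a simplified model, and the L1 hit time and all miss rates in it are hypothetical values we picked for illustration. Only the L2 latencies (12, 14 and 20 cycles) and the 122-cycle memory latency come from the figures above.

```python
def amat_cycles(l1_hit: float, l1_miss_rate: float,
                l2_latency: float, l2_miss_rate: float,
                mem_latency: float) -> float:
    """Average memory access time, in cycles, for a two-level cache.

    Every access pays the L1 hit time. The fraction that misses L1 also
    pays the L2 latency, and the fraction of those that misses L2 pays
    the main memory latency on top.
    """
    return l1_hit + l1_miss_rate * (l2_latency + l2_miss_rate * mem_latency)


L1_HIT = 3          # assumed L1 hit time in cycles (illustrative)
L2_MISS_RATE = 0.2  # hypothetical fraction of L2 accesses that go to memory
MEM_LATENCY = 122   # cycles, from the CPU-Z memory latency table

# Two hypothetical workloads: one that rarely leaves L1, one that leans on L2.
for l1_miss_rate in (0.02, 0.10):
    baseline = amat_cycles(L1_HIT, l1_miss_rate, 12, L2_MISS_RATE, MEM_LATENCY)
    for l2_latency in (12, 14, 20):
        amat = amat_cycles(L1_HIT, l1_miss_rate, l2_latency,
                           L2_MISS_RATE, MEM_LATENCY)
        slowdown = (amat / baseline - 1) * 100
        print(f"L1 miss rate {l1_miss_rate:.0%}, L2 {l2_latency} cycles: "
              f"AMAT {amat:.2f} cycles (+{slowdown:.1f}% vs 12-cycle L2)")
```

In this model the extra L2 cycles are only paid on L1 misses, so a workload that mostly hits L1 barely notices the change while one that leans heavily on L2 sees a larger penalty.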
Updated: AMD has given us the official confirmation that L2 cache latencies have increased, and that it purposefully did so in order to allow for the possibility of moving to larger cache sizes in future parts. AMD stressed that this wasn't a pre-announcement of larger cache parts to come, but rather a preparation should the need be there to move to a vastly larger L2. Thankfully the performance delta isn't huge, at least in the benchmarks that we saw, so AMD's decision isn't too painful - especially as it comes with the benefit of a cooler running core that draws less power; ideally we'd like the best of all worlds but we'll take what we can get. Note that none of AMD's current roadmaps show any larger L2 parts (other than the usual 2x1MB offerings), which tells us one of two things: either AMD has some larger L2 parts that it's planning on releasing or AMD is being completely honest with the public in saying that the larger L2 parts will only be released if necessary.
52 Comments
Spoelie - Thursday, December 21, 2006
This is not the first time this has happened. It may be easy to forget, but do you guys remember the Thoroughbred?
Thoroughbred A was the first 180nm to 130nm shrink and had a hard time reaching the speeds the mature 180nm cores were getting. It wasn't till AMD added another layer to the core (Thoroughbred B) that we saw the expected speedups from a die shrink.
PetNorth - Thursday, December 21, 2006
Anand:
Why don't you set the voltage manually, to really know what the improvement is with the 65nm transition?
1.30v to compare it with the 5000+ 90nm, and 1.25v to compare it with the 4600+ EE 90nm.
It would be a good thing IMO.
yyrkoon - Thursday, December 21, 2006
There are already people who believe that odd numbered multipliers offer worse performance compared to even numbered multipliers. I can't help but wonder why AMD chose to start implementing floating point multipliers now. The first thing that comes to mind is maybe to refine their pricing? Although I've never really noticed much performance difference (if any) using odd vs even numbered multipliers, I cannot help but wonder if floating point multipliers will play a factor in performance.
Regs - Thursday, December 21, 2006
AMD has been taking baby steps in their innovation merits. Ever since the IMC and the enhancements from K7 to K8 it seems like they improve little by little. I hope this gives them a rude awakening to how competitive the market can or could be in future. If they did it before they can do it again.
As for the transition to 65nm, it was no surprise that these parts could not overclock very well. The K8 is showing its age and I think there are no more ways you can breathe life back into it, especially when Core Duo is out in the market.
mino - Thursday, December 21, 2006
Why awakening, and why rude? The fact is AMD kept PARITY with Intel on power AND performance in the lower end with a 90nm!!! part, with Intel being at 65nm for a year already!
In other words, AMD's 90nm process is FAR better than Intel's ever was. The same happened with 130nm. Two words: SOI, APM.
No confusion, all this means no one should evaluate AMD vs. Intel on a process_used basis. Simply put, as of now (at stock) Intel rules on perf & power while AMD rules on idle_power and price (up to the 4200+/E6300 combo).
IntelUser2000 - Thursday, December 21, 2006
Intel bins Core 2 Duo by power consumption.
xsilver - Thursday, December 21, 2006
just to clarify further: all e6600's will have lower stock voltages than e6400's, and all e6400's will have lower stock voltages than e6300's?
at both idle and load?
how successful are the conroes at undervolting?
Accord99 - Thursday, December 21, 2006
Pretty good, my week 25 E6600 is stable at 2.6GHz/1.1v (My P5B-dlx doesn't go any lower) with dual-P95. The heat output is easily cooled passively by a Scythe Ninja.
Here's a thread, one person has a E6600 that does 2.4@/~1v
http://www.xtremesystems.org/forums/showthread.php...
blackbrrd - Thursday, December 21, 2006
I have seen an E6600 running at 1,0v at load... It was obviously very cool running :)
My E6400 is running at 1,15v at idle (2133MHz) and 1,25v at load (2133MHz)
Power saving features were off in both instances...
haugland - Thursday, December 21, 2006
AMD wins in one aspect...
If you really consider power consumption to be important, it is much more important to look at idle power consumption than power consumption at full load. Most business PCs idle a lot of the time, and AMD's CPUs are much better at saving power at idle.
EIST was designed for the P4, and for a 3+ GHz P4 it makes sense to drop the multiplier to 6. However, when the E6300 normally runs at a multiplier of 7, you don't get much of a power saving by dropping the multiplier to 6. AMD's C'n'Q allows for much lower settings.