AMD Phenom Preview: Barcelona Desktop Benchmarks
by Anand Lal Shimpi on September 10, 2007 12:03 AM EST - Posted in CPUs
The Methodology
We've been asking AMD for months now to let us benchmark Barcelona and Phenom, and for months we've gotten the same answer: not yet. When asked why, AMD would always give us some terrible lie about how it was for competitive reasons, but when we actually put our hands on Barcelona at Computex we realized that these chips were simply not ready.
The Barcelona launch is finally upon us, and we've got to wait another 45 - 60 days before we'll be able to bring you a review of Phenom. Well, not exactly. Back when the Opteron launched, AMD was in a very similar situation to the one it's in today: AMD needed K8 to remain competitive, and it had been delayed so much that we were beginning to wonder if AMD would ever get the chip out. When the K8 finally launched, it was server-only, but we took one of those server-only motherboards and ran a bunch of desktop tests on it to predict forthcoming desktop performance.
We cracked open the Barcelona server and made some modifications. While the on-board ATI ES1000 graphics is sufficient for use in a server, it would be too limiting for our desktop benchmarks. Luckily the Supermicro motherboard in the system had a plethora of PCIe slots; we just needed to gain access to them.
The PCIe Riser we removed from the system
We pulled out the PCIe riser card which plugs into the motherboard's sole x16 slot and divides it into a pair of x8s, then we modified a GeForce 8800 GTX by removing the backplate cover so we could just stick it into the open server.
The modded 8800
The 8800 GTX installed; the server is not really intended to be used like this
The end result was, as Johan put it, us using "such a beautiful, noble machine for such plebeian activities". We couldn't help it. While AMD has already contacted us about Phenom briefings, we couldn't wait that long to get an idea of what we can expect from AMD on the desktop.
We needed an external PSU to power the graphics card; the server didn't have any PCIe power connectors
Barcelona is currently limited to DDR2-667; we were unsuccessful with attempts to run the memory any faster. Like all other MP Opterons, Barcelona requires the use of registered DDR2 memory, which is inherently slower than the unbuffered stuff we use on desktops. Because of these limitations we refrained from running any comparative benchmarks against desktop Athlon 64 X2s. Instead, we chose to run a single quad-core Opteron in our server platform against a pair of dual-core Opterons to simulate Phenom vs. K8 on the desktop.
The Opteron server, 2 CPUs, 8 DDR2-667 DIMMs
Keep this in mind as you're looking at these results: all we're offering is an idea of the minimum improvement Phenom should deliver over an identically clocked Athlon 64 X2. As Phenom is a more data-hungry CPU than its predecessor, it will rely more on having a faster memory subsystem, so the performance improvement could be even greater when we measure it on the desktop. That being said, at least this investigation lets us set expectations within some amount of reason.
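For readers who want to turn a pair of clock-for-clock benchmark scores into the kind of "how much faster" figure discussed here, the arithmetic can be sketched as below. This is only an illustration: the function name `percent_faster` and the sample scores are hypothetical and are not taken from our results.

```python
def percent_faster(new_score, old_score, higher_is_better=True):
    """Return how much faster the new chip is than the old one, in percent.

    For throughput-style scores (e.g. frames per second) higher is better.
    For elapsed times (e.g. seconds to finish an encode) lower is better,
    so the ratio is inverted.
    """
    if new_score <= 0 or old_score <= 0:
        raise ValueError("scores must be positive")
    if higher_is_better:
        ratio = new_score / old_score
    else:
        ratio = old_score / new_score
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the calculation.
    print(round(percent_faster(115.0, 100.0), 1))                        # frame rate
    print(round(percent_faster(80.0, 100.0, higher_is_better=False), 1)) # encode time
```

Because both platforms here run at the same clock speed and on the same memory, a figure computed this way reflects the core and cache changes rather than frequency.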
70 Comments
MadBoris - Monday, September 10, 2007 - link
hmm, especially if it is only @cas5, as mentioned above.
It will be interesting to see if it yields anything more than just a few percent, as to scaling, and if benefits compound per socket.
As to one socket and 4 cores I don't really envision it being that much more than a few percent, but then again, I'm not investing any thought or speculation to try and figure out what will be answered when it actually matters and HW is available.
Major point for me is, being able to OC a q6600($280) to 3.2GHz - 3.4GHz on air is going to be real stiff competition for AMD's Phenom, as to my purchasing decisions, which is all I am concerned about mainly.
Also I believe all people's talk about "true" quad is going to fall a bit flat for the majority of applications/games in real world comparisons with Kentsfield. Because already anyone that is interested to research it can see that the cache/bus penalties in scaling from 2 to 4 cores are basically nonexistent on applications that actually 'fully leverage' all 4 cores. Some apps will benefit, but I expect this to come to light before long and people will see that the penalty of 2 cores in one (Intel Quad), was more speculation, than actual reality, for 'most' consumer applications and games.
I do like AMD's advances but we seriously need more frequency, CPI cannot be overlooked.
duploxxx - Monday, September 10, 2007 - link
if AMD is already able to show multiple phenom systems on 3.0GHZ without additional cooling (just boxed heatpipe cooler) then i wouldn't be too worried about oc performance of k10
ilkhan - Monday, September 10, 2007 - link
15% over K8 is not going to be enough if it launches at (or at least doesn't overclock easily to) 3.2Ghz. At the 2.5 indicated here, yorkfield@3.2+ is going to eat agena for lunch, while being more profitable for intel than agena can hope to be for AMD.
Pity.
JackPack - Monday, September 10, 2007 - link
Based on these numbers, consumers are likely going to stick with Intel quads.
Clock for clock, Kentsfield was often >30% faster than Quad FX. Barcelona being 15% faster than K8 is reasonable but it's clearly not going to touch Penryn/Yorkfield.
duploxxx - Monday, September 10, 2007 - link
Although it is nice to see what anand tried to put here on electronic paper, it can't be compared to the real phenom in a few months.
If you want to know why, check Anand's memory review of a year ago and check how well k8 and also k10 is scaling with better/faster memory.
in a barcelona rig you have reg 667@cas5.
so people who are already making conclusions on these benches, one reply: too early.
JackPack - Monday, September 10, 2007 - link
This isn't K8 though. The L3 in Barcelona is going to make it less sensitive to memory bandwidth and latency.
Regs - Monday, September 10, 2007 - link
Memory hits and misses (latency) have nothing to do with the L3. The L3 is there as a buffer for the information being proportion to the 4 cores.
JackPack - Monday, September 10, 2007 - link
Look up the term "memory hierarchy."
Regs - Tuesday, September 11, 2007 - link
what do you think pulls the data into the L3? God?
JackPack - Tuesday, September 11, 2007 - link
It's called prefetching. The data is in the L3 before the CPU needs it, reducing memory traffic and latency.
Not only that, but Barcelona has a L3 latency of 20ns. To get data from the main memory, it has to go through all levels of cache. When you look at the cumulative latency of the memory hierarchy, the one or two cycle penalty of RDDR2 is trivial.
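A rough way to put numbers on the cumulative-latency argument in this thread is a simple two-level model: every access that misses L2 pays the L3 lookup, and only L3 misses go on to pay the DRAM latency. The sketch below is just that, a sketch. The 20 ns L3 figure is the one quoted above, and the 3 ns cycle follows from DDR2-667's 333 MHz clock; the L3 hit rate and the 60 ns DRAM latency are made-up placeholders, not measurements, and the function names are hypothetical.

```python
DDR2_667_CYCLE_NS = 3.0  # DDR2-667 uses a 333 MHz clock, so one cycle is about 3 ns
L3_LATENCY_NS = 20.0     # L3 latency figure quoted in the comment above


def avg_latency_ns(l3_hit_rate, dram_ns, l3_ns=L3_LATENCY_NS):
    """Average latency of an access that missed L2.

    Every such access pays the L3 lookup; only the fraction that also
    misses L3 pays the DRAM latency on top of it.
    """
    if not 0.0 <= l3_hit_rate <= 1.0:
        raise ValueError("l3_hit_rate must be between 0 and 1")
    return l3_ns + (1.0 - l3_hit_rate) * dram_ns


def registered_penalty_pct(l3_hit_rate, dram_ns, extra_cycles):
    """Percent increase in average latency from extra registered-DIMM cycles."""
    base = avg_latency_ns(l3_hit_rate, dram_ns)
    slow = avg_latency_ns(l3_hit_rate, dram_ns + extra_cycles * DDR2_667_CYCLE_NS)
    return (slow - base) / base * 100.0


if __name__ == "__main__":
    # Placeholder assumptions: 50% L3 hit rate, 60 ns unbuffered DRAM latency.
    for cycles in (1, 2):
        print(cycles, round(registered_penalty_pct(0.5, 60.0, cycles), 1))
```

The higher the L3 hit rate, the smaller the share of accesses that ever see the registered-memory penalty, which is the point being argued; how large that hit rate is in real desktop workloads is exactly what this model cannot tell you.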