Athlon 64 X2: New Memory Dividers and Multitasking Performance
by Anand Lal Shimpi on August 12, 2005 2:00 PM EST - Posted in CPUs
Multitasking Office Performance
Our first test is the scripted Multitasking Winstone 2004 test. We’ve used this test in the past, and it serves as an excellent example of relatively light, general-use multitasking performance. The test consists of three parts, all of which are described below:
"This test uses the same applications as the Business Winstone test, but runs some of them in the background. The test has three segments: in the first, files copy in the background while the script runs Microsoft Outlook and Internet Explorer in the foreground. The script waits for both foreground and background tasks to complete before starting the second segment. In that segment, Excel and Word operations run in the foreground while WinZip archives in the background. The script waits for both foreground and background tasks to complete before starting the third segment. In that segment, Norton AntiVirus runs a virus check in the background while Microsoft Excel, Microsoft Project, Microsoft Access, Microsoft PowerPoint, Microsoft FrontPage, and WinZip operations run in the foreground."
We’ve been playing around with multitasking performance tests for several months now, and have found that even scripted tests like Multitasking Winstone require a lot of work before they produce repeatable results. The problem mostly boils down to making sure that all of the tasks executing simultaneously do so in the exact same manner, every single time, across all platforms, CPUs and other configuration changes. Honestly, doing so is very difficult, and it often requires far more benchmarking runs than we are used to performing for most of our other tests. But at the end of the day, it is possible to get results that make sense, and after spending a great deal of time with Multitasking Winstone and our own home-brew tests, we have done just that.
Multitasking Winstone | DDR400 | DDR480 | % Improvement |
Test 1 | 2.21 | 2.37 | 7.2% |
Test 2 | 2.94 | 3.05 | 3.7% |
Test 3 | 4.82 | 4.88 | 1.2% |
The first test proved to be the most impressive out of the bunch, showing a 7.2% increase in performance over stock DDR400. Note that a 7.2% performance advantage is greater than what we’d see when going from an Athlon 64 X2 4400+ to a 4800+.
The second test still produced reasonably good results, showing a 3.7% increase in performance. The third and final test, at just 1.2%, shows that not all situations will yield a tangible performance increase.
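For readers who want to check the arithmetic, the percentages in the table can be reproduced from the raw scores. The short Python sketch below is only an illustration: the scores are copied from the table, while the function and variable names are our own, and the memory clock comparison assumes the DDR rating scales linearly with peak bandwidth.

```python
# Multitasking Winstone scores from the table (higher is better):
# test name -> (DDR400 score, DDR480 score)
scores = {
    "Test 1": (2.21, 2.37),
    "Test 2": (2.94, 3.05),
    "Test 3": (4.82, 4.88),
}


def percent_improvement(baseline, faster):
    """Relative gain of `faster` over `baseline`, in percent."""
    return (faster - baseline) / baseline * 100.0


# Per-test gain, rounded to one decimal place as in the table.
improvements = {
    name: round(percent_improvement(ddr400, ddr480), 1)
    for name, (ddr400, ddr480) in scores.items()
}

# Assumption: the DDR rating is proportional to peak memory bandwidth,
# so DDR400 -> DDR480 is treated as a 20% theoretical increase.
memory_clock_gain = percent_improvement(400, 480)

for name, gain in improvements.items():
    print(f"{name}: {gain}% faster for a {memory_clock_gain:.0f}% memory clock increase")
```

Set against the 20% memory clock increase, even the best case recovers only about a third of the theoretical gain, which is what you would expect from a light office workload.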
Although it is a canned benchmark, Multitasking Winstone 2004 gives us a very good idea of what is to come. But in order to truly find out if higher bandwidth memory is worth it for Athlon 64 X2 owners, we turned to some of our own home-brew multitasking benchmarks.
23 Comments
Araemo - Friday, August 12, 2005 - link
I'm curious, does Windows XP support NUMA?
A quick Google on the topic gives me conflicting info.
People seem to think it does, if you manually turn on PAE (which has its own performance overhead, right?), but MS's website says "NUMA is supported only on Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition."
What I've read recently suggests that in the A64 X2 CPUs, each core has one memory controller enabled, which suggests that NUMA could be useful for performance reasons. However, what I read originally when the X2s were coming out was that one core simply had both its memory controllers disabled. Does anyone know which of these two is correct?
In any case, it sounds like memory latencies to different memory addresses will be different between the cores.
Either one core will always have a higher latency, or each one will have low latency to some addresses and high latency to others.
Starglider - Friday, August 12, 2005 - link
The Athlon64 die contains a dual-channel DDR memory controller, three HyperTransport transceivers, one or two processor cores and a crossbar switch that links them all together. Adding an extra processor core to the X2 didn't duplicate any of the other parts, so no, there aren't any disabled memory controllers on there. Both cores are connected to the memory controller through the switch, so they have equal access to both channels (which are interleaved anyway when both are active). NUMA would not be relevant because the banks aren't independently addressable by the OS and deliver exactly the same bandwidth and latency to both cores anyway. NUMA is only useful if your system has more than one processor socket, i.e. is an Opteron system.
Araemo - Friday, August 19, 2005 - link
Thanks for clearing that up for me, but the # of sockets really has nothing to do with it. It is the # of independent memory controllers that matters, and AMD could have placed multiple single-channel controllers on the die if they thought the performance would be improved, but if the memory controller is 'external' to the core (accessible via HT instead of a more direct link, not that HT isn't good), then I guess it doesn't matter. I was thinking the memory controller was part of the same HT node as the CPU core, but the method you described makes more sense anyway. If you have the memory controller logically separated from the core, it can serve DMA requests from the northbridge/southbridge without bothering the CPU at all, as DMA should be.
Diasper - Friday, August 12, 2005 - link
It looks to me like future dual-core games will benefit from the extra bandwidth. The logic being that, with a highly efficient dual-core engine, both cores should be demanding as much bandwidth as possible, so we might see something more akin to the multitasking with Doom3 performance numbers.
Either way the numbers should be over the numbers we saw first time when testing dual-core with only a single-core game, so say that's 5%+ improvement at DDR500. Either way I think this information is pretty significant for those going with dual-core processors.
Now where did my high speed low latency 1GB sticks go...
Oh yeah and first.
Zebo - Friday, August 12, 2005 - link
I don't know about that. Anand didn't mention timings. I can only assume they are the same since he didn't mention them at DDR400 and DDR480 respectively... Which is faster? Who knows really... My feeling is that if he left DDR400 at the low latency it's capable of while DDR480 ran at the high latency it needs, you would see negligible differences. Again, not enough information...
Diasper - Friday, August 12, 2005 - link
That's probably largely correct. I suspect they'll be running a similar setup to before (http://www.anandtech.com/cpuchipsets/showdoc.aspx?...) where they were running 2 x 512MB sticks that could do 2-2-2 timings all the way up to DDR500 or so.
But yeah, can we get any clarification on that please - it's appalling that you didn't include your test system criteria, although we can probably guess and trust it was done correctly.
Zebo - Friday, August 12, 2005 - link
Yeah that VX stuff is most excellente. The review I *really* want to see is how well DDR2 667 on M2 competes with say DDR 500 with its newfound low latency. I have my money on "old tech" :P
Diasper - Friday, August 12, 2005 - link
Interestingly, looking at the results:
For a 20% increase in memory speed we saw up to a 10% increase in speed (approx), suggesting that the X2 is bandwidth constrained by at least 10% when running full tilt, so you'd want to be running at least DDR440 speeds or otherwise risk lessened performance.
Of course, given the unevenness of memory requests from both processors, I guess we could presume they would benefit from more memory speed, although benefits would lessen above a certain speed (e.g. the guesstimate DDR440), as it is unlikely that you'll typically come across a scenario where both processors are demanding maximum memory bandwidth at the exact same moment.
I guess that's speculation at best - but unless you're an engineer that's about all you can do...
Spacecomber - Friday, August 12, 2005 - link
I think we can assume that it is the same setup as with the first article, as the previous poster suggested.
From the article:
Space
Diasper - Friday, August 12, 2005 - link
http://www.anandtech.com/cpuchipsets/showdoc.aspx?...
Ah, well I probably shouldn't skip over stuff so quickly to get to the results. However, why, when the memory was run at DDR500 in the previous test, is it only run at DDR480 here?
That rather nullifies the comparative significance of the test as the same test wasn't run. :/