Intel's Pentium Extreme Edition 955: 65nm, 4 threads and 376M transistors
by Anand Lal Shimpi on December 30, 2005 11:36 AM EST - Posted in CPUs
Literally Dual Core
One of the major changes with Presler is that, unlike Smithfield, the two cores are not part of the same piece of silicon. Instead, you actually have a single chip with two separate dies on it. By splitting the die in two, Intel can reduce total failure rates and be far more flexible with its manufacturing (since one Presler chip is nothing more than two Cedar Mill cores on a single package).
The chip at the bottom of the image is Presler; note the two individual cores.
In order to find out if there was an appreciable increase in core-to-core communication latency, we used a tool called Cache2Cache, which Johan first used in his series on multi-core processors. Johan's description of the utility follows:
"Michael S. started this extremely interesting thread at the Ace's Hardware Technical forum. The result was a little program coded by Michael S. himself, which could measure the latency of cache-to-cache data transfer between two cores or CPUs. In his own words: 'it is a tool for comparison of the relative merits of different dual-cores.'
"Cache2Cache measures the propagation time from a store by one processor to a load by the other processor. The results that we publish are approximately twice the propagation time. For those interested, the source code is available here."
Armed with Cache2Cache, we looked at the added latency seen by Presler over Smithfield:
Cache2Cache Latency in ns (Lower is Better)
CPU | Latency (ns) |
AMD Athlon 64 X2 4800+ | 101 |
Intel Smithfield 2.8GHz | 253.1 |
Intel Presler 2.8GHz | 244.2 |
Not only did we not find an increase in latency between the two cores on Presler, communication actually occurs faster than on Smithfield. To make sure that this had nothing to do with the faster FSB, we clocked the chip at 2.8GHz with an 800MHz FSB and repeated the tests, only to find consistent results.
We're not sure why, but core-to-core communication is faster on Presler than on Smithfield. That being said, a difference of less than 9ns just isn't going to be noticeable in the real world - given that we've already seen that the Athlon 64 X2's 100ns latency doesn't really help it scale better when going from one to two cores.
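As a quick sanity check on the numbers above, here is a minimal Python sketch of the arithmetic. It assumes only what the table and Johan's description state: the published figures are roughly twice the store-to-load propagation time, so halving them gives a rough one-way estimate. The names `published_ns` and `one_way_estimate` are ours, chosen for illustration, and are not part of Cache2Cache.

```python
# Published Cache2Cache results in ns, copied from the table above.
published_ns = {
    "AMD Athlon 64 X2 4800+": 101.0,
    "Intel Smithfield 2.8GHz": 253.1,
    "Intel Presler 2.8GHz": 244.2,
}


def one_way_estimate(published):
    """Rough one-way propagation time, assuming published ~= 2 x propagation."""
    return published / 2.0


# Absolute and relative gap between the two Intel dual-core designs.
smithfield = published_ns["Intel Smithfield 2.8GHz"]
presler = published_ns["Intel Presler 2.8GHz"]
gap_ns = smithfield - presler
gap_pct = 100.0 * gap_ns / smithfield

for name, value in published_ns.items():
    print(f"{name}: published {value:.1f} ns, "
          f"one-way estimate ~{one_way_estimate(value):.1f} ns")
print(f"Presler vs Smithfield: {gap_ns:.1f} ns lower ({gap_pct:.1f}%)")
```

The gap works out to 8.9ns, or about 3.5% of Smithfield's figure, which is consistent with the "less than 9ns" difference quoted above.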
84 Comments
Betwon - Saturday, December 31, 2005 - link
No. Don't you think that future versions of the patch will be written by Intel?
Viditor - Saturday, December 31, 2005 - link
Doubtful (but who knows)...I can't see Intel spending 100s of millions with every developer (or even 1 developer) for the long term, just to keep tweaking their patches. It's just not a very smart long term strategy (and Intel is quite smart).
Betwon - Saturday, December 31, 2005 - link
You just guess it. We find that good-quality code can provide better performance for both AMD and Intel.
Intel can often benefit more, because the performance potential of Intel is high.
Now, you cannot find another SMP game that improves the fps of an SMP CPU so much.
If you find it, please tell us.
There is no one who found it.
Viditor - Saturday, December 31, 2005 - link
Now it's you who's guessing...
Betwon - Saturday, December 31, 2005 - link
No. It is true.
Viditor - Saturday, December 31, 2005 - link
OK...prove it!
Betwon - Saturday, December 31, 2005 - link
For example: we saw a test (from AnandTech).
With good-quality code, AMD becomes faster than before, but Intel becomes much faster than before.
They use Intel's compiler.
Betwon - Saturday, December 31, 2005 - link
When not using Intel's compiler, AMD becomes slow.
Viditor - Saturday, December 31, 2005 - link
I know you've often quoted from the spec.org site...
I suggest you revisit there and look at the difference between AMD systems using Intel compilers and the PathScale or Sun compilers. In general, the Spec scores for AMD improve by as much as 30% when not using an Intel compiler...especially in FP.
http://www.swallowtail.org/naughty-intel.html
defter - Saturday, December 31, 2005 - link
This is not true, for example:
FX-57, Intel compiler, SpecInt base 1862: http://www.spec.org/osg/cpu2000/results/res2005q2/...
FX-57, Pathscale compiler, 1745: http://www.spec.org/osg/cpu2000/results/res2005q2/...
Opteron 2.8GHz, Intel compiler, SpecInt base 1837: http://www.spec.org/osg/cpu2000/results/res2005q3/...
Opteron 2.8GHz, Sun compiler, SpecInt base 1660: http://www.spec.org/osg/cpu2000/results/res2005q4/...
In SpecFP Intel compiler produces slightly slower results, but the difference isn't 30%:
Opteron 2.8GHz (HP hardware), Intel compiler, SpecFP base 1805: http://www.spec.org/osg/cpu2000/results/res2005q3/...
Opteron 2.8GHz (HP hardware), Pathscale compiler, SpecFP base 2052: http://www.spec.org/osg/cpu2000/results/res2005q3/...
Opteron 2.8GHz (Sun hardware), Sun compiler, SpecFP base 2132: http://www.spec.org/osg/cpu2000/results/res2005q4/...
So let's see:
Intel vs Sun compiler:
- Intel compiler is 10.7% faster in SpecInt
- Sun compiler is 18.1% faster in SpecFP
Intel vs Pathscale compiler:
- Intel compiler is 6.7% faster in SpecInt
- Pathscale compiler is 13.7% faster in SpecFP
It is quite surprising that Intel's compiler gives the best results for AMD's processors in many situations.
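For anyone who wants to check the percentages in this comment, a minimal Python sketch follows. It uses only the base scores quoted above; the `scores` table and the `advantage` helper are illustrative names, not anything from spec.org. Note that the pairs are taken as listed in the comment, so some mix hardware (the Pathscale SpecInt pair is the FX-57, and the Sun SpecFP result comes from Sun hardware while the Intel one is from HP hardware).

```python
# SPEC CPU2000 base scores as quoted in the comment above.
# Each entry maps (suite, rival compiler) -> (Intel compiler score, rival score).
scores = {
    ("SpecInt", "Sun"): (1837, 1660),        # Opteron 2.8GHz
    ("SpecInt", "Pathscale"): (1862, 1745),  # FX-57
    ("SpecFP", "Sun"): (1805, 2132),         # Opteron 2.8GHz, HP vs Sun hardware
    ("SpecFP", "Pathscale"): (1805, 2052),   # Opteron 2.8GHz, HP hardware
}


def advantage(intel, rival):
    """Return (winner, percent lead of the higher score over the lower one)."""
    if intel >= rival:
        return "Intel", 100.0 * (intel / rival - 1.0)
    return "rival", 100.0 * (rival / intel - 1.0)


for (suite, rival_name), (intel, rival) in scores.items():
    winner, pct = advantage(intel, rival)
    leader = "Intel" if winner == "Intel" else rival_name
    print(f"{suite}, Intel vs {rival_name}: {leader} compiler leads by {pct:.1f}%")
```

Run as written, this reproduces the four figures in the comment: 10.7% and 6.7% in favor of the Intel compiler in SpecInt, and 18.1% and 13.7% in favor of the Sun and Pathscale compilers in SpecFP.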