Intel Pentium 4 6xx and 3.73EE: Favoring Features Over Performance
by Anand Lal Shimpi & Derek Wilson on February 21, 2005 6:15 AM EST- Posted in
- CPUs
Twice the Cache - 17% Higher Latency
Both the Pentium 4 6xx and the new Extreme Edition share the same core, meaning they also have the same L2 cache. When Intel first launched Prescott, we noticed that cache latencies went up tremendously in the move to the new architecture. The increase was to be expected, as one tradeoff of a larger cache is that it takes longer to find and access data. So when we heard that Intel was moving to a 2MB L2 cache with the 6xx series, we wondered how much slower the cache would get.
First we wanted to confirm that L1 cache latencies stayed the same, and they did at 4 cycles for the new Prescott 2M based core:
CPU | Cachemem L1 Latency | ScienceMark L1 Latency |
AMD Athlon 64 | 3 cycles | 3 cycles |
Intel Pentium 4 (Northwood) | 1 cycle | 2 cycles |
Intel Pentium 4 (Prescott) | 4 cycles | 4 cycles |
Intel Pentium 4 (Prescott 2M) | 4 cycles | 4 cycles |
Intel Pentium M | 3 cycles | 3 cycles |
Next up was L2 cache latency. In our review of the Pentium M processor on the desktop we discovered that its 10 cycle L2 cache was responsible for its solid performance in non "media rich" applications (e.g. office applications, OS performance). The original Prescott had a 23 cycle L2 cache, and with a 2MB cache the latency has gone up to 27 cycles:
CPU | Cachemem L2 Latency | ScienceMark L2 Latency |
AMD Athlon 64 | 17 cycles | 18 cycles |
Intel Pentium 4 (Northwood) | 16 cycles | 16 cycles |
Intel Pentium 4 (Prescott) | 23 cycles | 23 cycles |
Intel Pentium 4 (Prescott 2M) | 27 cycles | 27 cycles |
Intel Pentium M | 10 cycles | 10 cycles |
While we're talking about "only" 4 cycles, at 3.6GHz that's 17% longer to access data from L2 cache. Given Prescott's extremely lengthy pipeline, a 17% increase in L2 cache latency is not going to help minimize the downsides of such a long pipeline. Also keep in mind that the only architectural change here is a larger L2 cache, so none of the normal tricks to help hide memory latencies are expanded upon in the new Pentium 4.
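The arithmetic behind that 17% figure can be sketched in a few lines of Python. This is only an illustration using the cycle counts from the table above; the helper name `cycles_to_ns` is ours. Note that the relative increase comes from the cycle counts alone, while the clock speed determines what the extra cycles cost in wall-clock time.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds."""
    return cycles / clock_ghz


CLOCK_GHZ = 3.6        # Pentium 4 560 / 660 clock speed
PRESCOTT_L2 = 23       # L2 latency in cycles, 1MB Prescott
PRESCOTT_2M_L2 = 27    # L2 latency in cycles, 2MB Prescott 2M

extra_cycles = PRESCOTT_2M_L2 - PRESCOTT_L2
relative_increase = extra_cycles / PRESCOTT_L2
extra_ns = cycles_to_ns(extra_cycles, CLOCK_GHZ)

print(f"Extra L2 latency: {extra_cycles} cycles "
      f"({relative_increase:.1%} longer, ~{extra_ns:.2f} ns at {CLOCK_GHZ}GHz)")
```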
What Intel is counting on is that the increase in hit rate provided by a 100% larger cache will outshine the 17% longer access to L2 cache. Did Intel make the right bet? In order to find out we took the new Pentium 4 660 (3.6GHz - 2MB L2) and compared it to the old Pentium 4 560 (3.6GHz - 1MB L2). With all other variables the same, let's see how much of an impact the extra megabyte of cache has in the real world.
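To get a feel for the bet, here is a rough sketch of the usual average-access-time model: every L2 access pays the L2 latency, and misses additionally pay a trip to main memory. The 23 and 27 cycle latencies are our measurements; the 5% miss rate and the 300 cycle memory penalty are made-up illustrative values, not numbers we measured, and the model ignores prefetching and overlapping memory accesses.

```python
def avg_access_cycles(l2_latency: float, l2_miss_rate: float,
                      memory_penalty: float) -> float:
    """Average cost of an L2 access: hit latency plus the expected miss penalty."""
    return l2_latency + l2_miss_rate * memory_penalty


def break_even_miss_rate(old_latency: float, new_latency: float,
                         old_miss_rate: float, memory_penalty: float) -> float:
    """Miss rate the slower cache needs just to match the faster one."""
    return old_miss_rate - (new_latency - old_latency) / memory_penalty


# Hypothetical workload values, chosen only for illustration.
MEMORY_PENALTY = 300   # assumed cycles to reach main memory on an L2 miss
MISS_RATE_1MB = 0.05   # assumed L2 miss rate with the 1MB cache

target = break_even_miss_rate(23, 27, MISS_RATE_1MB, MEMORY_PENALTY)
cost_1mb = avg_access_cycles(23, MISS_RATE_1MB, MEMORY_PENALTY)
cost_2mb = avg_access_cycles(27, target, MEMORY_PENALTY)

print(f"1MB L2 average cost: {cost_1mb:.1f} cycles")
print(f"2MB L2 breaks even at a miss rate of {target:.2%} ({cost_2mb:.1f} cycles)")
```

Under this simple model the 2MB part only comes out ahead if the larger cache pushes the miss rate below the break-even value. A workload that already fits in 1MB has few misses left to remove, so it just pays the extra 4 cycles.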
In the business category, we see the added cache paying off a little. SYSMark shows good improvement in the document creation portion of its tests (8.4%), while Business Winstone makes some very good gains (13.0%). Worldbench shows web browsing with Mozilla improving by 8.0%, while our compression test and the ACDSee test show a loss in performance. These losses generally indicate areas where the test is more dependent on latency than cache hit rate. On the content creation side, adding Windows Media Encoder to the Mozilla test improves performance more than the individual Mozilla test (11.1% versus 8.0%). This is likely due to the fact that the large cache keeps Mozilla's data from being kicked out while Windows Media Encoder is working.
On the gaming front, Doom 3 is the only test where we saw a meaningful performance improvement (4.7%). The only other application to show a significant performance gain is Maya, with more than a 43% gain. The huge gain in performance under Maya is likely a result of 1MB of cache being too small to fit models in while 2MB is enough. This seems to be a case where the test is very bandwidth sensitive rather than latency sensitive. Dropping most (if not all) of the data being worked on into the L2 cache offers a program a very large boost in apparent bandwidth.
As we can see, the unfortunate truth for performance on the 600 series is that most consumer data sets can fit into a 1MB cache just fine. The added cache does seem to help with multitasking from our limited investigation of the subject. The more threads that hit memory aggressively, the better chance we have of seeing a benefit from the 2MB cache. This is because less data from each thread will be kicked out of the cache, resulting in fewer pipeline stalls.
Unfortunately, most usage models that are a good fit for the 600 series are server and workstation workloads. Streaming data (using or encoding media), games, and most other consumer applications don't work on data sets large enough to really separate the performance of the 1MB and 2MB parts.
As we've provided this chart and gone through the general impact of the benchmarks on Intel's new 600 line, we won't include analysis on the pages with our benchmark data. For those who are interested in a deeper look at the numbers and performance of all 5 new parts, graphs of each benchmark are included later in this article.
Impact of L2 Cache Size on Performance (1MB vs. 2MB - 3.60GHz)
Benchmark | 1MB L2 | 2MB L2 | 2MB Performance Advantage |
Business/General Use Performance |
Business Winstone 2004 | 21.4 | 24.2 | 13.0% |
SYSMark 2004 - Communication | 137 | 137 | 0.0% |
SYSMark 2004 - Document Creation | 201 | 218 | 8.4% |
SYSMark 2004 - Data Analysis | 184 | 186 | 1.0% |
Microsoft Office XP with SP-2 | 522 | 520 | 0.3% |
Mozilla 1.4 | 459 | 422 | 8.0% |
ACD Systems ACDSee PowerPack 5.0 | 547 | 558 | -2.0% |
Ahead Software Nero Express 6.0.0.3 | 545 | 550 | -0.9% |
WinZip Computing WinZip 8.1 | 412 | 411 | 0.2% |
WinRAR | 479 | 469 | -2.0% |
Multitasking Content Creation Performance |
Content Creation Winstone 2004 | 32.7 | 33.9 | 3.7% |
SYSMark 2004 - 3D Creation | 231 | 231 | 0.0% |
SYSMark 2004 - 2D Creation | 288 | 279 | -3.1% |
SYSMark 2004 - Web Publication | 206 | 203 | -1.0% |
Mozilla and Windows Media Encoder | 676 | 601 | 11.1% |
Video/Photo Creation & Editing |
Adobe Photoshop 7.0.1 | 342 | 342 | 0.0% |
Adobe Premiere 6.5 | 461 | 468 | -1.5% |
Roxio VideoWave Movie Creator 1.5 | 287 | 276 | 3.8% |
Audio/Video Encoding |
MusicMatch Jukebox 7.10 | 484 | 470 | 2.9% |
DivX Encoding | 55.3 | 55.4 | 0.2% |
XviD Encoding | 33.9 | 33.4 | -1.4% |
Microsoft Windows Media Encoder 9.0 | 2.57 | 2.56 | -0.3% |
Gaming |
Doom 3 | 84.6 | 88.6 | 4.7% |
UT2004 | 59.3 | 60.4 | 1.9% |
Wolfenstein: ET | 97.2 | 95.5 | -1.7% |
3D Rendering |
Discreet 3dsmax 5.1 (DX) | 268 | 266 | 0.7% |
Discreet 3dsmax 5.1 (OGL) | 327 | 329 | -0.6% |
SPECapc 3dsmax 6 | 1.64 | 1.62 | -1.1% |
Professional 3D |
SPECviewperf 8 - 3dsmax-03 | 17.04 | 17.11 | 0.4% |
SPECviewperf 8 - catia-01 | 13.87 | 13.57 | -2.2% |
SPECviewperf 8 - light-07 | 14.3 | 13.83 | -3.3% |
SPECviewperf 8 - maya-01 | 13.12 | 18.85 | 43.7% |
SPECviewperf 8 - proe-03 | 16.7 | 16.5 | -1.2% |
SPECviewperf 8 - sw-01 | 13.09 | 13.33 | 1.8% |
SPECviewperf 8 - ugs-04 | 15.31 | 13.82 | -9.7% |
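For readers who want to recompute the last column, here is a minimal sketch of how the advantage can be derived from the two scores. Which tests are "lower is better" (times) versus "higher is better" (scores or frame rates) is our assumption in this snippet, inferred from the sign of the listed advantage. Small differences from the table are possible, since the scores shown are rounded.

```python
def advantage_pct(score_1mb: float, score_2mb: float,
                  higher_is_better: bool) -> float:
    """Percent advantage of the 2MB part over the 1MB part."""
    if higher_is_better:
        return (score_2mb - score_1mb) / score_1mb * 100
    # For timed tests a smaller number is the better result.
    return (score_1mb - score_2mb) / score_1mb * 100


# (benchmark, 1MB score, 2MB score, higher_is_better); the last flag is assumed.
rows = [
    ("Doom 3", 84.6, 88.6, True),
    ("SPECviewperf 8 - maya-01", 13.12, 18.85, True),
    ("SPECviewperf 8 - ugs-04", 15.31, 13.82, True),
    ("Mozilla 1.4", 459, 422, False),
]

for name, one_mb, two_mb, higher in rows:
    print(f"{name}: {advantage_pct(one_mb, two_mb, higher):+.1f}%")
```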
johnsonx - Monday, February 21, 2005 - link
Is there no merit at all in running a few A64 vs P4 6xx benchmarks with the current RC build of XP x64? While I've found too many things I need don't work with XP x64 to use it, I did see that 3dMark03 ran fine. I know 3dMark itself isn't 64-bit, bit it does making heavy use of 64-bit DirectX and graphic driver calls. There must be a few more apps and games that could be called on...
Maybe just limit the benchies to two processors, say an A64 3500+ vs. a Pentium 4 650, running the same benchmarks in 32-bit and 64-bit Windows, using just one GeForce 6xxx and one Radeon X8-something.
It'd just be interesting and useful to see which processor runs 64-bit code better, both absolutely and compared to each processor's 32-bit performance.
When the final release version of XP x64 does come out, it may be interesting to have benchmarks from the RC version to see what's improved (though I agree it wouldn't actually be useful in any practical sense).
Or perhaps Anandtech knows something I don't, like the release XP x64 is so close that running benches on the RC would be moot....
SLIM - Monday, February 21, 2005 - link
#30 and 36
Hans is right, the 3000 and 3200 cores in the graphs are not available in retail (downclocked 130nm cores) and are meant to show power consumption scaling with speed increases. It's unfortunate that they left out the more interesting comparison (the 130nm 3500+). The only 90nm AMD chip in the power graphs is the 3500+.
coldpower27 - Monday, February 21, 2005 - link
Very strange your the only guys so far that show an increase in power consumption of the P4 6xx Series over the 5xx Series.
Regs - Monday, February 21, 2005 - link
Wow, a lot of good comments here. mlittl3, most of the Anandtech's population know that the EE's are just overpriced Northwood's on steroids (Big heads, small balls). And the crayon wax melting comparison made me laugh out loud.
I just find it funny Intel is trying to slap on everything but the kitchen sink on these processors to make them more appealing. What's next? Are they going to come with a microwave toaster oven combo? With all do respect to Intel, to add on such features is not an easy thing to do at a engineering level but once again I feel that their marketing team is still running the show.
But what is AMD doing while Intel performs CPR on their Prescott's? All this news on Intel for the past few months left me nostalgic in what AMD is doing behind the scenes. SSE3 was their latest slap-on feature, but as we saw in your recent AMD article it offered little to no performance gain. AMD's next core has to offer lower L1-l2 Cache latencies. This is the only way I see AMD cornering Intel's Cores performance in every application. But im afraid we won't see any such thing until long-horn comes out in a few years. Until now we have to settle for worthless add-ons features for the desk-top consumers while we see both Intel and AMD battle the server market where Intel is mostly threatened.
HardwareD00d - Monday, February 21, 2005 - link
In Soviet Russia, Prescott melts YOU!
miketheidiot - Monday, February 21, 2005 - link
why do the 3000 and 3200 have signifigantly higher power consumption than the 3500? I thought all 3200 and 3000 are also built on 90nm soi.
RadeonGuy - Monday, February 21, 2005 - link
Even With All the processors haveing 2mb cache they still suck ass
Hans Maulwurf - Monday, February 21, 2005 - link
#30 I think the 3000 and 3200 are not really Winchester cores. Maybe clocked down 130 nm cores.
I'm interested in the memory timing of the A64. Is it 1T or 2T? This is an important information, you should always(!) give it the configuration part of reviews.
DerekWilson - Monday, February 21, 2005 - link
One thing to remember about out power tests --
We measure power draw at the wall. Power supplies are inefficient and magnify power draw at the wall. Power input to the PSU does not scale proportionally to power output.
Brian23 - Monday, February 21, 2005 - link
I thought that all winchesters were 90nm SOI.