The Dark Knight: Intel's Core i7
by Anand Lal Shimpi & Gary Key on November 3, 2008 12:00 AM EST
Posted in CPUs
Nehalem.
Nuh - hay - lem
At least that's how Intel PR pronounces it.
I've been racking my brain for the past month on how best to review this thing and what angle to take. It's tough. You see, with Conroe the approach was simple: the Pentium 4 was terrible, AMD proudly wore its crown, and Intel came in and turned everyone's world upside down. With Nehalem, the world is fine; it doesn't need fixing. AMD's pricing is quite competitive, Intel's performance is solid, power consumption isn't getting out of control... things are nice.
But we've got that pesky tick-tock cadence, and things have to change for the sake of change (or more accurately, technological advancement; I swear I'm not getting cynical in my old age).
2008, that's us, that's Nehalem.
Could Nehalem ever be good enough? It's the first tock after Conroe, which is like going on stage after the late Richard Pryor; it's not an enviable position to be in. Inevitably Nehalem won't have the same impact that Conroe did, but what could Intel possibly bring to the table that it hasn't already?
Let's go ahead and get started, this is going to be interesting...
Nehalem's Architecture - A Recap
I spent 15 pages and thousands of words explaining Intel's Nehalem architecture in detail already, but what I'm going to try to do now is summarize that in a page. If you want greater detail please consult the original article, but here's the CliffsNotes version.
Nehalem, as I've mentioned countless times before, is a "tock" processor in Intel's tick-tock cadence. That means it's a new microarchitecture but based on an existing manufacturing process, in this case 45nm.
A quad-core Nehalem is made up of 731M transistors, down from 820M in Yorkfield, the current quad-core Core 2s based on the Penryn microarchitecture. The die size has gone up, however, from 214 mm^2 to 263 mm^2. That's fewer transistors, but less densely packed ones. Part of this is due to a reduction in cache size and part of it is due to a fundamental rearchitecting of the microprocessor.
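To put a rough number on "less densely packed," here's a quick back-of-the-envelope sketch in Python. It uses only the transistor counts and die sizes quoted above; the density figures are derived for illustration (they aren't Intel's numbers), and the names `chips` and `density` are made up for this example.

```python
# Transistor counts (in millions) and die sizes (in mm^2) as quoted in the text.
# The density values computed below are derived, not official figures.
chips = {
    "Yorkfield (Penryn quad-core)": {"transistors_m": 820, "die_mm2": 214},
    "Nehalem (quad-core)": {"transistors_m": 731, "die_mm2": 263},
}


def density(transistors_m, die_mm2):
    """Return millions of transistors per square millimetre of die."""
    return transistors_m / die_mm2


for name, chip in chips.items():
    d = density(chip["transistors_m"], chip["die_mm2"])
    print(f"{name}: {d:.2f}M transistors per mm^2")
```

By this crude measure Yorkfield works out to about 3.83M transistors per mm^2 and Nehalem to about 2.78M, so Nehalem is roughly 27% less dense. That fits the explanation above: cache is some of the densest logic on a die, and Nehalem has less of it.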
Nehalem is Intel's first "native" quad-core design, meaning that all four cores are a part of one large, monolithic die. Each core has its own L1 and L2 caches, and all four sit behind a large 8MB L3 cache. The L1 cache remains unchanged in size from Penryn (the current 45nm Core 2 architecture), although it is slower at 4 cycles vs. 3. The L2 cache gets a little faster but also gets a lot smaller at 256KB per core, whereas the lowest-end Penryns split 3MB of L2 between two cores. The L3 cache is a new addition and serves as a common pool that all four cores can access, which will really help in cache-intensive multithreaded applications (such as those you'd encounter in a server). Nehalem also gets a three-channel, on-die DDR3 memory controller, if you haven't heard by now.
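The cache arithmetic in the paragraph above can be sketched in a few lines of Python. This only adds up the L2 and L3 capacities quoted in the text (L1 is left out because its size isn't given here), and the function and key names are hypothetical, chosen just for this example.

```python
KB = 1024
MB = 1024 * KB

CORES = 4                # quad-core Nehalem
L2_PER_CORE = 256 * KB   # private L2, one per core (from the text)
L3_SHARED = 8 * MB       # single L3 shared by all four cores (from the text)


def nehalem_cache_summary(cores=CORES):
    """Add up the L2 and L3 capacities described in the text, in MB."""
    total_l2 = cores * L2_PER_CORE
    return {
        # Sum of all the private L2s across the chip.
        "total_l2_mb": total_l2 / MB,
        # The shared pool every core can access.
        "l3_mb": L3_SHARED / MB,
        # All L2 plus L3 on the die.
        "l2_plus_l3_mb": (total_l2 + L3_SHARED) / MB,
        # What one core can reach: its own L2 plus the whole L3.
        "reachable_per_core_mb": (L2_PER_CORE + L3_SHARED) / MB,
    }


for key, value in nehalem_cache_summary().items():
    print(f"{key}: {value}")
```

The point of the exercise: the four private L2s add up to just 1MB, so nearly all of the chip's cache capacity now lives in the shared L3 rather than in per-core or per-pair L2s.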
At the core level, everything gets deeper in Nehalem. The CPU is just as wide as before and the pipeline stages haven't changed, but the reservation station, load and store buffers and OoO scheduling window all got bigger. Peak execution power hasn't gone up, but Nehalem should be much more efficient at using its resources than any Core microarchitecture before it.
Once again, to address the server space, Nehalem increases the size of its TLBs and adds a new second-level unified TLB. Branch prediction is also improved, but primarily for database applications.
Hyper-Threading is back in its typical 2-way fashion, so a single quad-core Nehalem can work on 8 threads at once. Here we have yet another example of Nehalem making more efficient use of the execution resources rather than simply throwing more transistors at the problem. With Penryn, Intel hit nearly 1 billion transistors for a desktop quad-core chip. Clearly Nehalem was an attempt to both address the server market and make more efficient use of those transistors before the next big jump across the billion-transistor mark.
73 Comments
Jingato - Monday, November 3, 2008
If the 920 can easily be overclocked to 3.8GHz on air, what incentive is there to purchase the 965 for more than triple the price?
TantrumusMaximus - Monday, November 3, 2008
I don't understand why the tests were on such low resolutions... most gamers are running higher res than 1280x1024, etc. What gives?
daniyarm - Monday, November 3, 2008
Because if they ran gaming benchmarks at higher res, the difference in FPS would be hardly visible and you wouldn't go out and buy a new CPU. If they are going to show differences between Intel and AMD CPUs, show Nehalem at 3.2 GHz vs 9950 OC to 3.2 GHz so we can see clock for clock differences in performance and power.
npp - Monday, November 3, 2008
9950 consumes about 30W more at idle than the 965XE, and 30W less under load. I guess that OC'ing it to 3.2GHz will need more than 30W... Given that the 965 can process 4 more threads, I think the result should be more or less clear.
tim851 - Monday, November 3, 2008
Higher resolutions stress the GPU more and it will become a bottleneck. Since the article was focussing on CPU power and not GPU power they were lowering the resolution enough to effectively take the GPU out of the picture.
Caveman - Monday, November 3, 2008
It would be nice to see these CPU reviews use relevant "gaming" benchmarks. It would be good to see the results with something like MS flight simulator FSX or DCS Black Shark, etc... The flight simulators these days are BOTH graphically and calculation intensive, but really stress the CPU.
AssBall - Monday, November 3, 2008
No, they don't, actually.
philosofool - Monday, November 3, 2008
It would have been nice to see a proper comparison of power consumption. Given all of Intel's boasts about being able to shut off cores to save power, I'd like to see some figures about exact savings.
nowayout99 - Monday, November 3, 2008
Ditto, I was wondering about power too.
Anand Lal Shimpi - Monday, November 3, 2008
Soon, soon my friend :)
-A