The Dark Knight: Intel's Core i7
by Anand Lal Shimpi & Gary Key on November 3, 2008 12:00 AM EST - Posted in CPUs
Understanding Nehalem's Memory Architecture
Nehalem does spice things up a bit in the memory department: not only does it have an integrated memory controller (a first for an x86 Intel CPU), but the memory controller in question has an unusual three-channel configuration. All other AMD and Intel systems use dual-channel DDR2 or DDR3 memory controllers; with each channel being 64 bits wide, you have to install memory in pairs for peak performance.
With a three-channel DDR3 memory controller, Nehalem requires three DDR3 modules to achieve peak bandwidth, which also means that memory manufacturers are going to be selling special three-channel DDR3 kits made specifically for Nehalem. Motherboard makers will do one of two things to implement Nehalem's three-channel memory interface: you'll either see boards with four DIMM slots or boards with six:
Four DDR3 slots, three DDR3 channels
In the four-slot configuration, the first three slots correspond to the three memory channels, and the fourth slot simply shares one of those channels. The downside to this approach is that memory bandwidth drops to single-channel performance once you fill the fourth slot and start using that memory. For example, if you have 4 x 1GB sticks, the first 3GB of memory will be interleaved between the three memory channels and you'll get 25.6GB/s of bandwidth to data stored in the first 3GB. The final 1GB, however, won't be interleaved and you'll only get 8.5GB/s of bandwidth to it. Despite the unbalanced nature of memory bandwidth in this case, your aggregate bandwidth is still greater in this configuration than in a dual-channel setup.
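The 25.6GB/s and 8.5GB/s figures fall out of simple arithmetic on the channel width and transfer rate. The following is a minimal sketch of that arithmetic, not a measurement: the function names are hypothetical, and it assumes DDR3-1066's nominal rate of roughly 1066.67 million transfers per second and the 4 x 1GB layout from the example.

```python
# Peak theoretical DDR3 bandwidth (illustrative sketch, not a benchmark).
BYTES_PER_TRANSFER = 8      # each memory channel is 64 bits wide
DDR3_1066_MTS = 1066.67     # assumption: nominal DDR3-1066 rate in MT/s


def peak_bandwidth_gbs(mega_transfers_per_sec: float, channels: int) -> float:
    """Return peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_sec = mega_transfers_per_sec * 1e6 * BYTES_PER_TRANSFER * channels
    return bytes_per_sec / 1e9


def region_bandwidth_gbs(offset_gb: float) -> float:
    """Peak bandwidth to a given offset in the 4 x 1GB, four-slot layout.

    The first 3GB is interleaved across three channels; the last 1GB
    lives on a single channel.
    """
    if not 0 <= offset_gb < 4:
        raise ValueError("offset must fall inside the 4GB of installed memory")
    channels = 3 if offset_gb < 3 else 1
    return peak_bandwidth_gbs(DDR3_1066_MTS, channels)


if __name__ == "__main__":
    for channels in (1, 2, 3):
        print(f"{channels} channel(s): {peak_bandwidth_gbs(DDR3_1066_MTS, channels):.1f} GB/s")
    print(f"offset 0.5GB: {region_bandwidth_gbs(0.5):.1f} GB/s")
    print(f"offset 3.5GB: {region_bandwidth_gbs(3.5):.1f} GB/s")
```

These are ceilings set by the interface; measured bandwidth will come in lower.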
Six DDR3 slots, two slots per DDR3 channel
The more common arrangement will be six DIMM slots, where each DDR3 channel is connected to a pair of DIMM slots. In this configuration, as long as you install DIMMs in sets of three, you'll always get the full 25.6GB/s of memory bandwidth.
That discussion is entirely theoretical, however; the real question is: does Nehalem's triple-channel memory controller actually matter, or would two channels suffice? I suspect that Hyper Threading simply improved Nehalem's efficiency, not necessarily its need for more data. The three-channel memory controller is probably far more important for servers and will be especially useful in the upcoming 8-core version of Nehalem due out sometime next year. To find out, we simply benchmarked Nehalem in a handful of applications with a 4GB/dual-channel configuration and a 6GB/triple-channel configuration. Note that none of these tests actually used more than 4GB of memory, so the size difference doesn't matter; we also kept memory timings the same between all tests.
Benchmark | Dual Channel DDR3-1066 (9-9-9-20) | Triple Channel DDR3-1066 (9-9-9-20) |
Memory Tests - Everest v1547 | | |
Read Bandwidth | 12859 MB/s | 13423 MB/s |
Write Bandwidth | 12410 MB/s | 12401 MB/s |
Copy Bandwidth | 16474 MB/s | 18074 MB/s |
Latency | 37.2 ns | 44.2 ns |
Cinebench R10 (Multi-threaded test) | 18499 | 18458 |
x264 HD Encoding Test (First Pass / Second Pass) | 83.8 fps / 30.3 fps | 85.3 fps / 30.3 fps |
WinRAR 3.80 - 602MB Folder | 118 seconds | 117 seconds |
PCMark Vantage | 7438 | 7490 |
Vantage - Memories | 6753 | 6712 |
Vantage - TV and Movies | 5601 | 5637 |
Vantage - Gaming | 10202 | 9849 |
Vantage - Music | 5378 | 4593 |
Vantage - Communications | 6671 | 6422 |
Vantage - Productivity | 7589 | 7676 |
WinRAR (Built in Benchmark) | 3283 | 3306 |
Nero Recode - Office Space - 7.55GB | 131 seconds | 130 seconds |
SuperPI - 32M (mins:seconds) | 11:55 | 11:52 |
Far Cry 2 - Ranch Medium (1680 x 1050) | 62.1 fps | 62.4 fps |
Age of Conan - 1680 x 1050 | 51.5 fps | 51.1 fps |
Company of Heroes - 1680 x 1050 | 136.6 fps | 133.6 fps |
At DDR3-1066 speeds we found no real performance difference between the Core i7-965 running in two-channel vs. three-channel mode; the added bandwidth is simply not useful for most desktop applications. For some reason we were able to get better latency scores on the dual-channel configuration, but there's a good chance that may be due to the early nature of BIOSes on these boards. In benchmarks where the latency difference was noticeable we saw the dual-channel configuration pull ahead slightly, while in other tests where the added bandwidth helped we saw the triple-channel configuration do better. Honestly, it's mostly a wash between the two.
Our recommendation is to stick with three channels, but if you have existing memory and can't populate the third channel yet, it's not a huge deal; two channels are fine for the time being.
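To put numbers on "mostly a wash", the relative change from dual to triple channel can be computed straight from the table. This is a small sketch using a handful of rows copied from the results above; the `percent_change` helper and the selection of rows are ours, purely for illustration.

```python
# (dual-channel, triple-channel) results copied from the table above.
results = {
    "Everest read bandwidth (MB/s)": (12859, 13423),
    "Everest copy bandwidth (MB/s)": (16474, 18074),
    "Everest latency (ns)": (37.2, 44.2),
    "Cinebench R10 multi-threaded": (18499, 18458),
    "PCMark Vantage": (7438, 7490),
    "Far Cry 2 (fps)": (62.1, 62.4),
}


def percent_change(dual: float, triple: float) -> float:
    """Relative change of the triple-channel result versus dual-channel, in percent."""
    return (triple - dual) / dual * 100.0


if __name__ == "__main__":
    for name, (dual, triple) in results.items():
        print(f"{name}: {percent_change(dual, triple):+.1f}%")
```

Keep in mind that a positive change is an improvement for bandwidth, scores and frame rates, but a regression for latency and for any test timed in seconds.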
73 Comments
Jingato - Monday, November 3, 2008
If the 920 can easily be overclocked to 3.8Ghz on air, what incentive is there to purchase the 965 for more than triple the price?
TantrumusMaximus - Monday, November 3, 2008
I don't understand why the tests were on such low resolutions... most gamers are running higher res than 1280x1024 etc etc.... What gives?
daniyarm - Monday, November 3, 2008
Because if they ran gaming benchmarks at higher res, the difference in FPS would be hardly visible and you wouldn't go out and buy a new CPU. If they are going to show differences between Intel and AMD CPUs, show Nehalem at 3.2 GHz vs 9950 OC to 3.2 GHz so we can see clock for clock differences in performance and power.
npp - Monday, November 3, 2008
9950 consumes about 30W more at idle than the 965XE, and 30W less under load. I guess that OC'ing it to 3,2Ghz will need more than 30W... Given that the 965 can process 4 more threads, I think the result should be more or less clear.
tim851 - Monday, November 3, 2008
Higher resolutions stress the GPU more and it will become a bottleneck. Since the article was focussing on CPU power and not GPU power they were lowering the resolution enough to effectively take the GPU out of the picture.
Caveman - Monday, November 3, 2008
It would be nice to see these CPU reviews use relevant "gaming" benchmarks. It would be good to see the results with something like MS flight simulator FSX or DCS Black Shark, etc... The flight simulators these days are BOTH graphically and calculation intensive, but really stress the CPU.
AssBall - Monday, November 3, 2008
No, they don't, actually.
philosofool - Monday, November 3, 2008
It would have been nice to see a proper comparison of power consumption. Given all of Intel's boasts about being able to shut off cores to save power, I'd like to see some figures about exact savings.
nowayout99 - Monday, November 3, 2008
Ditto, I was wondering about power too.
Anand Lal Shimpi - Monday, November 3, 2008
Soon, soon my friend :)
-A