Intel's 90nm Pentium M 755: Dothan Investigated
by Anand Lal Shimpi on July 21, 2004 12:05 AM EST
Posted in CPUs
A quick look back at Banias
The core technologies of the Pentium M remain unchanged in Dothan. We've already explained them in great detail, but here's a quick recap for those of you who haven't read or don't remember the original article.
The Pentium M is characterized by the following seven design features and principles:
Mid-Length pipeline
The Pentium M has a pipeline that's shorter than that of the Pentium 4 (much shorter than that of Prescott), but longer than that of the Pentium III. Intel needed a longer pipeline to ensure that higher clock speeds would be possible, but shunned the Pentium 4's extremely long pipeline as it is quite a power hog. Although extremely high clock speeds can be wonderful for performance and marketing, they are a nightmare when it comes to power consumption. The longer your pipeline, the harder you have to work to keep that pipeline filled at all times and the bigger the penalty that you pay if the pipeline is ever left idle or has to be flushed (thanks to a mispredicted branch, for example).
To this day, Intel has still not disclosed the number of stages in the Pentium M pipeline out of an extreme desire to protect the processor's underlying architecture. The only thing we know is that Dothan's pipeline remains unchanged from Banias, a very good thing considering the surprise we all got with Prescott.
Much of Banias (and also Dothan) remains unpatented and protected using trade secret law in order to prevent the underlying ideas behind the CPUs' design from being picked up by competitors.
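To put the pipeline-length trade-off in rough numbers, here is a minimal sketch of a textbook-style model in which every mispredicted branch costs a full pipeline flush. Every input is hypothetical: the stage counts, branch frequency and misprediction rate were picked only to show the trend, and none of them describes the Pentium M, whose stage count Intel has not disclosed.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once misprediction flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# All values below are hypothetical and chosen only for illustration.
BASE_CPI = 1.0          # ideal cycles per instruction with a full pipeline
BRANCH_FRACTION = 0.20  # assume one instruction in five is a branch
MISPREDICT_RATE = 0.05  # assume 5% of branches are mispredicted

for depth in (10, 20, 30):
    # Simplifying assumption: a flush costs about one cycle per pipeline stage.
    cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, depth)
    print(f"{depth:2d}-stage pipeline: {cpi:.2f} cycles per instruction")
```

Under these assumptions the per-instruction cost of mispredictions grows linearly with depth, which is the penalty a deeper pipeline has to win back through higher clock speed.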
Micro Ops Fusion
The Pentium M, like all of Intel's modern day microprocessors, decodes regular x86 instructions into smaller micro-ops that are the actual operations sent down the pipeline for execution. Micro Ops Fusion takes certain micro-ops and "fuses" them together so that they are sent down the pipeline together and are either executed in parallel or serially without being reordered (or separated from one another). Micro Ops Fusion applies only to certain types of instructions, which Intel has not officially disclosed.
The benefits of Micro Ops Fusion are multi-faceted; first, you have the obvious performance improvements, but alongside them, you also have reduced power consumption, thanks to not wasting any cycles waiting for dependent micro-ops to retire before working on others.
Dedicated Stack Manager
Banias' dedicated stack manager is another power saving tool, designed to manage stack pointers and other stack-related data. Remember that stacks are used to store information about the current state of the CPU, including data that cannot be kept in registers due to limits in the number of available registers; thus, a dedicated manager can help performance considerably. As usual, whenever efficiency is improved, power consumption is optimized, which is the case with Banias here as well.
High Performance Branch Predictor
Banias' branch predictor reduced mispredicted branches by around 20% compared to the Pentium III (measured on SPEC CPU 2000 tests, though the improvements carry over to real-world code). The improvements are thanks to a larger branch history table (for storing data used to predict branches) and better handling of branching in loops, the latter of which is further improved in Dothan.
Pentium 4 FSB, Pentium III Execution Units
The execution back end of Banias is identical to that of the Pentium III, making the Pentium M a relatively narrow microprocessor when compared to AMD's Athlon 64 and Intel's Pentium 4. Given the low power target for Banias, this decision makes a lot of sense as it reduces power consumption and die size; but keep in mind that the lack of extreme width in the pipeline means that technologies like Hyper Threading will be kept away from the Pentium M. Instead, we can look forward to multi-core Pentium M designs, which are made somewhat easier to implement thanks to a relatively small die.
In order to keep the processor fed, however, Intel implemented the Pentium 4's 64-bit quad-pumped front side bus. Currently, the FSB clock on all Banias (and Dothan) parts is 100MHz quad-pumped (effectively 400MHz, for 3.2GB/s of bandwidth), but by the end of this year, it will move to 133MHz (effectively 533MHz).
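The bandwidth figures above follow directly from the bus parameters. As a quick sketch of the arithmetic (the function name is our own, and 1GB is taken as 10^9 bytes):

```python
def fsb_bandwidth_gb_s(base_clock_mhz, transfers_per_clock=4, bus_width_bits=64):
    """Peak front side bus bandwidth in GB/s, with 1 GB = 10**9 bytes."""
    bytes_per_transfer = bus_width_bits // 8
    transfers_per_second = base_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9


for clock_mhz in (100, 133):
    effective_mhz = clock_mhz * 4  # quad-pumped: four transfers per clock
    bandwidth = fsb_bandwidth_gb_s(clock_mhz)
    print(f"{clock_mhz}MHz FSB ({effective_mhz}MHz effective): {bandwidth:.2f} GB/s")
```

A 100MHz clock gives 3.2GB/s, matching the figure quoted above; plugging in 133MHz yields roughly 4.26GB/s for the upcoming 533MHz bus.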
Power Saving Cache
Banias (and Dothan) implement an 8-way set associative L2 cache, which is not uncommon amongst modern day microprocessors. A set associative cache increases hit rate (the likelihood that something you want will be found in cache) at the expense of increased cache latency. Cache latency is increased because once the location of data is found in cache, the "way" in which it resides must be determined and selected - an incorrect determination will further increase cache latency.
In order to optimize the 8-way set associative cache for low power consumption, each "way" is further divided into quadrants. Once a "way" is selected, the L2 controller determines in which quadrant the needed data resides and activates only that part of the cache. With such a large cache, it is important to save power here as much as possible.
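To make the way and quadrant idea more concrete, here is a simplified sketch of how an address might be split up for a cache organized this way. It is not Intel's implementation: the 64-byte line size and the rule that picks a quadrant from the top two bits of the set index are assumptions made purely for illustration, and the 2MB capacity is the Dothan figure.

```python
CACHE_BYTES = 2 * 1024 * 1024  # 2MB L2 (Dothan)
WAYS = 8                       # 8-way set associative
QUADRANTS_PER_WAY = 4          # each way is split into quadrants
LINE_BYTES = 64                # assumed line size, for illustration only

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1
INDEX_BITS = NUM_SETS.bit_length() - 1


def decode(address):
    """Split an address into (tag, set index, quadrant, byte offset)."""
    offset = address & (LINE_BYTES - 1)
    set_index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    # Hypothetical mapping: the top two set-index bits choose the quadrant.
    quadrant = set_index >> (INDEX_BITS - 2)
    return tag, set_index, quadrant, offset


def active_fraction():
    """Share of the data array powered per access in this simplified model."""
    return 1 / (WAYS * QUADRANTS_PER_WAY)


print(f"{NUM_SETS} sets; example address decodes to {decode(0x12345678)}")
print(f"active share of the data array per access: {active_fraction():.1%}")
```

In this model only one quadrant of one way is powered for a given access, a small slice of the array, which is the intuition behind calling it a power saving cache. The real controller's selection logic is not public, so treat the mapping as a stand-in.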
Artificially Limited Clock Speed Design
Generally speaking, when you design a microprocessor, you want it to run as fast as possible. Normally, there's an initial idea of target clock speed and once the chip is actually back from the plant, it's not uncommon to find parts of the chip that run slower than your clock target, while others run faster (sometimes much faster). In desktop microprocessor design, the goal is to speed up the slowest parts of the chip (or critical paths as they are known among chip designers) and tweak the chip and the manufacturing process to run as fast as the fastest parts.
With Banias, Intel took a different approach. The design team set a clock speed target, and if any part of the chip exceeded that clock speed target, then that part of the chip had to be slowed down. The idea was that if a chip can run faster than its target, then you're wasting power - a luxury that isn't present in mobile chip design. The upside to this design methodology is that power consumption is further reduced, and when coupled with the other power-saving advancements that we've talked about, we're dealing with a fairly low power chip. The downside is that each generation of the Pentium M has a very well defined clock speed wall, and the only way over that wall is to use a smaller, cooler and faster manufacturing process. This is why you will see Pentium M ramp much slower in clock speed than any other Intel chip and why you will see clock speed bumps coincide with new manufacturing processes. It also means that if Intel ever has yield problems with a new manufacturing process (which isn't uncommon), the Pentium M will suffer. It's a risky move, but it's the type of move that is necessary to truly build a good mobile CPU.
28 Comments
phtbddh - Wednesday, July 21, 2004 - link
What is the battery life of a Dothan compared to a Banias? I know the Dothan is supposed to be better, but can we see some numbers?
tfranzese - Wednesday, July 21, 2004 - link
Not quite, SKiller: a large part of the P4's dominance in media encoding is the high core frequency attributed to such a long pipeline.
SKiller - Wednesday, July 21, 2004 - link
I think the assertion that.."With Intel's vision for the future being centered on media encoding and content creation, the Pentium M is the last thing that Intel would want to build their future desktop CPUs around."
..may not be correct as by your own admission:
"Partially constrained by its 400MHz FSB and single channel memory interface, the Pentium M is not the successor to the Pentium 4 that many will make it out to be."
So all Intel would have to do is up the FSB on a desktop version to improve media encoding and content creation performance and be competitive with P4.
mkruer - Wednesday, July 21, 2004 - link
You know, I wonder just how much of the performance is gained from the 2MB of L2 cache. If I recall from Aceshardware, 2MB is the sweet spot for micro-op code; any more, and there is a performance hit in either direction. Also, on a side note, the 90nm Athlon 64 shows a ~5% improvement across the board.
dvinnen - Wednesday, July 21, 2004 - link
Yea, I was wondering the same thing. Why not just use a mobile A64 system with a mobile 9600? Acer and eMachines make systems with them.
alexruiz - Wednesday, July 21, 2004 - link
Another one: Was it that difficult to get an eMachines M68xx for the review? Mobile against mobile.
alexruiz - Wednesday, July 21, 2004 - link
Anand made a huge mistake in the Athlon 64 CPU selection. The mobile A64 3000+ is clocked at 1.8 GHz with a 1MB L2 cache. He used a desktop 2.0 GHz with 512 K. This will affect the outcome, especially because clock speed matters more than cache.
I knew Dothan was going to give a very good fight, but I didn't expect it to win any gaming application or Business Winstone. As reference, my M6805 A64 3000+ scores 22.2 and 27.8 in the BW and CCMW tests (7K60 hard drive, so not the same setup).
A very good review, but we can do better. I still want to see video encoding tests run with a commercial application, preferably 3 (Ulead Video Studio 8, Roxio Videowave 7, Pinnacle 9) and 2 alternative programs for DivX encoding (DVD2AVI and virtualdubmod are suggested. We have seen enough XMPEG from other sites)
Run some photo editing benchmarks not only with Adobe, but also with Corel Photopaint 11 or Roxio Photosuite.
AutoCAD is also expected, to give an idea of what can be attained. SolidWorks or UG would be fantastic, but those 2 are more of a wish.
How about more scientific or technical programs? Electrical simulators (PSpice for example), FEA (Nastran), MathCAd, Maple, etc.
More games were expected to be run. How about chess programs? How about OSmark, the successor of COSBI by Van Smith?
I stressed the use of 2 or more applications that do the same thing to highlight the fact that software optimization matters a lot and that some claims about a CPU being "the best for that activity" are only myths.
All in all, Dothan is a potent rival that uncovers some weaknesses in the K7/K8 architecture that were noticeable against the P6 (Pentium II/III) but forgotten against the P7 (Pentium 4): L2 cache performance and integer performance.
Regarding battery life, keep in mind that the CPU is not the biggest spender in a laptop; the screen is. The K8T800, the most popular chipset for AMD64 laptops, is a desktop part and is quite voracious. Keep those factors in mind when battery life is evaluated.
I foresee that SOI will give AMD the edge in battery life once they implement power saving caches, the biggest energy conservation feature in the P-M.
Comments are welcome
Alex
dacaw - Wednesday, July 21, 2004 - link
Well Dothan looks very much like a copy of a 32-bit AthlonXP to me.
Comparing it to an Athlon64 makes no sense. Dothan is not 64-bit.
I bought an AthlonXP Barton mobile 2600 for $99 and it runs barely warm under PowerNow. What could you buy for the price of a Dothan? Maybe 5 top-of-the-line Athlon XPs?
Let's compare apples to apples and have a review of top-of-the line Dothan to top-of-the-line AthlonXP.
Oh, and drop those fake synthetic benchmarks. What's the point of them if they simply "favor" Intel processors (your comment in the review)?
Come on Anand, let's have a review that really means something. Please!
Jeff7181 - Wednesday, July 21, 2004 - link
Can't wait to see battery life tests.
mino - Wednesday, July 21, 2004 - link
Nice review, however it is a shame you didn't include the Celeron 2.4 (which can be found in many SLOW notebooks); an AXP-M 2600+ would be nice as well. That way it would be a complete notebook market review. The best one.
I'd love to see bench results for the Cely and XP added (using the same desktop platform as you did in the case of the P4).
mino