Conclusion

Prescott was built to adapt to the typical problems that make it hard to run x86 programs quickly: branches, dependencies, and a constant stream of memory accesses and ADD operations. However, doing so required complex logic, which quickly increased leakage power. The wire delay and dependency problems were only solved by sacrificing a lot of energy. The combination of the LVS double-pumped ALUs, tons of new features, and 64-bit support created an avalanche of leaking logic. The result is an innovative architecture crushed against a thermal wall.
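To see why leakage grows so quickly, consider the standard first-order model of subthreshold leakage (a sketch; the constants are process-dependent and not Prescott-specific). A transistor that is switched "off" still conducts a current, and that current grows exponentially as the threshold voltage is lowered - which is exactly what designers must do to keep fast transistors fast:

```latex
% First-order subthreshold leakage model (illustrative):
% an "off" transistor still leaks, and the leakage grows
% exponentially as the threshold voltage V_th is lowered.
P_{leak} = V_{dd} \cdot I_{leak}, \qquad
I_{leak} \propto e^{-V_{th} / (n V_T)}
```

Here V_T is the thermal voltage (about 26mV at room temperature) and n is a process-dependent slope factor. With a typical subthreshold slope of 80-100mV per decade, shaving 100mV off V_th to gain speed raises leakage current roughly tenfold - which is why fast, complex logic leaks so much.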

But the Prescott failure, the exploding leakage power, and wire delay do not automatically mean that single-core CPUs have no future. Leakage power can be contained by introducing high-k materials and SOI. Wire delay has been mitigated by using repeaters - at the cost of some extra power - and copper interconnects. Dual core is not a magical solution that is going to solve all the problems that Prescott and other modern CPUs face.
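A rough sketch of why repeaters help, using the classic distributed RC wire model (the 0.38 factor is the standard first-order approximation; no Prescott-specific numbers are implied): the delay of an unrepeated wire grows quadratically with its length, because both its resistance and its capacitance grow linearly with it. Splitting the wire into k buffered segments makes the delay nearly linear in length, at the price of the power and area the repeaters burn:

```latex
% Distributed RC wire delay; R' and C' are resistance and
% capacitance per unit length, l is the wire length.
t_{wire} \approx 0.38\, R' C'\, l^{2}
% With k repeated segments (each adding a buffer delay t_{buf}),
% the quadratic term is divided by k:
t_{rep} \approx k\, t_{buf} + \frac{0.38\, R' C'\, l^{2}}{k}
```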

The Prescott failure only tells us that, right now, the ultra-deep pipelined CPU is not the best solution. Intel went too quickly, too deep, and although many ingenious tricks were implemented to make Prescott a real powerhouse, all those tricks together backfired with high leakage and dynamic power losses.
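As a back-of-the-envelope illustration of why ultra-deep pipelines run into diminishing returns, the sketch below estimates effective CPI as the branch misprediction penalty grows with pipeline depth. All the numbers (branch frequency, misprediction rate, and the assumption that a flush costs roughly one full pipeline length) are illustrative assumptions, not measured Prescott figures:

```python
# Illustrative sketch: deeper pipelines raise the branch misprediction
# penalty, eroding the clock speed gained from the extra stages.
# All numbers below are assumptions for illustration only.

BRANCH_FREQ = 0.20      # assumed fraction of instructions that are branches
MISPREDICT_RATE = 0.05  # assumed fraction of branches mispredicted
BASE_CPI = 1.0          # ideal CPI without misprediction stalls

def effective_cpi(pipeline_depth: int) -> float:
    """CPI when a misprediction flushes roughly the whole pipeline."""
    penalty_cycles = pipeline_depth
    return BASE_CPI + BRANCH_FREQ * MISPREDICT_RATE * penalty_cycles

for depth in (10, 20, 31):  # e.g. P3-class, Willamette-class, Prescott-class
    # A deeper pipeline allows a higher clock, but effective CPI rises too,
    # so part of the frequency gain is lost to branch flushes.
    print(f"{depth:2d} stages -> effective CPI ~ {effective_cpi(depth):.2f}")
```

Under these assumptions, going from 10 to 31 stages raises effective CPI from about 1.1 to about 1.3, so roughly a fifth of the extra clock speed is given back to branch flushes before leakage power even enters the picture.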

In the next article, we will investigate what dual-core technology can really bring us, beyond the hype, the "paradigm shift" slogans, and the "much smoother system" claims.


References

[1] An In-Depth Look at Computer Performance Growth
Chalmers University of Technology, Department of Computer Engineering, Göteborg, 2004
http://www.ce.chalmers.se/~warg/papers/performancegrowth_tr-2004-9.pdf

[2] Intel Whitefield uncovered, The Register
http://www.theregister.co.uk/2004/05/01/intel_whitefield_uncovered/

[3] Implementing Power Management IP for Dynamic and Static Power Reduction in Configurable Microprocessors using the Galaxy Design Platform at 130nm
Dan Hillman, Virtual Silicon
John Wei, Tensilica
http://www.tensilica.com/hillman_slides.pdf

[4] Leakage Power Modelling and Minimization
Massoud Pedram
University of Southern California, Dept. of EE-Systems
http://atrak.usc.edu/~massoud/Papers/pedram-tutorial-iccad04.pdf

[5] Gigascale Integration - Challenges and Opportunities
By Shekhar Borkar
Intel Fellow, Director, Circuit Research
http://www.intel.com/research/mrl/research/circuit.htm
http://www.intel.com/cd/ids/developer/asmo-na/eng/strategy/182440.htm?page=1

[6] Sun Niagara Demo
http://www.sun.com/aboutsun/media/presskits/networkcomputing05q1/

[7] LVS Technology for the Intel® Pentium® 4 Processor on 90nm Technology
http://www.intel.com/technology/itj/2004/volume08issue01/art04_lvs_technology/p01_abstract.htm


Other Sources:

  1. Intel Silicon Innovation To Shape Direction Of The Digital World
    Multi-Core Processors, FALL IDF 2004
    http://www.intel.com/pressroom/archive/releases/20040907corp.htm
  2. Pentium 4 processor at 4.7 GHz, FALL IDF 2002
    http://www.intel.com/pressroom/archive/releases/20020909corp.htm
  3. Intel Developer Forum, Spring 2002
    Louis Burns Keynote, NetBurst architecture scales up to 10 GHz.
    http://www.intel.com/pressroom/archive/speeches/burns20020227.htm
  4. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
    By Herb Sutter
    http://www.gotw.ca/publications/concurrency-ddj.htm
  5. Illinois researchers create world's fastest transistor ... again
    http://www.news.uiuc.edu/scitips/03/1106feng.html

Comments

  • Zak - Wednesday, August 22, 2007 - link

    I seem to remember reading somewhere, probably a couple of years ago, about research being done on superconductivity at "normal" temperatures. Right now, superconductivity occurs only at extremely low temperatures, right? If materials were developed that achieve the same at normal temperatures, it would solve a lot of these issues, like wire delay and power loss, wouldn't it?

    Z.
  • Tellme - Monday, February 21, 2005 - link

    Carl, what I meant was that soon we might not see much improved performance with multi-cores either, because the data arrives at the processor too late for quick execution. (That is true for single cores as well.)

    Did you check the link?
    Their idea is simple:
    "If you can't bring the memory bandwidth to the processor, then bring the processors to the memory."
    Interesting, no?
    Currently, the processor spends most of its time waiting for data to process.

  • carl0ski - Saturday, February 19, 2005 - link

    #61 I thought the P4 already had memory bandwidth problems.
    AMD has a temporary workaround (the on-die memory controller), which helps when multiple CPUs/dies use the same FSB to access the RAM.

    Intel has proposed multiple FSBs, one per CPU/die.

    Does anyone know if that means they will need separate RAM DIMMs for each FSB? Because that would make for an expensive system.
  • carl0ski - Saturday, February 19, 2005 - link

    [quote]59 - Posted on Feb 12, 2005 at 11:28 AM by fitten Reply
    #57 What was the performance comparison of the 1GHz Athlon vs. the 1GHz P3? IIRC, the Athlon was faster by some margin. If this was the case, then there was a little more than tweaking that went on in the Pentium-M line. Because they started out looking at the P3 doesn't mean that what they ended up with was the P3 with a tweak here or there. :)[/quote]

    #59 Didn't the P3 1GHz run 133MHz SDRAM on a 133MHz FSB?
    The Athlon 1GHz had a nice DDR266 FSB to support it.

  • Tellme - Monday, February 14, 2005 - link

    Nice article.

    I think dual cores will soon hit the wall, i.e. memory bandwidth.

    Hopefully, memory and processors will be integrated in the near future.

    See
    http://www.ee.ualberta.ca/~elliott/cram/

  • ceefka - Monday, February 14, 2005 - link

    Though still a little too technical for me, it makes a good read.

    It's good to know that Intel has eaten their words and realized they had to go back to the drawing board.

    I believe that rather sooner than later, multi-core will mean 4 - 8 cores, providing the power to emulate everything that is not necessarily native, like running Mac OS X on an AMD or Intel box. IOW, the Cell will meet its match.
  • fitten - Saturday, February 12, 2005 - link

    #57 What was the performance comparison of the 1GHz Athlon vs. the 1GHz P3? IIRC, the Athlon was faster by some margin. If this was the case, then there was a little more than tweaking that went on in the Pentium-M line. Because they started out looking at the P3 doesn't mean that what they ended up with was the P3 with a tweak here or there. :)
  • avijay - Friday, February 11, 2005 - link

    EXCELLENT Article! One of the very best I've ever read. Nice to see all the references at the end as well. Could someone please point me to Johan's first article at AT please. Thanks.
    Great Work!
  • fishbreath - Friday, February 11, 2005 - link

    For those of you who don't actually know this:

    1) The Dothan IS a Pentium 3. It was tweaked by Intel in Israel, but its heart and soul is just a PIII.

    1b) All P4s have had hyperthreading in them from the start. It was a fuse feature that was not announced until there were applications to support it. But anyone who has HT and Windows XP knows that Windows simply has a smoother 'feel' when running on an HT processor!

    2) Complex array processors are already in the pipeline (no pun intended). However the lack of an operating system or language to support them demands they make their first appearance in dedicated applications such as h264 encoders.
  • blckgrffn - Friday, February 11, 2005 - link

    Yay for Very Large Scale Integration (more than 10,000 transistors per chip)! :) I wonder when the historians will put down in the history books that we have hit the fifth generation of computing org....
