Intel's Larrabee Architecture Disclosure: A Calculated First Move
by Anand Lal Shimpi & Derek Wilson on August 4, 2008 12:00 AM EST- Posted in
- GPUs
Putting it all Together - Return of the Ring Bus
Intel is keeping two important details of Larrabee very quiet: the details of the instruction set and the configuration of the finished product. Remember that Larrabee won't ship until sometime in 2009 or 2010, the first chips aren't even back from the fab yet, so not wanting to discuss how many cores Intel will be able to fit on a single Larrabee GPU makes sense.
The final product will be some assembly of a multiple of 8 Larrabee cores, we originally expected to see something in the 24-to-32 core range but that largely depends on targeted die size as we'll soon explain:
Intel's own block diagrams indicated two memory controller partitions, but it's unclear whether or not we should read into this. AMD and NVIDIA both use 64-bit memory controllers and simply group multiples of them on a single chip. Given that Intel's Larrabee will be more memory bandwidth efficient than what AMD and NVIDIA have put out, it's quite possible that Larrabee could have a 128-bit memory interface, although we do believe that'd be a very conservative move (we'd expect a 256-bit interface). Coupled with GDDR5 (which should be both cheaper and highy available by the Larrabee timeframe) however, anything is possible.
All of the cores are connected via a bi-directional ring bus (512-bits in each direction), presumably running at core speed. Given that Larrabee is expected to run at 2GHz+, this is going to be one very high-bandwidth bus. This is half the bit-width of AMD's R600/RV670 ring bus, but the higher speed should more than make up the difference.
AMD recently abandoned their ring bus memory architecture citing a savings in die area and a lack of need for such a robust solution as the reason. A ring bus, as memory busses go, is fairly straight forward and less complex than other options. The disadvantage is that it is a lot of wires and it delivers high bandwidth to all the clients on the bus whether they need it or not. Of course, if all your memory clients need or can easily use high bandwidth then that's a win for the ring bus.
Intel may have a better use for going with the ring bus than AMD: cache coherency and inter-core communication. Partitioning the L2 and using the ring bus to maintain coherency and facilitate communication could make good use of this massive amount of data moving power. While Cell also allows for internal communication, Intel's solution of providing direct access to low latency, coherent L1 and L2 partitions while enabling massive bandwidth behind the L2 cache could result in a much faster and easier to program architecture when data sharing is required.
101 Comments
View All Comments
Griswold - Monday, August 4, 2008 - link
You seem to be confused. Time for a nap.MDme - Monday, August 4, 2008 - link
but AMD will have Cinema 2.0. did you see that demo? by 2010, AMD will have the RV990 or whatever...and Nvidia will have GT400?phaxmohdem - Monday, August 4, 2008 - link
Considering how long it took nVidia to release a single GPU significantly faster than G80, I'd be shocked if we wee GT300 by 2009/2010. however a GTX 295GT X2 ULTRA OC is not out of the question ;)shuffle2 - Monday, August 4, 2008 - link
mm², how hard is that to write? >.>1prophet - Monday, August 4, 2008 - link
They need to hit one out of the park with the drivers (software)as well.jltate - Tuesday, August 5, 2008 - link
I've got a bunch of comments, so I'll just list them all here.SSE doesn't have fused multiply-add operations. Larrabee does -- thus that 10 core processor could perform a peak of 320 floating point operations per cycle (it's mentioned in the SIGGRAPH paper).
Larrabee's programming model is variable width -- the hardware can and likely will be augmented in the future to perform more than just 16 operations in parallel.
The ring bus between cores was stated to be for each group of 16. Intel stated that for more than 16 cores they'd use "multiple short-linked rings".
Also, the diagram only shows one memory controller on one side with fixed function logic on the other, not two memory controllers as you showed on page 5 of your article. However, Intel stated in the paper that the configuration and number of processors, fixed function blocks and I/O controllers would be implementation dependent. So in effect it could very well have a half-dozen 64-bit interfaces like G80.
My forecast? This thing will rock. I for one simply cannot wait.
Laura Wilson - Monday, August 4, 2008 - link
that's the truththey say they know this. it sounds like they know this ... we'll see what happens :-)
gigahertz20 - Monday, August 4, 2008 - link
I'm going to predict Larrabee will provide a huge boost of performance over Intel's current crappy integrated graphic solutions, but will not be able to compete with AMD/ATI's and Nvidia's high end GPU's when it (Larrabee) finally launches. If Intel can deliver a monster that can push 100+ FPS in Crysis and doesn't cost so much that it breaks the bank like the current Nvidia GTX 280's, then they will have a real winner! When it finally launches though, who knows what AMD/ATI and Nvidia will have out to compete against it, wonder if Intel is just trying to push out a mainstream chip or go high end as well...guess I need to read the rest of the article :)JEDIYoda - Tuesday, August 5, 2008 - link
dreaming again huh??? you people who want top notch performance without having to pay for it....rofl..hahahaFITCamaro - Monday, August 4, 2008 - link
This isn't mean to compete with their IGPs. At least not initially.