AMD CPU Roadmap: Q4'04 Two Heads are Better than One
by Jarred Walton on December 18, 2004 12:00 AM EST- Posted in
- CPUs
AMD CPU Roadmap Update
2004 is coming to a close, and it has been a pretty exciting year for AMD. While it launched in 2003, 2004 was the year that the Athlon 64 really came into its own. Prices dropped to more affordable levels, and performance was improved to the point where even the best that Intel has to offer in the way of the Pentium 4 Extreme Edition is not able to score a victory over the top AMD processors. In fact, AMD has several processors that are currently beating Intel's top chips in the majority of applications. With such a successful year now completed, do we have more of the same to look forward to in 2005? That's difficult to say: hindsight is always 20-20, but predicting the future is far more difficult.
The buzz for the coming year is all centered around multi-cored processors - mostly dual core, but we may see quad core chips in the enterprise segment, and the future will almost certainly bring chips with more than two cores in a package. There are other areas besides the high-end multi-core arena, and we haven't seen the end of increasing clock speeds yet. Besides the high-end enthusiast/workstation/server market, we have the mainstream and value markets. The high-end may be good for bragging rights, but the vast majority of chips sold are in the value and mainstream markets. Here, then, is the latest look at what AMD is planning in each of these areas, starting with the performance processors.
AMD Desktop Athlon 64 Roadmap
Processor | Clock Speed | Socket | Launch Date | End of Line |
Athlon >= FX-57 | ??? | Socket 939 | Q1'06 | |
Athlon FX-57 | ??? | Socket 939 | Q3'05 | |
Athlon FX-55 | 2.6 GHz 1MB L2 | Socket 939 | Now | |
Athlon FX-53 | 2.4 GHz 1MB L2 | Socket 939 | Now | |
Athlon FX-53 | 2.4 GHz 1MB L2 | Socket 940 | Now | |
Athlon FX-51 | 2.2 GHz 1MB L2 | Socket 940 | Now | |
Athlon 64 >=4200+ | ??? | Socket 939 | Q3'05 | |
Athlon 64 4000+ | 2.4 GHz | Socket 939 | Now | |
Athlon 64 3800+ | 2.4 GHz | Socket 939 | Now | |
Athlon 64 3700+ | ??? | Socket 939 | Q2'05 | |
Athlon 64 3700+ | 2.4 GHz 1MB L2 | Socket 754 | Now | Q4'05 |
Athlon 64 3500+ | 2.2 GHz 90 nm | Socket 939 | Now | |
Athlon 64 3500+ | 2.2 GHz | Socket 939 | Now | |
Athlon 64 3400+ | 2.4 GHz 512K L2 | Socket 754 | Now | Q4'05 |
Athlon 64 3400+ | 2.2 GHz 1MB L2 | Socket 754 | Now | Q3'05 |
Athlon 64 3200+ | 2.2 GHz 512K L2 | Socket 754 | Now | Q4'05 |
Athlon 64 3200+ | 2.0 GHz 90 nm | Socket 939 | Now | |
Athlon 64 3200+ | 2.0 GHz 1MB L2 | Socket 754 | Now | Q3'05 |
Athlon 64 3000+ | 2.0 GHz 512K L2 | Socket 754 | Now | Q4'05 |
Athlon 64 3000+ | 1.8 GHz 90 nm | Socket 939 | Now | Q3'05 |
Athlon 64 2800+ | 1.8 GHz | Socket 754 | Now | Q1'05 |
If you compare that with our last AMD roadmap, you'll notice that there are very few changes. In fact, the only really new information is the appearance of a 3700+ socket 939 part. We would assume that this uses the new Venice core, which should include 512K L2 cache and SSE3 support. Unfortunately, precise features and clock speeds remain an unknown at present for many of the upcoming chips. The FX-57 could be a dual core chip, or else it could be a 2.8 GHz single core chip - we really can't say which yet.
In addition to adding support for SSE3, there are rumors that AMD will begin supporting DDR2 with their next processor revisions. At present that remains wild speculation. A new memory type would require new motherboards at the very least, and probably a new CPU socket as well. (Having two CPUs that share the same socket and yet support different memory types - due to the integrated memory controller - would create a lot of confusion among customers. We find it hard to imagine that AMD would take such an approach.) DDR2 also has increased latencies relative to DDR RAM, and the additional bandwidth offered does not seem to benefit Athlon 64 very much. All things considered, then, we assume that AMD will continue using only DDR memory and socket 939 for at least the first half of 2005.
You might note that we have added an "EOL" column for the processors. This is only for the desktop versions of the chips, and it indicates AMD's plans for when to phase out each model. In the past, we might see processors drop well below the $100 mark before they were discontinued, but now that AMD has closed the performance gap with Intel, they are halting the shipment of their "performance" processors once they drop below about $120. This raises the average sale price of AMD's processors, and that's a good thing as a more profitable AMD is a more competitive AMD. As we mentioned in the last update, all of the socket 754 chips are scheduled for EOL by Q3'05; beyond that, Athlon 64 will only be available on socket 939. Take these EOL dates with a grain of salt, however, as mobile variants will continue to be sold and will work in most - if not all - desktop boards. Below $120, AMD has their "value" offerings, and they pick up in performance basically where the more expensive processors leave off.
AMD Desktop Sempron Roadmap
Processor | Clock Speed | Socket | Launch Date | End of Line |
Sempron >= 3500+ | ??? | Socket 754 | Q1'06 | |
Sempron 3400+ | ??? | Socket 939 | Q3'05 | |
Sempron 3400+ | ??? | Socket 754 | Q4'05 | |
Sempron 3300+ | ??? | Socket 754 | Q2'05 | |
Sempron 3200+ | ??? | Socket 939 | Q1'05 | |
Sempron 3200+ | ??? | Socket 754 | Q1'05 | |
Sempron 3100+ | 1.8 GHz | Socket 754 | Now | |
Sempron 3000+ | ??? | Socket 939 | Q1'05 | |
Sempron 3000+ | ??? | Socket 754 | Q1'05 | |
Sempron 2800+ | ??? | Socket 754 | Q1'05 | Q1'06 |
Sempron 2600+ | ??? | Socket 754 | Q1'05 | Q4'05 |
Sempron 3000+ | 2.00 GHz 512K | Socket A | Now | Q3'05 |
Sempron 2800+ | 2.0 GHz | Socket A | Now | Q3'05 |
Sempron 2600+ | 1.83 GHz | Socket A | Now | Q3'05 |
Sempron 2500+ | 1.75 GHz | Socket A | Now | Q3'05 |
Sempron 2400+ | 1.67 GHz | Socket A | Now | Q1'05 |
Sempron 2300+ | 1.58 GHz | Socket A | Now | Q1'05 |
Sempron 2200+ | 1.5 GHz | Socket A | Now | Q1'05 |
There are a few new additions to the value lineup since our last look, including the arrival of Sempron chips for socket 939. Clock speeds, again, remain somewhat unknown. However, given the apparent lack of working .5X multipliers for Athlon 64 motherboards, we would guess that they will come in 200 MHz increments from the current chips. Feel free to fill in the blanks. Sempron variants featuring SSE3 support should be arriving via the Palermo chips in early 2005, but where the current Paris ends and the new Palermo begins is anyone's guess. The newer parts will use a 90 nm process, so that should make spotting them somewhat easier. Overlapping model numbers continue to create some confusion, so pay close attention to the details when ordering any of these chips. With the value lineup transitioning to socket 754 - and AMD's roadmap makes this the clear intention - the days of socket A systems are numbered. The platform will also continue to see support in the way of mobile variants, but the desktop Athlon XP and Sempron chips are fading away fast. If you already own a system that uses the platform, performance is still more than acceptable in all but the most demanding of applications, but we wouldn't advise anyone looking for a new system to use socket A.
If our guess on the clock speeds is correct, we'll actually see the new chips launching with clock speeds as low as 1.2 GHz. That seems awfully low, and the parts will have a very short lifespan given the EOL looming just a few quarters after the launch. Power requirements would be very low at those speeds, however, making them an interesting prospect for mobile and embedded devices. Whether the low initial clock speeds are just AMD being cautious with a new design or if they're protecting the higher end markets is difficult to say - it's probably a little of both. The socket 939 Sempron parts will most likely start at 1.8 GHz and scale up from there in 200 MHz increments, so it looks to be more of a customer (i.e. OEM) demand than anything else. The lower performing socket 754 parts will take over the vacated positions of socket A Semprons, and that will let AMD continue to cater to the value conscious consumer without cannibalizing sales of the performance parts.
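The guess above is simple arithmetic, and a minimal sketch makes the assumptions explicit. The 200 MHz step and the 1.8 GHz Sempron 3100+ come from the text and table; mapping each lower socket 754 model to exactly one step down is an assumption made here for illustration, not AMD data.

```python
# Hypothetical illustration of the article's guess; these are not AMD specifications.
KNOWN_MHZ = 1800   # Sempron 3100+ (socket 754), the only part with a listed clock
STEP_MHZ = 200     # guessed step, given no working .5X multipliers

# Socket 754 models below the 3100+ in the roadmap table, highest first.
LOWER_MODELS = ["3000+", "2800+", "2600+"]


def guess_clocks(known_mhz, step_mhz, lower_models):
    """Assume each model below the known part runs one step slower than the last."""
    guesses = {}
    for steps_down, model in enumerate(lower_models, start=1):
        guesses[model] = known_mhz - steps_down * step_mhz
    return guesses


guessed = guess_clocks(KNOWN_MHZ, STEP_MHZ, LOWER_MODELS)
for model, mhz in guessed.items():
    print(f"Sempron {model}: {mhz / 1000:.1f} GHz (guess)")
```

Under those assumptions the slowest part, the 2600+, lands at 1.2 GHz, which is where the figure in the paragraph above comes from.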
AMD Server/Workstation Roadmap
Processor | Clock Speed | Socket | Launch Date | End of Line |
Opteron x50 | 2.4 GHz 1MB L2 | Socket 940 | Now | |
Opteron x48 | 2.2 GHz 1MB L2 | Socket 940 | Now | |
Opteron x46 | 2.0 GHz 1MB L2 | Socket 940 | Now | |
Opteron x44 | 1.8 GHz 1MB L2 | Socket 940 | Now | |
Opteron x42 | 1.6 GHz 1MB L2 | Socket 940 | Now | |
Opteron x40 | 1.4 GHz 1MB L2 | Socket 940 | Now | |
We don't have any new additions for the Opteron lineup yet, although the Venus, Troy, and Athens parts are supposed to be in the works. These are the server variants of the San Diego Athlon FX part. They should come with 1 MB L2 cache and SSE3 support and will be fabbed on the new 90 nm SOI process - strained silicon may also be used, although that isn't entirely clear. Rumors say that the x52 parts may arrive as early as Q1 2005, with clock speeds matching the 2.6 GHz of the FX-55. However, AMD may simply choose to skip these parts and head straight to their dual core models.
Speaking of dual core, Denmark, Italy, Egypt, and Toledo chips are also on the horizon for 2005, and the latest roadmap outlines AMD's plans for the introduction of these processors. Scheduled to arrive in the performance segment in the middle of 2005, dual core will initially benefit server and workstation applications the most. AMD even acknowledges this with the statement that gaming will be "best served with a maximum frequency, single core solution until 2006." The first half of 2005 will see the model numbering finalized, and sampling and demonstration of dual core processors and platforms will begin in earnest. Planned clock speeds remain unknown for the time being.
The EOL for the Opteron processors remains blank, and this is what you would expect for any processor competing for the enterprise server market. Typically, such chips will be manufactured as long as there is a demand for them from the server manufacturers. There are also lower voltage versions of the earlier x40 chips available, which can be useful in blade servers. These chips might be around for years to come, although likely in smaller quantities as demand decreases.
Returning to an earlier subject, while current Athlon 64 processors do not seem to benefit much from increased memory bandwidth, that could change once we shift to dual core processors. The industry will eventually move to fully embrace DDR2 memory, just like the eventual shift from SDRAM to DDR (and RDRAM) several years back. If we are going to see DDR2 support from AMD any time soon, a separate revision of their dual core offerings would seem to make a lot of sense. That's just speculation on our part, but by 2006 we should see DDR2 prices drop to DDR levels and possibly even lower, and we should also see a shift towards the use of 1 GB and 2 GB DIMMs. That would be a good time for AMD to transition to a new CPU and RAM platform, we think.
Final Thoughts
If we take a look at the bigger picture, what we see for 2005 is that things will remain largely static with a few notable exceptions. Socket A is going to disappear, which comes as little surprise. Socket 754 becomes the new value platform and socket 939 fills in as the mainstream and performance platform - and even adds a couple of value options later in the year. Meanwhile, socket 940 will continue as the platform for workstations and servers. The notable exceptions are that we'll see the introduction of dual core processors, and we'll also see a shift to the 90 nm SOI process for AMD. The maximum clock speed of any single chip planned for 2005 appears to be 2.8 GHz at present, although there is a slight chance that we could see a 3.0 GHz chip at the end of the year. What this means is that those of you that went out and splurged on an FX-55 are going to be very close to the maximum performance available for games for the next year.
Of course, things may always change as the months roll by. Intel is certainly not sitting idle, watching AMD increase market share and performance. We'll take a look at the Intel side of things in the near future, and if Intel can execute properly on several key items, we might actually see AMD forced to accelerate the launch of faster processors. With initial overclocking of the FX-55 on 130 nm SOI with strained silicon reaching roughly 3.0 GHz, AMD may actually have more of a performance cushion than ever before. Fierce competition between AMD and Intel is almost certain to muck with these "best laid plans."
31 Comments
timw - Saturday, January 15, 2005 - link
And also, making those copies of the data will take up some time and memory bandwidth as well. If you want your AI thread for example to run as a separate process and it needs access to certain structures in order to do its work, rather than sending a pointer or reference to those functions, you'll need to send a complete copy. Besides using extra memory as I mentioned, this is also going to necessarily be slower. If you have multiple processors it will be worth it no doubt, but it's easy to see why they wouldn't code their games this way right now given how few multiprocessor setups are out there.
timw - Saturday, January 15, 2005 - link
I am not a game programmer yet either - actually going to school to become one - but as other people have mentioned, the problem with multiple threads, and the reason why 2 processors are not twice as fast as 1, is that much of the data used for one part of the program will need to be accessed by another part.
This is easy enough to implement in C++ with critical sections that permit only one thread to access the data at a time. However, that also means that if multiple threads are trying to access it at the same time, the rest will have to wait while one modifies the data.
So in order to allow multiple threads to work at the same time, each will need to store its own local copy of the data to minimize the amount of time it controls the critical section. In other words, multithreaded games will need to store the same set of data multiple times, increasing the memory footprint. And if you code it that way, all systems will pay the price whether the threads are running on one CPU or several.
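A minimal sketch of the copy-then-work idea described above, written in Python rather than C++ for brevity. The `world` structure, `ai_plan_moves`, and `move_player` are made-up names for illustration; `threading.Lock` stands in for the critical section. It shows the trade: the lock is held only while copying, at the cost of a second copy of the data in memory.

```python
import copy
import threading

# Hypothetical shared game state; the names are illustrative only.
world_lock = threading.Lock()
world = {"player_pos": [0, 0], "enemies": [[5, 5], [9, 2]]}


def ai_plan_moves(snapshot):
    """'AI' work done on a private copy, with no lock held."""
    px, py = snapshot["player_pos"]
    # Each enemy steps one unit toward the player on each axis.
    return [[(px > ex) - (px < ex), (py > ey) - (py < ey)]
            for ex, ey in snapshot["enemies"]]


def ai_thread(results):
    # Hold the critical section only long enough to copy the data.
    with world_lock:
        snapshot = copy.deepcopy(world)
    results.append(ai_plan_moves(snapshot))


def move_player(dx, dy):
    # Writers take the same lock, so a copy never sees a half-updated position.
    with world_lock:
        world["player_pos"][0] += dx
        world["player_pos"][1] += dy


results = []
worker = threading.Thread(target=ai_thread, args=(results,))
worker.start()
move_player(1, 0)   # may run before or after the snapshot is taken
worker.join()
```

The snapshot may be slightly stale by the time the AI finishes with it, which is part of the price of not holding the lock for the whole computation.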
WhoBeDaPlaya - Friday, January 14, 2005 - link
Nice comments #19. Programming for multiple threads is kinda different, and IMHO, a lot of effort is spent just synchronizing the threads. Then there are the overheads of variable/memory protection and the really fun stuff like starvation and deadlocks. Of course, my favorite is probably still the good ol' fork bomb :P
JarredWalton - Friday, January 7, 2005 - link
#27 - 939 will eventually support dual-core processors, but it will definitely appear first for socket 940, so you're in luck there. Of course, the big question that remains is clock speed. Supposing that x52 is the dual-core variant (rumors also have that being a 2.6 GHz Opteron single-core, essentially the Opteron equivalent of the FX-55), if it "only" runs at 1.8 or 2.0 GHz initially, it won't outperform an FX-51 in most applications - at least not initially. (See conversation about SMP programming that took place in this thread.)
However, let me make it clear that I have not seen any material stating the initial clock speeds of dual-core chips - i.e. I haven't read something that's under NDA - so the clock speed is just a stab in the dark. If AMD can launch dual-core at 2.4 GHz, on the other hand (and not charge an arm and a leg for it), I think a lot of people would snatch it up.
phaxmohdem - Friday, January 7, 2005 - link
#16 I share the same sentiment. I am running a socket 940 FX-51 and can't see shelling out the dough for a marginal upgrade to the FX-53. However... us crazy Socket 940'ers should be in the highest levels of heaven once dual core Opterons launch. You see, our boards will be compatible with the first dual core Opteron CPUs to hit the streets (with perhaps some BIOS updating). (I'm not certain if 939 will be initially supported or not.) I'm hoping to drop a dual core Opteron x52 into my boxx once it comes out.
*Drooooling already.
Jii - Tuesday, January 4, 2005 - link
Fascinating roadmap. Luckily I looked at it. I wasn't aware of Socket A's demise this year until now. I must have been out of the loop - badly. Time to look at the MoBo features with a magnifying glass again...
Googer - Saturday, January 1, 2005 - link
First Comment of the NEW YEAR!
Pannenkoek - Friday, December 31, 2004 - link
Perhaps it is vague what exactly a module is, but if different modules share quite a lot of data then obviously the design can't be called modular anymore.
Any reasonably big program ought to be designed with a lot of thought. Good programming practices have been well known for decades, but unfortunately are not adhered to too often in practice. And that is perhaps the key difference between us: I know the theory, while you have been more confronted with the raw reality in the field, which isn't as pretty as it could have been. Alas, there is a lot of bad code around, and it appears that the spaghetti paradigm still reigns. ;-)
A good design is never a trade off, as most other aspects of the program benefit from it. The art of programming is to keep things simple, even if that may require a lot of thought. If one makes a mess, then any change to the code will be hard.
You ask me whether I played a bug free game recently? At a friend's place, I exploited the only serious bug I found in HL2 within a minute after I was finished harassing the police by throwing garbage at their heads. :-)
JarredWalton - Friday, December 31, 2004 - link
Modular != Threaded
I think a lot of non-programmers don't realize how much data is shared between different modules. Not necessarily directly, of course, but as passed parameters. You typically end up with the output of one module (procedure/function) being the input to another module, so you have heavy dependencies between the two. Global variables, of course, really help to break modularity. Unfortunately, global variables are often still used as a performance optimization.
It *IS* possible to get things running independently, but it is also a lot more tricky than you seem to think. I'll give a simple example from one of my former jobs.
We were working on a word processor, more or less. It was done in Java (a long time ago - like six or seven years back), and so a lot of routines had to be written by us. We ended up writing the whole document layout engine from scratch because there wasn't much available in Java's primitives. Basically, we had a big, blank window and we wrote all the routines to handle mouse input, text selection, the blinking of the cursor, etc. Can you guess which item used a thread?
It might not be immediately obvious, but at the time we used a separate thread for the blinking of the cursor. That way it could toggle the cursor state between on and off once every 0.5 seconds. Simple enough, right? It worked well in theory too... except that there were conflicting calls to certain functions. What ended up happening is that the paint function could be triggered in numerous ways - sometimes for no apparent reason, just the OS doing its thing - and you had this thread blinking the cursor all the time. Getting the cursor to write to the screen properly (i.e. not think it was "on" when it was really "off") took weeks of work. Yes, WEEKS! And that was a very simple thread, for all intents and purposes.
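To make the kind of conflict described above concrete, here is a rough Python sketch (the original was Java, and this is not the fix that was actually used). The `Cursor` class and its fields are hypothetical. The idea shown is only that the blink thread and the paint path must update "what the logic believes" and "what is on screen" as a single step under one lock, so that neither can observe the other halfway through.

```python
import threading


class Cursor:
    """Hypothetical cursor whose visible state is shared by two threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._on = False      # what the blink logic believes
        self.drawn = False    # what was last written to the "screen"

    def blink(self):
        # Called by the blink thread: flip the state and redraw as one step.
        with self._lock:
            self._on = not self._on
            self.drawn = self._on

    def paint(self):
        # Called whenever the window repaints: redraw the current state.
        with self._lock:
            self.drawn = self._on
            return self.drawn


def blink_loop(cursor, stop, interval=0.5):
    # Event.wait returns False on timeout and True once stop is set.
    while not stop.wait(interval):
        cursor.blink()


cursor = Cursor()
stop = threading.Event()
blinker = threading.Thread(target=blink_loop, args=(cursor, stop, 0.01))
blinker.start()
for _ in range(100):
    cursor.paint()            # repaints arriving at arbitrary times
stop.set()
blinker.join()
```

Without the lock, a repaint could land between the two assignments in `blink`, which is one plausible way for the screen and the state to disagree.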
The real problem is that when you have multiple threads accessing data at the same time, synchronization isn't something you just worry about once per frame. Well, I suppose it *could* be provided you designed very carefully with this in mind, but it is that "paradigm shift" I was talking about earlier.
Now, let's all just assume for the sake of argument that making games threaded isn't especially complex. Fair enough. Let me ask one question: how many bug free games have you played in recent years? Oh, some are close enough to bug free, but the vast majority ship with quite a few major bugs that need to be addressed with a patch. If the game developers can't manage to rid themselves of most critical bugs with single-threaded models, I shudder to think how difficult it will be for them to eliminate bugs in a multi-threaded world. Threads make debugging *much* more difficult - just trust me on this one!
What you failed to clarify is that when you have to "lock" some data to prevent concurrent access, what happens to the second thread that tries to access the data? It sits and waits and does nothing, usually. If it could find something else to do, that would be great, but it would also add another level of complexity and bug hunting.
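The "find something else to do" option can be sketched with a non-blocking lock attempt. This is a minimal Python illustration with made-up names (`shared`, `side_work`, `worker`), not code from any game engine. `Lock.acquire(blocking=False)` returns immediately with `False` when another thread holds the lock, and the extra loop and bookkeeping around it hint at the added complexity mentioned above.

```python
import threading

data_lock = threading.Lock()
shared = []                  # hypothetical shared structure
side_work = {"count": 0}     # stand-in for "something else to do"


def worker(item, lock_was_busy):
    while True:
        # Non-blocking attempt: False means another thread holds the lock.
        if data_lock.acquire(blocking=False):
            try:
                shared.append(item)
            finally:
                data_lock.release()
            return
        # Lock was busy: do unrelated work instead of sitting idle.
        lock_was_busy.set()
        side_work["count"] += 1


busy = threading.Event()
data_lock.acquire()          # main thread holds the lock first
t = threading.Thread(target=worker, args=("ai-update", busy))
t.start()
busy.wait()                  # worker has found the lock busy at least once
data_lock.release()
t.join()
```

With a plain blocking `acquire()`, the worker would simply have waited until the main thread released the lock.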
We have to become more threaded in our programming approach, but it's a lot more than being modular in design. Everything in the design process is a series of trade offs: more optimized code at the cost of more development time, better graphics at the cost of more time, new features at the cost of more bugs and time, more threads at the cost of more bugs and time... It's a delicate balancing act, and to be honest I would just as soon have slower bug-free code than highly optimized but buggy code (provided the slower code isn't more than 20% slower).
Pannenkoek - Thursday, December 30, 2004 - link
Synchronisation is done by saying: "Don't touch this data until I'm finished!" by one process to another, which will have to wait if it wanted to use that data too. That is simple, and it stays simple if the data is almost never used at the same time by both processes and if not much data is shared. Once per frame is not often. It can easily be made much more complex, and you made a good start. ;)
I didn't even mention multithreading CPU heavy tasks within one module. How hard that would be depends much more on the actual implementation, and it might be more work, which is why I didn't mention it.
What you seem to overlook is the fact that game engines are frame based. Every freaking frame they calculate and render everything again. At the start of the frame they handle some user input and internet data, then they do the heavy stuff: physics, AI and rendering. Let's say those are three different modules. Even if one depends on the output data of another, they can be easily threaded if they don't process the same frame at the same time. I believe that is called pipelining in hardware. ;)
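The pipelining idea above can be sketched with one thread per module and a queue between each pair. This is a toy Python illustration under stated assumptions: the three stage names, the `stage` helper, and the frame dictionaries are all made up, and real engine stages would do actual work rather than append a label.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the frame stream


def stage(name, inbox, outbox):
    """Run one module: take a frame, record this module's work, pass it on."""
    while True:
        frame = inbox.get()
        if frame is STOP:
            outbox.put(STOP)
            return
        frame["done"].append(name)
        outbox.put(frame)


# Hypothetical three-module engine: physics -> ai -> render.
q_in, q_phys, q_ai, q_out = (queue.Queue() for _ in range(4))
threads = [
    threading.Thread(target=stage, args=("physics", q_in, q_phys)),
    threading.Thread(target=stage, args=("ai", q_phys, q_ai)),
    threading.Thread(target=stage, args=("render", q_ai, q_out)),
]
for t in threads:
    t.start()

# While render works on frame N, ai can work on N+1 and physics on N+2.
for n in range(5):
    q_in.put({"frame": n, "done": []})
q_in.put(STOP)

finished = []
while True:
    frame = q_out.get()
    if frame is STOP:
        break
    finished.append(frame)
for t in threads:
    t.join()
```

Each frame still passes through every stage in order, so this arrangement can raise throughput but does not shorten the time any single frame takes to get from input to screen.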
In fact, games run already on SMP: rendering happens on the GPU, while the rest happens on the CPU. Graphics can't be multithreaded, and OpenGL at least, being a stateful API, is NOT thread safe! However, advantage is almost automatically taken of this parallelism. I think Carmack tried to run the graphics part on a separate thread, not the whole game on an actual dual processor system. I could be wrong though, but it would explain why it didn't make any difference. ;)
The reason why no one bothers with SMP for desktop programs is because desktop PCs have all been single processor and single core. Also, the reason why SMP doesn't improve performance that dramatically for most of the programs is because most programs are IO-bound. Games can be video-bound, and dual core processors wouldn't help much either in such a case.
"The algorithms that divide up the work between threads need to also be efficient enough that they don't end up wiping out any gains."
Between CPUs you mean? That is what CPU schedulers are for, which reside in the OS. A game doesn't need any algorithm, it simply runs a module as a separate thread, a no-brainer. At the end of the frame the main loop of the game simply waits for all modules to have finished before continuing. Still, I can't understand why you guys think that the overhead might kill the advantage of an additional 2-3 GHz processing power. :p
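As a sketch of that main loop, here is a small Python version using `concurrent.futures.ThreadPoolExecutor`. The module functions and `run_frames` are hypothetical, and the sketch assumes the modules do not need each other's results within the same frame. It shows only the structure (start every module, then wait for all of them); in CPython, CPU-heavy Python code in threads will not actually run faster this way.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical modules; each returns its result for the given frame.
def physics(frame):
    return f"physics:{frame}"


def ai(frame):
    return f"ai:{frame}"


def audio(frame):
    return f"audio:{frame}"


MODULES = [physics, ai, audio]


def run_frames(num_frames):
    log = []
    with ThreadPoolExecutor(max_workers=len(MODULES)) as pool:
        for frame in range(num_frames):
            # Start every module for this frame; the OS schedules the threads.
            futures = [pool.submit(module, frame) for module in MODULES]
            # End of frame: wait for all modules before continuing.
            log.append([f.result() for f in futures])
    return log


frame_log = run_frames(3)
```

The frame cannot finish sooner than its slowest module, so the gain depends on how evenly the work is split.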
My main point is that _if_ a game is designed to be modular, it should be easy to multithread it. If the game design is messy, then obviously it could be a nightmare to implement.