Athlon64 3000+: 64-bit at Half the Price
by Wesley Fink on December 22, 2003 8:15 AM EST - Posted in
- CPUs
Newcastle, the 512KB-cache version of the Athlon64, is on the AMD roadmap for the first half of 2004.
Imagine our surprise, then, when we stumbled across the 3000+ for sale at several sites this week. The listed specifications were wrong at most sites and were changed several times without ever being made completely right, but there was no mistaking that the Athlon64 3000+ is for sale at just over $200 for the OEM (bare chip) version. That is about half the price of the 3200+, so we couldn't resist getting one in to see what was really being sold and how it performed.
The chip arrived a couple of days ago, and it certainly appears to be Newcastle. The clock speed is exactly the same as the 3200+, at 2.0GHz. The only difference that we can see is that the L2 cache is 512KB instead of the 1MB found on the 3200+. Regardless of the botched specs you may be seeing, the Athlon64 3000+ being advertised at mainstream prices is a Socket 754 part running at 2.0GHz with a 512KB L2 cache.
AMD even added the new 3000+ to their 1,000-unit processor price list on December 15th; you can see the full 12/15/03 price list at AMD. Anand is preparing an in-depth look at Newcastle, but we knew that our readers would enjoy a preview of how the chip performs compared to other processors. With this early Christmas present from AMD, we couldn't help rushing it into an Athlon64 board that we were testing and putting it through its paces. How much difference does that 512KB cache make in performance?
75 Comments
Pumpkinierre - Thursday, December 25, 2003 - link
#65, what you're discussing is the branch predictor in the CPU. This is at the micro level and further down the track than the cache prediction algorithm. Because of the fast nature of CPUs, the branch predictor offers up a number of solutions to a decision yet to be made by the operator or program. Once the decision is made, the CPU throws away (flushes) the non-relevant solutions and then repeats the process with the next step of the program. This way the CPU uses its spare time more efficiently, and it involves black magic (number of pipelines, execution lengths, branch prediction methodology, buffers, etc.). This is also where the K8 has been tuned for gaming (strong FPU), as mentioned in my last post.

What I've been talking about is caches. These have prediction algorithms - a small program, if you will, run by a processor as part of its housekeeping. Whether this is done by the CPU itself or by separate dedicated circuitry I don't know, but it is in the processor somewhere. These algorithms are black magic also. Caches on hard drives and DVD/CD players/burners have gone up and down (512KB to 2MB to 8MB, and now settled back to 2MB) because predictability of the required data at this level is nigh on impossible. Better burn software has made the need for large caches redundant. In the case of HDDs, many say 512KB is all you need, so the decision is more to do with cost/marketing (the bigger the better, and as I look between my legs I understand this philosophy).

So, similar to the CPU's branch predictor, the cache predictor loads up the data/commands that are possible requirements in the future and waits for the CPU to make the final decision. Unlike HDDs etc., at this level it's got something to go on: the program code and, in the case of a batch process, a data/command input file. This file is set in stone and controls the exe program, which may have many decision statements and subroutines. Even the stupidest cache algorithm would load information from this file as soon as it encountered the relevant Open file statement in the main program. For the rest, it's a question of looking at the branch statements, what memory addresses each requires, and their associations - again to do with the algorithm, compilation, and cache/memory architecture: black magic.
This is all fine for batch jobs, and that is what a demo is (go away and have a cup of coffee). But a game is not a batch job - you don't have a cup of coffee in the middle of an FS or FPS without hitting the pause button. So in this instance the cache has nothing to go on - it loads up as much of the main program as it can and waits there for the operator to give it an instruction. Predictability: zero. So, as with caches on HDDs and CD burners, for this low-predictability application the cache size can come down. I suspect algorithms can or will look ahead in the code, possibly in conjunction with the code compilation, to better assess what the CPU will require, but this will be of only small benefit to 3D gaming, and a hindrance if the game hasn't followed the expected methodology in its conception. Caches benefit servers/workstations; they are only present on desktops because these systems are expected to be jacks of all trades.
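To illustrate the predictability point, here's a toy sketch in C: assume a dead-simple "prefetch the next block" policy (purely illustrative - not any real drive's or CPU's algorithm) and compare a fixed, batch-style access stream against random, operator-driven jumps:

```c
#include <stdio.h>
#include <stdlib.h>

#define ACCESSES 100000

/* How often a "prefetch the next block" policy guesses right for a
   given stream of block addresses. */
static double prefetch_hit_rate(const int *stream, int n) {
    int hits = 0;
    for (int i = 1; i < n; i++) {
        /* After touching stream[i-1], the prefetcher fetched the next
           sequential block; a hit means that guess was right. */
        if (stream[i] == stream[i - 1] + 1)
            hits++;
    }
    return (double)hits / (n - 1);
}

int main(void) {
    static int batch[ACCESSES], interactive[ACCESSES];

    /* Batch job / timedemo: the input is fixed, so accesses walk
       straight through the data. */
    for (int i = 0; i < ACCESSES; i++)
        batch[i] = i;

    /* Interactive play: the operator's next move is unknown, modelled
       here as jumps to random blocks. */
    srand(42);
    for (int i = 0; i < ACCESSES; i++)
        interactive[i] = rand() % 4096;

    printf("batch stream hit rate:       %.2f\n",
           prefetch_hit_rate(batch, ACCESSES));
    printf("interactive stream hit rate: %.2f\n",
           prefetch_hit_rate(interactive, ACCESSES));
    return 0;
}
```

The fixed stream prefetches perfectly; the random one almost never does - which is exactly the contrast I'm drawing between a demo and live play.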
In the case of the K8, it is a production/politics problem - so AMD have gone for a niche market, but they've picked the wrong one because they think servers are high profitability. This is erroneous, as the server market requires extensive backup and upgrade paths, which rest on reputation, which in turn requires lots of initial capital outlay to build up goodwill. On top of that, the K8 wasn't designed for that (you don't need powerful FPUs for servers, which require low latency and memory bandwidth - the K8 has the low latency, that's it); it was designed for gaming, pure and simple. So the solution is to get it out there targeted at gamers, and by chopping off the cache they could double capacity. This 512KB A64 is going to sell like hot cakes, but it's going to be hard to get hold of as the server apparatchiks in AMD cling to their model and refuse to divert resources. AMD are in deep turmoil, evident in the lack of clarity on socket types, upgrade paths, and roadmaps (what's this 32-bit Paris Socket 754 A-XP processor? Either stick with K7/Socket A or leave the 64-bit set in). With the Xbox2, rumour has it that the K8 is going to be used, with IBM producing it. The G5 is a dog compared to the K8 in gaming, and Bill knows it. I'm all for it, as it would establish the K8's true credentials, but the problem might be that Bill becomes too interested.
So it's up to us, the interested populace, to back up whoever it is in AMD that is taking on the machine men, and to state unequivocally that what we want is a budget gaming K8 CPU NOW! (Use the Arnie gospel - the broom and the jingle: "No, no, we're not going to take this anymore.")
Merry Xmas
PrinceGaz - Thursday, December 25, 2003 - link
@Pumpkin... - Your argument against game benchmarks is fundamentally flawed; while it may sound plausible to people who know nothing about how a CPU works (which includes yourself, it seems), you only need to read some of the CPU articles here on AT to spot the problem. Basically, what you're saying is that game benchmarks are invalid because the processor has access to the benchmark/demo recording data and can use it to ensure all the data and instructions the processor will need are cached ready to be used, and that the only way to test real game performance is for a human player to interact with it, as then there's no way for the processor to predict exactly what the player will do, or when. Right?
Wrong. The processor can only make predictions based on what it has done at that point in the code the last few times it's reached it. More specifically, the Branch Prediction Unit decides whether to assume a branch is followed or not by checking a 2-bit counter (0 to 3), which is incremented each time the branch is actually taken and decremented when it isn't. By looking at that counter, it can see whether the branch has been taken more often than not recently, and if that's the case it assumes the branch will be taken again. That's the limit of its prediction.
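To make that concrete, here's a minimal sketch in C of the 2-bit saturating counter just described (the textbook scheme; real K7/K8 predictors layer branch history tables and other refinements on top, so treat this as an illustration rather than AMD's actual logic):

```c
#include <stdio.h>

/* One 2-bit saturating counter per branch:
   0-1 predict not taken, 2-3 predict taken. */
typedef struct { unsigned counter; } predictor_t;

static int predict(const predictor_t *p) {
    return p->counter >= 2;            /* 1 = predict taken */
}

static void update(predictor_t *p, int taken) {
    if (taken && p->counter < 3)       /* saturate at 3 */
        p->counter++;
    else if (!taken && p->counter > 0) /* saturate at 0 */
        p->counter--;
}

int main(void) {
    predictor_t p = { 2 };  /* start in the weakly-taken state */
    int correct = 0, total = 0;

    /* A branch taken 7 times then not taken once (a loop exit),
       repeated 100 times: the predictor misses only at the exits. */
    for (int rep = 0; rep < 100; rep++) {
        for (int i = 0; i < 8; i++) {
            int taken = (i != 7);
            if (predict(&p) == taken)
                correct++;
            update(&p, taken);
            total++;
        }
    }
    printf("prediction accuracy: %d/%d\n", correct, total);
    return 0;
}
```

Note that the counter is all the predictor sees - recent outcomes for that one branch - so it behaves identically whether those outcomes are driven by a demo file or by a live player.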
There's no magical examining of demo-recording files to see what is coming up next; all decisions are made on the basis of very recent past events (so there's no long-term memory to it either, if you were about to use that argument). Therefore it makes no difference whether the game input comes from a human player or a file. If you don't believe me, read this:
http://www.anandtech.com/cpu/showdoc.html?i=1815&a...
Your whole argument against using demos/game-recordings is therefore proven totally incorrect, and with it everything else you have said about how large-cache processors perform differently in game demos than when a human interacts directly with the game. Basically, everything you have said on that subject is total utter rubbish. Head-Shot! :)
Reflex - Thursday, December 25, 2003 - link
Um, while most of that is junk science at best, let me point out something just cause I found it a bit funny: IBM's Power5 is going in the Xbox2, not the Athlon64. They announced that last month...

Pumpkinierre - Thursday, December 25, 2003 - link
Sorry, forgot: Merry Xmas

Pumpkinierre - Thursday, December 25, 2003 - link
Yeah, well #60, no one does what I'm suggesting, except for consumers who find, against all 'expert' advice, that a particular game runs quite well on their lowly processor, maybe with the help of a video card upgrade. You can see it in this review, with the half-size-cache A64 coming within a minuscule difference of a 3200+. This cache crippling may have extended to the 16-way associativity, cutting it down to 8-way, supposedly further damaging performance. If all the blarney about cache improving system performance were true, you'd expect a 15-20% loss of performance - after all, 512KB is a lot of transistors and a fair bit of die space. I mean, the guys that bought a full-blown A64 3200+ at over US$400 must be spitting chips. But the fact is you can't make any statement or deduction from this review, as there are too many variables (different mobos, processors, demos, benchmarks, etc.), all requiring intensive analysis, to draw any truth.

The fact is the K8 was built for gaming. Why? A powerful number cruncher (an FPU better than the K7's) and LOW LATENCY - that is what you need for gaming. The K7 was only experimental for the K8, not the other way round as many suggest (the K8 is just a K7 with two extra branch instructions per pipeline? Nanana - BullSht). This processor is tuned for gaming, and someone at the heart of AMD knows this. Unfortunately the apparatchiks have taken over, due to over two years of losses, and so we've got server business models, inverted pyramids, and a mute heart. The server market is conservative and full of apparatchiks who won't take a risk. So even though it's profitable, it's a long haul with reputation build-up etc., and really not the province of a company laden with debt, as AMD is. So it's up to internet hardware sites to chorus and point out this bad turn in direction, in order to harmonize with those inside AMD who know the K8's true destiny. Some of the politics can be seen in all these different sockets when the CPU has barely been released (I've still yet to see one, even in a shop). Unfortunately the hardware sites, perhaps helped by Intel, who strive to achieve what the K8 has (i.e. low latency), seem determined to follow this trend, with the bloated FX51 and occasionally a P4EE dropped into the heavily cache-biased tests to make us go ooh and ah, and go out the back to do what we never admit to, because we know we can't afford it.
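As a rough illustration of the associativity arithmetic above - a sketch assuming the usual set-associative layout with 64-byte lines, as on the K8; the 8-way cut is speculation, not a confirmed spec:

```c
#include <stdio.h>

/* Number of sets in a set-associative cache:
   sets = total size / (line size * ways). */
static unsigned sets(unsigned size_bytes, unsigned line, unsigned ways) {
    return size_bytes / (line * ways);
}

int main(void) {
    const unsigned line = 64;  /* K8 L2 line size in bytes */

    printf("1MB   16-way: %4u sets\n", sets(1024 * 1024, line, 16));
    printf("512KB 16-way: %4u sets\n", sets(512 * 1024, line, 16));
    printf("512KB  8-way: %4u sets\n", sets(512 * 1024, line, 8));
    return 0;
}
```

Halving the size at the same associativity halves the set count; dropping to 8-way at the same size would instead keep the set count and halve the ways each set can hold.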
The problem AMD have is production capacity. They basically have one fab, and they are scared of over-reaching and failing to supply demand. Hence the single, expensive A64 release, the tentative OEM release of the 3000+ (like the top-range Opteron in April), and the limited-edition, hyper-expensive FX51. Present production is geared towards the Opteron; even the A64 is a rebadged Opteron.
The solution: two dies to every 200mm2 of wafer - not on 90nm (too many problems, no money) but NOW, so on 0.13um. This means less than 100mm2 per die. A quick look at the present die shows that the computational units and L1 occupy less than half the die. This is good enough in my view, but if they could squeeze in an extra 128KB as L1 or even L2, it might keep the cache apparatchiks/zombies at bay. To compensate: a dual-bank memory controller with the fastest memory, DDR500, dual- or quad-phase memory - whatever, but try to minimise latency (a careful balance between bandwidth and latency - another story). This is the Newcastle that is required and that should be demanded by the internet sites, preferably released before Prescott, to show up that CPU's obvious problems and shortcomings. With a sale price under US$150, AMD would meet demand, have over 30% of the market, and be in the black by end-2004, with debt on its way down like before Bush and Iraq. The advent of Win64 and Xbox2 (I'm not the only one to have noticed the K8's true calling) would only further boost sales and credibility. As it is, their model is one of production contraction (witness A64 3200+ sales) for supposedly high profit, most probably resulting in slow death or takeover.
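As a rough sanity check on that capacity arithmetic - a sketch that ignores edge loss and yield, and assumes the commonly cited ~193mm2 for the current 1MB ClawHammer die (the sub-100mm2 figure is the target being argued for, not a spec):

```c
#include <stdio.h>

#define PI 3.14159265358979

/* Gross dies per wafer, crudely approximated as an area ratio
   (real counts are lower: edge loss, scribe lines, defects). */
static int gross_dies(double wafer_diameter_mm, double die_mm2) {
    double r = wafer_diameter_mm / 2.0;
    return (int)(PI * r * r / die_mm2);
}

int main(void) {
    printf("200mm wafer, 193mm2 die: ~%d dies\n", gross_dies(200, 193));
    printf("200mm wafer,  97mm2 die: ~%d dies\n", gross_dies(200, 97));
    return 0;
}
```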
So, AT and other sites: totally revamp your testing procedures for the new year - no synthetics, no demos, just real usage with operator anecdotes. Too subjective?! Isn't that what quantum mechanics is telling us! And no CPU or GPU above US$500. You'd double your subscriptions in a year, and it wouldn't just be AMD with a capacity problem. That might turn AMD around, as long as you kept barking at its heels for what the K8 was always meant to be: a cheap, fast, responsive, overclockable gaming CPU.
Reflex - Thursday, December 25, 2003 - link
All I can say is that I used to get paid to do tests like this. Pumpkin is wrong, plain and simple. Show me one modern game that runs better on a Duron than on an Athlon. Show me one modern game that runs better on a Celeron than on a P4. Do this at equivalent clock speeds. I don't care how you do the demos. Bear in mind that there are *plenty* of user-created demos you can run aside from what the game manufacturer gives you to start with, so there is no conspiracy here. All I can say is: prove it. I know you're wrong on a technical level, so the ball is in your court.
Hammerfan: No, it wasn't, but it was a somewhat fun exercise for a little while, till it got repetitive...
HammerFan - Wednesday, December 24, 2003 - link
Was this argument really worth all those lines of text?

Pumpkinierre - Wednesday, December 24, 2003 - link
When you run a program with the same input file, you get a predictable follow-through of code to the CPU, a la von Neumann. Even SMP- and HT-tuned games will be the same, with a predictable follow-through of code. That is why you get repeatable results. Cache prediction algorithms love nothing better than step-by-step follow-through: they can load up the next steps in the program, in conjunction with the input file data or commands, and have them on hand for the CPU. The process I have described is a game demo, and this process is almost the antithesis of what happens in actual operator-driven gaming.

It's true I'm a failed scientist (and gardener!), but if I produced a model of a process, i.e. a demo, and claimed it represented the process without correlating the results with the actual process, i.e. what is felt by gamers - which no site has ever done - I'd truly take up tiddlywinks as my primary occupation. The only use of demos is to compare within the same computer system family, e.g. A-XP/nF2/ATI 9800 Pro, and then change one variable BUT WITHIN THE FAMILY, e.g. XP2500+ Barton to XP3000+ Barton (both 333MHz). Even changes of cache size and FSB within the same series of processors can be deemed out of the family. Only a single variable can be changed at a time, and then the response of the whole system observed. The result from this would define comparatively the power of the system where the demo is integral to that system, BUT NOT THE ACTUAL PLAYING OF THE GAME from which the demo is derived. Most reviews do even 'better', with a kaleidoscope of Intel and AMD CPUs, mobos, DRAM, and other factors all compared in the same chart, with max fps as the winner, when in fact the relevance to gameplay is nothing. No wonder the populace turn to George Bush and Arnie for inspiration.

For 2D and multimedia applications, this sort of testing (Winstone, Photoshop, high-end workstation, 3ds Max 5) is fine, as it represents the ordered command sequences that operators use when running these apps, e.g. a rotation followed by a render in CAD - again the antithesis of gaming, where you might bank left too hard, find yourself in a spin, and kick the rudder back, off the throttle, while unloading the wings IMMEDIATELY to correct.

Secondly, outside of any technical argument, demos are produced by the companies to sell their games - see how it runs on my system. It's only natural that they are likely to be sequence-selected and "optimised" for smoothness, good fps, and visual attraction.
The above has caused terrible confusion, with a meaningless, neurotic MHz / cache size / equivalent-rating / IQ-vs-fps war amongst the internet elite and, worse, the berating of Celerons and Durons as useless (when many know that in operation they play games very well), while poorly selling, expensive, bloated-cache high-end CPUs more relevant to servers than gaming are what the internet sites discuss.
The solution: as in science (except for the 20th century), the awesomely simple - DO THE TEST ON THE GAME ITSELF, played by a competent gamer. Yes, you won't get a repeatable result, but no game is played exactly the same way even when following a set routine (just like surfing - no wave is the same, man; add failed surfer to the list!). By running several passes of the same judiciously chosen game sequences, meaningful results could be derived and systems compared for that game. Groupings of similarly responding games would then help the consumer better match a system to his preferred games and, importantly, his budget. If AT did that, they would have to add a few more servers (Xeons, of course, with 2MB L3) to cope with the subscriptions.
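The multi-pass idea comes down to simple statistics; here is a minimal sketch in C, with made-up fps numbers purely for illustration:

```c
#include <stdio.h>
#include <math.h>

/* Mean and standard deviation of average fps across repeated
   live-gameplay passes: report the spread, not just a peak number. */
int main(void) {
    /* Hypothetical per-pass average fps from five runs of the same
       hand-played sequence. */
    double fps[] = { 61.2, 58.7, 63.5, 59.9, 60.4 };
    int n = sizeof fps / sizeof fps[0];

    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += fps[i];
    double mean = sum / n;

    double var = 0.0;
    for (int i = 0; i < n; i++)
        var += (fps[i] - mean) * (fps[i] - mean);
    var /= (n - 1);  /* sample variance */

    printf("mean fps: %.1f +/- %.1f (std dev over %d passes)\n",
           mean, sqrt(var), n);
    return 0;
}
```

Reporting the run-to-run spread alongside the mean is what would make a non-repeatable, hand-played test meaningful.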
PS: Sorry to those who want me to bury it! Merry Xmas
PrinceGaz - Wednesday, December 24, 2003 - link
Yeah, I think we've won this argument against Pumpkin... At least until the next time there's a CPU article (Prescott?), where he'll no doubt say its large cache cripples gaming performance. He should be on a comedy show :)

Reflex - Wednesday, December 24, 2003 - link
*laff* *hands #55 some pom-poms* Now if you don't mind, I must return to my secret identity.... ;)