.09 Athlon 64: Value, Speed and Overclocking
by Wesley Fink on October 14, 2004 12:05 AM EST - Posted in CPUs
AMD Q&A
To understand better what was accomplished, we asked AMD some questions regarding the move to .09. We hoped that this would provide a clearer picture of what we should expect with the new die-shrink processors.
Q1: Historically, any move to a smaller manufacturing process has been met with cooler processor operation, higher clock speeds and sometimes increases in feature sets. With the move to 90nm, we have seen that most companies have been having troubles, mostly stemming from heat issues. Is this simply a case of thermal densities growing too quickly or are there other factors that are impacting the situation? For example, despite the move to a smaller process, initial reports are showing AMD's 90nm processors run hotter than AMD's 130nm processors. Would you attribute this to maturity issues with the process (if so, will it get better over time) or...?
A1: For AMD, the power dissipation for an equally performing part has gone down from 130nm to 90nm. AC capacitance and the leakage for the same devices are also lower. Thermal density naturally increases as die size shrinks for high-performance CPUs, but this is expected and is not problematic for AMD 90nm CPUs. Overall, our transition to 90nm is meeting our expectations due primarily to three factors: 1) the power-efficient micro-architecture and ISA extensions of AMD64, 2) our adoption of power-efficient SOI in the previous generation, and 3) the industry-leading level of automation in our fabs, Automated Precision Manufacturing, which allows for incredible levels of accuracy and control of submicron critical dimensions.
Q2: As far as layout goes, could you let us know what sort of changes had to be made to the AMD64 architecture to move down to 90nm?
A2: The original 130nm AMD Athlon 64 processor core was designed with the intention to migrate to 90nm layout rules. Analog circuits (like PLLs) and IO drivers required design modifications for 90nm process migration. Power grids required modifications for electromigration prevention and proper internal voltage distribution.
Q3: With the move to 90nm, Intel introduced a new method of chip layout, by using a mostly computer-optimized layout where functional portions of the chip could be spread over the chip to reduce power/heat density. I don't believe AMD has done anything like this with the move to 90nm (correct me if I'm wrong), but are there any plans to do so in the future? If not, why and what techniques were employed to combat the issue of power density? If so, when?
A3: For our 90nm transition, AMD employed state-of-the-art procedures and tools with success, as our results have shown. While AMD is constantly evaluating new techniques in many areas of CPU design to continue to refine our process, many of the same techniques were employed in our move to 130nm. AMD's CPU implementation flow focuses on the optimization of layout for many purposes, including power/heat dissipation.
Q4: This next question is about lithography. What improvements are there in the lithography tools that AMD uses for 90nm vs. 130nm? I understand that little can be talked about here other than the usual wavelength specs, etc..., but anything you can provide that will help our readers understand exactly what goes into what is normally referred to as a "simple die shrink" would be very helpful.
A4: The entire industry is moving more layers from 248nm Lithography to 193nm Lithography in the transition to 90nm process technology. Patterning margin is always better with the smaller wavelength (note that the lines and spaces in 90nm technologies are sometimes smaller than the wavelength of light used), but several things need to be considered when deciding which Lithography technique is appropriate for each layer in the flow. The higher manufacturing cost of the smaller wavelength process as well as the design rules of each layer and how each layer is integrated into the overall process flow must be balanced against each other. In addition, a variety of RET (resolution enhancement techniques) can be used to boost the imaging capability of a given Lithography process. For instance, "phase shift reticles" are created by etching small transparent grooves into the glass of a reticle. These grooves introduce differences in the "optical path length" for light rays traveling to the wafer. If the grooves are correctly placed near the actual chrome on top of the reticle, which defines the actual geometries of the circuit, the interaction of light from the grooves, light not from the grooves, and the dark areas of the reticle improves the overall resolution of the optical system. Gate patterning is a particularly important and difficult patterning step. The AMD Opteron and Athlon 64 processors have gate dimensions of about 50nm. Printing such fine structures with 193nm light is sort of like trying to write in an 8-point font with a big fat Marker pen. To "print down" from 193nm light to 50nm gates requires a delicate balancing act of pushing all aspects of the Litho process, the reticle dimensions, the resist dimensions, and the final silicon dimensions. If any one of these aspects of the gate patterning is pushed "too far", the image will collapse and yield or speed will be affected. 
All together, solid Lithography improves our ability to manufacture with high yields and high processor speeds. In the 90nm technology at AMD, we are using a mixture of 248nm and 193nm Lithography (more 193nm than at the 130nm technology generation), with RET techniques employed where appropriate and cost-effective.
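To put the marker-pen analogy in A4 into numbers, the short sketch below divides each exposure wavelength mentioned in the answer by the roughly 50nm gate dimension. It is only an illustration of scale, using the figures quoted above; it is not a resolution model and says nothing about how AMD's process actually arrives at that dimension. The function name is made up for this example.

```python
# Figures quoted in A4, in nanometers.
GATE_NM = 50.0
WAVELENGTHS_NM = (248.0, 193.0)


def wavelength_to_feature_ratio(wavelength_nm: float, feature_nm: float) -> float:
    """Return how many times larger the exposure wavelength is than the feature."""
    return wavelength_nm / feature_nm


if __name__ == "__main__":
    for wl in WAVELENGTHS_NM:
        ratio = wavelength_to_feature_ratio(wl, GATE_NM)
        print(f"{wl:.0f}nm light is {ratio:.2f}x the gate dimension")
```

Even the shorter 193nm wavelength is several times wider than the gate being patterned, which is why the answer leans on resolution enhancement techniques.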
Q5: How quickly will the AMD Athlon 64 processor transition from 130nm to 90nm be (a general timeframe would be good here, e.g. when will 50% of all AMD Athlon 64 processor shipments be 90nm, etc...)?
A5: AMD expects that approximately 50% of total eighth-generation wafer starts will be 90nm by the end of 2004.
Q6: Other than the die size (do you have any 90nm die shots by any chance?), have there been any physical changes such as transistor count with the new 90nm parts? What is the new die size of the 90nm parts?
A6: The die size of the 90nm Mobile AMD Athlon™ 64 processor is 84 square millimeters (mm²), a 42 percent reduction from the previous generation, which was 145mm². The size reduction means 72 percent more chips can be produced per wafer than in the previous generation. AMD will use this capacity increase to better meet the growing demand for its AMD64 products.
The die size for the new 90nm AMD Athlon™ 64 processor for desktops is also 84mm². The die size for the new 90nm AMD Opteron™ processor is expected to be 115mm².
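The percentages in A6 follow directly from the two die areas. Here is a quick check, as a minimal sketch that assumes the number of chips per wafer scales inversely with die area. That is a simplification: it ignores wafer-edge loss, scribe lines and yield, none of which the answer discusses. The function names are made up for this example.

```python
# Die areas quoted in A6, in square millimeters.
OLD_DIE_MM2 = 145.0  # previous-generation (130nm) part
NEW_DIE_MM2 = 84.0   # 90nm part


def area_reduction(old_mm2: float, new_mm2: float) -> float:
    """Fractional reduction in die area going from old to new."""
    return 1.0 - new_mm2 / old_mm2


def extra_dies_per_wafer(old_mm2: float, new_mm2: float) -> float:
    """Fractional gain in die count, assuming count scales as 1/area.

    Simplification: ignores wafer-edge loss, scribe lines and yield.
    """
    return old_mm2 / new_mm2 - 1.0


if __name__ == "__main__":
    print(f"area reduction: {area_reduction(OLD_DIE_MM2, NEW_DIE_MM2):.1%}")
    print(f"more dies per wafer: {extra_dies_per_wafer(OLD_DIE_MM2, NEW_DIE_MM2):.1%}")
```

Under that assumption the results come out at roughly 42 percent and just under 73 percent, in line with the figures AMD quotes.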
Q7: The head of memory testing and reviews at AnandTech, Senior Editor Wesley Fink, has recently encountered some interesting data with regards to headroom of memory on AMD Athlon 64 platforms vs. Intel platforms. In particular, one type of memory is able to reach noticeably higher clock speeds on the AMD Athlon 64 platform than on the Intel platforms. I have hypothesized that this is due to the fact that the AMD Athlon 64 processor's on-die memory controller is much faster than an external memory controller, potentially allowing for higher headroom in memory overclocking. Would you care to comment about the validity of that argument? Taking that assumption one step further, how would things like memory headroom and the performance of the memory controller change with the move to 90nm? Am I correct in assuming that any performance improvements on the memory controller side would only be seen with higher clock speeds enabled by the smaller, faster switching transistors or have there been other optimizations with the move to 90nm?
A7: There are no features within the AMD Athlon 64 processor that would explain this, and the observation is probably just due to the timing margin characteristics of the given device or devices that have been tested. AMD does not recommend overclocking the memory interface. However, AMD does believe that AMD64 architecture, in which the memory controller is integrated into the CPU, does improve the overall system performance due to lower latency for memory access.
Q8: Speaking of memory controllers, is there anything that must be done differently now that a memory controller is a part of the CPU when shrinking the transistor size? Or is it treated just like any part of the CPU?
A8: Just like the CPU.
Q9: How does SOI change things at 90nm, or is the impact similar as it was at 130nm? Are there any other technologies AMD has implemented to reduce leakage current at 90nm as it becomes more and more of a problem?
A9: AMD faced many new issues and challenges with the world's first high-volume introduction of SOI and Low-k at the 130nm technology generation. Much of this learning has transferred well to 90nm, making the transition from 130nm technology to 90nm technology relatively straightforward for AMD and the AMD64 products. Furthermore, our SOI technology gives a better performance:power ratio and thereby addresses one of the industry-wide challenges we face as we scale to 90nm. One of the main improvements in power due to SOI is the reduced capacitance enabled by the presence of the Buried Oxide (BOX) layer. This reduces the parasitic junction capacitance relative to Bulk CMOS - and hence reduces total power. The improvement in 130 vs 90nm due to SOI is comparable.
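The link between capacitance and power in A9 can be illustrated with the standard first-order estimate for CMOS switching power, P = a * C * V^2 * f. This is a textbook approximation, not AMD's model. Every number below is a made-up placeholder chosen only to show the proportionality, and leakage power is ignored.

```python
def dynamic_power_watts(activity: float, capacitance_f: float,
                        voltage_v: float, frequency_hz: float) -> float:
    """First-order CMOS switching power: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


if __name__ == "__main__":
    # Hypothetical placeholder values, not measurements of any real CPU.
    base = dynamic_power_watts(0.2, 100e-9, 1.4, 2.0e9)
    # Same hypothetical chip with 10% less switched capacitance.
    lower_c = dynamic_power_watts(0.2, 90e-9, 1.4, 2.0e9)
    print(f"baseline: {base:.1f} W, reduced capacitance: {lower_c:.1f} W")
    print(f"power saved: {1 - lower_c / base:.0%}")
```

Because switching power is linear in capacitance, any fractional cut in parasitic capacitance shows up as the same fractional cut in this term, with voltage and frequency held constant.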
Q10: With the much smaller die of the 90nm core vs. the 130nm core, routing all of the vias on the package must be even more difficult than it already was in the previous 939 pin chip. What changes had to be made or what had to be done to deal with the added difficulty of packaging?
A10: Going from 130nm product to 90nm product does require the use of more advanced packaging technology and finer design rules, which are being used in the industry for leading-edge products.
Q11: Is there anything else you would like to express to the readers in order to have them better understand how difficult it is to shrink the process size of a CPU?
A11: From the technology and manufacturing perspective, the key to a technology transition for AMD Opteron and AMD Athlon 64 processors is achieving a high-performance, high-yielding process flow in manufacturing. The high-performance need is generally dependent on the transistor speed, while the high-yield need is generally dependent on the yield of the metal interconnect. Solid yield in high-volume manufacturing requires a detailed understanding of the interactions between all steps in the interconnect flow, and even what happens to the wafer while it's waiting for the next manufacturing step. The yield challenges are greater with each technology generation, and a key to AMD's ability to transition smoothly to 90nm technology is our ability to quickly identify and improve manufacturing yield using our unique Automated Precision Manufacturing (APM) capabilities.
89 Comments
Bugler - Thursday, October 14, 2004 - link
Newegg Model#: OCZ4001024ELDCPER2-K
Item#: N82E16820146890
OCZ EL Platinum Revision 2 Dual Channel Kit 184-Pin 1GB(512MBx2) DDR PC-3200 - Retail $281
Araemo - Thursday, October 14, 2004 - link
nevermind...It is only for sale in 1 gig packs of 2x512 right now, different part #:
http://www.newegg.com/app/viewproductdesc.asp?DEPA...
Araemo - Thursday, October 14, 2004 - link
Also.. is that ram available in the retail channel? I wanted to look up the price, and found the part number (I believe) OCZ400512ELPER2
However, this isn't on newegg, or pricewatch.
Araemo - Thursday, October 14, 2004 - link
#23 - Wesley
If you already have overclock results of a p4 from another article, how difficult would it be to include in the graphs? Or were those results using a different enough configuration that it is not an applicable comparison? (In which case, as a reader that loves Anandtech for your thoroughness, I would like to see an applicable comparison.)
All in all, good review. Not as overly wordy as some have been recently(Though I won't name names. ;P).
Bugler - Thursday, October 14, 2004 - link
With the 3500+ showing a 20% overclock and the 3000+ hitting a 45% overclock, it would be great to know how the 3200+ would overclock in this comparison.
Wesley, thank you so much. Once again, another fine job.
PrinceGaz - Thursday, October 14, 2004 - link
Further to my earlier comment, the default core voltage of all the Winchester-core 90nm A64 parts currently available is 1.4V, not 1.5V as indicated in the review. It's important this is corrected on the Overclocking page of the review as it is very relevant to the obtained results.
I now see that you didn't actually measure the temperature under full-load conditions. Other reports suggest that the 90nm parts do run cooler when idle than the equivalent 130nm parts, but are hotter under full-load conditions due to the higher thermal density. They have been measured as using less power under full-load than the 130nm parts, but run hotter because that power is concentrated in a smaller core.
I'd be very interested to know just how hot that 3000+ got under full-load conditions (eg. running Prime95) when you were feeding it 1.6V instead of 1.4V, and had it clocked at the maximum of 2610MHz. If you were using the standard retail HSF, it may have been rather hot :)
----
As for why the 90nm parts run a little faster than the 130nm parts, I found this post on the AMD forum. I don't know if the info is accurate, but it sounds reasonable:
Whether the 90nm process for the 3000+ to 3500+ runs cooler is still up for speculation to a degree. What will eventually be shown is that the TDP for these processors is lower than the current 130nm. (currently it is 89W TDP, the TDP for these three - when the information is released - is 67W).
In addition the 90nm A64 (DH8-D0) has these improvements over the 130nm (DH7-CG):
- improved DRAM page closing policy
- improved memory addressing with graphics cards using main memory (eg. integrated cards) as frame buffer
- memory controller power reductions (DDR receivers go off in default)
- memory power consumption reductions (CKE pins disconnect)
- second write combining buffer
- SAHF and LAHF instructions are now supported in 64bit mode
Wesley Fink - Thursday, October 14, 2004 - link
#22 - I appreciate your suggestion, and we did overclock the Pentium 4 775 in our "Intel 925X Roundup: Creative Engineering 101" at http://www.anandtech.com/mb/showdoc.aspx?i=2162.
The highest stable overclock we could achieve with the P4 on air cooling was 3.92GHz (280x14) on the best overclocking 925X board. Others have achieved higher overclocks with water and phase-change cooling, and higher overclocks will also likely be achieved with those methods on the new 90nm Athlon 64 processors.
We will be looking at Pentium 4 overclocking again in the upcoming launch of some new and improved P4 processors.
thermalpaste - Thursday, October 14, 2004 - link
I am an AMD freak, and I'm happy they launched the Winchester. You should have, however, overclocked the Pentium 4 also, just to compare the scalability of both the CPUs.
I had read an article on somebody overclocking the Pentium 4 to 6GHz. Though this was an unstable overclock, what this indirectly implies is that despite having a 30-odd stage pipeline, Intel may find it difficult to reach speeds in excess of 5GHz using the 0.09u process... I expect a more thorough comparo soon.....cheers!
deathwalker - Thursday, October 14, 2004 - link
All the buzz in this article is about the O/C'ing capabilities of the new .90 die...personally I'm just as impressed or maybe even more so with the performance of the memory used in this testing. Having made that statement it is clear that the O/C'ing capability of the 3000+ version of this Proc. takes us back to the good old days of the Celery 300.
Wesley Fink - Thursday, October 14, 2004 - link
#11 & #16 - The memory brand is identified in the "Performance Test Configuration" on p.4 and the timings are in the Overclocking table on p.5.
The OCZ PC3200 Platinum Rev. 2 and other top performing memory is tested on the Athlon 64 in "Athlon 64 Memory: Rewriting the Rules" at http://www.anandtech.com/memory/showdoc.aspx?i=222... Some memory in that review made it to DDR618 on A64, but DDR580 at 1T was the fastest 1T performance.