RAM - Memory Technology Overview
by Jarred Walton on September 28, 2004 12:05 AM EST - Posted in Memory
Memory Latencies Explained
One big question that remains is latency. All the bandwidth in the world will not help if you have to wait forever to get the needed data. It is important to note, however, that higher latencies can be compensated for. The Pentium 4, for example, has improved buffering, sophisticated prefetch logic, and the ability to have many outstanding memory requests. It loves bandwidth, and performance has been helped substantially by increasing the bus speeds, even with higher memory latencies. Graphics chips also tend to be more forgiving of higher latencies. Any design can be modified to work with higher or lower latencies, of course; it is but one facet of the overall goal which needs to be addressed. Still, the question remains: how does memory latency relate to timings and bandwidth?
The simple answer is that it is directly related to the memory timings, but you cannot compare timings directly. The reason for this is that the memory timings are relative to the base clock speed of the RAM - they are the number of memory clock cycles that each operation requires. For DDR memory, this means that the cycle time is calculated using one half of the data transfer speed. PC3200 DDR memory has a 64-bit bus that transfers up to 3200 MB/s. Converting that to a clock speed means converting bytes to bits (multiply by eight) and then dividing by the bus width, which gives the effective clock speed; the base clock speed is half the effective clock speed.
PC3200:
3200 MB/s * 8 bits/byte = 25600 Mb/s
25600 Mb/s / 64 bits = 400 MHz effective clock speed
400 MHz / 2 = 200 MHz base clock speed
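The same arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `base_clock_mhz` and its parameters are our own invention, and it assumes the bandwidth figure is the peak transfer rate in MB/s.

```python
def base_clock_mhz(bandwidth_mb_s, bus_width_bits=64, transfers_per_clock=2):
    """Convert peak bandwidth (MB/s) to the base memory clock (MHz).

    Hypothetical helper for illustration only. The defaults assume a
    64-bit bus and two transfers per clock (double data rate), as with DDR.
    """
    megabits_per_s = bandwidth_mb_s * 8               # bytes -> bits
    effective_mhz = megabits_per_s / bus_width_bits   # effective clock speed
    return effective_mhz / transfers_per_clock        # base clock speed


print(base_clock_mhz(3200))  # PC3200 -> 200.0
```

Passing a different `transfers_per_clock` (4 for quad data rate, for instance) would apply the same conversion to other memory types, under the same assumptions.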
Other memory types may use quad or even octal data rates, but if we convert those into the base clock speed, we can compare latencies. Where timings are listed in clock cycles, latency is listed in nanoseconds (ns). A CL of 2.0 sounds better than a CL of 5.0, but depending on the memory clock, it may actually be closer than we would at first expect. By converting all of the timings into nanoseconds, we can compare performance. We will save detailed comparisons for the next installment, but as an example, suppose we have two memory types - one with a CL of 4.0 and a base clock speed of 333 MHz, and the second with a CL of 2.5 and a base clock speed of 200 MHz.
CL | Clock Speed | Cycle Time | Real Latency |
2.5 | 200 MHz | 5.0 ns | 12.5 ns |
4.0 | 333 MHz | 3.0 ns | 12.0 ns |
In this specific example, we see that even with a CL that's 60% higher, the real latency can actually end up being slightly lower (12.0 ns versus 12.5 ns). This is something that we will examine further in the next article of this series.
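The conversion from clock cycles to nanoseconds can be sketched in Python as follows. The helper names are hypothetical, and the sketch only covers CAS latency; it makes no attempt to model the other timings.

```python
def cycle_time_ns(clock_mhz):
    """Length of one memory clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def real_latency_ns(cas_latency, clock_mhz):
    """CAS latency converted from clock cycles to nanoseconds."""
    return cas_latency * cycle_time_ns(clock_mhz)


# (CL, base clock in MHz) for the two memory types in the example
for cl, mhz in [(2.5, 200.0), (4.0, 333.0)]:
    print(f"CL {cl}: {cycle_time_ns(mhz):.1f} ns cycle, "
          f"{real_latency_ns(cl, mhz):.1f} ns latency")
```

Rounded to one decimal place, the results should agree with the table: the CL 4.0 part at 333 MHz comes out marginally ahead of the CL 2.5 part at 200 MHz.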
22 Comments
666an666 - Thursday, May 14, 2009 - link
Thanks for the details. Unfortunately, most sellers of RAM (and most brand packaging) fail to mention these measurement details. They only show obscure model numbers and "PC-3200" or whatever. They usually only offer the choice of various brands, not various CL values.
letter rip - Saturday, December 25, 2004 - link
This is great reading. When's the next installment?
Herm0 - Wednesday, November 10, 2004 - link
There are two things that should greatly improve a DIMM's performance, in addition to the well-known "2-2-2-6" timings, but they are hard to find in DIMM specs:
- The number of internal banks. When a DIMM uses multiple banks, the DIMM is divided into pieces, each holding its own grid of data and the logic to access it. Going from one bank to another has no penalty: the memory controller has to send the bank address on two physical DIMM pins (so there can't be more than 4 banks in a DIMM) at each access. Having a 2/4 bank DIMM is really like having 2/4 DIMMs: while one bank is waiting for a delay to expire (a CAS latency, a RAS latency, a RAS precharge...), the memory controller can send a command or do reads/writes on another one... Most manufacturers build 2-bank DIMMs (when they publish that information!); few of them make 4-bank DIMMs.
- The width of their rows. It's slow to access the first data of a row (1: wait for tRP, Row Precharge, from the last operation; 2: send the new row address and wait tRCD, RAS to CAS Delay; 3: send the column address and wait tCL, CAS latency, then read the first 64-bit block of data), but it's fast to read from the activated row (send the starting column and wait tCL, then read/write data, 1 or 2 per clock (SDRAM or DDR), of the pre-programmed length and order). In an ideal DIMM having only 1 row, the only penalty would be the tCL one! The larger a row is, the more data can be accessed before dealing with row delays (Precharge, and RAS to CAS). The row size is almost never published, and I don't know how to get the number from the detailed DIMM/DRAM specs...
Like #19, I have also been looking at 1 GB DDR400 DIMM modules; a good one, theoretically, seems to be a Kingston DIMM:
- Timings = 2.5-3-3-7 (shouldn't the last digit be 2.5+3+2 = 7.5 or 8?), most 1 GB DIMMs are 3-3-3-8 or slower.
- Banks = 4, most DIMMs, even high-end ones, are only 2 banks.
- Row size = ??? Unknown...
Am I right, or do I have to re-do Ars Technica lessons ? :-)
Gioron - Thursday, September 30, 2004 - link
In terms of buying 512M of fast memory or 1G of slow memory... here's what a quick look at memory prices turned up (all Corsair sticks and only from one vendor because I'm lazy and didn't want to complicate things):
512M "Value" (CL2.5): $77
512M "XMS" (CL2): $114
512M "Xtra low" (2-2-2-5): $135
1G "Value" kit (CL3, 2x512M): $158
To me, it looks like the "Xtra low" is indeed not a good bang for the buck, with the 1G upgrade only about $20 more. However, the "XMS" 512M might be a good price point if you don't want to go all the way to $158 but have more than $77. Going for insanely low latencies seems to be only worth it if you have plenty of cash to spare and are already at 1G or more. (Or else are optimizing for a single, small application that relies heavily on RAM timings, but I don't think you'll run into that too much in a desktop environment.)
One thing that might be useful in later articles is a brief discussion on the tradeoffs between size and performance in relation to swapping pages to disk. Not sure if that will fit in with the planned article content, however.
JarredWalton - Wednesday, September 29, 2004 - link
??? I didn't think I actually started with a *specific* type of RAM - although I suppose it does apply to SDRAM/DDR, it also applies to most other types of RAM at an abstract level. There are lots of abstractions, like the fact that a memory request actually puts the row address and column address on different pins - it doesn't just "arrive". I didn't want to get into really low-level details, but look more at the overall picture. The article was more about the timings and what each one means, but you have to have a somewhat broader understanding of how RAM is accessed before such detail as CAS and RAS can really be explained in a reasonable manner.
Lynx516 - Wednesday, September 29, 2004 - link
Not much has changed fundamentally with SDRAM since the early days of DDR.
I never actually said a burst was a column, but in fact a continuous set of columns (unless interleaved).
OK, I admit there aren't many books on processor design and latency; however, there are data sheets and articles that describe the basics. Once you have grasped the basics, you can work it out using the data sheets, etc.
Probably a better place to start with this series would have been the memory hierarchy instead of starting with a specific type of RAM.
JarredWalton - Wednesday, September 29, 2004 - link
The idea here is to have an article on Anandtech.com. :) I like Ars Technica as much as the next guy, but there are lots of different ways of describing technology. Sometimes you just have to write a new article covering information available elsewhere, you know? How many textbooks are there on processor design and latency? Well, here's another article discussing memory. Also worth noting is that Ars hasn't updated their memory information since the days of SDRAM and DDR (late 2000), and things certainly have changed since then.
I should clarify the last comment I made: the column width of DDR is not really 32 bytes or 64 bytes, but that seems to be how many memory companies now refer to it in *layman's* terms. This article is much more of a layman's approach. The deep EE stuff on how everything works is more than most people really want to know or understand (for better or for worse). A column can also be regarded as each piece of a burst, which is probably the correct terminology. We'll be looking at various implementations in the next article - hopefully stuff that you haven't read a lot about yet. :)
greendonuts3 - Tuesday, September 28, 2004 - link
Meh. You kind of started in the middle of the topic and worked your way outward/backward/forward. As a general user, I found the wealth of info more confusing than helpful in understanding RAM. Maybe you could focus just on timing issues, which seems to be your intent, and refer the reader to other articles (e.g. the Ars one mentioned above) for the basics?
Thanks.
JarredWalton - Tuesday, September 28, 2004 - link
The comparison with set associativity is not that bad, in my opinion. What you have to remember is that we would then be talking about a direct-mapped cache with a whopping four entries (one per sense amp/active row). I guess I didn't explain it too well, and it's not a perfect match, true.
Regarding burst lengths, each burst is not a column of information, although perhaps it was on older RAM types. For instance, the burst length of DDR can be 4 or 8. Each burst transmits (in the case of single-channel configurations) 64 bits of data, or 8 bytes. The column size is not 8 bytes these days, however - it is either 32 bytes or 64 bytes on DDR. (Dual-channel would effectively double those values.)
ss284 - Tuesday, September 28, 2004 - link
I wouldn't say that the article is that confusing, but there is much truth in the post above ^^^.
-Steve