The SSD Anthology: Understanding SSDs and New Drives from OCZ
by Anand Lal Shimpi on March 18, 2009 12:00 AM EST. Posted in: Storage
The Anatomy of an SSD
Let’s meet Mr. N-channel MOSFET again:
Say Hello
This is the building block of NAND-flash; one transistor is required per cell. A single NAND-flash cell can store either one or two bits of data. If it stores one, it’s called Single Level Cell (SLC) flash; if it stores two, it’s Multi Level Cell (MLC) flash. Both are physically made the same way; in fact nothing in the hardware separates MLC from SLC flash, it’s just a matter of how the data is stored in and read from the cell.
SLC flash (left) vs. MLC flash (right)
Flash is read from and written to in a guess-and-test fashion. You apply a voltage to the cell and check to see how it responds. You keep increasing the voltage until you get a result.
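The guess-and-test read described above can be sketched as a toy model. This is only an illustration, not how any real controller works: the reference voltages, the `read_cell` function and the state numbering are all made-up assumptions.

```python
# Toy model of a "guess-and-test" read. All voltages are hypothetical
# illustrative values, not real NAND parameters.

# Reference voltages to try, lowest first. SLC needs one reference to tell
# two states apart; MLC needs three to tell four states apart.
SLC_REFS = [2.0]
MLC_REFS = [1.0, 2.0, 3.0]


def read_cell(cell_threshold, refs):
    """Step the applied voltage up through `refs` until the cell responds.

    The cell "responds" (conducts) once the applied voltage exceeds its
    threshold voltage. Returns (state, steps): `state` is the index of the
    first reference that made the cell respond, or len(refs) if none did;
    `steps` is how many references had to be tried.
    """
    steps = 0
    for state, v_ref in enumerate(refs):
        steps += 1
        if v_ref > cell_threshold:
            return state, steps
    return len(refs), steps


if __name__ == "__main__":
    # An SLC cell is resolved after a single check either way.
    print(read_cell(0.5, SLC_REFS))
    print(read_cell(2.5, SLC_REFS))
    # An MLC cell in its highest state needs all three checks.
    print(read_cell(3.5, MLC_REFS))
```

In this model an SLC read never needs more than one check, while an MLC read can need up to three, which is the intuition behind the longer MLC latencies.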
Operation | SLC NAND flash | MLC NAND flash
Random Read | 25 µs | 50 µs
Erase | 2 ms per block | 2 ms per block
Programming | 250 µs | 900 µs
With four voltage levels to check, MLC flash takes around 3x to 4x as long to write to as SLC (900 µs vs. 250 µs to program). On the flip side you get twice the capacity at the same cost. Because of this distinction, and the fact that even MLC flash is more than fast enough for an SSD, you’ll only see MLC used for desktop SSDs while SLC is used for enterprise-level server SSDs.
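As a quick check on the figures in the table, the MLC-to-SLC ratios can be computed directly. The dictionary below simply restates the table in microseconds; the names `LATENCY_US` and `mlc_to_slc_ratio` are mine, chosen for illustration.

```python
# Latencies from the table above, in microseconds (2 ms = 2000 µs).
LATENCY_US = {
    "random_read": {"slc": 25, "mlc": 50},
    "erase_block": {"slc": 2000, "mlc": 2000},
    "program": {"slc": 250, "mlc": 900},
}


def mlc_to_slc_ratio(operation):
    """How many times longer the operation takes on MLC than on SLC."""
    timings = LATENCY_US[operation]
    return timings["mlc"] / timings["slc"]


if __name__ == "__main__":
    for operation in LATENCY_US:
        ratio = mlc_to_slc_ratio(operation)
        print(f"{operation}: MLC takes {ratio:.1f}x as long as SLC")
```

Reads come out at 2x, erases at 1x, and programming at 3.6x.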
Cells are strung together in arrays as depicted in the image to the right
So a single cell stores either one or two bits of data, but where do we go from there? Groups of cells are organized into pages, the smallest structure that’s readable/writable in an SSD. Today 4KB pages are standard on SSDs.
Pages are grouped together into blocks; today it’s common to have 128 pages in a block (128 x 4KB = 512KB per block). A block is the smallest structure that can be erased in a NAND-flash device. So while you can read from and write to a page, you can only erase a block (128 pages at a time). This is where many of the SSD’s problems stem from. I’ll repeat this later because it’s one of the most important parts of understanding SSDs.
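The page/block asymmetry can be made concrete with a small model. This is a sketch using the 4KB page and 128-pages-per-block figures from the text; the `Block` class and its method names are hypothetical and do not correspond to any real flash interface.

```python
PAGE_SIZE_KB = 4        # from the text
PAGES_PER_BLOCK = 128   # from the text


class Block:
    """Toy erase block: pages are read and programmed one at a time,
    but can only be erased all together."""

    def __init__(self):
        # None marks an erased (writable) page.
        self.pages = [None] * PAGES_PER_BLOCK

    def read_page(self, index):
        return self.pages[index]

    def program_page(self, index, data):
        # A programmed page cannot be rewritten in place.
        if self.pages[index] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[index] = data

    def erase(self):
        # There is no per-page erase: the whole block goes at once.
        self.pages = [None] * PAGES_PER_BLOCK


if __name__ == "__main__":
    print(PAGE_SIZE_KB * PAGES_PER_BLOCK, "KB per block")
    block = Block()
    block.program_page(0, b"hello")
    try:
        block.program_page(0, b"world")
    except ValueError as error:
        print("rewrite refused:", error)
```

Changing one 4KB page that already holds data means dealing with the full 512KB block it lives in.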
Arrays of cells are grouped into a page, arrays of pages are grouped into blocks
Blocks are then grouped into planes, and you’ll find multiple planes on a single NAND-flash die.
The combining doesn’t stop there; you can usually find either one, two or four die per package. While you’ll see a single NAND-flash IC, there may actually be two or four die in that package. You can also stack multiple ICs on top of each other to minimize board real estate usage.
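To see how the hierarchy multiplies out, here is a small capacity calculation. The page size and pages per block come from the text; the number of blocks per plane and planes per die are not given in the article, so the values below are purely hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NandGeometry:
    page_kb: int = 4               # from the text
    pages_per_block: int = 128     # from the text
    blocks_per_plane: int = 2048   # hypothetical, not from the article
    planes_per_die: int = 2        # hypothetical, not from the article
    dies_per_package: int = 2      # the text says one, two or four

    def block_kb(self):
        return self.page_kb * self.pages_per_block

    def die_mb(self):
        blocks_per_die = self.blocks_per_plane * self.planes_per_die
        return self.block_kb() * blocks_per_die // 1024

    def package_mb(self):
        return self.die_mb() * self.dies_per_package


if __name__ == "__main__":
    geometry = NandGeometry()
    print("block:", geometry.block_kb(), "KB")
    print("die:", geometry.die_mb(), "MB")
    print("package:", geometry.package_mb(), "MB")
```

With these placeholder values a die works out to 2048 MB and a two-die package to 4096 MB; real parts will differ.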
250 Comments
punjabiplaya - Wednesday, March 18, 2009
Great info. I'm looking to get an SSD but was put off by all these setbacks. Why should I put away my HDDs and get something a million times more expensive that stutters?
This article is why I visit AT first.
Hellfire26 - Wednesday, March 18, 2009
Anand, when you filled up the drives to simulate a full drive, did you also write to the extended area that is reserved? If you didn't, wouldn't the Intel SLC drive (as an example) not show as much of a performance drop, versus the MLC drive? As you stated, Intel has reserved more flash memory on the SLC drive, above the stated SSD capacity.
I also agree with GourdFreeMan that the physical block size needs to be reduced. Due to the constant erasing of blocks, the Trim command is going to reduce the life of the drive. Of course, drive makers could increase the size of the cache and delay using the Trim command until the number of blocks to be erased equals the cache available. This would more efficiently rearrange the valid data still present in the blocks that are being erased (fewer writes). Microsoft would have to design the Trim command so it would know how much cache was available on the drive, and drive makers would have to specifically reserve a portion of their cache for use by the Trim command.
I also like Basilisk's comment about increasing the cluster size, although if you increase it too much, you are likely to be wasting space and increasing overhead. Surely, even if MS only doubles the cluster size for NTFS partitions to 8KB, write cycles to SSDs would be reduced. Also, there is the difference between 32-bit and 64-bit operating systems to consider. However, I don't have the knowledge to say whether Microsoft can make these changes without running into serious problems with other aspects of the operating system.
Anand Lal Shimpi - Wednesday, March 18, 2009
I only wrote to the LBAs reported to the OS. So on the 80GB Intel drive that's from 0 - 74.5GB.
I didn't test the X25-E as extensively as the rest of the drives so I didn't look at performance degradation as closely just because I was running out of time and the X25-E is sooo much more expensive. I may do a standalone look at it in the near future.
Take care,
Anand
gss4w - Wednesday, March 18, 2009
Has anyone at Anandtech talked to Microsoft about when the "Trim" command will be supported in Windows 7? Also it would be great if you could include some numbers from Windows 7 beta when you do a follow-up.
One reason I ask is that I searched for "Windows 7 ssd trim" and I saw a presentation from WinHEC that made it sound like support for the trim command would be a requirement for SSD drives to meet the Windows 7 logo requirements. I would think if this were the case then Windows 7 would have support for trim. However, this article made it sound like support for Trim might not be included when Windows 7 is initially released, but would be added later.
ryedizzel - Thursday, March 19, 2009
I think it is obvious that Windows 7 will support TRIM. The bigger question this article points out is whether or not the current SSDs will be upgradeable via firmware, which is more important for consumers wanting to buy one now.
Martimus - Wednesday, March 18, 2009
It took me an hour to read the whole thing, but I really enjoyed it. It reminded me of the time I spent testing circuitry and doing root cause analysis.
alpha754293 - Wednesday, March 18, 2009
I think that it would be interesting if you were able to test the drives for the "desktop/laptop/consumer" front by writing an 8 GiB file using 4 KiB block sizes, etc. for the desktop pattern, and also to test the drive with larger file sizes and larger block sizes for the server/workstation pattern.
You present some very very good arguments and points, and I found your article to be thoroughly researched and well put.
So I do have to commend you on that. You did an excellent job. It is thoroughly enjoyable to read.
I'm currently looking at a 4x 256 GB Samsung MLC on Solaris 10/ZFS for apps/OS (for PXE boot), and this does a lot of the testing; but I would be interested to see how it would handle more server-type workloads.
korbendallas - Wednesday, March 18, 2009
If the implementation of the Trim command is as you described here, it would actually kind of suck.
"The third step was deleting the original 4KB text file. Since our drive now supports TRIM, when this deletion request comes down the drive will actually read the entire block, remove the first LBA and write the new block back to the flash:"
First of all, it would create a new phenomenon called Erase Amplification. This would negatively impact the lifetime of a drive.
Secondly, you now have worse delete performance.
Basically, a 4KB SSD page can be in 3 different states: erased, data, garbage. A page enters the garbage state when it is "overwritten" or the Trim command marks its contents as invalid.
The way I would imagine it working, marking page content as invalid is all the Trim command does.
Instead, the drive will spend idle time finding the 512KB blocks with the most garbage pages. Once such a block is found, all the data pages from that block would be copied to another block, and the block would be erased. Doing it this way maximizes the number of garbage pages being converted to erased.
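The scheme proposed in this comment can be sketched as a toy simulation, using the article's terminology (4KB pages grouped into 512KB erase blocks). This is an illustration of the commenter's idea only, not how any shipping drive is known to behave; the function names are hypothetical and the block is shrunk to 4 pages to keep it readable.

```python
ERASED, DATA, GARBAGE = "erased", "data", "garbage"
PAGES_PER_BLOCK = 4  # shrunk from 128 so the example stays readable


def trim(blocks, block_index, page_index):
    """TRIM only marks a page as invalid; nothing is erased yet."""
    if blocks[block_index][page_index] == DATA:
        blocks[block_index][page_index] = GARBAGE


def collect_one(blocks):
    """Idle-time cleanup: pick the block with the most garbage pages,
    copy its live pages into erased pages of other blocks, then erase it.
    Returns the number of garbage pages reclaimed (0 if nothing was done).
    """
    if not blocks:
        return 0
    victim = max(range(len(blocks)), key=lambda i: blocks[i].count(GARBAGE))
    reclaimed = blocks[victim].count(GARBAGE)
    if reclaimed == 0:
        return 0
    live = blocks[victim].count(DATA)
    # Erased pages outside the victim block that can take the live data.
    free = [
        (b, p)
        for b, block in enumerate(blocks)
        if b != victim
        for p, state in enumerate(block)
        if state == ERASED
    ]
    if len(free) < live:
        return 0  # no room to relocate the live pages
    for b, p in free[:live]:
        blocks[b][p] = DATA
    blocks[victim] = [ERASED] * PAGES_PER_BLOCK
    return reclaimed


if __name__ == "__main__":
    blocks = [[DATA] * PAGES_PER_BLOCK, [ERASED] * PAGES_PER_BLOCK]
    for page in range(3):
        trim(blocks, 0, page)
    print("reclaimed:", collect_one(blocks))
    print(blocks)
```

Deferring the erase until a block is mostly garbage means one erase reclaims many pages and few live pages need copying.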
alpha754293 - Wednesday, March 18, 2009
BTW...you might be able to simulate the drive as well using Cygwin where you go to the drive and run the following:
$ dd if=/dev/random of=testfile bs=1024k count=76288
I'm sure that you can come up with fancier shell scripts and stuff that uses the random number generator for the offsets (and if you really want it to work well, partition it so that when it does it, it takes up the entire initial 74.5 GB partition, and when you're done "dirtying" the data using dd and offset in a random pattern, grow the partition to take up the entire disk again.)
Just as a suggestion for future reference.
I use parts of that to some (varying) degree for when I do my file/disk I/O subsystem tests.
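The "random offsets" idea from this comment could look something like the sketch below. It is an assumption-laden illustration rather than the commenter's actual script: the `dirty_file` function, its parameters and the default sizes are all hypothetical, and it works on an ordinary file rather than a raw partition.

```python
import os
import random
import tempfile


def dirty_file(path, file_size, block_size=4096, writes=1000, seed=None):
    """Overwrite `writes` randomly chosen, block-aligned regions of `path`
    with random data. The file is created if missing and resized to
    `file_size` bytes first. Returns the list of offsets written.
    """
    block_count = file_size // block_size
    if block_count == 0:
        raise ValueError("file_size must be at least one block")
    rng = random.Random(seed)
    # Make sure the file exists before opening it for in-place updates.
    with open(path, "ab"):
        pass
    offsets = []
    with open(path, "r+b") as handle:
        handle.truncate(file_size)
        for _ in range(writes):
            offset = rng.randrange(block_count) * block_size
            handle.seek(offset)
            handle.write(os.urandom(block_size))
            offsets.append(offset)
        # Push the data out of the OS cache toward the drive.
        handle.flush()
        os.fsync(handle.fileno())
    return offsets


if __name__ == "__main__":
    # Small demo on a temporary file; scale file_size up for a real drive.
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "testfile")
        written = dirty_file(target, file_size=1024 * 4096, writes=100, seed=1)
        print(len(written), "random 4 KiB writes,", os.path.getsize(target), "bytes")
```

Passing a `seed` makes the write pattern repeatable, which helps when comparing drives against each other.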
nubie - Wednesday, March 18, 2009
I should think that most "performance" laptops will come with a Vertex drive in the near future.
Finally a performance SSD that comes near mainstream pricing.
Things are looking up, if more manufacturers get their heads out of the sand we should see prices drop as competition finally starts breeding excellence.