The SSD Anthology: Understanding SSDs and New Drives from OCZ
by Anand Lal Shimpi on March 18, 2009 12:00 AM EST. Posted in Storage
Strength in Numbers: What Makes SSDs Fast
Given the way a single NAND-flash IC is organized one thing should come to mind: parallelism.
Fundamentally the flash that’s used in SSDs is cut from the same cloth as the flash that’s used in USB drives. And if you’ve ever used a USB flash drive you know that those things aren’t all that fast. Peak performance to a single NAND-flash IC is going to be somewhere in the 5 - 40MB/s range. You get the faster transfer rates by reading/writing in parallel to multiple die in the same package.
The real performance comes from accessing multiple NAND ICs concurrently. If each device can give you 20MB/s of bandwidth and you’ve got 10 devices you can access at the same time, that’s 200MB/s of bandwidth. While hard drives like reads/writes to be at the same place on the drive, SSDs don’t mind; some are even architected to prefer that data be spread out all over the drive so it can hit as many flash devices as possible in tandem. Most drives these days have 4 - 10 channel controllers.
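To put numbers on the parallelism argument, here is a minimal Python sketch. It assumes ideal scaling (no controller or interface bottleneck) and a simple round-robin striping scheme; the function names and the striping rule are illustrative assumptions, not a description of how any particular controller works.

```python
def ideal_bandwidth_mb_s(per_device_mb_s, devices):
    """Best-case throughput when every device is accessed at the same time."""
    return per_device_mb_s * devices


def channels_touched(page_numbers, channels):
    """Assumed striping rule: page n lives on channel n % channels.

    Returns the set of channels a request would keep busy.
    """
    return {n % channels for n in page_numbers}


# The example from the text: 10 devices at 20MB/s each.
print(ideal_bandwidth_mb_s(20, 10))

# Ten consecutive pages land on ten different channels, so all can work in tandem.
print(len(channels_touched(range(10), 10)))

# Pages 0, 10 and 20 all map to channel 0, so they would be serviced one at a time.
print(len(channels_touched([0, 10, 20], 10)))
```

Under these assumptions, data that is spread across the drive keeps more channels busy, which is why scattered accesses don't hurt an SSD the way they hurt a hard drive.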
The Recap
I told you I’d mention this again because it’s hugely important, so here it is:
A single NAND flash die is subdivided into blocks. The typical case these days is that each block is 512KB in size. Each block is further subdivided into pages, with the typical page size these days being 4KB.
Now you can read and write to individual pages, so long as they are empty. However once a page has been written, it can’t be overwritten; it must be erased first before you can write to it again. And therein lies the problem: the smallest structure you can erase in a NAND flash device today is a block. Once more, you can read/write 4KB at a time, but you can only erase 512KB at a time.
It gets worse. Every time you erase a block, you reduce the lifespan of the flash. Standard MLC NAND flash can only be erased 10,000 times before it goes bad and stops storing data.
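The rules above can be captured in a small toy model. This is only a sketch using the figures quoted in the text (4KB pages, 512KB blocks, 10,000 erase cycles); the class and method names are made up for illustration, and real flash behaves far less tidily at end of life.

```python
PAGE_SIZE_KB = 4
BLOCK_SIZE_KB = 512
PAGES_PER_BLOCK = BLOCK_SIZE_KB // PAGE_SIZE_KB  # 128 pages in every block
MAX_ERASE_CYCLES = 10_000  # figure quoted for standard MLC NAND


class NandBlock:
    """Toy model of one block: page-level read/write, block-level erase."""

    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK  # None means the page is empty
        self.erase_count = 0

    def read(self, page):
        return self.pages[page]

    def write(self, page, data):
        # A page can only be written while it is empty.
        if self.pages[page] is not None:
            raise ValueError("page already written; erase the whole block first")
        self.pages[page] = data

    def erase(self):
        # Erasing wipes all 128 pages at once and uses up one erase cycle.
        if self.erase_count >= MAX_ERASE_CYCLES:
            raise RuntimeError("block is worn out")
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1


block = NandBlock()
block.write(0, b"first version")
try:
    block.write(0, b"second version")  # overwriting in place is not allowed
except ValueError as err:
    print("rejected:", err)

block.erase()                          # the only way to reuse page 0
block.write(0, b"second version")
print(block.erase_count)
```

Note that the erase also throws away the other 127 pages in the block, so any of them still holding valid data would have to be copied elsewhere first.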
Based on what I’ve just told you there are two things you don’t want to do when writing to flash: 1) you don’t want to overwrite data, and 2) you don’t want to erase data. If flash were used as a replacement for DVD-Rs then we wouldn’t have a problem, but it’s being used as a replacement for conventional HDDs. Who thought that would be a good idea?
It turns out that the benefits are more than worth the inconvenience of dealing with these pesky rules; so we work around them.
Most people don’t fill up their drives, so SSD controller makers get around the problem by writing to every page on the drive before ever erasing a single block.
If you go about using all available pages to write to and never erasing anything from the drive, you’ll eventually run out of available pages. I’m sure there’s a fossil fuel analogy somewhere in there. While your drive won’t technically be full (you may have been diligently deleting files along the way and only using a fraction of your drive’s capacity), eventually every single block on your drive will be full of both valid and invalid pages.
In other words, even if you’re using only 60% of your drive, chances are that 100% of your drive will get written to simply by day to day creation/deletion of files.
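A quick simulation makes the point concrete. This is a minimal sketch under simplifying assumptions: every file is exactly one page, the controller always writes to the next never-written page, and nothing is ever erased. The function and variable names are hypothetical.

```python
import random


def simulate(total_pages=1000, live_fraction=0.6, seed=0):
    """Create and delete one-page files until every physical page has been written."""
    rng = random.Random(seed)
    live_limit = int(total_pages * live_fraction)  # user never keeps more than this
    state = ["empty"] * total_pages  # per physical page: empty, valid or invalid
    location = {}                    # file id -> physical page holding it
    next_free = 0                    # next never-written physical page
    next_id = 0

    while next_free < total_pages:
        if len(location) >= live_limit:
            # The user deletes a file. Its page becomes invalid, not empty,
            # because only a whole-block erase can make it writable again.
            victim = rng.choice(list(location))
            state[location.pop(victim)] = "invalid"
        # The new file goes to a fresh page rather than reusing an old one.
        state[next_free] = "valid"
        location[next_id] = next_free
        next_free += 1
        next_id += 1
    return state


state = simulate()
print(state.count("valid"), state.count("invalid"), state.count("empty"))  # 600 400 0
```

The drive never holds more than 60% live data, yet no empty pages remain: every page has been written, and the invalid ones are scattered among the valid ones.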
250 Comments
punjabiplaya - Wednesday, March 18, 2009 - link
Great info. I'm looking to get an SSD but was put off by all these setbacks. Why should I put away my HDDs and get something a million times more expensive that stutters?
This article is why I visit AT first.
Hellfire26 - Wednesday, March 18, 2009 - link
Anand, when you filled up the drives to simulate a full drive, did you also write to the extended area that is reserved? If you didn't, wouldn't the Intel SLC drive (as an example) not show as much of a performance drop, versus the MLC drive? As you stated, Intel has reserved more flash memory on the SLC drive, above the stated SSD capacity.
I also agree with GourdFreeMan, that the physical block size needs to be reduced. Due to the constant erasing of blocks, the Trim command is going to reduce the life of the drive. Of course, drive makers could increase the size of the cache and delay using the Trim command until the number of blocks to be erased equals the cache available. This would more efficiently rearrange the valid data still present in the blocks that are being erased (less writes). Microsoft would have to design the Trim command so it would know how much cache was available on the drive, and drive makers would have to specifically reserve a portion of their cache for use by the Trim command.
I also like Basilisk's comment about increasing the cluster size, although if you increase it too much, you are likely to be wasting space and increasing overhead. Surely, even if MS only doubles the cluster size for NTFS partitions to 8KB, write cycles to SSDs would be reduced. Also, there is the difference between 32bit and 64bit operating systems to consider. However, I don't have the knowledge to say whether Microsoft can make these changes without running into serious problems with other aspects of the operating system.
Anand Lal Shimpi - Wednesday, March 18, 2009 - link
I only wrote to the LBAs reported to the OS. So on the 80GB Intel drive that's from 0 - 74.5GB.
I didn't test the X25-E as extensively as the rest of the drives so I didn't look at performance degradation as closely just because I was running out of time and the X25-E is sooo much more expensive. I may do a standalone look at it in the near future.
Take care,
Anand
gss4w - Wednesday, March 18, 2009 - link
Has anyone at Anandtech talked to Microsoft about when the "Trim" command will be supported in Windows 7? Also it would be great if you could include some numbers from Windows 7 beta when you do a follow-up.
One reason I ask is that I searched for "Windows 7 ssd trim" and I saw a presentation from WinHEC that made it sound like support for the trim command would be a requirement for SSD drives to meet the Windows 7 logo requirements. I would think if this were the case then Windows 7 would have support for trim. However, this article made it sound like support for Trim might not be included when Windows 7 is initially released, but would be added later.
ryedizzel - Thursday, March 19, 2009 - link
I think it is obvious that Windows 7 will support TRIM. The bigger question this article points out is whether or not the current SSDs will be upgradeable via firmware- which is more important for consumers wanting to buy one now.
Martimus - Wednesday, March 18, 2009 - link
It took me an hour to read the whole thing, but I really enjoyed it. It reminded me of the time I spent testing circuitry and doing root cause analysis.
alpha754293 - Wednesday, March 18, 2009 - link
I think that it would be interesting if you were able to test the drives for the "desktop/laptop/consumer" front by writing an 8 GiB file using 4 kiB block sizes, etc. for the desktop pattern, and also to test the drive with larger file sizes and a larger block size for the server/workstation pattern as well.
You present some very very good arguments and points, and I found your article to be thoroughly researched and well put.
So I do have to commend you on that. You did an excellent job. It is thoroughly enjoyable to read.
I'm currently looking at a 4x 256 GB Samsung MLC on Solaris 10/ZFS for apps/OS (for PXE boot), and this does a lot of the testing; but I would be interested to see how it would handle more server-type workloads.
korbendallas - Wednesday, March 18, 2009 - link
If the implementation of the Trim command is as you described here, it would actually kind of suck.
"The third step was deleting the original 4KB text file. Since our drive now supports TRIM, when this deletion request comes down the drive will actually read the entire block, remove the first LBA and write the new block back to the flash:"
First of all, it would create a new phenomenon called Erase Amplification. This would negatively impact the lifetime of a drive.
Secondly, you now have worse delete performance.
Basically, a 4kB SSD page can be in 3 different states: erased, data, garbage. A page enters the garbage state when it is "overwritten" or the Trim command marks its contents as invalid.
The way I would imagine it working, marking page contents as invalid is all the Trim command does.
Instead the drive will spend idle time finding the 512kB blocks with the most garbage pages. Once such a block is found, all the data pages from that block would be copied to another block, and the block would be erased. Doing it this way maximizes the number of garbage pages being converted to erased.
alpha754293 - Wednesday, March 18, 2009 - link
BTW...you might be able to simulate the drive as well using Cygwin where you go to the drive and run the following:
$ dd if=/dev/random of=testfile bs=1024k count=76288
I'm sure that you can come up with fancier shell scripts and stuff that uses the random number generator for the offsets (and if you really want it to work well, partition it so that when it does it, it takes up the entire initial 74.5 GB partition, and when you're done "dirtying" the data using dd and offset in a random pattern, grow the partition to take up the entire disk again.)
Just as a suggestion for future reference.
I use parts of that to some (varying) degree for when I do my file/disk I/O subsystem tests.
nubie - Wednesday, March 18, 2009 - link
I should think that most "performance" laptops will come with a Vertex drive in the near future.
Finally a performance SSD that comes near mainstream pricing.
Things are looking up, if more manufacturers get their heads out of the sand we should see prices drop as competition finally starts breeding excellence.