The SSD Anthology: Understanding SSDs and New Drives from OCZ
by Anand Lal Shimpi on March 18, 2009 12:00 AM EST - Posted in Storage
Strength in Numbers, What makes SSDs Fast
Given the way a single NAND-flash IC is organized, one thing should come to mind: parallelism.
Fundamentally, the flash that’s used in SSDs is cut from the same cloth as the flash that’s used in USB drives. And if you’ve ever used a USB flash drive you know that those things aren’t all that fast. Peak performance to a single NAND-flash IC is going to be somewhere in the 5 - 40MB/s range. You get the faster transfer rates by reading/writing in parallel to multiple die in the same package.
The real performance comes from accessing multiple NAND ICs concurrently. If each device can give you 20MB/s of bandwidth and you’ve got 10 devices you can access at the same time, that’s 200MB/s of bandwidth. While hard drives like reads/writes to be at the same place on the drive, SSDs don’t mind; some are even architected to prefer that data be spread out all over the drive so it can hit as many flash devices as possible in tandem. Most drives these days have 4 - 10 channel controllers.
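To put numbers on that, here is a minimal Python sketch of the arithmetic above. It assumes ideal scaling (every device transfers at full speed at the same time), and the function names and the round-robin layout are illustrative assumptions of mine, not how any particular controller works.

```python
def aggregate_bandwidth_mb_s(per_device_mb_s, devices):
    """Ideal peak bandwidth when all devices transfer concurrently."""
    return per_device_mb_s * devices


def stripe_pages(num_pages, channels):
    """Assign consecutive logical pages to channels round-robin.

    Hypothetical layout: spreading neighbouring pages over different
    channels lets one large transfer keep every channel busy.
    """
    layout = [[] for _ in range(channels)]
    for page in range(num_pages):
        layout[page % channels].append(page)
    return layout


# The example from the text: 10 devices at 20MB/s each.
print(aggregate_bandwidth_mb_s(20, 10))  # 200

# Eight pages spread over a 4-channel controller.
print(stripe_pages(8, 4))  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Real drives won't hit the ideal figure, but the sketch shows why data that is spread across devices can be moved faster than data that sits on one.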
The Recap
I told you I’d mention this again because it’s hugely important, so here it is:
A single NAND flash die is subdivided into blocks. The typical case these days is that each block is 512KB in size. Each block is further subdivided into pages, with the typical page size these days being 4KB.
Now you can read and write to individual pages, so long as they are empty. However, once a page has been written it can’t be overwritten; it must be erased first before you can write to it again. And therein lies the problem: the smallest structure you can erase in a NAND flash device today is a block. Once more, you can read/write 4KB at a time, but you can only erase 512KB at a time.
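The rules above can be sketched as a toy model in Python. The `Block` class is a hypothetical illustration built only from the 4KB page and 512KB block sizes quoted here; it is not a real flash interface.

```python
PAGE_KB = 4
BLOCK_KB = 512
PAGES_PER_BLOCK = BLOCK_KB // PAGE_KB  # 128 pages in every block


class Block:
    """Toy NAND block: pages are written one at a time, erased all at once."""

    def __init__(self):
        # None marks an erased (empty) page.
        self.pages = [None] * PAGES_PER_BLOCK

    def read(self, index):
        return self.pages[index]

    def write(self, index, data):
        # A written page cannot be overwritten in place.
        if self.pages[index] is not None:
            raise ValueError("page already written; erase the block first")
        self.pages[index] = data

    def erase(self):
        # Erase granularity is the whole block, never a single page.
        self.pages = [None] * PAGES_PER_BLOCK


block = Block()
block.write(0, "first version")
try:
    block.write(0, "second version")
except ValueError as err:
    print(err)  # page already written; erase the block first

block.erase()  # wipes all 128 pages, not just page 0
block.write(0, "second version")
print(PAGES_PER_BLOCK)  # 128
```

Note what the erase costs: to change one 4KB page in place you would have to give up (or first copy elsewhere) the other 127 pages in the block.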
It gets worse. Every time you erase a block, you reduce the lifespan of the flash. Standard MLC NAND flash can only be erased 10,000 times before it goes bad and stops storing data.
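A quick back-of-the-envelope calculation shows why the erase count matters so much. The sketch below uses the 10,000 cycle figure and the page and block sizes from above; the "naive" policy of erasing a block for every 4KB written is a hypothetical worst case I'm assuming for comparison, not something any shipping controller is claimed to do.

```python
ERASE_LIMIT = 10_000  # erase cycles per block, the MLC figure quoted above
PAGE_KB = 4
BLOCK_KB = 512


def kb_written_before_wearout(kb_written_per_erase):
    """Data one block can absorb before it reaches its erase limit."""
    return ERASE_LIMIT * kb_written_per_erase


# Hypothetical worst case: one erase for every single 4KB page written.
naive_kb = kb_written_before_wearout(PAGE_KB)

# Best case: all 128 pages in the block are used before each erase.
batched_kb = kb_written_before_wearout(BLOCK_KB)

print(naive_kb)                # 40000 (KB)
print(batched_kb)              # 5120000 (KB)
print(batched_kb // naive_kb)  # 128
```

Under these assumptions, filling a block completely before erasing it lets the same block take 128 times as much data before wearing out.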
Based on what I’ve just told you there are two things you don’t want to do when writing to flash: 1) you don’t want to overwrite data, and 2) you don’t want to erase data. If flash were used as a replacement for DVD-Rs then we wouldn’t have a problem, but it’s being used as a replacement for conventional HDDs. Who thought that would be a good idea?
It turns out that the benefits are more than worth the inconvenience of dealing with these pesky rules; so we work around them.
Most people don’t fill up their drives, so SSD controller makers get around the problem by writing to every page on the drive before ever erasing a single block.
If you go about using all available pages to write to and never erasing anything from the drive, you’ll eventually run out of available pages. I’m sure there’s a fossil fuel analogy somewhere in there. While your drive won’t technically be full (you may have been diligently deleting files along the way and only using a fraction of your drive’s capacity), eventually every single block on your drive will be full of both valid and invalid pages.
In other words, even if you’re using only 60% of your drive, chances are that 100% of your drive will get written to simply by day-to-day creation/deletion of files.
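Here is a small Python simulation of that effect. It is a sketch under stated assumptions: a hypothetical drive of 1,000 pages, a user who only ever touches 600 logical pages (60%), random overwrites, and a controller that always writes to the next never-used page and never erases. None of these details come from a real controller.

```python
import random


def simulate(total_pages=1000, logical_pages=600, writes=5000, seed=0):
    """Write to fresh pages only; return (valid, invalid, free) page counts."""
    rng = random.Random(seed)
    mapping = {}                    # logical page -> physical page with current data
    state = ["free"] * total_pages  # every physical page starts out erased
    next_free = 0

    for _ in range(writes):
        if next_free == total_pages:
            break  # out of fresh pages; from here on blocks must be erased
        logical = rng.randrange(logical_pages)
        old = mapping.get(logical)
        if old is not None:
            state[old] = "invalid"  # the stale copy stays on the flash
        state[next_free] = "valid"
        mapping[logical] = next_free
        next_free += 1

    return state.count("valid"), state.count("invalid"), state.count("free")


valid, invalid, free = simulate()
print(valid + invalid)  # 1000: every physical page has been written
print(free)             # 0
print(valid <= 600)     # True: the user never holds more than 60% of the drive
```

The user's data never exceeds 60% of the capacity, yet no free page is left: the rest of the drive is occupied by stale, invalid copies that can only be reclaimed by erasing whole blocks.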
250 Comments
zdzichu - Sunday, March 22, 2009
Very nice and thorough article. I only lack more current status of TRIM command support in current operating systems. For example, Linux supports it since last year: http://kernelnewbies.org/Linux_2_6_28#head-a1a9591...
Sinned - Sunday, March 22, 2009
Outstanding article that really helped me understand SSD drives. I wonder how much of an impact the new SATA III standard will have on SSD drives? I believe we are still at the beginning stage for SSD drives and your article shows that much more work needs to be done. My respect for OCZ and how they responded in a positive and productive way should be a model for the rest of the SSD makers. Thank you again for such a concise article.
Respectfully,
Sinned
529th - Sunday, March 22, 2009
The first thing I thought of was Democracy. Don't know why. Maybe it was because a company listened to our common goal of performance. Thank you OCZ for listening, I'm sure it will pay off!!!
araczynski - Saturday, March 21, 2009
very nice read. the 4/512 issue seems a rather stupid design decision, or perhaps more likely a stupid problem to find this 4/512 solution as 'acceptable'.
although a great marketing choice, built in automatic life expectancy reduction.
sounds like the manufacturers want the hard drives to become a disposable medium like styrofoam cups.
perhaps when they narrow the disparity down to 4/16, i might consider buying an ssd. that, or when they beat the 'old school' physical platters in price.
until then, get back to the drawing board and stop crapping out these half arsed 'should be good enough' solutions.
IntelUser2000 - Sunday, March 22, 2009
araczynski: The 4/512 isn't done by accident. It's done to lower prices. The flash technology used in SSDs are meant to replace platter HDDs in the future. There's no way of doing that without cost reductions like these. Even with that the SSDs still cost several times more per storage space.
araczynski - Tuesday, March 24, 2009
i understand that, but i don't remember original hard drives being released and being slower than the floppy drives they were replacing.
this is part of the 'release beta' products mentality and make the consumer pay for further development.
the 5.25" floppy was better than the huge floppy in all respects when it was released. the 3.5" floppy was better than the 5.25" floppy when it was released. the usb flash drives were better than the 3.5" floppies when they were released.
i just hate the way this is being played out at the consumer's expense.
hellcats - Saturday, March 21, 2009
Anand,
What a great article. I usually have to skip forwards when things bog down, but they never did with this long, but very informative article. Your focus on what matters to users is why I always check anandtech first thing every morning.
juraj - Saturday, March 21, 2009
I'm curious what capacity is the OCZ Vertex drive reviewed. Is it an 120 / 250g drive or supposedly slower 30 / 60g one?
Symbolics - Friday, March 20, 2009
The method for generating "used" drives is flawed. For creating a true used drive, the spare blocks must be filled as well. Since this was not done, the results are biased towards the Intel drives with their generous amount of spare blocks that were *not* exhausted when producing the used state. An additional bias is introduced by the reduction of the IOmeter write test to 8 GB only. Perhaps there are enough spare blocks on the Intel drives so that these 8 GB can be written to "fresh" blocks without the need for (time-consuming) erase operations.
Apart from these concerns, I enjoyed reading the article.
unknownError - Saturday, March 21, 2009
I also just created an account to post, very nice article!
Lots of good well thought out information, I'm so tired of synthetic benchmarks glad someone goes through the trouble to bench these things right (and appears to have the education to really understand them). Whats with the grammar police though? geez...