The SSD Anthology: Understanding SSDs and New Drives from OCZ
by Anand Lal Shimpi on March 18, 2009 12:00 AM EST
Posted in: Storage
Putting Theory to Practice: Understanding the SSD Performance Degradation Problem
Let’s look at the problem in the real world. You, me and our best friend have decided to start making SSDs. We buy up some NAND-flash and build a controller. The table below summarizes our drive’s characteristics:
| Our Hypothetical SSD | |
| Page Size | 4KB |
| Block Size | 5 Pages (20KB) |
| Drive Size | 1 Block (20KB) |
| Read Speed | 2 KB/s |
| Write Speed | 1 KB/s |
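To make the numbers easy to play with, here's our hypothetical drive expressed as a few Python constants. This is purely an illustrative toy model of our own; the later snippets build on it.

```python
# Our hypothetical SSD from the table above (illustrative values only)
PAGE_SIZE_KB = 4        # smallest unit we can write
PAGES_PER_BLOCK = 5     # we can only ERASE a whole block at a time
BLOCK_SIZE_KB = PAGE_SIZE_KB * PAGES_PER_BLOCK  # 20KB
READ_SPEED_KBPS = 2     # KB/s
WRITE_SPEED_KBPS = 1    # KB/s
```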
Through impressive marketing and your incredibly good looks we sell a drive. Our customer first goes to save a 4KB text file to his brand new SSD. The request comes down to our controller, which finds that all pages are empty, and allocates the first page to this text file.
Our SSD. The yellow boxes are empty pages
The user then goes and saves an 8KB JPEG. The request, once again, comes down to our controller, and fills the next two pages with the image.
The picture is 8KB and thus occupies two pages, which are thankfully empty
The OS reports that 60% of our drive is now full, which it is. Three of the five pages are occupied with data and the remaining two are empty.
Now let’s say that the user goes back and deletes that original text file. This request never reaches our controller; as far as the controller is concerned, we’ve got three valid pages and two empty ones.
For our final write, the user wants to save a 12KB JPEG, which requires three 4KB pages to store. The OS knows that the first LBA, the one allocated to the 4KB text file, can be overwritten; so it tells our controller to overwrite that LBA and to store the last 8KB of the image in our last available LBAs.
Now we have a problem once these requests get to our SSD controller. We’ve got three pages’ worth of write requests incoming, but only two pages free. Remember that the OS thinks we have 12KB free, but on the drive only 8KB is actually free; the other 4KB is occupied by an invalid page. We need to erase that page in order to complete the write request.
Uh oh, problem. We don't have enough empty pages.
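To make the mismatch concrete, here's the drive state at this point in terms of the constants above. The page representation is our own toy model, not how any real controller tracks pages; note that the controller only learned the text file's page was stale when the OS's overwrite request arrived.

```python
# Block contents once the OS asks us to overwrite the text file's LBA:
# the controller now knows that page is stale, but it still occupies flash.
block = [
    {"data": "text.txt", "valid": False},  # deleted by the OS
    {"data": "pic1.jpg", "valid": True},   # first half of the 8KB JPEG
    {"data": "pic2.jpg", "valid": True},   # second half of the 8KB JPEG
    None,                                  # empty page
    None,                                  # empty page
]
# Incoming: three pages (12KB) of new JPEG, but only two pages are empty.
```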
Remember back to Flash 101: even though we only need to erase a single page, we can’t; flash can’t be erased at the page level, only a block at a time. We have to erase the entire block just to get rid of that one invalid page, then write everything back again.
To do so, we first read the entire block back into memory somewhere; if we’ve got a good controller we’ll just read it into an on-die cache (steps 1 and 2 below), and if not, hopefully there’s some off-die memory we can use as a scratch pad. With the block read, we can modify it, removing the invalid page and replacing it with good data (steps 3 and 4). But we’ve only done that in memory; now we need to commit it to flash. Since all of our data is in memory, we can erase the entire block in flash and write the new block (step 5).
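Here's that read-modify-write cycle as a sketch in our toy model (the function and its page representation are ours, purely for illustration):

```python
def read_modify_write(block, new_pages):
    """Overwrite stale pages in a block; returns (kb_read, kb_written)."""
    # Steps 1 and 2: read every written page of the block into a cache.
    cache = [page for page in block if page is not None]
    kb_read = len(cache) * PAGE_SIZE_KB

    # Steps 3 and 4: in the cache, drop invalid pages and splice in new data.
    merged = [page for page in cache if page["valid"]] + new_pages
    assert len(merged) <= PAGES_PER_BLOCK, "more data than the block can hold"

    # Step 5: erase the whole block in flash and write the new contents back.
    block[:] = merged + [None] * (PAGES_PER_BLOCK - len(merged))
    kb_written = BLOCK_SIZE_KB  # the entire 20KB block gets rewritten
    return kb_read, kb_written
```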
Now let’s think about what’s just happened. As far as the OS is concerned, we needed to write 12KB of data and it got written. Our SSD controller knows what really transpired, however. In order to write that 12KB of data we had to first read 12KB and then write an entire block, or 20KB.
Our SSD is quite slow: it can only write at 1KB/s and read at 2KB/s. Writing 12KB should have taken 12 seconds, but since we had to read 12KB (6 seconds at 2KB/s) and then write 20KB (20 seconds at 1KB/s), the whole operation took 26 seconds.
To the end user it would look like our write speed dropped from 1KB/s to 0.46KB/s, since it took us 26 seconds to write 12KB.
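Running the toy model reproduces those numbers:

```python
# The 12KB JPEG arrives as three new pages.
new_pages = [{"data": f"big{i}.jpg", "valid": True} for i in range(3)]
kb_read, kb_written = read_modify_write(block, new_pages)  # (12, 20)

seconds = kb_read / READ_SPEED_KBPS + kb_written / WRITE_SPEED_KBPS
print(seconds)                 # 6 + 20 = 26 seconds for a "12KB" write
print(round(12 / seconds, 2))  # effective write speed: 0.46 KB/s
```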
Are things starting to make sense now? This is why the Intel X25-M and other SSDs get slower the more you use them, and it’s also why write speeds drop the most while read speeds stay about the same. Writing to an empty page is fast, but writing to a page that already has data in it incurs the additional overhead we just walked through, reducing write speeds.
250 Comments
zdzichu - Sunday, March 22, 2009 - link
Very nice and thorough article. The only thing I'm missing is more on the current status of TRIM command support in operating systems. For example, Linux has supported it since last year: http://kernelnewbies.org/Linux_2_6_28#head-a1a9591...
Sinned - Sunday, March 22, 2009 - link
Outstanding article that really helped me understand SSD drives. I wonder how much of an impact the new SATA III standard will have on SSD drives? I believe we are still at the beginning stage for SSD drives, and your article shows that much more work needs to be done. My respect to OCZ; how they responded in a positive and productive way should be a model for the rest of the SSD makers. Thank you again for such a concise article.
Respectfully,
Sinned
529th - Sunday, March 22, 2009 - link
The first thing I thought of was Democracy. Don't know why. Maybe it was because a company listened to our common goal of performance. Thank you OCZ for listening, I'm sure it will pay off!!!
araczynski - Saturday, March 21, 2009 - link
very nice read. the 4/512 issue seems a rather stupid design decision, or perhaps more likely a stupid problem to find this 4/512 solution 'acceptable'. although it's a great marketing choice: built-in automatic life expectancy reduction.
sounds like the manufacturers want the hard drives to become a disposable medium like styrofoam cups.
perhaps when they narrow the disparity down to 4/16, i might consider buying an ssd. that, or when they beat the 'old school' physical platters in price.
until then, get back to the drawing board and stop crapping out these half arsed 'should be good enough' solutions.
IntelUser2000 - Sunday, March 22, 2009 - link
araczynski: The 4/512 isn't done by accident. It's done to lower prices. The flash technology used in SSDs is meant to replace platter HDDs in the future. There's no way of doing that without cost reductions like these. Even with that, SSDs still cost several times more per unit of storage.
araczynski - Tuesday, March 24, 2009 - link
i understand that, but i don't remember original hard drives being released and being slower than the floppy drives they were replacing. this is part of the 'release beta' products mentality, making the consumer pay for further development.
the 5.25" floppy was better than the huge floppy in all respects when it was released. the 3.5" floppy was better than the 5.25" floppy when it was released. the usb flash drives were better than the 3.5" floppies when they were released.
i just hate the way this is being played out at the consumer's expense.
hellcats - Saturday, March 21, 2009 - link
Anand, what a great article. I usually have to skip forward when things bog down, but they never did with this long but very informative article. Your focus on what matters to users is why I always check AnandTech first thing every morning.
juraj - Saturday, March 21, 2009 - link
I'm curious what capacity the reviewed OCZ Vertex drive is. Is it a 120/250GB drive or a supposedly slower 30/60GB one?
Symbolics - Friday, March 20, 2009 - link
The method for generating "used" drives is flawed. To create a truly used drive, the spare blocks must be filled as well. Since this was not done, the results are biased toward the Intel drives, whose generous supply of spare blocks was *not* exhausted when producing the used state. An additional bias is introduced by reducing the IOmeter write test to only 8GB. Perhaps there are enough spare blocks on the Intel drives that these 8GB can be written to "fresh" blocks without the need for (time-consuming) erase operations.
Apart from these concerns, I enjoyed reading the article.
unknownError - Saturday, March 21, 2009 - link
I also just created an account to post. Very nice article! Lots of good, well-thought-out information. I'm so tired of synthetic benchmarks; glad someone goes through the trouble to bench these things right (and appears to have the education to really understand them). What's with the grammar police, though? geez...