The SSD Anthology: Understanding SSDs and New Drives from OCZ
by Anand Lal Shimpi on March 18, 2009 12:00 AM EST
Posted in: Storage
Putting Theory to Practice: Understanding the SSD Performance Degradation Problem
Let’s look at the problem in the real world. You, me, and our best friend have decided to start making SSDs. We buy up some NAND flash and build a controller. The table below summarizes our drive’s characteristics:
| Our Hypothetical SSD | |
| --- | --- |
| Page Size | 4KB |
| Block Size | 5 Pages (20KB) |
| Drive Size | 1 Block (20KB) |
| Read Speed | 2 KB/s |
| Write Speed | 1 KB/s |
Through impressive marketing and your incredibly good looks we sell a drive. Our customer first goes to save a 4KB text file to his brand new SSD. The request comes down to our controller, which finds that all pages are empty, and allocates the first page to this text file.
Our SSD. The yellow boxes are empty pages
The user then goes and saves an 8KB JPEG. The request once again comes down to our controller, which fills the next two pages with the image.
The picture is 8KB and thus occupies two pages, which are thankfully empty
The OS reports that 60% of our drive is now full, which it is. Three of the five pages are occupied with data and the remaining two are empty.
Now let’s say that the user goes back and deletes that original text file. This request never reaches our controller; as far as the controller is concerned, we still have three valid pages and two empty ones.
For our final write, the user wants to save a 12KB JPEG, which requires three 4KB pages to store. The OS knows that the first LBA, the one allocated to the 4KB text file, can be overwritten, so it tells our controller to overwrite that LBA and to store the last 8KB of the image in our last available LBAs.
Now we have a problem once these requests reach our SSD controller. We’ve got three pages’ worth of write requests incoming, but only two pages free. Remember that the OS thinks we have 12KB free, but on the drive only 8KB is actually free; the other 4KB is tied up by an invalid page. We need to erase that page in order to complete the write request.
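To put numbers on that mismatch, here is a quick sketch in Python. It is purely illustrative; the page labels and variable names are made up for this example and don't correspond to anything a real controller exposes.

```python
PAGE_KB = 4

# Physical pages on our one-block drive: the deleted text file, the two
# pages of the 8KB JPEG, and two genuinely empty pages.
pages = ["text.txt", "jpeg_1", "jpeg_2", None, None]

# The OS marked the text file's LBA as free, but the drive was never told
# (this is before TRIM, so deletes never reach the controller).
deleted_by_os = {"text.txt"}

os_free_kb    = sum(PAGE_KB for p in pages if p is None or p in deleted_by_os)
drive_free_kb = sum(PAGE_KB for p in pages if p is None)

print(f"OS thinks {os_free_kb}KB is free")               # 12KB
print(f"Drive has only {drive_free_kb}KB truly empty")   # 8KB
```

The 4KB gap between those two numbers is the invalid page that the controller now has to deal with.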
Uh oh, problem. We don't have enough empty pages.
Remember back to Flash 101: even though we only need to erase a single page, we can’t; pages can’t be erased individually, only whole blocks. We have to erase all of our data just to get rid of that one invalid page, then write it all back again.
To do so we first read the entire block back into memory somewhere; if we’ve got a good controller we’ll simply read it into an on-die cache (steps 1 and 2 below), and if not, hopefully there’s some off-die memory we can use as a scratch pad. With the block in memory, we can modify it, removing the invalid page and replacing it with good data (steps 3 and 4). But so far we’ve only done that in memory; now we need to commit it to flash. Since all of our data is in memory, we can erase the entire block in flash and write the new block back (step 5).
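Here's a rough sketch of that read-modify-write cycle in Python. It is not how any real controller firmware is written, just a way to count the bytes that actually move; all of the names are invented for illustration.

```python
PAGE_KB = 4
PAGES_PER_BLOCK = 5

# State of our one-block drive before the final write: one stale page
# (the deleted text file) plus the two pages of the 8KB JPEG.
block = ["stale_text", "jpeg_1", "jpeg_2", None, None]
incoming = ["big_jpeg_1", "big_jpeg_2", "big_jpeg_3"]   # the 12KB JPEG

# Steps 1-2: read the in-use portion of the block into a cache/scratch buffer.
cache = [page for page in block if page is not None]
kb_read = len(cache) * PAGE_KB                          # 3 pages -> 12KB read

# Steps 3-4: in the cache, drop the invalid page and splice in the new data.
cache = [page for page in cache if page != "stale_text"] + incoming

# Step 5: erase the whole block (pages can't be erased individually),
# then program the rebuilt block back to flash.
block = [None] * PAGES_PER_BLOCK                        # block erase
block = cache                                           # program 5 pages
kb_written = len(block) * PAGE_KB                       # 5 pages -> 20KB written

print(f"Host wrote 12KB; the drive read {kb_read}KB and wrote {kb_written}KB")
```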
Now let’s think about what just happened. As far as the OS is concerned, we needed to write 12KB of data and it got written. Our SSD controller, however, knows what really transpired: in order to write that 12KB of data we first had to read 12KB and then write an entire block, or 20KB.
Our SSD is quite slow: it can only write at 1KB/s and read at 2KB/s. Writing 12KB should have taken 12 seconds, but since we had to read 12KB (6 seconds) and then write 20KB (20 seconds), the whole operation took 26 seconds.
To the end user it would look like our write speed dropped from 1KB/s to roughly 0.46KB/s, since it took us 26 seconds to write 12KB.
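Running the arithmetic with our hypothetical drive's speeds (a trivial sketch, using only the numbers from the example above):

```python
READ_KB_PER_S, WRITE_KB_PER_S = 2, 1   # our hypothetical drive's speeds

kb_requested = 12   # what the OS asked us to write
kb_read      = 12   # the in-use pages we had to read back first
kb_written   = 20   # the full block we had to program

elapsed_s = kb_read / READ_KB_PER_S + kb_written / WRITE_KB_PER_S   # 6 + 20 = 26 seconds
effective_write_kb_per_s = kb_requested / elapsed_s                 # 12 / 26 ≈ 0.46 KB/s

print(f"{elapsed_s:.0f} seconds total, "
      f"effective write speed {effective_write_kb_per_s:.2f}KB/s")
```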
Are things starting to make sense now? This is why the Intel X25-M and other SSDs get slower the more you use them, and it’s also why write speeds drop the most while read speeds stay about the same. When writing to an empty page the SSD can write very quickly, but when writing to a page that already has data in it there’s additional overhead to deal with, and that overhead cuts into write speed.
250 Comments
Jamor - Wednesday, March 18, 2009 - link
The best tech article I've ever read, and I've read a few.
haze4peace - Wednesday, March 18, 2009 - link
Wow, excellent article, with so much useful information presented in an easy-to-understand way. I have just recently been paying attention to SSDs, and thanks to this article I am armed with the information to make the correct choice for my needs. Thanks AnandTech, it's the deep and honest articles like these that keep me coming back for more.
Alseki - Wednesday, March 18, 2009 - link
I just registered simply to say: great article. Really informative and enjoyable to read.
alexsch8 - Wednesday, March 18, 2009 - link
Anand, thank you for this article, very informative.
Looking at the example you give with your self-manufactured SSD drive: if I save a DOC, I use up a page. Based on what you are saying, if I make a change to that DOC, it would then be saved in the next page instead of overwriting the existing page? If that is true, then the file allocation system (FAT or MFT) itself would contribute quite a bit to the 'filling up of pages' phenomenon. Could you elaborate on whether the proposed file system for SSDs addresses this?
Ytterbium - Wednesday, March 18, 2009 - link
Fantastic article. It's a shame that the vendors blacklisted you for telling the truth, and OCZ rock for working so hard to address the issues.
I'll be ordering my Intel SSD soon, and I'll definitely consider the Summit when it comes out for my encoding rig, as sequential writes matter to me there.
mindless1 - Wednesday, March 18, 2009 - link
Great article, but I have to disagree with the significance of the passage suggesting the Indilinx controller makes data loss as bad on those SSDs as on a conventional hard drive. The primary cause of data loss is mechanical or component failure, not power loss. If we want to consider power loss, it's not just the drive that's prone to lose data; the entire system memory suffers far more data loss than that.
Further, a sufficiently sized supercapacitor should keep the drive operating for a period of time beyond when the rest of the system would still be operational; that could be enough for the controller to finish writing all received data to flash (or just use a UPS, that's what they're for, right?).
Second, I can't believe that OCZ only tests designs with HDTach and ATTO. I think it more likely they knew of the problem but didn't expect anyone to find it so quickly, and felt the higher sequential speeds made it more marketable. This makes me feel that manufacturers, and then online sellers, should differentiate their drives with a standardized random read/write score.
What would be really nice is if the Indilinx-based SSDs had an application available, similar to an HDD acoustic-management bit-changing app, that lets the owner set their own preference for IO versus sequential read performance.
gomakeit - Wednesday, March 18, 2009 - link
This is by far the BEST article on SSDs I've ever read! Great job Anand, and yes, I read every single word of it!
MagicPants - Wednesday, March 18, 2009 - link
Don't they ever try using their own devices? One second of latency should slap any user in the face. It should be very easy for a manufacturer to build a system with their new technology, put it in front of people, and see what happens, but apparently they're not doing this. They wait for reviewers to do the work for them and then get upset when the reviewers find a problem.
What the manufacturers should be taking away from this article is:
1) Try your competitor's products
2) Try your own products
3) Try them in real life as opposed to synthetic tests
4) Compare everything you've tried and market the performance that matters
7Enigma - Thursday, March 19, 2009 - link
But that would make sense... and we know marketing rarely does.
paulinus - Wednesday, March 18, 2009 - link
That article is great. Finally someone has done SSD tests right, and said out loud what we, the customers, actually get for those hefty price tags. I'd supposed the only real choices were Intel and the new OCZs; now I know, and big kudos for that.
I just need a bit more $$ for the X25-M; it'll be ideal for heavy workstation use, and the biggest Vertex will replace the WD Black in my aging 6910p :)