The Flash Hierarchy & Data Loss

We've already established that a flash cell stores either one or two bits depending on whether it's an SLC or MLC device. Group a bunch of cells together and you've got a page. A page is the smallest structure you can program (write to) in a NAND flash device. In most MLC NAND flash each page is 4KB. A block consists of a number of pages; in the Intel MLC SSD a block is 128 pages (128 pages x 4KB per page = 512KB per block = 0.5MB). A block is the smallest structure you can erase. So when you write to an SSD you can write 4KB at a time, but when you erase you have to erase 512KB at a time. I'll explore that a bit further in a moment, but first let's look at what happens when you erase data from an SSD.
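
If it helps to see the arithmetic spelled out, here's a quick sketch in Python (the constant names are my own; the sizes are the Intel MLC figures quoted above):

```python
# The flash hierarchy in code. Names are illustrative; sizes are the Intel
# MLC figures from the text above.

PAGE_SIZE_KB = 4                 # a page: the smallest unit you can program (write)
PAGES_PER_BLOCK = 128            # the Intel MLC SSD packs 128 pages into a block
BLOCK_SIZE_KB = PAGE_SIZE_KB * PAGES_PER_BLOCK

print(BLOCK_SIZE_KB)             # 512 -> a block, the smallest unit you can erase
```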

Whenever you write data to flash, the same iterative programming process takes place: create an electric field, electrons tunnel through the oxide and the charge is stored. Erasing the data does the same thing, just in the reverse direction. The problem is that the more times you tunnel through that oxide, the weaker it becomes, until it eventually reaches a point where it can no longer keep the electrons where they're supposed to be.

On MLC flash that point is reached after about 10,000 erase/program cycles; with SLC it's around 100,000, thanks to the simplicity of the SLC design. With a finite lifespan, an SSD has to be very careful about how and when it chooses to erase/program each cell. Note that you can read from a cell as many times as you want; reading doesn't reduce the cell's ability to store data. It's only the erase/program cycle that reduces life. I refer to it as a cycle because an SSD has no concept of simply erasing a block; the only time it erases a block is to make room for new data. If you delete a file in Windows but don't create a new one, the SSD doesn't actually remove the data from flash until it's ready to write new data.
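
To put that cycle count in perspective, here's some back-of-the-envelope math. The 80GB capacity is just an assumed drive size for illustration, and this ignores write amplification and wear leveling inefficiency, both of which I'll get to shortly:

```python
# Rough upper bound on total writes before wear-out, assuming perfect wear
# leveling. The 80GB capacity is an assumption for illustration; 10,000
# cycles is the MLC endurance figure quoted above.

capacity_gb = 80
cycles_per_block = 10_000

raw_write_limit_tb = capacity_gb * cycles_per_block / 1000
print(raw_write_limit_tb, "TB of writes before wear-out")  # 800.0 TB
```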

Now, back to the disparity between how you program and how you erase data on an SSD: you program in pages but you erase in blocks. Say you save an 8KB file and later decide that you want to delete it; it could just be a simple note you wrote for yourself that you no longer need. When you saved the file, it was stored as two 4KB pages in the flash. When you go to delete it, however, the SSD marks those pages as invalid but won't actually erase the block. The SSD waits until a certain percentage of the pages within a block are marked invalid before copying any remaining valid data to new pages and erasing the block. The SSD does this to limit the number of times an individual block is erased, and thus prolong the life of your drive.
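
In rough Python terms, the bookkeeping looks something like this. The 50% threshold and the data structures are purely illustrative, not any vendor's actual firmware logic:

```python
# Toy model of deferred erasing: deletes only mark pages invalid, and a
# block is reclaimed only once enough of its pages are garbage.

PAGES_PER_BLOCK = 128
GC_THRESHOLD = 0.5        # hypothetical: reclaim once half the pages are invalid

class Block:
    def __init__(self):
        self.pages = {}       # page index -> data, for programmed pages
        self.invalid = set()  # pages whose contents have been "deleted"

    def delete(self, idx):
        # Deletion only marks the page invalid; nothing is erased yet.
        self.invalid.add(idx)

    def needs_collection(self):
        return len(self.invalid) >= GC_THRESHOLD * PAGES_PER_BLOCK

def collect(victim, spare):
    # Copy the still-valid pages to a spare block, then erase the victim
    # in one shot -- the only granularity at which NAND can erase.
    for idx, data in victim.pages.items():
        if idx not in victim.invalid:
            spare.pages[idx] = data
    victim.pages.clear()
    victim.invalid.clear()
```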


Not all SSDs handle deletion requests the same way; how and when a controller decides to erase a block with invalid pages determines the write amplification of the device. In the case of a poorly made SSD, simply changing a 16KB file could conceivably cause the controller to read the entire block into main memory, change the four pages, erase the block from the flash and then write the new block with the four changed pages. Using the page/block sizes from the Intel SSD, that 16KB write would actually result in 512KB of writes to the flash: a write amplification factor of 32x.
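
The math is simple enough to sketch out:

```python
# Write amplification for the worst-case read-modify-write described above.
block_kb = 512    # what the naive controller must erase and rewrite
request_kb = 16   # what the user actually changed (four 4KB pages)

write_amplification = block_kb / request_kb
print(write_amplification)  # 32.0 -> 16KB of user data costs 512KB of flash writes
```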

At this point we don't have any data from the other SSD controller makers on how they handle situations like this, but Intel states that traditional SSD controllers suffer from write amplification in the 20 - 40x range, which reduces the longevity of their drives. Intel claims that on typical client workloads its write amplification factor is less than 1.1x; in other words, you're writing less than 10% more data than you actually need to. The write amplification factor by itself doesn't mean much, though; what matters is the longevity of the drive, and there's one more factor that contributes there.

We've already established that with flash there are a finite number of times you can erase/program a block before it loses its ability to store data. SSDs are pretty intelligent and use wear leveling algorithms to spread block usage across the entirety of the drive. Remember that unlike a mechanical disk, it doesn't matter where on an SSD you write; the performance is always the same. SSDs will thus attempt to write data to all blocks of the drive equally. For example, let's say you download a 2MB file to your brand new, never-been-used SSD, and it gets saved to blocks 10, 11, 12 and 13. You realize you downloaded the wrong file, delete it, then go off to download the right one. Rather than write the new file to blocks 10, 11, 12 and 13, the flash controller will write it to blocks 14, 15, 16 and 17. In fact, those first four blocks won't get used again until every other block on the drive has been written to once. So while your MLC SSD may only survive 10,000 cycles per block, it's going to last quite a while thanks to intelligent wear leveling algorithms.
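
A toy version of such a policy might look like the following. The block count is arbitrary, and real wear leveling algorithms are considerably more sophisticated; this is only a sketch of the idea:

```python
# Naive wear leveling: always write to the least-erased block, so wear
# spreads evenly across the whole drive.

erase_counts = [0] * 20   # hypothetical drive with 20 blocks

def pick_block():
    # Location doesn't affect SSD performance, so we're free to pick
    # whichever block has seen the fewest erase/program cycles.
    return min(range(len(erase_counts)), key=erase_counts.__getitem__)

for _ in range(40):       # simulate 40 block writes
    erase_counts[pick_block()] += 1

print(erase_counts)       # [2, 2, ..., 2]: every block worn equally
```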


[Chart: Intel's wear leveling efficiency - all blocks get used nearly the same amount]

[Chart: Bad wear leveling, presumably on existing SSDs - some blocks get used far more than others]

Intel's SSDs carry about a 4% wear leveling inefficiency, meaning that roughly 4% of the blocks on an Intel SSD will be worn at a higher rate than the rest.

Comments

  • Mocib - Thursday, October 9, 2008 - link

    Good stuff, but why isn't anyone talking about ioXtreme, the PCI-E SSD drive from Fusion-IO? It baffles me just how little talk there is about ioXtreme, and the ioDrive solution in general.
  • Shadowmaster625 - Thursday, October 9, 2008 - link

    I think the Fusion-IO is great as a concept. But what we really need is for Intel and/or AMD to start thinking intelligently about SSDs.

    AMD and Intel need to agree on a standard for an integrated SSD controller. And then create a new open standard for a Flash SSD DIMM socket.

    Then I could buy a 32 or 64 GB SSD DIMM and plug it into a socket next to my RAM, and have a SUPER-FAST hard drive. Imagine an SSD DIMM that costs $50 and puts out even better numbers than the Fusion-IO! With economy of scale, it would only cost a few dollars per CPU and a few dollars more for the motherboard. But the performance would shatter the current paradigm.

    The cost of the DIMMs would be low because there would be no expensive controller on the module, like there is now with flash SSDs. And that is how it should be! There is NO need for a controller on a memory module! How we ended up taking this convoluted route baffles me. It is a fatally flawed design that is always going to be bottlenecked by the SATA interface, no matter how fast it is. The SSD MUST have a direct link to the CPU in order to unleash its true performance potential.

    This would increase performance so much that if VIA did this with their Nano CPU, they would have an end product that outperforms even Nehalem in real-world everyday PC usage. If you don't believe me, you need to check out the Fusion-IO. With SSD controller integration, you can have Fusion-IO level performance for dirt cheap.

    If you understand what I am talking about here, and can see that this is truly the way to go with SSDs, then you need to help get the word to AMD and Intel. Whoever does it first is going to make a killing. I'd prefer it to be AMD at this point but it just needs to get done.
  • ProDigit - Tuesday, October 7, 2008 - link

    Concerning the Vista boot time, I think it'd make more sense to express that in seconds rather than MB/s.
    I'd rather have a "Windows boots in 38 seconds" figure than a "Windows boots at 51MB/s" one; the latter is totally useless to me.

    Also, I had hoped for entry-level SSDs, replacements for mini notebooks, in the category of sub-$150 drives.
    On an XP machine, 32GB is more than enough for a mini notebook (8GB has been done before). Mini notebooks cost about $500, and cheap ones below $300. I, like many out there, am not willing to spend $500 on an SSD when the machine costs the same or less.

    I had hoped a slightly lower performance 40GB SSD could be sold for $149, which is the max price for SSDs for mini notebooks.
    For laptops and normal notebooks, drives in the $200-250 range would be enough for 64-80GB. I don't agree on the $300-400 region being good for SSDs. Prices are still waaay too high!
    Of course we're paying a lot of R&D right now; prices should drop a year from now. Notebooks with XP should do with drives starting from 64GB, mini notebooks with drives from 32-40GB, and for desktops 160GB is more than enough. In fact, desktops usually have multiple hard drives, and an SSD is mainly compelling for netbooks thanks to its faster speeds and lower power consumption.
    If you want to benefit from the speeds on a desktop, a 60-80GB drive will more than do, since only Windows, office applications, anti-virus and personal programs like CorelDRAW, Photoshop, or games need to be on the SSD.
    Downloaded things, movies, mp3 files, all those things that take up space might as well be saved on an external/internal second HD.

    Besides, if you can handle the slightly higher game load times on conventional HDs, many older games already run fine (over 30fps) at full detail and 1900x??? resolution.
    Installing older games on an SSD doesn't really benefit anyone, apart from the slightly lower load times.

    With that said, I'd say for the server market the highest speed and the largest disk size matter, and occasionally the lowest power consumption too.
    => highest-priced SSDs: X > $1000, X > 164GB/SSD

    For the desktop, high to highest speed matters, with less focus on disk size and power consumption.
    => normal-priced SSDs: $250 < X < $599, X > 80GB/SSD

    For notebooks, high speed and the lowest power consumption matter, with smaller size as compensation for price.
    => normal-priced SSDs: $175 < X < $399, X > 60GB/SSD

    For the mini notebook, normal speed is fine, with more focus on lower power consumption and the lowest pricing!
    => low-powered small SSDs: $75 < X < $199, X > 32GB/SSD
  • gemsurf - Sunday, October 5, 2008 - link

    Just in case anyone hasn't noticed, these are showing up for sale all over the net in the $625 to $750 range. Using Live Search, I bought one from teckwave on eBay yesterday for $481.80 after the Live Search cashback from Microsoft.

    BTW, does JMicron do anything right? Seems I had AHCI/RAID issues on the 965-series Intel chipsets a few years back with JMicron controllers too!
  • Shadowmaster625 - Wednesday, September 24, 2008 - link

    Obviously Intel has greater resources than you guys. No doubt they threw a large number of bodies into write optimizations.

    But it isn't too hard to figure out what they did. I'm assuming that when the controller is free from reads or writes, that is when it takes the time to actually go and erase a block. The controller probably adds up all the pages that are flagged for erasure, and when it has enough to fill an entire block, it goes and erases and writes that block.

    Assuming 4KB pages and 512KB blocks (~150,000 blocks per 80GB device), what Intel must be doing is just writing each page wherever it can shove it, and erasing one block while writing to all those other blocks. (With that many blocks you could do a lot of writing without ever having to wait for one to erase.) And I would go ahead and have the controller acknowledge the data as written once it's all in the buffer. That would free up the drive as far as Windows is concerned.
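
    Something like this toy model, maybe (all the numbers and names here are just my guesses, not anything Intel has confirmed):

    ```python
    # Guesswork: pages land wherever there's room, a map tracks where each
    # logical page lives, and a block only gets erased in the background
    # once a whole block's worth of stale pages has piled up.

    PAGES_PER_BLOCK = 128

    page_map = {}     # logical page -> (block, slot): where the page really lives
    stale_pages = 0   # overwritten pages waiting for a background erase

    def write_page(logical_page, block, slot):
        global stale_pages
        if logical_page in page_map:
            stale_pages += 1                # the old copy is now garbage
        page_map[logical_page] = (block, slot)
        if stale_pages >= PAGES_PER_BLOCK:
            stale_pages -= PAGES_PER_BLOCK  # enough garbage: erase one block
    ```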

    If I were designing one of these devices, I would definitely demand as much SRAM as possible. I don't buy that line of bull about Intel not using the SRAM for temporary data storage. That makes no sense. You can take steps to ensure the data is protected, but making use of SRAM is key to greater performance in the future. That is what allows you to put off erasing and writing blocks until the drive is idle. Even an SRAM storage time limit of just one second would add a lot of performance, and the risk of data loss would be negligible.

  • Shadowmaster625 - Wednesday, September 24, 2008 - link

    OCZ OCZSSD2-1S32G

    32GB SLC, currently $395

    The 64GB version is more expensive than the Intel right now, but with the money they've already raked in, who really thinks they won't be able to match Intel on performance or price? Of course they will. So how can this possibly be that great of a thing? So it's a few extra GB. Gimme a break; I would rather take the 32GB and simply juggle stuff onto my media drive every now and then. Did you know you can simply copy an entire folder from the Program Files directory over to your other drive and then put it back when you want to use it? I do that with games all the time. It takes all of 2 minutes... Why pay hundreds of dollars extra to avoid having to do that? It's just a background task anyway. That's how 32GB has been enough space for my system drive for a long time now. (Well, that and not using Vista.) At any rate this is hardly a game changer. The other MLC vendors will address the latency issue.
  • cfp - Tuesday, September 16, 2008 - link

    Have you seen any UK/Euro shops with these available (for preorder even?) yet? There are many results on the US Froogle (though none of them seem to have stock or availability dates) but still none on the UK one.
  • Per Hansson - Friday, September 12, 2008 - link

    What about the Mtron SSDs?
    You said at the beginning of the article that they use a different controller than Samsung, but you never benchmarked them?
  • 7Enigma - Friday, September 19, 2008 - link

    I would like to know the answer to this as well...
  • NeoZGeo - Thursday, September 11, 2008 - link

    The whole review is based on Intel vs. the OCZ Core. We all know the OCZ Core had the issues you mentioned. However, what I would like to see is other drives benchmarked against the OCZ Core, or even the Core II. Supposedly the controller has different firmware on the Core II, according to some guys from OCZ, and I find it a bit biased to use a different-spec item to represent all the other drives on the market.
