What's The Deal With Hybrid Storage?



Data storage is a topic that seems easy to understand at first but gets really complicated once you start looking at the details.

Until about ten years ago, you had very limited options for storing data on a server and for achieving the performance your applications needed.

You could fit as many hard disk drives as your server would hold and build a RAID array at a chosen level (usually 1, 5, 6 or 1+0). The resulting performance could be estimated with a few formulas and often came to a few hundred IOPS (input/output operations per second) of random 4K reads from the array. RAID controllers also provided some cache to improve performance a little, especially under write-heavy loads. This was the most common way to store data for most customers in those days.
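To make "some formulas" concrete, here is a minimal sketch of one common rule of thumb. The write-penalty values are the commonly quoted ones for each RAID level, and the function name and parameters are hypothetical, chosen only for illustration. It ignores controller cache, queue depth and sequential access, so treat the result as a rough estimate.

```python
# Commonly quoted write penalties: back-end disk I/Os needed per front-end write.
WRITE_PENALTY = {"1": 2, "5": 4, "6": 6, "1+0": 2}


def array_iops(drives, iops_per_drive, raid_level, read_fraction):
    """Estimate front-end random IOPS for a RAID array (rule of thumb only)."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw = drives * iops_per_drive            # total back-end IOPS of all drives
    penalty = WRITE_PENALTY[raid_level]
    write_fraction = 1.0 - read_fraction
    # Each read costs one back-end I/O, each write costs `penalty` of them.
    return raw / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    for level in ("1", "5", "6", "1+0"):
        estimate = array_iops(8, 120, level, read_fraction=0.7)
        print(f"RAID {level}: about {estimate:.0f} IOPS")
```

With a pure read workload the estimate is simply the sum of the drives, which is why read-heavy arrays scale almost linearly with drive count.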
If your working data set was not too big and your budget allowed it, you could also use DRAM-based storage solutions, where all data was stored in regular RAM modules with backup power to prevent data loss if a power failure occurred. These solutions were extremely fast but limited in capacity and far more expensive than HDD-based solutions.

If you needed even more disk performance, then all you could do at that time was purchase a Storage Area Network (SAN) device along with the required infrastructure for your servers. A SAN could consist of hundreds of hard disk drives and handle extreme IO loads.


Let’s crunch some numbers:

A consumer SATA HDD can deliver about 70 IOPS, and that’s your starting point. Even an entry-level RAID controller can combine drives of this kind to reach a few hundred IOPS.

An enterprise-grade SAS HDD can deliver about 120 IOPS, and when such drives work behind a RAID controller or in a SAN, performance scales almost linearly with the number of drives. So if you have a sustained load of 2,000 IOPS and want some room to grow, you’ll need about 24 SAS HDDs. That count is driven by performance, not by capacity.
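The arithmetic behind that drive count can be sketched in a few lines. The 40% headroom is an assumption picked here to represent "room to grow"; the per-drive figures are the ones quoted above, and the function name is hypothetical.

```python
import math


def drives_needed(target_iops, iops_per_drive, headroom=0.0):
    """Number of drives needed to sustain target_iops, assuming linear scaling.

    headroom is the extra fraction reserved for growth (0.4 means 40%).
    """
    required = target_iops * (1.0 + headroom)
    return math.ceil(required / iops_per_drive)


if __name__ == "__main__":
    print(drives_needed(2000, 120))                # bare minimum with SAS drives
    print(drives_needed(2000, 120, headroom=0.4))  # with assumed 40% growth room
    print(drives_needed(2000, 70))                 # same load on consumer SATA
```

Without headroom, 2,000 IOPS at 120 IOPS per drive works out to 17 drives; the assumed 40% margin brings it to 24.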

Remember this number, because this is where SSDs came in and changed all the rules.
A few years later, as NAND memory prices continued to fall, you could buy an SSD offering up to 100,000 IOPS for less than a DRAM solution. On top of that, it was a much more reliable solution: flash memory keeps its contents without power, so an SSD doesn’t need backup power to protect stored data from a sudden power failure.

There is another thing we need to talk about, and it is “latency”.
Performance isn’t always about the number of input/output operations you can perform per second. When you write something to the array, the controller can cache your request almost instantly and write it to the drive(s) later, but when you need to read some data, it is another story.

If the data you request is not in the cache, the RAID controller has no choice but to read it from the drive, which can take anywhere from 1-2 ms to over 500 ms (half a second!). Most enterprise SAN solutions, on the other hand, can serve 95% of read requests within 10-15 ms. Finally, you have SSDs, which can read data almost instantly!

A single enterprise-level SSD can serve up to 95% of random read requests in less than 0.5 ms, and we’re talking about tens of thousands of requests per second!

The caveat? SSDs aren’t cheap and have capacity limitations. And that’s where hybrid storage solutions come into the picture.

The principle is simple: you have a 2-tier storage solution, where all ‘hot’ data is stored on fast SSDs, and all other data is stored on regular hard disk drives.
This combination of media types helps you obtain both performance and capacity at an affordable price.
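
To see why keeping hot data on the fast tier pays off, here is a rough sketch of the average read latency as a function of how often a read is served from SSD. The 0.5 ms and 10 ms defaults are taken from the latency figures quoted earlier and are used only as illustrative assumptions; the function name is hypothetical.

```python
def average_read_latency_ms(hit_ratio, ssd_ms=0.5, hdd_ms=10.0):
    """Weighted average read latency for a two-tier store.

    hit_ratio is the fraction of reads served from the SSD tier.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * ssd_ms + (1.0 - hit_ratio) * hdd_ms


if __name__ == "__main__":
    for ratio in (0.0, 0.5, 0.9, 0.99):
        latency = average_read_latency_ms(ratio)
        print(f"{ratio:.0%} of reads from SSD: about {latency:.2f} ms on average")
```

Under these assumptions, serving 90% of reads from SSD brings the average down from 10 ms to roughly 1.45 ms, even though most of the capacity still sits on hard disks.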

Here is an example: let’s assume that you have a hybrid solution with 2 x 240 GB SSDs in the first tier and 20 x 160 GB SAS HDDs as the second tier. When you make an IO request, the data is read almost instantly if it's already stored in the SSD tier. If not, the data is retrieved from the second tier and kept on SSD for some time for future use.
Most hybrid storage solutions have their own algorithms for moving data between the different types of media to achieve the best ratio of performance to price. In fact, some solutions let you obtain the same performance from a hybrid storage solution as you would from an all-flash solution, at a fraction of the price!
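
The read path described above can be sketched as a toy model. Real products use their own, usually proprietary, placement algorithms; this sketch swaps in a plain least-recently-used (LRU) policy purely for illustration, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class HybridStore:
    """Toy two-tier store: a small LRU 'SSD' tier in front of a large 'HDD' tier."""

    def __init__(self, ssd_capacity_blocks):
        self.ssd_capacity = ssd_capacity_blocks
        self.ssd = OrderedDict()  # block id -> data, least recently used first
        self.hdd = {}             # block id -> data, holds every block
        self.hits = 0
        self.misses = 0

    def write(self, block_id, data):
        """Store the block on the HDD tier and refresh any cached SSD copy."""
        self.hdd[block_id] = data
        if block_id in self.ssd:
            self.ssd[block_id] = data

    def read(self, block_id):
        """Serve from SSD when possible, otherwise read from HDD and promote."""
        if block_id in self.ssd:
            self.hits += 1
            self.ssd.move_to_end(block_id)  # mark as most recently used
            return self.ssd[block_id]
        self.misses += 1
        data = self.hdd[block_id]
        self.ssd[block_id] = data           # promote the block to the fast tier
        if len(self.ssd) > self.ssd_capacity:
            self.ssd.popitem(last=False)    # evict the least recently used block
        return data


if __name__ == "__main__":
    store = HybridStore(ssd_capacity_blocks=2)
    for block in range(4):
        store.write(block, f"data-{block}")
    for block in (0, 1, 0, 2, 0):
        store.read(block)
    print(f"hits={store.hits} misses={store.misses}")
```

The more the workload keeps returning to the same small set of blocks, the more reads hit the SSD tier, which is what lets a small amount of flash carry most of the load.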


Author bio:
David Kovacs is passionate about entrepreneurship and triathlon. Presently, he is spending most of his time with his friends trying to kick off an awesome project.
https://www.linkedin.com/in/kovacsd


