RAID - Storage Made Smart
Written by Harry Fairhead
Wednesday, 23 November 2011
Storing data is fundamental to programming. We often think of the task as something that just involves hardware, but we can take basic storage devices and use them in conjunction with clever algorithms to make the whole thing work better. Storage can be about software.

The idea of RAID

RAID, “Redundant Arrays of Inexpensive Disks”, is an idea whose time has more than come. When it was invented back in 1987 the “Inexpensive” part of the name was something of a joke. For a desktop machine, the cost of a single hard disk was just affordable; the idea of using a whole set of them was out of the question. Today, of course, you can get a high-capacity, high-performance drive for less than $50, and putting multiple units together to make something much better than a single drive is a very attractive proposition.

So what is RAID? How does it work and how can you make use of it?

The RAID idea is very simple: take multiple drive units and connect them together to make them work as a single “virtual” storage device. You treat a RAID system as if it were a single disk drive or storage volume, even though it might consist of multiple physical drives.

This sounds obvious enough, but what does it get you in addition to more cost, more heat and more noise? RAID enthusiasts often forget that the purpose of the whole idea isn’t obvious and hence don’t bother to explain it.

There are three main reasons for wanting to implement RAID.
It's important to remember that whatever RAID scheme you choose, it is not a substitute for backup. A RAID system may be more reliable but it can still fail. An earthquake could take out the building and your data. Much smaller disasters, such as a cascading power supply failure, can easily destroy all of the disks in an array, so backup is not optional, even with a RAID system.

Six basic types of RAID

The reasons for building RAID systems seem desirable enough, so how do we achieve them? There are a number of types of RAID, differing in how well they satisfy each of the objectives.

RAID 0 - Striping

RAID 0 uses a technique called “striping”, which is also employed in other RAID versions. The basic idea is that when you write a file to the disks it is split up into fixed-size blocks. The first block is written to the first drive in the array, the second to the second and so on until we get back to the first drive again.

You can see that RAID 0 makes the array of drives look like a single virtual drive by spreading the data over all of them. As long as the drives can function independently of one another there can be a performance gain, but this also depends on choosing the correct block or ‘stripe’ size to optimise the performance.

In practice RAID 0 should be avoided at all costs for the simple reason that the failure of any drive in the array results in the complete loss of data, and data recovery is made much more difficult by the way that the data is spread across the drives.
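
The round-robin placement of blocks described for RAID 0 can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each "drive" is an in-memory list of blocks, and the names `locate_block`, `stripe` and `unstripe` are made up for the example rather than taken from any real RAID implementation.

```python
def locate_block(logical_block: int, num_drives: int) -> tuple[int, int]:
    """Map a logical block number to (drive index, block index on that drive)."""
    return logical_block % num_drives, logical_block // num_drives


def stripe(data: bytes, num_drives: int, block_size: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them out round-robin."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), block_size):
        drive, _ = locate_block(offset // block_size, num_drives)
        drives[drive].append(data[offset:offset + block_size])
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Rebuild the data by visiting the drives in the same round-robin order."""
    total_blocks = sum(len(d) for d in drives)
    blocks = []
    for logical in range(total_blocks):
        drive, index = locate_block(logical, len(drives))
        blocks.append(drives[drive][index])
    return b"".join(blocks)


drives = stripe(b"ABCDEFGHIJ", num_drives=3, block_size=2)
print(drives)            # blocks AB, CD, EF, GH, IJ dealt out over three drives
print(unstripe(drives))  # the original ten bytes, reassembled
```

Notice that every drive ends up holding a slice of the same file. Take away any one of the three lists and there are holes throughout the data, which is why a single drive failure is so damaging for RAID 0.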
RAID 1 - Mirroring

RAID 1 is often referred to as mirroring. Additional drives simply back up the original, i.e. they mirror the first drive. In most cases mirroring is used with two drives as this provides the maximum benefit for the smallest cost. It doesn’t increase the storage capacity, i.e. two drives provide the same capacity as one, but it does improve data reliability. If one of the drives fails there is still a copy of all of the data on the working drive. You can replace the failed drive with a fresh drive and the mirror copy will be regenerated.

Mirroring can also increase performance if it is arranged so that the data is read from alternate drives, but this isn’t really the main reason for using mirroring.

RAID 2

RAID 2 sounds a bit like RAID 0 in that the data in a file is split and stored on all the drives in the array, but when you look at it in detail it is really a very different conception of how things should work. In this case the data is striped into single-bit blocks and an error correction code is added. The idea is that the data is divided up into data words, error correction bits are added, and these are stored one bit per drive.

Typically RAID 2 needs lots of drives to make it work. For example, if the data in the file is naturally divided up into bytes and you add two error correction bits then you need 8 plus 2, i.e. 10, drives to store the data. As you can imagine this creates a highly reliable storage system which can withstand the simultaneous failure of multiple drives (exactly how many depends on the number of error correction bits added).

It can also improve performance if suitable hardware is used to run the drives in parallel. In fact you need special hardware to make this scheme work at all, and typically the drives need to spin in sync with one another so that the data can be written in parallel.
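
To make the one-bit-per-drive idea concrete, here is a small Python sketch. The article does not say which error correction code is used, so the choice of a Hamming(7,4) code is an assumption made for illustration: four data bits plus three check bits, stored across seven simulated drives. The function names are hypothetical, and each "drive" is just a list of bits.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits; positions 1, 2 and 4 hold check bits."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: list[int]) -> list[int]:
    """Correct at most one wrong bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    bad = s1 + 2 * s2 + 4 * s4  # 1-based position of the wrong bit, 0 if none
    if bad:
        c[bad - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def write_words(words: list[list[int]]) -> list[list[int]]:
    """Store each encoded word with one bit on each of seven drives."""
    drives: list[list[int]] = [[] for _ in range(7)]
    for word in words:
        for drive, bit in zip(drives, hamming74_encode(word)):
            drive.append(bit)
    return drives


def read_words(drives: list[list[int]]) -> list[list[int]]:
    """Read one bit per drive for each word and decode it."""
    count = len(drives[0])
    return [hamming74_decode([drive[i] for drive in drives]) for i in range(count)]


words = [[1, 0, 1, 1], [0, 1, 1, 0]]
drives = write_words(words)
drives[4] = [bit ^ 1 for bit in drives[4]]  # simulate one drive returning bad bits
print(read_words(drives) == words)
```

This particular code repairs only one bad bit per word, so it tolerates one misbehaving drive out of the seven. A Hamming code that corrects one bit in a full byte needs four check bits, so real drive counts depend heavily on the code chosen.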
In many senses RAID 2 is the technological pinnacle of RAID systems, but it is hardly ever used today despite the falling cost of drives. The reason is most probably that it is overkill: drives are mostly reliable in the sense that they either work or they don’t. A complete drive failure is well handled by simpler RAID systems, mirroring for example, and the on-the-fly error correction provided by RAID 2 generally isn’t needed.
Last Updated ( Sunday, 13 April 2014 )