RAID (redundant array of independent disks; originally redundant array of inexpensive disks) provides a way of storing the same data in different places (thus, redundantly) on multiple hard disks, though not all RAID levels provide redundancy. By placing data on multiple disks, input/output (I/O) operations can overlap in a balanced way, improving performance. Because every added disk is another component that can fail, a group of disks has a lower combined mean time between failures (MTBF) than a single disk. Storing data redundantly compensates for this by letting the array survive a disk failure, which increases fault tolerance.
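To see why more disks means more frequent failures, here is a minimal Python sketch under the common simplifying assumption that disks fail independently at a constant rate (exponentially distributed lifetimes). For an array that is lost as soon as any one disk fails, the failure rates add, so the array MTBF is the single-disk MTBF divided by the number of disks. The function name and the 1,000,000-hour figure are illustrative assumptions, not vendor data.

```python
def array_mtbf_hours(disk_mtbf_hours: float, num_disks: int) -> float:
    """MTBF of an array that fails as soon as any one of its disks fails.

    Assumes independent disks with exponentially distributed lifetimes,
    so the per-disk failure rates (1 / MTBF) simply add up.
    """
    if num_disks < 1:
        raise ValueError("need at least one disk")
    combined_failure_rate = num_disks / disk_mtbf_hours  # failures per hour
    return 1.0 / combined_failure_rate


# Illustrative (assumed) single-disk MTBF of 1,000,000 hours.
for n in (1, 2, 4, 8):
    print(n, "disks:", round(array_mtbf_hours(1_000_000, n)), "hours")
```

Under this model, doubling the number of disks halves the expected time until the first failure, which is why redundancy matters more as arrays grow.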

RAID arrays appear to the operating system (OS) as a single logical hard disk. RAID employs disk mirroring, disk striping, or a combination of the two. Striping partitions each drive’s storage space into units ranging from a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order.
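The interleaving can be sketched as a simple address calculation. The following Python is a minimal illustration, assuming equal-size disks, a fixed stripe-unit size measured in blocks, and a plain round-robin order; `locate` is a hypothetical helper, not a controller API, and real controllers may lay data out differently.

```python
def locate(logical_block: int, num_disks: int, blocks_per_unit: int) -> tuple:
    """Map a logical block number to (disk index, block offset on that disk).

    Stripe units are dealt out round-robin: unit 0 to disk 0, unit 1 to
    disk 1, and so on, wrapping to the next stripe row after the last disk.
    """
    unit, offset_in_unit = divmod(logical_block, blocks_per_unit)
    stripe_row, disk = divmod(unit, num_disks)
    return disk, stripe_row * blocks_per_unit + offset_in_unit


for block in (0, 3, 4, 8, 12, 13):
    print(block, "->", locate(block, num_disks=3, blocks_per_unit=4))
```

With three disks and four blocks per stripe unit, logical blocks 0 to 3 land on disk 0, blocks 4 to 7 on disk 1, blocks 8 to 11 on disk 2, and block 12 wraps back to disk 0 at offset 4.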

In a single-user system where large records, such as medical or other scientific images, are stored, the stripes are typically set up to be small (perhaps 512 bytes) so that a single record spans all disks and can be accessed quickly by reading all disks at the same time.

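In a multi-user system, better performance requires establishing a stripe wide enough to hold the typical or maximum size record, so that most records sit on a single drive. Different drives can then serve different users' requests at the same time, which allows overlapped disk I/O across drives.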
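The trade-off between small and wide stripes comes down to how many drives a single record touches. This hypothetical helper counts them for a record starting at a given byte offset, assuming a fixed stripe-unit size and round-robin striping across equal-size disks.

```python
def disks_touched(offset: int, length: int, stripe_unit: int, num_disks: int) -> int:
    """Number of distinct disks a record of `length` bytes at byte `offset` spans."""
    if length < 1:
        raise ValueError("record length must be positive")
    first_unit = offset // stripe_unit
    last_unit = (offset + length - 1) // stripe_unit
    # Consecutive stripe units sit on consecutive disks, so a record can
    # never touch more disks than the array has.
    return min(num_disks, last_unit - first_unit + 1)


record = 64 * 1024  # a 64 KiB record
print(disks_touched(0, record, 512, 4))          # small stripes: all 4 disks
print(disks_touched(0, record, 64 * 1024, 4))    # wide, aligned stripe: 1 disk
print(disks_touched(1000, record, 64 * 1024, 4)) # wide but unaligned: 2 disks
```

Small stripe units spread one record over every drive (good for a single user reading large images), while stripe units at least as large as a record usually confine it to one or two drives, leaving the others free for other requests.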

Standard RAID levels

RAID 0: This configuration has striping but no redundancy of data. It offers the best performance but no fault tolerance: if any drive fails, the data on the entire array is lost.

RAID 0 diagram

RAID 1: Also known as disk mirroring, this configuration consists of at least two drives that duplicate the storage of data. There is no striping. Read performance is improved because different reads can be served by different disks at the same time. Write performance is about the same as for single-disk storage, because every write must go to all disks.
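Mirroring behavior can be modeled with a toy in-memory class. This is a minimal sketch, not a real driver: the `Mirror` class, its round-robin read policy, and the `failed` set used to simulate a dead disk are all illustrative assumptions.

```python
import itertools


class Mirror:
    """Toy in-memory RAID 1: writes go to every disk, reads use any healthy copy."""

    def __init__(self, num_disks: int = 2, size: int = 16) -> None:
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least two disks")
        self.size = size
        self.disks = [bytearray(size) for _ in range(num_disks)]
        self.failed = set()  # indices of disks treated as dead
        # Round-robin cursor so that successive reads are spread across disks.
        self._cursor = itertools.cycle(range(num_disks))

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write out of range")
        # Every surviving disk receives the same data, so a write costs one
        # I/O per disk (issued in parallel on real hardware).
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        # Any healthy copy is valid; try each disk at most once.
        for _ in range(len(self.disks)):
            i = next(self._cursor)
            if i not in self.failed:
                return bytes(self.disks[i][offset:offset + length])
        raise OSError("all mirror copies have failed")


m = Mirror()
m.write(0, b"hello")
m.failed.add(0)        # simulate losing disk 0
print(m.read(0, 5))    # b'hello', served by disk 1
```

Spreading reads round-robin is one simple policy; real implementations may instead pick the least busy disk.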

RAID 1 diagram

RAID 2: This configuration uses striping across disks with some disks storing error checking and correcting (ECC) information. It has no advantage over RAID 3 and is no longer used.
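RAID 2 is usually described as using a Hamming code for its ECC, with bits striped so that each bit of a codeword lands on its own disk. The sketch below uses Hamming(7,4), one illustrative choice: 4 data bits on 4 data disks and 3 check bits on 3 ECC disks. The function names are hypothetical and the code only shows the coding idea, not a disk layout.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7).

    In a RAID 2 style layout, each of the 7 bits would go to its own disk:
    positions 3, 5, 6, 7 hold data and positions 1, 2, 4 hold check bits.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos  # XOR of the positions of all 1 bits
    if syndrome:             # a non-zero syndrome is the position of the bad bit
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                   # simulate one disk returning a bad bit
print(hamming74_decode(word))  # [1, 0, 1, 1]
```

Three check disks for every four data disks is a heavy overhead, which is consistent with the text's point that RAID 3's single parity drive made RAID 2 obsolete.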

RAID 2 diagram

RAID 3: This technique uses striping and dedicates one drive to storing parity information. The embedded ECC information is used to detect errors. Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives. Since an I/O operation addresses all drives at the same time, RAID 3 cannot overlap I/O. For this reason, RAID 3 is best for single-user systems with long record applications.
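XOR recovery works because XOR-ing a value in twice cancels it out: the parity is the XOR of all data blocks, so XOR-ing the parity with every surviving block leaves exactly the missing one. A minimal Python sketch, with short byte strings standing in for each drive's contents (an illustrative assumption):

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"ABCD", b"EFGH", b"IJKL"]  # three data drives
parity = xor_blocks(data)           # dedicated parity drive

# Drive 1 fails: XOR of the surviving data and the parity restores it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'EFGH'
```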

RAID 3 diagram

RAID 4: This level uses large stripes, which means you can read records from any single drive. This allows you to use overlapped I/O for read operations. Since every write operation has to update the parity drive, writes cannot be overlapped. RAID 4 offers no advantage over RAID 5.
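Why does every write hit the parity drive? When one data block changes, the new parity can be computed from just the old parity, the old data, and the new data (new parity = old parity XOR old data XOR new data), without reading the other drives. This is the usual read-modify-write approach; the helper names below are hypothetical.

```python
def xor_bytes(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must have equal length")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def small_write_parity(old_parity, old_data, new_data):
    """Parity after overwriting one data block, without reading the other drives.

    I/O cost: read old data, read old parity, write new data, write new parity.
    The parity read and write always go to the same dedicated drive.
    """
    return xor_bytes(old_parity, old_data, new_data)


data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_bytes(*data)

new_block = b"\xff\x00"
parity = small_write_parity(parity, data[1], new_block)
data[1] = new_block
assert parity == xor_bytes(*data)  # matches a full recomputation
```

Since all writes, to whichever data drive, must read and write that one parity drive, it becomes the bottleneck that serializes writes.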

RAID 4 diagram

RAID 5: This level is based on block-level striping with parity. The parity information is distributed across all drives, allowing the array to keep functioning if one drive fails. The array’s architecture allows read and write operations to span multiple drives. This results in performance that is usually better than that of a single drive, but not as high as that of a RAID 0 array. RAID 5 requires at least three disks, but it is often recommended to use at least five disks for performance reasons.
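Distributing parity means the parity block moves to a different disk on each stripe, so no single drive absorbs every parity update. The sketch below prints one simple rotation (parity starts on the last disk and moves one disk to the left per stripe, with data filling the remaining disks in order). This is an illustrative assumption; real controllers offer several rotation schemes.

```python
def raid5_layout(num_disks: int, num_stripes: int) -> list:
    """Return one row per stripe; each cell is a data block 'D<n>' or parity 'P'."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    rows = []
    block = 0
    for stripe in range(num_stripes):
        # Parity rotates one disk to the left on every stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in raid5_layout(num_disks=4, num_stripes=4):
    print("  ".join(f"{cell:>3}" for cell in row))
```

With four disks, stripe 0 puts parity on disk 3, stripe 1 on disk 2, and so on, so each disk holds the parity for one stripe in four and one disk's worth of capacity goes to parity overall.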

RAID 5 arrays are generally considered to be a poor choice for use on write-intensive systems because of the performance impact associated with writing parity information. When a disk does fail, it can take a long time to rebuild a RAID 5 array. Performance is usually degraded during the rebuild time and the array is vulnerable to an additional disk failure until the rebuild is complete.
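The write penalty is often estimated with a rule of thumb that counts back-end disk I/Os per small random write: 4 for RAID 5 (read data, read parity, write data, write parity). The sketch below applies those commonly quoted values; the per-disk IOPS figure is an illustrative assumption, and real arrays with write caches can do better than this estimate.

```python
# Back-end disk I/Os per small random write (commonly quoted rule-of-thumb values).
WRITE_PENALTY = {
    "RAID 0": 1,   # write the data only
    "RAID 10": 2,  # write both mirror copies
    "RAID 5": 4,   # read data, read parity, write data, write parity
    "RAID 6": 6,   # as RAID 5, but two parity blocks to read and write
}


def random_write_iops(level: str, num_disks: int, iops_per_disk: float) -> float:
    """Rough estimate of small random-write IOPS for the whole array."""
    return num_disks * iops_per_disk / WRITE_PENALTY[level]


# Assumed: six disks that each sustain about 150 random IOPS.
for level in WRITE_PENALTY:
    print(level, random_write_iops(level, num_disks=6, iops_per_disk=150))
```

By this estimate, the same six disks deliver a quarter of their raw random-write capacity under RAID 5, which is why write-intensive workloads suffer.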

RAID 5 diagram

RAID 6: This technique is similar to RAID 5 but includes a second parity scheme that is distributed across the drives in the array. The use of additional parity allows the array to continue to function even if two disks fail simultaneously. However, this extra protection comes at a cost. RAID 6 arrays have a higher cost per gigabyte (GB) and often have slower write performance than RAID 5 arrays.
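The text does not specify how the second parity is computed. A widely used scheme (Linux software RAID uses it, for example) keeps P as the plain XOR of the data blocks and computes Q as a weighted sum over the finite field GF(2^8), with disk i weighted by g^i for generator g = 2. Because P and Q give two independent equations, any two missing data blocks can be solved for. The sketch below is a minimal, slow, byte-at-a-time illustration under those assumptions; the function names are hypothetical, and it only covers losing two data disks (not P or Q).

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)  # a^255 = 1 for non-zero a, so a^254 = 1/a


def pq_parity(blocks):
    """P = XOR of all blocks; Q = XOR of g^i * block_i, with g = 2."""
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        weight = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(weight, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild data blocks x and y (marked None in `blocks`) from survivors, P, Q."""
    # Missing blocks are treated as zeros, which contribute nothing to P or Q.
    survivors = [bytes(len(p)) if b is None else b for b in blocks]
    p_s, q_s = pq_parity(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        a = p[j] ^ p_s[j]  # Dx ^ Dy
        b = q[j] ^ q_s[j]  # g^x*Dx ^ g^y*Dy
        # b ^ g^y*a = (g^x ^ g^y)*Dx, so divide by (g^x ^ g^y).
        dx[j] = gf_mul(b ^ gf_mul(gy, a), denom_inv)
        dy[j] = a ^ dx[j]
    return bytes(dx), bytes(dy)


data = [b"disk0!", b"disk1?", b"disk2.", b"disk3#"]
p, q = pq_parity(data)
damaged = [data[0], None, data[2], None]  # disks 1 and 3 fail together
print(recover_two(damaged, 1, 3, p, q))   # (b'disk1?', b'disk3#')
```

Computing and updating Q needs finite-field multiplication on every write, on top of the extra parity I/O, which helps explain RAID 6's slower writes.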

RAID 6 diagram


Nested RAID levels

Some RAID levels are referred to as nested RAID because they are based on a combination of RAID levels. Here are some examples of nested RAID levels.

RAID 10 (RAID 1+0): Combining RAID 1 and RAID 0, this level offers higher performance than RAID 1 but at a much higher cost. In RAID 1+0, the data is mirrored and the mirrors are striped.
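"Mirrored, then striped" can be made concrete with an address mapping: consecutive stripe units go to consecutive mirror pairs, and each unit is stored on both disks of its pair. This is a minimal sketch assuming an even number of equal-size disks paired as (0, 1), (2, 3), and so on; `raid10_locate` is a hypothetical helper, and real implementations may pair disks differently.

```python
def raid10_locate(logical_block: int, num_disks: int, blocks_per_unit: int):
    """Return ((disk_a, disk_b), offset): both copies of a logical block."""
    if num_disks < 4 or num_disks % 2:
        raise ValueError("this sketch expects an even number of disks, at least 4")
    num_pairs = num_disks // 2
    unit, offset_in_unit = divmod(logical_block, blocks_per_unit)
    # Stripe across the mirror pairs, round-robin.
    row, pair = divmod(unit, num_pairs)
    offset = row * blocks_per_unit + offset_in_unit
    # Both members of the pair hold an identical copy at the same offset.
    return (2 * pair, 2 * pair + 1), offset


for block in (0, 4, 8, 13):
    print(block, "->", raid10_locate(block, num_disks=4, blocks_per_unit=4))
```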

RAID 10 diagram

RAID 01 (RAID 0+1): RAID 0+1 is very similar to RAID 1+0, except that the data is organized differently. Rather than creating mirrors and then striping the mirrors, RAID 0+1 creates a stripe set and then mirrors the stripe set.
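The different organization affects which double failures the array survives. The sketch below enumerates every two-disk failure in a four-disk array under a simple model (an assumption): RAID 1+0 survives while each mirror pair keeps one disk, and RAID 0+1 survives only while at least one whole stripe set is intact, because a failed disk takes its stripe set offline.

```python
from itertools import combinations


def raid10_survives(failed: set, pairs) -> bool:
    # Data survives as long as every mirror pair keeps at least one disk.
    return all(not set(pair) <= failed for pair in pairs)


def raid01_survives(failed: set, stripe_sets) -> bool:
    # Data survives as long as at least one stripe set is fully intact.
    return any(not (set(s) & failed) for s in stripe_sets)


pairs = [(0, 1), (2, 3)]        # RAID 1+0: mirror pairs, striped together
stripe_sets = [(0, 1), (2, 3)]  # RAID 0+1: two stripe sets, mirrored
two_disk_failures = [set(c) for c in combinations(range(4), 2)]

r10 = sum(raid10_survives(f, pairs) for f in two_disk_failures)
r01 = sum(raid01_survives(f, stripe_sets) for f in two_disk_failures)
print(f"RAID 1+0 survives {r10}/6, RAID 0+1 survives {r01}/6 two-disk failures")
# RAID 1+0 survives 4/6, RAID 0+1 survives 2/6 two-disk failures
```

Both layouts use the same disks and capacity; only the order of mirroring and striping differs, yet under this model RAID 1+0 tolerates more combinations of failures.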

RAID 03 (RAID 0+3, also known as RAID 53 or RAID 5+3): This level uses striping (in RAID 0 style) for RAID 3’s virtual disk blocks. This offers higher performance than RAID 3, but at a much higher cost.

RAID 50 (RAID 5+0): This configuration combines RAID 5 distributed parity with RAID 0 striping to improve RAID 5 performance without reducing data protection.

Non-standard RAID levels

RAID 7: This RAID level is based on RAID 3 and RAID 4, but adds caching to the mix. It includes a real-time embedded OS as a controller, caching via a high-speed bus and other characteristics of a standalone computer. It is a non-standard, trademarked RAID level owned by the now-defunct Storage Computer Corp.

Adaptive RAID: Adaptive RAID lets the RAID controller decide how to store the parity on the disks. It will choose between RAID 3 and RAID 5, depending on which RAID set type will perform better with the type of data being written to the disks.

RAID S (also known as Parity RAID): This is an alternate, proprietary method for striped parity RAID from EMC Symmetrix that is no longer in use on current equipment. It appears to be similar to RAID 5 with some performance enhancements, as well as the enhancements that come from having a high-speed disk cache on the disk array.

Downsides of using RAID

Nested RAID levels are more expensive to implement than traditional RAID levels because they require a greater number of disks. The cost per GB of storage is also higher for nested RAID because so many of the drives are used for redundancy. Nested RAID has become popular in spite of its cost because it helps to overcome some of the reliability problems associated with standard RAID levels.
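The cost difference can be quantified as the share of raw capacity left for data. This sketch assumes equal-size disks, two-way mirrors for RAID 10, and equal-size RAID 5 groups for RAID 50; `usable_fraction` is a hypothetical helper, and it checks only the RAID 50 grouping, not each level's minimum disk count.

```python
def usable_fraction(level: str, num_disks: int, group_size: int = 0) -> float:
    """Share of raw capacity available for data, assuming equal-size disks."""
    n = num_disks
    if level == "RAID 0":
        data_disks = n
    elif level == "RAID 1":
        data_disks = 1                 # every disk holds the same copy
    elif level == "RAID 5":
        data_disks = n - 1             # one disk's worth of parity
    elif level == "RAID 6":
        data_disks = n - 2             # two disks' worth of parity
    elif level == "RAID 10":
        data_disks = n // 2            # two-way mirrors, then striped
    elif level == "RAID 50":
        if group_size < 3 or n % group_size:
            raise ValueError("RAID 50 needs equal RAID 5 groups of 3+ disks")
        data_disks = (n // group_size) * (group_size - 1)
    else:
        raise ValueError(f"unknown level: {level}")
    return data_disks / n


for level, size in [("RAID 5", 0), ("RAID 6", 0), ("RAID 50", 4), ("RAID 10", 0)]:
    print(level, usable_fraction(level, 8, size))
```

With eight disks, RAID 5 keeps 87.5% of raw capacity for data, while RAID 50 (two groups of four) keeps 75% and RAID 10 only 50%, so the cost per usable gigabyte rises accordingly.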


