Homemade RAID (Part 1)

As a photographer I have accumulated vast quantities of digital information. If you live in the USA or Europe you can find numerous attractively priced storage solutions. I live in Israel, and most of these devices don’t make it out here. One that did is outrageously expensive and, for me, out of the question. Though my application is primarily digital images, the solution can be used for many other applications where large volumes of information need to be stored and shared among a group of users over a private network.

Over the past months I’ve been thinking about, studying and preparing to build a custom Linux-based solution. Over the last few days I have finally built the system. It is still in its last setup and configuration steps, but it looks like I have been able to achieve my goals. It has been quite a challenging process, and I hope that by sharing my experience I’ll be able to save some effort for others who may try to do the same. I highly recommend it. These posts will require some technical computer knowledge and experience, but I hope to make them accessible to people without previous experience in Linux, servers or RAID (like me!).

Objective – Reliable & Affordable Storage

I set out to construct a reliable and affordable storage solution with the following capabilities:

  1. Storage capacity of at least 1.5 terabytes
  2. Reliability – failure of a single hard drive should not result in loss of data
  3. Network connectivity – to be able to access this storage from numerous computers over a private network.
  4. Scalability – to be able to expand the storage capacity of the solution in the future.

Selected Configuration

I chose to build a server computer that runs the Ubuntu Linux distribution (Linux is a core operating system that has branched out into diverse packages, referred to as distributions). I chose Linux because it is (1) open source and developed by talented engineers with good intentions; (2) affordable, since it’s free; (3) becoming widely adopted as a robust server operating system; and (4) equipped with a software RAID solution.

About RAID

RAID is an acronym for Redundant Array of Inexpensive Disks (these days also read as Independent Disks). It represents a collection of configurations for combining several hard drives into larger and/or more reliable storage devices. Popular RAID configurations are:

  • RAID0 – this is not a reliable solution, but it is fast. In this configuration two or more hard drives are used to create one array (for example, two 500GB drives become a 1TB array). A technique called ‘striping’ is used to split information over the physical hard drives: when a file is saved to the array, the information in the file is actually split between the hard drives in the array. Because several disk drives are storing the information simultaneously, the file write is faster than it would have been on one hard drive (theoretically twice as fast with two drives, in practice not quite). The price is that there is no redundancy: if any one drive fails, the whole array is lost.
  • RAID1 – often referred to as ‘mirroring’. This is mostly about reliability: there is not much improvement in write speed (it may even slow the system down) and it is pretty wasteful in space. In this configuration two identical hard drives are used to create mirror copies of each other (for example, two 500GB drives become a 500GB array). When a file is written to one hard drive it is also duplicated onto the second drive. If one drive fails, the other can be used in its place.
  • RAID5 – often referred to as ‘striping with parity’. In this configuration three or more hard drives are used to create one array, and the equivalent of one drive’s capacity is given over to parity information. Parity can be used to rebuild the data when any one of the drives fails. Strictly speaking there is no single parity drive: RAID5 spreads the parity across all of the drives, but the cost is always one drive’s worth of space regardless of the total number of drives. My array was built using four 750GB hard drives. One drive’s worth of space goes to parity and the other three drives’ worth is used for actual storage. This means that I will have a storage array of 3×750GB = 2250GB, or about 2.25TB.
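To make the parity idea a little more concrete, here is a minimal Python sketch of how one lost block can be rebuilt from the others. It assumes the parity is a simple byte-by-byte XOR of the data blocks, which is the usual way to explain RAID5. The block contents and the helper name xor_blocks are made up for illustration, and this is a toy model of a single stripe, not how a real RAID driver is written.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 4-drive array: three data blocks plus one parity block.
# The contents are hypothetical; real blocks are much larger.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding the second block has failed.
survivors = [data[0], data[2], parity]

# XOR-ing everything that survived gives back the missing block.
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same trick works whichever single block goes missing, including the parity block itself, which is why the array can survive the failure of any one drive but not two.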

RAID can be implemented either through dedicated hardware or in software. A hardware RAID is managed by a dedicated hardware device. A software RAID is managed by the operating system itself. One key difference is the load on the CPU. With dedicated hardware the CPU is free from controlling the RAID. With software RAID the CPU is responsible for controlling the RAID, which can theoretically cause performance problems. Dedicated RAID hardware is expensive. Software RAID has become more widespread because today’s processors can easily cope with the load.

My intention was to create two RAID arrays:

  • A RAID1 (mirroring) array for the operating system – to have two duplicate hard drives for the operating system. This was to be implemented as a hardware RAID using features that are available on some motherboards (more on that later).
  • A RAID5 (striping with parity) array for the actual storage space – using four 750GB drives. This was to be implemented as a software RAID.
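The capacity numbers in this post follow from a few simple rules, and a short Python sketch makes them easy to check against other drive combinations. The function name usable_capacity_gb is my own invention. It assumes that all drives in an array are the same size, that 1TB = 1000GB, and it only covers the three RAID levels described above.

```python
def usable_capacity_gb(level, drives, drive_gb):
    """Usable capacity in GB of an array built from identical drives."""
    if level == 0:
        # Striping: every drive holds data.
        minimum, usable = 2, drives * drive_gb
    elif level == 1:
        # Mirroring: every drive holds the same copy.
        minimum, usable = 2, drive_gb
    elif level == 5:
        # Striping with parity: one drive's worth of space goes to parity.
        minimum, usable = 3, (drives - 1) * drive_gb
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum:
        raise ValueError(f"RAID{level} needs at least {minimum} drives")
    return usable


print(usable_capacity_gb(0, 2, 500))  # the RAID0 example: 1000
print(usable_capacity_gb(1, 2, 500))  # the RAID1 example: 500

storage_gb = usable_capacity_gb(5, 4, 750)
print(storage_gb)          # my storage array: 2250
print(storage_gb >= 1500)  # meets the 1.5 terabyte objective: True
```

Note that operating systems often report sizes in binary units, so the number shown on screen for the finished array may look somewhat smaller than 2250GB.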

Coming next – Selecting the Hardware