Monday, October 4, 2010

Downside of shooting RAW: the thirst for disk space -- Linux RAID adventures

With my photo directory easily breaking the 100GB mark after about a month of shooting, I needed more space. Since disk space is pretty cheap these days, I went for two 1TB drives in software RAID1 (I don't have the discipline to make decent backups, hence the RAID). It turns out that setting up a software RAID on Linux is pretty easy -- if you read the documentation properly (I didn't).

Long story short (I had to rebuild the array(s) about four times): if you want to boot from the array and/or want the kernel to auto-assemble it for you at boot time, pass the --metadata=0.90 flag to mdadm when creating the array. The kernel's built-in autodetection only recognizes the old 0.90 superblock format.

Note: I know that automatically assembling the array at boot time isn't the smartest thing to do. The kernel may get confused when a drive that was part of another array is added, and try to add it to the existing array, or vice versa. Either way, data will get lost. I hope I'll remember this if I ever end up in that situation, since I'm too lazy to set up a proper initrd/initramfs to assemble the array at boot.

The final layout is a 1GB RAID1 boot array on sda1 and sdb1 (the first, and fastest, blocks of the disks) and a RAID1 root array on sda2 and sdb2 spanning the rest of the disks (900-something GB).
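For reference, here is a minimal Python sketch that only builds (and does not run) the mdadm --create command lines for this layout. The array names /dev/md0 (boot) and /dev/md1 (root) and the helper name mdadm_create_raid1 are my assumptions, since the post doesn't name them. --create, --level, --raid-devices and --metadata are standard mdadm options. For in-kernel autodetection the partitions also need type fd (Linux raid autodetect), which this sketch does not handle.

```python
import shlex

# Hypothetical helper: builds (does not execute) an mdadm --create command
# for a two-disk RAID1 array. The 0.90 metadata default lets the kernel
# auto-assemble the array at boot.
def mdadm_create_raid1(md_device, members, metadata="0.90"):
    if len(members) != 2:
        raise ValueError("this sketch only handles two-disk mirrors")
    return [
        "mdadm", "--create", md_device,
        "--level=1",
        f"--raid-devices={len(members)}",
        f"--metadata={metadata}",
        *members,
    ]

# Layout from the post; the /dev/mdX names are assumptions.
LAYOUT = {
    "/dev/md0": ["/dev/sda1", "/dev/sdb1"],  # ~1GB boot array
    "/dev/md1": ["/dev/sda2", "/dev/sdb2"],  # rest of the disks, root
}

if __name__ == "__main__":
    for md, parts in LAYOUT.items():
        # Only print the command for review. Pass the list to subprocess.run()
        # (as root) once you are sure the partitions are the right ones.
        print(shlex.join(mdadm_create_raid1(md, parts)))
```

Printing the commands first makes it easy to double-check device names before anything touches the disks.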

When I run out of space again, the plan is to get a third 1TB disk and convert the array (without data loss) to RAID5, doubling the storage capacity to 2TB.
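The capacity arithmetic behind that plan, as a small Python sketch (the function name is made up). RAID1 gives you the size of one disk no matter how many mirrors you add. RAID5 spends one disk's worth of space on parity, so n disks give (n - 1) times the smallest disk.

```python
def usable_capacity_tb(level, disk_sizes_tb):
    """Rough usable capacity of a RAID1 or RAID5 array, in the input's unit.

    Assumes each member only contributes as much as the smallest disk,
    and ignores metadata overhead and filesystem formatting.
    """
    n = len(disk_sizes_tb)
    if n < 2:
        raise ValueError("need at least two disks")
    smallest = min(disk_sizes_tb)
    if level == 1:
        return smallest            # every disk holds a full copy
    if level == 5:
        return (n - 1) * smallest  # one disk's worth goes to parity
    raise ValueError(f"unsupported RAID level: {level}")

print(usable_capacity_tb(1, [1, 1]))     # current mirror: 1
print(usable_capacity_tb(5, [1, 1, 1]))  # after adding a third disk: 2
```

The trade-off is that the RAID5 array survives only one failed disk out of three, and every write also has to update parity.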

The fun part about the current RAID1 is not only that it's redundant against a single hard drive failure, but also that the kernel does some read balancing: when multiple reads are issued, it can distribute them over the two drives.
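To show why a single sequential reader doesn't speed up on a mirror while several readers can, here is a toy Python model of read balancing. It is a deliberate simplification and not the kernel's actual logic. As far as I know, in Linux that lives in read_balance() in drivers/md/raid1.c and weighs more factors. The rule assumed here is to send each read to the mirror with the fewest outstanding requests, breaking ties by the shortest seek from the last position.

```python
from dataclasses import dataclass

@dataclass
class Mirror:
    name: str
    head: int = 0     # sector of the last read (crude stand-in for head position)
    pending: int = 0  # reads issued to this disk that haven't completed yet

def pick_mirror(mirrors, sector):
    """Toy policy: least busy disk first, then shortest seek distance."""
    best = min(mirrors, key=lambda m: (m.pending, abs(m.head - sector)))
    best.pending += 1
    best.head = sector
    return best

def complete(mirror):
    """Mark one outstanding read on `mirror` as finished."""
    mirror.pending -= 1

# Two concurrent jobs (neither read has completed): they land on different disks.
disks = [Mirror("sda"), Mirror("sdb")]
print(pick_mirror(disks, 0).name, pick_mirror(disks, 1_000_000).name)  # sda sdb

# One sequential job (each read completes before the next): it stays on one disk.
disks = [Mirror("sda"), Mirror("sdb")]
used = set()
for sector in range(0, 4096, 8):
    m = pick_mirror(disks, sector)
    used.add(m.name)
    complete(m)
print(sorted(used))  # ['sda']
```

Even in this crude model, a lone sequential stream never benefits from the second disk. Only concurrent requests get spread out.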

I should do some proper benchmarking, but I don't have access to my desktop at the moment. However, a single read job seems to be somewhat slower than reading from a single drive, while issuing multiple jobs seems to help.
I'd also guess that with multiple concurrent reads the access latency should decrease, as the work can be distributed over two disks. I still need to test this next weekend.
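A rough Python sketch for that kind of test (function and file names are mine). It splits a large file into equal regions and reads them with N parallel threads, reporting throughput for 1, 2 and 4 jobs. It measures throughput, not latency. Repeated runs will mostly be served from the page cache, so use a file larger than RAM or drop the caches between runs (as root: sync; echo 3 > /proc/sys/vm/drop_caches). A dedicated tool such as fio would be the more thorough option.

```python
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # read in 1 MiB pieces

def read_range(path, start, length):
    """Read up to `length` bytes starting at offset `start`; return the byte count."""
    total = 0
    with open(path, "rb", buffering=0) as f:  # unbuffered: one read() call per chunk
        f.seek(start)
        while total < length:
            data = f.read(min(CHUNK, length - total))
            if not data:  # hit end of file
                break
            total += len(data)
    return total

def benchmark(path, jobs):
    """Read the whole file as `jobs` contiguous regions in parallel threads.

    Returns (bytes_read, seconds). File reads release the GIL, so the
    threads really do keep several requests in flight at once.
    """
    size = os.path.getsize(path)
    step = -(-size // jobs)  # ceiling division: the regions cover the whole file
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(read_range, path, i * step, step) for i in range(jobs)]
        total = sum(f.result() for f in futures)
    return total, time.perf_counter() - t0

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python readbench.py /path/to/big/file
    for jobs in (1, 2, 4):
        n, secs = benchmark(sys.argv[1], jobs)
        print(f"{jobs} job(s): {n / max(secs, 1e-9) / 1e6:.1f} MB/s")
```

Contiguous regions mimic several independent readers working on different parts of the disk, which is the case where the mirror can spread the load.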

1 comment:

  1. Rather than diving directly into RAID, did you think about LVM?
