Archive for October, 2005

Vic’s Gallery back on the air

As the title suggests, Vic’s photo Gallery is back up after the disk blow-up the other week.  Follow the link at the left of Crossed Wires to view.  There still haven’t been any new photos for a while, but at least you can go back to looking at the old ones! ;)

Recovering a striped LVM volume group… or not

I had a rather brutal lesson in LVM recovery last week.  I had four SATA disks in an LVM VG, one of which failed.  Despite the failed disk *not* being the first drive in the VG, all the LVs in the VG were toast.  Why?  Because when I created the LVs, I striped them over the two disks that were in the VG at the time.

I figured it would be a performance improvement; it probably was.  I’d forgotten that I did it.  When I added two more disks to the VG, and was able to extend the LVs without any problems, I thought that I must have *not* striped them.  It turned out that because I’d added a *pair* of new disks, LVM was able to extend the two-striped LVs onto the two new disks just fine.
So, back to the failure: the disk that I lost was the second PV in the VG (the second one in the original pair).  Recovery procedures for LVM involve either substituting a new PV of the same size in place of any PVs from the failed disk, or creating a special device node called “/dev/ioerror” that LVM can refer to instead of a missing PV (usually you link /dev/ioerror to /dev/zero).  Having done either of those, you can add the “--partial” option to your LVM commands and LVM will do its best to make your LVs available (even though they’d have crashing-great gaps in them).
The one rule that is given in this procedure is that you cannot recover any LVs that *started* in the failed PV.  “No worries,” thought I, “I lost the second PV, all my LVs start on the first PV, I’ll be fine”.  WRONG.  Because I was striping, the LVs all started in the first *and* the second PV.  So, a failure of either disk was totally destructive to the entire VG (of course if I had created a new LV when I added the two new disks, that LV would have been fine since it would have started and striped over the third and fourth PVs).
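
To make the striping problem concrete, here is a minimal Python sketch of how a two-way stripe spreads an LV's data.  It is a simplified model, not LVM's actual allocator: it assumes chunks are dealt out round-robin across the PVs, and the names `stripe_location`, `lost_chunks`, "pv1" and "pv2" are made up for illustration.

```python
def stripe_location(logical_chunk, pvs):
    """Map a logical chunk number to (pv_name, chunk_index_on_that_pv).

    Assumes simple round-robin striping: chunk 0 on the first PV,
    chunk 1 on the second, chunk 2 back on the first, and so on.
    """
    return pvs[logical_chunk % len(pvs)], logical_chunk // len(pvs)


def lost_chunks(total_chunks, pvs, failed_pv):
    """List the logical chunks of an LV that lived on the failed PV."""
    return [c for c in range(total_chunks)
            if stripe_location(c, pvs)[0] == failed_pv]


# Hypothetical two-disk stripe, like the original pair in the VG.
pvs = ["pv1", "pv2"]
print(lost_chunks(8, pvs, "pv2"))  # every second chunk is gone
```

In this model, losing either PV removes every other chunk from the very start of the LV, which is why "the LV starts on the first PV" is no protection once striping is involved.
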
So what’s my config now?  LVM over a RAID5 array built from my four SATA disks.  Since I’m coming to rely on this stuff more, I figure it’s time to give up a little performance to gain some stability and recoverability (besides, the performance has been just fine so far).
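
For anyone wondering why RAID5 survives a single disk failure where the striped VG did not, the idea can be sketched in a few lines of Python.  This is only an illustration of XOR parity under simplifying assumptions: real RAID5 rotates parity across all the disks and works on much larger blocks, and the helper `xor_blocks` and the sample data are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


# Three data blocks plus one parity block, standing in for four disks.
data = [b"disk", b"two!", b"thre"]
parity = xor_blocks(data)

# Pretend the second disk died: rebuild it from the survivors and parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The cost is one disk's worth of capacity for parity, which is the performance-and-space trade mentioned above.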

Western Digital off the purchasing list

A hard disk failure last week caused a big issue for the Crossed Wires admins.  The drive was a WD, and they’re not necessarily off the purchasing list because the drive failed: disk drives are mechanical devices, and especially the consumer-level ones are not indestructible.  The reason they’re off the purchasing list is that I had to RMA the drive back to WD, and their nearest agent is in Singapore.

Firstly I verified that the drive was under warranty.  WD’s website confirmed this (they have a tool that lets you look up warranty status by serial number).  I phoned the retailer, who said that although the drive has a three-year warranty, they only handle the first 12 months.
Grrr.
So, back to WD’s website to lodge the RMA — I had to laugh at this point; after completing the RMA process I was asked to print the proforma invoice and packing slip, which said “send the drive to this address:”, followed by a blank field!  Nice one!  Luckily the address was found elsewhere on the website.  There are only four or five places in the world that handle RMAs for WD — at least for retail (surely there must be more for their commercial gear!), and Singapore handles the whole of Asia Pacific.
Grrr again.
To their credit, only 9 days after sending the faulty one, I had a courier knocking on the door with a replacement.  It’s clearly labelled “Recertified” though, which changes its warranty eligibility, so I’m going to be very careful about where I use it and what I store on it (having read some unfavourable comments about WD’s recertified drives).
The icing on the cake: I had an intermittent problem with my second drive in the Mac today.  Guess what: it’s another WD – same model as the failed one in the server.  Oh well, at least I’ve got another one ready to swap in…  :)