This is the second installment of a series of posts to document “How I recovered my Linux systems…” See the first post for foundation and background.
So, we had two hard drive failures in two different Linux boxes, nearly simultaneously. As is usual with these things, this couldn’t have happened at a worse time: my “main box” (let’s call it PC-A in the following narrative) crapped out just a couple of weeks before I was scheduled to give the first class of “Ruby for Newbies”. Fortunately, I had my trusty 10-year-old Sony laptop (also now a Linux system) to fill that particular gap… Class development and presentation covered.
What happened, and how did I diagnose it? This one was simple: as PC-A is up and running nearly all the time (it collects backup sets from other household systems, and generally is the strong hub of our family operations), I monitor and experience its general health as a matter of course. When it starts acting “sick,” I usually know it pretty quickly. This time, it began running sluggishly, and was pretty obviously “laboring” as it did memory management (process swaps — I told you, I’m a former VMS hacker, so I’m kinda sensitive to this operational behavior).
And although I don’t reboot our Linux boxes very often — unlike Windows, Linux doesn’t need to be routinely rebooted — now seemed like a good time to do so. Upon shutdown, and during system (re)initialization, it became audibly clear that the sda (first and boot) disk drive was having physical problems: I could hear it making bad noises as it tried to spin up, and it failed on (re)boot. Quick and final conclusion: Dead drive.
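When the symptoms are less audible than mine were, the kernel log and the drive’s own SMART self-assessment are the usual places to look. A minimal sketch of both checks — `/dev/sda` is just an illustrative device name, and `smartctl` assumes the smartmontools package is installed:

```shell
# Scan the kernel log for I/O or ATA errors involving the suspect drive.
# (Device name "sda" is an assumption; substitute your own.)
dmesg | grep -iE "sda|ata|i/o error" | tail -n 20

# Ask the drive for its SMART overall-health verdict (requires root
# and the smartmontools package). A failing drive often says so here
# before it dies completely.
sudo smartctl -H /dev/sda
```

A drive that is still healthy enough to answer SMART queries may give you time to copy data off before it fails for good.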
PC-A’s configuration included two SATA drives, the now defunct sda (partitioned as sda1, containing /, and sda2, a Linux swap partition), and a still-healthy sdb (partitioned sdb1, containing /var, and sdb2 with /usr). Unfortunately, as originally installed, my /home directory tree was on the sda1 partition (as part of /), not in its own separate partition. This meant that several gigabytes of my data (see part 1 of this series) were now very much at risk!
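Knowing exactly which partition backs which directory tree is what makes this kind of damage assessment possible. A quick sketch of how to check, using standard util-linux tools (the “/” target is just an example; on a layout like the one above you’d also query /home, /var, and /usr):

```shell
# findmnt answers the narrow question: which device backs this tree?
# On a layout like PC-A's, /home would report the root device, not
# its own partition -- exactly the risk described above.
findmnt -no SOURCE,TARGET /

# lsblk gives the whole picture: every disk, its partitions, sizes,
# and mount points, so you can see at a glance what a dead sda takes down.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
```

Running this once, before any disk fails, tells you which trees will vanish with which drive.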
What’s more, I maintain “non-home” libraries of files in separate /…Library directory trees (archive of “Walking A Walk” radio shows, music files, piano scores, photographs, etc.) on /usr, so there were several dozen gigabytes more safely stashed on sdb, the second drive. Years of experience — caution — have convinced me of the wisdom of backups… near-line and remote. I’ve been experimenting a lot with local (near-line) backups, looking for the optimum configuration and deployment of free space for backup sets.
The best approach I’ve discovered so far is to use rsync-based utilities (things like luckyBackup and grsync) to simply duplicate directory trees between systems and/or devices (at the very least, separate disks). And fortunately, when sda failed, I had copies of my /home and my /…Library directory trees stashed on my /var partition… not a perfect configuration, but at least everything important was stashed safely on the second drive, sdb.
What’s better, I’ve been a happy-camper customer of CrashPlan.com since early 2012 — it makes remote, off-site backups a cinch (as long as I pay my monthly or annual bill). At sign-up, after a couple of days watching my gigabytes leak across 20-Mbps DSL to CrashPlan’s cloud, things quickly settled down to routine and regular incremental backups updating our little chunk of the cloud. My CrashPlan login account lets me visually verify that “all my stuff” is stashed safely — however, I’d not yet been able to validate the Second Rule of Backup, namely: “2. Always practice and verify restoration of your backups on a regular basis.” No opportunity…
Until now… So, knowing where all my data is located, especially on the now-defunct drive, means that I can plan to fully restore it. But first! I’ve got some hardware to replace.
Next post: Is hardware replacement really cheap… enough? Fixing and reconfiguring the failed hard disk.