This is the sixth installment of a series of posts to document “How I recovered my Linux systems…” See the first post and second post for the preliminaries, the third post for diagnostics and disassembly, the fourth post for reassembly and hardware checkout, and the fifth post which completes the hardware and software rebuild of my PC-A system… Now, on to my wife’s system, PC-B…
I’ve taken some time in getting back to this, the penultimate post in this series of “recovering Linux,” mostly to be sure that I’d not been fooling myself into a false sense of it’s-fixed-now as I recovered the second of our two Linux systems, PC-B, which suffered a disk failure at roughly the same time as the first system, PC-A. Perhaps not surprisingly, although PC-B seemed to have a hard disk failure similar to PC-A’s, the hardware diagnosis, and thus the fix, took an entirely different turn.
Whereas PC-A’s problem was indeed a hard disk failure (a “disk crash” in the popular vernacular, although nothing actually “crashed”), readily diagnosed by the audible screeching sound that the disk emitted as it failed, PC-B’s failure mode was, upon subsequent evaluation, quite different. And although I was prepared to take a similar course of action to repair it (hard disk replacement, repartitioning and reformatting, operating system reinstallation, and data recovery from backup resources), things turned out quite differently.
PC-A’s disk failure was complete: Upon an attempt to power-up and boot, BIOS would indicate “no disk device available” (or error message words to that effect). Attempting to power-up and boot PC-B, on the other hand, reported something like “cannot find boot partition,” which indicated that the BIOS could still “see” (detect and attempt to read) data from the boot disk drive (again, partitioned as sda0 on this system)… it just couldn’t seem to proceed with the bootstrap. So, has the disk drive gone bad, failed? Or is something a bit more subtle going on?
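As an aside, the difference between “no disk at all” and “disk present, but nothing to boot from” can be illustrated with a small sketch. It assumes a classic MBR-partitioned disk (this post doesn’t say how PC-B’s drive was laid out), and the function names `inspect_mbr` and `read_first_sector` are made up for this example. It only looks at the first 512-byte sector: the two-byte boot signature at the end, and the four partition-table entries, one of which is normally flagged as active (bootable).

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # four 16-byte partition entries start here
BOOT_SIGNATURE_OFFSET = 510    # bytes 0x55 0xAA mark a valid MBR


def inspect_mbr(sector: bytes) -> dict:
    """Summarize the first sector of an MBR-partitioned disk (hypothetical helper)."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes of the first sector")

    signature = sector[BOOT_SIGNATURE_OFFSET:BOOT_SIGNATURE_OFFSET + 2]
    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * index
        entry = sector[start:start + 16]
        # Entry layout: status(1) CHS-first(3) type(1) CHS-last(3)
        #               LBA-start(4) sector-count(4), little-endian
        status, ptype, lba_start, num_sectors = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:
            continue  # type 0 means the slot is unused
        partitions.append({
            "slot": index + 1,
            "active": status == 0x80,
            "type": ptype,
            "lba_start": lba_start,
            "num_sectors": num_sectors,
        })

    return {
        "signature_ok": signature == b"\x55\xaa",
        "partitions": partitions,
        "has_active_partition": any(p["active"] for p in partitions),
    }


def read_first_sector(path: str) -> bytes:
    """Read the first 512 bytes of a disk image (or a device node, as root)."""
    with open(path, "rb") as handle:
        return handle.read(SECTOR_SIZE)
```

For example, `inspect_mbr(read_first_sector("disk.img"))` on a saved image (the file name is hypothetical) would show whether the signature and an active partition are present. A drive that is failing to read reliably, as PC-B’s appeared to be, may of course fail at this step or return damaged data, which is itself informative.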
Certainly, one option is to throw up my hands, declare the disk drive failed (whether it really was or not), and proceed as I had with PC-A. That would cost me the price of another new disk drive… not enough to break the bank, but necessary? What else to try first? How can I prove that this particular drive was actually a(nother) dead-duck, or that it was recoverable?
Several years ago, I had invested in a licensed copy of Steve Gibson’s (Gibson Research Corporation, grc.com) SpinRite Version 6.0, a SATA-capable hard disk diagnostic, repair and recovery utility program. Could I use SpinRite to diagnose, and possibly fix, the ailing disk drive on PC-B?
Gibson is an Intel x86 assembly language programming guru who takes pride and pleasure in creating “tight,” highly-functional, stand-alone utility programs which can run independently of an operating system environment such as Windows. In fact, SpinRite was designed to be booted and run stand-alone from a 3.5-inch floppy disk, so the entire program (together with the minimal open-source “FreeDOS” support environment) requires less — much less — than 1.44MB of storage.
Here’s where things get a bit interesting: Gibson’s ideas about distributing and installing his software products are a bit unorthodox. SpinRite ultimately boots and runs off a 3.5-inch floppy (a largely obsolete drive type on modern PC systems), or (fortunately) from “other bootable media,” like a CDROM perhaps? But when you buy your licensed copy of SpinRite from GRC, you download a program, spinrite.exe, which carries your own licence-code internally, and which must be run (under Windows!) in order to copy and configure it onto a 3.5-inch floppy disk or “other media” (as a bootable “ISO or IMG file”).
But wait! My systems are now 100% Linux, and I don’t want to set up a dual-boot Windows environment (regressive!) just to run spinrite.exe once to create a bootable CDROM disk. Fortunately, this is an ideal situation where a virtual machine (VM) can come to the rescue. In summary, here’s what I did:
- Install VirtualBox on my Linux system PC-A
- Install a licensed copy of Windows XP in that virtual machine environment
- Copy the distribution spinrite.exe program (a 169KB file) from a Linux directory into that Win-XP environment (a scratch-folder) and run it…
- When SpinRite’s configuration window pops-up, click the “Create ISO or IMG file” button to create an ISO file spinrite.iso (a 1.5MB file, inflated from that 169KB file)
- Copy that ISO file back out into the Linux directory…
- And then burn that to a CDROM (image-file mode) using a CD burner utility like K3b or Xfburn. (Note that the ISO image could also be copied to a USB flash drive to create another form of bootable media.)
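Since the ISO file gets copied from the Windows VM back out to Linux before burning, it can be worth confirming that the two copies are byte-for-byte identical. Here is a minimal sketch of one way to do that, by comparing SHA-256 digests. The helper names `sha256_of_file` and `same_contents` are invented for this example, and any file paths you pass in are your own; nothing here is part of SpinRite or VirtualBox.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Calling `same_contents()` on the copy inside the VM’s shared folder and the copy in the Linux directory (both paths are up to you) returns `True` when the transfer was clean. For a 1.5MB file the chunked read is overkill, but the same helper works unchanged on much larger images.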
Voilà! A bootable SpinRite CDROM!… It certainly takes up only a tiny fraction of the CD’s 700MB, but hey, it’s a self-contained hard-disk-recovery-CD ready to ride to the rescue… Now, does it boot?
…Drop the newly burned CD into the drive and power-up. Yup… it boots. SpinRite’s running standalone, and it sees PC-B’s disk drives, both of them, and all of the partitions as separate devices, sda0, sda1, etc. The fact that SpinRite is detecting the drive partitions is promising… now, can it diagnose any problems, or even fix anything?
Steve Gibson is a really smart guy, and he’s imbued SpinRite with deeply technical hardware diagnostic, evaluation and data recovery methods, some of which are actually quite controversial, particularly Gibson’s claim that SpinRite can “refresh” older drive media surfaces and how it recovers data from sectors which the drive’s own embedded hardware intelligence has marked as “defective.” Well, let’s see what smart and controversial can do…
SpinRite’s user interface is a bit peculiar, but simple enough to use. Mostly, it’s about “selecting one or more devices” to exercise, and then selecting a “level” of drive diagnostics and/or data recovery for it to perform. Once things are set up, you turn it loose and let it run for as long as it takes to scan, evaluate and/or recover data from a drive… this usually takes an hour or more, depending on the size of the drive (partition) and the diagnostic level you request. You’ll typically spend several hours running tests and/or refreshing data surfaces.
Which is exactly what I did: I ran SpinRite at “level 4 – Defect analysis” for each partition on the drive, and then again at “level 3 – Refresh the surfaces” (which also includes “level 2 – Recover unreadable data”) — basically requiring an overnight run to complete. The results?
- SpinRite’s diagnostic counters indicated a non-trivial number of read and head-positioning errors; however, it was able to read, recover and refresh all sectors in each partition. As far as I can tell, read and positioning errors are not uncommon for an “ageing” (older) drive, but they certainly strengthen the case for replacing this drive sooner rather than later. That is, I don’t plan to rely on this drive for “more years of service,” and will likely replace it within a few months. However…
- After running SpinRite’s diagnostic and refresh levels, I shut it down and attempted a Linux system reboot. Surprisingly, it booted right up! Apparently, SpinRite’s data refresh operation did some good!
And PC-B’s been working fine ever since, for several weeks now. No (apparent) data loss, no glitches since the SpinRite run. Again, I’ve got no false expectations… It’s likely that this older drive is running on its hairy edge, and is due for replacement sooner rather than later.
Because the SpinRite treatment seems to have restored the drive to reasonable operational health (at least for now), I was able to avoid data recovery steps from CrashPlan.com and/or local backup resources. However, had things needed to go further, those backup resources were indeed in place and ready to go…
PC-B is also back in service — happy wife — as is PC-A. Linux recovery and rebuild mission(s) accomplished. This whole rather lengthy exercise has taught me a lot about my SOHO systems’ backup practices.
I’ll share these lessons learned, and my experiences with CrashPlan.com so far, in the next and final post to this Recovering Linux series.