       WHATEVER YOU DO, DON'T REBOOT
       
       I experienced my first hard-drive failure last week. A
       four-month-old WD SN770 BLACK NVMe SSD in my Dell laptop failed
       without any discernible notice or reason. Everything about my
       computer was fine about 7 hours earlier when I hibernated to disk
       for the workday. When I lifted the lid upon my return and
       proceeded to edit some Emacs buffers, I found that I was unable to
       save my work! Emacs complained about the files being read-only.
       
       I investigated this behavior a little, but not quite enough. Instead of looking more deeply into the current state of my computer and figuring out why it might have become read-only, I decided to do a power cycle[1]. I thought a quick reboot would fix everything...
       
       Big mistake. The computer no longer recognized the drive. The BIOS wouldn't get past a warning that the PCIe slot was "empty". As soon as I realized what had happened, my heart sank and my pulse quickened. I had just experienced my first hard-drive failure[2].
       
       Of course I had a backup.
       
       Yes. Luckily, I'd gotten into the practice of using `rsync' to copy my
       home folder to an external drive (which is then backed up to a second
       external drive). Unfortunately, since installing Void over the summer
       I'd gotten lazy about backups. I hadn't set them to occur on a
       schedule, so my most recent backup was from 5 days earlier, and the
       one before that went a month back.
       
       I'm not a prolific coder, so 5 days lost was not a terrible amount of
       work gone. In fact, I'd been pretty sedentary on my computer those
       last few days. This is all that I'd lost:
       - Work on a Gopher front-end for libgen (~8 hours)
       - Various notes in my personal knowledge repository (unsure of scale)
       - Org markup for my previous phlog post
       
       Once I accepted what I'd never get back I set my mind to getting my
       computer operational again. The experience turned from a shock into a
       thrill. If you can believe it, I was actually excited that my
       hard-drive failed. I'd been given a reason to battle test my ability
       to restore from a backup. Again: this was my first hard-drive
       failure. Up until this point the purpose of backups had been purely
       theoretical.
       
       So I went out and bought a new SSD. I chose the cheapest compatible option I could find at the computer store, a 1TB Kingston SKC3000[3]. I installed the SSD, flashed the Void installer onto a drive, and set up my machine by following the Void full-disk encryption guide. I did all the usual system configuration for users, groups, packages, and services. I used `rsync' to bring the home folder over (using all the switches that seemed relevant, `pEogtUr'). I also ran `stow', a newcomer to my computing life for which I am very grateful, to set up links in `/etc' and my home folder. After three and a half hours I was back up and coding! Honestly, it was like I hadn't missed a beat.
       
       Most surprising about this experience is how I reacted. I thought I'd
       be angry, frustrated, or under the covers in tears. In actuality I was
       excited (finally my hard-drive fails! A rite of passage! A new
       experience!). I guess it shows that I've matured and gathered some
       perspective in my life. Things could have been worse; they
       weren't. Things could have been better; but not by much. I'm now
       happily back in my computer sanctuary publishing new phlog posts,
       chatting with people on IRC, and learning more about how these
       incredible machines can work to make my life better.
       
       
       Reflections
       ----------------------------------------------------------------------
       I've already made some changes to my situation to better protect
       against hard-drive failure. I've set up a backup schedule that runs
       every night. I've written some small scripts to make the system
       configuration of users, groups, packages, and services automatic. And
       I've figured out what I need to do next if I really want to take safe
       data storage seriously:
       - Find an off-site storage solution.
       - Use Blu-ray discs as more reliable, long-term backup media.
       - Replace `rsync' with a program that can recover deleted files.
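
       Until `rsync' is actually replaced, the last item can be
       approximated with `rsync' itself. The sketch below is a stopgap
       under stated assumptions, not the setup used for this post: it
       only builds the command line, and the layout (a `current' mirror
       plus a dated `removed' folder under a backup root) and the
       function name are made up for illustration. With `--backup' and
       `--backup-dir', files that `--delete' would remove or that a sync
       would overwrite are moved aside instead of being lost.

```python
from datetime import date
from pathlib import PurePosixPath


def build_rsync_command(source, backup_root, stamp=None):
    """Build an rsync argument list that keeps deleted/overwritten files.

    Hypothetical layout (an assumption, not from the post):
      <backup_root>/current/          mirror of the source folder
      <backup_root>/removed/<stamp>/  files deleted or replaced on that run
    """
    stamp = stamp or date.today().isoformat()
    root = PurePosixPath(backup_root)
    return [
        "rsync",
        "--archive",   # recurse and preserve permissions, times, owners
        "--delete",    # mirror deletions from the source...
        "--backup",    # ...but move affected files aside instead
        f"--backup-dir={root / 'removed' / stamp}",
        f"{str(source).rstrip('/')}/",  # trailing slash: copy the contents
        f"{root / 'current'}/",
    ]
```

       Passing the returned list to `subprocess.run(cmd, check=True)'
       from a nightly job would run it; a file deleted by mistake could
       then be copied back out of that night's `removed' folder.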
       
       It's possible to weather unplanned events, whatever they may be. But
       resilience in the face of these calamities takes careful planning and
       consistent procedures. In other words: predictability.
       
       
       Footnotes
       ----------------------------------------------------------------------
       
       
       
       
       [1] I would later learn that NVMe drives can go into read-only mode when they detect a failure. This "feature" buys time so the data can be moved to another drive. If I had known this, I would have attempted to mount my external drive at `/tmp', which would have still been writable since it's on RAM. Then I would have proceeded with my regular backup routine and tried my luck at a `dd' of the full drive.
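
       The read-only state described in this footnote can be spotted
       from the running system before any reboot. A minimal sketch,
       assuming Linux, where `/proc/mounts' lists every mounted
       filesystem along with its mount options; the function names are
       made up for illustration. Some pseudo-filesystems are read-only
       by design, so the entries that matter are `/' and `/home'.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Each line of /proc/mounts has the fields:
    device, mount point, filesystem type, options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            found.append(mount_point)
    return found


def current_read_only_mounts(path="/proc/mounts"):
    """Read the live mount table (Linux only) and report read-only mounts."""
    with open(path) as handle:
        return read_only_mounts(handle.read())
```

       If `/' shows up in the result on a machine that normally mounts
       it read-write, that is the moment to copy data off the drive
       rather than power cycle.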
       
       [2] I'd always heard that hard-drives fail. I must have been lucky most of my life. This experience has given me a new outlook: don't trust a drive to last more than the current day.
       
       [3] Ironically, the three clerks I asked for help at the computer store didn't know much, if anything, about SSDs. I had to give myself a quick refresher on the technical jargon and what it all meant. I learned that NVMe is the protocol, M.2 specifies the connector, 2280 specifies the dimensions (22 mm wide, 80 mm long), and PCIe 4.0 is the interface, which is backwards compatible with 3.0.