Lessons from a corrupted SD Card

Hi all,

I just want to share my recent painful experience with a corrupted SD Card. I do have a monthly reminder to back up the SD Card, but over time I underestimated the changes I had made to the image and ignored the reminders. As a result, my most recent backup was half a year old. Here is what I found not working after the restore.

  1. PiHole wasn’t configured correctly. It took me hours of head scratching to figure out why my various computers could not resolve web addresses. I do have a secondary PiHole, so things were fine while the OH rasp-pi was down, but the moment it was restored and went online again, computers started being affected.
    It turned out that over the summer I had changed my router gateway from .1 to .2. It was a small detail that I had forgotten, but it had a huge impact on all the DNS clients.

  2. HABApp, an external framework that all my Python automation code depends on, couldn’t start. It turned out that it needed to be upgraded (my source code assumed the latest version).

  3. The MyQ and Sony extensions that I rely on, which aren’t yet part of the OH distribution, could not be loaded. The root cause was that the OH version in the backup was 3.0 and at least one of these addons requires 3.1. Apparently I had upgraded OH to 3.1 before, but forgot. I then had to upgrade OH, and scrambled to figure out why the garage door was still not recognized. I had to scan the OH Inbox to find the garage door serial #; again, another detail that I had forgotten.

  4. Zigbee2mqtt could not detect the CC2531 sticks. I spent a lot of time on this but couldn’t figure out why. I tried to update zigbee2mqtt to the latest version, but npm had trouble retrieving packages from the central repository. After hours of trying to figure out why, I decided to try it on the Rasp-Pi4 with the newer OS (buster). It turned out that the old stretch OS on the rasp-pi3 was the culprit; the npm installation worked just fine on buster.
    But zigbee2mqtt still had trouble communicating with the CC2531 stick. I then had to assume that the stick itself was corrupted. Off I went to reflash it with the latest firmware, and all the various Xiaomi motion sensors came back online.

  5. Even so, my motion sensors were still not triggering the switches correctly. More head scratching followed before I concluded that it must be caused by the latency introduced by OH 3’s REST API authentication. Despite Spaceman_Spiff’s effort to add the cache to the OH 3.1 code base, it still might not perform that well. I went on to disable authentication, and so far all seems well.

The gist: do not ignore the small changes that you might think are not important enough to warrant another backup. They might be small, but accumulated over time, they become significant. And don’t forget that, being human, we will forget things. While ultimately we can reconstruct what was done before, it takes time and effort.

Not all is bad, however. As part of the restoration effort, I managed to upgrade several components to the latest versions. I also moved zigbee2mqtt off the Pi that hosts OH. That spreads out the components and reduces the risk in case of a failure, but now the stack is spread over two Pis, so there are more SD Cards to corrupt.

Cheers,


A lesson almost all of us have had to learn the hard way. A couple things I’ll add:

  • Because humans are forgetful, it’s best to automate the backups. Amanda comes with openHABian for just that reason, though there are plenty of other options. You can’t forget the backup if it just happens automatically. However, don’t forget to periodically check that the backups are working. :wink:

  • This is exactly the same reason I advocate upgrading often instead of waiting months and months or even years between upgrades. All those little changes made to the software by the developers also build up over time, which makes an upgrade or a restore that much more work.
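
To illustrate the point about periodically checking that backups are working, here is a minimal Python sketch that reports how old the newest backup file in a directory is. It is not tied to Amanda or any other tool; the function names, the `*.tgz` pattern, and the 7-day threshold are all assumptions made up for the example. It only checks that a recent file exists, not that the file can actually be restored.

```python
import time
from pathlib import Path


def newest_backup_age_days(backup_dir, pattern="*.tgz", now=None):
    """Return the age in days of the newest file matching pattern, or None if none exist."""
    files = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    # 'now' can be passed in explicitly, which makes the function easy to test.
    now = time.time() if now is None else now
    return (now - newest_mtime) / 86400.0


def backup_is_stale(backup_dir, max_age_days=7, pattern="*.tgz", now=None):
    """True if there is no backup at all or the newest one is older than max_age_days."""
    age = newest_backup_age_days(backup_dir, pattern, now)
    return age is None or age > max_age_days
```

A check like this could run from cron and send a notification when `backup_is_stale(...)` returns `True`; an occasional test restore is still the only real proof that a backup works.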

I’m glad you got back up and running with nothing lost to time.

Thanks for posting!


Indeed.

You can do multiple things:
I have a daily rsync job that syncs my smarthome folder (which contains all docker-compose files and the data volumes) to a “yesterday” folder, for those “damn, it worked yesterday!” moments.

A nightly job that TGZs the same folder and keeps 10 days of copies on an external drive that I could plug and restore on a spare NUC.

And a weekly duplicati backup that backs up the smarthome folder to MEGA.nz cloud storage.
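
For anyone who wants to try something similar, the nightly archive-and-rotate step could look roughly like the Python sketch below. This is not the actual job described above, just one way to do it with the standard library; the function name `make_archive`, the file naming scheme, and the paths are assumptions. It keeps the newest `keep` archives rather than counting calendar days, which is the same thing as long as the job runs once per night.

```python
import tarfile
import time
from pathlib import Path


def make_archive(source_dir, dest_dir, keep=10, stamp=None):
    """Write a timestamped .tgz of source_dir into dest_dir and keep only the newest 'keep' archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    source = Path(source_dir).resolve()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Timestamp in the file name, e.g. smarthome-20210915-020000.tgz
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{source.name}-{stamp}.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)

    # The timestamped names sort chronologically, so the oldest come first.
    archives = sorted(dest.glob(f"{source.name}-*.tgz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

Usage would be something like `make_archive("/path/to/smarthome", "/path/to/external/drive")` from a nightly cron job, with both paths replaced by your own. The destination should live outside the source folder so the archives do not end up inside each other.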

Call me paranoid :grinning:

You’re paranoid :grinning:.
Seriously, it makes sense.
However, Amanda is part of openHABian and already does all of this for you. You can even have a secondary backup to AWS S3. It is menu-driven and tested by many users. It does not make sense to set up all of this on your own. You run a much higher risk of your solution failing because of some error you’ve overlooked or something you may have forgotten.


Thanks for the heads up. I don’t use openHABian, but Debian on a NUC with everything in docker containers. So I simply used the tools I knew.

I will definitely check out Amanda, especially to replace duplicati, which feels a bit clunky, although it works.

Thanks for sharing. I found lots of helpful information here and really appreciate it.
