Track SD Card health / RPi status

hey folks,
I have been running OH3 on a Pi 3 for a while now and luckily do regular backups, as I had quite a few crashes recently. First, my OH stopped responding while I was away on holiday (Murphy's law, no?), so I had to restore it to re-open my blinds :smile:
The SD card had a filesystem error according to fsck on restart, so I got two new ones, but both also crashed with filesystem errors within a week each, and neither is even readable any more! Now I have purchased a new A1 Extreme SD card, and so far it looks better (ZRAM is activated).
As I don't fully trust this yet, I would love to have some kind of health status report via email every Friday, plus logging of the SD card health. Is this possible with the on-board tools? I am not completely new to Linux and am comfortable setting up scripts and cron tasks, but I have no know-how about filesystems, SD read/write errors and such... My dream would be a weekly status via email, and an email alert if something goes wrong.
Thanks for any help & inspiration on how to do this!
Mark
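
For the weekly report and alert asked about above, one possible starting point is to scan the kernel log for filesystem and I/O errors and mail the result. This is only a rough sketch: the error patterns, the addresses and the local mail relay on `localhost` are assumptions, and it cannot measure card wear, it only notices errors the kernel has already logged.

```python
import re
import smtplib
import subprocess
from email.message import EmailMessage

# Assumed patterns; extend them with whatever your own kernel log shows.
ERROR_PATTERNS = [
    re.compile(r"EXT4-fs error"),
    re.compile(r"I/O error"),
    re.compile(r"Remounting filesystem read-only"),
    re.compile(r"mmc\d+: .*error", re.IGNORECASE),
]


def find_suspicious_lines(log_text):
    """Return the kernel log lines that match any of the error patterns."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in ERROR_PATTERNS)]


def build_report(hits, sender, recipient):
    """Build an email whose subject says OK or ALERT."""
    status = "ALERT" if hits else "OK"
    msg = EmailMessage()
    msg["Subject"] = f"[{status}] Pi storage report: {len(hits)} suspicious line(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(hits) if hits else
                    "No filesystem or I/O errors found in the kernel log.")
    return msg


def read_kernel_log():
    """Kernel messages of the current boot (may need root or the adm group)."""
    result = subprocess.run(["journalctl", "-k", "--no-pager"],
                            capture_output=True, text=True, check=False)
    return result.stdout


def main(alert_only=False, sender="pi@example.org", recipient="me@example.org"):
    """Send the report; with alert_only=True, stay silent unless errors exist."""
    hits = find_suspicious_lines(read_kernel_log())
    if hits or not alert_only:
        with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
            smtp.send_message(build_report(hits, sender, recipient))
```

Calling `main()` from a cron entry such as `0 8 * * 5` (Fridays at 08:00) would give the weekly status, and a second, more frequent entry calling `main(alert_only=True)` would cover the alert case.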

It's been a few years since I last looked into this, but at that time there was no way to assess the health of the SD card, at least not while that card is being actively used. Samsung and others have SD card checking utilities, but those require you to manually move the card to another machine and run a scan. And even then I'm not sure it finds all potential problems.

The fact that you had two SD cards crash within a week suggests that something else might be going on anyway. Unless they were cheap knock-off cards, they would not wear out that fast. Did your machine perhaps lose power without shutting down properly? That can corrupt the file system on any SD card because of the way they work (tl;dr: files are spread across sectors, and to write to a file the entire contents of one sector get copied to a new sector with the changes applied. If you lose power during this process, you lose not only the file being written but also any other file, or part of a file, that was in that sector).

Things may have changed since I last looked into it, but a quick Google search doesn't show anything new on this front. I found the following article to be pretty comprehensive in explaining the problem and how to limit it.

Thanks for the clarification, I had already suspected there isn't an easy route.
I did have several crashes due to filesystem problems 1-2 years ago, mainly, I guess, because I didn't have ZRAM running. Since OH 3, ZRAM is the default, and the system ran stably for about a year until it crashed during the holidays 3 weeks ago.
The 2 cards I used were average Intenso 32GB cards (Class 10); they cost 7 bucks each.
Power could of course be an issue, although I have never experienced any outages here so far. Maybe the wall plug is not sufficient? (It's the original one that came with the RPi 3.)

I would love to have a more stable setup, as I don't trust it anymore (and even worse, my wife & kids laugh about our unstable system and no longer use the blinds applet on their phones :slight_smile:).
What is the easiest way to ensure sufficient power and no outages? A power bank between the wall and the Pi? An APC? Or would moving to a DiskStation be more reliable than the Pi?

Thx,
Mark

I haven't tried it, but you could check out https://www.smartmontools.org/

Use SD mirroring to at least greatly reduce MTTR. Even your kids can exchange SD cards.

Yes, or a UPS HAT, e.g. from Waveshare.
Read the full story at
Corrupt FileSystems every 2-3 month? - Setup, Configuration and Use / Beginners - openHAB Community

I had the Intenso cards, too. Neither did well in my RPis, and both broke within a year.
So don't buy them!

I would have thought that this is not possible at all, but ...

Some power banks have momentary voltage interruptions when they lose power, which would make them unreliable for UPS purposes. You can test for it, but I find it hard to trust them.

You can get a UPS with a USB port and then use Network UPS Tools to monitor it. This enables you to also power your modem/router so that openHAB can send you a notification, but if you buy a small one then you'll only get a few minutes of operation before NUT has to shut it down. Meanwhile, a large UPS takes up more space and can get expensive in a hurry.
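
To give an idea of what monitoring through NUT looks like, here is a small sketch that reads the `key: value` lines printed by NUT's `upsc` client and decides whether a shutdown is due. The UPS name `myups@localhost` and the 30% charge threshold are made-up assumptions, and in a real setup NUT's own `upsmon` normally handles the shutdown, so treat this purely as an illustration.

```python
import subprocess


def parse_upsc(output):
    """Turn 'key: value' lines as printed by upsc into a dict of strings."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values, min_charge=30):
    """Shut down on low battery, or when on battery below min_charge percent."""
    flags = values.get("ups.status", "").split()
    on_battery = "OB" in flags    # OB: running on battery
    low_battery = "LB" in flags   # LB: UPS reports low battery
    try:
        charge = float(values.get("battery.charge", "100"))
    except ValueError:
        charge = 100.0
    return low_battery or (on_battery and charge < min_charge)


def query_ups(name="myups@localhost"):
    """Ask a running NUT server for the current values (name is hypothetical)."""
    result = subprocess.run(["upsc", name],
                            capture_output=True, text=True, check=False)
    return parse_upsc(result.stdout)
```

For example, `should_shut_down(query_ups())` stays false while the status is `OL` (on line power) and becomes true once the UPS reports `OB` with a low charge or sets `LB`.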

I recently came across a few mini UPS devices like this one and this one on Amazon, and I'm intrigued. It makes a ton of sense to just have 5V/9V/12V outputs and not waste energy converting back and forth between AC and DC. However, the 5V USB ports on the ones I linked to only output 2A, and the recommendation for an RPi 3/4 is a 3A power supply. I'm not ready to buy a mini UPS, but I'm keeping an eye out for improvements.

I also had the issue with broken SD cards some time ago, but found out that for a while now (I do not know exactly since which release) it has been pretty easy to run a Raspberry Pi from a USB SSD. Since I started doing this, I have had no problems any more. I just followed this advice (unfortunately in German, but I'm sure you will find something in other languages as well, as this has been common for a while): Raspberry Pi: Externe USB-Platte/USB-Stick
Regarding the UPS: I run an old Eaton server UPS, which I bought used and pretty cheap on eBay and replaced the battery. Even though it's an old one with reduced capacity, it can run the Pi, the switches and the routers for days... NUT handles the management, and there is even a binding for OH.

openHABian gives you several well-tested tools to significantly improve the stability of the system, namely ZRAM, SD sync and auto-backup.
In addition, in my experience, a standard UPS is also important.

I used a USB power bank with my Raspberry Pi 3 v2 and it worked well, although very rarely some problem occurred. I noticed that the Pi was reporting undervoltage events, and therefore I ended up connecting the original power supply to a standard UPS. (The same does not occur with a Pi Zero W, which requires much less power.)
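
If you want to check for such undervoltage events yourself, the Pi firmware exposes them through `vcgencmd get_throttled`, which prints a bit mask such as `throttled=0x50005`. The sketch below decodes that mask; it assumes `vcgencmd` is installed and that the user is allowed to call it, and the wording of the descriptions is mine.

```python
import subprocess

# Bit positions as documented for the Raspberry Pi firmware.
THROTTLE_FLAGS = {
    0: "under-voltage detected right now",
    1: "ARM frequency capped right now",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output):
    """Decode a line such as 'throttled=0x50005' into a list of descriptions."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]


def read_throttled():
    """Run vcgencmd on the Pi itself and decode its answer."""
    result = subprocess.run(["vcgencmd", "get_throttled"],
                            capture_output=True, text=True, check=True)
    return decode_throttled(result.stdout)
```

An empty list from `read_throttled()` means the firmware has seen no power or thermal problems since boot. Anything mentioning under-voltage points at the power supply or cable rather than the SD card.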

ZRAM improved reliability a lot, although you have to watch out for situations in which log files grow and fill all available memory.

I am using a high endurance SD card, specifically designed for continuous writes.

I am using the SD sync feature provided by openHABian and periodically swap the main SD card and the backup one. This way it is possible to test whether the backup SD card works while still having a valid one.

I found it useful to test all of the backup and restore features on a separate Raspberry Pi so that the production system can keep working.

Finally, let your family notice when your system works :wink:: usually they tend to emphasize when it fails but never pay attention when it works smoothly.

Thanks for the link, but this seems more difficult / hand-made than the option in the openhabian-config tool, "Move the system root from the SD card to a USB device". This is the same, no? I already had this running under OH 2.5 with an external SSD, but I read somewhere that I can't run ZRAM then anymore. Is this correct?

To be honest, I'm actually a bit surprised that this is possible within the config tool at all ;-))
I have been running the USB SSD for a while, and when I got it running, it was not possible to use openhabian-config for this.
I had used ZRAM before and had some issues at the time, so I decided that this was the best way for me. Maybe it's worth rethinking today, but you know: never change a running system.

Don't discount the value in this though. Even if it only lasts a couple of minutes, the ability to safely shut down the RPi before power is gone is huge when it comes to protecting your file system.

The purpose of ZRAM is to minimize the number of writes to the SD card. There is a limited number of writes that flash memory can support before it stops working. If you've moved to running off an external HDD or an SSD (which has its own built-in mitigations for wear-out), there is no longer much benefit to running ZRAM. So it's not that you can't run ZRAM, it's that there really is no benefit in doing so.
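
To get a feeling for how much is actually written to the card, with or without ZRAM, you can read the kernel's block device statistics. A minimal sketch, assuming the SD card shows up as `mmcblk0` (the usual name on a Pi, but check yours):

```python
from pathlib import Path

SECTOR_BYTES = 512  # the stat file always counts in 512-byte units


def sectors_written(stat_text):
    """Field 7 of /sys/block/<dev>/stat is the number of sectors written."""
    return int(stat_text.split()[6])


def bytes_written_since_boot(device="mmcblk0"):
    """Total bytes written to the device since boot (device name is assumed)."""
    text = Path(f"/sys/block/{device}/stat").read_text()
    return sectors_written(text) * SECTOR_BYTES


def mib_per_day(sectors_before, sectors_after, seconds_elapsed):
    """Extrapolate two samples of the write counter to MiB per day."""
    written = (sectors_after - sectors_before) * SECTOR_BYTES
    return written / (1024 * 1024) * 86400 / seconds_elapsed
```

Taking one sample, waiting an hour and taking another gives a rough write rate via `mib_per_day(before, after, 3600)`. The counter resets at every boot, so it says nothing about the total wear of the card.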

Thing is rather that SSDs (as root/boot drive) are not supported with openHABian.

Absolutely. I've actually got an APC 600VA UPS on the way from Amazon, because my CyberPower ST625U's battery is no longer reliable after 3+ years and I can't find a replacement cell for it.

Is this still correct when openhabian-config itself offers this possibility? I've seen this statement in the past, but with the current NVRAM of the Raspberry Pis it's a boot drive like all others.

100% agree. AND: running a low-power device like a Raspberry Pi usually gives you much more than a "few minutes".

I also agree; as far as I can remember, that was one of the reasons I did not even try it earlier.

Of course. The docs say so, and they're pretty explicit. Didn't you read them?
The availability of a function doesn't mean anything.
It does not mean that it's supported, let alone that it's a good idea.
That's how it has always been. (Note: I assume you refer to the "move root to USB" function. Have you tried that recently? Then you will have gone through a series of warnings.)

(sigh) No. The boot drive is never just a drive "like all others". openHABian expects the boot drive to be the SD. Various reliability functions such as mirroring, ZRAM and backups are adversely affected when you start messing with that.

For reference, pretty much all to be said about it is said here:

Any takers ?

Thanks for clarification!

I can look into it. I currently have it scripted through Ansible. But there are so many variables involved that I'm not sure it's possible to script through openHABian in a generic manner.

Is the UPS plugged in to the RPi's USB port or is it subscribed to remotely? What model of UPS is being used (the configuration can differ based on the model, and there's a long table mapping UPS models to the correct NUT driver to use)? How long to wait before shutting down? Is this the NUT primary (i.e. last to shut itself down) or a secondary (NUT has changed the terminology after all)?

If we make extreme restrictions it might just be possible (e.g. only support locally controlled UPS, force the users to go research to find the right driver themselves, etc.).

NUT really is disappointing in its complexity.

Of course I don't really have a test platform at the moment. Are RPis still really hard to get these days? Yes, for what they are going for on Amazon I could get a NUC. :frowning:

Thx all for all the clarification so far!

With my limited Unix skills, I had the idea to just get a used UPS that can run my Pi and some other servers for some time, install a second Pi that is not connected to the UPS, and set up a cron job to ping this Pi every 3 minutes or so. If it's unavailable due to an outage, the job shuts down my openHAB Pi.

Not really a great solution, but it would do the job.
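
The ping idea above could be sketched roughly like this. The state file path, the threshold of three missed pings and the use of `shutdown -h now` are assumptions, the script has to run as root to be able to shut down, and it relies on the Linux `ping` options `-c` and `-W`. Counting consecutive failures avoids shutting down because of a single lost packet or a short reboot of the second Pi.

```python
import subprocess
from pathlib import Path


def host_is_up(host, timeout_s=2):
    """Send a single ICMP echo request using the Linux ping command."""
    result = subprocess.run(["ping", "-c", "1", "-W", str(timeout_s), host],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL, check=False)
    return result.returncode == 0


def next_failure_count(previous_failures, is_up):
    """Reset the counter on success, otherwise count one more failure."""
    return 0 if is_up else previous_failures + 1


def should_shut_down(failures, threshold=3):
    """Only act after several consecutive missed pings."""
    return failures >= threshold


def run_once(host, state_file=Path("/run/power-watchdog.count"), threshold=3):
    """One cron invocation: ping, update the counter, shut down if needed."""
    try:
        previous = int(state_file.read_text())
    except (FileNotFoundError, ValueError):
        previous = 0
    failures = next_failure_count(previous, host_is_up(host))
    state_file.write_text(str(failures))
    if should_shut_down(failures, threshold):
        subprocess.run(["shutdown", "-h", "now"], check=False)
    return failures
```

Called every 3 minutes from root's crontab with the address of the second Pi, `run_once` would shut the openHAB Pi down after roughly 9 minutes without an answer. Note that it cannot tell a power outage from the second Pi simply being broken or unplugged.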