I recently had a case where my RPi was hung showing some kind of browser problem.
Is there a way to get informed when that happens? Like a Text message or something?
Then I would power my Pi through one of those Wi-Fi-controlled outlets. If it hung, I could power it down and back up from anywhere to restart it.
Basically, yes. But I followed a German site to do it, so it might not be helpful for most readers here. Here is the link for those who can either read German or make use of it with a translation service: watchdog guidance in German
I believe there are English sites with similar information as well. If you cannot find something suitable, let me know and I will try to write it down as a step-by-step guide (as time permits, the daily job is very demanding these weeks).
The PS has plenty of power and the KBD and mouse are battery powered.
I set up what I hope is a watchdog timed to restart if events.log isn’t updated in an hour.
Google Translate does a great job of translating the German site.
I ran into a problem in that the watchdog files listed there (and nearly everywhere else) are out of date.
But using New Watchdog Info, the commands become:
sudo modprobe bcm2835_wdt (NOT sudo modprobe bcm2708_wdog)
echo "bcm2835_wdt" | sudo tee -a /etc/modules (NOT echo "bcm2708_wdog" | sudo tee -a /etc/modules)
I am not sure I got the syntax correct for the file. This is what I added to watchdog.conf:
file = /var/log/openhab2/events.log
change = 3600
I am searching for some examples.
Not sure if you’ve already thought about it: you can build a test case by pointing the watchdog at a file under your own control. Simply stop updating that file and check whether the watchdog is triggered; then you don’t need to wait for a real hang.
In case you haven’t seen it, here is the manual page for the watchdog configuration: https://linux.die.net/man/5/watchdog.conf
Doing it with a file under your control is very good advice. I ran into a situation where I struggled to log in and manually deactivate the watchdog faster than it rebooted my RasPi.
You can update the timestamp of your test file with the touch command.
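To see in advance whether the test file would count as "not changed", something like the sketch below may help. It only compares the file's modification time with a limit in seconds, which is roughly what the file/change pair in watchdog.conf reacts to. The function name and the path /tmp/wd-test are made up for illustration, and stat -c is the GNU form found on Raspbian.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 when FILE was last modified more than
# MAX_AGE seconds ago, exit 1 when it is still fresh, exit 2 on error.
file_is_stale() {
    file=$1
    max_age=$2
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 2   # GNU stat: mtime as epoch seconds
    [ $((now - mtime)) -gt "$max_age" ]
}

# Example with a hypothetical test file:
#   touch /tmp/wd-test      # refresh the timestamp, watchdog stays quiet
#   file_is_stale /tmp/wd-test 3600 && echo "watchdog would fire"
```

Running touch on the file again resets its age, so the check goes back to "fresh".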
My setup for the change parameter is 1000 as that results in a wait period that should be long enough if something goes wrong (see above) but still short enough to ensure a “fast” reaction in case of a hangup. But that is a personal decision, everyone needs to experiment a bit to find a personal “correct” setting.
One suggestion from my experience: if you shut down your OH installation for whatever reason, keep in mind that the watchdog still looks at the file you’ve configured. Eventually it WILL reboot your machine. That might come unexpectedly, and according to Murphy’s Law it will happen at the most inconvenient point in time. I wrote a command script that shuts down the watchdog before shutting down OH, and a start script that starts OH first and then starts the watchdog:
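A minimal sketch of what such a stop/start pair could look like, not the scripts from this post. It assumes both run as systemd services named openhab2 and watchdog; those unit names are assumptions, so check yours with systemctl list-units. The run wrapper and DRY_RUN switch are only there so the order can be checked without touching the services.

```shell
#!/bin/sh
# Sketch only: the unit names "openhab2" and "watchdog" are assumptions.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the watchdog first so it cannot reboot the Pi while OH is down.
stop_oh() {
    run sudo systemctl stop watchdog &&
    run sudo systemctl stop openhab2
}

# Start OH first, then re-arm the watchdog.
start_oh() {
    run sudo systemctl start openhab2 &&
    run sudo systemctl start watchdog
}
```

Calling stop_oh before maintenance and start_oh afterwards keeps the order described above; with DRY_RUN=1 the functions only print what they would do.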
Well I have uncovered another problem. My mqtt messages stop coming through for hours. In my system I have a separate program (written in C) that sends a message via mqtt whenever it senses an event from any one of 12 sensors. In addition, each sensor reports that it is still alive every 70 minutes or so. My program sends an mqtt message when any of these happen.
Well, I set the watchdog to restart if events.log isn’t updated in 75 minutes. I kept getting restarts. I kept extending the time, all the way to 10000 seconds (about 2.8 hours), and was still getting restarts. I looked at events.log and sure enough, the log stops being updated; after 2.8 hours daemon.log starts to report the missing updates, does so for 61 seconds, and then the RPi restarts.
I have a lot of investigating to do but my first test indicates that my program sent a specific mqtt message but it didn’t appear in the event log. However that very same message appeared many times previously in the event log going back all the way to 2019-12-13 14:39:08.617 so the mqtt message sequence works.
I concluded that events are logged only when they change (door from open to close) but most of mine are reports that repeat the last state every 70 minutes. No change there.
I started to look for documentation about events.log but then realized it would be very simple to add a little code to my C program that sends an mqtt message to a new item (WatchDogBurp): first “BurpUp”, then 60 minutes later “BurpDown”. That gives me a state change, which causes a new line in events.log.
I guess I should have tried to “do it right” by working with events.log, but I had this mode working faster than I could find the documentation for events.log. And who knows how much effort it would take to understand that doc.
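The alternating heartbeat can be sketched in a few lines. This is shell rather than the C from my program, and it is only an illustration: the topic home/WatchDogBurp, the broker on localhost, and the function names are all assumptions. mosquitto_pub comes from the mosquitto-clients package.

```shell
#!/bin/sh
# Return the payload to send next, alternating so that every
# message is a state change and therefore shows up in events.log.
next_burp() {
    case "$1" in
        BurpUp) echo "BurpDown" ;;
        *)      echo "BurpUp" ;;
    esac
}

# Hypothetical loop; topic name and broker host are assumptions.
heartbeat_loop() {
    state="BurpDown"
    while :; do
        state=$(next_burp "$state")
        mosquitto_pub -h localhost -t "home/WatchDogBurp" -m "$state"
        sleep 3600   # 60 minutes between toggles
    done
}
```

The interval has to stay shorter than the change value in watchdog.conf, otherwise the watchdog fires between two toggles.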
If you don’t hear back, conclude that this worked.
Actually I don’t have any rules for these sensors except to send me a text message if security is enabled and any indicate “opening”.
Also, I couldn’t find a way to go from OpenHAB to the RPi watchdog.
And after thinking it over, this is a better solution because it includes my C program in the loop.
Now the watchdog will restart if OpenHAB or my program is hung (or the RPi is hung) I think.