How To Detect When OpenHAB Is Hung

I recently had a case where my RPi was hung showing some kind of browser problem.
Is there a way to get informed when that happens, like a text message or something?
Then I would power my Pi through one of those Wi-Fi-controlled outlets. When it hung, I could power it down and back up from anywhere to restart it.

Anybody have a way?

I use the built-in watchdog of my RPi. It watches the file events.log; if that file is not changed for a certain time, the RPi gets rebooted.

Do I understand it right that you would like to switch off the power the hard way?
You shouldn’t do that. This could end up in a corrupted file system.

Well, nothing else works.
The keyboard is dead,
The mouse is dead,
Putty doesn’t work.

What should I do?

stefon.oh,
Thanks, I will look into that.

Check your power supply and test with SysRq key combinations. Is swap turned off? Maybe the system is thrashing.


Can you share how you did that?

I second that. A power supply that is not powerful enough, especially with additional hardware like a keyboard and mouse attached, might cause such hang-ups.

Basically, yes. But I followed a German site to do it, so it might not be helpful for most readers here. Here is the link for those who can either read German or make use of it with a translation service:
watchdog guidance in German

I believe there are English sites with similar information as well. If you cannot find something suitable, let me know and I will try to write it down as a step-by-step guide (as time permits, the daily job is very demanding these weeks).


Two examples of English sites:
https://www.bayerschmidt.com/raspberry-pi/89-auto-reboot-a-hung-raspberry-pi-using-the-on-board-watchdog-timer.html

EDIT: corrected spelling


The PS has plenty of power and the KBD and mouse are battery powered.

I set up what I hope is a watchdog timed to restart if events.log isn’t updated in an hour.
Google Translate does a great job of translating the German site.
I ran into a problem in that the watchdog files listed there (and nearly everywhere else) are out of date.
But using
New Watchdog Info
I did:
sudo modprobe bcm2835_wdt
NOT sudo modprobe bcm2708_wdog
and
echo "bcm2835_wdt" | sudo tee -a /etc/modules
NOT echo "bcm2708_wdog" | sudo tee -a /etc/modules

I am not sure I got the syntax correct for the file. This is what I added to watchdog.conf:
file = /var/log/openhab2/events.log
change = 3600
I am searching for some examples.
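
For reference, per the watchdog.conf manual page the pair above reads as: reboot if the file named by `file` has not been changed within `change` seconds. A minimal shell sketch of that comparison, handy for checking by hand how stale the log currently is. It assumes GNU `stat` and `date`; the function names are made up for illustration and are not part of the watchdog package.

```shell
# Hypothetical helpers: approximate the check expressed by "file" + "change".

# Print how many seconds ago a file was last modified.
file_age_seconds() {
    local now mtime
    now=$(date +%s)                        # current time, seconds since the epoch
    mtime=$(stat -c %Y "$1") || return 1   # last modification time (GNU stat)
    echo $(( now - mtime ))
}

# Succeed (exit status 0) if the file is older than the given number of seconds.
is_stale() {
    local age
    age=$(file_age_seconds "$1") || return 2
    [ "$age" -gt "$2" ]
}
```

For example, `is_stale /var/log/openhab2/events.log 3600 && echo "older than an hour"` shows whether the log has already crossed the configured limit.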


Not sure if you’ve already thought about it: you can create a test case using a file under your control. Stop updating that file and check whether the watchdog is triggered; then you don’t need to wait for a real hang-up.
In case you haven’t seen it, here is the manual page for the watchdog configuration: https://linux.die.net/man/5/watchdog.conf


Doing it with a file under your control is very good advice. I ran into a situation where I struggled to log in and manually deactivate the watchdog before it rebooted my RasPi again :frowning:
You can update the timestamp of your test file with the touch command.
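
The test idea can be sketched roughly like this. The function name and the path in the usage note are assumptions for illustration only; while testing, the file would have to be the one named in the `file =` line of watchdog.conf.

```shell
# Hypothetical helper: refresh a test file's timestamp a fixed number of
# times, then stop, so the file goes stale and the watchdog should react.
keep_fresh() {
    local file="$1" pause="$2" rounds="$3" i=0
    while [ "$i" -lt "$rounds" ]; do
        touch "$file"      # creates the file if missing, updates its mtime
        sleep "$pause"
        i=$(( i + 1 ))
    done
}
```

Something like `keep_fresh /tmp/watchdog-test 30 10` keeps the file fresh for about five minutes. After the last touch, the reboot should follow once the `change` interval has passed.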

My setup for the change parameter is 1000 as that results in a wait period that should be long enough if something goes wrong (see above) but still short enough to ensure a “fast” reaction in case of a hangup. But that is a personal decision, everyone needs to experiment a bit to find a personal “correct” setting.

One suggestion from my experience: if you shut down your OH installation for whatever reason, keep in mind that the watchdog still looks at the file you’ve configured, and eventually it WILL reboot your machine. That might catch you by surprise and, according to Murphy’s Law, will happen at the most inconvenient point in time :wink: I wrote a command script that shuts down the watchdog before shutting down OH, and a start script that starts OH first and then the watchdog:

sudo systemctl start openhab2
sudo systemctl start watchdog
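
Put together, the ordering described above might look roughly like this. The service names `openhab2` and `watchdog` are the ones from the start script; the function names are made up, and the stop order (watchdog first, then OH) follows the description rather than any script that was actually posted.

```shell
# Sketch only: stop the watchdog before openHAB, start it after openHAB.

oh_stop() {
    # Stop watching the log first, so the silent log cannot trigger a reboot.
    sudo systemctl stop watchdog &&
    sudo systemctl stop openhab2
}

oh_start() {
    # Start openHAB first, so the log is being written again.
    sudo systemctl start openhab2 &&
    sudo systemctl start watchdog
}
```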

stefan.oh. Thanks for the idea and method.
I marked that the solution.

Wolfgang_5. A test file is a good idea, I just tested it using 100 and a text file that I kept saving until I wanted it to reboot. Worked like a charm.


Well I have uncovered another problem. My mqtt messages stop coming through for hours. In my system I have a separate program (written in C) that sends a message via mqtt whenever it senses an event from any one of 12 sensors. In addition, each sensor reports that it is still alive every 70 minutes or so. My program sends an mqtt message when any of these happen.
Well, I set the watchdog to restart if events.log isn’t updated in 75 minutes. I kept getting restarts. I kept extending the time, all the way to 10000 seconds (about 2.8 hours), and was still getting restarts. I looked at events.log and, sure enough, the log stops being updated; after 2.8 hours daemon.log starts to report the missing updates, does so for 61 seconds, and then the RPi restarts.
I have a lot of investigating to do but my first test indicates that my program sent a specific mqtt message but it didn’t appear in the event log. However that very same message appeared many times previously in the event log going back all the way to 2019-12-13 14:39:08.617 so the mqtt message sequence works.
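
One way to check whether a given item still shows up in the log is to pull out its most recent line. A small sketch; the function name is hypothetical, the default path is the events.log location used earlier in this thread, and the item name is whatever the item is called in your own setup.

```shell
# Hypothetical helper: print the most recent log line that mentions an item.
last_logged() {
    local item="$1" log="${2:-/var/log/openhab2/events.log}"
    # -F: treat the item name as a fixed string, not a regular expression
    grep -F -- "$item" "$log" | tail -n 1
}
```

Comparing the timestamp at the start of that line with the time the message was sent shows whether the message made it into the log at all.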

Time for more testing.

Does only your specific MQTT message not appear, or do MQTT messages in general not appear in the logfile? If the latter is the case, could it be that this is related to logging options?

Right now it looks like certain mqtt events get logged but not all. This “feels like” something I am doing somewhere in OpenHAB but I haven’t found it yet. But your idea looks promising.

I concluded that events are logged only when they change (door from open to close) but most of mine are reports that repeat the last state every 70 minutes. No change there.
I started to look for documentation about the events.log but then realized it would be very simple to add a little code to my C program that sends an mqtt message to a new item (WatchDogBurp): “BurpUp”, then 60 minutes later “BurpDown”. That gives me a change, which causes a new line in events.log.
I guess I should have tried to “do it right” by working with the events.log, but I had this working faster than I could find the documentation for the events.log. And who knows how much effort it would take to understand that doc.
If you don’t hear back, conclude that this worked.
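
The alternating heartbeat was added to the C program, which is not shown here. As a rough illustration of the same idea in shell, assuming the `mosquitto_pub` client is installed and a broker runs on localhost; the topic name `home/WatchDogBurp` and the function names are made up for this sketch.

```shell
# Sketch: alternate between two payloads so every heartbeat is a state change.

# Given the previous payload, print the next one.
next_burp() {
    if [ "$1" = "BurpUp" ]; then
        echo "BurpDown"
    else
        echo "BurpUp"
    fi
}

# Publish one payload (the topic is a hypothetical example).
publish_burp() {
    mosquitto_pub -h localhost -t "home/WatchDogBurp" -m "$1"
}

# Publish a changed payload every $1 seconds, forever.
burp_loop() {
    local state="BurpDown"
    while true; do
        state=$(next_burp "$state")
        publish_burp "$state"
        sleep "$1"
    done
}
```

`burp_loop 3600` would give one change per hour, matching the 60-minute interval above; the watchdog's `change` value then needs to be somewhat larger than that.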

That’s right.

But who cares … are you not using a rule to detect whatever conditions you want (update, change) and do whatever action it is you want to stave off the watchdog?

Actually I don’t have any rules for these sensors except to send me a text message if security is enabled and any indicate “opening”.
Also, I couldn’t find a way to go from OpenHAB to the RPi watchdog.
And after thinking it over, this is a better solution because it includes my C program in the loop.
Now the watchdog will restart if OpenHAB or my program is hung (or the RPi is hung) I think.