How To Detect When OpenHAB Is Hung

PeteC · August 24, 2020, 7:25pm

I recently had a case where my RPi was hung, showing some kind of browser problem.
Is there a way to be notified when that happens, for example by text message?
Then I would power my Pi through one of those Wi-Fi-controlled outlets. When it hangs, I could power-cycle it from anywhere to restart it.

Anybody have a way?

stefan.oh · August 24, 2020, 7:38pm

I use the built-in watchdog of my RPi. It watches the file events.log; if that file is not changed for a certain time, the RPi is rebooted.
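The idea behind that file check can be illustrated with a small shell function. This is only a sketch of the comparison involved, not the watchdog daemon's own code; the function name `file_is_stale` is hypothetical and GNU `stat` (as on Raspberry Pi OS) is assumed:

```shell
#!/bin/sh
# Hypothetical helper illustrating the idea behind the watchdog's file test:
# returns 0 if FILE has not been modified for more than MAX_AGE seconds,
# 1 if it is still fresh, and 2 if the file cannot be read.
file_is_stale() {
    fis_file=$1
    fis_max_age=$2
    # GNU stat: %Y = last modification time in seconds since the epoch
    fis_mtime=$(stat -c %Y "$fis_file" 2>/dev/null) || return 2
    fis_now=$(date +%s)
    [ $((fis_now - fis_mtime)) -gt "$fis_max_age" ]
}

# Example: has openHAB's event log been quiet for more than an hour?
if file_is_stale /var/log/openhab2/events.log 3600; then
    echo "events.log has not changed for over an hour"
fi
```

The real daemon is configured through watchdog.conf rather than a script like this; the function only shows what "not changed for a certain time" means in terms of the file's modification time.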

Wolfgang_S · August 24, 2020, 7:48pm

Do I understand correctly that you want to switch off the power the hard way?
You shouldn’t do that. It could leave you with a corrupted file system.

PeteC · August 24, 2020, 8:30pm

Well, nothing else works.
The keyboard is dead,
The mouse is dead,
PuTTY doesn’t work.

What should I do?

PeteC · August 24, 2020, 8:31pm

stefan.oh,
Thanks, I will look into that.

Sascha_ · August 25, 2020, 7:48am

Check your power supply and test with the SysRq key combinations. Is swap turned off? Maybe the system is thrashing.

allen · August 25, 2020, 8:13am

Can you share how you did that?

stefan.oh · August 25, 2020, 6:32pm

I second that. A power supply that is not powerful enough, especially with additional hardware such as a keyboard and mouse attached, can cause hangups like this.

stefan.oh · August 25, 2020, 6:43pm

Basically, yes. But I followed a German site to do it, so it might not be helpful for most readers here. Here is the link for those who can read German or can use a translation service:
watchdog guidance in German

I believe there are English sites with similar information as well. If you cannot find anything suitable, let me know and I will try to write it up as a step-by-step guide (as time permits; my day job is very demanding these weeks).

Wolfgang_S · August 25, 2020, 8:15pm

Two examples of English sites:
https://www.bayerschmidt.com/raspberry-pi/89-auto-reboot-a-hung-raspberry-pi-using-the-on-board-watchdog-timer.html

EDIT: corrected spelling

PeteC · August 27, 2020, 1:03am

The power supply has plenty of capacity, and the keyboard and mouse are battery-powered.

I set up what I hope is a watchdog timed to restart if events.log isn’t updated in an hour.
Google Translate does a great job of translating the German site.
I ran into a problem: the watchdog kernel module named there (and nearly everywhere else) is out of date.
But following
New Watchdog Info
I used these commands:
sudo modprobe bcm2835_wdt
echo "bcm2835_wdt" | sudo tee -a /etc/modules
instead of the older ones:
sudo modprobe bcm2708_wdog
echo "bcm2708_wdog" | sudo tee -a /etc/modules

I am not sure I got the syntax right. This is what I added to /etc/watchdog.conf:
file = /var/log/openhab2/events.log
change = 3600
I am still looking for examples.

Wolfgang_S · August 27, 2020, 5:25am

Not sure if you’ve already thought of this: you can create a test case by pointing the watchdog at a file under your control. Simply stop updating that file to check whether the watchdog triggers; that way you don’t have to wait for a real hang.
In case you haven’t seen it, here is the manual page for the watchdog configuration: https://linux.die.net/man/5/watchdog.conf

stefan.oh · August 27, 2020, 5:47pm

Using a file under your control is very good advice. I once found myself racing to log in and deactivate the watchdog manually before it rebooted my RasPi.
You can update the timestamp of your test file with the touch command.
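A test file can be kept fresh with `touch` in a small loop; stopping the loop simulates a hang. A sketch, assuming a hypothetical test file /tmp/watchdog-test that watchdog.conf temporarily points to, with a short `change` value:

```shell
#!/bin/sh
# Hypothetical test file; while testing, point watchdog.conf at it, e.g.:
#   file   = /tmp/watchdog-test
#   change = 100
TEST_FILE=${TEST_FILE:-/tmp/watchdog-test}
INTERVAL=${INTERVAL:-30}   # seconds between touches; keep this well below "change"

# Update the test file's modification time (creates the file if missing).
keep_alive_once() {
    touch "$TEST_FILE"
}

# Touch the file forever. Stop it with Ctrl+C: from then on the file ages,
# and the watchdog should reboot after roughly "change" seconds.
keep_alive_loop() {
    while true; do
        keep_alive_once
        sleep "$INTERVAL"
    done
}

# Run "keep_alive_loop" in a terminal to start the test.
```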

I set the change parameter to 1000 (seconds, about 17 minutes). That wait period should be long enough to intervene if something goes wrong (see above), but still short enough to ensure a “fast” reaction in case of a hangup. That is a personal decision, though; everyone needs to experiment a bit to find the setting that is “correct” for them.

One suggestion from my experience: if you shut down your OH installation for whatever reason, keep in mind that the watchdog is still looking at the file you’ve configured, and eventually it WILL reboot your machine. That may catch you by surprise and, according to Murphy’s Law, will happen at the most inconvenient moment. So I wrote a command script that shuts down the watchdog before shutting down OH, and a start script that starts OH first and then the watchdog:

sudo systemctl start openhab2
sudo systemctl start watchdog
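Only the start commands are shown above. A stop script would reverse the order. A sketch of both as shell functions; the names `oh_start` and `oh_stop` are hypothetical, and the `SYSTEMCTL` variable exists only so the commands can be dry-run with `echo`:

```shell
#!/bin/sh
# Set SYSTEMCTL=echo to print the commands instead of running them.
SYSTEMCTL=${SYSTEMCTL:-"sudo systemctl"}

# Stop: disarm the watchdog first, so it cannot reboot the Pi
# while openHAB is down and events.log stops changing.
oh_stop() {
    # $SYSTEMCTL is deliberately unquoted so "sudo systemctl" splits into two words
    $SYSTEMCTL stop watchdog && $SYSTEMCTL stop openhab2
}

# Start: openHAB first, then the watchdog that checks its log file.
oh_start() {
    $SYSTEMCTL start openhab2 && $SYSTEMCTL start watchdog
}
```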

PeteC · August 28, 2020, 1:08am

stefan.oh, thanks for the idea and the method.
I marked it as the solution.

Wolfgang_S, a test file is a good idea. I just tested it using a change value of 100 and a text file that I kept saving until I wanted it to reboot. It worked like a charm.

PeteC · September 10, 2020, 4:04pm

Well, I have uncovered another problem: my MQTT messages stop coming through for hours. In my system a separate program (written in C) sends an MQTT message whenever it senses an event from any one of 12 sensors. In addition, each sensor reports that it is still alive every 70 minutes or so, and my program sends an MQTT message for those reports as well.
I set the watchdog to restart if events.log isn’t updated within 75 minutes, and I kept getting restarts. I kept extending the time, all the way to 10000 seconds (about 2.8 hours), and I was still getting restarts. I looked at events.log and, sure enough, the log stops being updated. After 2.8 hours daemon.log starts reporting the missing updates, does so for 61 seconds, and then restarts my RPi.
I have a lot of investigating to do, but my first test indicates that my program sent a specific MQTT message that didn’t appear in the event log. However, that very same message appeared many times before in the event log, going all the way back to 2019-12-13 14:39:08.617, so the MQTT message sequence works.
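The `change` value is given in seconds, so the limits above convert as follows. A small sketch with two hypothetical helper functions:

```shell
#!/bin/sh
# Convert a watchdog.conf "change" value (seconds) to hours and minutes.
change_to_hm() {
    printf '%d h %d min\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 ))
}

# Convert minutes to the seconds value to put in "change".
minutes_to_change() {
    echo $(( $1 * 60 ))
}

minutes_to_change 75   # prints 4500
change_to_hm 10000     # prints "2 h 46 min" (about 2.8 hours)
```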

Time for more testing.

Wolfgang_S · September 10, 2020, 4:56pm

Is it only your specific MQTT message that doesn’t appear in the log file, or MQTT messages in general? If the latter, could it be related to the logging options?

PeteC · September 10, 2020, 5:12pm

Right now it looks like certain MQTT events get logged, but not all. This “feels like” something I am doing somewhere in openHAB, but I haven’t found it yet. Your idea looks promising, though.

PeteC · September 12, 2020, 12:44am

I concluded that events are logged only when a state changes (for example, a door going from open to closed), but most of mine are reports that repeat the last state every 70 minutes, so nothing changes.
I started to look for documentation about events.log, but then realized it would be very simple to add a little code to my C program that sends an MQTT message to a new item (WatchDogBurp): “BurpUp”, then 60 minutes later “BurpDown”. That gives a state change, which writes a new line to events.log.
I guess I should have tried to “do it right” by working with events.log, but I got this method working faster than I could find the documentation for events.log. And who knows how much effort it would take to understand that doc.
If you don’t hear back, conclude that this worked.
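PeteC does this inside his C program. The same alternating heartbeat as a shell sketch, assuming the Mosquitto command-line client `mosquitto_pub` is installed, the broker runs on localhost, and the WatchDogBurp item is bound to a hypothetical topic `home/watchdog/burp`:

```shell
#!/bin/sh
# Heartbeat for the watchdog: publish alternating values so that every
# message is a state change on WatchDogBurp and therefore gets a line in events.log.
BROKER=${BROKER:-localhost}
TOPIC=${TOPIC:-home/watchdog/burp}   # hypothetical topic bound to WatchDogBurp
PERIOD=${PERIOD:-3600}               # 60 minutes; must stay below watchdog.conf "change"

# Print the value that follows the previous one.
next_burp() {
    if [ "$1" = "BurpUp" ]; then
        echo BurpDown
    else
        echo BurpUp
    fi
}

# Publish BurpUp, BurpDown, BurpUp, ... once per PERIOD, forever.
burp_loop() {
    state=BurpDown
    while true; do
        state=$(next_burp "$state")
        mosquitto_pub -h "$BROKER" -t "$TOPIC" -m "$state"
        sleep "$PERIOD"
    done
}

# Start it in the background with: burp_loop &
```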

rossko57 · September 12, 2020, 1:24am

That’s right.

But who cares… aren’t you using a rule to detect whatever conditions you want (update, change) and take whatever action you need to stave off the watchdog?

PeteC · September 12, 2020, 4:15pm

Actually, I don’t have any rules for these sensors except one that sends me a text message if security is enabled and any of them indicates “opening”.
Also, I couldn’t find a way to connect openHAB to the RPi watchdog.
And after thinking it over, this is a better solution because it includes my C program in the loop.
Now, I think, the watchdog will restart the Pi if openHAB, my program, or the RPi itself hangs.