Hardware Watchdog for OH on Raspberry Pi

watchdog

(Michael) #1

Hello,

I have read that the Raspberry Pi already has a hardware watchdog module built in:
https://thomas.flying-bordercollies.de/2015/05/raspberry-pi-watchdog-nutzen/

But I cannot find any information about it on our forum, and not much on the internet either.
I would like to know: does it work well, or does the Raspi need another watchdog?

Thanks a lot!


(Rossko57) #2

“works good” for what purpose? It’s important to be clear about what you are trying to deal with. openHAB getting hung up, losing communication?


(Michael) #3

I thought a watchdog has only one purpose: to restart the system when it hangs, doesn’t it?
In my case, the system hangs.


(Markus Storm) #4

There are hangups on various levels. The hardware watchdog will not apply if the lockup is at, say, the OS, Java or OH level.
Neither Pis nor OH lock up if correctly set up, so if you don’t find the reason for that, you shouldn’t install a watchdog but consider reinstalling your system from scratch instead. Pay attention to SD card corruption this time, as that’s the most likely reason for what you’re seeing.


(Michael) #5

First of all: if the Pi already has a hardware watchdog, why not use it?
We can think and talk a lot about this topic, and we can look for the reasons for the hangup (I am not a Linux pro). But one thing I do know: in my case it hangs every week or a little longer (and it is not the SD card). SSH does not respond, and so on.
I restart the system and it works again without visible problems.
So I think a watchdog can help. Therefore I am looking for a way to use it.


(Rich Koshak) #6

Probably a bad or insufficient power supply to the RPi. I don’t think a watchdog would be able to help with a problem like that.

To use a watchdog you need to figure out what to watch. And to know what to watch you need to diagnose why your RPi hangs up every week. And of course, depending on exactly what the cause of the hangup is, the watchdog may just be hung up too (e.g. a kernel panic). The watchdog isn’t as simple as “watch for any problems and reboot if you see some.” You have to tell it what to watch for, e.g. that a log file is still being written to, that you can ping a certain IP, etc. And what makes sense to watch for depends on what is causing the problem in the first place.
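
As a minimal sketch of the two kinds of checks mentioned above (a log file that is still being written to, a host that still answers ping), something like the following could serve as the test a watchdog runs. The function names are hypothetical, and the `ping` flags assume the Linux iputils version.

```python
import os
import subprocess
import time


def log_is_fresh(path, max_age_seconds, now=None):
    """Return True if the file at `path` was modified within `max_age_seconds`."""
    now = time.time() if now is None else now
    try:
        return (now - os.path.getmtime(path)) <= max_age_seconds
    except OSError:
        # A missing or unreadable log counts as a failed check.
        return False


def host_answers_ping(host, timeout_seconds=2):
    """Return True if a single ICMP echo request to `host` is answered."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_seconds), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The `ping` binary is not installed or cannot be executed.
        return False
    return result.returncode == 0
```

For example, `log_is_fresh("/var/log/openhab2/openhab.log", 600)` would ask whether the openHAB log was touched in the last ten minutes. That path is only an assumption about a typical package install of openHAB 2. Neither check says anything about why the system hung, so they are only useful once the symptom to watch for is known.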


(Michael) #7

As I said: ssh does not answer, but ping works. After a restart everything works fine.
Therefore I would only like to know how to use the watchdog, not to go looking for Linux and OH problems, let alone debug them ))) A reboot helps and that is enough for me )))
I am often not at home, and then I cannot examine lines of log files. I only need to restart OH and that’s all )))


(Aaron) #8

Sort of defeats the point when you need a watchdog for the watchdog :wink:


(Rich Koshak) #9

Right, so what do you need to watch on the RPi to detect when ssh decides not to respond to requests from another host? What log file? What process? What service? It can’t be the case that sshd is crashing because systemd would have restarted it for you. Your network isn’t down because you can still ping.

Until you can answer this question you can’t set up the watchdog in the first place.
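
One candidate answer, sketched under assumptions: instead of watching a log file, probe the SSH port and check that the server sends its identification string (the line starting with `SSH-`) within a timeout. The function name is hypothetical, and the check assumes the banner is the first thing the server sends, as OpenSSH does.

```python
import socket


def ssh_banner_ok(host, port=22, timeout_seconds=5.0):
    """Return True if the server sends an SSH identification string in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
            conn.settimeout(timeout_seconds)
            banner = conn.recv(256)
    except OSError:
        # Connection refused, host unreachable, or timed out.
        return False
    return banner.startswith(b"SSH-")
```

This only tests the greeting stage. A hang that happens later, for instance while the login prompt is being prepared, would still pass this check, so it is a starting point rather than a diagnosis.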

Well, the watchdog is a kernel module. If the kernel panics, everything on the RPi stops, including the kernel. A kernel panic is like the Windows blue screen of death.


(Aaron) #10

This is something I have not encountered yet, being semi-new to Linux

That’s something I have experienced many times, and it still worries me to this day


(Rich Koshak) #11

This pretty much only happens when you have:

  • failing/bad hardware
  • corrupt SD card

So you are unlikely to ever see it. Linux is VERY stable in this respect.


(Aaron) #12

I do have strange slowdowns and other problems that require a restart sometimes myself (usually after major file edits), but since moving openHAB from Windows to Linux I have found my setup a lot more stable, and there are no more random restarts from Windows updates :frowning:


(Rich Koshak) #13

This is a known problem. What happens is, particularly if you are on an RPi, it can take several minutes to load and parse all the files. If you edit another file while it is in that process, it will queue up another load of the files. These loads and parses can take huge amounts of CPU causing OH to become unresponsive.


(Michael) #14

So here is a working solution for the Pi 3+ (and maybe others):

https://www.raspberrypi.org/forums/viewtopic.php?t=210974

Just checked it, and it works!
Maybe it can help and maybe not, but it is better than nothing!
Whether it helps me, we will see when I get the next hangup )


(Rossko57) #15

That means your watchdog will not help you at all (kernel is running).


(Rich Koshak) #16

But there is a watchdog built into systemd which can help. Before you can set up that watchdog, though, you have to know what to watch.
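
A minimal sketch of the service side of the systemd watchdog, assuming a unit that sets `WatchdogSec=` and `Restart=on-watchdog`: the service has to send `WATCHDOG=1` to the socket named in the `NOTIFY_SOCKET` environment variable at regular intervals, and systemd restarts it when the messages stop. The helper names and the `check` callback are hypothetical. What `check` should actually test is exactly the open question above.

```python
import os
import socket


def sd_notify(message, socket_path=None):
    """Send `message` to the systemd notify socket; return False if none is set."""
    path = socket_path or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd with notifications enabled
    if path.startswith("@"):
        path = "\0" + path[1:]  # "@" marks a Linux abstract socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii"), path)
    return True


def heartbeat_once(check, socket_path=None):
    """Ping the watchdog only if `check()` reports the service as healthy."""
    if check():
        return sd_notify("WATCHDOG=1", socket_path)
    return False
```

A service would call `heartbeat_once` in a loop, with a sleep well below the `WatchdogSec=` value. Note that this restarts only the service that stops sending heartbeats, not the whole machine.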


(Michael) #17

After this command the kernel is also running:

:(){ :|:& };:

But it works)))
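
One possible explanation for why this test reboots the Pi even though the kernel keeps running: under the Linux watchdog driver API the hardware timer is reset ("fed") by writes to `/dev/watchdog`, usually from a user-space process, and a fork bomb can starve that process. Whether the linked setup works this way is an assumption. A minimal sketch of such a feeder, with hypothetical function and parameter names:

```python
import time


def feed_watchdog(device, is_healthy, interval_seconds, max_feeds, sleep=time.sleep):
    """Write to an open watchdog device while `is_healthy()` returns True.

    Stops after `max_feeds` writes or at the first failed health check,
    and returns the number of feeds written.
    """
    feeds = 0
    while feeds < max_feeds and is_healthy():
        device.write(b"\0")  # any write resets the hardware timer
        device.flush()
        feeds += 1
        sleep(interval_seconds)
    return feeds
```

On a real Pi the device would be opened with `open("/dev/watchdog", "wb", buffering=0)`. Opening it arms the timer, and once the writes stop the board resets when the timeout expires, so this is something to try only on a machine you are prepared to see reboot.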


(Markus Storm) #18

Re-read my post. I didn’t say that.
But a HW watchdog does not help if your problem is in any of the other layers. It won’t trigger if e.g. your kernel is still running (you say the host responds to pings, so the HW isn’t locked up).
Sure, there already are, or can be set up, more watchdogs on the other levels such as the one in systemd, but you asked for the HW watchdog in the processor. That one won’t help with your problems.

Either way, to have one or even more working watchdogs is no proper solution to your problem.
Finding and fixing the root cause is the only right way.


(Nh905) #19

@thisisIO, I have had a few issues where SSH stopped working, I could PING the server, and OpenHAB was still responding, although at some point they failed as well. I did some SSH diagnostics which suggested the Raspberry Pi did receive the SSH request but hung before it prompted for a username/password. On a hunch that I might be running out of memory, I added a swap partition to the Raspberry Pi and have not seen any issues since then. No guarantee that you have the same problem or that memory is the problem - I have not had a chance to back out the swap partition so that I could try to recreate the problem while monitoring memory usage.
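
To test a hunch like this one, available memory can be logged over time from `/proc/meminfo`. The following is a minimal sketch with hypothetical function names. It assumes a kernel recent enough to report `MemAvailable`, and the threshold is up to the user.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are in kB; a few (the HugePages_* counts) have no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_is_low(meminfo, min_available_kb):
    """Return True if MemAvailable is below `min_available_kb`.

    Raises KeyError if the kernel does not report MemAvailable.
    """
    return meminfo["MemAvailable"] < min_available_kb
```

Reading the file with `open("/proc/meminfo").read()` every minute and appending `MemAvailable` and `SwapFree` to a log would show whether memory runs out before SSH stops responding.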

Regards, Norbert


(Michael) #20

My answer to [quote=“mstormi, post:18, topic:66867”]

Re-read my post. I didn’t say that.
But a HW watchdog does not help if your problem is in any of the other layers. It won’t trigger if e.g. your kernel is still running (you say the host responds to pings, so the HW isn’t locked up).
Sure, there already are, or can be set up, more watchdogs on the other levels such as the one in systemd, but you asked for the HW watchdog in the processor. That one won’t help with your problems.

Either way, to have one or even more working watchdogs is no proper solution to your problem.
Finding and fixing the root cause is the only right way.
[/quote]

We can talk a lot and make suggestions about the problem, about layers and kernels and so on! But what for? :thinking:
I installed it in just 5 minutes and, even without a real hangup, it works! The only thing I need from it is a reboot when I’m not at home. Done! :sunglasses:

And finding a solution for the hangup is a question for another thread :wink: