High CPU usage - restart openHAB

smhgit · December 10, 2018, 8:05pm

Hi,

Today I noticed that the CPU usage of my Pi 3 (openHABian) was above 100%. I restarted openHAB (not a reboot of the Pi) and it dropped below 15%.

I don’t know the cause yet. I’m assuming it is one of the bindings, or maybe the fact that I am constantly developing and loading rules.

I hope I will find the cause, but until then I want a functional, responsive system. I would like to be able to restart openHAB when CPU usage exceeds some percentage. I know how to get the usage (Systeminfo binding), but I am not sure how to restart from a rule. I saw some solutions, but I am not sure of the best way to handle it.

Has anyone done similar thing?

rlkoshak · December 10, 2018, 8:44pm

Probably this. Loading and parsing .rules files is apparently really hard on the RPi. It can take several minutes (I’ve seen 15 minutes and more reported) to load and parse all the files. And every time you change a file, it kicks off a reload and parse of the files. So if you make lots of changes, that work can back up and pile up.

The recommendation on RPis is to make changes to the Rules offline (i.e. in some other folder) and move them over, so you end up with only one reload of the Rules for all those changes. I think that is how @mstormi manages this issue.
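A minimal sketch of that workflow in Python, assuming openHABian with openHAB 2 (rules in /etc/openhab2/rules) and a hypothetical staging folder ~/rules-staging where all the editing happens. It copies only the .rules files whose content actually changed, so identical files are left alone and do not trigger another reload. The folder names and the script are illustrative; nothing like this ships with openHAB or openHABian.

```python
"""Copy edited .rules files from a staging folder into openHAB's rules folder.

A sketch: both paths are assumptions (openHAB 2 on openHABian uses
/etc/openhab2/rules; ~/rules-staging is a hypothetical working folder).
"""
import filecmp
import shutil
from pathlib import Path
from typing import List

STAGING_DIR = Path.home() / "rules-staging"   # hypothetical: edit files here
LIVE_DIR = Path("/etc/openhab2/rules")        # folder openHAB watches


def sync_rules(staging: Path, live: Path) -> List[str]:
    """Copy .rules files whose content differs; return the names copied."""
    copied = []
    for src in sorted(staging.glob("*.rules")):
        dst = live / src.name
        # Leave identical files alone so they do not trigger a reload.
        if dst.exists() and filecmp.cmp(str(src), str(dst), shallow=False):
            continue
        shutil.copy2(str(src), str(dst))
        copied.append(src.name)
    return copied


if __name__ == "__main__":
    for name in sync_rules(STAGING_DIR, LIVE_DIR):
        print("updated", name)
```

Run it once you are done with a batch of edits, as a user with write access to the rules folder.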

Restarting OH from a Rule can be a challenge, as this is often what happens:

1. OH issues the command to restart the service using the Exec binding.
2. Before the command returns, OH receives the kill command and shuts down.
3. Because OH shut down, the Exec binding command is killed before it finishes its job, and consequently OH never comes back up.

If you want to go down this path, you probably need to set up a watchdog outside of OH to monitor its CPU usage and restart OH if it gets too high.
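A minimal sketch of such an external watchdog in Python. It samples system-wide CPU usage from /proc/stat and only restarts openHAB after several consecutive high samples, so a short spike (for example while rules are being parsed) does not trigger it. Assumptions: the service is named openhab2.service, the script runs as root from its own systemd unit or similar (so it is not a child of openHAB and survives the restart), and the threshold and timing values are illustrative. Note that /proc/stat covers all cores together, so 100% here means every core is busy, unlike the per-process figure in top, which can exceed 100%.

```python
"""External CPU watchdog for openHAB: a sketch, not an official tool.

Assumptions: Linux /proc/stat, a systemd unit named openhab2.service,
root privileges, and illustrative threshold/timing values.
"""
import subprocess
import sys
import time
from typing import Tuple

THRESHOLD = 90.0      # % busy across all cores (illustrative)
INTERVAL = 60         # seconds between samples
STRIKES = 5           # consecutive high samples before restarting
STARTUP_GRACE = 900   # seconds to ignore after a restart (startup is CPU-heavy)
SERVICE = "openhab2.service"


def read_cpu_times(path: str = "/proc/stat") -> Tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line."""
    with open(path) as f:
        fields = f.readline().split()[1:9]  # user .. steal; guest is already in user
    values = [int(v) for v in fields]
    idle = values[3] + values[4]            # idle + iowait
    return idle, sum(values)


def busy_percent(before: Tuple[int, int], after: Tuple[int, int]) -> float:
    """Share of non-idle time between two (idle, total) samples, in %."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def main() -> None:
    strikes = 0
    prev = read_cpu_times()
    while True:
        time.sleep(INTERVAL)
        cur = read_cpu_times()
        strikes = strikes + 1 if busy_percent(prev, cur) > THRESHOLD else 0
        prev = cur
        if strikes >= STRIKES:
            # Not a child of openHAB, so this survives openHAB shutting down.
            subprocess.run(["systemctl", "restart", SERVICE], check=False)
            strikes = 0
            time.sleep(STARTUP_GRACE)
            prev = read_cpu_times()


# Require an explicit flag so importing the file does not start the loop.
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    main()
```

Start it with `python3 oh_cpu_watchdog.py --run` (hypothetical file name); without the flag the file only defines the functions. Restarting during a rules reload would just start the parsing over, which is why the sketch waits for sustained load.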

Though I’m pretty sure the cause of your problem is the constant developing and loading of rules.

smhgit · December 10, 2018, 9:26pm

@rlkoshak

Probably the rules (which I don’t actually mind, as I can reboot the system manually). Even so, my system controls devices in my home, and I think it should be able to recover from the unexpected, which usually means a restart. I hope the ability to trigger a restart will be added as a built-in feature of openHAB / Eclipse SmartHome.

Until then I will try to implement some way of rebooting in case of extreme CPU usage.

rlkoshak · December 10, 2018, 10:01pm

Given the challenges of having OH issue the command and having the command live beyond OH shutting down, I wouldn’t hold my breath. It’s not as simple as writing a few lines of code.

Also, if OH is having problems and unresponsive, what makes you think it would be in a position to restart itself? That is why there are external capabilities like watchdogs in the OS. That is where that sort of thing would most appropriately be done.

smhgit · December 10, 2018, 10:10pm

Absolutely agree, but I think those (at least a watchdog) are for the “hardcore” cases. In my case it was only CPU usage, and OH already has the ability to get this information (Systeminfo binding), so for me it makes sense to have some first-line cases handled by the rules themselves.

Confectrician · December 10, 2018, 10:15pm

Just a quick shot:

How are you developing/editing your rule files?

VS Code? Maybe with completions over the REST API activated?

rlkoshak · December 10, 2018, 10:26pm

So should we make an inordinate effort to solve a problem only some of the time, or spend less effort solving it in an accepted, proven way that works in all cases? The latter seems much more attractive to me and sounds like a great addition to openHABian.

CDriver · December 10, 2018, 11:11pm

This (REST API completions in VS Code), without fail, dragged my Pi down.

Confectrician · December 10, 2018, 11:54pm

So you could solve it by deactivating the ‘restCompletions’ option?
Am I reading this correctly?

illnesse · December 11, 2018, 1:40am

Exec "systemctl restart openhab2.service"

Should work, maybe with sudo. I don’t think it’s a good idea though.

rlkoshak · December 11, 2018, 3:16am

You do need sudo. And in past experiments, running that command caused OH to shut down, the systemctl command to likewise get killed since it is a subprocess of OH, and then OH never came back up because systemctl restart was killed before OH could be restarted. Maybe it doesn’t happen on all systems, or maybe there was a change to the Exec binding that keeps the command from being killed when the parent process gets killed.

As of about a year ago though, on an RPi and an Ubuntu VM the command would not succeed fully.

smhgit · December 11, 2018, 7:40am

@rlkoshak

Agreed that solving the cause is the way to go, but we are working with open source, which includes a huge amount of software and technologies, and whatever we do, bugs (and unexpected failures) will always exist. I don’t disagree that we need to solve the issues; I’m just saying that we should have an easy way to recover from unexpected failures. Yes, I can use system tools for that, but it would be nice to have some built-in capability / support for that in openHAB.

A good example of such an approach is a watchdog …

mstormi · December 11, 2018, 9:47am

Disagree!
While there are still bugs deep inside the OH core to be excavated, there are also many users happily running OH on Raspis for weeks and more without ever needing to restart.
That includes people like me who have complex configurations and rules and change them frequently.

So while we might not be able to fix the cause right away, it’s well worth determining it correctly, because this will allow for a better workaround than simply restarting.
I can’t tell for sure, but I would think that your problem is not caused by a bug but by your usage, i.e. frequent changes to items and rules. So a workaround would be to minimize the number of changes to your files. At least copy them, edit them and copy them back rather than editing in place.
Yes, there is also a bug that makes this take way longer than it should, particularly but not exclusively on Raspis, but you can still avoid or minimize it, so you can run your OH without waiting for someone to fix that bug.

rlkoshak · December 11, 2018, 3:35pm

Which is a system service. A watchdog would be something that lives outside of OH, watches it for problems, and restarts it when there is a problem. A program cannot be its own watchdog. When one tries to make a program its own watchdog, it can only detect and recover from a subset of the problems a watchdog is good at solving.
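A watchdog that looks for "problems" rather than CPU load could instead check whether openHAB still answers HTTP requests, which also catches the case where OH is hung and could never restart itself. A minimal sketch in Python, assuming openHAB's REST API is reachable at http://localhost:8080/rest/ (the default port), the service is named openhab2.service, and the script runs as root outside openHAB; the timing values are illustrative.

```python
"""External liveness watchdog for openHAB: a sketch, not an official tool.

Assumptions: the REST API answers on http://localhost:8080/rest/,
a systemd unit named openhab2.service, root privileges, and illustrative
timing values.
"""
import http.client
import subprocess
import sys
import time
import urllib.error
import urllib.request

URL = "http://localhost:8080/rest/"
TIMEOUT = 10          # seconds to wait for an answer
INTERVAL = 60         # seconds between checks
MAX_FAILURES = 3      # consecutive failed checks before restarting
STARTUP_GRACE = 900   # seconds to wait after a restart before checking again
SERVICE = "openhab2.service"


def is_alive(url: str = URL, timeout: float = TIMEOUT) -> bool:
    """True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connection, timeout, HTTP error status or malformed URL.
        return False


def main() -> None:
    failures = 0
    while True:
        failures = 0 if is_alive() else failures + 1
        if failures >= MAX_FAILURES:
            # Runs outside openHAB, so it is not killed when openHAB stops.
            subprocess.run(["systemctl", "restart", SERVICE], check=False)
            failures = 0
            time.sleep(STARTUP_GRACE)
        else:
            time.sleep(INTERVAL)


# Require an explicit flag so importing the file does not start the loop.
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    main()
```

Requiring several consecutive failures keeps a single slow answer (for example during a rules reload) from causing a restart.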

So again, it comes down to spending a pretty significant amount of effort trying to implement a way for OH to restart itself from a Rule, which will only help in a minority of the situations where you would need such a remedy, or using the many, many watchdog options built into the operating system. Systemd has a particularly easy-to-use one. And based on my own experience and that of many users, writing a Rule that can restart OH is a huge challenge, because after OH shuts down there is nothing left of the subprocess to start it back up again.

Anyone is welcome to submit a PR for something like this. Personally I think it is wasted effort, and that effort would be better spent submitting a PR to openHABian to configure an OS-level watchdog instead.

And as Markus said, the number of times where OH needs to be restarted like this is relatively low and the restart of OH is pretty much always just treating the symptom instead of treating the illness.

Rangarid · December 11, 2018, 3:49pm

There have been reports that the LSP (Language Server Protocol) server in openHAB consumes a lot of CPU:

Maybe it’s related somehow.