openHAB on Raspberry Pi freezes for a couple of minutes

Hi!

I’ve been facing an issue for a long time and didn’t really know where to ask about it or where to start, so I’m asking here; maybe someone has faced a similar issue before.
Sometimes my openHAB server (openHABian, RPi 3B+) freezes. Sometimes it freezes totally, sometimes I can still use SSH.
When it freezes, the CPU usage goes through the roof, but the temperature remains the same. Usually, when I run heavy tasks (installing new packages, etc.) and the CPU usage is quite high, the temperature also rises to almost 70 degrees (at idle it stays around 55-60). It happens randomly: sometimes, if I restart it and leave it without touching anything, it can work for days without this, but sometimes it happens almost immediately after a restart.
I don’t have to restart anything, I just wait a few minutes and everything normalizes. It’s a brand new SD card and I also had this problem on the previous SD card on a different RPi.

It’s odd, because I thought that maybe it can’t handle the I/O and interrupts caused by the many events happening in openHAB, but sometimes this error comes up in the middle of the night, when there is no such activity at home…
Before openHABian I used OSMC on these Raspberries and they didn’t have this issue, so I think it is more or less related to software.

Here are some logs which verify this:

What process is using the CPU the most when it freezes?

I think openHAB, but as far as I remember it is always random. Another thing I don’t understand is that atop/htop don’t show this high CPU usage. But this tool (rpimonitor) and also Systeminfo in openHAB show these high numbers, and something really is happening, because it becomes really, really slow… That’s why it is hard to tell which process uses the CPU most at these times, but I’ll try to see.

If the process causing the CPU spike is random then this isn’t an OH problem. Something more fundamental is going on. I can’t speak to the different tools showing different information but given that the temp is not going up I’m inclined to believe htop over rpimonitor and systeminfo (which is what I believe rpimonitor uses).
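One way to cross-check the tools against each other is to read the kernel's own counters. Below is a minimal sketch that samples the aggregate `cpu` line of `/proc/stat` twice and reports how the elapsed time was split between states. It is only an assumption that the disagreement comes from time spent waiting on I/O being counted differently by different tools; the function names here are made up for illustration.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_breakdown(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {key: after[key] - before[key] for key in before}
    total = sum(deltas.values())
    if total == 0:
        return {key: 0.0 for key in deltas}
    return {key: 100.0 * delta / total for key, delta in deltas.items()}


def sample_breakdown(seconds=5, path="/proc/stat"):
    """Take two samples `seconds` apart and return the percentage breakdown."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return cpu_breakdown(before, after)
```

Running `print(sample_breakdown())` during a freeze shows the split. A large `iowait` share with a modest `user` and `system` share would point towards the SD card or swap rather than a busy process, which would also fit the temperature not rising.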

Does your RPi have a good power supply? Weird behaviors like this are often caused by poor power quality.

This is what I also thought of. I’m using the official power supply (2.5A), and I have also tried another official one. Same behaviour. I also don’t suspect a hardware problem, because the other RPi where I ran OH before for testing now runs Raspbian with services running on it constantly without a problem…
I also think that htop is more reliable, but if I rely only on the numbers htop shows, it would seem that nothing is wrong: the CPU usage and everything else is the same as before. However, even htop needs minutes to load, if I can bring it up at all…

Well, CPU is only one thing that can cause symptoms like this. Are you connecting over the network or physically with keyboard, monitor, and mouse?

Have you looked in syslog to see if something weird is going on?

Yes, I think that, as stated in the first post, I/O handling and interrupt handling cause this when lots of things are happening concurrently (both inside openHAB and “outside” of it), but I can’t be sure about this.
No, there is no device connected to it; I can only access it through SSH.

It would be useful to see if the device has the same slowdown problems when connected to physically rather than through the network, given that the problem could be network related as well.

And how should I do that?
I mean, I can easily remove the network from it, but then I couldn’t be sure that the network causes the issue, because most of the things that happen on this Raspberry are driven by the network (polling device states, sending commands, running Python scripts which poll other devices’ status, etc.), so I would also expect the average usage to drop because there would be no traffic to manage.

Don’t remove the network, but physically connect a monitor and keyboard and use them when troubleshooting, rather than going through the network. At this point you can’t really tell whether there is a real slowdown on the machine or the network is throttled and everything just looks like it’s running slow.

Ah I get it!
However, it seems I have already done this. I used to run Kodi on this device as well. At first I thought that it couldn’t handle Kodi and openHAB together (and Kodi also seemed to cause this), so I have removed Kodi now. But back when Kodi was on it (with a TV connected through the HDMI port), Kodi also stopped when this slowdown happened.

Might be the Java garbage collector. You should restrict CPU and memory of all Java processes to 80% (rlimit), so that at least your SSH connection stays stable if Java starts doing strange things.

Thanks, I have never heard of this! I will try that (at least for testing).
Do you have an example of how I could do this without modifying the source code?
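For illustration only, here is a minimal sketch of applying an rlimit to a program from the outside, without touching its source: a small launcher sets the limit in the child process just before the command is executed. Note the assumptions: rlimits are absolute values rather than percentages, so the 80% is computed from `MemTotal`; `RLIMIT_AS` caps virtual address space, which a JVM reserves generously, so a Java process may refuse to start under a tight cap; and the helper names are hypothetical.

```python
import resource
import subprocess


def mem_total_bytes(meminfo_text):
    """Return MemTotal from /proc/meminfo text, converted from kB to bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def capped_limit(total_bytes, percent=80):
    """Integer share of total_bytes, e.g. 80% of physical memory."""
    return total_bytes * percent // 100


def run_with_memory_cap(command, limit_bytes):
    """Run `command` (a list of arguments) with its address space capped."""
    def apply_limit():
        # Runs in the child after fork and before exec, so only the
        # launched program (and its children) inherits the limit.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.call(command, preexec_fn=apply_limit)


def run_capped_at_80_percent(command, meminfo_path="/proc/meminfo"):
    """Convenience wrapper: cap `command` at 80% of physical memory."""
    with open(meminfo_path) as f:
        limit = capped_limit(mem_total_bytes(f.read()))
    return run_with_memory_cap(command, limit)
```

This only helps for programs you start yourself, for example `run_capped_at_80_percent(["python3", "my_connector.py"])` with a hypothetical script name. A service started by the init system would need the limit set in its service configuration instead.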

My guess would be it’s paging (swapping).
Check out my hints on optimizing swap here.
A patch to add zram compression is in the pipe for openHABian.
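To check whether paging is actually happening during a freeze, the swap counters in `/proc/vmstat` can be sampled. This is a minimal sketch, assuming the kernel exposes `pswpin` and `pswpout` (pages swapped in and out since boot); the function names are made up for illustration.

```python
import time


def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Pages swapped in and out per second between two samples."""
    return {
        key: (after[key] - before[key]) / seconds
        for key in ("pswpin", "pswpout")
    }


def sample_swap_rate(seconds=5, path="/proc/vmstat"):
    """Take two samples `seconds` apart and return the swap rates."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return swap_rate(before, after, seconds)
```

If `print(sample_swap_rate())` shows both rates staying high for the duration of a freeze and dropping back towards zero afterwards, that would support the paging guess. Rates near zero during a freeze would speak against it.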

How can zram compression help with this?
However, you might be right. Initially I had to increase the swap size in order for it to function properly. Before that, the problem was that there was not enough memory (not even in swap) and the Raspberry got stuck (like a deadlock…).

It’s compressing RAM, effectively doubling it at a (minor) cost of CPU.

openHAB alone, with the Java params from openHABian, fits well into 1GB without swap, so you must have done something else on your system.
I’m using EXTRA_JAVA_OPTS="-Xms250m -Xmx350m" in /etc/default/openhab2.

And if you’re running additional programs that consume a lot of memory on the box (Grafana?), it might be too much. Better move them to a different box.

Yes, for me it won’t fit… It uses almost all available RAM (swappiness 60) and also around 500MB of swap. I should move my system to a reliable server…

Maybe we can find the memory-abusing binding/addon instead. The problem with a garbage-collected language like Java is that code that rapidly creates objects (but also frees them, so not a memory leak per se) will exhaust vmem. Most of the time there are coding patterns to avoid this.

Yes, I know these, I’m also a junior Java developer :) But I forgot about them in this scenario. I wouldn’t say that there is an addon which uses this much RAM, or more than it should. The RAM usage is constant. The problem might be that I’m running lots of little services on a single RPi (openHAB, Grafana, Node-RED, and quite a few other “connectors”, mainly Python scripts…).
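To see which of these services actually holds the memory, the per-process figures in `/proc/<pid>/status` can be ranked. This is a minimal sketch with hypothetical function names; it sums resident memory (`VmRSS`) and swapped-out memory (`VmSwap`), which is one reasonable way to rank processes but not the only one.

```python
import os


def parse_status(text):
    """Extract the process name plus VmRSS and VmSwap (in kB) from status text."""
    info = {"Name": "?", "VmRSS": 0, "VmSwap": 0}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            info["Name"] = value.strip()
        elif key in ("VmRSS", "VmSwap"):
            info[key] = int(value.split()[0])
    return info


def top_memory_users(proc_root="/proc", count=10):
    """Return the `count` processes with the largest RSS + swap footprint."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                rows.append(parse_status(f.read()))
        except OSError:
            continue  # process exited while we were scanning
    rows.sort(key=lambda row: row["VmRSS"] + row["VmSwap"], reverse=True)
    return rows[:count]
```

Printing the result of `top_memory_users()` gives a quick ranking. Kernel threads report no `VmRSS` and therefore sort to the bottom with zero.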

Which is not recommended. Running some script services in addition to OH is OK, but Grafana at least (+ InfluxDB, I assume) is not “little” but a memory hog.
You’ve created that situation yourself by overburdening your HW. Move services off to another box or accept that you need to live with the freezes.