openHAB freezes for several minutes

martin111 · December 7, 2019, 6:10pm

Hello everyone,

Since this morning I have been observing a very strange issue, with no major changes or upgrades beforehand.
My openHAB installation (details below) operates normally, but all of a sudden it stops working. The CPU utilization actually freezes: according to my monitored metrics it stays at whatever level it was at (I have seen it stuck at almost 0%, but also at around 20%).
After around 6-8 minutes everything works normally again, AND every event that wasn't processed during this period gets processed afterwards.

Symptoms:
It seems that my event bus stops working; events.log doesn't show any records during this time. openhab.log shows that cron-triggered rules are still running, but no state changes are written to my items.

I don't have any high CPU or I/O load, and memory and disk space are fine too.

So far I cannot really narrow it down to any specific binding, thing, or whatever… When setting the logs to debug I also cannot find anything very suspicious.

Platform information:
- Hardware: x86, Celeron, 8GB memory, SSD
- OS: Ubuntu 18.04 LTS
- Java Runtime Environment:
  openjdk version "1.8.0_222"
  OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
  OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
- openHAB version: 2.4 stable

openHAB: I have around 400 rules, 2000 items, and around 100 things,
plus around 20 dashboards in HABPanel, with 4 devices constantly displaying them.
But all of the above had been running perfectly fine for months…

Does anyone have any idea what it could be, or at least where I can start troubleshooting?

thanks
martin

H102 · December 7, 2019, 6:15pm

First make sure you have a backup.

Then clean the cache and restart.

sudo systemctl stop openhab2

sudo openhab-cli clean cache

sudo systemctl start openhab2 or sudo reboot

Keep frontail open in a browser so you can watch the log as OH starts back up. Post any relevant logs if the issue continues.

Gad_Ofir · December 7, 2019, 6:39pm

I had some issues when several clients (mostly PCs) were open on the sitemap.
OH started to act weird; I'm not sure whether that was the cause of the issue I had,
but when I closed some of them it was fixed. Again, just a shot in the dark, but worth a try.

martin111 · December 7, 2019, 6:42pm

Hi,
thanks for your reply.
I already cleaned the cache after it happened the first time.
I stopped openHAB, cleaned the cache, and did a reboot. After the reboot everything worked fine and the startup logs were normal too, but after around 1 hour it started again.

The issue just happened again: the CPU was frozen at around 20%, and after around 9 minutes it worked again (the gap is clearly visible in the event log).
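
A gap like this can also be located programmatically rather than by eye. The following is a minimal Python sketch, assuming every log line starts with a timestamp in the `2019-12-07 19:02:02.232` format seen in the log excerpts in this thread. The function name `find_gaps` and the 5-minute threshold are illustrative choices, not something openHAB provides.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23  # length of a timestamp such as "2019-12-07 19:02:02.232"


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (previous, current) timestamp pairs that lie more than `threshold` apart."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            # Skip lines without a leading timestamp (e.g. stack traces).
            continue
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

It accepts any iterable of lines, so an open file handle for the event log can be passed in directly, and each returned pair marks the last event before a freeze and the first event after it.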

A few ms before everything worked normally again (and the event log showed event processing again) I found this in my debug logs:

2019-12-07 19:02:02.232 [DEBUG] [ommons.httpclient.HttpMethodDirector] - Closing the connection.
2019-12-07 19:02:02.232 [DEBUG] [ommons.httpclient.HttpMethodDirector] - Closing the connection.
2019-12-07 19:02:02.233 [DEBUG] [ommons.httpclient.HttpMethodDirector] - Method retry handler returned false. Automatic recovery will not be attempted
2019-12-07 19:02:02.233 [DEBUG] [ommons.httpclient.HttpMethodDirector] - Method retry handler returned false. Automatic recovery will not be attempted
2019-12-07 19:02:02.233 [DEBUG] [he.commons.httpclient.HttpConnection] - Releasing connection back to connection manager.
2019-12-07 19:02:02.233 [DEBUG] [he.commons.httpclient.HttpConnection] - Releasing connection back to connection manager.
2019-12-07 19:02:02.234 [ERROR] [org.openhab.io.net.http.HttpUtil ] - Fatal transport error: java.net.ConnectException: Connection timed out (Connection timed out)
2019-12-07 19:02:02.234 [ERROR] [org.openhab.io.net.http.HttpUtil ] - Fatal transport error: java.net.ConnectException: Connection timed out (Connection timed out)

That's interesting, because since this morning I regularly see the following line in my openhab.log. I checked my logs from the last 6 months, and this line isn't in them.

2019-12-07 07:51:12.785 [INFO ] [ommons.httpclient.HttpMethodDirector] - I/O exception (java.net.ConnectException) caught when processing request: Connection refused (Connection refused)
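
To check when a message like this first started appearing, the matching lines can be counted per day across the current and rotated log files. A minimal Python sketch, assuming each line begins with a `YYYY-MM-DD` date as in the excerpts here; `count_per_day` is a hypothetical helper name:

```python
from collections import Counter


def count_per_day(lines, needle):
    """Count lines containing `needle`, keyed by the leading YYYY-MM-DD date."""
    counts = Counter()
    for line in lines:
        if needle in line:
            # The first 10 characters of a log line hold the date.
            counts[line[:10]] += 1
    return counts
```

Passing an open log file and the string `"Connection refused"` gives one count per day, which makes it easy to see the first day the message showed up.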

What's also interesting: when I cleaned my cache I lost my Spotify binding. I was easily able to reinstall it, but it's strange, since I don't have that many HTTP-connected things and Spotify is one of them.

martin111 · December 8, 2019, 7:16am

Hi,
a brief update for everyone who is interested.
The issue hasn't happened again since yesterday evening.

I narrowed down the above-mentioned HTTP client messages to one of my tablets (running Fully Kiosk Browser), whose HTTP interface is connected to and "monitored" by openHAB.
I cannot rule out that this actually caused the issue, since I have observed really strange behavior when one of these HTTP targets is not reachable, but I honestly do not believe it did.
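
One way to see whether such an HTTP target is currently reachable, without waiting for the operating system's long connect timeout, is a TCP connect with an explicit short timeout. This is a minimal Python sketch and not how openHAB itself performs the check; the helper name `is_reachable` and the 2-second default timeout are assumptions.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts alike.
        return False
```

Substitute the tablet's address and the port of its HTTP interface. A `False` result at the time of a freeze would at least show that the two coincide.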

I will now observe what happens today, when there is more load on the system than during the night.

cheers

martin111 · December 8, 2019, 3:57pm

Another quick update: everything is still stable.
Still no idea about the root cause…

H102 · December 8, 2019, 4:02pm

I see weird OH things happening from time to time, and cleaning the cache followed by a reboot always seems to work, but I never find out why it started acting up. I just chalk it up to something I gotta do once in a while and move on.