Since this morning I have been observing a very strange issue. There were no major changes or upgrades beforehand.
My openHAB installation (details below) operates normally, but all of a sudden it stops working. The CPU actually freezes: according to my monitored metrics it stays at the level it was at (I have seen it at almost 0% utilization, but also at around 20%).
After around 6-8 minutes everything works normally again, AND every event that wasn’t processed during this period is processed afterwards.
Symptoms:
It seems that my event bus stops working; the event.log doesn’t show any record during this time. The openhab.log shows me that cron-triggered rules are still running, but no state changes are written to my items.
I do not have any high CPU or I/O load, and memory and disk space are fine too.
So far I cannot really narrow it down to any specific binding, thing, or whatever… Even with logging set to debug I cannot find anything very suspicious.
Platform information:
Hardware: x86, Celeron 8GB Memory; SSD
OS: ubuntu 18.04 LTS
Java Runtime Environment:
openjdk version “1.8.0_222”
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
openHAB version: 2.4 stable
openHAB: I have around 400 rules, 2000 items, and around 100 things,
plus around 20 dashboards in HABPanel, with 4 devices constantly displaying them.
But all of the above had been running perfectly fine for months…
Has anyone any idea what it could be or at least where I can start troubleshooting?
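One possible starting point is to measure the stalls exactly instead of eyeballing them. The following is only a minimal Python sketch, assuming each log line starts with a timestamp in the format seen in the logs here (for example "2019-12-07 19:02:02.232"). The function name find_gaps, the 5-minute threshold, and the sample lines are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each log line, e.g. "2019-12-07 19:02:02.232"
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (start, end, duration) for every pause between consecutive
    timestamped lines that is longer than the threshold."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = datetime.strptime(line[:23], TS_FORMAT)
        except ValueError:
            continue  # lines without a leading timestamp (e.g. stack traces) are skipped
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# Made-up sample lines, only to show the call; real input would be the lines of the log file.
sample = [
    "2019-12-07 18:52:58.101 first sample line",
    "2019-12-07 18:53:01.450 second sample line",
    "2019-12-07 19:02:02.300 third sample line",
]
for start, end, duration in find_gaps(sample):
    print(f"no log entries between {start} and {end} ({duration})")
```

To run it against a real file, pass an open file object instead of the sample list, since find_gaps only needs an iterable of lines. Comparing the reported gaps with the timestamps in openhab.log might show what was logged right before and right after each stall.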
I had some issues when several clients (mostly PCs) were open on the sitemap.
OH started to act weird; I am not sure if that was the cause of the issue I had,
but when I closed some of them it was fixed. Again, just a shot in the dark, but worth a try.
Hi,
thanks for your reply.
I already cleaned the cache after it happened the first time.
I stopped openHAB, cleaned the cache, and did a reboot. After the reboot everything worked fine and the startup logs were normal, but after around 1 hour it started again.
The issue just happened again: the CPU was frozen at around 20%, and after around 9 minutes it worked again (the gap is clearly visible in the event log).
A few milliseconds before everything worked normally again (and the event log showed event processing again), I found this in my debug logs:
2019-12-07 19:02:02.232 [DEBUG] [ommons.httpclient.HttpMethodDirector] - Closing the connection.
2019-12-07 19:02:02.232 [DEBUG] [ommons.httpclient.HttpMethodDirector] - Closing the connection.
2019-12-07 19:02:02.233 [DEBUG] [ommons.httpclient.HttpMethodDirector] - Method retry handler returned false. Automatic recovery will not be attempted
2019-12-07 19:02:02.233 [DEBUG] [ommons.httpclient.HttpMethodDirector] - Method retry handler returned false. Automatic recovery will not be attempted
2019-12-07 19:02:02.233 [DEBUG] [he.commons.httpclient.HttpConnection] - Releasing connection back to connection manager.
2019-12-07 19:02:02.233 [DEBUG] [he.commons.httpclient.HttpConnection] - Releasing connection back to connection manager.
2019-12-07 19:02:02.234 [ERROR] [org.openhab.io.net.http.HttpUtil ] - Fatal transport error: java.net.ConnectException: Connection timed out (Connection timed out)
2019-12-07 19:02:02.234 [ERROR] [org.openhab.io.net.http.HttpUtil ] - Fatal transport error: java.net.ConnectException: Connection timed out (Connection timed out)
That’s interesting, because since this morning I regularly see this line in my openhab.log. I checked my logs from the last 6 months, and this line isn’t in them.
What’s also interesting: when I cleaned my cache I lost my Spotify binding. I was easily able to reinstall it, but it’s strange, since I do not have many HTTP-connected things, and Spotify is one of them.
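To pin down when the error line first appeared and how often it shows up, the occurrences can be counted per day. This is just a rough Python sketch, assuming every log line begins with a date in the form "2019-12-07"; the function name count_by_day is made up, and the sample input simply reuses the error line quoted above.

```python
from collections import Counter


def count_by_day(lines, needle="Fatal transport error"):
    """Count the lines containing `needle`, grouped by the date in the first 10 characters."""
    counts = Counter()
    for line in lines:
        if needle in line:
            counts[line[:10]] += 1
    return counts


# Sample input: the error line from the log excerpt above plus one unrelated line.
sample = [
    "2019-12-07 19:02:02.233 [DEBUG] [he.commons.httpclient.HttpConnection] - Releasing connection back to connection manager.",
    "2019-12-07 19:02:02.234 [ERROR] [org.openhab.io.net.http.HttpUtil ] - Fatal transport error: java.net.ConnectException: Connection timed out (Connection timed out)",
]
for day, count in sorted(count_by_day(sample).items()):
    print(day, count)
```

Fed with the lines of openhab.log and its older rotated files, the per-day counts could be lined up with the times of the stalls to see whether the two really go together.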
Hi,
a brief update for everyone who is interested.
The issue hasn’t happened again since yesterday evening.
I narrowed down the above-mentioned HTTP client messages to one of my tablets (running Fully Kiosk Browser), whose HTTP interface is connected to and “monitored” by openHAB.
I cannot rule out that this actually caused the issue, since I have observed really strange behavior when one of these HTTP targets is not reachable, but I honestly do not believe it did.
I will now observe what happens today, when there is more load on the system than during the night.
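To see from the outside whether such an HTTP target is reachable, and how long a failed connection attempt takes, a small probe with an explicit timeout can help. This is only a sketch in Python, not what openHAB does internally; the function name probe, the 3-second timeout, and the address and port in the comment are placeholders that need to be replaced with the real values of the tablet.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Try to open a TCP connection and return (reachable, seconds_taken)."""
    start = time.monotonic()
    try:
        # create_connection gives up after `timeout` seconds instead of
        # waiting for the much longer operating system default.
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        reachable = False
    return reachable, time.monotonic() - start


# Example call (hypothetical address and port, adjust to the tablet):
# reachable, seconds = probe("192.168.1.50", 2323)
# print(reachable, round(seconds, 2))
```

Running this in a loop while the tablet is asleep or off Wi-Fi would show whether failed connection attempts line up with the "Connection timed out" entries in openhab.log.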
I see weird OH things happening from time to time, and cleaning the cache followed by a reboot always seems to work, but I never find out why it started acting up. I just chalk it up to something I have to do once in a while and move on.