The setup worked reliably for over a year without interruption. For the past few weeks, however, the persistence service stops working every 1-4 days and values are no longer written to InfluxDB. openHAB itself seems to work normally.
Rebooting the Raspi solves the issue, but I don’t want to set up a cron job that reboots it every night; that is not a clean solution.
The problem is not in InfluxDB itself, since it still stores values coming from other sources.
Any idea how to debug and fix the issue? The Raspi itself seems to be OK (enough memory, enough disk space, …).
Enable logging on the InfluxDB side to check whether access from openHAB stops at some point or is denied with a specific error message.
Also enable debug logging for the InfluxDB persistence add-on in openHAB to see if that gives a clue about the cause.
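To see when the write errors start and how the retry delays develop, the retry warnings can be pulled out of openhab.log with a small script. A minimal sketch, assuming the log lines look like the excerpts posted in this thread; the regular expression and the function name `retry_events` are my own and not part of openHAB:

```python
import re
from datetime import datetime

# Assumed line format, taken from the excerpts in this thread:
# 2023-09-21 08:48:36.687 [WARN ] [rite.events.WriteRetriableErrorEvent] - ... Reason: 'timeout'. Retry in: 5s.
# Both straight and curly quotes around the reason are accepted.
RETRY_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*WriteRetriableErrorEvent\]"
    r".*Reason: [‘'](?P<reason>.*)[’']\. Retry in: (?P<delay>\d+)s\."
)


def retry_events(lines):
    """Yield (timestamp, reason, retry_delay_seconds) for each retriable write error."""
    for line in lines:
        match = RETRY_LINE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            yield ts, match.group("reason"), int(match.group("delay"))
```

Iterate it over an open openhab.log file object and compare the timestamp of the last event with the time at which values stop appearing in InfluxDB.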
I dug a bit into the log files and found only this hint in openhab.log:
2023-09-08 14:49:01.704 [WARN ] [rite.events.WriteRetriableErrorEvent] - The retriable error occurred during writing of data. Reason: ‘timeout’. Retry in: 5s.
2023-09-08 14:49:16.720 [WARN ] [rite.events.WriteRetriableErrorEvent] - The retriable error occurred during writing of data. Reason: ‘xxx-Influx-Server-xxx.net: Temporary failure in name resolution’. Retry in: 25s.
2023-09-08 14:49:51.736 [WARN ] [rite.events.WriteRetriableErrorEvent] - The retriable error occurred during writing of data. Reason: ‘xxx-Influx-Server-xxx.net: Temporary failure in name resolution’. Retry in: 125s.
I will keep an eye out for the next interruption. Is there a way to increase the number of retries and the timeout so that the persistence layer does not stop completely?
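The delays in the log (5 s, 25 s, 125 s) grow by a factor of 5 per attempt, which looks like exponential backoff. The sketch below reproduces that schedule and shows how a cap on the total retry time makes a writer give up after three attempts. The parameter names and the 180 s budget are assumptions chosen to match the log, not confirmed settings of openHAB or its InfluxDB client:

```python
def retry_schedule(first_delay_s, base, max_delay_s, max_total_s, max_retries):
    """Return the delays (seconds) an exponential-backoff writer waits before giving up.

    All parameters are illustrative assumptions, not documented openHAB options.
    """
    delays = []
    total = 0
    for attempt in range(max_retries):
        # Each delay is base times the previous one, capped at max_delay_s.
        delay = min(first_delay_s * base ** attempt, max_delay_s)
        if total + delay > max_total_s:
            break  # the next wait would exceed the overall retry budget, so give up
        delays.append(delay)
        total += delay
    return delays
```

With `retry_schedule(5, 5, 125, 180, 5)` the result is `[5, 25, 125]`, the same three warnings seen above. Under this model, raising the budget only lengthens the retry window; once it is exhausted the batch is dropped either way, and later writes should work again if the client stays healthy. Writes that never resume point to a different problem.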
Based on the “Temporary failure in name resolution” error, this looks like a network problem.
Where is this InfluxDB server located? In your local network, or reached remotely over the internet?
If it is only a name resolution problem and not a general network issue, you can try adding the server name and IP address to the /etc/hosts file on the machine running openHAB.
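Before and after adding a static entry, it helps to check what the system resolver returns for the server name. A minimal sketch using Python's `socket.getaddrinfo`; the helper names, the host name `influx.example.net` and the address `203.0.113.10` are hypothetical placeholders (the real server name is redacted in the log):

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses for hostname, or None if resolution fails.

    getaddrinfo uses the system resolver, which on Linux normally also consults
    /etc/hosts (in the order configured in /etc/nsswitch.conf).
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return None  # e.g. "Temporary failure in name resolution"
    return sorted({info[4][0] for info in infos})


def hosts_line(ip, hostname):
    """Format a static /etc/hosts entry: IP address, whitespace, host name."""
    return f"{ip}\t{hostname}"
```

For example, `hosts_line("203.0.113.10", "influx.example.net")` gives the line to append to /etc/hosts. Keep in mind that a static entry goes stale if the remote server's IP address changes, for example behind dynamic DNS.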
The InfluxDB server is reached over a remote internet connection. Normally, writing to this server works very well. The issue is that the persistence service stops completely. It seems it is not very robust when the InfluxDB server is unavailable, or there is a timeout after which the persistence service stops writing to this server.
Either way, I’d like to be able to control this timeout, and persistence should resume once the server is available again.
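The behaviour asked for here can be illustrated with a small buffering writer: failed writes are kept and retried on the next flush, and the writer never enters a permanently stopped state. This is a hypothetical sketch, not how the openHAB InfluxDB add-on is implemented; `send` stands in for a real database client and is assumed to raise an exception on failure:

```python
from collections import deque


class BufferingWriter:
    """Hypothetical writer that keeps unsent points and retries them on every flush."""

    def __init__(self, send, max_buffer=10_000):
        self._send = send  # callable taking a list of points; raises on failure
        self._buffer = deque(maxlen=max_buffer)  # oldest points are dropped when full

    def write(self, point):
        self._buffer.append(point)

    @property
    def pending(self):
        """Number of points still waiting to be sent."""
        return len(self._buffer)

    def flush(self):
        """Try to send everything buffered; return True on success.

        On failure the points stay buffered and the writer stays usable,
        so the next flush() simply tries again once the server is back.
        """
        if not self._buffer:
            return True
        batch = list(self._buffer)
        try:
            self._send(batch)
        except Exception:
            return False
        # Remove only the points that were actually sent.
        for _ in batch:
            self._buffer.popleft()
        return True
```

The bounded buffer is a trade-off: during a long outage the oldest values are lost, but memory use on the Raspi stays limited.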
I can confirm I saw something similar after upgrading to openHAB 4. On my system, InfluxDB even runs on the same machine, but due to CPU or memory overload openHAB could not reach the service for a while. Here too, only restarting openHAB fixed it; restarting the InfluxDB service did not. I remember some timeout messages in openhab.log saying it could not reach the InfluxDB service.
I just experienced the same after updating the InfluxDB container on my NAS.
2023-09-21 08:48:36.687 [WARN ] [rite.events.WriteRetriableErrorEvent] - The retriable error occurred during writing of data. Reason: 'Failed to connect to localhost/127.0.0.1:8086'. Retry in: 5s.
2023-09-21 08:48:41.692 [WARN ] [rite.events.WriteRetriableErrorEvent] - The retriable error occurred during writing of data. Reason: 'Failed to connect to localhost/127.0.0.1:8086'. Retry in: 25s.
2023-09-21 08:49:06.698 [WARN ] [rite.events.WriteRetriableErrorEvent] - The retriable error occurred during writing of data. Reason: 'Failed to connect to localhost/127.0.0.1:8086'. Retry in: 125s.
Nothing more about InfluxDB in the logs after that, and no data was written until I restarted openHAB 4.
I use InfluxDB as my only persistence service.
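In both reports openHAB stays silent after the 125 s retry even though InfluxDB came back. A quick TCP check against the port from the log helps separate “InfluxDB is still down” from “openHAB stopped writing”. A minimal sketch; the function name is my own, and a successful connect only shows that something is listening on the port, not that InfluxDB accepts writes:

```python
import socket


def tcp_port_open(host="localhost", port=8086, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable host, timeout, DNS failure
        return False
```

If `tcp_port_open()` returns True while no new values arrive in InfluxDB, the problem is on the openHAB side rather than the database.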