OpenHAB 3.2.0: InfluxDB Error message "Batch could not be sent. Data will be lost" repeated every few minutes

Dear Community,

My setup is based on openHAB 3.2.0 and InfluxDB 1.0. Until now I had no particular issues, but I now see error messages logged by org.influxdb.impl.BatchProcessor at regular intervals, every few minutes:

2022-03-30 09:50:08.950 [ERROR] [org.influxdb.impl.BatchProcessor    ] - Batch could not be sent. Data will be lost
org.influxdb.InfluxDBIOException: java.net.SocketTimeoutException: connect timed out
	at org.influxdb.impl.InfluxDBImpl.execute(InfluxDBImpl.java:831) ~[bundleFile:?]
	at org.influxdb.impl.InfluxDBImpl.write(InfluxDBImpl.java:460) ~[bundleFile:?]
	at org.influxdb.impl.OneShotBatchWriter.write(OneShotBatchWriter.java:22) ~[bundleFile:?]
	at org.influxdb.impl.BatchProcessor.write(BatchProcessor.java:340) [bundleFile:?]
	at org.influxdb.impl.BatchProcessor$1.run(BatchProcessor.java:287) [bundleFile:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412) ~[?:?]
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255) ~[?:?]
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237) ~[?:?]
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
	at java.net.Socket.connect(Socket.java:615) ~[?:?]
	at okhttp3.internal.platform.Platform.connectSocket(Platform.java:130) ~[bundleFile:?]
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263) ~[bundleFile:?]
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183) ~[bundleFile:?]
	at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) ~[bundleFile:?]
	at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[bundleFile:?]
	at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[bundleFile:?]
	at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[bundleFile:?]
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[bundleFile:?]
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[bundleFile:?]
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[bundleFile:?]
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[bundleFile:?]
	at org.influxdb.impl.BasicAuthInterceptor.intercept(BasicAuthInterceptor.java:22) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[bundleFile:?]
	at org.influxdb.impl.GzipRequestInterceptor.intercept(GzipRequestInterceptor.java:42) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[bundleFile:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[bundleFile:?]

What I did recently was add a few bindings without configuring them yet. In the meantime I have removed the unconfigured bindings again; only pulseaudio and marytts remain as new bindings, while the rest of the configuration is unchanged.

So I wonder what causes this sudden appearance of the error message. I checked the resources on the openHAB server: while CPU usage was quite low, there was a shortage of memory. However, even after increasing the memory to 8 GB, with current usage below 4 GB, the error messages remain and appear every few minutes.
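For completeness, this is roughly how I checked memory on the host (standard Linux tools, nothing openHAB-specific):

# Overall memory and swap usage
free -h

# Memory footprint of the openHAB Java process
ps -o pid,rss,cmd -C java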

Adding memory did seem to be an improvement, as the following warning has disappeared since then:

2022-03-30 08:49:40.394 [WARN ] [ence.internal.PersistenceManagerImpl] - Querying persistence service 'influxdb' takes more than 5000ms.

Unfortunately, the error as initially reported is still there.

Best regards,
Peter

You have the InfluxDB add-on installed.
Either InfluxDB is not running or the configuration of the add-on is incorrect. In either case openHAB cannot connect to InfluxDB. It tries, fails, waits a bit and tries again.
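If you want to rule out the connection itself first, a quick check from the openHAB host could look like this (service name, host and port are just the common defaults; adjust them to whatever your add-on configuration in services/influxdb.cfg points at):

# Is the InfluxDB service running at all?
systemctl status influxdb

# Can the openHAB host reach the InfluxDB HTTP API?
# InfluxDB 1.x answers /ping with HTTP 204 when it is healthy.
curl -i http://localhost:8086/ping

If both look fine, raising the log level of the persistence add-on in the openHAB console (e.g. log:set DEBUG org.openhab.persistence.influxdb, assuming that is the add-on's logger name) should show more detail about the failing writes.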

Dear Rich,

Thank you very much for your hint. When I look at the InfluxDB data via Grafana, I still see incoming data and have identified no gaps so far. Could this issue also cause a partial loss of measurements?

I wonder what the next step would be to debug this scenario…

Best regards,
Peter

If you are not being selective about what is persisted and when, you might just be overloading it.
Also, if you have not tailored the default rrd4j service, it may be competing for resources.
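As a rough sanity check you can count how many Items a default "persist everything" setup would be writing; port 8080, unauthenticated read access and the jq tool are assumptions here, so adjust as needed:

# Number of Items openHAB exposes via its REST API; with a default strategy this is
# roughly the number of series being written to InfluxDB on every change.
curl -s http://localhost:8080/rest/items | jq length

Narrowing the Items section of persistence/influxdb.persist to the items you actually chart keeps that write load down.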

I have only the InfluxDB persistence service; the rrd4j service is not installed. Is there any particular drawback to having only InfluxDB and not rrd4j?

Here is an excerpt from my system load check:

openhab@openhab:~$ iostat -x 1
Linux 4.19.0-20-amd64 (openhab) 	30.03.2022 	_x86_64_	(4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3,28    0,00    0,79    0,03    0,00   95,90

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              3,27   14,27     66,21    160,68     0,02    22,75   0,56  61,45    0,46    0,26   0,00    20,22    11,26   0,12   0,21

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,76    0,00    0,51    0,00    0,00   98,73

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1,75    0,00    1,75    0,00    0,00   96,50

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    2,97      0,00     87,13     0,00    18,81   0,00  86,36    0,00    0,33   0,00     0,00    29,33   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1,26    0,00    0,25    0,00    0,00   98,49

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    3,00      0,00     36,00     0,00     6,00   0,00  66,67    0,00    0,33   0,00     0,00    12,00   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1,01    0,00    0,25    0,00    0,00   98,74

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00   33,00      0,00    180,00     0,00     2,00   0,00   5,71    0,00    0,36   0,00     0,00     5,45   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,76    0,00    0,25    0,00    0,00   98,99

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,75    0,00    0,75    0,00    0,00   98,50

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,76    0,00    0,51    0,00    0,00   98,73

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00

I am not familiar with %wrqm, which goes up quite high. Everything else seems to be fine, so I wonder whether the current issue is really related to a lack of system performance.

Best regards,
Peter

Not at all.
rrd4j is installed by default, so you must have actively removed that.
By default, influxdb will be persisting everything it can, so the impact depends on how many Items that might be and where it is writing to - SD card speed may be a factor if you are not using the ZRAM filestore.

Beware also of other services (e.g. Grafana) hogging InfluxDB's attention.
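If you suspect Grafana, you can watch what is querying the database from the InfluxDB side; this assumes the InfluxDB 1.x influx CLI is available on the database host:

# Show queries currently executing in InfluxDB 1.x; a heavy Grafana dashboard
# would show up here while it is loading.
influx -execute 'SHOW QUERIES'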