For a few days now I have been getting the following error:
[DEBUG] [rnal.influx1.InfluxDB1RepositoryImpl] - Writing to database failed
org.influxdb.InfluxDBException$CacheMaxMemorySizeExceededException: engine: cache-max-memory-size exceeded: (1074461556/1073741824)
at org.influxdb.InfluxDBException.buildExceptionFromErrorMessage(InfluxDBException.java:153) ~[?:?]
....
[WARN ] [.influxdb.InfluxDBPersistenceService] - Re-queuing 2118 elements, failed to write batch.
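For context, the limit in the error message (1073741824 bytes = 1 GiB) is the default value of InfluxDB 1.x's cache-max-memory-size setting. A minimal sketch of how to check (and, if needed, raise) it, assuming the default config path of a Debian/openHABian install:

# Show whether the limit is set explicitly (the default is "1g" when it is commented out)
grep -n "cache-max-memory-size" /etc/influxdb/influxdb.conf
# Raising it would go into the [data] section of that file, e.g.
#   [data]
#     cache-max-memory-size = "2g"
# followed by: sudo systemctl restart influxdb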
The first time I encountered this, I stopped InfluxDB (stopped and started zram) and restarted openHAB, and the error was gone … for some time.
Now it has reappeared. I read an August post in a German forum that mentioned a type issue whose fix was not yet available in a snapshot at the time, but I cannot find anything about it now.
I then stopped influxdb and tried to restart it, and it does not even start anymore:
Sep 15 21:43:53 openhabian systemd[1]: Stopped InfluxDB is an open-source, distributed, time series database.
░░ Subject: A stop job for unit influxdb.service has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit influxdb.service has finished.
░░
░░ The job identifier is 25587 and the job result is done.
Sep 15 21:43:53 openhabian systemd[1]: influxdb.service: Start request repeated too quickly.
Sep 15 21:43:53 openhabian systemd[1]: influxdb.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit influxdb.service has entered the 'failed' state with result 'exit-code'.
Sep 15 21:43:53 openhabian systemd[1]: Failed to start InfluxDB is an open-source, distributed, time series database.
░░ Subject: A start job for unit influxdb.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit influxdb.service has finished with a failure.
░░
░░ The job identifier is 25587 and the job result is failed.
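The "Start request repeated too quickly" part just means systemd gave up after several rapid restart attempts. A sketch of what I plan to try next time instead of a full reboot (standard systemd commands, unit name taken from the journal above):

# Clear the failed state so systemd accepts a new start request
sudo systemctl reset-failed influxdb.service
# Show the last journal entries of the unit to see why it actually fails to start
journalctl -u influxdb.service -n 100 --no-pager
# Try again
sudo systemctl start influxdb.service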
I have stopped OH and rebooted, and it seems to have recovered, but I doubt it will last long.
Does anyone know what the root cause might be?
It might be that there is a limit on the size of a single write (i.e. some sort of "maximum batch size"). The question is why there are 2118 elements that need to be written to the database; that sounds like an awful lot. How many items do you persist?
No, that should work. But even with 600 items, 2100 datapoints to persist is roughly 3 points per item, which sounds like a lot, given that the add-on tries to commit at an interval of 3 s.
So the question is: why are there so many datapoints? Is there any error in the log before that? Or in the InfluxDB log at that time?
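As a rough sketch of how to pull both logs (assuming InfluxDB logs to the systemd journal on openHABian and the usual openHAB log location; adjust paths and time range as needed):

# InfluxDB side: journal entries around the time the batch write failed
journalctl -u influxdb --since "today" --no-pager | tail -n 200
# openHAB side: persistence-related messages shortly before the re-queuing
grep -i influx /var/log/openhab/openhab.log | tail -n 50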
org.influxdb.InfluxDBException: engine: error rolling WAL segment: error opening new segment file for wal (2): close /var/lib/influxdb/wal/openhab/autogen/2/_00025.wal: file already closed
which points to what was discussed here:
which, though, looks fine to me:
[19:32:22] root@openhabian:/var/lib/influxdb/wal/openhab/autogen/2# ls -l
total 10492
-rw-r--r-- 1 influxdb influxdb 10741791 Sep 16 19:22 _00025.wal
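Beyond the directory listing, a couple of quick checks still on my list (a sketch; paths taken from the listing above, nothing openHAB-specific):

# Free space on the filesystem holding the InfluxDB data and WAL directories
df -h /var/lib/influxdb
# Overall size of the WAL and data trees
du -sh /var/lib/influxdb/wal /var/lib/influxdb/data
# Ownership of the WAL tree (everything should belong to the influxdb user)
ls -ld /var/lib/influxdb /var/lib/influxdb/wal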