(OH 4.1.2) MariaDB persistance broke down out of a sudden

mars-online · October 23, 2024, 4:05pm

I am experiencing a strange issue for a few days now.

Error: 1129-HY000: Host 'openhabian' is blocked because of many connection errors;

JdbcSQLException: Error in SQL query!!!; yank-default - Connection is not available, request timed out after 30000ms.

My OpenHAB instance uses MariaDB for persistence through the jdbc Addon. A few days ago, the host system of the MariaDB was down for about a day and since it came back on I regularly get weird errors and OpenHAB even stops persisting data due to them.

All out of a sudden, OpenHAB cannot persist data any longer - jdbc says there have been connection errors and therfore MariaDB does not accept new connections from OpenHAB any longer. Some queries run unreasonably long while I do not see too much of a load on the database.

In a nutshell it looks like this

Logs

2024-10-23 09:32:42.407 [DEBUG] [qtt.generic.AbstractMQTTThingHandler] - Successfully published value OFF to topic esphome/display/camera
2024-10-23 09:32:42.417 [DEBUG] [qtt.generic.AbstractMQTTThingHandler] - Successfully published value ON to topic esphome/display/doorbell
2024-10-23 09:32:42.425 [DEBUG] [qtt.generic.AbstractMQTTThingHandler] - Successfully published value OFF to topic esphome/display/washingmachine
2024-10-23 09:32:42.432 [DEBUG] [qtt.generic.AbstractMQTTThingHandler] - Successfully published value OFF to topic esphome/display/dishwasher
2024-10-23 09:32:52.325 [WARN ] [e.internal.SseItemStatesEventBuilder] - Attempting to send a state update of an item which doesn't exist: undefined
2024-10-23 09:34:16.463 [WARN ] [e.internal.SseItemStatesEventBuilder] - Attempting to send a state update of an item which doesn't exist: undefined
2024-10-23 09:34:22.754 [WARN ] [iadb.jdbc.message.server.ErrorPacket] - Error: 1129-HY000: Host 'openhabian' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
2024-10-23 09:34:22.868 [WARN ] [e.internal.SseItemStatesEventBuilder] - Attempting to send a state update of an item which doesn't exist: undefined
2024-10-23 09:34:23.012 [WARN ] [iadb.jdbc.message.server.ErrorPacket] - Error: 1129-HY000: Host 'openhabian' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
2024-10-23 09:34:23.391 [WARN ] [iadb.jdbc.message.server.ErrorPacket] - Error: 1129-HY000: Host 'openhabian' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
2024-10-23 09:34:23.958 [WARN ] [iadb.jdbc.message.server.ErrorPacket] - Error: 1129-HY000: Host 'openhabian' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
2024-10-23 09:34:24.805 [WARN ] [iadb.jdbc.message.server.ErrorPacket] - Error: 1129-HY000: Host 'openhabian' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'

There is also hints about the query running to long such as

Logs

2024-10-23 10:51:00.255 [WARN ] [jdbc.internal.JdbcPersistenceService] - JDBC::store: Unable to store item
org.openhab.persistence.jdbc.internal.exceptions.JdbcSQLException: Error in SQL query!!!; yank-default - Connection is not available, request timed out after 30000ms.; Pool Name= yank-default; SQL= INSERT INTO item0023 (time, value) VALUES( NOW(3), ? ) ON DUPLICATE KEY UPDATE VALUE= ?
        at org.openhab.persistence.jdbc.internal.db.JdbcBaseDAO.doStoreItemValue(JdbcBaseDAO.java:441) ~[?:?]
        at org.openhab.persistence.jdbc.internal.JdbcMapper.storeItemValue(JdbcMapper.java:220) ~[?:?]
        at org.openhab.persistence.jdbc.internal.JdbcPersistenceService.internalStore(JdbcPersistenceService.java:174) ~[?:?]
        at org.openhab.persistence.jdbc.internal.JdbcPersistenceService.lambda$1(JdbcPersistenceService.java:146) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]

A temporary fix is to just do as I am told in the error message and tell the database to “flush hosts” which allows openhab to connect again as far as I understand.
This only helps for a while though.

I already tried

restart OpenHAB
reboot MariaDB host system
restarted MariaDB service
rebooted OpenHABian which I run on a Raspberry Pi
checked network connection between OpenHABian and MariaDB - don’t see relevant delays
The max_connections at the MariaDB is set to the default value of 151 and I do not think it has changed since the issue came up. Increased to 400 anyway
http://:8080/settings/addons/persistence-jdbc-mariadb increased the maximum pool size to 500

Does anyone have an idea how to solve this or maybe guide me through debugging the issue?

Wolfgang_S · October 23, 2024, 4:35pm

Are you sure that the example SQL you posted takes longer than normal ?
I think it is timing out because the request was blocked.

Have there been any changes right before this happened the first time ?
If - what kind of changes ?
Is this an instance that is open to the internet / accessible from the internet ?
How long does it take from “restarting” the connection until it is blocked again ?

mars-online · October 23, 2024, 5:41pm

Hi Wolfgang and thanks a lot for responding

Until a few days ago I never noticed the error at all and since it causes gaps in diagrams I am pretty sure the errors are new.

I had the DB taken offline for maintenance for some hours. Fixed the filesystem on the NAS and so on. No configuration changes. Started performing daily backups on the DB as well a few days before the incidents started and added a user account for that but this account is only used locally on the host system of the DB.

On OpenHAB maybe a few Items have been added during the last weeks - maybe between 10 and 20? Not really much.

The NAS and the MariaDB service is only available locally over wired ethernet. No changes there lately.

I cannot determine this exactly. Between moments and an hour maybe.

Good idea. So I should go after why the blocking happens. Any ideas about that?

Thanks again for answering!!

Wolfgang_S · October 25, 2024, 5:50pm

I would start with having a look into the MariaDB log file and check for any type of errors like authentication errors, timeout. It might be necessary to increase the log level verbosity.