MQTT connection 'stalled' after a reconnect

Running OH 2.4.0 Release with MQTT 2.4 version. Works fine for several days to get sensor readings via MQTT messages every 10 seconds and persist them in InfluxDB. Then, yesterday evening I refreshed mqtt.things and some other OH config files to cleanup on text formatting and the MQTT broker (Mosquitto) connection reestablished. After that no new entries where stored in InfluxDB, only the same value over and over again because of a persistence rule to write on every change and every minute, until I restarted OH this morning.

The openhab.log file at the time of the MQTT broker connection restart:

2019-01-10 21:32:42.326 [INFO ] [el.core.internal.ModelRepositoryImpl] - Refreshing model 'mqtt.things'
2019-01-10 21:32:42.409 [INFO ] [.transport.mqtt.MqttBrokerConnection] - Starting MQTT broker connection to '192.168.1.2' with clientid paho639293970054699 and file store '/opt/openhab2/userdata/mqtt/192.168.1.2'

The events.log file:

2019-01-10 21:32:42.400 [hingStatusInfoChangedEvent] - 'mqtt:broker:mosquitto' changed from ONLINE to OFFLINE
2019-01-10 21:32:42.402 [hingStatusInfoChangedEvent] - 'mqtt:topic:mosquitto:dsmr' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2019-01-10 21:32:42.406 [hingStatusInfoChangedEvent] - 'mqtt:broker:mosquitto' changed from ONLINE to OFFLINE
2019-01-10 21:32:42.410 [me.event.ThingUpdatedEvent] - Thing 'mqtt:broker:mosquitto' has been updated.
2019-01-10 21:32:42.423 [hingStatusInfoChangedEvent] - 'mqtt:broker:mosquitto' changed from OFFLINE to ONLINE
2019-01-10 21:32:42.426 [hingStatusInfoChangedEvent] - 'mqtt:topic:mosquitto:dsmr' changed from OFFLINE (BRIDGE_OFFLINE) to ONLINE
2019-01-10 21:32:42.427 [me.event.ThingUpdatedEvent] - Thing 'mqtt:broker:mosquitto' has been updated.

And the mosquitto.log file:

1547152362: Received UNSUBSCRIBE from paho536918870074112
1547152362:     homeassistant/#
1547152362: paho536918870074112 homeassistant/#
1547152362: Sending UNSUBACK to paho536918870074112
1547152362: Received UNSUBSCRIBE from paho536918870074112
1547152362:     homie/+/$homie
1547152362: paho536918870074112 homie/+/$homie
1547152362: Sending UNSUBACK to paho536918870074112
1547152362: Received UNSUBSCRIBE from paho536918870074112
1547152362:     sensor/dsmr
1547152362: paho536918870074112 sensor/dsmr
1547152362: Sending UNSUBACK to paho536918870074112
1547152362: Received DISCONNECT from paho536918870074112
1547152362: Client paho536918870074112 disconnected.
1547152362: New connection from 192.168.1.2 on port 1883.
1547152362: New client connected from 192.168.1.2 as paho639293970054699 (c1, k60, u'USER').
1547152362: No will message specified.
1547152362: Sending CONNACK to paho639293970054699 (0, 0)

After that moment all entries in InfluxDB remain the same (0.737 kW) as seen with this query:

> SELECT * FROM Pwr_Use WHERE time > now() - 851m AND time < now() - 825m
name: Pwr_Use
time                     value
----                     -----
2019-01-10T20:31:08.71Z  0.743
2019-01-10T20:31:18.731Z 0.741
2019-01-10T20:31:28.754Z 0.748
2019-01-10T20:31:38.773Z 0.732
2019-01-10T20:31:48.792Z 0.741
2019-01-10T20:31:58.809Z 0.744
2019-01-10T20:32:00.004Z 0.744
2019-01-10T20:32:08.831Z 0.74
2019-01-10T20:32:18.857Z 0.756
2019-01-10T20:32:28.932Z 0.745
2019-01-10T20:32:38.892Z 0.737
2019-01-10T20:33:00.004Z 0.737
2019-01-10T20:34:00.004Z 0.737
2019-01-10T20:35:00.003Z 0.737
2019-01-10T20:36:00.004Z 0.737
2019-01-10T20:37:00.004Z 0.737
2019-01-10T20:38:00.004Z 0.737
2019-01-10T20:39:00.004Z 0.737
2019-01-10T20:40:00.004Z 0.737
2019-01-10T20:41:00.004Z 0.737
  ...

This goes on until I detected this issue and restarted OH as a first attempt at mitigating this. Then the entries are looking good again:

> SELECT * FROM Pwr_Use WHERE time > now() - 10m
name: Pwr_Use
time                     value
----                     -----
2019-01-11T10:49:52.692Z 0.29
2019-01-11T10:50:00.005Z 0.29
2019-01-11T10:50:02.717Z 0.254
2019-01-11T10:50:12.73Z  0.224
2019-01-11T10:50:22.753Z 0.193
2019-01-11T10:50:32.771Z 0.21
2019-01-11T10:50:42.791Z 0.227
2019-01-11T10:50:52.813Z 0.251
2019-01-11T10:51:00.005Z 0.251
2019-01-11T10:51:02.835Z 0.244
2019-01-11T10:51:12.859Z 0.226
2019-01-11T10:51:22.873Z 0.224
2019-01-11T10:51:32.897Z 0.214
2019-01-11T10:51:42.913Z 0.206
2019-01-11T10:52:00.004Z 0.206
2019-01-11T10:52:02.955Z 0.135
2019-01-11T10:52:12.98Z  0.075
2019-01-11T10:52:22.996Z 0.034
2019-01-11T10:52:33.018Z 0
2019-01-11T10:53:00.004Z 0
2019-01-11T10:53:43.158Z 2.162
2019-01-11T10:53:53.173Z 0
  ...

I have no actual proof that the data coming in from the DSMR sensor provides correct values after the reconnect, but since MQTT keeps receiving data and all things start working again after an OH restart, that is at least somewhat unlikely. :grinning: The mosquitto.log file after the reconnect shows:

1547152368: Received PUBLISH from dsmrv4 (d0, q0, r1, m0, 'sensor/dsmr', ... (282 bytes))
1547152373: Received PINGREQ from dsmrv4
1547152373: Sending PINGRESP to dsmrv4
1547152378: Received PUBLISH from dsmrv4 (d0, q0, r1, m0, 'sensor/dsmr', ... (282 bytes))
1547152388: Received PINGREQ from dsmrv4
1547152388: Sending PINGRESP to dsmrv4
1547152388: Received PUBLISH from dsmrv4 (d0, q0, r1, m0, 'sensor/dsmr', ... (282 bytes))
1547152398: Received PUBLISH from dsmrv4 (d0, q0, r1, m0, 'sensor/dsmr', ... (282 bytes))
1547152403: Received PINGREQ from dsmrv4
1547152403: Sending PINGRESP to dsmrv4
1547152408: Received PUBLISH from dsmrv4 (d0, q0, r1, m0, 'sensor/dsmr', ... (282 bytes))
1547152418: Received PINGREQ from dsmrv4
1547152418: Sending PINGRESP to dsmrv4
1547152419: Received PUBLISH from dsmrv4 (d0, q0, r1, m0, 'sensor/dsmr', ... (282 bytes))
1547152422: Received PINGREQ from paho639293970054699
1547152422: Sending PINGRESP to paho639293970054699
1547152429: Received PUBLISH from dsmrv4 (d0, q0, r1, m0, 'sensor/dsmr', ... (282 bytes))
1547152433: Received PINGREQ from dsmrv4
1547152433: Sending PINGRESP to dsmrv4
1547152436: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1547152439: Received PUBLISH from dsmrv4 (d0, q0, r1, m0, 'sensor/dsmr', ... (282 bytes))
1547152448: Received PINGREQ from dsmrv4
1547152448: Sending PINGRESP to dsmrv4
  ...

Of course, I will add logging of the actual received MQTT message to hopefully (or not) catch a next time, to ensure Mosquitto is receiving the correct data, but in the mean time, I wonder if anyone else has experienced a kind of ‘stalled’ MQTT connection with the new binding?

Yes, this is a bug caused by:

The bug is fixed in the snapshots

1 Like

Thanks. I will update to current 2.5.0-SNAPSHOT release.