MQTT minimum delay time between messages that openHAB can process them

Hi,
I have a thing consisting of an ESP32 programmed by me that sends messages to openHAB via MQTT based on the Homie standard.
I experienced moments where apparently the ESP32 toggles values on a homie channel quicker than openHAB detects them.

Example:

2021-01-20 20:28:13.055 homie/door-thing/bell/button --> true
2021-01-20 20:28:13.056 homie/door-thing/bell/button --> false
Switch doorbell "Door Bell" <door> {channel="mqtt:homie300:mosquitto:door-thing:bell#button" }
rule "DoorBell"
when
   Item doorbell changed to ON
then
   logInfo("doorbell", "Door bell button pressed.")

Sometimes it happens that the rule is not triggered, not even the change of the item is logged in the events.log. I assume that the rising edge from “false” to “true” and the falling edge from “true” to “false” comes too close after another such that openHAB or the mqtt binding does not detect the glitch and therefore does not forward the item change.

So my question is, what is the minimum gap delay time two MQTT messages must have to each other sucht that they can be safely recognized by the binding? Is there a poll sleep period of some monitoring thread that must be overcome to make the signal recognized?

Thanks,
stedon81

Most likely you are using QOS 0 for the messages which means “at most once”. When the messages happen too fast either the broker or the end client might drop messages. Try QOS 1 or QOS 2 for both the devices and the MQTT Broker Thing in openHAB.

Assuming this is the problem, which I’m not convinced is the case, there is no way to answer that. It’s going to depend on your machine, what else is going on on that machine, and the network conditions at the time that the messages are sent. Sometimes it might be 10 msec, other times it might 5 seconds.

I think this binding manages a queue of incoming notifications, so it should not matter how quickly they come.
However, it is not impossible that somewhere along the way there is only one “slot” for a payload for any given topic. It might just be possible to replace the payload before it’s been fully processed, with the result that no change appears on the openHAB events bus.
I think that’s unlikely, and you’d have to be blindingly fast.

I’d be more inclined to follow Rich’s idea of dropped messages. Playing against that - what is the little log that you have shown us, is that the broker’s view? Is that consistent and never loses messages?

A little tell-tale rule might offer a clue -even if a message is getting dropped or overwritten somewhere, we would expect one to get through and ultimately produce an update-to-same-state on your Item.
That will not be seen in logs, but you can capture with a received event rule trigger.

Or use DEBUG logging on the binding.

Hi Rich,

thanks for your quick answer.

On ESP32 side, I use the homieiot/homie-esp8266 library. From the source code, I can see that it uses hard-coded QoS level 1 for every message:

SendingPromise& HomieNode::setProperty(const String& property) const {
  Property* iProperty = this->getProperty(property);
  if (iProperty &&  iProperty->isRetained()) {
      return Interface::get().getSendingPromise().setNode(*this).setProperty(property).setQos(1).setRetained(true);
  } else {
      return Interface::get().getSendingPromise().setNode(*this).setProperty(property).setQos(1);
  }
}

My MQTT binding is configured with QoS level 1. From the source code, I can see that this is used for sending messages:

connection.publish(topic, value.getBytes(), connection.getQos(), retain != null && retain.booleanValue())
                .thenRun(() -> {
                    logger.debug("MQTT publish to {} performed", topic);
                }).exceptionally(e -> {
                    logger.warn("MQTT publish to {} failed!", topic);
                    return null;
                });

… and further it seems that the homie-mqtt uses QoS of 1 for subscriptions on every connection also hard-coded:

    protected CompletableFuture<@Nullable Void> start(MqttBrokerConnection connection) {
        logger.debug("About to start Homie device {}", device.attributes.name);
        // We have mostly retained messages for Homie. QoS 1 is required.
        connection.setRetain(true);
        connection.setQos(1);
        return device.subscribe(connection, scheduler, attributeReceiveTimeout).thenCompose((Void v) -> {
            return device.startChannels(connection, scheduler, attributeReceiveTimeout, this);
        }).thenRun(() -> {
            logger.debug("Homie device {} fully attached", device.attributes.name);
        });
    }

The MQTT seems to connect to the broker using listeners, which should be triggered by the broker upon a message arrival. So opposed to my initial thinking, there is no polling thread that samples the connection and could miss any state change…

What else could I try to look at to investigate on this problem?

Thanks,
stedon81

Hi rossko57,

thanks for your reply.
Actually, the little log isn’t really existing, I just made it up for illustration, as I can’t see a real-time log from my ESP32 device.

The issue happens only rarely, so it’s hard to debug. I probably have to switch on debug logging of moqsuitto broker and mqtt binding and quickly analyze as soon as the issue happens. You know, of course the issue always happens at moments when you don’t have time to chew logs over…

Regards,
stedon81

You need to configure the MQTT Thing on the openHAB side to also subscribe with QOS 1. The QOS is setting the quality of service between the client and the broker, not end to end from client to client.

Typically the libraries work by polling the broker for new messages when they are able to process them. If the MQTT config is using QOS 0 on the subscriptions then if a message arrives and then is replaced with another one between the polling periods it will miss the message. If it has QOS 1 it might receive the message more than once. If it has QOS 2 it will only receive the message once.

Think I’d start there, use broker log to find out message existence and timing.
I’m guessing as a doorbell that you can manually test.

ohhh . I thought the broker publishes to (all) subscribed clients i.e. pushes the message asynchronously.

One of the driving designs around MQTT is it’s the client that is calling the shots. Therefore it’s the client who has to tell the broker it’s ready to receive a message.

Now this usually happens in the background and all the MQTT libraries I’ve used have a way to tell it to “just set up your own polling loop” so from my program’s perspective it looks like the messages are pushed. But what’s really going on is there is a polling loop. It’s even more apparent that is going on when you code MQTT into a microcontroller like and ESP or Arduino.

Paho Python MQTT Client-Understanding The Loop has a good description of what is going on from the Paho Python library perspective. The “loop_start” function is the one that sets up the polling loop in the background.

Ah, I read that as the client software maintaining an inbound buffer/queue, waiting for user software (in this case the binding) to pick messages off.
i.e. the binding polls paho running internally on OH host.
I was focused on the message being pushed over TCP from the broker, which I still think happens.

Polled buffer looks like a potential place to drop or overwrite a message.

If we get down to this level though, it’s not likely to get fixed promptly.

How well debounced is your doorbell signal? What are the chances that you’re making a burst of messages, let alone two in quick succession? Broker log may show this up.

Hi Rich and rossko,

thanks for all your replies and discussion, obviously you live in a different time zone than me.

Not sure if this discussion goes into the wrong direction because I can’t follow you or you can’t follow me.

The problem is not on the thing’s subscription side, but on the openHAB server’s subscription.

Let me first sync my knowledge on MQTT with yours:

  • There are two clients to my mosquitto broker: The ESP32-based thing and the openHAB MQTT binding. Both clients can choose the QoS level for both publishing and subscribing separately
  • As proven by code snippets above, the homieiot-mqtt publishes all values with QoS 1
  • As proven by code snippets above, the MQTT binding uses a QoS level of 1 for homie subscriptions
  • The way how IoT mqtt libraries are implementing subscriptions doesn’t matter here because we are talking solely about the direction from the thing to openHAB.
  • From a TCP point of view, the subscribers are clients that connect to the listening moquitto server, but that does not necessarily mean that there is a dead time during polling (depending on the transport protocol that is used for mqtt, the server can send unrequested messages to the client, or the client could use long-polling)
  • From the source code of the MQTT binding and of the core mqtt implementation, I can see that under the hood, hivemq’s asynchronous API is used. This returns FutureCallables for each received message, but here my Java skills unfortunately end to know if there is any limit in reaction time involved.

So in the end, for me it comes down to the question how the hivemq lib receives the data from the broker. As the moqsuitto server should take care that both messages are acknowledged (QoS 1), only possibility would be that the data is lost within the lib, maybe because some buffers are too small or there is some concurrency bug.

@David_Graeff : Do you have experience with fast message sequences and can contradict to my suspicions out of experience?

After one day of monitoring with enabled debug logs the issue did not re-occur yet, which makes me think that the additional load for logging on mosquitto might hide it now.

Regards,
stedon81

@rossko57 : Regarding your question on debouncing: Even if I had a problem with debouncing - without the described issue, openHAB should recognize a transition. If the button would not be debounced, then I actually would expect even several transitions instead of none. Nevertheless, my code recognizing the button press and the code publishing the values are decoupled, and I always only send a true/false sequence quickly after another.

So, can you share how quickly your messages actually arrive at the broker, now you have logging in place there? The logs even for a “working” bellpush are going to be informative.
From our point of view, this is all supposition based on few facts yet.

Yes indeed. But you have a real life problem - if it comes down to something obscure in an underlying library, it is not going to get fixed quickly. If it is what you think, the practical circumvention can be made at your ESP32 by just throttling messages.

The two are synonymous. openHAB represents the device as a Thing. The Thing has Channels. the Channels subscribe to the MQTT topics.

Thing has a special meaning in openHAB. When I say “set the QOS on the Thing” I literally mean the Thing in openHAB. I will always call your ESP the “device” to distinguish between the “stuff” that exists outside of openHAB and the Thing which represents the device in openHAB’s configuration.

Anyway, this doc is for Websphere but the debugging approach is valid for MQTT in general for the most part. https://www.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.tro.doc/q039420_.htm

There is a bit of hand waving going on here that is hiding some other potential places where things can go wrong.

Remember that the acks are between the client and the broker. Just because the broker acked that it got the message from your ESP is no guarantee that openHAB received the message. That send/ack is a separate independent transaction between openHAB and the broker. If something goes wrong on the broker (e.g. your max_queued_messages setting is too low so the message can’t be buffered) the message may get dropped if a new one comes in before the old one was sent. So there could be something going on with the broker here. I’m not saying it is, but it could be. Therefore the message being lost within the lib is not the only possible place that the message could be lost.

Hi Rich,
thanks for your reply and your patience with me.
I shouldn’t do my smartHome stuff only in the late evenings, cause the brain is not awake enough anymore at those times.
You’re absolutely right - thinking about it with a fresher mind got me realize that there are many potential sources of errors in that chain, and I have to systematically track them down instead of shooting at a random library. In fact, after thinking more about it, it is very unlikely that the asynchronous approach of hiveMQ drops this message, and it is way more likely that I do something stupid in my ESP code.
Moreover, after a look at my ESP code I realized that I even have a minimum gap between those signals of 10ms, whereas I use a minimum gap of 50ms between signals for another use case and there are no problems. So somewhere it has to do with the message delay, but the root cause is still unclear. As with the homieiot library, there are also threads involved, I have to take a deeper look into that corner, too.
Further, in the meantime I found some more stability problems regarding deep sleep with the homieiot library, so I can’t exclude the possibility that the messages are not even sent because of problems there - as said, I don’t even know where the messages get lost because as soon as I switch on debug logging, the issue is gone.

Two cents to your reply:

  • Thanks for making clear that one should care better about Nomenclature, regarding “device” and “thing”, that’s a very good point
  • Your advise to set QoS to 1 in the Thing settings of openHAB doesn’t work here, because this is not a generic MQTT thing, but a homie MQTT thing - there is no such setting because the homie MQTT binding hard-codes the QoS to 1 (at least I believe so from the code)

Regards,
stedon81

There is diagnostic value in this.
If I understand you correctly, when you switch debugging on in the broker then the issue is gone, all works well?
Most likely not your undisturbed ESP32 device then.
About the only way it could be is if the broker normally ACKs so quickly the ESP misses that. Presumably you’d see repeat/retry messaging if that were the case.

Hi,

Today, the bell button worked fine, but openHAB missed updates on a different channel from the same device. After I restarted openHAB, that problematic channel was fine again. From the debug log of the mqtt binding I could see that it subscribed correctly to the channel in both scenarios.

I had a close look to the logs of moqsuitto now and I think I’ve found an issue that might be related to that.
The log shows a button press on the bell button, as well as an update of the device mode of my door device.
Subscriber ce05f2fc-7936-4193-ab5a-f5e87258aa60 is openHAB, subscriber b12211e3-9cc0-4f92-80d4-edf2808cf7ef1611310210630 is “MQTTBox” on my laptop. In MQTTBox, I have subscribed only to the device mode.

1611313412: Received PUBLISH from DP-door-thing (d0, q1, r1, m82, 'homie/door-thing/bell/button', ... (4 bytes))
1611313412: Sending PUBACK to DP-door-thing (Mid: 82)
1611313412: Sending PUBLISH to homie-ota-671 (d0, q0, r0, m0, 'homie/door-thing/bell/button', ... (4 bytes))
1611313412: Sending PUBLISH to mqtt-explorer-36b27257 (d0, q0, r0, m0, 'homie/door-thing/bell/button', ... (4 bytes))
1611313412: Sending PUBLISH to ce05f2fc-7936-4193-ab5a-f5e87258aa60 (d0, q0, r0, m0, 'homie/door-thing/bell/button', ... (4 bytes))
1611313412: Received PUBLISH from DP-door-thing (d0, q1, r1, m83, 'homie/door-thing/bell/button', ... (5 bytes))
1611313412: Sending PUBACK to DP-door-thing (Mid: 83)
1611313412: Sending PUBLISH to homie-ota-671 (d0, q0, r0, m0, 'homie/door-thing/bell/button', ... (5 bytes))
1611313412: Sending PUBLISH to mqtt-explorer-36b27257 (d0, q0, r0, m0, 'homie/door-thing/bell/button', ... (5 bytes))
1611313412: Sending PUBLISH to ce05f2fc-7936-4193-ab5a-f5e87258aa60 (d0, q0, r0, m0, 'homie/door-thing/bell/button', ... (5 bytes))
1611313412: Received PUBLISH from DP-door-thing (d0, q1, r1, m84, 'homie/door-thing/device/mode', ... (4 bytes))
1611313412: Sending PUBACK to DP-door-thing (Mid: 84)
1611313412: Sending PUBLISH to homie-ota-671 (d0, q0, r0, m0, 'homie/door-thing/device/mode', ... (4 bytes))
1611313412: Sending PUBLISH to mqtt-explorer-36b27257 (d0, q0, r0, m0, 'homie/door-thing/device/mode', ... (4 bytes))
1611313412: Sending PUBLISH to b12211e3-9cc0-4f92-80d4-edf2808cf7ef1611310210630 (d0, q1, r0, m47, 'homie/door-thing/device/mode', ... (4 bytes))
1611313412: Sending PUBLISH to ce05f2fc-7936-4193-ab5a-f5e87258aa60 (d0, q0, r0, m0, 'homie/door-thing/device/mode', ... (4 bytes))
1611313412: Received PUBACK from b12211e3-9cc0-4f92-80d4-edf2808cf7ef1611310210630 (Mid: 47)

The subscriptions from openHAB look like this:

1611310431: Received SUBSCRIBE from ce05f2fc-7936-4193-ab5a-f5e87258aa60
1611310431:     homie/door-thing/bell/button (QoS 0)
1611310431: ce05f2fc-7936-4193-ab5a-f5e87258aa60 0 homie/door-thing/bell/button
1611310431: Sending SUBACK to ce05f2fc-7936-4193-ab5a-f5e87258aa60
1611310431: Sending PUBLISH to ce05f2fc-7936-4193-ab5a-f5e87258aa60 (d0, q0, r1, m0, 'homie/door-thing/bell/button', ... (5 bytes))
1611310431: Received SUBSCRIBE from ce05f2fc-7936-4193-ab5a-f5e87258aa60
1611310431:     homie/door-thing/device/mode (QoS 0)
1611310431: ce05f2fc-7936-4193-ab5a-f5e87258aa60 0 homie/door-thing/device/mode
1611310431: Sending SUBACK to ce05f2fc-7936-4193-ab5a-f5e87258aa60
1611310431: Sending PUBLISH to ce05f2fc-7936-4193-ab5a-f5e87258aa60 (d0, q0, r1, m0, 'homie/door-thing/device/mode', ... (5 bytes))

What you can see is, that openHAB does subscribe with a QoS level of 0. So either I interpreted the code wrong or looked at the wrong code location, or there is a bug in the MQTT homie implementation.
@David_Graeff : Can you tell me how I can configure the homie-MQTT binding to subscribe with a QoS level of 1? – I’m using openHABian 3.0.0 - Release Build

Best regards,
stedon81

P.S.: @rlkoshak : Seems that you have been right already in your first post - might be related to wrong QoS level :wink:

Reminder: that does not affect QoS of incoming messages for openHAB, which are determined by the publisher (doorbell)

EDIT - I stand corrected by @stedon81 , this is rubbish!

As rich already pointed out, each client can define the QoS for both directions, so my device defines QoS 1 for publishing, and QoS 1 for subscription. However, openHAB does define QoS 0 for subscription.

As you can see from the log above, my device is sending with QoS 1, and therefore gets an ACK:

1611313412: Received PUBLISH from DP-door-thing (d0, q1, r1, m84, 'homie/door-thing/device/mode', ... (4 bytes))
1611313412: Sending PUBACK to DP-door-thing (Mid: 84)

However, the MQTT openHAB binding does subscribe with a QoS 0, and therefore does not send an ACK on reception of the event:

1611313412: Sending PUBLISH to ce05f2fc-7936-4193-ab5a-f5e87258aa60 (d0, q0, r0, m0, 'homie/door-thing/device/mode', ... (4 bytes))

whereas the MQTTBox tool subscribes with QoS 1 and therefore sends an ACK:

1611313412: Sending PUBLISH to b12211e3-9cc0-4f92-80d4-edf2808cf7ef1611310210630 (d0, q1, r0, m47, 'homie/door-thing/device/mode', ... (4 bytes))
1611313412: Received PUBACK from b12211e3-9cc0-4f92-80d4-edf2808cf7ef1611310210630 (Mid: 47)

That was a misconception I had for a long time too. The key is to realize that the QOS is an agreement between the client and the broker only, not an end-to-end agreement. So it is perfectly possible and reasonable to have a client publish a message with QOS 0 but a subscriber subscribe to the topic with QOS 2 and another client subscribe to the same topic with QOS 1. So the same message on the same topic could be delivered to different clients using different QOS levels from each other and different from the original QOS used to get the message to the broker.

@stedon81, if you’ve confirmed that the subscription is in fact using QOS 0, your best bet is to file an issue on the binding at this point. While David pokes his head in here periodically he has largely moved on from openHAB development and others have taken over maintenance of the binding.

Hi Rich,

thanks, didn’t know that David Graeff isn’t active on the mqtt binding anymore.
I filed a bug for this issue: [mqtt.homie] Binding subscribes/publishes with QoS 0 although openHAB setting is different · Issue #9919 · openhab/openhab-addons · GitHub

I crawled through the code and I really can’t find where that information got lost. I would be very surprised if that bug is in the underlying hiveMQ library. However, it indeed seems that I’m the first one who wants to use QoS levels, because while I crawled through the code I found another obvious bug which should have been found by anyone before if QoS would have been used: io transport mqtt sets wrong QoS levels to hiveMQ · Issue #2145 · openhab/openhab-core · GitHub

Maybe it’s again my special configuration with a mosquitto broker? Wonder if there is anyone out there who can reproduce the bug. Let’s see, I’m curious…

regards,
stedon81

I think it’s relatively unusual for openHAB users to encounter a case where MQTT messages occur so close together. So even at a QOS 0 there never would be a loss of a message. As rossko57 points out, where possible when events are that close together and need to be processed, we try to move the processing to the device itself. No pub/sub protocol like MQTT is really designed to reliably process messages within 10 msec of each other in a timely manner. Either messages get dropped or the latency grows significantly.

I’m not surprised no one has reported a similar problem to date.