MQTT items unreliable

openHAB 2.5.0m1 and Mosquitto on Raspberry Pi 3

Another day, another problem.

So, my kitchen fan remotes didn’t work just now. This is the second time it happened – a perfectly working setup that failed on its own. I let it slide the first time but I wasn’t going to this time, so I immediately started troubleshooting.
433mhz RF – check.
MCU (esp8266) – check.
MQTT Explorer: Shows the correct messages on the correct topic when the remote is pressed.

From that point, I use MQTT Explorer to publish to the same topic so I don’t have to go to the kitchen.

I have generic MQTT thing subscribed to home/mcu/LivingRoomMCU/433mhz/KitchenFan which is the correct topic. I saw nothing in the logs when I publish to the topic.
I have a rule that triggers on update, but even if not, there should still be an entry in the event log, and there wasn’t.

I tried reloading the script (by making a change). No effect.
I tried reloading the MQTT binding by making a change to its settings. No effect.
Not until I reloaded the generic MQTT thing by changing the topic and then changing the topic back did it spring back to life!

So, it was the generic MQTT thing all along. It appears to have failed silently. It was sitting there showing “online” in green yet it was completely deaf to any messages on its topic!

Now, I have spent the day writing my own lightweight Homie 3.x library for ESP8266, and I added a couple of other MQTT items. I never touched the binding. I don’t believe that it should be possible for one item to kill another in this way. That’s not reasonably expectable behavior so if that’s normal it should be documented so that it can be known.

So… any ideas?

Here’s my logs:

The logs are from the last successful KitchenCeilingFanRemotes event at 09:06:24.250, until after I reloaded the generic mqtt thing. I started troubleshooting around 16:20.

Don’t be confused by the “KitchenCeilingFans” item – that’s a Z-Wave switch module that controls power to all the fans, and because it doesn’t have a “remember last state” or “on after power failure” configuration option, I have a rule tell it to turn on every five minutes.

Any ideas?

Two players in that game; the broker has to remember to push messages to OH.
Just saying the subscription needs to be active at both ends.

1 Like

That is a good point. I actually didn’t think of that!
If it happens again, how would I know whether that was the problem? Normally i’d wireshark but in this case it’s 127.0.0.1 on a raspberry pi (mosquitto running on the same machine).

I don’t know what diagnostic features are available for mosquitto

Does anyone else think Mosquitto may be the problem? Has anyone else experienced something like this? I kinda doubt that subscription is the problem considering how widely used mosquitto is?

I’d be very surprised if it was a Mosquitto issue but it could be either an OH issue, a current stalled connection or perhaps a reconnect that has somehow happened but the subscription was not reinstated. Are other expected subscribed topics still live ?

I don’t think Mosquitto diags allow you to examine connected clients and their subscriptions.

K

1 Like

It’s happening again right now. I will find out how to spy on the connection in linux (a la wireshark) and report back.

It WAS the link between mosquitto and openHAB!! I’ll be damned. @rossko57 was right.

sudo tcpdump -i lo -w dumpfile
…receives nothing when I publish to the topic, even though the generic MQTT thing shows as online.
After I change the topic and back (to reinitialize the generic MQTT thing), and things work again, tcpdump does get data.

Dang. How do I debug that?

Edit: found the mosquitto log in /var/log/mosquitto but it logs timestamps in unix epoch format. I was able to translate to human-readable timestamps with

cat moquitto.log | perl -pe 's/(\d+)/localtime($1)/e' > mosquitto.log.txt

I can then see that the openhab client (paho322960888581482) does not disconnect. It stays connected. By the way the MQTT binding client name gui option is broken, it is never saved.

Mosquitto does not log subscriptions by default, so that’s a dead end for now. I will turn more detailed logging and report back.

This is the first time this behavior had been reported so it’s going to take some investigating.

Here is what I would do.

  • put the mqtt binding into trace logging and shunt the logs to their own file
  • use MQTT Explorer to publish test messages
  • see how or whether it’s possible to get more detailed logs out of Mosquito
  • watch for it to happen again

See if there is anything that stands out.

As long as the binding thinks the connection to the broker is ok, out will show the generic things as online.

Good luck! It really is odd.

@David_Graeff do you have any suggestions?

1 Like

I just configured mosquitto for log_type all, holy crap that thing is chatty. No worries, I’ll set up a frontail port for it now that I know how.

Looking up how to set the MQTT binding into trace mode.

The way things are going I’m gonna need more monitors. :slight_smile:

1 Like

Okay, I think I’ve got logging set up now. I had some trouble at first, naturally.

Here are the mqtt related bundles in my system:

openhab> list -s | grep -i mqtt
222 │ Active   │  80 │ 1.2.0                  │ org.eclipse.paho.client.mqttv3
236 │ Active   │  80 │ 0.11.0.oh250M1         │ org.eclipse.smarthome.binding.mqtt
237 │ Active   │  80 │ 0.11.0.oh250M1         │ org.eclipse.smarthome.binding.mqtt.generic
242 │ Active   │  80 │ 0.11.0.oh250M1         │ org.eclipse.smarthome.io.transport.mqtt

org.eclipse.paho.client.mqttv3 is unbelievably chatty. Like 15 kilobytes per event. I think that may be too extreme to be useful, becase it’s going to end up being an insane amount of data in several hours? The things logged are all at DEBUG level, so changing it from TRACE to DEBUG wouldn’t help either.

org.eclipse.smarthome.binding.mqtt is much more reasonable but it only show publishings, not subscriptions.

For example:

2019-07-01 10:17:34.747 [DEBUG] [home.binding.mqtt.action.MQTTActions] - MQTT publish to home/mcu/kitchen-fan-1/inbound/speed performed
2019-07-01 10:17:34.760 [DEBUG] [home.binding.mqtt.action.MQTTActions] - MQTT publish to home/mcu/kitchen-fan-2/inbound/speed performed
2019-07-01 10:17:34.773 [DEBUG] [home.binding.mqtt.action.MQTTActions] - MQTT publish to home/mcu/kitchen-fan-3/inbound/speed performed 

That’s what my script publishes to the actual fan MCUs. That part has never failed. Even when the openhab generic mqtt item’s subscription has failed, I can still push these buttons:


…and the script picks up on it and publishes the MQTT messages for the fan MCUs. The fans go “bleep” and then start spinning.

Note the absense of any incoming MQTT messages, though.

org.eclipse.smarthome.binding.mqtt.generic doesn’t yield any logs at all.

The LAST one, org.eclipse.smarthome.io.transport.mqtt, seems to be the winner.
Check this out.

2019-07-01 10:22:39.599 [TRACE] [ansport.mqtt.internal.ClientCallback] - Received message on topic 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' : KitchenFan*long
2019-07-01 10:22:39.603 [TRACE] [ansport.mqtt.internal.ClientCallback] - No topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex homie/TestMQTTdev/\$name
2019-07-01 10:22:39.604 [TRACE] [ansport.mqtt.internal.ClientCallback] - No topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex homie/TestMQTTdev/\$homie
2019-07-01 10:22:39.606 [TRACE] [ansport.mqtt.internal.ClientCallback] - No topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex homie/TestMQTTdev/\$state
2019-07-01 10:22:39.607 [TRACE] [ansport.mqtt.internal.ClientCallback] - Topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex home/mcu/LivingRoomMCU/433mhz/KitchenFan
2019-07-01 10:22:39.609 [TRACE] [ansport.mqtt.internal.ClientCallback] - No topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex homie/TestMQTTdev/\$nodes
2019-07-01 10:22:39.703 [TRACE] [ansport.mqtt.internal.ClientCallback] - Message with id 133 delivered.
2019-07-01 10:22:39.726 [TRACE] [ansport.mqtt.internal.ClientCallback] - Message with id 134 delivered.
2019-07-01 10:22:39.752 [TRACE] [ansport.mqtt.internal.ClientCallback] - Message with id 135 delivered. 

That’s the one you meant isn’t it?

Posting log items from intentionally changing the topic subscription in the detail block below for posterity.

mosquitto.log:

1561954634: Received UNSUBSCRIBE from openhab
1561954634: 	home/mcu/LivingRoomMCU/433mhz/KitchenFan2
1561954634: openhab home/mcu/LivingRoomMCU/433mhz/KitchenFan2
1561954634: Received SUBSCRIBE from openhab
1561954634: 	home/mcu/LivingRoomMCU/433mhz/KitchenFan (QoS 1)
1561954634: openhab 1 home/mcu/LivingRoomMCU/433mhz/KitchenFan
1561954634: Sending SUBACK to openhab 

mqtt.log (org.eclipse.smarthome.io.transport.mqtt)

2019-07-01 11:16:57.826 [TRACE] [.transport.mqtt.MqttBrokerConnection] - Unsubscribing message consumer for topic 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' from broker '127.0.0.1'
2019-07-01 11:17:14.321 [TRACE] [.transport.mqtt.MqttBrokerConnection] - Unsubscribing message consumer for topic 'home/mcu/LivingRoomMCU/433mhz/KitchenFan2' from broker '127.0.0.1'
2019-07-01 11:20:41.335 [TRACE] [ansport.mqtt.internal.ClientCallback] - Received message on topic 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' : KitchenFan*long
2019-07-01 11:20:41.336 [TRACE] [ansport.mqtt.internal.ClientCallback] - No topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex homie/TestMQTTdev/\$name
2019-07-01 11:20:41.337 [TRACE] [ansport.mqtt.internal.ClientCallback] - No topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex homie/TestMQTTdev/\$homie
2019-07-01 11:20:41.339 [TRACE] [ansport.mqtt.internal.ClientCallback] - No topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex homie/TestMQTTdev/\$state
2019-07-01 11:20:41.340 [TRACE] [ansport.mqtt.internal.ClientCallback] - Topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex home/mcu/LivingRoomMCU/433mhz/KitchenFan
2019-07-01 11:20:41.341 [TRACE] [ansport.mqtt.internal.ClientCallback] - No topic match for 'home/mcu/LivingRoomMCU/433mhz/KitchenFan' using regex homie/TestMQTTdev/\$nodes
2019-07-01 11:20:41.473 [TRACE] [ansport.mqtt.internal.ClientCallback] - Message with id 158 delivered.
2019-07-01 11:20:41.487 [TRACE] [ansport.mqtt.internal.ClientCallback] - Message with id 159 delivered.
2019-07-01 11:20:41.498 [TRACE] [ansport.mqtt.internal.ClientCallback] - Message with id 160 delivered.

It seems that the org.eclipse.smarthome.io.transport.mqtt shows unsubscribes but not subscribes. Still, mosquittos own logging shows it, so let’s see what happens. Will leave this for now and keep working on my homie v3 esp8266 code. :slight_smile:

GOTCHA!!!

Reproducing at will now, 100%.

openHAB 2.5.0M1 and its MQTT binding unsubscribes my perfectly working Generic MQTT item when I change the Device ID of a completely different working Homie MQTT Device to a non-existent device, bringing the Homie MQTT Device offline (communication error).

The Generic MQTT item (which I did not touch) remains labeled ONLINE but it is in fact UNSUBSCRIBED.

Holy crap that’s fragile. That’s a real bug.
How am I the first person to come across it? Am I really the first person to try to use Generic MQTT and Homie together before?

Edit: I thought I would use the Broker Thing publish trigger as a temporary workaround.
It has the same problem. As soon as a Homie item is offline, it unsubscribes also the publish trigger… so there is NO workaround that I can see.

There appears to be no public ESP8266 Homie 3.0 library (this one has been under development for months and crashes instantly and repeatedly when I tried it), which is why I wrote my own over the weekend, but surely I can’t be the first person to try this? What am I missing? :thinking:

Edit: The only workaround I’ve found is to have two separate MQTT broker things connected to the same mosquitto! One for Homie devices and another for others. At least this way I can develop and still use my kitchen fans.

Edit: My Homie library is taking shape and working well now. For the MQTT communication itself it uses AsyncMqttClient. I noticed that once I had more than a handful of properties, they would get lost on publishing (probably the output buffer gets full), so I made my code throttle the initial added throttling code to send them out more slowly. So, now they all get published properly.

But, now that I know all the properties are in (and I do see them all in MQTT Explorer), I’m noticing similar issues with the openHAB MQTT binding when there’s more than a handful of parameters, namely: **Status: OFFLINE - COMMUNICATION_ERROR Did not receive all required topics **

I’m starting to feel like I’m breaking new ground here – Homie is a great idea but this is definitely not ready for prime time. I will get my library ready for public release as soon as possible so that the bugs in the binding can get fixed too. As a programmer I realize it’s impossible to fix bugs that one cannot see. This new (?) use case should make a whole slew of new bugs obvious. :grin:

2 Likes

Looks like a good find.
I wonder if home in the topic is coincindental, surely something can’t be mungling home and homie
It’s not clear if there are other generic MQTT things that continue to function - is this some selected victims, or all?

1 Like

I named it “home” right as I was starting out with MQTT… a few days before I learned about “homie”. That one isn’t following the homie convention in any way… surely they can’t be getting mixed up? i mean, MQTT is even case sensitive :slight_smile: Also, even if I wanted to subscribe to “homie” in a rule, shouldn’t I be able to? They shouldn’t trample each other.

homielib is up!

First time I ever make an actual redistributable arduino library, as well as release code publically, go easy on it please :slight_smile:

I don’t use the arduino environment myself (i outgrew that in about two seconds), I run sloeber. I tried adding the library to my Arduino environment and wasn’t successful (it accepted it but nothing was added to the menu). If someone with more experience could take a look, that’d be great.

But, the important part:
This is a functioning Homie 3.0.1 library for ESP8266, and with it, it’s easy to demonstrate the problems with openHAB.

Or, so I thought. Wouldn’t you know it, now that I’m testing the final version, I’m not seeing the same error as before (OFFLINE - COMMUNICATION_ERROR Did not receive all required topics). Maybe that’s because I turned off the trace logging and it’s keeping up better now.

I AM however seeing the following every time the ESP publishes an mqtt update:

2019-07-01 18:47:07.287 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:506c59a4 tried updating channel nodeid1#hum although the handler was already disposed.

2019-07-01 18:47:09.289 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:506c59a4 tried updating channel nodeid1#precip although the handler was already disposed.

2019-07-01 18:47:11.291 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:506c59a4 tried updating channel nodeid1#wind although the handler was already disposed.

2019-07-01 18:47:15.292 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:506c59a4 tried updating channel nodeid1#hum although the handler was already disposed.

2019-07-01 18:47:17.290 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:506c59a4 tried updating channel nodeid1#precip although the handler was already disposed.

Also, obviously the interference with generic MQTT items is still an issue. :slight_smile:

So, if people here with ESP and MQTT experience could help by running my homielib example and hopefully reproduce the problems I’m seeing, I would most appreciate it. Also, if you find the homielib lib useful, have at it, of course.

1 Like

Are all of your client IDs unique? If not, when a client with the same ID as a client already connected to the broker subscribes, the broker kicks off the old client in favor of the new one. All your clients must have a unique client ID. Do not confuse the client ID with the user ID. You can reuse the same user ID everywhere.

I can’t tell if this is what you are experiencing of not but it sounds like a possibility.

What QOS are you using? If it’s zero I could see that if they come in too fast the broker might just drop them if it can’t process them fast enough. Or it could be the case that the AsyncMqttClient doesn’t have the ability to queue up the messages so if you send a new message before the last one gets published to the broker the old one gets overwritten.

It appears like it, at least in terms of openHAB and Homie.

Don’t confuse the standard Homie with implementations of the standard. The standard is actually pretty mature. The implementations of it are not as mature.

Please file an issue on the openhab2-addons as well so it can get fixed in openHAB too.

I know a lot of people are using Homie with openHAB though and this is the first report of a problem like this so it isn’t clear to me what’s unique about your use case yet. Clearly there is something going on.

It can’t but the person might. I think rossko57 is just suggesting double checking you haven’t accidentally used “home” where you should have used “homie” and vise versa.

1 Like

That’s a very good point! I actually don’t know where to look up the client ID for MQTT things after they were created. I looked around for it for a while now, in both paperui and habmin, i don’t think the option is there other than when you first create the item… it doesn’t even show it anywhere afterwards, let alone edit.
But, I stuck with the default 32-bit hex value (never edited it) and i deleted and recreated the homie mqtt things several times. They would have gotten a new random client ID every time.
Also, since the Homie MQTT items also kill the subscription on the Publish Trigger, the one that’s directly on the broker thing… I don’t think that’s it.

I tried both 1 and 2. Indeed 0 could have dropped messages enroute. I think it’s probably AsyncMqttClient has a limited buffer. It is an embedded system after all. I spent a few minutes looking for a way to control the buffer size (and came up empty handed) before I rewrote my initial publish function to work in a piecemeal manner. No delay() anywhere, obviously. :slight_smile:

That’s what I meant, I think. :slight_smile: I’ll go with that.

Done! #5786 and #5787.

Ah! Now I get it.

I don’t use Homie yet so I only have Generic Things right now. But that shouldn’t matter because the client ID I believe is set on the Broker Thing. It’s there under “Show More” for the Broker Thing config in PaperUI. The broker represents the connection to the MQTT Broker and the client ID is defined on a connection by connection basis so I wouldn’t expect it be possible to supply a client ID on a Generic Thing or Homie Thing.

The question is does the Homie Things use a different Broker Thing?

The client ID that gets generated automatically when you don’t supply one is generated by the Paho library.

Don’t confuse the client ID on the Broker Thing with the Channel ID.

When I started making my first Homie device just as a test, by manually publishing the right values, and saw it automatically pop up in the inbox, I knew it was the right way to go for me.

Ah!

They do now because that was the only workaround I found for the bug. Normally they would be using the same broker thing, of course.

I ran across the bug when they were using the same broker thing (it’s the same mosquitto broker so of course i did not create a new broker thing willy nilly). So, there was only one client ID. Obviously one should be able to have more than one subscription (to different things, mind you) on the same broker thing? :slight_smile:

Technically I was confusing the “Thing ID” with the client ID on the broker thing :wink:

The MQTT Homie part is broken in openHAB 2.5m1. It has several subscribe/resubscribe issues. Fixes are merged since a few months already, but we didn’t had a release since then.

1 Like

Yes,

It’s been too long. I remember there were Homie problems but I thought they got fixed before M1.

@leif, definitely upgrade. I’ve been running the snapshots for a week or so and I find them to be stable again. It should be safe to use. Though I can’t speak for all bindings, just the ones I use.

1 Like

HA! That explains it. That’s kind of what I was afraid of when I upgraded from 2.4. Two steps forward, one step back. Fair enough.

I’m going to stop by clonezilla on the way, though. :slight_smile:

But hey, I’ve wasted more than a day troubleshooting this… If this was already known, that’s not great. Perhaps there should be a page of known issues in a prominent place?