MQTT 2.5 Binding Strange Behavior: Item goes ON then immediately to UNDEF

Issue is created:

I have a workaround (a Rule), so this isn’t leaving me without options, but it does appear to be a bug. I’m sure I won’t be the last to see it.

I’m pretty sure it’s the binding. The MQTT2 binding does set Items to UNDEF under a number of circumstances. There is actually an issue right now where the binding ends up overwriting values restored by restoreOnStartup with UNDEF.

I see the same behavior with both.

Yes, most of the Item types are supported. You define the Channel type when you create it.

That could be but if that were the case I would have expected it to reject the message in the first place rather than successfully updating/commanding the Item to ON, showing that it received the message and the transform successfully applied, only to then immediately change it to UNDEF.

Yes, I actually do have a separate Channel of type Text to show the uptime on my sitemap. The Item linked to this Channel is not seeing any problems.

Though that does make me think it is worth enumerating how this Channel/Item differs from a more typical case:

  • Message is a String converted to ON/OFF using a MAP transform
  • Item is also bound to the Expire binding
  • There are two Channels subscribed to the same topic but only the one with the transform and ON/OFF type Channel is exhibiting this behavior.

Hello,

I would like to ask how to write a rule that will restore my switches’ states after they are set to UNDEF following a restart.
I am running openHAB 2.5M2 on openHABian. I have a few ON/OFF switches whose states are stored in mapdb (which is my default persistence; I also have influxDB).
The situation is exactly the same as in this thread and https://github.com/openhab/openhab2-addons/issues/4616
After a restart the states of the switches are restored from mapdb and then changed to UNDEF.

I would like to restore them back as it is in mapdb.

example:

rule "Switch 01"
when
        Item Switch_01 changed to UNDEF
then
        Switch_01 = restore previous state from mapdb
end

If anyone can help, I would appreciate it.

Thanks,
Kris

See also
https://github.com/openhab/openhab2-addons/issues/5879

The unresolved debate boils down to: should a message-based binding force a linked Item to the UNDEF state when no active message event exists?

In the restore-on-startup context, the binding sets UNDEF when it first subscribes to the broker and (of course) there is no message at that precise instant.
Unfortunately, it subscribes after any restore-on-startup has happened, so any restored value gets trampled.

You could trigger a rule from changed to UNDEF, but that would also act on and “hide” any real failures later. Maybe you could ensure that it runs one time only.

An alternative is a simple delay after startup: perform a “manual restore” once you think the binding has finished setting up subscriptions.
You could reduce the risk of the restore trampling over genuine freshly arrived (or broker retained) data by testing for UNDEF before acting.
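To make the ordering problem and the guarded restore concrete, here is a minimal toy sketch in Python. It is not openHAB code: the `delayed_restore` helper and the plain strings standing in for Item states are assumptions made purely for illustration.

```python
# Toy model only: plain strings stand in for openHAB Item states.
UNDEF = "UNDEF"

def delayed_restore(current_state: str, persisted_state: str) -> str:
    """Hypothetical helper: restore the persisted value only if the Item is still UNDEF."""
    return persisted_state if current_state == UNDEF else current_state

# Startup order described above:
state = "ON"      # restore-on-startup applies the last persisted value
state = UNDEF     # the binding subscribes afterwards and tramples the restored value
trampled = state

# Delayed manual restore, guarded by the UNDEF test:
restored = delayed_restore(trampled, "ON")   # still UNDEF, so the persisted value is used
fresh = delayed_restore("OFF", "ON")         # a real message already arrived, so it is left alone
```

The guard is what keeps the manual restore from overwriting genuine data that arrived during the delay.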

FWIW I think this is a kludge and that the binding is misbehaving, but you need to deal with how it works today.

I agree with @rossko57. It’s a real PITA.
I have been dealing with it since the very first day the binding went live.
I did it this way.

Any item using the MQTTv2 binding is added to a group called MQTTv2

Persistence used is mapdb (best for restore on start up)

// mapdb persistence

Strategies {
	default = everyChange
}
Items {
	MQTTv2* : strategy = everyChange, restoreOnStartup
}

Then I have this rule:

rule "Update MQTT2 Items"
when
    System started //or
    //Item TestSwitch received command ON
then
    createTimer(now.plusSeconds(120), [ | // you may need to experiment to find the optimum wait time
        MQTTv2.members.filter[ i | i.state == UNDEF ].forEach[ ii |
            //logInfo("ITEM: ", ii.name.toString)
            val restored = ii.previousState(false, "mapdb") // null when mapdb has no entry for this Item
            if (restored !== null) ii.postUpdate(restored.state.toString)
            Thread::sleep(150)
        ]
    ])
end

The rule waits for 2 minutes and then restores any Item of the group that is UNDEF.
You may need to adjust the timer duration. 2 minutes works for me but I could make it shorter.

Thanks to @rlkoshak for the code hints.
The Thread::sleep(150) is to give the system enough time to retrieve the value from persistence and then update the Item before doing the next one.

Thanks man. Your rule seems to work for me.
But after all MQTT switches are set back from UNDEF there is a stack trace:

2019-08-23 14:59:07.107 [ERROR] [org.quartz.core.JobRunShell         ] - Job DEFAULT.2019-08-23T14:59:04.995+02:00: Proxy for org.eclipse.xtext.xbase.lib.Procedures$Procedure0: [ | {
  <XMemberFeatureCallImplCustom>.forEach(<XClosureImplCustom>)
  HistoricState
} ] threw an unhandled Exception: 
java.lang.reflect.UndeclaredThrowableException: null
	at com.sun.proxy.$Proxy215.apply(Unknown Source) ~[?:?]
	at org.eclipse.smarthome.model.script.internal.actions.TimerExecutionJob.execute(TimerExecutionJob.java:49) ~[?:?]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [181:org.openhab.core.scheduler:2.5.0.M2]
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [181:org.openhab.core.scheduler:2.5.0.M2]
Caused by: org.eclipse.smarthome.model.script.engine.ScriptExecutionException: The name 'HistoricState' cannot be resolved to an item or type; line 61, column 10, length 13
	at org.eclipse.smarthome.model.script.interpreter.ScriptInterpreter.invokeFeature(ScriptInterpreter.java:141) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._doEvaluate(XbaseInterpreter.java:902) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._doEvaluate(XbaseInterpreter.java:865) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.doEvaluate(XbaseInterpreter.java:224) ~[?:?]
	at org.eclipse.smarthome.model.script.interpreter.ScriptInterpreter.doEvaluate(ScriptInterpreter.java:226) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.internalEvaluate(XbaseInterpreter.java:204) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._doEvaluate(XbaseInterpreter.java:447) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.doEvaluate(XbaseInterpreter.java:228) ~[?:?]
	at org.eclipse.smarthome.model.script.interpreter.ScriptInterpreter.doEvaluate(ScriptInterpreter.java:226) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.internalEvaluate(XbaseInterpreter.java:204) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.evaluate(XbaseInterpreter.java:190) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.ClosureInvocationHandler.doInvoke(ClosureInvocationHandler.java:46) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.AbstractClosureInvocationHandler.invoke(AbstractClosureInvocationHandler.java:29) ~[?:?]
	... 4 more
2019-08-23 14:59:07.195 [ERROR] [org.quartz.core.ErrorLogger         ] - Job (DEFAULT.2019-08-23T14:59:04.995+02:00: Proxy for org.eclipse.xtext.xbase.lib.Procedures$Procedure0: [ | {
  <XMemberFeatureCallImplCustom>.forEach(<XClosureImplCustom>)
  HistoricState
} ] threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception.
	at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [181:org.openhab.core.scheduler:2.5.0.M2]
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [181:org.openhab.core.scheduler:2.5.0.M2]
Caused by: java.lang.reflect.UndeclaredThrowableException
	at com.sun.proxy.$Proxy215.apply(Unknown Source) ~[?:?]
	at org.eclipse.smarthome.model.script.internal.actions.TimerExecutionJob.execute(TimerExecutionJob.java:49) ~[?:?]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202) ~[?:?]
	... 1 more
Caused by: org.eclipse.smarthome.model.script.engine.ScriptExecutionException: The name 'HistoricState' cannot be resolved to an item or type; line 61, column 10, length 13
	at org.eclipse.smarthome.model.script.interpreter.ScriptInterpreter.invokeFeature(ScriptInterpreter.java:141) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._doEvaluate(XbaseInterpreter.java:902) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._doEvaluate(XbaseInterpreter.java:865) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.doEvaluate(XbaseInterpreter.java:224) ~[?:?]
	at org.eclipse.smarthome.model.script.interpreter.ScriptInterpreter.doEvaluate(ScriptInterpreter.java:226) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.internalEvaluate(XbaseInterpreter.java:204) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._doEvaluate(XbaseInterpreter.java:447) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.doEvaluate(XbaseInterpreter.java:228) ~[?:?]
	at org.eclipse.smarthome.model.script.interpreter.ScriptInterpreter.doEvaluate(ScriptInterpreter.java:226) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.internalEvaluate(XbaseInterpreter.java:204) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.evaluate(XbaseInterpreter.java:190) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.ClosureInvocationHandler.doInvoke(ClosureInvocationHandler.java:46) ~[?:?]
	at org.eclipse.xtext.xbase.interpreter.impl.AbstractClosureInvocationHandler.invoke(AbstractClosureInvocationHandler.java:29) ~[?:?]
	at com.sun.proxy.$Proxy215.apply(Unknown Source) ~[?:?]
	at org.eclipse.smarthome.model.script.internal.actions.TimerExecutionJob.execute(TimerExecutionJob.java:49) ~[?:?]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202) ~[?:?]
	... 1 more

which is completely black magic to me :slight_smile:

Should I be worried? Maybe someone knows how to deal with it?

I’m not completely sure I agree. But the issue is whose responsibility it should be to decide whether a message is ephemeral or persistent.

Having delved deeply into MQTT, I’ve found that its designers take a very strong position that it is the responsibility of the message publisher to make that decision. If the message represents a state, the publisher should publish that message with the retained flag set. For example, the current state of a light bulb should be published with the retained bit because, whether or not any subscribers were connected to the broker at the time the light bulb went on, they will know the state of the bulb when they do connect. If the state is ephemeral (e.g. a button press event) the retained flag should be false because the message only has meaning at the time it is sent.
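As a rough illustration of that distinction, here is a toy in-memory model in Python. It is a sketch under stated assumptions, not a real broker or client library: the `ToyBroker` class and the topic names are made up for this example, and it ignores wildcards, QoS and clearing of retained messages.

```python
# Toy in-memory model of retained vs. non-retained delivery (not a real MQTT broker).
from typing import Callable, Dict, List

class ToyBroker:
    def __init__(self) -> None:
        self.retained: Dict[str, str] = {}
        self.subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        # A retained publish is stored so that later subscribers can receive it.
        if retain:
            self.retained[topic] = payload
        # Every publish is delivered to whoever is subscribed right now.
        for callback in self.subscribers.get(topic, []):
            callback(payload)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self.subscribers.setdefault(topic, []).append(callback)
        # A new subscriber immediately gets the retained message, if one exists.
        if topic in self.retained:
            callback(self.retained[topic])

broker = ToyBroker()
# Both messages are published while nobody is subscribed.
broker.publish("home/bulb/state", "ON", retain=True)        # a state: retained
broker.publish("home/button/press", "PRESSED", retain=False) # an event: not retained

bulb_seen: List[str] = []
button_seen: List[str] = []
broker.subscribe("home/bulb/state", bulb_seen.append)
broker.subscribe("home/button/press", button_seen.append)
```

The late subscriber learns the bulb state but never sees the button press, which is the behavior described above.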

Given this is the position that the MQTT protocol itself takes, it is reasonable for the MQTT binding to set Items to UNDEF when it connects and there is no retained message present. At that point in time the state of the Item is not defined.

Where this runs into problems is, as you identify, the order of operations. restoreOnStartup happens before the binding connects and realizes the Items are UNDEF. OH’s model is the opposite of MQTT’s when using restoreOnStartup. So should the binding follow the technology’s model or should it follow OH’s model? I’m not so sure the answer isn’t “both” and let us have a flag to control the behavior.

Anyway, the “proper” MQTT solution would be to modify the publisher of the MQTT messages to use retained messages for those topics and disable restoreOnStartup. I realize that sometimes that isn’t possible, in which case the workarounds presented here are the best available options.

I think there is a typo in the code. @vzorglub (welcome back btw!) what is that “HistoricState” thing doing there?

Well, as I point out in the github issue; the normal state of affairs for MQTT is -

Message arrives, update Item. Hurrah.
|
Nothing happens
|
New message, new Item update.

Importantly, “nothing happens” does not involve setting the Item to UNDEF simply because we have no up to date message. The duration of “nothing happens” can be days.
This is all a Good Thing.

Now let’s reboot. Subscribing to the broker is always in the “nothing happens” phase. We have no up to date message. So what? That is the case 99.9% of the time. It’s not special just because we rebooted in between.

As for our state based Item, we can choose to allow a rebooted Item to stay NULL until a message does arrive. Or we can choose to restore a last known good value - that is the same end result as “nothing happens”. That’s to say, the user can manage reboot conditions without the binding’s help.

Or rather, we could choose to restore if this binding allowed us. It’s not about the MQTT protocol, it’s about the channel. Other bindings that listen for or poll message events do not behave like this. The whole idea of bindings is to isolate our abstract model (Items) from the vagaries of different protocols.

Yes I know about broker retain. It’s not relevant to events above, except as a partial external circumvention.

Copy and paste error.
Thanks!

There was a typo. I removed it from the rule above

If MQTT is implemented properly by the publisher of that message, the message would have been retained. Then OH could disconnect and reconnect to the broker 1000 times in that time between messages and always get the device’s current state. It’s because the device is implemented incorrectly that there is a problem.

Exactly, which is why if the message represents a state, it should have the retained flag set. That’s the whole purpose of the retained flag. Relying on restoreOnStartup in OH in this case is strictly a workaround to deal with an incorrectly implemented device.

Only if 99% of the time you are dealing with improperly implemented MQTT. If you do have control over the devices publishing the messages, the best solution is to use MQTT as it is intended to be used and set the retained flag. If not, it would be good to file an issue with whoever does implement the device. “Breaking” the binding, which is behaving correctly from the MQTT perspective, because people are afraid of the retained flag is, imo, a regression.

Or rely on the retained flag, which will update the Item to the last reported state, not just the last state that the Item was in before OH went offline. Really, in all respects, retained messages are superior to restoreOnStartup because they work to update your Items to the correct current state even when the state changed while OH was offline.

For example:

  1. OH goes offline.
  2. Device publishes a new state.
  3. OH comes back online.

If only relying on restoreOnStartup, OH will set the Item to the incorrect state and it will remain in that incorrect state until the device publishes a new message. But with retained messages OH will set the Item to the current and correct state.
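The three steps can be walked through with a small toy function in Python. This is only a sketch: `item_state_after_restart` is a hypothetical helper, the states are plain strings, and it deliberately leaves out the binding’s UNDEF update so that only the two restore strategies are compared.

```python
from typing import Optional

def item_state_after_restart(persisted: str, retained_on_broker: Optional[str]) -> str:
    """Toy model: apply the persisted value first, then any retained message from the broker."""
    state = persisted                   # restoreOnStartup applies the last persisted value
    if retained_on_broker is not None:
        state = retained_on_broker      # the broker delivers the retained message on subscribe
    return state

# 1. OH goes offline; the Item was last persisted as OFF.
persisted_before_shutdown = "OFF"
# 2. The device publishes a new state (ON) while OH is offline.
device_state = "ON"
# 3. OH comes back online.
without_retain = item_state_after_restart(persisted_before_shutdown, None)
with_retain = item_state_after_restart(persisted_before_shutdown, device_state)
```

Without a retained message the Item keeps the stale OFF; with one it ends up at the device’s actual ON.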

In short, with retained messages, restoreOnStartup is unnecessary.

Again, it’s a philosophical question, but is it the binding’s responsibility to correct these erroneous implementations of MQTT messaging? I believe all the commonly used firmwares allow for the proper implementation of MQTT. For custom code, I posit it’s the code developer’s responsibility to change their ESP/Arduino/Python code to do it properly.

Even if the binding were “broken” to better support restoreOnStartup, retained messages would remain a superior solution to the problem.

It’s totally relevant because if the device is publishing a state and not using the retained bit, it’s wrong. If implemented correctly, the scenario you present wouldn’t occur.

Fine, retain mostly works, if you have that option. People do look at MQTT messages from devices and services they have no control over.

Why is it a good idea to set UNDEF when that option is not available? Because that prevents us managing it cleanly. What special advantage does that bring, what additional info do we learn?

If it were appropriate that unretained MQTT messages must not be treated as states, it follows that the binding should change every connected Item to UNDEF immediately after any update.

All that’s needed here is - binding has nothing to say, don’t say anything.

It is broken, one way or the other. It only updates states to UNDEF at startup. It does not update unretained states to UNDEF between messages. These behaviours are not consistent.

We learn that we have no idea what state the device is in, which is the correct behavior according to the openHAB binding specification. Is it better to know you cannot trust the state of an Item or just blindly assume that it didn’t change state in the time that OH was down? I’d argue it’s better to know that we don’t know the state than to just blindly restore it.

But in cases where you really must restore it to support, there is a rule above that can be used as a workaround.

But how often do people really encounter devices that are incorrectly implemented? Tasmota lets you set the flag. (I’m pretty sure; my Tasmota devices are offline.) ESP Easy lets you set the flag (sadly only globally, but it would be rare to have an ESP Easy device that publishes ephemeral messages). Shelly’s firmware appears to use retained properly. If you are coding the firmware yourself it’s your own fault if you implement it wrong. If you copied some code from the internet without understanding it, then same. What devices in common use have I missed that don’t support the retained flag? As far as I can tell, the incidence of devices that can’t be configured or changed to do it right is rare.

Convince me otherwise. What commonly used MQTT device or script used to publish state messages fails to use the retained flag properly?

Aren’t rules supposed to serve this function? To handle those cases where something weird and non-standard is going on?

I’m not convinced that would be incorrect behavior, honestly. I don’t know if the binding knows whether a message is retained or not, though. But it seems to be reasonable for the binding to assume that all subscription topics in Generic Things are states and that they use retained (ephemeral topics would have trigger Channels on the broker Thing). That’s what the publishers to those topics are supposed to do.

Except that the standard behavior expected from bindings, and theoretically enforced by the reviewers of bindings, is that if the binding doesn’t know what state the device is in it should set the Item to UNDEF. If we assume that the binding doesn’t know whether a message it received is retained or not, it’s reasonable for it to treat any message received on a state topic (by definition a topic subscribed to from a Generic Thing) as the Item’s state. It follows that if the state topic is incorrectly implemented and isn’t using the retained flag, then when the binding starts up it doesn’t know what state the device is in and UNDEF is the proper state per openHAB standards.

From my perspective, I’d rather know that I don’t know what state the Items are in than use some stale state that may or may not have anything to do with the device’s actual state or even the device’s last reported state.

I already know that. We are talking about system startup, the Item is NULL. I gain nothing from it updating to UNDEF. Or it is not NULL because I have chosen to restore it, pending a useful update … but the binding tramples over my choice.

So far as I know, this is the only binding that behaves in this way. It may be that they are all wrong.

The actual reason is that it is not documented in the developer guidelines. A binding is free to UNDEF all its channels on startup.

I’m coming from an embedded background and I’d be mad not to put everything into a known state before starting off. That’s why my bindings initialize their channels with what is known after the first device fetch (in this case after a broker connection was established and some time passed).

I have also commented on the github issue.

Cheers, David

The difficulty is that this particular means of initializing is causing an “event” of significance to other parts of the system (the UNDEF update).

May we start from first principles?

To begin, can it ever be legitimate to link an unretained MQTT message to an Item state?

I’m hoping yes. In which case, what should the channel do when no MQTT message is being processed?

Let’s agree on normal usage before considering startup conditions.

I argue the answer is no. I could be convinced that it might be legitimate in cases where you have a device that publishes a new value periodically (e.g. a temperature sensor that reports once every five minutes), as that is an example that has been cited here and there as a case where retained need not be used. But even this violates some of the MQTT design principles.

In all other cases, it is not legitimate to link it to an Item’s state.

But still I ask, who does this wrong? What firmware or third party software or script is not using the retained flag properly? Are we fixing an actual problem or are we trying to accommodate users who do it wrong out of ignorance or misunderstanding? All of the common third party systems I have experience with either do it right by default or allow users to configure them right.

I’m uncomfortable pushing a less good solution to a problem of getting an Item’s state at startup as the default approach. Especially when the better solution is only rarely if ever unavailable.

But people feel more strongly about this than I do so I’ll let it be.

What worries me about that line of argument - I couldn’t care less what the MQTT implementation details are, or if you’ve chosen a broker that persists retain across reboot or not, etc.
If you can’t or won’t retain, that’s fine, there is no data at boot time. This is not unusual. If no more data ever comes, that’s fine.

The binding is supposed to hide all that from my idealized openHAB Item. This one doesn’t - it reaches across the channel to inform me that nothing happened, and in the doing of that it can destroy other data. Sometimes.

People have run into practical problems with this. It needn’t happen at all. There is zero benefit to it happening. (either I already know there is no new data, because my Item is NULL, or I don’t care because I have restored it or got data from another channel).

Please check the github issue. I have already agreed not to push UNDEF. I still don’t think it’s wrong, but openHAB does this already anyway. A state channel can never have “no state” and the framework initializes with the UNDEF state. And it does so way earlier than the binding could do so, and therefore the restoreOnStartup hack works.

Only mqtt channels bound to retained topics, where the topic value is removed (set to an empty string) will be set to UNDEF.

If we want to follow the mqtt idea to the end, as argued by Rich, all channels bound to non retained topics would not return any value if the framework asks for the state.

Cheers, David

Out of curiosity, does that mean that if we subscribe to a retained topic but the broker does not (yet) have available info, a linked Item gets an UNDEF update? Or is that a nonsense idea, as the broker does not know that a topic is retained unless it does have old data?

No, not going to argue about that :crazy_face: just point out for future reference, the integrated broker (rebooting at the same time) has optional persistence.

Only messages are retained. Retained is not a property of the topics themselves. There would have to be a retained message there in the first place. This scenario could never apply because if there is not yet a message there is nothing there to indicate retained.
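A toy Python sketch of that point, under stated assumptions: `ToyBroker` and the topic names are invented for illustration and this is not a real broker. It models only what was described above: retention belongs to individual messages, and an empty retained payload removes the stored value.

```python
from typing import Dict, Optional

class ToyBroker:
    """Toy model: 'retained' is a property of stored messages, not of topics."""

    def __init__(self) -> None:
        self.retained: Dict[str, str] = {}

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if not retain:
            return  # a non-retained publish never touches the stored message
        if payload == "":
            self.retained.pop(topic, None)  # empty retained payload clears the stored value
        else:
            self.retained[topic] = payload

    def on_subscribe(self, topic: str) -> Optional[str]:
        # What a new subscriber receives immediately: the retained message, or nothing.
        return self.retained.get(topic)

broker = ToyBroker()
never_published = broker.on_subscribe("home/new/topic")  # nothing stored, nothing delivered

broker.publish("home/bulb/state", "ON", retain=True)
broker.publish("home/bulb/state", "OFF", retain=False)   # does not replace the retained ON
after_unretained = broker.on_subscribe("home/bulb/state")

broker.publish("home/bulb/state", "", retain=True)       # topic value removed
after_clear = broker.on_subscribe("home/bulb/state")
```

In this model a subscriber to a topic with no stored message simply receives nothing, so there is nothing to tell the subscriber that the topic “is retained”.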