Milight 2.4 - V3 bridges going offline

MikeJMajor · December 26, 2018, 8:18am

Upgraded from 2.2 to 2.4 and now my event log is getting filled with entries stating that my V3 bridges have gone offline("
changed from ONLINE to OFFLINE: Bridge did not respond!").

I used to see this occasionally on 2.2, but the amount of these messages has increased considerably on 2.4.

@David_Graeff the code has changed quite a lot since last time I looked at it, but it looks to me as if the timeout for a V3 bridge has been changed from 1000ms to 300ms. Can you confirm that is the case?

David_Graeff · December 26, 2018, 10:21am

Timeouts have changed, yes. Just change the default values if necessary. My V3 bridge works, I only have problems with V6

MikeJMajor · December 26, 2018, 11:38am

How do I change the values? It looks like there’s a hardcoded socket timeout of 300ms waiting for the response of the discovery packet.

pauldsmyth · December 26, 2018, 5:47pm

I have exactly this problem and like Mike I cannot see anywhere to change this value. Can you elaborate please?

MikeJMajor · December 26, 2018, 8:52pm

@David_Graeff the following timeout in BridgeV3Handler.java is hardcoded and cannot be changed through configuration:

socket.setSoTimeout(300)

Using tcpdump on my raspberry pi 3 the time for a response to a discovery packet is often over 300ms, so this is now broken in my environment compared with previous versions which had a 1000ms timeout.

As a test I rebuilt the binding with the timeout set to 1500 and it’s been running without problems for several hours.

I’ll submit a PR with the timeout set back to 1000.

David_Graeff · December 26, 2018, 11:08pm

This is unfortunately not that easy. A lower timeout is required for sending multiple packets a second on purpose (udp makes no delivery guarantees). But the receive logic could be changed to wait for a few more packet send iterations, I guess.

MikeJMajor · December 27, 2018, 6:37am

Why do you need to send multiple discovery packets per second to a V3 bridge? What was wrong with 3 attempts with a 1000ms gap between attempts?

The line of code I’ve changed is how long it waits for the response to be received. In the same file once it receives a response it sets the timeout to the number of refresh seconds, so that same socket will be waiting for 10 seconds (default refresh time) most of the time. I can’t really see what problems that one line change could cause if it’s already ok for the socket to be waiting for 10 seconds or more.

PR

David_Graeff · December 27, 2018, 10:47am

Personally I have a V3 bridge close to the wifi access point and it reacts within 150ms. You are increasing the perceived time until the Thing comes online if the first packet got lost.

But I guess it doesn’t hurt to wait longer.

usambara · December 27, 2018, 12:27pm

I would like to test also that version of the binding. Can I download it somewhere?

Since 2.4 I have also problems with my V3 bridge, but I get another error message than you.
Status: OFFLINE Bridge did not respond or the bridge's MAC address does not match with your configuration!
Maybe it is related. Unfortunately there are no debug messages in the binding which could give me a hint.

David_Graeff · December 27, 2018, 12:58pm

There are no messages in the log, because the status tells you the reason. The text might be changed though to give a clearer picture. On the bridges IP address in your config, there is either no bridge found or a bridge is found that does not match the configured bridge ID (which is the mac address).

usambara · December 27, 2018, 2:36pm

Thanks for the hint.
I checked again the MAC-address, that’s fine.
As I understood the config itself should also be okay. With the BridgeV6 I have had no problem to get it detected.

Bridge	milight:bridgeV3:Milight1	"Milight Bridge" @ "Küche"
[
	host="192.168.44.240",
	bridgeid="ACCF23A6CE",
	passwordByte1=0, 
	passwordByte2=0
	//, PORT=8899
]
{
	//Thing	rgbLed 1	"Terrassenlicht" @ "Terrasse"
	Thing rgbLed Terrasse [ zone = "1" ]
}


Bridge	milight:bridgeV6:Milight2 "Milight Bridge 2" @ "Gartenhaus"
[
	host="192.168.44.241",
	bridgeid="F0FE6B48F6CA",
	passwordByte1=0, 
	passwordByte2=0
	//, PORT=5987
]
{
	//	Thing	rgbwwLed 1	"LED Gartenhütte" @ "Gartenhaus"
	Thing rgbwwLed Gartenhaus [ zone = "1" ]
}

I haven’t tried the things so far, as I’ll need them earliest in summer…

pauldsmyth · December 27, 2018, 11:30pm

My v3 bridge is within 6 inches of the AP and it’s totally broken. This had worked from version 1.8 through to 2.3. Now it’s totally broken. I’d like to see what testing you did on this because this is no unusable so I may have no alternative to revert to go back to 2.3 but the reason I upgraded was to move to the mqtt2.4 to solve a different problem so it appears I’m stuck between a rock and a hard place.

David_Graeff · December 28, 2018, 4:19am

I’m testing with my personal setup. And it works. But there are many clones of the V3 bridges, I have an easybulb for example. So yours could react different. To be honest the entire milight protocol is catastrophic and a patch work. No back channel and a simulated non standard, non documented session handling on V6 bridges.

pauldsmyth · December 28, 2018, 12:36pm

Well I’m now stuck in a position where I can’t confirm that this is the issue although it would appear to be. Had you made this a configuable value rather than a hard coded one I could have easily confirmed or denied. I’m having trouble with the maven build to test making the same change Mike made so I’m a little stuck right now.

LordGame · December 29, 2018, 4:40pm

I can confim that V3(milight) and V1/V2 (limitlessled) are also broken with binding in 2.4.
Usually the first command is handled properly, than the second takes seconds. Finally it goes offline for a while, no command handled on the bridge side but items got new states on the OH side, even after the bridge come online the whole thing stops working.

Sometimes pocessing a simple turn on takes 30-60 seconds (if no other command meanwhile there is a chance it will work afterward).

MikeJMajor · December 29, 2018, 4:57pm

I think there’s two issues with the 2.4 release. The first is that it doesn’t work when you have multiple bridges and the second is the timeout issue.

Can you try org.openhab.binding.milight-2.5.0-SNAPSHOT.jar

David_Graeff · December 29, 2018, 8:42pm

Hm. There is one change in 2.4 that could lead to this behaviour. The binding has a global send queue now. So every command you issue is appended to the queue. The queue works with a speed of 5 items / second if repeat is 1 (200ms delay * repeat times). So if you have a high repeat rate and a high delay time (configurable), the queue can build up.

This whole thing was done, because people reported that multiple commands at the same time for different milight bridges would interfere with each other (makes sense, all sending in the same frequency range). A global queue solves this issue. And now we have the problem of queuing up.

LordGame · December 29, 2018, 9:42pm

I have tested the 2.5.0-SNAPSHOT a bit, it looks better, but not perfect:I have got 2 bridges both set to 300ms delay between commands and send commands only once.
Most of my lamps under the V3 bridge are working except when the bridge goes down and break the state of the item without performing the action on the lamp. The next action fixes the state anyway(not ideal but works).
On the other hand the V2 bridge mostly stopped working:
I was able to perform some actions, but at a point the lamps just stopped working and the following error appeared multiple times:
2018-12-29 22:31:03.191 [WARN ] [milight.internal.protocol.QueuedSend] - Failed to send Message to ‘192.168.1.112’: Socket is closed

I am going to do more test tomorrow.

Edit: turned out that the V2 bridge needed a restart for some reason.

LordGame · December 30, 2018, 6:15pm

2018-12-30 19:06:58.306 [INFO ] [ome.model.script.rMilightBridgeState] - OFFLINE
2018-12-30 19:06:58.345 [INFO ] [ome.model.script.rMilightBridgeState] - OFF
2018-12-30 19:06:59.137 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:06:59.391 [INFO ] [ome.model.script.rMilightBridgeState] - ONLINE
2018-12-30 19:06:59.412 [INFO ] [ome.model.script.rMilightBridgeState] - ON
2018-12-30 19:07:14.340 [WARN ] [nal.protocol.MilightV6SessionManager] - Session timeout!
2018-12-30 19:07:14.386 [INFO ] [ome.model.script.rMilightBridgeState] - OFFLINE
2018-12-30 19:07:14.412 [INFO ] [ome.model.script.rMilightBridgeState] - OFF
2018-12-30 19:07:14.802 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:07:15.021 [INFO ] [ome.model.script.rMilightBridgeState] - ONLINE
2018-12-30 19:07:15.041 [INFO ] [ome.model.script.rMilightBridgeState] - ON
2018-12-30 19:07:24.974 [WARN ] [nal.protocol.MilightV6SessionManager] - Session timeout!
2018-12-30 19:07:25.047 [INFO ] [ome.model.script.rMilightBridgeState] - OFFLINE
2018-12-30 19:07:25.061 [INFO ] [ome.model.script.rMilightBridgeState] - OFF
2018-12-30 19:07:25.219 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:07:25.423 [INFO ] [ome.model.script.rMilightBridgeState] - ONLINE
2018-12-30 19:07:25.446 [INFO ] [ome.model.script.rMilightBridgeState] - ON
2018-12-30 19:07:40.424 [WARN ] [nal.protocol.MilightV6SessionManager] - Session timeout!
2018-12-30 19:07:40.477 [INFO ] [ome.model.script.rMilightBridgeState] - OFFLINE
2018-12-30 19:07:40.496 [INFO ] [ome.model.script.rMilightBridgeState] - OFF
2018-12-30 19:07:40.991 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:07:41.005 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:07:41.200 [INFO ] [ome.model.script.rMilightBridgeState] - ONLINE
2018-12-30 19:07:41.216 [INFO ] [ome.model.script.rMilightBridgeState] - ON
2018-12-30 19:07:56.790 [WARN ] [nal.protocol.MilightV6SessionManager] - Session timeout!
2018-12-30 19:07:56.855 [INFO ] [ome.model.script.rMilightBridgeState] - OFFLINE
2018-12-30 19:07:56.874 [INFO ] [ome.model.script.rMilightBridgeState] - OFF
2018-12-30 19:07:57.614 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:07:57.856 [INFO ] [ome.model.script.rMilightBridgeState] - ONLINE
2018-12-30 19:07:57.877 [INFO ] [ome.model.script.rMilightBridgeState] - ON
2018-12-30 19:08:08.140 [WARN ] [nal.protocol.MilightV6SessionManager] - Session timeout!
2018-12-30 19:08:08.225 [INFO ] [ome.model.script.rMilightBridgeState] - OFFLINE
2018-12-30 19:08:08.248 [INFO ] [ome.model.script.rMilightBridgeState] - OFF
2018-12-30 19:08:08.706 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:08:08.918 [INFO ] [ome.model.script.rMilightBridgeState] - ONLINE
2018-12-30 19:08:08.947 [INFO ] [ome.model.script.rMilightBridgeState] - ON
2018-12-30 19:08:35.370 [WARN ] [nal.protocol.MilightV6SessionManager] - Session timeout!
2018-12-30 19:08:35.450 [INFO ] [ome.model.script.rMilightBridgeState] - OFFLINE
2018-12-30 19:08:35.470 [INFO ] [ome.model.script.rMilightBridgeState] - OFF
2018-12-30 19:08:35.643 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:08:35.854 [INFO ] [ome.model.script.rMilightBridgeState] - ONLINE
2018-12-30 19:08:35.872 [INFO ] [ome.model.script.rMilightBridgeState] - ON
2018-12-30 19:08:51.366 [WARN ] [nal.protocol.MilightV6SessionManager] - Session timeout!
2018-12-30 19:08:51.443 [INFO ] [ome.model.script.rMilightBridgeState] - OFFLINE
2018-12-30 19:08:51.459 [INFO ] [ome.model.script.rMilightBridgeState] - OFF
2018-12-30 19:08:52.176 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 280
2018-12-30 19:08:52.224 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:08:52.303 [INFO ] [ome.model.script.rMilightBridgeState] - ONLINE
2018-12-30 19:08:52.321 [INFO ] [ome.model.script.rMilightBridgeState] - ON
2018-12-30 19:09:02.286 [WARN ] [nal.protocol.MilightV6SessionManager] - Session timeout!
2018-12-30 19:09:02.350 [INFO ] [ome.model.script.rMilightBridgeState] - OFFLINE
2018-12-30 19:09:02.379 [INFO ] [ome.model.script.rMilightBridgeState] - OFF
2018-12-30 19:09:02.607 [INFO ] [nal.protocol.MilightV6SessionManager] - Confirmation received for unsend command. Sequence number: 24
2018-12-30 19:09:02.829 [INFO ] [ome.model.script.rMilightBridgeState] - ONLINE
2018-12-30 19:09:02.851 [INFO ] [ome.model.script.rMilightBridgeState] - ON

Here is a snippet of the milight related log entries. It works really unstable, goes online and offline frequently. Success depends on when the command was submitted and when it fails for any reason it is very hard to correct, mostly no response at all.

I’m going to try checking the state of the bridge on the rule level to avoid these issues, but it is a really painfull workaround.

What is thie “Confirmation received for unsend command. Sequence number” line?

David_Graeff · December 30, 2018, 6:34pm

The milight V6 protocol is not documented, only reverse engineered.

Every command gets a sequence number and a confirmation is send by the bridge. Except if the send command is not expected and the session is closed. The current implementation is absolutely unstable, I agree. But so far there exists no working, stable solution out in the internet and you will find some question marks in the code. You are invited to help of course.