I upgraded from 2.2 to 2.4 and now my event log is filling up with entries stating that my V3 bridges have gone offline ("changed from ONLINE to OFFLINE: Bridge did not respond!").
I used to see this occasionally on 2.2, but the number of these messages has increased considerably on 2.4.
@David_Graeff the code has changed quite a lot since the last time I looked at it, but it looks to me as if the timeout for a V3 bridge has been changed from 1000ms to 300ms. Can you confirm that is the case?
@David_Graeff the following timeout in BridgeV3Handler.java is hardcoded and cannot be changed through configuration:
Using tcpdump on my Raspberry Pi 3, I can see that the response to a discovery packet often takes over 300ms, so this is now broken in my environment compared with previous versions, which had a 1000ms timeout.
As a test I rebuilt the binding with the timeout set to 1500ms and it’s been running without problems for several hours.
I’ll submit a PR with the timeout set back to 1000.
This is unfortunately not that easy. The lower timeout is there on purpose: it is required for sending multiple packets a second (UDP makes no delivery guarantees). But the receive logic could be changed to wait for a few more packet send iterations, I guess.
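One way to picture the idea of waiting for a few more send iterations is a counter that only reports the bridge as offline after several consecutive unanswered iterations. This is a minimal sketch, not the binding's code: the class `MissedReplyCounter`, its method names, and the tolerance value are all assumptions made for illustration.

```java
/**
 * Hypothetical helper (not part of the binding): keeps the short per-packet
 * timeout but tolerates several unanswered send iterations before the
 * bridge is treated as offline.
 */
final class MissedReplyCounter {
    private final int maxMissedIterations;
    private int missed;

    MissedReplyCounter(int maxMissedIterations) {
        if (maxMissedIterations < 1) {
            throw new IllegalArgumentException("maxMissedIterations must be >= 1");
        }
        this.maxMissedIterations = maxMissedIterations;
    }

    /**
     * Call when a send iteration ended without a reply.
     * Returns true once the bridge should be reported OFFLINE.
     */
    boolean iterationTimedOut() {
        missed++;
        return missed >= maxMissedIterations;
    }

    /** Call when any reply arrives; even a late reply resets the counter. */
    void replyReceived() {
        missed = 0;
    }
}
```

With a 300ms iteration and a tolerance of 4, a bridge that answers after 450ms would stay online under this model, because offline is only reported after roughly 1200ms of silence.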
Why do you need to send multiple discovery packets per second to a V3 bridge? What was wrong with 3 attempts with a 1000ms gap between attempts?
The line of code I’ve changed is how long it waits for the response to be received. In the same file once it receives a response it sets the timeout to the number of refresh seconds, so that same socket will be waiting for 10 seconds (default refresh time) most of the time. I can’t really see what problems that one line change could cause if it’s already ok for the socket to be waiting for 10 seconds or more.
I would also like to test that version of the binding. Can I download it somewhere?
Since 2.4 I also have problems with my V3 bridge, but I get a different error message than you: "Status: OFFLINE Bridge did not respond or the bridge's MAC address does not match with your configuration!"
Maybe it is related. Unfortunately there are no debug messages in the binding which could give me a hint.
There are no messages in the log because the status tells you the reason. The text might be changed, though, to give a clearer picture. At the bridge's IP address in your config, either no bridge was found, or a bridge was found that does not match the configured bridge ID (which is the MAC address).
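To illustrate the second case, here is a rough sketch of comparing a configured bridge ID with the MAC address a bridge reports. It is not taken from the binding: the class `BridgeIdMatcher` is hypothetical, and ignoring separators and letter case is an assumption about how such a comparison could be made tolerant.

```java
import java.util.Locale;

/** Hypothetical helper (not the binding's code) for comparing bridge IDs. */
final class BridgeIdMatcher {

    /** Strips everything except hex digits and upper-cases the rest. */
    static String normalize(String id) {
        return id.replaceAll("[^0-9A-Fa-f]", "").toUpperCase(Locale.ROOT);
    }

    /**
     * Returns true if the configured bridge ID and the MAC address reported
     * by the bridge refer to the same device (assumption: separators and
     * letter case are irrelevant).
     */
    static boolean matches(String configuredId, String reportedMac) {
        if (configuredId == null || reportedMac == null) {
            return false;
        }
        String expected = normalize(configuredId);
        return !expected.isEmpty() && expected.equals(normalize(reportedMac));
    }
}
```

If the configured ID belongs to a different bridge than the one answering at that IP address, a check like this would fail and the thing would stay OFFLINE with the status text quoted above.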
My V3 bridge is within 6 inches of the AP and it’s totally broken. This had worked from version 1.8 through to 2.3. I’d like to see what testing you did on this, because it is now unusable. I may have no alternative but to go back to 2.3, but the reason I upgraded was to move to MQTT 2.4 to solve a different problem, so it appears I’m stuck between a rock and a hard place.
I’m testing with my personal setup, and it works. But there are many clones of the V3 bridges; I have an easybulb, for example, so yours could react differently. To be honest, the entire milight protocol is a catastrophic patchwork: no back channel, and a simulated, non-standard, undocumented session handling on V6 bridges.
Well, I’m now stuck in a position where I can’t confirm that this is the issue, although it would appear to be. Had you made this a configurable value rather than a hard-coded one, I could have easily confirmed or denied it. I’m having trouble with the Maven build needed to test the same change Mike made, so I’m a little stuck right now.
I can confirm that V3 (milight) and V1/V2 (limitlessled) are also broken with the binding in 2.4.
Usually the first command is handled properly, then the second takes seconds. Finally the bridge goes offline for a while: no command is handled on the bridge side, but the items get new states on the OH side. Even after the bridge comes back online, the whole thing stops working.
Sometimes processing a simple turn-on takes 30-60 seconds (if no other command is sent in the meantime, there is a chance it will work afterwards).
Hm. There is one change in 2.4 that could lead to this behaviour. The binding has a global send queue now. So every command you issue is appended to the queue. The queue works with a speed of 5 items / second if repeat is 1 (200ms delay * repeat times). So if you have a high repeat rate and a high delay time (configurable), the queue can build up.
This whole thing was done because people reported that multiple commands sent at the same time to different milight bridges would interfere with each other (makes sense, since they all send in the same frequency range). A global queue solves this issue. And now we have the problem of commands queuing up.
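The queue arithmetic described above can be sketched as follows. This is a simplified model, assuming each queued command occupies the queue for exactly delay * repeat milliseconds and nothing else adds latency; the class `QueueEstimate` and its methods are made up for illustration and are not part of the binding.

```java
/** Hypothetical back-of-the-envelope model of the global send queue. */
final class QueueEstimate {

    /** Commands per second the queue can process for a given delay and repeat count. */
    static double commandsPerSecond(int delayMs, int repeat) {
        return 1000.0 / ((long) delayMs * repeat);
    }

    /** Milliseconds needed to drain the queue: commands * delay * repeat. */
    static long drainMillis(int queuedCommands, int delayMs, int repeat) {
        return (long) queuedCommands * delayMs * repeat;
    }

    public static void main(String[] args) {
        // 200ms delay, repeat 1: the 5 items / second mentioned above.
        System.out.println(commandsPerSecond(200, 1));
        // 20 queued commands, 300ms delay, repeat 3: 18000ms until the last one is sent.
        System.out.println(drainMillis(20, 300, 3));
    }
}
```

Under this model, a rule that fires 20 commands at once with a 300ms delay and a repeat of 3 would keep the last lamp waiting about 18 seconds, which shows how quickly a backlog can build up.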
I have tested the 2.5.0-SNAPSHOT a bit. It looks better, but not perfect. I have 2 bridges, both set to a 300ms delay between commands and to send commands only once.
Most of my lamps under the V3 bridge are working, except when the bridge goes down: the item state then changes without the action being performed on the lamp. The next action fixes the state anyway (not ideal, but it works).
On the other hand the V2 bridge mostly stopped working:
I was able to perform some actions, but at some point the lamps just stopped working and the following error appeared multiple times: 2018-12-29 22:31:03.191 [WARN ] [milight.internal.protocol.QueuedSend] - Failed to send Message to ‘192.168.1.112’: Socket is closed
I am going to do more tests tomorrow.
Edit: it turned out that the V2 bridge needed a restart for some reason.
Here is a snippet of the milight-related log entries. It is really unstable and goes online and offline frequently. Success depends on when the command was submitted, and when it fails for any reason it is very hard to correct; mostly there is no response at all.
I’m going to try checking the state of the bridge at the rule level to avoid these issues, but it is a really painful workaround.
What is this “Confirmation received for unsend command. Sequence number” line?
The milight V6 protocol is not documented, only reverse engineered.
Every command gets a sequence number and a confirmation is sent by the bridge, except if the sent command is not expected and the session is closed. The current implementation is absolutely unstable, I agree. But so far no working, stable solution exists anywhere on the internet, and you will find some question marks in the code. You are invited to help, of course.
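For readers wondering how a confirmation for a command that was never sent can be detected at all, here is a minimal sketch of sequence-number bookkeeping. It is not the binding's implementation: the class `SequenceTracker` is hypothetical, and the wrap-around at 256 is an assumption, since the V6 protocol is only reverse engineered.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical bookkeeping of sent commands and their confirmations. */
final class SequenceTracker {
    private final Set<Integer> pending = new HashSet<>();
    private int next = 0;

    /** Assigns the next sequence number to an outgoing command and remembers it. */
    int nextSequence() {
        int seq = next;
        next = (next + 1) & 0xFF; // assumption: sequence numbers wrap after 255
        pending.add(seq);
        return seq;
    }

    /**
     * Handles a confirmation from the bridge. Returns true if it matches a
     * command that is still waiting, false if no such command was sent
     * (the case the log line in question complains about).
     */
    boolean confirm(int seq) {
        return pending.remove(Integer.valueOf(seq));
    }

    /** Number of commands that have not been confirmed yet. */
    int pendingCount() {
        return pending.size();
    }
}
```

A confirmation whose number is not in the pending set, for example a duplicate or one from a session the bridge has already closed, would return false here and could then be logged as a warning.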