ZW096 Won't Complete Initialization

Just a quick update. Aeotec tested the device with the same firmware as yours and it worked OK with their test software. I’ve tested a similar device (the same model, but with different firmware as I’m in the UK) with the binding, and it also works fine.

Aeotec are going to do some more tests - probably next week…

Oh no, now I’m worried it has something to do with my setup :worried: Do you still think it’s a firmware issue or could it be something on my end? I was about to ask for help in the dev binding thread for a different device, but now I’m thinking the issues might be related. Along with that switch I bought a Nodon soft remote and it will include ok but takes a very very long time to finish discovery, even after being woken up multiple times. I’ve only been able to get it to add successfully once or twice but I’ve had to exclude and reinclude it because it seems to stop sending its scene commands after a few hours. I’ve also seen entries in the debug logs about timeouts and retrying communications (don’t have the logs in front of me right now so can’t quote exactly). This is strange because the other 5 devices I have never give me any problems.

I don’t see how it can be related to anything in your setup. The binding is sending the correct information, and the device is ack’ing this request but then not responding. The data the binding sends is also correct (i.e. the same as in the log that the Aeotec engineer sent me). So, my only thought at the moment is that there might be a problem with your device.

I don’t think this is the issue here either. In the log I looked at, the device ack’d really quickly (30 ms or so, I think) and this happened every time. So I think the device is receiving OK, and the controller receives the ack OK.

If you want to post a full debug log, I can take a look to see if there’s an issue…

Attached is the log that shows one of the failed inclusions. I currently have it added to my network via the Zensys tool. It ran fine for the first 24 hours (no messages in Habmin), and it still works as I’m typing this, but Habmin says “Node Initializing INIT_NEIGHBORS”. This message, or something similar, is usually the precursor to the remote no longer working.

I’m not sure what this log is trying to show. It’s very short and doesn’t really have much information in it. It shows an inclusion, with no devices added (the inclusion failed), then just shows a bunch of data received from different devices (nothing transmitted from the binding).

Previously we’ve not been looking at failed inclusions, so I’m not sure where this fits in and it seems totally unrelated?

Sorry about that. I’m struggling to reproduce this issue in a consistent way. I was hoping that log had something useful in it because that’s what has happened 4 out of the 8 times I’ve tried to add the remote. I’m not really sure what to give you, so I might need a little guidance on next steps. There are two issues: (1) inclusion frequently fails with this device via the binding; (2) when inclusion does succeed, or when the device is included outside the binding, it takes hours to complete discovery and it eventually stops working after about 24 hours. If either of those issues is not related to what this thread was originally about, I can start a new one. I’m making an assumption that the issues with the remote might be similar to the issues with the switch. I’ve turned down those chatty devices (I’ve been working on a graphing project) so that the logs will be clearer. How should I proceed in troubleshooting? Would deleting node 19’s XML file (the remote added via Zensys and sitting at “INIT_NEIGHBORS”) and restarting give you something useful?

Inclusion really has nothing to do with the binding - it is solely handled by the stick. If this is failing then I’m not sure there’s anything that can be done other than to try again. All the binding does is to send the command to the controller to put it into inclusion mode - after that it waits for the success, or failure, notification.

The failure of the device to respond to the message we discussed originally will prevent it from initialising correctly. This is probably the error that you have, although if you have logs with a different error then I’m happy to take a look. I would strongly suggest looking at the log yourself using the online viewer on my website first, though.

Ok, inclusion being unrelated is good to know, one less thing for me to focus on. Your log viewer is great by the way, I’ve been using it but since I don’t have much experience with broken devices it’s hard to know what “broken” actually looks like. There wouldn’t happen to be any documentation I could reference on what some of those binding log entries mean? I want to do as much self-help as I can.

I guess the next steps would be to confirm that the remote fails to reply (init_neighbors issue) and then contact Nodon support?

Sorry - my bad. I missed the fact that we had moved off the ZW096 and on to the Nodon…

Probably this device needs to be woken up - I’m not sure how that is done, but INIT_NEIGHBORS is the first thing that happens during the initialisation process, so this probably indicates that the device isn’t being seen. Presumably the device has a button you can press to wake it up?

It’s actually really easy to wake up… press any of the 4 buttons :slight_smile:. The similarity to the ZW096 is that once the device is included it can take over 2 hours to finish discovery, with me waking it up randomly about every 15 to 30 minutes. If discovery finishes, it will work for about 24 hours and then slip back into discovery again, which is where it is now with init_neighbors. Fortunately this time it’s still working, but during previous attempts OpenHAB has stopped receiving scene commands when it enters this state. Mostly what I’m hoping you can help me with is to confirm that this isn’t a systemic issue and that both devices are simply misbehaving on their own. The attached log is me switching on a light while the remote is in the state init_neighbors. Even after the node goes back to sleep, init_neighbors remains in Habmin.

Which node am I looking for? There are a couple of nodes sending wakeup messages. One of them is also apparently a listening node, so something might be wrong there (that’s node 10).

In either case, both nodes have already completed initialisation, so everything looks fine in this log.

I don’t see this in this log - the two devices sending wakeup messages have finished initialisation. Also, why does it go back to init_neighbours - are you restarting the binding? I think that’s the only reason it would go back to the start of initialisation - or maybe there’s another way that I can’t think of at the moment…

The other possibility is that HABmin is not displaying this correctly, but that seems unlikely, I think, as HABmin is just getting the information from the binding, and the binding is clearly stating that the initialisation is DONE:

NODE 10: Application Command Request (ALIVE:DONE)
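For anyone who wants to check this in their own log before comparing it with the UI, here is a minimal sketch that pulls the last reported state for each node out of a debug log. It assumes every relevant entry has the same shape as the single line quoted above, i.e. `NODE <id>: <message> (<STATE>:<STAGE>)`; the regular expression and the function name `last_reported_stage` are illustrative and not part of the binding.

```python
import re

# Assumed line shape, modelled on the one entry quoted in this thread:
#   NODE <id>: <message> (<STATE>:<STAGE>)
# Real log lines may carry a timestamp prefix, so search() is used, not match().
NODE_STATE = re.compile(r"NODE (\d+): .*\((\w+):(\w+)\)\s*$")


def last_reported_stage(lines):
    """Return {node_id: (state, stage)} from the last matching line per node."""
    latest = {}
    for line in lines:
        match = NODE_STATE.search(line)
        if match:
            node, state, stage = match.groups()
            latest[int(node)] = (state, stage)
    return latest


if __name__ == "__main__":
    sample = ["NODE 10: Application Command Request (ALIVE:DONE)"]
    print(last_reported_stage(sample))
```

If the stage printed for a node is DONE while the UI still shows an earlier stage, that would point at a display mismatch rather than a stuck initialisation.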

At this point I really need to thank you for helping me with this. I haven’t had to troubleshoot devices before so this has been a learning experience.

Node 19 is the Nodon. Node 10 is a ZW100 running on USB power that has never been added on battery.

That’s exactly the problem: shouldn’t there be a flurry of discovery activity after the device wakes up if it’s truly waiting for discovery to finish? I did restart the binding at one point or another during my troubleshooting. There is an XML for node 19, so it should finish discovery easily, which is why I don’t understand why it’s been sitting on init_neighbors. My wife has been using it constantly too, so it’s had plenty of wake-ups.

I think it’s safe to say that the Nodon and the Aeotec issues are unrelated. Given that I’ve had several days now of stable behavior, I’m OK with ignoring the init_neighbors message and seeing if it resolves itself in future builds. I hate to give up, but this issue has been so intermittent.

Ok, that makes sense then.

I thought the problem was the initialisation was stuck getting the neighbours?

Ummm - yes, but once it’s finished discovery, then it will stop, right? The state shown in the log is DONE - this means the binding is no longer initialising the device - it’s complete. Presumably earlier, this state would have been different, but I don’t see that in the log.

Sorry - this doesn’t make sense. If there’s an XML, then it means discovery has been completed. The discovery state (ie DONE) also shows that it has been completed - it is not sitting on init_neighbours according to the logfile. I don’t really see how the log can show something different to the UI - maybe it’s possible, but I think they should show the same thing.

As above, the log shows a wakeup, but since the device has finished initialisation already (according to the binding), it doesn’t do anything.

It sounds like there’s a disconnect between what’s in the logs and what is being reported by Habmin/Paper UI. Despite being completely discovered Habmin still shows the following:

init_neighbors

Strange - I’m not sure how that’s possible, but I will take a look. Is this coincident with the logs showing done (ie this image is definitely taken when the logs are showing (Alive:Done) for the status)?

Yes, nothing has changed since the logs I sent you yesterday showing no discovery activity after a wakeup. Are there any logs I can pull related to how Habmin queries the binding?

HABmin doesn’t query the binding - it gets the data from ESH. When the binding updates the state, it tells ESH the current state. There’s almost no way that I can see for these to be out of sync.

Have you refreshed the browser? (I guess so if it’s been running for a long time, but I just thought I’d ask.)

The only other thing I can suggest is to reinitialise the device (there’s an option in HABmin to do this) then wake it up a bunch of times, and then send me the log. I can see if things look ok from that…

This is the log after reinitializing. Now Habmin says:

not_communicating

This is caused by the controller reporting that the device is DEAD.

There is a test version of the security binding that bypasses this check, as the check seems to cause problems. The idea is to refactor this out before merging the security binding into master…

You might want to give it a try - it should avoid this problem at least.

That solved it! All of my devices came right up after installing the testing version. I’ll patiently wait for the merge to master.