Wake Up No More Information command to battery node not getting through

apella12 · April 3, 2021, 9:05pm

I get a lot of zniffer ‘unACK’ traffic with the controller trying to tell a battery node “Wake Up No More Information”. It appears to me that by the time the message gets to the node, the node may already be asleep, so there are a lot of retries with different routes and speeds, and finally explorer frames. zniff extract.pdf (217.6 KB)

The device in question is an ECO Door DWZWAVE25, although this is not the only device I have seen this with. The initial Wakeup notification goes smoothly, and if the door is opened some time later the route and speed are the same as for the successful wake-up notification (I believe the last working route (LWR) is retained). I’m not sure how long a battery device stays awake, or if there is a config for that (I did not find one in the manual), but from other examples it looks like the controller waits about 1 sec before sending the Wake Up No More Information. I can’t say if this is what is happening, but I thought I would ask.

Bob

Edit: Next day, more info: My new theory is that the first time “Wake Up No More Information” hits the battery device, it goes to sleep without sending an “Ack” that it got the message, and that triggers the activity. I’m going to see if I can capture this in both the zniffer and in Debug mode.

Edit 2: Caught 2 events with Zwave Debug and Zniffer. Relevant nodes are 64 and 40. It does appear that the lack of an Ack from the battery device after the “No More Info” starts the zwave traffic “storm”. From the Debug it looks like the binding gives up after about 5 seconds and about 11 seconds later pronounces the battery device asleep. The TXT is the debug file and the pdfs are extracts of what is going on in the network during the 5 seconds. Any ideas?

No More Infor Debug.txt (853.2 KB) node 64-5sec.pdf (219.6 KB) Node 40-5 sec.pdf (220.7 KB)

apella12 · April 4, 2021, 10:19pm

So more work and now bad news for me. After waking nearly all of my battery devices and zniffing the traffic, it is only the ECO DWZWAVE25 that does not send an Ack after “No More Information”. I have 6 of them. I would appreciate any ideas about a workaround. I wonder if the wake-up class can be removed from the device database? So far I have just set the wake-up interval to 86400 (seconds, i.e. once a day per device) to limit the “storms” to six a day.
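
As a rough check on the "six a day" figure, here is a minimal sketch of the arithmetic. It assumes the wake-up interval is in seconds and that every device wakes exactly on schedule. The class and method names are made up for illustration and are not part of openHAB or the binding.

```java
// Hypothetical helper for illustration; not part of openHAB or the Z-Wave binding.
final class WakeupStormEstimate {
    static final long SECONDS_PER_DAY = 86_400L;

    // Scheduled wake-ups per day across all devices, assuming each wake-up
    // can trigger at most one "storm". Integer division is used, so an
    // interval longer than a day counts as zero wake-ups per day.
    static long wakeupsPerDay(long wakeupIntervalSeconds, int deviceCount) {
        if (wakeupIntervalSeconds <= 0 || deviceCount < 0) {
            throw new IllegalArgumentException("interval must be > 0 and deviceCount >= 0");
        }
        return (SECONDS_PER_DAY / wakeupIntervalSeconds) * deviceCount;
    }
}
```

With an interval of 86400 and six devices this gives 6 wake-ups a day; an hourly interval of 3600 would give 144.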

Bob

apella12 · April 9, 2021, 6:47pm

Obviously I’m on my own on this one. Below is an update, if only to help me track what I have done.

In no particular order:

Reached out to the retailer (Smartest House). They claim the device acknowledges (Ack) their version of “Go to Sleep” with Home Assistant. They thought the Ack’ed “go to sleep” command was sent within 200 ms of the Wake-up. They also (as I mentioned in the post above) suggested the Wake-up be removed.
I don’t want to try the Wake-up class removal until I understand (or someone tells me, please!) what will happen. Ideally OH3 would still respond to the Wake-up Notification from the ECO and send queued commands, but just not send the “Wake Up No More information”, as that is what is causing the storm.
Under my theory that the device goes to sleep on its own while waiting for OH3 to send “Wake Up No More information”, I created a custom Zwave .jar with sleep parameters of 500 ms and 3.5 sec, which seems to reduce the zwave “storms”. I have actually observed the ECO devices Ack the “Wake Up No More information” command on occasion. After examining hours of zniffer logs to pick out “wake” related commands, I was going to try 200 ms (as suggested by the retailer) but back the other parameter off to 4.0 sec, since some other devices get the “Wake Up No More information” before they are done communicating with the controller, and then there is a “storm”. Realistically I figure there are going to be “storms” regardless of these settings; I am just looking for the best fit with my devices and my network mesh.
I have purchased another type of door sensor for next week.

Bob

chris · April 9, 2021, 7:04pm

Sorry for the lack of support recently - I’m currently moving from the UK to NZ and looking for somewhere to live, changing jobs etc - it’s a busy time here.

No - if you remove the wakeup class, the device will no longer work as the binding will not be able to track when the device is awake (since this is the purpose of this class!). This will cause the binding to think the device is effectively a mains device and it will try and communicate with it when it is asleep which will clearly not work.

apella12 · April 9, 2021, 9:47pm

Sorry, I did know from other posts you were busy. I did not intend that as a criticism. It’s all volunteer, right?

However thanks for the information on the wake-up class. That eliminates that as an option.

Unfortunately the sleep timing parameters noted above have yielded conflicting results. The devices sometimes respond to the “go to sleep” with an Ack and sometimes they don’t. I had one give an Ack after 2.8 seconds of idle time (on the 3.5 sec clock) and another not respond after 0.3 sec (on the 0.5 sec clock).
It’s probably just a coincidence, but the one that responded was a singlecast and the “no Acks” are routed (one or two hops).

Anyway I’ll keep at it. Take care of yourself.

Bob

PS: In case you do get a moment, here is a better debug than the one above. Nodes 40, 45 & 34 all fail to Ack using the 0.5/3.5 .jar. zwave.log (226.6 KB)

chris · April 10, 2021, 3:46am

I’m actually at a bit of a loss to understand the problem you’re having. If the binding doesn’t get the ack, then it still assumes that the device received the command and went to sleep, so what is the problem you’re experiencing?

apella12 · April 10, 2021, 12:47pm

Fair point. From a strictly OH point of view the binding handles the non-Ack from the device.

The issue for me is all the Zwave frame traffic and congestion (zniffer-pdfs above) for the 5-6 seconds the controller is trying to get the message through to the device. Although the consequences are unclear, my personality type hates to see this kind of flailing.

Besides the obvious option to replace the devices, my post here was to understand if something could be done on the OH side, even though it is the device, not the binding or OH. Hence my question about the Wake-up class in the device database and the fiddling with the sleep timers in the binding. Although at this point, I think I have exhausted those options (unless you have something else). I do have a new type of device on the way for testing.

Since you are busy and I’m retired don’t spend anymore time on this. I’ll either live with it or replace the devices.

Bob

apella12 · April 15, 2021, 8:18pm

A short denouement: Not an OpenHAB issue, not a Zwave binding issue, maybe not a device issue. There are two scenarios in the attached pdfs of zniffer monitoring, both with ECO Door DWZWAVE25 devices. Node 29 works as expected. Node 40 does not respond to the “No more Information”, causing 5 seconds of frantic zwave activity until the binding times out and declares the device asleep. The only difference is that Node 29 is in direct communication with the controller and Node 40 is routed. My other routed ECO Door DWZWAVE25 devices behave like Node 40. My off the wall theory is that the routed devices hear the “No more Information” command directly from the controller a fraction of a sec before the routed signal arrives and go to sleep, so they do not respond when the routed command arrives. Regardless of whether that is true or not, I have replaced 3 of the six with another device, kept the working node, and am living with the other two. I adjusted the wakeup on the “living with” nodes to daily so as to minimize the zwave frame events.

Also did contact the maker, but they were of no help. I’m done with this little project.

Bob

node -29 dwzwave25.pdf (225.3 KB) node 40 dwzwave25.pdf (249.6 KB)

apella12 · June 6, 2021, 6:55pm

So I did spend additional retirement time on this… In reviewing the Silabs requirements on sleeping nodes I noticed they are required to sleep at the sooner of receiving “Wake_Up_No_More_Information (WUNMI)” or ten seconds after the last transaction. So I decided to bypass the WUNMI command in the ZWave Node class, recognizing this will keep the device awake (and drawing battery) for up to 10 extra seconds once or twice a day. I have been running like this for about a week without any perceptible change in battery charge (rechargeable batteries are Plan B). I also adjusted the sleep timer parameters, so the Node would be declared asleep in OH around the time the ten seconds expired. I’m sure this is not compliant with Silabs requirements for a ZWave binding.
ZWaveNode-hum-no sleep-timermods.txt (54.2 KB)
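
To make the timing rule concrete, here is a minimal sketch of "sleep at the sooner of WUNMI or ten seconds after the last transaction". It only models the rule as described in this post. The class name, method name, and the millisecond timestamps are hypothetical and do not come from the binding.

```java
import java.util.OptionalLong;

// Hypothetical model of the rule as described in the post; not binding code.
final class SleepDeadlineSketch {
    // Idle limit as read from the Silabs requirements mentioned in the post.
    static final long IDLE_LIMIT_MS = 10_000L;

    // Time (ms) at which the node is expected to go to sleep: the sooner of
    // receiving WUNMI or the idle limit after the last transaction.
    static long expectedSleepAtMs(long lastTransactionMs, OptionalLong wunmiReceivedMs) {
        long idleDeadlineMs = lastTransactionMs + IDLE_LIMIT_MS;
        return wunmiReceivedMs.isPresent()
                ? Math.min(idleDeadlineMs, wunmiReceivedMs.getAsLong())
                : idleDeadlineMs;
    }
}
```

With WUNMI bypassed (an empty `OptionalLong`), the node is expected to fall asleep at the idle deadline, which is why the sleep timer in OH was lined up with the ten seconds.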

One discovery while testing: these changes allowed my test battery device to be discovered and fully configured in one shot. For comparison, with the unaltered class the configuration of the same device spanned 3 action button pushing episodes. Knowing that battery device initialization is a frequent forum topic, I was wondering if lowering the WUNMI priority might allow more initialization before sleep. Looking to find a middle ground, I tried this together with sending the WUNMI message with only the ACK transmit option, but failed spectacularly. Having no Java training, this was not a surprise. If this is a non-starter, let me know.

Here is the log of the one shot initialization.
node83 initial-one shot.txt (599.9 KB)
Note near the end of the file, the node awoke again, giving me some comfort that it did go to sleep without the WUNMI command.

Bob

chris · June 7, 2021, 4:35am

In theory this is what the binding does, but I suspect there’s a bug in there and I’ve not had the time to look for it. The binding should detect if the device is initialising and increase the timer. It’s always a balance between this, and people complaining that their device batteries are being drained. It’s ok when you’re only waking up once a day or even once an hour, but some devices wake up often, and then it can become a problem.

apella12 · June 7, 2021, 3:26pm

First, it hit me today that changing the WUNMI priority in the WakeUpCC would not work (as an alternative to not sending at all) because I found the code in the TransactionMgr that bumps the priority to immediate if the node is battery.

Here are my observations so far on initialization:

The above thought led me to look at the transaction priority of the other initialization commands and for reasons I have not found yet, not all are bumped to 2, despite the log saying they are, so it appears the WUNMI jumps the queue and stops initialization. I did a base case initialization on the same device as above with the standard binding (no changes by me) that shows this at the 5 second mark (13:13:31.320) after the first awake (13:13:26.214). The next command after reawake is another Version CC with priority 3.
node35 initialization-unaltered-jar.txt (595.6 KB)
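
For anyone checking gaps like this in their own logs, here is a small sketch that computes the interval between two log timestamps. It assumes both timestamps are from the same day and use the `HH:mm:ss.SSS` form shown above. The class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical log-reading helper; not part of the binding.
final class LogGapSketch {
    // Milliseconds between two same-day log timestamps in HH:mm:ss.SSS form.
    static long gapMillis(String earlier, String later) {
        return Duration.between(LocalTime.parse(earlier), LocalTime.parse(later)).toMillis();
    }
}
```

Calling `LogGapSketch.gapMillis("13:13:26.214", "13:13:31.320")` gives 5106 ms, which is the "5 second mark" referred to above.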
There is a timer related quirk I noticed on only the very first “Node is awake” in both files. The Manufacturer Specific message is composed, but is not delivered until after the first timer delay period. There also seems to be a command that needs to be cancelled (despite the “Node awake” saying there are no messages in the queue and prior lines indicating it was “Acked”) that takes up more time, so it is about 4 seconds before initialization resumes apace. In one of my failed experiments I had too long of a timer delay and the node went to sleep and I could never get to the Manufacturer Specific response. That is why I changed the Sleep timer the way I did. It appears this only happens the very first Awake.

Bob

chris · June 7, 2021, 7:24pm

Really this should not matter since the no-more-info command should only be sent if there are no messages in the queue at all for the device in question.

I’m not sure what you mean by this. All messages should be delivered to the converter quickly and this should not depend on the state of the device being awake - this should be the same for standard messages or manufacturer specific messages.

apella12 · June 7, 2021, 9:27pm

Really this should not matter since the no-more-info command should only be sent if there are no messages in the queue at all for the device in question.

Ok. I do not know how quickly some of these initialization commands get loaded into the queue on a PI3b and if they depend on data from a previous request, so there could be an empty queue right around when the 5 second Delay timer expires (both times) and therefore the WUNMI was sent. It seemed too coincidental, but is certainly possible.

I’m not sure what you mean by this

All I was saying is that there is a 4 second gap near the beginning of the initialization with very little activity, and that left only 1 second before the WUNMI was sent. Picture from Node 35 file above.
[Image: four second gap, node 35]

My thought was to selectively lower the priority of the WUNMI and run an experiment to see if it would sit at the bottom of the queue until the device was fully initialized (equivalent to not sending the command at all), but if the queue is frequently empty during the initialization process, that won’t work. I’ll just stick with what I have I guess.

Thanks for the feedback,

Bob

chris · June 7, 2021, 9:59pm

It shouldn’t take 5 seconds to queue these commands - if it’s taking this amount of time, then there will likely be other problems, and in general, 5 second delays would result in a really unsatisfactory user experience so I do not expect this is normal at all.

I also don’t know what the 5 second delay you talk about even is. Do you mean the transaction timeout? If so, this is only going to trigger if there is a timeout - again this is a non-standard situation.

Again, this is caused by the timeout - you presumably have another device in your system that is not responding. The queue is 8 packets long at this point, and there is a transaction that is blocking the commands from being sent.

This has nothing to do with the wakeup timer - although the wakeup timer is being “bypassed” due to this delay.

This is not the way it works - the no-more-information is not queued if there are messages to the device queued. You definitely should not rely on the queue priority for this since most of the time there will be nothing in the queue, and this will mean that the no-more-information command will be sent immediately.

This is handled in a totally different area of the code - not the queue priority. Queue priority will simply not work in order to solve this issue.
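
A minimal sketch of the behaviour described here, to show why queue priority cannot hold the command back: the check is simply whether anything is pending for that node. This is an illustration under that assumption only. The class, method names, and the command string are hypothetical and the binding's real code is structured differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; the binding's real classes differ.
final class NoMoreInfoGateSketch {
    private final Map<Integer, Deque<String>> pendingByNode = new HashMap<>();

    // Adds a command to the pending queue of the given node.
    void enqueue(int nodeId, String command) {
        pendingByNode.computeIfAbsent(nodeId, id -> new ArrayDeque<>()).addLast(command);
    }

    // Removes and returns the next pending command for the node, or null if none.
    String pollNext(int nodeId) {
        Deque<String> queue = pendingByNode.get(nodeId);
        return queue == null ? null : queue.pollFirst();
    }

    // The gate: "no more information" may be sent only when nothing at all
    // is pending for this node. Priority plays no part in the decision.
    boolean maySendNoMoreInformation(int nodeId) {
        Deque<String> queue = pendingByNode.get(nodeId);
        return queue == null || queue.isEmpty();
    }
}
```

If the queue for a node happens to be empty between two initialization steps, the gate opens at once, whatever priority the command would have been given.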

apella12 · June 9, 2021, 12:51am

So today I focused on the 4 second delay. The “Add Node command (0x4A) Stop (0x05)” is timing out.

Bob

Final edit (I hope)
Gave up on a GitHub PR. What solves this for me is to comment out three lines, so it hardly qualifies as a programming effort anyway. Successful debug attached.
zwave.log (651.8 KB)

In AddNodeMessageClass, line 86:
// payload.setCallbackId(0);
In AddNodeMessageClassTest, line 70:
// assertTrue(msg.getSerialMessage().getCallbackId() == 0);
In ZwaveInclusionControllerTest, line 100:
// assertTrue(txFrame.getSerialMessage().getCallbackId() == 0);

Unless this is a quirk of my Zooz controller (SDK 6.45) this should be happening to more than just me. Will be interested to see if you can duplicate once you are settled. My test device is fully configured by just installing the battery vs. problematic inclusion with multiple wakes on OH3.1 (various versions).

I also found out along the way that the Remove Node (0x4B) stop (0x05) was timing out (less of a concern, since there are no ensuing commands), but that was not fixed in a similar fashion, as I had hoped.