WakeupTimerTask -- what is it?

Tags: #<Tag:0x00007f014c3dde30>

(Ian) #1

Most mornings my z-wave network is incredibly sluggish and some devices are no longer responding to polls, or some previously non-responsive ones came back. My debug logs are full of messages such as

NODE 22: WakeupTimerTask 0 Messages waiting, state REQUEST_NIF
NODE 65: WakeupTimerTask 0 Messages waiting, state DYNAMIC_VALUES

These messages appear in quick succession, e.g. 5 per second for a row of different nodes. These gradually proceed through various states until they all reach “DONE” and then my network works again. This can however take until lunchtime to complete some days.

I thought this was network healing so I turned it off but it’s still going on, today it’s now 11AM and the debug logs have only just gone quiet. Various devices are no longer responding to polls.

I’ve just tried to find out when this process starts, but it seems my logs have been full of these messages for about 3 days now with no clearly defined start point.

(Scott Rushworth) #2

It would be helpful to know the zwave binding version you are using. If you are not M3 or newer, then there is no heal. Turning off the heal would not be helpful, except for troubleshooting. Are you running openHABian? Check the load on your server… something could be slowing it down, bad SD card, etc. It will be guesswork without more details and zwave debug logs.

WakeupTimerTask is for battery devices… the binding is waiting for the device to wakeup.

(Ian) #3

It was just a generic question about the wakeuptimertask messages and what it’s all about and how come I get a flurry of them every day that makes my network unresponsive (even mains-powered devices have 10-30 second delays), but here’s the info.

Openhab2 2.4.0~20180924 (updated using the recommended transition process)

It’s not on a Pi, it’s a debian box with openhab2 snapshot installed, with no load issues.

As for the debug logs, it’s generated 200 megs of debug logs in 2 days, so kind of hard to know where to start with that one!

(Ian) #4

I’m just chewing my way through the logs now trying to find a possible start point to this.

(Scott Rushworth) #5

How many devices? I have a major slow down after a binding restart that lasts for an hour or so (120+ devices). Is something restarting OH or the bindings?

Cool… you do have healing then! When did this start occurring? After the upgrade? Just for giggles, double check you don’t have more than one zwave binding… list -s | grep zwave in Karaf.

(Ian) #6

I’ve got something in the region of 50 z-wave devices. The machine was last rebooted 3 days ago, the ps listing for the openhab process states it was started 3 days ago so it’s not been restarted since then, I don’t know offhand how to check for when the binding itself was last restarted.

I think it’s been doing this since the update but am not 100% sure, as it’s only relatively recently that I’ve been running with debug on all the time and checking the logs frequently.

Regarding bindings, just the one:
openhab> list -s | grep zwave
207 │ Active   │  80 │     │ org.openhab.binding.zwave

(Ian) #7

The behaviour does look similar to that seen when I reboot the machine or restart openhab, even though it hasn’t been rebooted or restarted.

As for debug logs, this is probably the most representative of them, attached with “.xml” added to get around the site upload restriction.
zwave.log.19.gz.xml (504.9 KB)

(Rossko57) #8

Bear in mind it may be the other way about, some blockage elsewhere causing a queue of wakeup checks. Symptom not cause?

(Ian) #9

I was thinking that as when I wake up it sends settings for 12 radiators at once, all battery powered, but memory tells me there’s a lot of NIF requests in there, however my memory is something I’ve grown to distrust…

I’ve just been updating the entire openhab system which was planned for this weekend anyway, so it’s had an update and restart, I’ll keep an eye on it and make sure I pay more attention being as the logs from before I disabled healing have been flushed away.

(Mark) #10

How often are your devices being polled? If they’re all being polled close together in time, I suppose it could cause a delay. If polling is frequent, try dialing back to every few hours. Or, maybe set it to Daily just to get it out of the equation.

(Ian) #11

The radiators are all polled every 30 minutes, I use the values of the radiator valves to decide whether the boiler needs to be on or off (if the most-open valve is only open 10% I turn off the boiler) so would rather keep it at that level. I am investigating methods of spacing out those polls but it seems I can’t trigger a poll from a rule. The other option would be to set each radiator to slightly different poll times so they’d become separated over time, e.g. one at 30 mins, another at 31, the next at 32 and so on.

(Mark) #12

Hmm. If there are NIFs every time the devices wake up, maybe they are not completing initialization. You can tell by checking for the existence of the node.xml files in userdata/zwave. If those files are not there for your radiator nodes, then a zwave debug log would be very helpful to see what’s going on.

(Ian) #13

There was a debug posted earlier in this thread. The problem I’ve got is that the network seems to go through this every night and once complete, devices that were responding on one day no longer respond on the next. I’ve updated to the latest snapshot, disabled network healing and will be keeping a closer eye on it this week to see if I can narrow down the circumstances a bit.

(Mark) #14

Sorry, I missed that. Let me take a look…

(Ian) #15

Thanks. The debug posted above was posted after disabling network healing, I have no idea if healing can have this kind of effect or not. The logs that were around before I disabled healing were rotated out and lost unfortunately.

(Mark) #16

What are some of the node numbers that are problematic, and what specific devices are they?

(Mark) #17

I don’t know what node 64 is, but this doesn’t look good every 5 seconds…

(Scott Rushworth) #18

Search the forum for the command refresh.REFRESH for details. You can send this command to an item to poll the channel.

(Mark) #19

Timeouts on 3 and 58 not good either.


(Ian) #20

Node 64 is a mains-powered external motion sensor, this kind of thing happens only when the morning “storm” of z-wave activity is happening which is what that log is from, during the rest of the day, including all the 30-minute radiator polling, is working fine now. Tomorrow morning it’ll probably go through another storm.

As for problematic nodes, it’s essentially all of them, the z-wave network becomes extremely unresponsive in the mornings for all devices, battery powered or mains powered, which is what I’m trying to track down. I was looking to see if there was some event that happens every day which might trigger the network traffic.