Z-Wave [SS #1447, 2.4.0.201811291745] very long response times (on/off switch takes 28s to respond)

2.4.0-SNAPSHOT Build #1447
x86 PC
Aeotech Z-Wave Gen 5 stick
~65 Z-Wave nodes

@chris could you please take a look at the debug log below?
node3 was flicked (in sitemap), but took 28s to respond. I see this throughout my whole network.
When entering a room it takes longer than usual for my Squeezebox to come on, and even longer for the lights. Delay is not consistent, but is there more often than not.
When tailing the zwave debug log, there are no frantic continuous activity.
When this log was taken, I had just power cycled my PC and also unplugged/re-plugged the Z-Wave stick.
A top looks ok. Java/OH only consumes 5-9% CPU

Z-Wave debug log

Feels like this behavior started on Saturday when I installed a new Snapshot. Upgraded again yesterday. Same.

Thanks

I’m not @chris, but I’ll throw in my $0.02. :wink:

Your log shows only 28 seconds of activity – not enough to draw too many conclusions.

However…

These things do not look good to me. Whatever node 46 is, it’s not behaving well in your network. If the timeouts are happening on a regular basis, your network will definitely be sluggish.

Maybe try excluding node 46 from your network (or figure out why it’s timing out), then see if things improve.

Not sure what this is…
1

Timeout
2

Timeout…
3

Timeout…
4

Thanks @mhilbush, for chiming in.
Good catch.
I checked node46 and it’s a Fibaro ‘The Button’ a battery device, but when I checked just now, it was quiet.
But I will monitor my /var/log/openhab2/zwave.log for timeouts.

I found that one node no longer in the net was getting Timeout retries every 8s, but that node is long gone, no longer in the net, so after a marked as fail and remove from controller in habmin that stopped.

I still got timeouts from 5-6 nodes and 3 in particular, and re-init/heal of those did not work. (stopped communicating)
I power cycled them, and viola, no timeouts for 30 minutes now, and response in net is very good.

How many timeouts are acceptable?
How often does it occur for others? (interesting to know)

Well, to sum up:
Yesterday, I became aware of a node (an Aeon Multisensor 6) that was spamming my net with BURGLAR events almost every 100ms. I think that was because it was placed in the roof looking down on my router lights and the LED display of 2 thermostats. Relocating it made it shut up.
I’m thinking that period of spam (was probably several days) could have left some of the meshing/relaying nodes in a peculiar state, such as those 3 I had to power cycle.
Also the zombie node (off net for ~3 weeks) was getting timeouts every 8s.

So, this shows that it is a good idea to monitor the network looking for talkative nodes and timeouts in the Z-Wave logs with DEBUG on. (for Z-Wave logging to own file, have a look here.)

Afterthought:
Would it be possible to monitor these things and report it via some sort of Network Health?
Logging Z-Wave DEBUG severity on a regular basis should perhaps be set up to use a RAM/tempfs disk…

Hi @OMR
Just catching up with messages - it looks like you’ve worked this out now?

This would potentially be possible - I’m not sure completely how this would look like, but we could monitor things like a high level of SPAM and provide some sort of report or channel. I’m open to ideas about this as it is something that devices do seem to do occasionally :frowning: .

Yes Chris. You (and Snapshot #1447) are in the clear.It was all issues in my network.

Some final thoughts on:

?

Well, it should be low. The occasional timeout is ok, but I would expect the timeouts to be less than 1% of traffic (or in that range). As you see - the more timeouts there are, the more the network will slow down.

OK.

And what about the timeouts @mhilbush noticed here??
Normal for a battery device?

Just brainstorming but could there be some way to get which nodes have sent :

  • most packets
  • most retransmissions
    over the last hour (or other intervals) and make these values available to the user so they can themself set thresholds or triggers based on this? If it’s even possible… :slightly_smiling_face: