OH2 Z-Wave refactoring and testing... and SECURITY

(Chris Jackson) #3156

It’s certainly not working as I expected -:

This shows that the second message, which is rejected, is sent 5ms after the first one - it should have been delayed (as the third TX message is). This seems to be common…

(Mark) #3157

Good catch. There’s 200 ms between transmissions on node 68, but the message for node 85 goes out immediately after the rejection.

(Chris Jackson) #3158

Ok, I’ve resolved the issue with the holdoff not working first time around - the request gets requeued before the timer is started, so that shouldn’t be a big issue.
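For anyone following along, here is a rough sketch of the idea, not the binding's actual code. It assumes a single global holdoff that blocks sending to every node, and it starts the holdoff before the rejected request goes back on the queue. The class and method names (`HoldoffQueue`, `next_to_send`, `on_rejected`) are made up for illustration, and time is passed in as plain millisecond values so the behaviour is easy to follow.

```python
from collections import deque


class HoldoffQueue:
    """Toy transmit queue with one global holdoff (hypothetical names)."""

    def __init__(self, holdoff_ms=250):
        self.holdoff_ms = holdoff_ms
        self.blocked_until = 0  # ms timestamp; applies to every node
        self.pending = deque()

    def submit(self, node, payload):
        self.pending.append((node, payload))

    def next_to_send(self, now_ms):
        """Return the next message, or None while the holdoff is active."""
        if now_ms < self.blocked_until or not self.pending:
            return None
        return self.pending.popleft()

    def on_rejected(self, message, now_ms):
        # Start the holdoff first, then requeue. In a threaded sender the
        # order matters: requeueing first could let the sender pick the
        # message up again before the block is in place.
        self.blocked_until = now_ms + self.holdoff_ms
        self.pending.appendleft(message)


q = HoldoffQueue(holdoff_ms=250)
q.submit(68, "GET")
q.submit(85, "GET")
first = q.next_to_send(0)         # (68, "GET")
q.on_rejected(first, 5)           # controller rejects it at t=5ms
blocked = q.next_to_send(10)      # None: node 85 is held back as well
retry = q.next_to_send(255)       # (68, "GET") once the holdoff expires
```

Because the block is global, the message for node 85 waits too, rather than going out right after the rejection.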

As a matter of interest, what sort of computer are you running on? I’m wondering if I shouldn’t add a short delay between all transactions (say 20ms) to see if we can reduce the occurrences of failures in the first place…

(Mark) #3159

Intel Core i5 overclocked to 4.2 GHz. :sunglasses:

(Chris Jackson) #3160

It’s not linked to the node, so this is just coincidence… The holdoff is a total block on all sending.

In all, this does look a lot better. I see one OFFLINE in this log -:

After this, node 68 works fine, so this is really caused by comms issues with the controller. With the first holdoff being skipped, we effectively only have a single 200ms delay here before we’re offline.

After your earlier comments about it always taking 2 attempts, I increased the 200ms to 300ms - I was thinking about dropping it back down given the issue with missing the first hold, but I might leave it and try to eliminate this sort of OFFLINE if we can.

I wonder if that’s why some people have more problems than others? Maybe adding a standard delay between transactions of 5 to 20ms would help avoid overloading the controller when people are using fast computers (just thinking out loud - comments welcome).
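To make the idea concrete, a minimal sketch of such a gap, assuming the sender remembers when the last transaction went out. The function name and the 20ms default are only illustrative, not anything in the binding.

```python
def earliest_send_time(last_tx_ms, now_ms, min_gap_ms=20):
    """Earliest time (ms) the next transaction may start.

    last_tx_ms is None if nothing has been sent yet.
    """
    if last_tx_ms is None:
        return now_ms
    # Never send sooner than min_gap_ms after the previous transaction.
    return max(now_ms, last_tx_ms + min_gap_ms)


# Previous transaction at t=100ms, sender ready again at t=105ms:
wait_until = earliest_send_time(100, 105)  # 120
```

A slow machine that is naturally ready later than the gap would see no change, so the delay would only bite on fast hosts.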

@5iver and @digitaldan - what are you guys running (probably also something fast :wink: )?

(Chris Jackson) #3161

Updated version that fixes the first holdoff is here. This uses a 250ms holdoff.

(Mark) #3162

Looks much better! Only one node is offline, and that’s a battery-powered device.

Very few REJECTED messages, and when one occurs, it’s successful on the 2nd attempt.

Sending you a log.

(Scott Rushworth) #3163

Nothing too fancy here… I’m using 10+ year old junk from the recycling pile (AMD x2 5600+ 2.8GHz w/6GB DDR2). I nearly tossed it last week when the CPU fan mount spontaneously fractured. It got lucky… I found a model for it so I printed a replacement!

(Scott Rushworth) #3164

What’s going on here? This first bit looks OK…

But then there’s a barrage of multiple reports (off screen), and then some more gets, then the node dies and comes back online…

(Scott Rushworth) #3165

Another strange one…

(Scott Rushworth) #3166

The first one was from a siren (battery powered frequently listening), the second from a dimmer, this one is from a WADWAZ-1 door sensor. I currently have 30 dead nodes and growing, all battery powered.

(Chris Jackson) #3167

I’m shocked. I felt sure you’d be using a cryogenically cooled super-computer :smile:

(Chris Jackson) #3168

Can you email me over the log?

(Scott Rushworth) #3169

On their way…

(Chris Jackson) #3170

Great - thanks. The hold-off seems to be a big step forward :slight_smile: .

I will probably merge this into the dev binding tonight (I’ll take a look at Scott’s log first).

FTR -:

(SiHui) #3171

Hey guys, that is not fair, one of those is my main working computer (upgraded with an ssd and still working fine with Win10). And yes, I bought it in 2008 … :joy:

(Scott Rushworth) #3172

My POS has got one too! :kissing_heart:

(Micael) #3173

I have always had issues starting up the Z-Wave network (~70 nodes), and I have OH running on my main server, a rackmount Core i7 at 4-something GHz, so I thought I should try the new updated version. Gut feeling is that it works much better than before, and looking in the logs also gives me the feeling that this is a step up. I have not been updating much lately, so maybe this is something that was fixed earlier during the last month or so, but I just wanted to give my feedback!

(Chris Jackson) #3174

I suspect that there is some “bad sh!te” happening at the network level. It’s a guess as we don’t have any visibility of that at the binding level, but the multiple responses are indicative of lots of retries happening, or maybe the network being congested and retries getting queued. I thought that the controller only sends a few retries though (3 I thought). In this sequence the binding is sending 1 GET request, and we get 12 REPORTs.

I will look at the OFFLINE at the end. This is another case where I shouldn’t set the device offline. Here the frame is rejected by the controller, not the device, so we shouldn’t blame the device and set it offline.
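Roughly, the decision could look like the sketch below. This is only an illustration of the rule, assuming the sender can tell the two failure causes apart; the `Failure` enum, the function name and the `max_attempts` limit are hypothetical and not taken from the binding.

```python
from enum import Enum


class Failure(Enum):
    CONTROLLER_REJECTED = "controller_rejected"  # controller refused the frame
    NO_ACK_FROM_DEVICE = "no_ack_from_device"    # device never answered


def should_mark_offline(failure, attempts, max_attempts=3):
    """Decide whether a failed transaction should take the node OFFLINE."""
    if failure is Failure.CONTROLLER_REJECTED:
        # The device was never at fault, so leave its state alone.
        return False
    # Only blame the device once it has failed to answer repeatedly.
    return attempts >= max_attempts
```

With this rule a controller rejection would only trigger the holdoff and a retry, however many times it happens.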

(Chris Jackson) #3175

Thanks - it’s really useful…