Z-wave stick goes offline after sudo reboot from cron

Hello community,

I have a problem with my Z-Wave sticks in OH3. After each weekly restart from cron, or after each Karaf bundle restart, one or even all three of my Z-Wave sticks go offline.

Recently I gave a symlink a try in order to fix it, but it seems to do nothing in my case.

It’s extra frustrating to have to restart again, sometimes twice, before all the Z-Wave sticks are online again. I’ve enabled DEBUG logging to find out what is happening, but to be honest this is where my knowledge runs out, and the only thing I can do is ask for help here…

Hopefully someone will jump in and help me solve this issue.

The logs are available as a zip file on the cloud here (just click direct download): Logs.zip

PS: It started with the summer time change. I’m not sure whether that is only a coincidence or the actual cause.
PS2: Please ignore the Modbus errors; the PLC unit has been disconnected from the LAN.

What I’ve noticed in the logs is:

	Line 36756: 2021-05-14 16:21:27.671 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x03.
	Line 36759: 2021-05-14 16:21:27.671 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x0A.
	Line 36761: 2021-05-14 16:21:27.671 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x00.
	Line 36763: 2021-05-14 16:21:27.671 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x17.
	Line 36765: 2021-05-14 16:21:27.671 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x93.
	Line 36804: 2021-05-14 16:21:27.673 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x04.
	Line 36813: 2021-05-14 16:21:27.673 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x00.
	Line 36815: 2021-05-14 16:21:27.673 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x26.
	Line 36817: 2021-05-14 16:21:27.673 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x02.
	Line 36819: 2021-05-14 16:21:27.673 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x84.
	Line 36820: 2021-05-14 16:21:27.673 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x07.
	Line 36822: 2021-05-14 16:21:27.673 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x54.
	Line 40690: 2021-05-14 16:21:27.896 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x6F.
	Line 40763: 2021-05-14 16:21:27.898 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x82.
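
To see whether these errors arrive in bursts, a rough Python sketch like the one below could tally them per second and collect the stray byte values. The regular expression is based only on the lines quoted above, the function name `summarize_oof` is made up for illustration, and the third sample line is invented to show that unrelated lines are skipped.

```python
import re
from collections import Counter

# Pattern derived only from the OOF lines quoted above (an assumption about
# the log layout, not an official openHAB log format definition).
OOF_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d{3} \[DEBUG\].*"
    r"Protocol error \(OOF\)\. Got (?P<byte>0x[0-9A-Fa-f]{2})\."
)


def summarize_oof(lines):
    """Count OOF errors per second and collect the stray byte values in order."""
    per_second = Counter()
    stray = []
    for line in lines:
        match = OOF_RE.search(line)
        if match:
            per_second[match.group("ts")] += 1
            stray.append(int(match.group("byte"), 16))
    return per_second, stray


sample = [
    "2021-05-14 16:21:27.671 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x03.",
    "2021-05-14 16:21:27.673 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Protocol error (OOF). Got 0x04.",
    # Invented line, only here to show that non-OOF lines are ignored.
    "2021-05-14 16:21:28.001 [DEBUG] [SomethingElse] - unrelated line",
]
per_second, stray = summarize_oof(sample)
print(per_second, stray)
```

For a real log, the list could be replaced by an open file object, since the function only needs something that yields lines.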

@chris

While there is likely some problem here that needs to be looked at, you can mitigate the problem somewhat by addressing the reason you are rebooting weekly. As you can see, a reboot can be risky and disruptive and should be used rarely and only in certain situations (e.g. kernel upgrades, unresponsive systems, stuff like that). Rebooting weekly on a schedule is, at best, a temporary bandaid that only masks a problem most of the time. At worst it is completely unnecessary and only forces outages and risks creating additional problems.

It’s far better to solve the root problem that causes you to reboot weekly than to schedule a reboot on a weekly basis.

The main reason for the reboots is that the Z-Wave network slows down.
The whole system is organised around 3 Z-Wave sticks: 1 direct USB and 2 via socat over LAN.
What I’ve noticed is a huge difference: approx. 20 min after a restart the system works like a charm, super fast.

I’ve done my best, or at least that’s what I think, to make it work as fast as possible:

  1. Checked that there are no dead nodes / unused nodes in the system.

  2. Eliminated battery PIR multisensors: put them on a constant power supply and changed the parameters according to the manufacturer’s guidance (for constant power supply).

  3. Used direct association, which increases the speed but cuts me off from more advanced rules.

  4. Commented out all channels that are unnecessary for me, like power, amperage, etc.

//Number:Temperature    Parter2PIRSzatniaWejscieSensorTemperature                     "Sensor (temperature)"                                                                   {channel="zwave:device:5649cceebe:node35:sensor_temperature"}
//Number               Parter2PIRSzatniaWejscieSensorUltraviolet   "Sensor (ultraviolet)"         {channel="zwave:device:5649cceebe:node35:sensor_ultraviolet"}
//Number               Parter2PIRSzatniaWejscieSensorLuminance     "Sensor (luminance)"           {channel="zwave:device:5649cceebe:node35:sensor_luminance"}
//Number               Parter2PIRSzatniaWejscieSensorRelhumidity   "Sensor (relative humidity)"   {channel="zwave:device:5649cceebe:node35:sensor_relhumidity"}
Switch           Parter2PIRSzatniaWejscieAlarmMotion                      "Motion alarm"                                                                          {channel="zwave:device:5649cceebe:node35:alarm_motion"}
//Switch               Parter2PIRSzatniaWejscieAlarmTamper         "Tamper alarm"                 {channel="zwave:device:5649cceebe:node35:alarm_tamper"}
//Number               Parter2PIRSzatniaWejscieBatteryLevel        "Battery level"                {channel="zwave:device:5649cceebe:node35:battery-level"}
Dimmer           Parter2SzatniaWejscieDimmerSwitchDimmer                  "Dimmer"                                                                                {channel="zwave:device:5649cceebe:node50:switch_dimmer"}
//Number   Parter2SzatniaWejscieDimmerSceneNumber    "Scene number"               {channel="zwave:device:5649cceebe:node50:scene_number"}
//Number   Parter2SzatniaWejscieDimmerMeterCurrent   "Electric meter (amps)"      {channel="zwave:device:5649cceebe:node50:meter_current"}
//Number   Parter2SzatniaWejscieDimmerMeterWatts     "Electric meter (watts)"     {channel="zwave:device:5649cceebe:node50:meter_watts"}
//Number   Parter2SzatniaWejscieDimmerMeterVoltage   "Electric meter (volts)"     {channel="zwave:device:5649cceebe:node50:meter_voltage"}
//Number   Parter2SzatniaWejscieDimmerMeterKwh       "Electric meter (k wh)"      {channel="zwave:device:5649cceebe:node50:meter_kwh"}
//Switch   Parter2SzatniaWejscieDimmerMeterReset     "Reset meter [deprecated]"   {channel="zwave:device:5649cceebe:node50:meter_reset"}
//Switch                Parter2SzatniaWejscieDimmerAlarmPower                         "Alarm (power)"                                                                          {channel="zwave:device:5649cceebe:node50:alarm_power"}
//Switch                Parter2SzatniaWejscieDimmerAlarmHeat                          "Alarm (heat)"                                                                           {channel="zwave:device:5649cceebe:node50:alarm_heat"}
//Number   Parter2SzatniaWejscieDimmerTimeOffset     "Clock time offset"          {channel="zwave:device:5649cceebe:node50:time_offset"}
  5. Increased the poll period to the max of 86400 - maybe this can be 0 (I need to check that).

Example multisensor setup:

UID: zwave:device:5649cceebe:node35
label: Parter_2_PIR_Szatnia_wejscie
thingTypeUID: zwave:aeon_zw100_01_010
configuration:
  config_52_1: 10
  config_54_2: 10
  config_202_1: 0
  config_204_1: 0
  group_1:
    - controller
    - node_57
  config_112_4: 3600
  config_39_1: 20
  config_9_2_00000100: 0
  config_56_1: 4
  config_58_1: 5
  config_201_2_000000FF: 1
  config_43_2: 100
  config_64_1: 1
  config_252_1: 0
  config_60_1: 2
  config_81_1: 0
  config_41_4_0000FF00: 1
  wakeup_interval: 3600
  config_50_4_7FFF0000_wo: 0
  config_255_4_wo: 0
  config_102_4: 0
  config_9_2: 2
  config_3_2: 60
  config_45_1: 2
  config_5_1: 1
  config_41_4_wo: 20
  config_55_1: 8
  config_53_2: 1000
  config_51_1: 90
  config_100_1_wo: 0
  config_201_2: 1
  config_203_2: 0
  config_50_4_wo: 0
  action_reinit: false
  config_201_2_0000FF00: 0
  wakeup_node: 1
  config_113_4: 3600
  config_9_2_00000001: 0
  config_111_4: 3600
  config_49_4_wo: 0
  config_59_1: 10
  config_49_4_7FFF0000_wo: 280
  config_57_2: 5121
  config_42_1: 10
  config_41_4_00FF0000: 0
  config_44_1: 10
  config_50_4_0000FF00_wo: 1
  config_61_1: 0
  config_40_1: 1
  config_49_4_0000FF00_wo: 1
  config_110_1_wo: 0
  config_2_1: 0
  config_101_4: 0
  config_8_1: 15
  config_46_1: 0
  config_4_1: 5
  config_48_1: 0
  config_103_4: 0
  node_id: 35
bridgeUID: zwave:serial_zstick:5649cceebe

example dimmer setup:

UID: zwave:device:5649cceebe:node50
label: Parter_2_Szatnia_wejscie_dimmer
thingTypeUID: zwave:aeon_zw111_02_003
configuration:
  config_71_1: 0
  config_92_1: 10
  config_90_1: 0
  config_248_1: -125
  group_4: []
  group_1:
    - controller
  group_3: []
  group_2: []
  config_131_1: 20
  config_112_4: 600
  config_85_4: 1179747
  config_20_1: 0
  config_21_4_000000FF: 0
  config_83_1: 0
  config_252_1: 0
  config_81_1: 0
  config_123_1: 3
  config_120_1_wo: 3
  config_121_1: 3
  config_68_4: 855638015
  config_102_1: 0
  config_21_4_0000FF00: 0
  config_125_1: 1
  config_3_1: 1
  config_66_4: 855638015
  config_64_4: 184549375
  config_129_1: 0
  config_100_1_wo: 0
  config_91_2: 50
  config_247_1: 0
  config_249_1: 1
  config_21_4_0F000000: 0
  config_130_1: 1
  switchall_mode: 255
  config_132_1: 99
  config_113_4: 600
  config_111_4: 3
  config_84_4: 301991936
  config_21_4_00FF0000: 1
  config_82_1: 0
  config_80_1: 2
  config_255_4: 0
  config_110_1_wo: 0
  config_122_1: 0
  config_101_1: 0
  config_124_1: 3
  config_69_4: 184483840
  config_103_1: 0
  config_4_1: 1
  config_67_4: 855638015
  node_id: 50
  config_65_4: 855638015
  config_86_4: 1507328
  config_128_1: 2
bridgeUID: zwave:serial_zstick:5649cceebe

Let’s say there is now a delay of 1-3 seconds between the Aeotec multisensor motion alarm and the light actually turning ON.
But very rarely, about once every 1-2 months, the PIR does not turn the light on at all.

I know there were reports of problems with the nightly heal. Have you disabled that? Assuming you have disabled that, have you run it manually since then? Perhaps the binding runs a heal when the binding boots in which case you could just reenable the heal instead of needing to restart OH.

It’s not really clear why a restart of OH would change the behavior of the Zwave network though.

The heals are enabled on each Z-Wave stick (I tried disabling them, but it caused even more problems).

Commenting items/channels you are not interested in will not reduce traffic in the Z-Wave network. Only the representation of the channel on a programming level is omitted that way, the data will still be transported.

To reduce traffic in the network you may want to disable all reports you are not interested in or reduce the frequency they are sent if you cannot disable them.

Healing the network - as far as I know - is only needed when devices change their physical location so their neighbors are no longer correct. When the devices are not moved around, a network heal is no longer needed once it has been performed successfully.
As you have more than one controller, maybe you can schedule a heal at different times for each one. That way the amount of traffic is spread over time (wireless communication is done on a shared medium, so the controllers “see” the traffic of other controllers within their own range).
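
As a small illustration of spreading the heals out, the Python sketch below assigns each controller its own hour. The controller names, the starting hour and the one-hour gap are arbitrary assumptions; the resulting hours would still have to be entered by hand in each controller's heal time setting.

```python
def stagger_heal_hours(controllers, start_hour=2, gap_hours=1):
    """Give each controller its own heal hour (0-23), wrapping around midnight."""
    return {
        name: (start_hour + index * gap_hours) % 24
        for index, name in enumerate(controllers)
    }


# Hypothetical labels for one USB stick and two sticks reached over the LAN.
schedule = stagger_heal_hours(["stick_usb", "stick_lan_1", "stick_lan_2"])
for name, hour in schedule.items():
    print(f"{name}: heal at {hour:02d}:00")
```

With a larger gap the heals are less likely to overlap if one of them takes longer than expected.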

Thanks for your suggestions.

Still, commenting out the channels will at least reduce the load on the programming level.

I’ve turned off all unneeded reports as well. These are parameters 101-103 and 111-113. In my case 101-103 are set to 0 and the report interval (111-113) is 3600. As far as I know I can increase that up to 7200? Correct me if I’m wrong, but is that all I can do regarding reports from the settings menu?

For now I have the heal set at 4am for all of the Z-Wave sticks. I will change that and put a 1h break between them. Good idea. The best time to disable the healing is a day when the whole system works like a charm, so I will wait for that (I’m not moving the devices, they stay in the same spots all the time).

That is not strictly correct. The binding will not poll any channel that is not linked, so there will normally be a reduction in traffic - but in the grand scheme of things it isn’t high unless the user has set the polling period high.

This error is normally caused by some sort of low level problem in the serial communications. It indicates that the binding and controller have got out of synchronisation - in theory, it should resync, but given you don’t know the root cause, it may be that the resync doesn’t work due to some underlying problem.
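
To illustrate what "out of frame" means, here is a simplified Python sketch (not the binding's actual code). In the Z-Wave serial protocol a data frame starts with SOF (0x01) followed by a length byte, and the only other bytes expected between frames are ACK (0x06), NAK (0x15) and CAN (0x18). Any other byte seen while waiting for a frame start is out of frame. The function names and the returned tuple layout are made up for this example.

```python
SOF, ACK, NAK, CAN = 0x01, 0x06, 0x15, 0x18


def checksum(body):
    """XOR 0xFF with every byte from the length byte through the last data byte."""
    value = 0xFF
    for byte in body:
        value ^= byte
    return value


def scan(stream):
    """Split a byte stream into valid frames, control bytes, OOF bytes and bad frames."""
    frames, controls, oof, bad = [], [], [], []
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte == SOF:
            if i + 1 >= len(stream):
                break  # length byte not received yet
            length = stream[i + 1]
            end = i + length + 2  # SOF + `length` bytes + checksum
            if end > len(stream):
                break  # frame is incomplete
            frame = bytes(stream[i:end])
            # The checksum covers everything except SOF and the checksum itself.
            if checksum(frame[1:-1]) == frame[-1]:
                frames.append(frame)
            else:
                bad.append(frame)
            i = end
        elif byte in (ACK, NAK, CAN):
            controls.append(byte)
            i += 1
        else:
            oof.append(byte)  # unexpected byte while waiting for a frame start
            i += 1
    return frames, controls, oof, bad


# Two stray bytes, then one well-formed frame, then an ACK.
data = bytes([0x03, 0x0A, 0x01, 0x03, 0x00, 0x15, 0xE9, 0x06])
frames, controls, oof, bad = scan(data)
print(frames, controls, oof, bad)
```

If the start of a frame is lost on the serial link, the remaining bytes of that frame no longer follow a SOF, so the receiver reports them one by one as OOF errors, which resembles the burst in the log above.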

I can’t help on your problems, but I have used a Zniffer to see the actual traffic and found zwave traffic that I did not think was happening, noisy devices and surprising routes. Zniffer also shows the delay between nodes on hops. Could be more complicated with three sticks. I have only one stick with 47 nodes and the zniffer is right by the stick. I’m down to about 4 frames a minute, including Acks, just the essential info. Was only $35 here in the US.

Chris, thanks for the clarification :+1:

Which stick did you use to put the Zniffer on?

I used the UZB from Silabs. I’m not an expert, but I think any other 500 series chip zstick can be flashed, if you have a spare. I followed the instructions on the web and used the free Silabs firmware.

Personally I doubt that this will help you. The problem seems to be with serial communications between the controller and the binding, or possibly the computer is too busy and it’s dropping frames. Normally if there is a problem on the network, this would manifest in other ways - typically when sending data the frames would be rejected by the controller due to lack of buffers - you typically would not see the Out Of Frame errors that are in the logs here.

Hi, thanks for checking the logs.

It’s a Dell R240, which is running at up to 2-3% of its capacity.
Maybe a new USB driver would help?

Agreed, the Zniffer may not help with this specific problem, but with a network of 3 Zsticks, I can guarantee you will learn a few things, and it is only about the cost of one new device.

Bob

I don’t disagree that it’s interesting to use the sniffer, and it’s possible to learn stuff - just that I had assumed the main drive was to solve the problem reported here, and I’m not sure it will help for that.

My context was that the initial post indicated that a weekly reboot using a cron rule, having the network in three parts, and three nightly heals (one on each zstick) was the going-in (“working”) situation. And now all that no longer works. IMO (I know this is your area of expertise, not mine) all that is not indicative of stability, is a cause of a lot of network traffic, and possibly, however unlikely, caused the reported problem. If not, it will be useful anyway to try to streamline what he/she has.

Bob

Other possibilities. There were issues with the OH3 nrjavaserial discussed elsewhere. Your error message is not the same, but it was related to serial ports. Might want to look through the conversations. Also here.

Also don’t know if you have the IP camera binding, but there is an active thread about a conflict between Zwave and IP camera.

Bob

I will check that topic later - no I don’t have IP camera binding installed.