Zwave Controller goes offline and can't recovery after night

Hi there,
my Zwave Controller (UZB-1) seems to randomly lock / hang during the night.
I’ve recently updated my OH Docker Container to 3.0.1 but the error persists.

events.log only shows one device after each other as offline:

2021-03-04 04:23:17.008 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing ‘zwave:device:57dc600655:node5’ changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller

Except for some Errors for OpenHabCloud there is nothing suspicious in openhab.log

2021-03-04 01:50:00.123 [ERROR] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance
2021-03-04 01:50:00.360 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = 1d1864a0-a894-4cdf-89a7-29e4e856a9fa, base URL = http://localhost:8080)
2021-03-04 01:50:03.678 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = 1d1864a0-a894-4cdf-89a7-29e4e856a9fa, base URL = http://localhost:8080)

so far i can “fix” / “recover” my network by opening the docker bash and deleting the /var/lock/LCK_ttyACM0 (after that the nodes slowly reappear as online)

I’ve seen some additional debugging options in regards to the serial interface though it seems like they are only temporary enabled.
Looking forward to any suggestions on what i can do to troubleshoot this further :slight_smile:

PS: Daily Heal is set to 2 AM - I’m in UTC+1 so this doesn’t match …

It’s definetly not the heal time - the nodes disappeared again …
The only thing i remember doing was decreasing the temperature at one thermostat manually (might have caused 10 update messages …) though it wasn’t node 10…

2021-03-04 08:01:54.666 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node10' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-04 08:01:54.985 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node8' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-04 08:01:54.985 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node4' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-04 08:01:55.714 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node6' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-04 08:01:55.922 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node7' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-04 08:02:13.686 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node2' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-04 08:09:00.405 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node5' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller

I’ve looked up the other forum post:

the logging option was

log:set DEBUG org.openhab.core.io.transport.serial 

and i previously got the same result. I’ll enable it now with an OH restart in order to check if other log entrys appear once the error occurs.

Reading the post further i also checked permissions of the LCK file in the docker image.
It seems that the permissions of the lock file might be ok …

root@openhab-3:/openhab# ls -l /var/lock/                                                                             
total 4                                                                                                               
-rw-r--r-- 1 openhab openhab 11 Mar  4 09:21 LCK..ttyACM0   

I’ve discovered that turning on and off the controller item in Web UI is sufficient.

Could there maybe be some misconfiguration on my controller with it marking the nodes offline before they report back?
Today I also discovered that my meter and a fibaro sensor were still online while the develo thermostats (battery powered) were all offline…

how mature is this system? Is it a clean install? Has this set up ever run with out going offline, sorry dumb question, just trying to under stand.
And the docker container is running what operating system?

Hi,

I’ve previously used the hardware with OH 2.5 (directly on the synologyw/o docker) and it worked without a charm, at least I never noticed the thermostats going offline.

The software itself (docker container) was a fresh install with OH 3.0.0 and i downloaded it from the official repository. Host system is a synology, but I suppose the docker containers run debian ?!

I’ve mapped 3 folders as per download manual on the OH website to folders on my synology, also the container itself is running with higher permissions in order to be able to access the serial device.

While trying to troubleshoot the issue I replaced the docker container with a newer 3.0.1 version, though not completely clean install because this time I recovered the folders from the 3.0.0

+update from this morning, after the night something went wrong again, this time neither the fibaro nor the main powered meter were responsive and I had to disable and enable the controller again. After that all devices recovered…

I know you do not think it is heal related and it probably is not directly but I would suggest you disable heal.

For sure disabling heal was worth a try :slight_smile: I did it last week it still shows as disabled, but the problem reoccurs anyway.

Is this a big network > 100 nodes or just a few nodes?

Hi @robmac
I think its 10nodes only- so I guess you consider it small.

Yes very small. It seams unlikely that this is a z-wave issue.

Any chance it could be a driver issue on the Synology as I think they stopped updating a lot of USB drivers a while back.

Unlikely because the system used to be stable with OH 2.5.
Though I didn’t use docker before.

@antamy I see your other issue has been marked as resolved.
You disabled heal time in the controller thing item did you? (Doesn’t do the trick for me …)

The change to use docker is not exactly a small change and also a java version change. The serial handling is something that the binding has little control over but java and docker are all involved between the OS and any OH code.

Of all of the changes docker is probably the most significant change.

Should I try setting up a OH 2.5 docker container for debugging ?

Unfortunately I had to open another issue today. While debugging my things overview broke…

Not something I could advise you on but not sure what you gain as the stack is still different.

yes, disabling Heal fixed my issue. But that was not the controller going offline, it was all the nodes going offline. I used to have this controller (an early Aeotec USB stick) running on a Synology with the drivers from the Domoticz package. Not a docker install. I would regularly have the controller go offline and there was no way other than to unplug it and replug it, with all the rigamarole of setting ownership and permissions. OH2 was always so slow on my Pi, which is why I moved to Synology. With OH3, I decided to try the Pi again and it is very snappy (used to take 5-10 minutes to start with OH2) and I’ve had no problems with the Z stick.

Another thought, you can try disabling / enabling the USB stick. I didn’t have great success with this on Synology, but you could try it.

Use dmesg -T | grep tty to see the id (?) of the stick, e.g. : usb 1-1.4: cp210x converter now attached to ttyUSB0

Then use echo '1-1.4' | sudo tee /sys/bus/usb/drivers/usb/unbind followed by echo '1-1.3' | sudo tee /sys/bus/usb/drivers/usb/bind and see if that restarts the port.

Hi everyone,
after enabling DEBUG Logging for zwave i can present you with more detailed log files, each before and after recovering the Zwave Network by clicking Sync on the controller …
both events and zwave.log showed activity after the reset, openhab.log didn’t have any new entries (therefore only one file)

zwave.log was split in files each 16MB. In order they are zwave.log1, zwave.log1, zwave.log + isplit another one which only has the log entrys after reset events_afterSync.log

The number of characters in these files crash the forum … therefore a link to the complete set of files: Connecting...

I suppose the most interesting parts are:

events.log (around 4AM on 18th) because it shows that the sensors were updating at that time and then suddenly dropping offline

2021-03-18 03:59:03.185 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'FlurThermostat_Sensortemperature' changed from 21.72 °C to 21.53 °C
2021-03-18 04:00:00.272 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'KucheThermostat_Sensortemperature' changed from 19.59 °C to 19.71 °C
2021-03-18 04:00:05.734 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'BadezimmerThermostat_Sensortemperature' changed from 22.11 °C to 22.15 °C
2021-03-18 04:00:25.079 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'WohnzimmerThermostat_Sensortemperature' changed from 19.97 °C to 19.72 °C
2021-03-18 04:03:45.920 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'ArbeitszimmerThermostat_Sensortemperature' changed from 18.46 °C to 19.2 °C
2021-03-18 04:03:54.753 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'FlurThermostat_Sensortemperature' changed from 21.53 °C to 21.5 °C
2021-03-18 04:04:00.917 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'openweathermap:weather-and-forecast:8fa826c227:local' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Total timeout 10000 ms elapsed
2021-03-18 04:04:00.924 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'openweathermap:weather-api:8fa826c227' changed from ONLINE to OFFLINE
2021-03-18 04:04:53.375 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'KucheThermostat_Sensortemperature' changed from 19.71 °C to 19.79 °C
2021-03-18 04:05:05.058 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'BadezimmerThermostat_Sensortemperature' changed from 22.15 °C to 22.13 °C
2021-03-18 04:05:14.727 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'WohnzimmerThermostat_Sensortemperature' changed from 19.72 °C to 19.7 °C
2021-03-18 04:16:47.054 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node6' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-18 04:22:14.693 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node2' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-18 04:24:48.725 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node4' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-18 04:27:48.474 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node8' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-18 04:27:48.623 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node10' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-18 04:30:07.015 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node7' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
2021-03-18 04:36:54.900 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node5' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller

events.log after the reset/sync

2021-03-18 06:54:59.415 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:serial_zstick:57dc600655' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Controller is offline
2021-03-18 06:54:59.498 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node4' changed from OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:54:59.499 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node5' changed from OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:54:59.500 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node9' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:54:59.501 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node3' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:54:59.501 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node2' changed from OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:54:59.556 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node10' changed from OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:54:59.557 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node8' changed from OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:54:59.558 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node7' changed from OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:54:59.560 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node6' changed from OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:55:00.914 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:serial_zstick:57dc600655' changed from OFFLINE (COMMUNICATION_ERROR): Controller is offline to OFFLINE (BRIDGE_OFFLINE): Controller is offline
2021-03-18 06:55:38.974 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:serial_zstick:57dc600655' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:38.982 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node9' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:38.984 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node5' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:38.985 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node3' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:38.987 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node2' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:38.988 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node4' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:40.482 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node3' changed from ONLINE to ONLINE: Node initialising: REQUEST_NIF
2021-03-18 06:55:40.770 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node10' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:40.770 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node6' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:40.774 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node7' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:40.777 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node8' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2021-03-18 06:55:41.334 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'SteckdoseKuechenlicht_Electricmetervolts' changed from 232.5 to 232.2
2021-03-18 06:55:41.941 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:57dc600655:node3' changed from ONLINE: Node initialising: REQUEST_NIF to ONLINE
2021-03-18 06:55:45.489 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'ArbeitszimmerThermostat_Sensortemperature' changed from 19.2 °C to 18.22 °C
2021-03-18 06:55:58.061 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'SchlafzimmerThermostat_Sensortemperature' changed from 17.26 °C to 17.22 °C

zwave.log extracts seem to be to large to post here, please use the link which shows as “conneting” in the post above.

Does the more complete log help debugging?

btw. node 1 is my controller,
node 3 is a main powered powermete /switch
node 9 a fibaro motion sensor
node 2 a comet-zwave thermostat
all other nodes 4,5,6,7,8,10 are develo thermostats

Do your logs show [DEBUG] messages? Those are what are needed to see what’s going on. There is also a tool that you can import your logs to for easier viewing - Z-Wave Log Viewer. That may make it easier for you to track down what is happening when everything goes offline.