Items state going Crazy at the end of the VM snapshot

  • Platform information:
    • Hardware: QNAP NAS Running Virtual Station VM (VMS is 4 cores 2GB of RAM 50 GB of hdd)
    • OS: Ubuntu 16.04.5 LTS_
    • openHAB version: 2.3.0 -1 stable

Dear, I have a quite a strange issue with my environment. I run OH in VM on the large AMD Ryzen -8 core QNAP NAS. It runs nice. I have a backup task every day at 2:30 AM. Takes about 20 minutes to make a VM snapshot. Sometimes but not always at the end of the snapshot My OH goes completely crazy.
In one instance all items loose the state and go UNDEF or change to some default values, relays change the state. So I wake up in a house with half of the light are lit, and some important subsystems not running, like heating circulation pump In the event log I see it happens in one instant moment. From that moment OH continues to work normally, with sensor data flowing in, most of the items gets the data back. but some relay are left in the position they should not be.
What it could be, how to trouble shoot that issue further.

2018-12-06 02:49:18.136 [vent.ItemStateChangedEvent] - GF_Temperature_tk1_Bedroom changed from 25 to UNDEF
2018-12-06 02:49:18.136 [vent.ItemStateChangedEvent] - B_Temperature_tjm_Office changed from 22.81 to UNDEF
2018-12-06 02:49:18.137 [vent.ItemStateChangedEvent] - office_motion changed from CLOSED to UNDEF
2018-12-06 02:49:18.137 [vent.ItemStateChangedEvent] - GF_Temperature_t3d_Kitchen changed from 24 to UNDEF
2018-12-06 02:49:18.137 [vent.ItemStateChangedEvent] - B_Temperature_t83 changed from 25.31 to UNDEF
2018-12-06 02:49:18.137 [vent.ItemStateChangedEvent] - B_Humidity_un2 changed from 34.6 to UNDEF
2018-12-06 02:49:18.137 [vent.ItemStateChangedEvent] - G_Temperature_t2q_Gardenbox changed from 9.37 to UNDEF
2018-12-06 02:49:18.138 [vent.ItemStateChangedEvent] - boiler_outb6_heater_top changed from OFF to UNDEF
2018-12-06 02:49:18.138 [vent.ItemStateChangedEvent] - B_Temperature_tgS_heating_water changed from 27.43 to UNDEF
2018-12-06 02:49:18.138 [vent.ItemStateChangedEvent] - G_Temperature_t2q_Gardenbox changed from UNDEF to 9.37
2018-12-06 02:49:18.138 [vent.ItemStateChangedEvent] - GF_Temperature_tk1_Bedroom changed from UNDEF to 25
2018-12-06 02:49:18.138 [vent.ItemStateChangedEvent] - B_Temperature_tjm_Office changed from UNDEF to 22.81
2018-12-06 02:49:18.138 [vent.ItemStateChangedEvent] - Pool_in0 changed from OPEN to CLOSED
2018-12-06 02:49:18.139 [vent.ItemStateChangedEvent] - Pool_in0 changed from CLOSED to OPEN
2018-12-06 02:49:18.139 [vent.ItemStateChangedEvent] - Pool_in1 changed from OPEN to CLOSED
2018-12-06 02:49:18.139 [vent.ItemStateChangedEvent] - Pool_in1 changed from CLOSED to OPEN
2018-12-06 02:49:18.139 [vent.ItemStateChangedEvent] - Pool_in2 changed from OPEN to CLOSED
2018-12-06 02:49:18.139 [vent.ItemStateChangedEvent] - Pool_in2 changed from CLOSED to OPEN
2018-12-06 02:49:18.139 [vent.ItemStateChangedEvent] - Pool_in3 changed from OPEN to CLOSED
2018-12-06 02:49:18.140 [vent.ItemStateChangedEvent] - Pool_in3 changed from CLOSED to OPEN
2018-12-06 02:49:18.140 [ome.event.ItemCommandEvent] - Item 'B_office_rel5_wardrobe' received command ON
2018-12-06 02:49:18.140 [vent.ItemStateChangedEvent] - office_in0 changed from OPEN to CLOSED
2018-12-06 02:49:18.140 [vent.ItemStateChangedEvent] - B_office_rel5_wardrobe changed from OFF to ON
2018-12-06 02:49:18.140 [GroupItemStateChangedEvent] - gLights_B changed from OFF to ON through B_office_rel5_wardrobe
2018-12-06 02:49:18.141 [vent.ItemStateChangedEvent] - office_in0 changed from CLOSED to OPEN
2018-12-06 02:49:18.141 [vent.ItemStateChangedEvent] - office_in1 changed from OPEN to CLOSED
2018-12-06 02:49:18.141 [ome.event.ItemCommandEvent] - Item 'B_office_rel6_workshop' received command ON
2018-12-06 02:49:18.141 [vent.ItemStateChangedEvent] - B_office_rel6_workshop changed from OFF to ON
2018-12-06 02:49:18.142 [vent.ItemStateChangedEvent] - office_in1 changed from CLOSED to OPEN
2018-12-06 02:49:18.142 [vent.ItemStateChangedEvent] - office_in2 changed from OPEN to CLOSED
2018-12-06 02:49:18.142 [ome.event.ItemCommandEvent] - Item 'B_office_rel2_shower3' received command ON
2018-12-06 02:49:18.142 [vent.ItemStateChangedEvent] - office_in2 changed from CLOSED to OPEN
2018-12-06 02:49:18.142 [vent.ItemStateChangedEvent] - B_office_rel2_shower3 changed from OFF to ON
2018-12-06 02:49:18.142 [vent.ItemStateChangedEvent] - office_in3 changed from OPEN to CLOSED
2018-12-06 02:49:18.142 [ome.event.ItemCommandEvent] - Item 'B_office_rel1_shower2' received command ON
2018-12-06 02:49:18.143 [ome.event.ItemCommandEvent] - Item 'B_office_rel0_shower1' received command ON
2018-12-06 02:49:18.143 [vent.ItemStateChangedEvent] - B_office_rel1_shower2 changed from OFF to ON
2018-12-06 02:49:18.143 [vent.ItemStateChangedEvent] - B_office_rel0_shower1 changed from OFF to ON

Is there a description how the snapshot works. Maybe it freezes the VM (cut from ip-devices or services) for a specific time or slows it or confuses it about the date/time.

Likewise; I’d bet yo’ll find various bindings timing out devices

I’m also running OH as a VM on a QNAP TS-453 Pro, but I rarely use the snapshot feature.
Instead I rely on daily backup of openHAB’s directories (etc,usr and var)
However, from time to time, I create a manual snapshot of the VM, mainly before apt-get upgrade, for example.
Anyway, I didn’t notice the behavior you’re describing…
I suggest there is something to to with the way QNAP Virtualization Station takes the snapshots.

One suggestion I’d like to make is to pause the VM before taking the snapshot and resuming it afterwards.
I don’t know if the automatic snapshot procedure pauses the VM, but, if not, you might want to write a script that does that.
Or, write a complete backup script: pause VM, take snapshot, resume the VM.
Take a look at the virsh program. On my QNAP with QTS 4.3.4.0675 virsh is in /share/CACHEDEV2_DATA/.qpkg/QKVM/usr/bin/virsh.
Type help after you run virsh and you’ll see the commands supported.
virsh script rxample:

suspend --domain <UUID>
create-snapshot-as --domain <UUID> --name <snapshot_name>
resume --domain <UUID>

or something like that.
Maybe this approach will prevent OH going crazy :laughing:

There is very little information how the snapshots are done on the QNAP but most of the time it is seamless and I do not see any problems. I found curious thing in a logs. The crash is happening exactly in a window of those 2 events:

[OpenHAB] Completed backup task [Backup_VM] (to /VMs/OpenHAB_backup).
2018/12/06 02:50:12 - admin

[OpenHAB] Backup task [Backup_VM]: The maximum number of snapshots 32 has been reached and the snapshot Backup_VM has been deleted for taking the latest snapshot.
2018/12/06 02:48:35 - admin