Anyone seen random item state change?

juelicher · July 27, 2023, 11:33am

Hi,

yesterday I observed a very strange issue and I am not sure what I am looking at. Maybe a bug somewhere, maybe a fluke or maybe everything worked as expected:

I have several VMs which I start manually or via openhab rules when needed. In addition I monitor these VMs with the network binding and send notifications when they are up or down:

Thing:

network:pingdevice:Network_Wireguard		[ hostname="vm1.domain.tld" ]

Item:

Switch  Network_Wireguard_Active        "Wireguard Server aktiv [MAP(network.map):%s]"          (gNetwork, gSemNetworkWireguard)            ["Status"]                  { channel="network:pingdevice:Network_Wireguard:online",        stateDescription="" [readOnly=true], listWidget="widget:Netzwerkgeraet - List widget" []  }

Yesterday at around 5 in the morning I got a notification, that this VM had been started, which should not have been the case. I checked the VMWare log files and the the log files of the VM and indeed, it was not active at this time.

The openhab event log showed, that the item changed its state to ON and after one minute, which is the interval of the network binding checking these things, back to off. So I can rule out a bug in the rule sending the notifications:

2023-07-26 05:16:31.773 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Network_Wireguard_Active' changed from OFF to ON
...
2023-07-26 05:17:34.833 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Network_Wireguard_Active' changed from ON to OFF

There was nothing strange in the openhab log files.

So I am really not sure what happened there. It might have been someone hacking into my WLAN, but I highly doubt that. The access points are showing no unknown device, I am using a very long and purely random password and it would be very unlikely, that someone hacking into my WLAN uses the IP of this VM, which is outside of my DHCP range. Beside that, wo hacks a WLAN at this time und uses it for less than a minute?

I also checke the arp cache of my openhab server, wich showed the IP with a MAC address of 00:00:00:00, which is as expected.

For the time beeing I changed the logging of the network binding to Debug and scan my network with nmap every minute,

I would be very interested to know, if someone else noticed such a behavior or has any ideas what might have happened. I have never seen this before.

I am using the openHAB 4 release version on a Debian 12 server (via apt install) and Azul Java 17 in the lates version.

Thanks for your help,
Juelicher

Wolfgang_S · July 27, 2023, 1:38pm

Did you expose any port to the internet ?
Others reporting about similar issues had opened their OH instance to the internet without securing the access. It then was easy for everyone to just click on buttons in the Main UI.

rlkoshak · July 27, 2023, 3:22pm

I would guess that there was a temporary hickup in your network and for that minute the OH machine could not in fact ping the VM. Given this Item, based on it’s name, is associated with Wireguard somehow, that hickup might also be related to something happening on the VPN itself (e.g. maybe an automated upgrade of the firmware on a device or the like).

Ping is really not reliable enough to treat a single set of missed packets to definitively indicate a machine is offline. If you want to continue to use OH for this purpose (there are much better tools for this than OH BTW) you’ll want to configure the network binding with more retries (it’s only 1 by default) and/or a longer timeout (five seconds by default), particularly for a device accessed through a VPN.

juelicher · July 27, 2023, 8:22pm

Thanks to both of you for the help!

No, the server is only reachable from my internal network. The only services reachable from the outside are openVPN and Wireguard, both on different Servers.

The strange thing is, that the VM was down and openHAB incorrectly reported it as on for one minute.

For network monitoring I am using check_mk. Maybe I should look into other solutions, but I never had the need. I am using the network binding to be able to use the state of the VMs in rules.

But I will change the number of retries, good advice! But all in all the network binding always was very reliable.

juelicher · August 14, 2023, 3:38pm

Two days ago (Saturday) I had the exact same behavior with an other IP address. This time I was better prepared , as I had set the log level of the network binding to DEBUG. In addition I am dumping the output of /proc/net/arp every few minutes.

I have the following thing:

network:pingdevice:Network_NAS [ hostname="nas.domain.tld", macAddress="AA:BB:CC:DD:EE:FF" ]

This hostname is resolved to the IP 192.168.10.115

With the following item definition

Switch  Network_NAS_Active  "NAS-I [MAP(network.map):%s]" (gNetwork, gSemNetworkNAS) ["Status"] { channel="network:pingdevice:Network_NAS:online", stateDescription="" [readOnly=true], listWidget="widget:Netzwerkgeraet - List widget" []  }

This NAS is usually switched off and was so on Saturday. Nonetheless in the events.log it is logged, that the item Network_NAS_Active changed to ON and after one minute to off again, as last time:

2023-08-12 10:32:24.821 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Network_NAS_Active' changed from OFF to ON
2023-08-12 10:33:27.902 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Network_NAS_Active' changed from ON to OFF

At the time the item state changed is logged, the IP is not logged in openhab.log.

Today I switched on the device and compared the logs. When it is on, the network binding logs the IP adress, e.g.:

2023-08-14 16:19:50.199 [DEBUG] [network.internal.utils.LatencyParser] - Parsing latency from input 64 bytes from 192.168.10.115: icmp_seq=1 ttl=64 time=0.319 ms

There are no such messages on Saturday. In addition the arp cache a few seconds before and 2 minutes after the item change was logged, shows the device with an MAC of “00:00:00:00:00:00”, which in my experience is the case, when the device is e.g. pinged but not reachable:

20230812-103201:192.168.10.115   0x1         0x0         00:00:00:00:00:00     *        enp1s0
20230812-103601:192.168.10.115   0x1         0x0         00:00:00:00:00:00     *        enp1s0

Therefore I am quite sure, that it is a bug somewhere in openHAB, but I can not say if it is in the network binding or within openHAB itself. Maybe it is possible to log, which binding triggered an item change?

Is there anything else I can do to debug the behavior and should I file a bug?

Thanks for your help,
Juelicher