Networkhealth is reporting all IPs as reachable

lonelydime · March 30, 2016, 11:47am

I was using OpenHAB 1.7 and 1.8 on Ubuntu 15 for a while, and Network Health was working great at detecting which phones were connected to the router. I have static IPs set up for each phone’s MAC address and the following in my items file:

Switch PHONE1 "Phone1" (gMobiles) {nh="192.168.1.20"}
Switch PHONE2 "Phone2" (gMobiles) {nh="192.168.1.21"}
Switch PHONE3 "Phone3" (gMobiles) {nh="192.168.1.23"}
Switch PHONE4 "Phone4" (gMobiles) {nh="192.168.1.24"}

I switched to running OpenHAB on Windows 10, and all of a sudden Network Health constantly reports every phone as present when it isn’t. I’m running Java 1.8.0_77. I set networkhealth to DEBUG in my logger config, and this is what shows up:

2016-03-30 07:27:33.254 [DEBUG] [o.o.b.n.i.NetworkHealthBinding] - established connection [host '192.168.1.24' port '0' timeout '5000']
2016-03-30 07:27:36.254 [DEBUG] [o.o.b.n.i.NetworkHealthBinding] - established connection [host '192.168.1.23' port '0' timeout '5000']
2016-03-30 07:27:36.805 [DEBUG] [o.o.b.n.i.NetworkHealthBinding] - established connection [host '192.168.1.21' port '0' timeout '5000']
2016-03-30 07:27:36.945 [DEBUG] [o.o.b.n.i.NetworkHealthBinding] - established connection [host '192.168.1.20' port '0' timeout '5000']

192.168.1.23 and 192.168.1.24 haven’t been handed out by the router in a week, and the problem only started once I moved to Windows 10, so I’m wondering whether the different Java version is the culprit. Does anyone have any ideas?

rlkoshak · March 30, 2016, 2:39pm

To eliminate Java as the cause, just make sure you are running the latest Oracle Java and not OpenJDK (you are probably already doing this, so it likely isn’t your problem).

What happens when you ping those IPs from the command line on the Windows 10 machine?

lonelydime · March 30, 2016, 3:10pm

Thanks for the reply. The Windows machine is running the latest Oracle Java, the Ubuntu install was running OpenJDK.

When I ping a device that is connected to the router, I get a reply; if I ping one that is not on the router, I get “Destination host unreachable”.

Pinging 192.168.1.23 with 32 bytes of data:
Reply from 192.168.1.x: Destination host unreachable.
Reply from 192.168.1.x: Destination host unreachable.
Reply from 192.168.1.x: Destination host unreachable.
Reply from 192.168.1.x: Destination host unreachable.

OpenHAB always logs “established connection”, though. If I specify a port for the devices, OpenHAB does report the destination as unreachable.
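For comparison, a port-based check can only succeed if something at the target actually completes a TCP handshake, which a phone that has left the network cannot do. A minimal sketch of such a check using the standard `java.net.Socket` API follows; the class `TcpProbe` and method `canConnect` are hypothetical, and this is an assumption about what a port probe roughly does, not the binding’s code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of the NetworkHealth binding): TCP-connect reachability check.
final class TcpProbe {
    private TcpProbe() {}

    /**
     * Returns true only if a TCP connection to host:port completes within timeoutMs.
     * An absent host (timeout), a closed or filtered port, or an unresolvable name all yield false.
     */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true; // handshake completed, so something at host:port answered
        } catch (IOException e) {
            // ConnectException, SocketTimeoutException, UnknownHostException, ...
            return false;
        }
    }
}
```

The catch is that the port must be one the phone really listens on; otherwise a phone that is present is also reported as unreachable, since a refused connection returns false here.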

I should also note that if I manually set the switches to OFF via the app or the web server, the phones that are connected come back ON within the expected time frame, but 15 minutes or so later all of them flip back ON. If I restart the server, they all turn ON. I am running persistence via MySQL, but I don’t think that is what is resetting them to ON.

rlkoshak · March 30, 2016, 3:23pm

Post your Items and any relevant rules that touch those Items. Also post your NH config. This might be a bug.

Honestly, I’ve not found NH reliable enough to use on its own for presence detection, but my problem was false negatives (i.e. NH says a device isn’t there when it is) rather than false positives.

lonelydime · March 30, 2016, 3:54pm

That’s what confuses me: false negatives I could understand, but I have no idea why it detects devices that aren’t there, at IP addresses that don’t even exist on the network. Thanks for your help.

Network Health was working fine for me before I switched the server OS; I use it in conjunction with other things, like motion sensors, for presence detection. I was testing MQTT as well, but OwnTracks didn’t seem as reliable for whatever reason.

openhab.cfg

########################### NetworkHealth Binding #####################################
#
# Default timeout in milliseconds if none is specified in binding configuration
# (optional, default to 5000)
networkhealth:timeout=5000

# refresh interval in milliseconds (optional, default to 60000)
networkhealth:refresh=15000

# Cache the state for n minutes so only changes are posted (optional, defaults to 0 = disabled)
# Example: if period is 60, once per hour the online states are posted to the event bus;
#          changes are always and immediately posted to the event bus.
# The recommended value is 60 minutes.
networkhealth:cachePeriod=60

Items

Switch PHONE1 "Phone1" (gMobiles) {nh="192.168.1.20"} 
Switch PHONE2 "Phone2" (gMobiles) {nh="192.168.1.21"} 
Switch PHONE3 "Phone3" (gMobiles) {nh="192.168.1.23"} 
Switch PHONE4 "Phone4" (gMobiles) {nh="192.168.1.24"}

Rules

rule "Periodically check Presence"
when
    Time cron "0 */5 * * * ?"
then
    // someoneIsHome and minutesGone are assumed to be global vars declared at the top of this rules file
    // If phone 1 is on Wi-Fi
    if (PHONE1.state == ON) {
        someoneIsHome = true
        minutesGone = 0
    }
    else {
        // Cast the item state so it can be used in arithmetic
        var Number phone1GoneMinutes = PRESENCE_PHONE1_GONE.state as DecimalType
        phone1GoneMinutes = phone1GoneMinutes + 5
        PRESENCE_PHONE1_GONE.postUpdate(phone1GoneMinutes)
    }
    // If phone 2 is on Wi-Fi
    if (PHONE2.state == ON) {
        someoneIsHome = true
        minutesGone = 0
    }
    else {
        var Number phone2GoneMinutes = PRESENCE_PHONE2_GONE.state as DecimalType
        phone2GoneMinutes = phone2GoneMinutes + 5
        PRESENCE_PHONE2_GONE.postUpdate(phone2GoneMinutes)
    }
end

rule "phone 1 connects"
when
    Item PHONE1 changed from OFF to ON
then
    // Cast so the numeric comparison below works
    var phone1AwayTime = PRESENCE_PHONE1_GONE.state as DecimalType

    PRESENCE_PHONE1_GONE.postUpdate(0)

    // phone1ComingHome is assumed to be a global var declared at the top of this rules file
    if (phone1AwayTime >= 45) {
        phone1ComingHome = true
    }
end

rule "phone 2 connects"
when
    Item PHONE2 changed from OFF to ON
then
    var phone2AwayTime = PRESENCE_PHONE2_GONE.state as DecimalType

    PRESENCE_PHONE2_GONE.postUpdate(0)

    if (phone2AwayTime >= 45) {
        phone2ComingHome = true
    }
end

Sitemap

Switch item=PHONE1 label="Phone1" mappings=[ON="On",OFF="Off"]
Switch item=PHONE2 label="Phone2" mappings=[ON="On",OFF="Off"]
Switch item=PHONE3 label="Phone3" mappings=[ON="On",OFF="Off"]
Switch item=PHONE4 label="Phone4" mappings=[ON="On",OFF="Off"]

rlkoshak · March 30, 2016, 3:58pm

Perhaps the cache isn’t working right. Try setting cachePeriod to 0 (or commenting it out) to disable the cache. If that doesn’t work, I’d suggest opening an issue on GitHub.
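The `cachePeriod` comments in openhab.cfg describe the intended behaviour: changes are posted to the event bus immediately, and unchanged states are re-posted once per period. A minimal sketch of that decision logic, assuming this reading of the config comments (the class `StateCache` and method `shouldPost` are hypothetical, not taken from the binding’s source):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the cachePeriod semantics described in openhab.cfg (not the binding's code).
final class StateCache {
    private final long cachePeriodMillis; // 0 disables caching: every refresh is posted
    private final Map<String, Boolean> lastState = new HashMap<>();
    private final Map<String, Long> lastPosted = new HashMap<>();

    StateCache(long cachePeriodMinutes) {
        this.cachePeriodMillis = cachePeriodMinutes * 60_000L;
    }

    /** Returns true if this probe result should be posted to the event bus. */
    boolean shouldPost(String item, boolean reachable, long nowMillis) {
        Boolean previous = lastState.get(item);
        Long postedAt = lastPosted.get(item);
        boolean changed = previous == null || previous != reachable;
        boolean expired = cachePeriodMillis == 0
                || postedAt == null
                || nowMillis - postedAt >= cachePeriodMillis;
        lastState.put(item, reachable);
        if (changed || expired) {
            lastPosted.put(item, nowMillis);
            return true;
        }
        return false; // unchanged and still inside the cache period
    }
}
```

In this sketch, with `cachePeriod=60` and a 15 s refresh, an unchanged (even wrong) ON result is posted at most once an hour, so a switch set to OFF by hand can stay OFF for a while; with the cache disabled, every refresh posts the probe result again.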

lonelydime · March 30, 2016, 4:03pm

I’ll give that a shot and report back, thanks!

lonelydime · March 30, 2016, 4:09pm

Well, that was fast. I guess the cache is what made it appear to work for a while. Now when I reset the switches to OFF, they all turn back ON as soon as the next refresh interval hits. Just in case, I pinged the phone I know is on the network and one that is not:

Pinging 192.168.1.20 with 32 bytes of data:
Reply from 192.168.1.20: bytes=32 time=579ms TTL=64
Reply from 192.168.1.20: bytes=32 time=594ms TTL=64
Reply from 192.168.1.20: bytes=32 time=611ms TTL=64
Reply from 192.168.1.20: bytes=32 time=632ms TTL=64

Pinging 192.168.1.21 with 32 bytes of data:
Reply from 192.168.1.x: Destination host unreachable.
Reply from 192.168.1.x: Destination host unreachable.
Request timed out.
Reply from 192.168.1.x: Destination host unreachable.

.20 is PHONE1 and .21 is PHONE2; OpenHAB shows both as ON.

Farhanito · March 31, 2016, 5:36pm

Experiencing the same behavior. OH2, Windows 10, no cache set.

lonelydime · March 31, 2016, 6:24pm

I started writing my own batch file to replace what I was doing with Network Health, and I noticed something odd about how Windows handles pings. I’m not sure whether it’s specific to Windows 10 because I didn’t do any research, but here’s something that may explain what we’re seeing:

Pinging x.x.x.x with 32 bytes of data:
Reply from y.y.y.y: Destination host unreachable.

Ping statistics for x.x.x.x:
Packets: Sent = 1, Received = 1, Lost = 0 (0% loss)

Note that even though x.x.x.x is unreachable (the address doesn’t exist), Windows reports the ping as Received. Apparently it counts a packet as received whenever any reply comes back from the network, not only when the destination IP itself answers. I haven’t dug into Network Health’s source code, but this may explain what’s happening.
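If that is the cause, the “Received” count cannot be trusted on Windows. A stricter check is to look for an actual echo reply from the target: a line starting with `Reply from <target>:` that carries a `TTL=` field, like the .20 output earlier in the thread. A minimal Java sketch; `WindowsPingOutput.hasEchoReply` is a hypothetical helper, and it assumes English-language IPv4 ping output.

```java
import java.util.List;

// Hypothetical helper: decide reachability from Windows ping output lines.
final class WindowsPingOutput {
    private WindowsPingOutput() {}

    /**
     * True only if some line is an echo reply from the target itself (it carries "TTL=").
     * "Destination host unreachable" lines are ignored even though the summary counts them as received.
     */
    static boolean hasEchoReply(String target, List<String> lines) {
        String prefix = "Reply from " + target + ":"; // the colon stops .2 from matching .20
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(prefix) && trimmed.contains("TTL=")) {
                return true;
            }
        }
        return false;
    }
}
```

Ignoring the statistics line entirely keeps the check independent of how Windows counts unreachable replies.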

Andrew89 · April 24, 2016, 12:48am

Any update on this one? My openHAB server is also running on Windows 10 and reports all devices as connected (ON) even when they aren’t. If I disconnect a device from Wi-Fi or Ethernet, it reports OFF once, but every subsequent check says ON. That’s very annoying, as Wi-Fi presence would be perfect to include in the conditions of many rules.

Ward · May 14, 2016, 5:56am

Seeing this on Windows 7 also

john · May 26, 2016, 3:26pm

Any update on this? I’m experiencing the same thing on Windows 10. The exact same config was working fine for me on Raspbian, but since switching to Windows, all Network Health devices are always connected, with this in the log:

2016-05-26 17:02:38.825 [DEBUG] [o.o.b.n.i.NetworkHealthBinding] - established connection [host '192.168.1.50' port '0' timeout '5000']
2016-05-26 17:02:41.825 [DEBUG] [o.o.b.n.i.NetworkHealthBinding] - established connection [host '192.168.1.50' port '0' timeout '5000']
2016-05-26 17:02:43.325 [DEBUG] [o.o.b.n.i.NetworkHealthBinding] - established connection [host '192.168.1.22' port '0' timeout '5000']

Some of those devices are phones that aren’t even here; my router says they are offline, and I can’t ping them from Windows.

Farhanito · May 27, 2016, 4:17am

Same issue with the Network binding on OH2 beta3, Win10.
I tried turning the System Ping option ON and OFF; no difference.

iamjoshwilson · June 17, 2016, 7:58pm

Has anyone figured out a workaround?

I’m experiencing the same issue on OH 1.8.2, Windows 10.

proutska · July 6, 2016, 8:06pm

I temporarily solved this problem on Windows 10 and OH 1.8.2. When I changed the network location from Public to Home and disabled the firewall, Network Health seemed to work. Then I turned the firewall back on and it stopped working. I then turned the firewall off again, but it still doesn’t work.

Casa75 · September 21, 2016, 6:56pm

I’m using Windows 8.1.
The problem is solved for me by setting the networkhealth timeout to 500 ms or less in the openhab.cfg file,
e.g. networkhealth:timeout=500
Pinging from Windows with a 5000 ms timeout gives 0% loss, but with a 500 ms timeout it gives 100% loss.

Ping with 5000 ms timeout:
C:\>ping -w 5000 192.168.1.74

Pinging 192.168.1.74 with 32 bytes of data:
Reply from 192.168.1.19: Destination host unreachable.
Reply from 192.168.1.19: Destination host unreachable.
Reply from 192.168.1.19: Destination host unreachable.
Reply from 192.168.1.19: Destination host unreachable.

Ping statistics for 192.168.1.74:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss)

With 500 ms timeout:
C:\>ping -w 500 192.168.1.74

Pinging 192.168.1.74 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 192.168.1.74:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
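The same short-timeout idea can be applied when calling ping directly: in the output above, `-w 500` turns the absent host into “Request timed out.” instead of an “unreachable” reply. A minimal sketch that builds `ping -n 1 -w <ms> <host>` (on Windows, `-n` sets the number of echo requests and `-w` the reply timeout in milliseconds) and counts only replies from the host that include `TTL=`. The class and method names are hypothetical, it assumes English IPv4 output on Windows, and it is not how the binding itself pings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: Windows-only ping probe with a short reply timeout.
final class ShortTimeoutPing {
    private ShortTimeoutPing() {}

    /** Builds "ping -n 1 -w <timeoutMs> <host>": one echo request, reply timeout in ms. */
    static List<String> command(String host, int timeoutMs) {
        return Arrays.asList("ping", "-n", "1", "-w", Integer.toString(timeoutMs), host);
    }

    /** Runs the command and reports success only for a reply from the host that carries "TTL=". */
    static boolean isReachable(String host, int timeoutMs) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(host, timeoutMs))
                .redirectErrorStream(true)
                .start();
        String expectedPrefix = "Reply from " + host + ":";
        boolean echoReply = false;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Skips "Destination host unreachable" and "Request timed out." lines (no TTL field)
                if (trimmed.startsWith(expectedPrefix) && trimmed.contains("TTL=")) {
                    echoReply = true;
                }
            }
        }
        process.waitFor();
        return echoReply;
    }
}
```

For example, `ShortTimeoutPing.isReachable("192.168.1.74", 500)` would be expected to return false for the absent host above. Checking for `TTL=` as well as using a short timeout means the result does not depend on the timeout alone.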