OH3 and Insteon - Can't keep connection running

jacques · May 15, 2021, 3:11pm

Since migrating to OH3, I have not been able to have a stable connection to my Insteon Hub (2245-222). I’m wondering if anybody else is experiencing the same thing.

With the last version of OH2 I was running (2.5?), the connection between OH2 and the hub was very stable and reliable. I would go for months without touching the system. Now, I can only expect 2 to 3 days of service before I have to restart the hub.

The curious thing is that when this happens, I drop back to using the Insteon app on my iPhone. The app correctly tracks the status of the items and I can send commands and have the devices react correctly. However, if I use curl commands at the command line of the host RPi, I also don’t get a response back from the hub. The last time this happened, I issued similar commands from other computers on the home network and also did not see a response. If I stop and restart OH3, the log will show that the Insteon binding was not able to connect (login?) to the hub. Only way to fix it is to restart (unplug/plug) the hub and then restart OH3.

I use the default connection string “/hub2/my_user_name:my_password@192.168.1.100:25105,poll_time=1000”. I’m wondering if the poll_time of 1 second is too fast and if that should be changed to 2000 or even 3000? The DEBUG log for the binding shows that at some point, the binding is issuing commands and waiting for responses; instead of raising an error message, it just keeps waiting and re-queuing the messages indefinitely. I would think that at some point if the hub has not responded to any message for a certain period of time, the binding would go into an error state. As it is now, OH3/Insteon binding continues to operate, still accepting command and the binding keeps sending them out to the non responding hub.

This is my 3rd try using OH3. The first one was using the official docker image on a RPi4 but that proved problematic (problem with the hub and could not get SONOS to work). I then moved to using OH3 directly under Raspian OS on the same machine as was hosting the docker image (experiencer the hub issue). Finally I moved back to a dedicated RPi3 using the official Openhahian image. Other that the Insteon binding, the rest of the OH3 works as expected. I’m using the Insteon, Z-Wave, SONOS, Astro, HTTP, MQTT, Network, NTP, OpenGarage, OWM and Unifi bindings.

Again, other than migrating from OH2 to OH3, I did not change anything else in my home network. Although it is not impossible that the hub is defective, the fact that it keeps running with the official Insteon app tells me the problem is somewhere else.

Would appreciate any suggestion.

PS. Can we still downloads the 2.x version of OH? I might try going back to it to see if I can reproduce the problem. I just wish I could export the OH3 database to text files!!

ranielsen · May 16, 2021, 1:46am

If there are errors communicating to the hub, you should be seeing warning messages in openhab.log. What do you see there?

jacques · May 16, 2021, 6:00pm

There are no warning issued by the binding. The only “warning” in the log file (with DEBUG) are the ones that I generated as part of a debugging strategy. I don’t mind sharing the log file with you if you want to take a closer look. Let me know what the best way to do that would be.

//Jacques

ranielsen · May 16, 2021, 10:41pm

About all I can suggest is to set the log level for org.openhab.binding.insteon.internal.driver.hub to DEBUG and if you want more info set it to TRACE. The code that reads/writes to the hub you are using is at openhab-addons/HubIOStream.java at main · openhab/openhab-addons · GitHub.

BTW, there are only minimal differences between the 2.5 and 3.0 versions of the binding. My guess is that something is up with your hub.

jacques · May 17, 2021, 1:47pm

I had found that resource already. Might take me a while to decipher that code. I had already enabled DEBUG in the log for the Insteon binding. What I observed is that I get a last command:

2021-05-14 20:46:38.114 [DEBUG] [ding.insteon.internal.InsteonBinding] - got msg: IN:Cmd:0x50|fromAddress:3F.DE.61|toAddress:34.2D.60|messageFlags:0x20=ACK_OF_DIRECT:0:0|command1:0x00|command2:0xFF|

after which no new message is received from the hub and the log if full of entries that looks like this:

qe taken off direct: KeyPadButtonGroup(0:0:0) OUT:Cmd:0x62|toAddress:40.CF.A8|messageFlags:0x0F=DIRECT:3:3|command1:0x19|command2:0x01|
2021-05-14 20:46:53.624 [DEBUG] [nsteon.internal.device.InsteonDevice] - next request queue processed in 1498 msec, quiettime = 500
2021-05-14 20:46:53.625 [DEBUG] [binding.insteon.internal.driver.Port] - writing (500): OUT:Cmd:0x62|toAddress:40.CF.A8|messageFlags:0x0F=DIRECT:3:3|command1:0x19|command2:0x01|
2021-05-14 20:46:55.122 [DEBUG] [nsteon.internal.device.InsteonDevice] - still waiting for query reply from 40.CF.A8 for another 4502 usec
2021-05-14 20:46:57.121 [DEBUG] [nsteon.internal.device.InsteonDevice] - still waiting for query reply from 40.CF.A8 for another 2502 usec
2021-05-14 20:46:59.122 [DEBUG] [nsteon.internal.device.InsteonDevice] - still waiting for query reply from 40.CF.A8 for another 502 usec
2021-05-14 20:47:01.121 [DEBUG] [nsteon.internal.device.InsteonDevice] - gave up waiting for query reply from device 40.CF.A8

Now that I have turned DEBUG for the binding, I am hoping to catch the future instances of the issue and review the output. What is the significance of the “qe taken off direct”?

As I mentioned previously, even assuming that my hub is not working at 100%, should the binding not raise a WARNING message at some point?

ranielsen · May 17, 2021, 2:21pm

The method that sends the message to the hub is HubIOStream.getURL. If you have DEBUG logging enabled, you should see messages that start with getting, see line 280:

logger.debug("getting {}", url);

My suspicion is that the binding is getting a connection to the hub, but is hanging on getting a response back. Currently, there is a connection timeout of 30 seconds, but read timeout is not set. If it is not set, I think it defaults to no timeout.

jacques · May 17, 2021, 2:40pm

Yes, I see that message. I came up in the log file for the last time just before the lines I pasted in my previous message. Note, the debug file is full of those messages exactly 1 seconds apart which matches the poll_time=1000 in the connection string.

2021-05-14 20:46:39.090 [DEBUG] [teon.internal.driver.hub.HubIOStream] - getting http://192.168.1.182:25105/buffstatus.xml

This is the strange part - the message stopped showing up in the file for the next 45 minutes that the log file was captured. I get plenty of the “still waiting” messages but the binding is not issuing a new “getting buffer” command.

Is the read timeout a parameter that can set via the binding control in the OH3?

ranielsen · May 17, 2021, 4:02pm

I created a pull request to add a read timeout.

ranielsen · May 18, 2021, 12:50pm

@jacques, pull request has been approved and is in the latest 3.1.0-SNAPSHOT build. This should solve the hanging problem when read doesn’t return. Can you try it out and let me know what you see?

jacques · May 26, 2021, 8:20pm

Hey Rob. Yes, I will give it a try (sorry for the late response).

jacques · May 27, 2021, 1:54am

Ok. I’m up and running with 3.1.0 Build #2402. I have been using the modified Binding for 5 days now and no issues - still going strong. Just thought I would let you know.

Update: 18 June - still running strong; HUB has been up since May 26 with no interruption.

Thorne4511 · May 29, 2021, 6:30am

If you have access to another Apple device, you should receive a notification requesting To do this, press and hold the set button on your new insteon sensor until its LED begins blinking. Check that the WiFi network is up and running.

MySainsburys