@bernhard and @jhardy08 you both seem to be able to reproduce the issue, right? Can you try with a significantly different setting “Refresh Interval” in the bridge? (default is 300) and see whether it has an impact on the number of opened TCP connections (or the time until it stops working)?
@jhardy08 which firmware version do you have on your VBUS-LAN adapter?
In history the system was running for several day and then lost the connection to Resol DL2 now since 7 days the system is running without the problem no duplicate connection are visible.
Now I changed the interval to 10 and restarted the system to see the problem are coming now in shorter interval.
I let you know about the result.
What I see with netstat I have another problem with open TCP ports to my SMA Modbus device (192.168.1.174) maybe the problem is related to the Modbus Binding.
I had already set the refresh time to 600 which is the maximum allowed. I will try 10 as @bernhard and see what happens.
My VBUS/LAN adapter is a DL3 datalogger with the latest firmware 2.2.4 (201909120705).
And my openhab is 2.5.10 running on a Raspberry Pi.
what is interesting: the addresses shown are IPv4 addresses, while the tcp connection indicator gives a tcp6 - do you have IPv6 enabled in your router/DHCP server? (I don’t have a dedicated idea, just asking random questions to try to find a trace…)
Yesterday openhab stopped working, I was not able anymore to to connect via ssh to my openhab 2.5.10 on Raspberry4.
So I have done a hard rested.
Today I checked with netsat and some TCP session are open
As standard on raspi TCP V6 is enabled, I have now disabled TCP V6 in /etc/sysctl.conf and restarted the system.
After restart I have some contection to Resol DL2 what shows as TCP connections
thanks for the details. Do you agree with me if I claim, that decreasing the “Refresh Interval” made it way worse?
About IPv6: does anybody here understand what tcp6 exactly means in the netstat output? I have here tcp and tcp6 connections to different IPv4 addresses…
@danielwippermann what is the DL3 behavior on several Vbus-TCP conenctions? My Vbus-LAN closes the first connection as soon as a new one is opened, while the DL3 here seems to support several in parallel. On the other side I have also seen situations (e. g. when killing the OH process and the vbus-java lib not properly closing the connection where I have had several ones hanging in state FIN_WAIT1.
Note: the resol binding opens a new connection every “Refresh Interval” if the current connection is not in state CONNECTED, in the 2.x branch without closing the previous one - which might be the culprit.
While debugging the binding I face the same issue with the VBUS/LAN-adapter to stop responding on new connections, and there even the HTTP-Web-Interface used to configure the IP address and password is not responding. This in my eyes is an issue the resol device which we cannot solve here. On the other side: I did not yet have this effect during normal operation. For opening too many new connections this seems an issue on the binding side.
This would be very interesting to see and merge. Do you want to share the code somewhere?
With the 10s refresh interval I have one connection at startup, 7 “established” connections three hours later and the binding is stuck with 10 “established” connections, and 14 “time_wait” connections five hours later.
With 600s refresh interval I have 6 “established” connections after 8 hours of operation.
So it looks indeed that decreasing the refresh interval makes it worse.
Is there a way to log the binding activity to pinpoint the moment where the connection duplicates ?
Thanks.
to increase the amount of logging info from the resol binding. But be careful it will add A LOT of details, but I guess the interesting part will be only there if you go to trace.
You can search the logs for Connection state changed to. - Would be interesting to see if you have a lot of them. - Do you have anything else connecting the DL3? In my VBUS/LAN world only one connection is working at a time and if it happens (e. g. during debugging) that I have two OH instances (the production and the test one) accessing the same VBUS/LAN they fight for the connection, yielding a lot of RECONNECT messages.
Thanks for the confirmation. Do you also see INTERRUPTED and RECONNECTING states?
I tried to debug a little in the cases where the issue of accumulation of open TCP connections happens also for me and I believe this is a bug in the used library. I filed an issue for it.
For this I would assume that the DL3 does not connect to its own tcp live data port (7053) and by this close the other connections, so it should not interfere.
Yesterday I updated on my Raspberry Pi 4 Openhab from 2.5 to 3.0. So far, after a few adjustments, everything works as before. Unfortunately I can’t get the resol binding to work. It is not shown to me as a possible binding. I put this file (org.openhab.binding.resol-2.3.0-SNAPSHOT-2018-03-11.jar) in the addons folder. However, nothing happens. What am I doing wrong? Maybe someone can help me.