I think this Stack Exchange discussion may provide more clues relevant to the behavior you are seeing, and to why some devices are resolving to an IPv6 address: when IPv6 is enabled, IPv6 resolution is tried first by default (Debian seems to enable it for both by default). At least in my clean Debian 12 build, the relevant line in /etc/nsswitch.conf is: hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname
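As background (my own addition, not from the linked discussion): on glibc systems the IPv4/IPv6 preference that getaddrinfo() applies can be tuned in /etc/gai.conf; for example, uncommenting this default line makes IPv4 results win:

```
# /etc/gai.conf -- uncommenting this (normally commented-out default) makes
# getaddrinfo() prefer IPv4-mapped addresses over IPv6
precedence ::ffff:0:0/96  100
```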
And of course, this Wikipedia article also adds to the understanding of the behaviors being seen.
This link also discusses Unix socket wait behaviors.
The main point here is that I didn't deliberately enable IPv6, and had no expectation that my devices would communicate over IPv6. They all have IPv4 addresses. For whatever reason, the discovery routines are finding them on IPv6, not noticing they're an existing device we already know on IPv4, attempting to communicate with them on IPv6, failing, and then leaking the socket.
We can probably agree that it'd be better in my case if it wasn't trying to use IPv6. I can control that myself, but for other people that may be beyond their experience or what they want to do. And ultimately, irrespective of the cause, when discovery fails for whatever reason it should tidy up after itself.
Interestingly, I turned off IPv6 on my Raspberry Pi. And I'm still getting the leak, but for a different reason:
2023-10-14 16:43:31.071 [DEBUG] [discovery.ShellyDiscoveryParticipant] - shellyplus1-083af2029730: Shelly device discovered: IP-Adress=[fe80:0:0:0:a3a:f2ff:fe02:9730], type=shellyplus1
2023-10-14 16:43:31.086 [DEBUG] [g.shelly.internal.api2.Shelly2ApiRpc] - shellyplus1-083af2029730: Connect Rpc Socket (discovery = true)
2023-10-14 16:43:31.087 [DEBUG] [helly.internal.api2.Shelly2RpcSocket] - shellyplus1-083af2029730: Connect WebSocket, URI=ws://[fe80:0:0:0:a3a:f2ff:fe02:9730]/rpc
2023-10-14 16:43:31.103 [TRACE] [shelly.internal.api.ShellyHttpClient] - shellyplus1-083af2029730: HTTP POST for http://[fe80:0:0:0:a3a:f2ff:fe02:9730]/rpc {"id":787696111,"src":"shellyplus1-083af2029730","method
":"Shelly.GetConfig"}
Accept-Encoding: gzip
User-Agent: Jetty/9.4.50.v20221201
Content-Type: application/json; charset=UTF-8
2023-10-14 16:43:31.104 [DEBUG] [g.shelly.internal.api2.Shelly2ApiRpc] - shellyplus1-083af2029730: WebSocket error
2023-10-14 16:43:31.105 [TRACE] [shelly.internal.api.ShellyHttpClient] - shellyplus1-083af2029730: API call returned exception
org.openhab.binding.shelly.internal.api.ShellyApiException: java.util.concurrent.ExecutionException: java.nio.channels.UnsupportedAddressTypeException
at org.openhab.binding.shelly.internal.api.ShellyHttpClient.innerRequest(ShellyHttpClient.java:213) ~[?:?]
at org.openhab.binding.shelly.internal.api.ShellyHttpClient.httpPost(ShellyHttpClient.java:144) ~[?:?]
at org.openhab.binding.shelly.internal.api2.Shelly2ApiRpc.rpcPost(Shelly2ApiRpc.java:1209) ~[?:?]
at org.openhab.binding.shelly.internal.api2.Shelly2ApiRpc.apiRequest(Shelly2ApiRpc.java:1153) ~[?:?]
at org.openhab.binding.shelly.internal.api2.Shelly2ApiRpc.getDeviceProfile(Shelly2ApiRpc.java:169) ~[?:?]
at org.openhab.binding.shelly.internal.discovery.ShellyDiscoveryParticipant.createResult(ShellyDiscoveryParticipant.java:149) ~[?:?]
at org.openhab.core.config.discovery.mdns.internal.MDNSDiscoveryService.createDiscoveryResult(MDNSDiscoveryService.java:222) ~[?:?]
at org.openhab.core.config.discovery.mdns.internal.MDNSDiscoveryService.considerService(MDNSDiscoveryService.java:214) ~[?:?]
at org.openhab.core.config.discovery.mdns.internal.MDNSDiscoveryService.serviceResolved(MDNSDiscoveryService.java:207) ~[?:?]
at javax.jmdns.impl.ListenerStatus$ServiceListenerStatus.serviceResolved(ListenerStatus.java:106) ~[?:?]
at javax.jmdns.impl.JmDNSImpl$1.run(JmDNSImpl.java:911) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.util.concurrent.ExecutionException: java.nio.channels.UnsupportedAddressTypeException
at org.eclipse.jetty.client.util.FutureResponseListener.getResult(FutureResponseListener.java:118) ~[?:?]
at org.eclipse.jetty.client.util.FutureResponseListener.get(FutureResponseListener.java:101) ~[?:?]
at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:732) ~[?:?]
at org.openhab.binding.shelly.internal.api.ShellyHttpClient.innerRequest(ShellyHttpClient.java:180) ~[?:?]
... 15 more
Caused by: java.nio.channels.UnsupportedAddressTypeException
at sun.nio.ch.Net.checkAddress(Net.java:161) ~[?:?]
at sun.nio.ch.SocketChannelImpl.checkRemote(SocketChannelImpl.java:816) ~[?:?]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:839) ~[?:?]
at org.eclipse.jetty.client.AbstractConnectorHttpClientTransport.connect(AbstractConnectorHttpClientTransport.java:100) ~[?:?]
at org.eclipse.jetty.client.HttpClient$1.connect(HttpClient.java:665) ~[?:?]
at org.eclipse.jetty.client.HttpClient$1.succeeded(HttpClient.java:642) ~[?:?]
at org.eclipse.jetty.client.HttpClient$1.succeeded(HttpClient.java:634) ~[?:?]
at org.eclipse.jetty.util.SocketAddressResolver$Async.lambda$resolve$1(SocketAddressResolver.java:186) ~[?:?]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[?:?]
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[?:?]
... 1 more
2023-10-14 16:43:31.121 [DEBUG] [discovery.ShellyDiscoveryParticipant] - shellyplus1-083af2029730: Device discovery of device with address [fe80:0:0:0:a3a:f2ff:fe02:9730] failed: java.nio.channels.UnsupportedAddressTypeException(class java.nio.channels.UnsupportedAddressTypeException)
It's still finding the device on IPv6, but now the network stack throws an exception and refuses to attempt the connection. Which appears to still leak the WebSocket.
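For what it's worth, the fe80:… address in the log is in the link-local range (fe80::/10), which cannot be connected to without an interface zone/scope id; a quick standalone check (my own snippet, not binding code) confirms this:

```java
import java.net.InetAddress;

public class LinkLocalCheck {
    public static void main(String[] args) throws Exception {
        // The address from the discovery log above
        InetAddress addr = InetAddress.getByName("fe80:0:0:0:a3a:f2ff:fe02:9730");
        // Link-local addresses (fe80::/10) are only usable with a scope id,
        // e.g. fe80::...%eth0, which the binding never supplies
        System.out.println(addr.isLinkLocalAddress()); // true
    }
}
```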
Yes, I agree the binding's behavior has changed. Without knowing what changed (if anything) in your specific environment, we can only guess why it suddenly started leaving WebSockets open. I have been trying to reproduce this condition since the original post from the other user, and from reviewing his heap dump and other data, as an additional data point, his issues also seemed to point back to discovery-related problems.
I do not use Shelly devices in my production home automation solution; I use Zigbee and Z-Wave.
That is why I previously mentioned that discovery did not work for me in 4.x but worked fine in 3.4.x. Since I had never had Shelly devices in my environment, I found it even more interesting that they were auto-discovered fine on 3.4.x and not at all on any 4.0 version using the same hardware. That is another data point to look at, since the transition from 3.4.x to 4.0.x included an application change as well as a Java version change.
However, the issue was interesting enough that I went to Amazon, purchased two Shelly devices, and added them to my test environment just to see if I could reproduce the problem. I still suspect this is a larger issue than just Shelly devices, but until I can reproduce it I cannot say for sure. The point I am making is that the timed wait on the sockets is a known issue that occurs in other situations and can have more than one cause. As the last link I pointed out discusses, network packet drops, such as a weak Wi-Fi signal causing random packet loss, network saturation, or interference from other wireless transmitters, are all possible causes of WebSockets hanging around for extended periods. Improving the error handling in this binding is definitely needed, and the binding maintainer has a few updated versions in the pipeline based on what I read, so perhaps this WebSocket clean-up is another fix that will get implemented.
I've reverted my IPv6 disabling, as it makes my MQTT unhappy (it's trying to bind to an IPv6 address for some reason, which I could probably also disable, but since disabling IPv6 doesn't fix the leak anyway, there's not much point).
I think the key point, which started this overall thread, is to make sure all exception handlers clean up the WebSockets; otherwise we get leaks. And I think this particular one is traceable to an exception handler that didn't clean up the WebSocket.
I have no real ability to build and test, and Java isn't a language I've spent any time with. I could probably have a go at a pull request, but in reality I'd just be chucking stuff at a wall that someone who knows what they're doing could do way better.
All of which is fine. I can put in a daily/overnight restart of openHAB, and it will probably work adequately for the interim period.
OK, working on that. It triggers a dependency on Gson, so I'm working through how to resolve that.
Resolved Gson, but now I have an unresolved dependency on slf4j. It looks to me like slf4j should always be included already, so I suspect I'm breaking my installation. I can't find a plain org.slf4j.jar, only things like org.slf4j-api.jar.
I think I'm out of my depth here, unfortunately. I'm not sure I want to just randomly pull in additional modules until it works.
I've also downloaded the entire snapshot build and unpacked it. There's no slf4j jar in there, so I'm dubious that it's required. Can anyone provide a hint as to what I may be missing?
OK, it looks like I can install an entire snapshot version using apt. That seems safer in terms of it working, less safe in terms of upgrading my whole installation. Ah well.
I will wait to see what Paul's results are as well. But for me, using a fresh snapshot install on a freshly built, fully updated Debian 12 image, with the Zulu 17 JDK installed via apt and the latest openHAB snapshot (Build 3675) installed via apt, and the Shelly binding added during initial setup, the console reports the binding as version 4.1.0.202310130405. I see no change: discovery still does not work, while manually adding a new Shelly thing works OK. Still unable to reproduce a leak; no orphan sockets…
On a separate note, using openHAB 3.4 and Java 11, discovery works fine. Using openHAB 3.4 and Java 17 (yes, I know that was not a supported configuration), discovery does not find any device.
As far as I have been able to determine from a testing perspective:
openHAB 3.4 using the recommended dependencies in a normal build seems to work and discovers Shelly devices fine.
Any version of openHAB 4.x I have tested (using the exact set of instructions in the documentation) does not work with discovery on a freshly built clean load of openHAB (not an upgrade), regardless of the underlying OS.
I saw the same behavior on Linux as well as Windows.
Same hardware for all tests.
Heap dump review does not indicate any anomalies, and 168 threads were counted.
On a side note, I really like the new 4.1; it is sweet being able to toggle logging via the GUI.
For me, same behaviour with the snapshot build. I have openHAB 4.1.0 (Build #3675).
The Shelly binding reports being 202310130405:
openhab> list -s |grep shelly
280 │ Active │ 80 │ 4.1.0.202310130405 │ org.openhab.binding.shelly
I'm seeing thread growth in batches of 8, and the same symptom: it is associated with failed discovery and "WebSocket error" in the log at the time the thread count grows.
Sorry guys, I was traveling and therefore couldn't participate in debugging, but this great community has already made progress.
That's a good finding and matches the symptoms with the Shelly Wall Display, which is not active as a thing but still causes the problem, because it sends mDNS discovery packets.
I need to place the api.close() in a finally block so it gets called when the request was processed successfully, but also when it failed.
ShellyApiInterface api = null; // declared outside the try so the finally block can reach it
try {
    api = gen2 ? new Shelly2ApiRpc(name, config, httpClient)
            : new Shelly1HttpApi(name, config, httpClient);
    api.initialize();
    profile = api.getDeviceProfile(thingType);
    logger.debug("{}: Shelly settings : {}", name, profile.settingsJson);
    deviceName = profile.name;
    model = profile.deviceType;
    mode = profile.mode;
    properties = ShellyBaseHandler.fillDeviceProperties(profile);
    logger.trace("{}: thingType={}, deviceType={}, mode={}, symbolic name={}", name, thingType,
            profile.deviceType, mode.isEmpty() ? "<standard>" : mode, deviceName);
    // get thing type from device name
    thingUID = ShellyThingCreator.getThingUID(name, model, mode, false);
} catch (ShellyApiException e) {
    ShellyApiResult result = e.getApiResult();
    if (result.isHttpAccessUnauthorized()) {
        logger.info("{}: {}", name, messages.get("discovery.protected", address));
        // create shellyunknown thing - will be changed during thing initialization with valid credentials
        thingUID = ShellyThingCreator.getThingUID(name, model, mode, true);
    } else {
        logger.debug("{}: {}", name, messages.get("discovery.failed", address, e.toString()));
    }
} catch (IllegalArgumentException e) { // maybe some format description was buggy
    logger.debug("{}: Discovery failed!", name, e);
} finally {
    if (api != null) {
        api.close(); // runs on success and on every exception path
    }
}
And this is also a good catch.
The binding currently does not support IPv6. Therefore I need to check the address family on discovery requests and refuse IPv6 addresses.
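A minimal sketch of such an address-family check (my own illustration, not the binding's actual code; the method name is hypothetical), assuming the discovered address arrives as a string:

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddressFamilyCheck {
    // Returns true only for addresses the binding can handle today (IPv4);
    // IPv6 literals, including link-local fe80::... ones, are refused.
    static boolean isIpv4(String address) {
        try {
            // For literal addresses this just parses; no DNS lookup happens
            return InetAddress.getByName(address) instanceof Inet4Address;
        } catch (UnknownHostException e) {
            return false; // unparseable -> refuse as well
        }
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.1.10"));                  // true
        System.out.println(isIpv4("fe80:0:0:0:a3a:f2ff:fe02:9730")); // false
    }
}
```

Discovery would then skip (or log and drop) any result for which the check returns false, before ever opening a WebSocket.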
Sometime in the last couple of days the thread count stopped increasing. It was not associated with a config or install change; it just stopped, which is very unusual. Which is another way of saying that a test may not be proof, if the count can stop increasing on its own without a code change. Having said that, it also looks like the "WebSocket error" message stopped occurring in the log. So presumably, if I see that message without a change in thread count, then the fix works. I'll see today if that message is still occurring.
For those who may also want to test, the process here is:
Assuming you're using apt to install: you want to change your sources.list to include unstable. In my case that's in /etc/apt/sources.list.d/openhab.list, and the line looks like:
deb [signed-by=/usr/share/keyrings/openhab.gpg] https://openhab.jfrog.io/artifactory/openhab-linuxpkg unstable main
Upgrade, which should give you the snapshot version
Go to the openhab console: ssh -p 8101 openhab@localhost
(password is habopen)
Turn on trace logging for the binding
log:set TRACE org.openhab.binding.shelly
Upgrade the binding to the version provided by Markus above
openhab> bundle:list |grep Shelly
280 │ Active │ 80 │ 4.1.0.202310211629 │ openHAB Add-ons :: Bundles :: Shelly Binding Gen1+2
openhab> bundle:update 280 https://github.com/markus7017/myfiles/blob/master/shelly/org.openhab.binding.shelly-4.1.0-SNAPSHOToom.jar?raw=true
I always restart my openHAB so I can start clean: sudo service openhab restart
Then you need to get the process id of your openhab instance
ps -ef |grep openhab
Use the process id to get a count of the threads that refer to "WebSocket"
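As a sketch of that last step (assuming a Linux host; jstack ships with the JDK, and the grep pattern for finding the PID is illustrative):

```shell
# Total native threads of a process via /proc (demonstrated here on our own
# shell, PID $$; substitute the openHAB PID found with ps -ef | grep openhab)
PID=$$
ls /proc/"$PID"/task | wc -l

# For the WebSocket-specific count, dump the JVM's thread list and filter:
#   jstack "$OPENHAB_PID" | grep -c WebSocket
```

Watching that number across repeated discovery attempts is what shows the leak (or its absence).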
I did. No errors nor additional threads so far. Would there be anything different in the log that would tell me I'm definitely running the right version?
So it's the new one. Would the changes have removed the WebSocket error, or just caused it not to leak when the error occurred? I haven't seen any discovery messages at all since the upgrade, which fixes it I guess, just not what I expected.