OH4 runs out of memory

I think this Stack Exchange discussion may provide more clues relevant to the behaviour you are seeing, and to why some devices are resolving to an IPv6 address: when IPv6 is enabled, resolution is tried for it first by default (Debian seems to have enabled it for both by default). At least in my clean Debian 12 build, the relevant line in nsswitch.conf is:

hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname
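Since the resolver order configured in nsswitch.conf flows through to applications, a quick way to see which address families come back, and in what order, is to ask the JVM directly. This is only an illustrative sketch (the results for `localhost` and their ordering are environment-dependent); note also that the `java.net.preferIPv4Stack` and `java.net.preferIPv6Addresses` system properties influence which family the JVM prefers.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;

public class ResolveOrder {
    public static void main(String[] args) throws Exception {
        // Ordering reflects the platform resolver plus JVM preferences
        // (-Djava.net.preferIPv6Addresses=true moves IPv6 results first).
        for (InetAddress addr : InetAddress.getAllByName("localhost")) {
            String family = (addr instanceof Inet4Address) ? "IPv4"
                    : (addr instanceof Inet6Address) ? "IPv6" : "other";
            System.out.println(family + ": " + addr.getHostAddress());
        }
    }
}
```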

And of course, this Wikipedia article also adds to the understanding of the behaviours being seen.

This link discusses Unix socket TIME_WAIT behaviour as well.

http://www.softlab.ntua.gr/facilities/documentation/unix/unix-socket-faq/unix-socket-faq-2.html#time_wait

All of which is true.

The main point here is that I didn't deliberately enable IPv6, and had no expectation that my devices would communicate over IPv6. They all have IPv4 addresses. For whatever reason the discovery routines are finding them on IPv6, not noticing they're an existing device that we already know on IPv4, attempting to communicate with them over IPv6, failing, then leaking the socket.

We can probably agree that it'd be better in my case if it wasn't trying to use IPv6. I can control that myself, but for other people that may be beyond their experience or what they want to do. And ultimately, irrespective of the cause, when discovery fails for whatever reason it should tidy up after itself.

Interestingly, I turned off IPv6 on my Raspberry Pi. And I'm still getting the leak, but for a different reason:

2023-10-14 16:43:31.071 [DEBUG] [discovery.ShellyDiscoveryParticipant] - shellyplus1-083af2029730: Shelly device discovered: IP-Adress=[fe80:0:0:0:a3a:f2ff:fe02:9730], type=shellyplus1
2023-10-14 16:43:31.086 [DEBUG] [g.shelly.internal.api2.Shelly2ApiRpc] - shellyplus1-083af2029730: Connect Rpc Socket (discovery = true)
2023-10-14 16:43:31.087 [DEBUG] [helly.internal.api2.Shelly2RpcSocket] - shellyplus1-083af2029730: Connect WebSocket, URI=ws://[fe80:0:0:0:a3a:f2ff:fe02:9730]/rpc
2023-10-14 16:43:31.103 [TRACE] [shelly.internal.api.ShellyHttpClient] - shellyplus1-083af2029730: HTTP POST for http://[fe80:0:0:0:a3a:f2ff:fe02:9730]/rpc {"id":787696111,"src":"shellyplus1-083af2029730","method
":"Shelly.GetConfig"}
Accept-Encoding: gzip
User-Agent: Jetty/9.4.50.v20221201
Content-Type: application/json; charset=UTF-8


2023-10-14 16:43:31.104 [DEBUG] [g.shelly.internal.api2.Shelly2ApiRpc] - shellyplus1-083af2029730: WebSocket error
2023-10-14 16:43:31.105 [TRACE] [shelly.internal.api.ShellyHttpClient] - shellyplus1-083af2029730: API call returned exception
org.openhab.binding.shelly.internal.api.ShellyApiException: java.util.concurrent.ExecutionException: java.nio.channels.UnsupportedAddressTypeException
        at org.openhab.binding.shelly.internal.api.ShellyHttpClient.innerRequest(ShellyHttpClient.java:213) ~[?:?]
        at org.openhab.binding.shelly.internal.api.ShellyHttpClient.httpPost(ShellyHttpClient.java:144) ~[?:?]
        at org.openhab.binding.shelly.internal.api2.Shelly2ApiRpc.rpcPost(Shelly2ApiRpc.java:1209) ~[?:?]
        at org.openhab.binding.shelly.internal.api2.Shelly2ApiRpc.apiRequest(Shelly2ApiRpc.java:1153) ~[?:?]
        at org.openhab.binding.shelly.internal.api2.Shelly2ApiRpc.getDeviceProfile(Shelly2ApiRpc.java:169) ~[?:?]
        at org.openhab.binding.shelly.internal.discovery.ShellyDiscoveryParticipant.createResult(ShellyDiscoveryParticipant.java:149) ~[?:?]
        at org.openhab.core.config.discovery.mdns.internal.MDNSDiscoveryService.createDiscoveryResult(MDNSDiscoveryService.java:222) ~[?:?]
        at org.openhab.core.config.discovery.mdns.internal.MDNSDiscoveryService.considerService(MDNSDiscoveryService.java:214) ~[?:?]
        at org.openhab.core.config.discovery.mdns.internal.MDNSDiscoveryService.serviceResolved(MDNSDiscoveryService.java:207) ~[?:?]
        at javax.jmdns.impl.ListenerStatus$ServiceListenerStatus.serviceResolved(ListenerStatus.java:106) ~[?:?]
        at javax.jmdns.impl.JmDNSImpl$1.run(JmDNSImpl.java:911) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.util.concurrent.ExecutionException: java.nio.channels.UnsupportedAddressTypeException
        at org.eclipse.jetty.client.util.FutureResponseListener.getResult(FutureResponseListener.java:118) ~[?:?]
        at org.eclipse.jetty.client.util.FutureResponseListener.get(FutureResponseListener.java:101) ~[?:?]
        at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:732) ~[?:?]
        at org.openhab.binding.shelly.internal.api.ShellyHttpClient.innerRequest(ShellyHttpClient.java:180) ~[?:?]
        ... 15 more
Caused by: java.nio.channels.UnsupportedAddressTypeException
        at sun.nio.ch.Net.checkAddress(Net.java:161) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.checkRemote(SocketChannelImpl.java:816) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:839) ~[?:?]
        at org.eclipse.jetty.client.AbstractConnectorHttpClientTransport.connect(AbstractConnectorHttpClientTransport.java:100) ~[?:?]
        at org.eclipse.jetty.client.HttpClient$1.connect(HttpClient.java:665) ~[?:?]
        at org.eclipse.jetty.client.HttpClient$1.succeeded(HttpClient.java:642) ~[?:?]
        at org.eclipse.jetty.client.HttpClient$1.succeeded(HttpClient.java:634) ~[?:?]
        at org.eclipse.jetty.util.SocketAddressResolver$Async.lambda$resolve$1(SocketAddressResolver.java:186) ~[?:?]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[?:?]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[?:?]
        ... 1 more
2023-10-14 16:43:31.121 [DEBUG] [discovery.ShellyDiscoveryParticipant] - shellyplus1-083af2029730: Device discovery of device  with address [fe80:0:0:0:a3a:f2ff:fe02:9730] failed: java.nio.channels.UnsupportedAddressTypeException(class java.nio.channels.UnsupportedAddressTypeException)

It's still finding the device on IPv6, but now the network stack throws an exception and refuses to attempt the connection. And the WebSocket still appears to be leaked.
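For context on why this particular address fails: fe80::/10 is the link-local IPv6 prefix, and such an address is only usable together with a scope/zone ID (e.g. %eth0), which the discovery result doesn't carry; and with IPv6 disabled on the host, the JVM's channel layer can reject the target outright with UnsupportedAddressTypeException. A small sketch (my own illustration, not binding code) showing how such addresses can be recognised before any connect attempt:

```java
import java.net.InetAddress;

public class LinkLocalCheck {
    public static void main(String[] args) throws Exception {
        // Parsing a literal IPv6 address does not perform a DNS lookup.
        InetAddress addr = InetAddress.getByName("fe80:0:0:0:a3a:f2ff:fe02:9730");
        // fe80::/10 is link-local: unusable without a zone ID such as %eth0
        System.out.println("link-local: " + addr.isLinkLocalAddress()); // prints "link-local: true"
    }
}
```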

Yes, I agree the binding's behaviour has changed. Without knowing what changed (if anything) in your specific environment, we can only guess why it suddenly started leaving WebSockets open. I have been trying to reproduce this condition since the original post from the other user, and I have been reviewing his heap dump and other data; as an additional data point, his issues also seemed to point back to discovery-related problems.
I do not use Shelly devices in my production home automation solution; I use Zigbee and Z-Wave.
That is why I previously mentioned that discovery did not work for me in the 4.x version but worked fine in 3.4.x. Since I had never had Shelly devices in my environment, I found it even more interesting that they were auto-discovered fine on 3.4.x and not at all on any 4.0 version using the same hardware. That is another data point to look at, since the transition from 3.4.x to 4.0.x included an application change as well as a Java version change.
However, the issue was interesting enough that I went to Amazon, bought two Shelly devices, and added them to my test environment just to see if I could reproduce the problem. I still suspect this is a larger issue than just Shelly devices, but until I can reproduce it I cannot say for sure whether that is true. The point I am making is that the timed wait on the sockets is a known issue that occurs in other situations and can have more than one cause. As the last link I posted discusses, network packet drops, such as those induced by weak Wi-Fi, network saturation, or interference from other wireless transmitters, are all possible causes of WebSockets hanging around for extended periods. Improved error handling in this binding is definitely needed, and the binding maintainer has a few updated versions in the pipeline based on what I read, so perhaps this WebSocket clean-up is another fix that will get implemented.

Agree in general.

I've reverted my IPv6 disabling, as it makes my MQTT unhappy (it's trying to bind to an IPv6 address for some reason, which I could probably also disable, but since disabling IPv6 doesn't fix the leak anyway there's not much point).

I think the key point, which started this whole thread, is to make sure all exception handlers clean up the WebSockets; otherwise we get leaks. And I think this particular one is traceable to an exception path that didn't clean up the WebSocket.

I have no real ability to build and test, and Java isn't a language I've spent any time with. I could probably have a go at a pull request, but in reality I'd just be chucking stuff at a wall that someone who knows what they're doing could do far better.

All of which is fine. I can put in a daily overnight restart of openHAB, and that will probably work adequately for the interim.

Did one of you test the most recent binding change, the one about WebSocket closing? It was merged 4 days ago.

OK, working on that. It triggers a dependency on Gson, so I'm working through how to resolve that.

Resolved Gson. But now I have an unresolved dependency on slf4j. It looks to me like slf4j should always be included already, so I suspect I'm breaking my installation. I can't find a plain org.slf4j.jar, only things like org.slf4j-api.jar.

I think I'm out of my depth here, unfortunately. I'm not sure I want to just randomly pull in additional modules until it works.

I've also downloaded the entire snapshot build and unpacked it. There's no slf4j jar in there, so I'm dubious that it's required. Can anyone provide a hint as to what I may be missing?

OK, it looks like I can install an entire snapshot version using apt. That seems safer in terms of it working, less safe in terms of upgrading my whole installation. Ah well.

I'm completely out of my depth and amazed by your debugging attempts. :slightly_smiling_face:

The only thing I can add is that there was a change with slf4j, although if you are on a later snapshot it was reported as fixed: openHAB 4.1 Milestone discussion - Setup, Configuration and Use / News & Important Changes - openHAB Community

I will wait to see what Paul's results are as well, but for me, using a fresh snapshot install on a freshly built, fully updated Debian 12 image with the Zulu 17 JDK installed via apt, and the latest openHAB snapshot (build 3675) installed via apt with the Shelly binding added during initial setup, the console reports the binding as version 4.1.0.202310130405. I see no change in discovery; manually adding a new Shelly Thing works OK. Still unable to repro a leak; no orphan sockets.
On a separate note, using openHAB 3.4 and Java 11, discovery works fine; using openHAB 3.4 and Java 17 (yes, I know that was not a supported configuration), discovery does not find any device.
As far as I have been able to determine from a testing perspective:
- openHAB 3.4 with the recommended dependencies and a normal build seems to work and discovers Shelly devices fine.
- Any version of openHAB 4.x I have tested (using the exact set of instructions in the documentation) does not work with discovery on a freshly built clean load of openHAB (not an upgrade), regardless of the underlying OS. I saw the same behaviour on Linux as well as Windows. Same hardware for all tests.
- Heap dump review does not indicate any anomalies, and 168 threads were counted.
On a side note, I really like the new 4.1; it is sweet being able to toggle the logging via the GUI.

For me, same behaviour with the snapshot build. I have openHAB 4.1.0 (snapshot build #3675).

The Shelly binding reports being 202310130405:

openhab> list -s |grep shelly
280 x Active x  80 x 4.1.0.202310130405     x org.openhab.binding.shelly

I'm seeing thread growth in increments of 8, and the same symptom: it is associated with a failed discovery and a "WebSocket error" in the log at the time the thread count grows.

I think it makes sense to do some sort of WebSocket destroy, as I noted in the comment above: OH4 runs out of memory - #79 by PaulL1

Sorry guys, I was travelling and therefore couldn't participate in debugging, but the great community already made progress.

That's a good finding and matches the symptoms with the Shelly Wall Display, which is not active as a Thing but still causes the problem, because it sends mDNS discovery packets.

I need to place the api.close() in a finally block so it gets called when the request was processed successfully, but also when it failed.

            ShellyApiInterface api = null; // declared outside the try block so the finally clause can close it
            try {
                api = gen2 ? new Shelly2ApiRpc(name, config, httpClient)
                        : new Shelly1HttpApi(name, config, httpClient);
                api.initialize();
                profile = api.getDeviceProfile(thingType);
                logger.debug("{}: Shelly settings : {}", name, profile.settingsJson);
                deviceName = profile.name;
                model = profile.deviceType;
                mode = profile.mode;
                properties = ShellyBaseHandler.fillDeviceProperties(profile);
                logger.trace("{}: thingType={}, deviceType={}, mode={}, symbolic name={}", name, thingType,
                        profile.deviceType, mode.isEmpty() ? "<standard>" : mode, deviceName);

                // get thing type from device name
                thingUID = ShellyThingCreator.getThingUID(name, model, mode, false);
            } catch (ShellyApiException e) {
                ShellyApiResult result = e.getApiResult();
                if (result.isHttpAccessUnauthorized()) {
                    logger.info("{}: {}", name, messages.get("discovery.protected", address));

                    // create shellyunknown thing - will be changed during thing initialization with valid credentials
                    thingUID = ShellyThingCreator.getThingUID(name, model, mode, true);
                } else {
                    logger.debug("{}: {}", name, messages.get("discovery.failed", address, e.toString()));
                }
            } catch (IllegalArgumentException e) { // maybe some format description was buggy
                logger.debug("{}: Discovery failed!", name, e);
            } finally {
                // always release the API (and its WebSocket), on success as well as on failure
                if (api != null) {
                    api.close();
                }
            }

And this is also a good catch:

The binding does not currently support IPv6. Therefore I need to check the address family on discovery requests and refuse IPv6 addresses.
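A minimal sketch of the kind of family check described here (my own illustration with a hypothetical isSupported helper, not the binding's actual code): accept IPv4 discovery results only and refuse everything else.

```java
import java.net.Inet4Address;
import java.net.InetAddress;

public class AddressFilter {
    // Accept only IPv4 targets; IPv6 (and anything unparsable) is refused.
    static boolean isSupported(String host) {
        try {
            // For address literals this parses without a DNS lookup.
            return InetAddress.getByName(host) instanceof Inet4Address;
        } catch (Exception e) {
            return false; // unparsable address -> refuse
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported("192.168.1.20"));                  // true
        System.out.println(isSupported("fe80:0:0:0:a3a:f2ff:fe02:9730")); // false
    }
}
```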

I'll look into that at the weekend.


That's awesome, thanks @markus7017. Let me know when it's built through into a snapshot for testing and I can try it out.

I'll try to work on this at the weekend.


Please try this build. It's based on the latest merge fixing one issue, plus the 2 changes mentioned above, but it doesn't include my pending PR changes. If the fix works, I'll merge it with the latest changes. So use this build as a temporary trial (save the old jar, then remove it from the addons folder and check with bundle:list).
https://github.com/markus7017/myfiles/blob/master/shelly/org.openhab.binding.shelly-4.1.0-SNAPSHOToom.jar?raw=true

The IPv6 issue should be fixed, but note that IPv6 is not supported by the binding.

Thanks, I've installed it and am testing.

Sometime in the last couple of days the thread count stopped increasing. It wasn't associated with a config or install change; it just stopped, which is very unusual. Which is another way of saying that a test may not be proof, if the count can stop increasing on its own without a code change. Having said that, it also looks like the "WebSocket error" message stopped occurring in the log. So presumably, if I see that message without the thread count changing, then the fix works. I'll see today whether that message is still occurring.

For those who may also want to test, the process here is:

  1. Assuming you're using apt to install: change your sources.list to include unstable. In my case that's in /etc/apt/sources.list.d/openhab.list, and the line looks like:
deb [signed-by=/usr/share/keyrings/openhab.gpg] https://openhab.jfrog.io/artifactory/openhab-linuxpkg unstable main
  2. Upgrade, which should give you the snapshot version.

  3. Go to the openHAB console: ssh -p 8101 openhab@localhost
    (password is habopen)

  4. Turn on trace logging for the binding:

log:set TRACE org.openhab.binding.shelly
  5. Upgrade the binding to the version provided by Markus above:
openhab> bundle:list |grep Shelly
280 x Active x  80 x 4.1.0.202310211629     x openHAB Add-ons :: Bundles :: Shelly Binding Gen1+2
openhab> bundle:update 280 https://github.com/markus7017/myfiles/blob/master/shelly/org.openhab.binding.shelly-4.1.0-SNAPSHOToom.jar?raw=true
  6. I always restart my openHAB so I can start clean: sudo service openhab restart

  7. Then you need to get the process id of your openHAB instance:

ps -ef |grep openhab
  8. Use the process id to get a count of the threads that refer to "WebSocket":
while true; do date; sudo jstack 9868 | grep WebSocket | wc -l; sleep 60; done >> threadcount2.txt &  tail -f threadcount2.txt
  9. Check the openHAB log to see if there are any "WebSocket error" messages turning up:
fgrep "WebSocket error" /var/log/openhab/openhab.log

That's about it.
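As an aside, if you'd rather count from inside the JVM than shell out to jstack, a sketch like the following (my own hypothetical helper; it would need to run inside the openHAB process, e.g. via the script console) counts live threads whose name mentions "WebSocket":

```java
public class ThreadCount {
    // Count live JVM threads whose name mentions "WebSocket",
    // roughly what `jstack <pid> | grep WebSocket | wc -l` reports.
    static long countWebSocketThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().contains("WebSocket"))
                .count();
    }

    public static void main(String[] args) throws Exception {
        // Demo only: start a short-lived thread with a WebSocket-style name
        Thread demo = new Thread(() -> {
            try { Thread.sleep(500); } catch (InterruptedException ignored) {}
        }, "WebSocketClient-demo");
        demo.start();
        System.out.println(countWebSocketThreads());
        demo.join();
    }
}
```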

Did you install the new build?

I did. No errors or additional threads so far. Would there be anything different in the log that would tell me I'm definitely running the right version?

That's the expected behaviour :heart_eyes:
Check whether bundle:list shows you the build timestamp.

OK, bundle timestamp: 4.1.0.202310211629

So it's the new one. Would the changes have removed the WebSocket error, or just stopped it leaking when the error occurs? I haven't seen any discovery messages at all since the upgrade - which fixes it, I guess, just not what I expected.

The 2 mentioned above, plus another one on disconnect handling, which was already merged.