OH4 runs out of memory

Agree in general.

I’ve reverted my IPv6 disabling, as it makes my MQTT unhappy (it’s trying to bind to an IPv6 address for some reason, which I could probably also disable, but since it doesn’t fix the leak anyway not much point).

I think the key, and what started this thread, is to make sure all exception handlers clean up the WebSockets; otherwise we get leaks. I think this particular leak is traceable to an exception that didn’t clean up the WebSocket.

I have no real ability to build and test, and Java isn’t a language I’ve spent any time with. I could probably have a go at a pull request, but in reality I’d just be chucking stuff at a wall that someone who knows what they’re doing could do way better.

All of which is fine. I can put in a daily/overnight restart of openhab and it will probably work adequately for the interim period.

Did one of you test the most recent binding change which is about web socket closing?
This change was merged 4 days ago.

OK. Working on that. It triggers a dependency on Gson, so working through how to resolve that.

Resolved Gson. But now have an unresolved dependency on slf4j. It looks to me like slf4j should be always included already, so I’m suspecting I’m breaking my installation. I can’t find just org.slf4j.jar, only things like org.slf4j-api.jar.

I think I’m out of my depth here unfortunately. I’m not sure I want to just randomly pull in additional modules until it works.

I’ve also downloaded the entire snapshot build, and unpacked it. There’s no slf4j jar in there, so I’m dubious that it’s required. Can anyone provide a hint as to what I may be missing?

OK, looks like I can install an entire snapshot version using apt. That seems safer in terms of it working, less safe in terms of upgrading my whole installation. Ah well.

I’m completely out of my depth and amazed by your debug attempts. :slightly_smiling_face:

The only thing I can add is that there was a change with slf4j, although if you are on a later snapshot it was reported as fixed. openHAB 4.1 Milestone discussion - Setup, Configuration and Use / News & Important Changes - openHAB Community

I will wait to see what Paul’s results are as well, but here are mine. I did a fresh snapshot install on a freshly built, fully updated Debian 12 image, with zulu17jdk installed via apt and the latest openHAB snapshot (Build 3675) installed via apt. I added the Shelly binding during initial setup; the console reports the binding as version 4.1.0.202310130405. I see no change in discovery. Manually adding a new Shelly thing works OK. I am still unable to repro a leak, and there is no orphan socket…
On a separate note, with openHAB 3.4 and Java 11 discovery works fine. With openHAB 3.4 and Java 17 (yes, I know that is not a supported configuration) discovery does not find any device.
As far as I have been able to test:
openHAB 3.4 with the recommended dependencies (normal build) seems to work and discovers Shelly devices fine.
Any version of openHAB 4.x I have tested (using the exact set of instructions in the documentation) does not work with discovery on a freshly built clean install of openHAB (not an upgrade), regardless of the underlying OS.
I saw the same behavior on Linux as well as Windows.
Same hardware for all tests.
Heap dump review does not indicate any anomalies, and 168 threads were counted.
On a side note, I really like the new 4.1; it is sweet being able to toggle the logging via the GUI.

For me, same behaviour with the snapshot build. I have openHAB 4.1.0 (Build #3675).

The Shelly binding reports being 202310130405:

openhab> list -s |grep shelly
280 x Active x  80 x 4.1.0.202310130405     x org.openhab.binding.shelly

I’m seeing thread growth in steps of 8, with the same symptom: it is associated with failed discovery and “WebSocket error” in the log at the time the threads grow.

I think it makes sense to do some sort of WebSocket destroy as I noted in the above comment OH4 runs out of memory - #79 by PaulL1

Sorry guys, I was traveling and therefore couldn’t participate in debugging, but a great community has already made progress.

That’s a good finding and matches the symptoms with the Shelly Wall Display, which is not active as a thing, but causes the problem → because it sends mDNS discovery packets

I need to place the api.close() in a finally block so it gets called when request was processed, but also when it failed.

            ShellyApiInterface api = null; // declared outside the try so the finally block can see it
            try {
                api = gen2 ? new Shelly2ApiRpc(name, config, httpClient)
                        : new Shelly1HttpApi(name, config, httpClient);
                api.initialize();
                profile = api.getDeviceProfile(thingType);
                logger.debug("{}: Shelly settings : {}", name, profile.settingsJson);
                deviceName = profile.name;
                model = profile.deviceType;
                mode = profile.mode;
                properties = ShellyBaseHandler.fillDeviceProperties(profile);
                logger.trace("{}: thingType={}, deviceType={}, mode={}, symbolic name={}", name, thingType,
                        profile.deviceType, mode.isEmpty() ? "<standard>" : mode, deviceName);

                // get thing type from device name
                thingUID = ShellyThingCreator.getThingUID(name, model, mode, false);
            } catch (ShellyApiException e) {
                ShellyApiResult result = e.getApiResult();
                if (result.isHttpAccessUnauthorized()) {
                    logger.info("{}: {}", name, messages.get("discovery.protected", address));

                    // create shellyunknown thing - will be changed during thing initialization with valid credentials
                    thingUID = ShellyThingCreator.getThingUID(name, model, mode, true);
                } else {
                    logger.debug("{}: {}", name, messages.get("discovery.failed", address, e.toString()));
                }
            } catch (IllegalArgumentException e) { // maybe some format description was buggy
                logger.debug("{}: Discovery failed!", name, e);
            } finally {
                // always release the API (and its WebSocket), whether discovery succeeded or failed
                if (api != null) {
                    api.close();
                }
            }
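
For anyone who wants to see the pattern in isolation, here is a minimal, self-contained sketch of why the finally block matters. FakeApi and FinallyCloseSketch are hypothetical stand-ins invented for this illustration; they are not classes from the binding, and the counter only simulates an open WebSocket.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an API object that holds a connection; not the binding's real class. */
final class FakeApi {
    /** Number of simulated connections that are currently open. */
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    FakeApi() {
        OPEN.incrementAndGet();
    }

    /** Simulates initialization; throws when asked to fail. */
    void initialize(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated discovery failure");
        }
    }

    /** Releases the simulated connection; safe to call more than once. */
    void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

final class FinallyCloseSketch {
    /** Runs one simulated discovery and returns true if it succeeded. */
    static boolean discover(boolean fail) {
        FakeApi api = null; // declared outside the try so finally can reach it
        try {
            api = new FakeApi();
            api.initialize(fail);
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            // runs on both the success path and the failure path
            if (api != null) {
                api.close();
            }
        }
    }

    public static void main(String[] args) {
        discover(false);
        discover(true);
        System.out.println("open connections: " + FakeApi.OPEN.get());
    }
}
```

If the close() call were placed only at the end of the try body, the failing call would leave one simulated connection open each time, which is the shape of the leak discussed in this thread.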

and also this is a good catch

The binding is currently not supporting IPv6 handling. Therefore I need to check the address family on discovery requests and refuse IPv6
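
As a rough illustration of what an address family check can look like in plain Java, here is a small sketch using only java.net. The class and method names are made up for this example, and the binding may well do the check differently (for instance directly on the mDNS service info).

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

final class AddressFamilySketch {
    /**
     * Returns true if the given literal parses as an IPv6 address.
     * Intended for IP literals only: passing a host name would trigger a DNS lookup.
     */
    static boolean isIpv6(String literal) {
        try {
            return InetAddress.getByName(literal) instanceof Inet6Address;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + literal, e);
        }
    }

    public static void main(String[] args) {
        // Example addresses, chosen only for illustration
        for (String address : new String[] { "192.168.1.20", "fe80::1" }) {
            if (isIpv6(address)) {
                System.out.println(address + ": IPv6, discovery request would be refused");
            } else {
                System.out.println(address + ": IPv4, discovery request would be processed");
            }
        }
    }
}
```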

I’ll look into that at the weekend.


That’s awesome thanks @markus7017. Let me know when it’s built through into snapshot for testing and I can try it out.

I’ll try to work on this at the weekend.


Please try this build. It’s based on the latest merge fixing one issue plus the 2 changes mentioned above, but does not include my pending PR changes. If the fix works, I’ll merge this with the latest changes. So use this build as a temporary trial (save the old jar, then remove it from the addons folder and check with bundle:list).
https://github.com/markus7017/myfiles/blob/master/shelly/org.openhab.binding.shelly-4.1.0-SNAPSHOToom.jar?raw=true

IPv6 issue should be fixed, but note that IPv6 is not supported by the binding

Thanks, I’ve installed and am testing.

Sometime in the last couple of days the thread count stopped increasing. Not associated with a config or install change, it just stopped. Which is very unusual. Which is another way of saying that a test may not be proof, if it can just stop increasing on its own without code change. Having said that, it also looks like the message “WebSocket error” also stopped occurring in the log. So presumably if I have that message without thread number change, then it works. I’ll see today if that message is still occurring.

For those who may also want to test, the process here is:

  1. Assuming you’re using apt to install, change your sources.list to include unstable. In my case that’s in /etc/apt/sources.list.d/openhab.list, and the line looks like:

deb [signed-by=/usr/share/keyrings/openhab.gpg] https://openhab.jfrog.io/artifactory/openhab-linuxpkg unstable main

  2. Upgrade, which should give you the snapshot version

  3. Go to the openhab console: ssh -p 8101 openhab@localhost
    (password is habopen)

  4. Turn on trace logging for the binding

log:set TRACE org.openhab.binding.shelly

  5. Upgrade the binding to the version provided by Markus above

openhab> bundle:list |grep Shelly
280 x Active x  80 x 4.1.0.202310211629     x openHAB Add-ons :: Bundles :: Shelly Binding Gen1+2
openhab> bundle:update 280 https://github.com/markus7017/myfiles/blob/master/shelly/org.openhab.binding.shelly-4.1.0-SNAPSHOToom.jar?raw=true

  6. I always restart my openhab so I can start clean: sudo service openhab restart

  7. Then you need to get the process id of your openhab instance

ps -ef |grep openhab

  8. Use the process id (9868 in my case) to get a count of the threads that refer to “WebSocket”

while true; do date; sudo jstack 9868 | grep WebSocket | wc -l; sleep 60; done >> threadcount2.txt &  tail -f threadcount2.txt

  9. Check the openhab log to see if there are any “WebSocket error” messages turning up

fgrep "WebSocket error" /var/log/openhab/openhab.log

That’s about it.
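
If you would rather do the counting from Java than from a shell loop, the same idea can be sketched as below. This is only an illustration: the class name is invented, it assumes jstack is on the PATH, and it assumes the user running it is allowed to attach to the openHAB process (hence the sudo in the shell version).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class ThreadDumpCountSketch {
    /** Counts the lines of a thread dump that contain the needle, like `grep needle | wc -l`. */
    static long countLines(String dump, String needle) {
        return dump.lines().filter(line -> line.contains(needle)).count();
    }

    /** Runs `jstack <pid>` and returns its combined output. */
    static String jstack(long pid) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("jstack", Long.toString(pid))
                .redirectErrorStream(true)
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ThreadDumpCountSketch <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        System.out.println(countLines(jstack(pid), "WebSocket"));
    }
}
```

As with the grep version, this counts matching lines rather than distinct threads, so a thread whose stack trace also mentions WebSocket classes is counted more than once. For spotting growth over time that does not matter much.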

Did you install the new build?

I did. No errors nor additional threads so far. Would there be anything different in the log that would tell me I’m definitely running the right version?

That’s the expected behavior :heart_eyes:
Check bundle:list; it shows you the build timestamp.

OK, bundle timestamp: 4.1.0.202310211629

So it’s the new one. Would the changes have removed the WebSocket error, or just caused it to not leak when there was the error? I haven’t seen any discovery messages at all since the upgrade - which fixes it I guess, just not what I expected.

The 2 mentioned above, plus another one on disconnect handling, which was already merged.

Hello @markus7017
I also wanted to try it to test the issue with the Wall Display, but installing the new jar gives me this error:

Error while starting bundle: file:/usr/share/openhab/addons/org.openhab.binding.shelly-4.1.0-SNAPSHOToom.jar

org.osgi.framework.BundleException: Could not resolve module: org.openhab.binding.shelly [305]
Unresolved requirement: Import-Package: com.google.gson; version="[2.10.0,3.0.0)"
at org.eclipse.osgi.container.Module.start(Module.java:463) ~[org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:445) ~[org.eclipse.osgi-3.18.0.jar:?]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1260) ~[?:?]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1233) ~[?:?]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.doProcess(DirectoryWatcher.java:520) ~[?:?]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:365) ~[?:?]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:316) ~[?:?]

Maybe because I’m on OH 4.0.3?

Urg, gson is one of the core packages and should always be there.

4.0.3 should be fine

try

  • delete the jar, stop OH
  • openhab-cli clean-cache
  • start OH, wait 5min until cache is recreated
  • open OH console
  • enter “feature:install oh-transport-coap”
  • copy jar to addons folder
  • wait a bit and check with bundle:list that binding is Active

I had that same problem. I resolved it by first upgrading my whole openhab to the latest snapshot release (changing my sources.list to include unstable), then upgrading to the new jar. I think it’s a newer version of gson required - so you have it, but not a new enough version.

I did manually upgrade gson, which you can do, but it then asked for a different jar to be upgraded as well, and I decided I’d be chasing my tail.
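
To make the “you have it, but not a new enough version” point concrete: the range [2.10.0,3.0.0) in the error means at least 2.10.0 and below 3.0.0. Below is a simplified sketch of that check. It is a hand-written illustration, not the OSGi framework’s own version handling; it ignores qualifiers and only accepts numeric major.minor.micro strings, and 2.9.1 is just an example of an older version.

```java
final class VersionRangeSketch {
    /** Parses "major.minor.micro" into three ints; missing parts count as 0. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /** Compares two versions numerically, part by part (so 2.10.0 is newer than 2.9.1). */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    /** True if floor <= version < ceiling, i.e. the half-open range "[floor,ceiling)". */
    static boolean inHalfOpenRange(String version, String floor, String ceiling) {
        return compare(version, floor) >= 0 && compare(version, ceiling) < 0;
    }

    public static void main(String[] args) {
        System.out.println(inHalfOpenRange("2.9.1", "2.10.0", "3.0.0"));  // false: too old
        System.out.println(inHalfOpenRange("2.10.1", "2.10.0", "3.0.0")); // true
    }
}
```

Note the numeric comparison: 2.9.1 is older than 2.10.0 even though "2.9" sorts after "2.10" as text.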