OH4 runs out of memory

Hi @cmachtel,
Sorry for the slow reply; that thing called work got in the way and I just saw your post.
Yes, that command will create something that is often referred to as a “support zip” or “support bundle”.
It should be located in your userdata directory. It will likely contain 6 files: 5 txt files with various info, such as what was loaded, the version, and all of those threads you saw using the other command, as well as a “heapdump.hprof” file. You can open that file with a tool like Eclipse MAT, referenced previously, which will analyze which objects were using all your memory and point out “likely suspects”.
It should tell you pretty much everything that was happening at the time you triggered it.
Also, assuming you launched openHAB using the start_debug script: if you want a second heap dump after the console gets sluggish or unresponsive, you can trigger one from Eclipse MAT by selecting the related openHAB Java process and letting it acquire (“capture”) a heap dump.

Hi @justaoldman and all the others who helped me.
I managed to get the .zip containing the dump, and I can directly see that the thread count grows from 331 without the Shelly binding to 2572 after 30 min with the Shelly binding installed, without any Things… 5 min later the system broke down.

I will now try to get used to Eclipse MAT, but if anybody is interested in helping, here is my last dump file: Dropbox - 2023-10-05_114925.zip

Thank you

Please see if the newest snapshot (3659 or above) resolves this OOM issue. There was a patch pushed to fix a bug upstream.

@morph166955 You mean a snapshot of the Shelly binding?
How can I update that?
I tried downloading from here: GitHub - openhab/openhab-addons: Add-ons for openHAB
but it does not give me a JAR that I can install in OH.

No, I’m referring to the OH 4.1 SNAPSHOT.

Looking at your heap dump: it doesn’t seem to have memory issues, it’s fairly small (300 MB). From the thread list, however, I can see a lot of WebSocketClient instances, way too many for a regular launch. Looking at the references, it seems to be related to the lgthinq binding. I can’t see anything wrong in its code for now. I could be blind or wrong (I prefer the latter!).
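For anyone who wants to check the thread list themselves: a rough way to spot this kind of growth is to count thread names by prefix. The Java sketch below is only an illustration. It assumes each thread entry in the dump's text file starts with the thread name in double quotes (as jstack-style dumps do). The class name `ThreadNameHistogram`, its helper methods, and the file name in the usage line are made up for this example; they are not part of openHAB or Eclipse MAT.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ThreadNameHistogram {

    // Returns the quoted thread name at the start of a dump line, or null
    // if the line is not a thread header (for example a stack frame).
    static String threadNameOf(String line) {
        if (!line.startsWith("\"")) {
            return null;
        }
        int end = line.indexOf('"', 1);
        return end > 1 ? line.substring(1, end) : null;
    }

    // Drops trailing numeric suffixes such as "@111-20" or "-7" so that
    // threads belonging to the same kind of pool are counted together.
    static String prefixOf(String threadName) {
        return threadName.replaceAll("([-@#]\\d+)+$", "");
    }

    // Counts thread header lines per name prefix, sorted by prefix.
    static Map<String, Integer> countByPrefix(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            String name = threadNameOf(line);
            if (name != null) {
                counts.merge(prefixOf(name), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ThreadNameHistogram.java <thread-dump.txt>");
            return;
        }
        countByPrefix(Files.readAllLines(Path.of(args[0])))
                .forEach((prefix, count) -> System.out.println(count + "\t" + prefix));
    }
}
```

Running it on a dump taken right after startup and on one taken shortly before the freeze, then comparing the two outputs, should show which prefix accounts for the growth.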

Edit: The second suspect is shelly and its Shelly2RpcSocket. Looking at its code, there are multiple calls to close, which is a graceful shutdown of the connection.
@cmachtel The heap dump contains 261 instances of websocket sessions. Do you have that many Shelly devices?

Cheers,
Łukasz

Hi @splatch ,
Thank you for taking the time to help me.
I have around 60 Shelly devices and 4 LG devices. The LG binding works well, and if Shelly is not installed there is no issue and the thread count stays constant at around 330.

I am currently upgrading to the latest snapshot (which I would have preferred not to do on a system in daily use), but let’s see…
@markus7017 sorry for disturbing you, but you seem to be the maintainer of the Shelly binding. Could you help me with this?

Bad news: using the latest build, 3659, the problem is exactly the same. Shelly generates a lot of threads and brings the system to collapse after 30 min.
Additionally, the Hue bridge also does not work out of the box…

I’m now trying to roll back to 4.0.3, as that one was at least stable and the rest of my home works (except Shelly).

I have also made some attempts with Eclipse MAT, but as I’m not used to it I could not find anything :frowning: :face_with_spiral_eyes:
Any idea what i can do next?

Which version of the Shelly binding do you have installed? The dev or the release version? You might want to use the dev version if it is not already installed.

Thank you @Oliver2 for the suggestions. I am currently using the release version.
After work and kids got in the way, I finally worked on it today.
Sorry, but I did not manage to change the Shelly binding version.
I followed this guide: https://github.com/markus7017/myfiles/blob/master/shelly/READMEbeta.md
I have tried everything (rebooting, cleaning, …) with 4.0 and the 4.1 snapshot. I do not understand it, because I managed to install the LG binding that way, and I have been using it for 6 months without issues.
The version from the marketplace seems to be 4.0.3.
I did the test again: deleted all my Shelly Things, installed the binding, and after 30 min the same thing… a lot of threads and a system freeze.
Am I the only one with this issue? What can I do? I have a freshly installed openHABian, so why does it work for others?

@Oliver2 & @markus7017 Hello,
I tried something new. I installed a fresh openHABian image, did the initial setup and installed nothing other than the Shelly binding. I discovered 32 Things and added them all. I did nothing else: no items, no rules, nothing.
After 30 min, I have exactly the same issue. The thread count grows and grows.
Shouldn’t this be reproducible by anyone else? It still gives me no way to solve it, but maybe it helps you to find the problem?
UPDATE:
One hour later: I ran the same experiment with OH 4.1.0.M1 and the result is exactly the same :frowning:
One more hour later: same test with the latest build (3664) and Shelly binding 4.1.0.202310070405, and the behavior is exactly the same.

Maybe it helps to unconditionally stop the client and not only if it is started? It could also be “failed” or “starting”.
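To illustrate the idea, here is a minimal Java sketch. It is not the binding's code and not the Jetty API: `FakeClient`, its `State` enum, and the `resourcesHeld` flag are hypothetical stand-ins for a client that holds threads until `stop()` is called. The state names follow the ones mentioned above.

```java
class StopSketch {

    enum State { STOPPED, STARTING, STARTED, FAILED }

    // Hypothetical stand-in for a client that owns worker threads
    // from the moment it begins starting until stop() is called.
    static class FakeClient {
        State state;
        boolean resourcesHeld;

        FakeClient(State state) {
            this.state = state;
            this.resourcesHeld = state != State.STOPPED;
        }

        boolean isStarted() {
            return state == State.STARTED;
        }

        void stop() {
            resourcesHeld = false;
            state = State.STOPPED;
        }
    }

    // Conditional variant: skips clients that are STARTING or FAILED.
    static void stopIfStarted(FakeClient client) {
        if (client.isStarted()) {
            client.stop();
        }
    }

    // Unconditional variant: releases resources whatever the state is.
    static void stopAlways(FakeClient client) {
        client.stop();
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            FakeClient conditional = new FakeClient(s);
            FakeClient unconditional = new FakeClient(s);
            stopIfStarted(conditional);
            stopAlways(unconditional);
            System.out.println(s + ": still held after conditional stop = "
                    + conditional.resourcesHeld
                    + ", after unconditional stop = " + unconditional.resourcesHeld);
        }
    }
}
```

In this toy model only the conditional variant leaves resources held for the STARTING and FAILED states. Whether the real client behaves like this is exactly what would need checking.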

@wborn
Would it help if I changed the logging level to get more details?

Hmm, usually the binding creates one RPC session (WebSocket connection) per device, and it stays open unless the connection breaks (device restart etc.)

@splatch
Could you guide me on how to check the thread list and heap?

Could someone create a TRACE log?
(OH console: log:set TRACE org.openhab.binding.shelly)

@cmachtel Which types of devices are you using?
Maybe it’s related to a specific device or type of device. Start disabling Things.

  • If it’s related to Shelly2Rpc, disable all Gen1 devices; they don’t use the WebSocket interface
  • then disable relays; sensor devices use temporary inbound WebSocket connections, relays permanent outbound ones
  • maybe you can identify a single device type, but list them here anyway, and also mention whether any of them includes a Shelly add-on

No worries, we’ll get that working.

I’ve created a PR to make sure WebSocketClients are always stopped:

I do not agree with this change. Removing conditions and enforcing a close is not a solution. I want to understand the logic problem, at least to make sure that there will be no side effects. If checking the logical state fails at this point, there must be a problem at a higher level.

I could imagine that this is caused by sensor devices creating an inbound WebSocket connection that the binding refuses to accept, because it couldn’t find a matching Thing.

I would have preferred a review by you before it got merged.

@markus7017 : I see you are there, I just reviewed your other PR fixing several bugs. I need your feedback to merge it.

I’m sorry, I did not follow this thread before merging. I wrongly considered the fixes trivial/obvious and didn’t wait for your approval, @markus7017

@wborn, @markus7017 - would you prefer reverting that commit before the M2 build is started?