OH3.1/3.2 out of memory error

You can press the pause button to disable the thing in the main UI. Should allow you to test if that makes a change whilst you wait to find time to open the wall.


Yeah but… these sound like typical zwave network problems, a chatty node, a zombie node… these usually manifest themselves as a slow zwave network, zwave devices grind to a halt or respond very slowly, but… they don't cause a memory leak.
something else is buggered

Actually I didn't have any issues at all with the zwave network. It was stable, responsive etc. Only the Qubino Mini dimmer doesn't restore its connection to the controller after a power cut, which is very annoying when you are still in the process of renovating parts of the house.

Actually this node was already disabled. That solved the issue yesterday evening.

Slowly the nice graph of @matt1 is starting to agree with you. Below is the graph of the last couple of days, including the moment when I updated the zwave network.

Checked a heap dump again with the memory analyser.
The outcome again points towards zwave.
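
For anyone wanting to reproduce the dump step: a minimal sketch of how a heap dump can be captured for the memory analyser, assuming a systemd service named openhab and a full JDK with jmap on the path (the file path is just an example).

```shell
# Find the openHAB JVM pid (assumes the service is called "openhab")
PID=$(systemctl show openhab --property=MainPID --value)

# Dump live objects to a file that Eclipse MAT / the memory analyser can open
jmap -dump:live,format=b,file=/tmp/openhab-heap.hprof "$PID"
```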

Not sure what the most efficient way is to continue: focus on zwave by disabling that binding first? Or is the result of this memory analysis not very useful, and should I just start with the binding having the least impact?

The numbers match what you saw before, so those don't look like your growing element. Check again over time maybe.
There will always be something that is the biggest memory user.

That would be one very quick way to eliminate that possibility, or not.
Maybe one for Matt or Rossko… is there any way this is the nrjavaserial file leak bug? I think(?) the zwave binding still uses it.

Good point, the total memory used which is identified as 'leak' barely changed.
Will leave it running for now and work on reducing the system load by manually defining the persistence configuration.

Found an option to compare two heap dumps; ran that on yesterday's dump compared with today's and the result was: no leak suspect was found.
Will check it again tomorrow.

Downside is that it is one of the key bindings, especially in the light control, and you need at least 24h to see some difference in the graph :sweat_smile:

On the other hand, there is still a lot of heap space available and it barely climbs. Maybe better to check the heap size again in a couple of days.
Also just found out the /etc/default/openhab file was changed back to the original one, overwriting the heap size values, so they are back to the original values again.

  Current heap size           154,060 kbytes
  Maximum heap size           316,800 kbytes
  Committed heap size         190,208 kbytes
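
For reference, on a deb/apt install those limits are set in /etc/default/openhab via EXTRA_JAVA_OPTS; a minimal sketch below with purely illustrative values (an upgrade can overwrite this file, which is what happened here).

```shell
# /etc/default/openhab -- sourced by the openHAB service on deb/apt installs
# Illustrative values only; size them for your own hardware
EXTRA_JAVA_OPTS="-Xms192m -Xmx640m"
```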

I don't fully understand the nrjavaserial issue, but I believe it does not cause a serious memory leak; it loses track of whether the port is locked, and hence the serial port stops working when a binding closes and reopens the port. I'm guessing you would notice serial ports no longer working long before you notice the leaked memory on a graph. A lot of bindings use it for the serial port, including zigbee, zwave, modbus etc.

A lot of people use zwave, and if it is in that binding you have to ask why others are not reporting this. Is it only when you use a certain thing or feature? If you're not wanting to take the whole binding down, then try changing the number of devices or pausing all rules that use the binding. Does doing that change the slope of the leak? It may be painful, but taking the whole binding down for 12 hours while you sleep may be the fastest way to solve this. As soon as we know the binding we can ask the maintainer for some clues and to look into it.
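
One low-effort way to take a single binding down for such a test window, assuming console access on the default port (bundle ids differ per system), is via the karaf console:

```shell
# Open the openHAB console (default password is habopen)
ssh -p 8101 openhab@localhost

# Inside the console: find the bundle id, stop it for the test, restart it later
bundle:list | grep -i zwave
bundle:stop <bundle-id>
bundle:start <bundle-id>
```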

What binding or part of openHAB is it in?
How to reproduce it?

We need to answer those questions.

Agree, and if the issue is clearly visible I will do it.
But for now, the system seems to have stabilized after 8-10 hours; below is the graph of the last 48h.
Will leave the system running for a couple of days now to see its behaviour and then do a restart.

  Current heap size           141,376 kbytes
  Maximum heap size           316,800 kbytes
  Committed heap size         190,208 kbytes

Sounds good. I would imagine after that time it's done pretty much everything it's likely to do, in terms of first time rule run or processing device updates.

Here is an update on the current situation.

Unfortunately there is still a slight slope in the used heap, but it is very small. You don't see it in the nice graphs from @matt1 when showing only 2 days, but over 7 days it is clear.
Created several heap dumps and used the compare function between two snapshots, but no leak is detected.

Didn't change anything in the last week, only started with creating a group so I can later modify the persistence strategy from the default to storing only what is needed.

Next steps are:

  • change persistence
  • if no change, start with disabling bindings 1 by 1. But this will take some time, especially due to the very small slope.


What is the number of running 'threads' doing, is it also climbing?

Enabled the monitoring of the number of threads from the Systeminfo binding this morning; this is the result until now.
Seeing the very small slope climbing on the heap, perhaps the elapsed time is too short to say something about this?
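
For anyone charting the same thing, a sketch of the kind of item that can expose the thread count, assuming the systeminfo binding's currentProcess#threads channel (check the binding docs for the exact channel id; the thing UID here is just an example):

```
// Hypothetical thing UID; verify the channel id against the systeminfo binding docs
Number JVM_Threads "openHAB threads [%d]" { channel="systeminfo:computer:local:currentProcess#threads" }
```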

Below are the graphs after 3 days. Guess one could slowly start seeing a slight incline in the number of threads.
Coming weekend there is no time to work on openHAB, so I will keep the system running without making any changes this weekend.

Small update from my side, unfortunately the issue isn't solved yet. So I am following the advice of @matt1 to disable the bindings one by one and see the result.

Can exclude now:

  • custom binding for the Yamaha musiccast
  • modbus
  • solaredge
  • shelly
  • chromecast

To go:

  • zwave
  • RFlink
  • network
  • network UPS
  • Deconz
  • mqtt
  • systeminfo (hope it isn't this one, hard to find the issue then :sweat_smile:)

I have been doing the same for the last two weeks :slightly_frowning_face: I think my issue was homekit.

Actually it is not too bad to do. Of course nice would be different, but with the great help here and the really helpful feature of showing the heap in a graph it is pretty quick to determine if there is a change.
Within about 12h you can already see the direction a bit; with ~24h you know pretty surely if there is a change. And with the small slope I see, I have more than 2 weeks before the free space is gone.

Edit: don't have homekit, but thnx for the tip!

All bindings have been checked except systeminfo. No change in the climbing slope.
Will try removing all bindings at once after the weekend, but this is quite strange…
Can it be that the issue is connected to using the API from Nodered, or something else?
I only have 1 rule forwarding the notification; I can also try to disable that one.

Then I am out of ideas, so if somebody has additional things to look at, that would be helpful.

Again an update on the current status. Tried now to remove all bindings and see the result.
Below are two screenshots with the outcome; from 16/11 onward the system was running with all bindings removed and restarted, 14-16/11 was with all bindings on.
Conclusion: there is a slight difference, but not a lot. Will activate the bindings again and continue the search in other directions. Maybe the persistence services or NodeRed is causing issues.


For background, also the monitoring of the number of threads; this is quite stable.

Are you graphing item state updates per hour?
I have a Resol binding that goes out of memory. The binding increases the updates to 150,000 per hour. I wonder whether persistence can keep up.

I also have a mem leak with Homekit but you don't run that.
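
In case it helps with the update-rate question: a rough rules DSL sketch that counts state updates per hour across a group (gMonitored and UpdatesPerHour are hypothetical names you would need to create yourself):

```
// Counts state updates of all members of a (hypothetical) group gMonitored
var int updateCount = 0

rule "Count item updates"
when
    Member of gMonitored received update
then
    updateCount = updateCount + 1
end

rule "Publish update rate every hour"
when
    Time cron "0 0 * * * ?"   // top of every hour
then
    UpdatesPerHour.postUpdate(updateCount)   // hypothetical Number item, can be persisted and charted
    updateCount = 0
end
```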

Thnx for your reaction!
Currently use 2 ways of persistence:

  • the built-in strategy in RRD4j → by removing all bindings there were no updates of items anymore (also disabled all bridges, so there are no things trying to poll for updates)
  • some dedicated items towards InfluxDB → reduced to almost no updates at all, only a few. Offloaded quite some work to the NAS which hosts the database anyway, so OH isn't busy with this.

Now removed the RRD4j service, let's see if this brings any reduction.
I am also preparing to move away from the default persistence configuration; unfortunately I haven't finished this work yet.
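
As a minimal sketch of where that non-default configuration could end up, a restricted rrd4j.persist along these lines, with gPersist as a hypothetical group containing only the items that should be stored:

```
// persistence/rrd4j.persist -- store only members of a hypothetical gPersist group
Strategies {
    everyMinute : "0 * * * * ?"
    default = everyChange
}

Items {
    // rrd4j needs a fixed-interval strategy (everyMinute) to chart properly
    gPersist* : strategy = everyChange, everyMinute, restoreOnStartup
}
```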