[SOLVED] Out of memory: Kill process java, results in openhab being killed too

Max_G · November 2, 2018, 11:09am

I was wondering what I could do to either identify the root cause, or apply a band-aid: detect the problem early and then restart the openHAB service.

I am still running OH1, which, after a few years, stopped on Aug 28 and again on Oct 31, seemingly because Java ran out of memory.
Once an openHAB system has reached a certain size, such an event is quite dramatic.
I am running OH on a Raspberry Pi 2 Model B Rev 1.1.

How do others prevent this from happening?

Aug 28 23:31:10 rpiautomation kernel: [3135518.505730] Out of memory: Kill process 808 (java) score 314 or sacrifice child
Aug 28 23:31:10 rpiautomation kernel: [3135518.506094] Killed process 808 (java) total-vm:528436kB, anon-rss:343516kB, file-rss:0kB, shmem-rss:0kB
Aug 28 23:31:10 rpiautomation kernel: [3135518.672431] oom_reaper: reaped process 808 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Aug 28 23:31:11 rpiautomation mosquitto[8975]: Client arduino_BME280_S already connected, closing old connection.
Aug 28 23:31:11 rpiautomation openhab.sh[794]: Killed

Oct 31 10:30:29 rpiautomation kernel: [8618261.705738] Out of memory: Kill process 2332 (java) score 323 or sacrifice child
Oct 31 10:30:29 rpiautomation kernel: [8618261.706389] Killed process 2332 (java) total-vm:576832kB, anon-rss:352104kB, file-rss:0kB, shmem-rss:0kB
Oct 31 10:30:29 rpiautomation kernel: [8618261.886083] oom_reaper: reaped process 2332 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Oct 31 10:30:29 rpiautomation openhab.sh[2315]: Killed
Oct 31 10:30:29 rpiautomation systemd[1]: openhab.service: Main process exited, code=exited, status=137/n/a
Oct 31 10:30:29 rpiautomation systemd[1]: openhab.service: Unit entered failed state.
Oct 31 10:30:29 rpiautomation systemd[1]: openhab.service: Failed with result 'exit-code'.
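Each OOM event above leaves a "Killed process" line recording how large the victim was. A minimal Python sketch for pulling those figures out of a log; the regular expression is an assumption that simply mirrors the line format quoted here, and other kernel versions may word it differently:

```python
import re
from typing import NamedTuple, Optional

# Pattern assumed from the log lines quoted above, e.g.
# "Killed process 808 (java) total-vm:528436kB, anon-rss:343516kB, ..."
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


class OomKill(NamedTuple):
    pid: int
    name: str
    total_vm_kb: int
    anon_rss_kb: int


def parse_oom_kill(line: str) -> Optional[OomKill]:
    """Return the kill details from a kernel log line, or None if it doesn't match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return OomKill(int(m["pid"]), m["name"], int(m["total_vm"]), int(m["anon_rss"]))


if __name__ == "__main__":
    line = ("Oct 31 10:30:29 rpiautomation kernel: [8618261.706389] Killed process 2332 "
            "(java) total-vm:576832kB, anon-rss:352104kB, file-rss:0kB, shmem-rss:0kB")
    kill = parse_oom_kill(line)
    if kill:
        # anon-rss is reported in kB; convert to MB for readability
        print(f"{kill.name} (pid {kill.pid}): {kill.anon_rss_kb / 1024:.0f} MB resident")
```

Run as-is, it prints `java (pid 2332): 344 MB resident` for the Oct 31 line. On a Pi 2 with 1 GB of RAM, a Java process of roughly 340 MB suggests that other processes were holding a good part of the remaining memory.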

mstormi · November 2, 2018, 1:11pm

Try increasing the available Java heap space: start Java with the options -Xms400m -Xmx512m; the latter doubles the default.
If that doesn’t help, you likely have a memory leak in your code that you need to find.

rlkoshak · November 2, 2018, 3:35pm

One way to help identify the memory leak is to systematically remove bindings one at a time while watching whether memory continues to grow. Once memory stabilizes, you have probably found your culprit. If you make it through ALL your bindings, then there may be something in the core that is leaking memory.
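Because the growth reportedly takes weeks to reach the limit, one option is to log the Java process’s resident memory at a fixed interval and compare the trend before and after each binding is removed. A minimal sketch, assuming Linux’s /proc filesystem; the script name, the 10-minute interval and the output format are arbitrary choices, not anything openHAB provides:

```python
import os
import sys
import time
from datetime import datetime
from typing import Optional


def vm_rss_kb(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:    343516 kB"
            return int(line.split()[1])
    return None


def sample_rss(pid: int, interval_s: float = 600.0) -> None:
    """Print a timestamped RSS sample for `pid` every `interval_s` seconds until it exits."""
    path = f"/proc/{pid}/status"
    while os.path.exists(path):
        try:
            with open(path) as f:
                rss = vm_rss_kb(f.read())
        except (FileNotFoundError, ProcessLookupError):
            break  # the process exited between the check and the read
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} pid={pid} rss_kb={rss}", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 rss_log.py <java-pid> >> rss.log
    sample_rss(int(sys.argv[1]))
```

The PID can be found with `pgrep java`; redirecting the output to a file gives a simple history that can be plotted or just checked for a steady upward trend.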

In either case, if the binding has a 2.x equivalent or the problem is in the core, I’m afraid upgrading to OH 2.x may be the only permanent solution. There is little to no further development on OH 1.x bindings that have 2.x replacements, and there is no further development on the OH 1.x core. So the memory leak will not get fixed.

As a short-term mitigation, you can schedule the RPi to restart OH every night through a cron job. That will clear out the used memory and will hopefully let it run for longer periods without running out of memory. However, each restart means significant downtime while OH starts up again, which could be disruptive.

It is odd, though, that you are only now seeing the memory growth. It may not be feasible, but have you made any changes in the 2-3 months before Aug 28 (e.g. installed a new binding)? That would be the first thing I’d look at.

Max_G · November 2, 2018, 11:28pm

Thank you both.

Not every low-memory event causes oom_kill to kill something.

@mstormi: while I can do basic Linux administration, it is not clear to me where to add these parameters. For example, does OH launch Java, or is Java loaded at Linux start-up? Probably the former? Or should I look for a java.conf-type file to put these values in?

@rlkoshak: Yes, you probably remember my story: I built an OH2 rPi in August 2017 and stopped when it came to deciding which admin interface to use. It still sits on this webpage: “If I only had time”.
Where would I monitor the growth of memory other than what I do now (if required)?
It seems to take two months before this memory business kills OH… a long time to wait after disabling each binding.
On that note, no bindings were added or exchanged since Jan 2017. [turned out later to be false: I had installed Grafana]
Yes, version control: I know how important it is. I keep track of major changes, but not minor ones.

I’ll probably change the Java startup parameters (once I find where to put them), and then find the time to migrate to OH2.

[2018-12-07 edited: added text in italics]

mstormi · November 3, 2018, 7:33am

That depends on your setup. If you installed from packages, there is usually an /etc/default/openhab2 file containing an EXTRA_JAVA_OPTS line. With OH1 I think it was /etc/default/openhab, but that is so old I just don’t remember anymore. It could also have been in /etc/init.d/openhab.

It seems I misunderstood your problem. Your Java does not kill itself; it gets killed by the Linux kernel.
I had never encountered that before and didn’t even know what oom_kill is, so I googled it. It is a procedure built into and triggered by the kernel. I have never seen it play a role in modern Raspbian setups.
I assume yours is pretty ancient, since you also still run OH1?
Anyway, if it kills anything while you’re only at 50% memory usage, something is badly wrong (assuming your chart is right). I’d disable the OOM killer; see the bottom of this link for how to configure it.

Max_G · November 3, 2018, 9:39am

Thanks… I am aware that I am on thin ice with OH1; openHAB moved on to v2 almost 2 years ago.

Disabling oom_kill is not a good idea, as the machine would simply hang or freeze, which is worse than having OH kicked out; I am sure I can test for the latter and restart OH automatically.
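For the “test and auto restart” approach, a minimal watchdog sketch that asks systemd whether the unit is active and restarts it if not. The unit name is taken from the journal lines in the first post; the script name, the `--loop` flag and the 60-second interval are hypothetical choices. systemd’s own `Restart=on-failure` setting in a drop-in override for the unit may be a simpler alternative, since the kill showed up as a non-zero exit status (137).

```python
import subprocess
import sys
import time
from typing import Callable

SERVICE = "openhab"  # unit name as it appears in the journal above ("openhab.service")


def is_active(service: str) -> bool:
    """Return True if systemd reports the unit as active (`systemctl is-active` exits 0)."""
    return subprocess.run(["systemctl", "is-active", "--quiet", service]).returncode == 0


def restart(service: str) -> None:
    """Restart the unit; this needs root (or matching sudo/polkit rights)."""
    subprocess.run(["systemctl", "restart", service], check=True)


def check_once(service: str,
               active: Callable[[str], bool] = is_active,
               do_restart: Callable[[str], None] = restart) -> bool:
    """Restart `service` if it is not running; return True if a restart was issued."""
    if active(service):
        return False
    print(f"{service} is not active, restarting it", flush=True)
    do_restart(service)
    return True


if __name__ == "__main__" and "--loop" in sys.argv:
    # Hypothetical usage: sudo python3 oh_watchdog.py --loop
    while True:
        check_once(SERVICE)
        time.sleep(60)
```

The status check and the restart are passed in as parameters so the decision logic can be tried without touching a real service.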

mstormi · November 3, 2018, 9:49am

It’s more like 4 years than 2, but that’s true.
But your current problems are not related to the OH version.

It seems you have a pretty strange setup; none of this seems to happen on “today’s” Pis.
Have you considered moving to openHABian and installing OH1 on top of it (for the time being)?

Either way, consider adding a USB stick and moving swap there, as described here. That should give you some extra headroom.
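To check whether swap is really in place and how much RAM is actually available, /proc/meminfo has both figures. A minimal sketch, assuming a kernel that reports MemAvailable (added in Linux 3.14); the one-line output format is just an illustrative choice:

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo contents into {field: value}; most values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def summary(info: Dict[str, int]) -> str:
    """Percent of RAM available plus swap usage, as a one-line string."""
    avail_pct = 100 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return (f"RAM available: {avail_pct:.0f}%, "
            f"swap used: {swap_used // 1024} of {swap_total // 1024} MB")


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(summary(parse_meminfo(f.read())))
```

Run periodically, this gives the kind of percent-free figure discussed later in the thread, plus how much swap is in use.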

Max_G · November 3, 2018, 10:04am

Hmm, four years already… that means I have been using OH1 for more than that (when I started, there was no v2).

Agreed, it is not an OH problem, but I thought someone might have an idea on how to fix it.

It’s a standard Stretch image on a Pi2; everything in its default location.

In any case, I won’t bother anyone further, and will migrate to OH2 on a Pi3 booting from HDD.

Max_G · December 1, 2018, 11:09pm

For what it’s worth: I did some further digging… and eventually disabled Grafana, which I had installed a few months ago but never ended up using (I didn’t want to spend time on it with OHv1, since I planned to migrate to OHv2), so it was just sitting there idling.

After stopping Grafana, free memory stayed at the 60% I had seen previously.
I have now uninstalled Grafana on the OHv1 rPi… and all seems fine again.

…and thanks for your input and support.

mstormi · December 2, 2018, 12:02pm

Have you tried paging (swapping) to some external medium?
See my link three posts up for how to do that (the forum says you didn’t click it).
I guess it would have helped a lot with a memory shortage caused by Grafana or other memory-hogging software, at least as long as that software does not need a large working set in RAM all the time. InfluxDB and Grafana typically only need (paged-in) RAM when you render or access graphical stats, not the rest of the time.

Max_G · December 7, 2018, 4:44am

Thanks… yes, it only uses the SD card to boot; all its files are on an SSD.
The problem is solved by uninstalling Grafana (which I will explore in OH2).