[bug] OpenHAB3 M5 memory leak

cinadr · December 13, 2020, 12:10pm

Hi!

I see 3-4 GB of memory usage after 1 or 2 days in docker under CentOS, and the zwave2mqtt reaction time keeps growing; it is usually around 10-30 secs. How can I dig deeper into this problem? After a restart memory starts at 800-900K. CPU usage inside the docker container is 200 %.
I have 2 rules running: one operates a Z-Wave Aeotec Nano Dimmer based on motion detection, and the other cycles my cameras with the zoneminder binding every 10 seconds (this is a scripted rule). I’ve read on the forum that someone had issues with the IPCamera binding, where slow garbage collection did not release images from the heap and caused high CPU usage and slowdowns. How can I check for that with the Karaf console?

mstormi · December 13, 2020, 1:47pm

Mind the posting rules, stay on topic and don’t invade others’ threads, please. I moved your post.

cinadr · December 13, 2020, 2:40pm

I thought it was connected to Milestone 5 and that is why I posted there. Sorry for that.

cinadr · December 13, 2020, 2:43pm

After running for two hours it has a tendency:

Andrew_Rowe · December 13, 2020, 3:16pm

First question is… was this a brand new install or an upgrade?
If it is an upgrade, from what version, and did you not see this behavior in the previous version?
You’ve been here almost a year so I’m guessing you have had OpenHAB running in some form or another for a while.

If this issue is something that only began with M5, then it was relevant but no worries, let’s try to figure it out (starting a new thread will help get the right people’s attention)

Obviously this appears to be a test setup since not a lot of stuff is running. Memory consumption issues are notoriously difficult to run down, especially if it is a live system being used in production. First step is to figure out if something really is wrong or if what you are seeing is… you know… normal.
Does memory usage continue to increase? Or does it level off while the system operates normally? The comment about 200% CPU usage indicates an overloaded system. What is the hardware platform? I see a docker setup running on CentOS; is it a beefy system?
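One way to answer the "keeps increasing or levels off" question is to log memory usage at regular intervals and look at the trend of the most recent readings. Below is a minimal Python sketch, assuming the readings were collected by hand or by a script as (hours since restart, megabytes) pairs. The function names, the 10 MB/hour threshold and the sample data are all made up for illustration.

```python
def slope_mb_per_hour(samples):
    """Least-squares slope of (hours, megabytes) samples, in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def still_growing(samples, tail=0.5, threshold=10.0):
    """True if the last `tail` fraction of the samples still trends upward
    faster than `threshold` MB per hour (hypothetical cut-off)."""
    start = int(len(samples) * (1 - tail))
    return slope_mb_per_hour(samples[start:]) > threshold


# Invented example data: one series that climbs steadily, one that plateaus.
climbing = [(h, 900 + 100 * h) for h in range(24)]
plateau = [(h, min(900 + 100 * h, 1000)) for h in range(24)]

print("climbing series still growing:", still_growing(climbing))
print("plateau series still growing:", still_growing(plateau))
```

Looking only at the tail of the series matters because a JVM usually grows for a while after startup before settling; a series that is flat over its second half is more likely normal behaviour than a leak.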
Anyhow, the first method of troubleshooting memory issues is to unload each binding one at a time and see if the problem ceases. Most of the time this is a pain, but with only two bindings it shouldn’t be as difficult. Zwave is generally mature and rock solid; zoneminder is my first guess. This brings to mind that zoneminder has more than one binding (I think, I don’t use it), so the version of the binding would be helpful.
If it is a brand new install or test system, you may want to just wipe it out and start over; at least this will ensure the problem is easily reproducible.
good luck!

Bruce_Osborne · December 13, 2020, 4:07pm

It could very well be related to an addon such as zwave2mqtt.

Did the developer test it for OH3 or was this just the output of the generic conversion script? I know zwave needed more work after conversion.

cinadr · December 13, 2020, 4:50pm

This is a docker installation where I redefined all my things from scratch through the WebUI, going from M3 through M4 and M5 to build 2066. I have these bindings installed: ambientweather1400ip, astro, chromecast, darksky, evohome, mqtt, ntp, remoteopenhab, samsungtv, unifi, zoneminder. I use zwave2mqtt and zigbee2mqtt to control my automation devices. I have several items linked with most of these bindings already. Right now I’m in a transition from version 2 to 3. I’m trying to be as “codeless” as possible with openHAB3.

Yes, you are right. I have a “production” version of 2.5.10 running “natively” on Windows Server 2019 Datacenter Core. And sure, the problematic instance is a test setup which runs on this server inside a Hyper-V VM of CentOS 7.6 within docker 19.03.14 (API: 1.40). My hardware specs are: Intel(R) Core™ i7-4785T CPU @ 2.20GHz and 32GB RAM (running several VMs such as a Serviio media server, torrenting, etc.). So it is an all-in-one “home lab” solution.

My plan then is to back up my current docker volumes, unload the bindings, let the system run for one day after each change, and see what causes the problem. I’ll report back when something turns up.

Thank you for your help!

One last question: do I need to uninstall a binding, or is it enough to disable it in karaf/ui? I think a binding needs no memory while it is disabled…

Andrew_Rowe · December 13, 2020, 5:25pm

good question… ahhh… IDK for sure
maybe someone brighter than me will pop in and answer for sure, but my comment is: if all the bindings are installed thru the UI, reinstalling them should be easy, so I would probably just uninstall them to be sure
… which brings up… a restart might be needed here and there as well
that is quite a list of bindings
edit: ~~someone sometime recently had something that showed all the running threads which provided clues as to what was hogging the system. Maybe search the term ‘threads’ or something~~ nevermind, it was a Mac (thread) probably no help to you
Several of those bindings you listed use internet access; could a long running thread (or several) be buggering things up? Currently OH3 can spawn numerous threads I think (see discussion about thread ‘pooling’ in OH3 recently for more details)

cinadr · December 13, 2020, 5:26pm

> It could very well be related to an addon such as zwave2mqtt. Did the developer test it for OH3 or was this just the output of the generic conversion script? I know zwave needed more work after conversion.

It is a separate program that is used through the mqtt binding (just like zigbee2mqtt).
Please, see here.

Andrew_Rowe · December 13, 2020, 5:45pm

I know nothing about:
ambientweather1400ip, darksky, evohome, samsungtv, unifi, zoneminder

I used (or have used):
astro, chromecast, mqtt, ntp
astro, mqtt and ntp are all mature and used by bazillions of folks

chromecast… I would be immediately suspicious of

I’m guessing the two weather bindings are hitting the internet so…
Finally… remoteopenhab binding is brand new
I’m guessing you are federating both instances?
that is a brand new binding and I usually don’t ping people but @Lolodomo has been very eager to debug it so he might be able to shed some light on this

Also, lolodomo, thank you for a great binding that so many folks need to help bridge the gap to OH3

cinadr · December 13, 2020, 6:00pm

What might be worth mentioning is that I have a javascript rule (only this one). Can someone confirm that it does not contain any errors? I only have beginner-level experience with javascript. Here’s the code:

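var int cam = 0
// Start at camera 1 when the item has no state yet, or wrap around after camera 4
if ((ZMServer_VideoMonitorId.state === UNDEF) || (ZMServer_VideoMonitorId.state === NULL) || (ZMServer_VideoMonitorId.state == "4")) {
  cam = 1
} else {
  // Otherwise advance to the next camera
  cam = Integer::parseInt(ZMServer_VideoMonitorId.state.toString) + 1
}
// Send the new monitor ID to the item as a string command
var String id = String::format("%d", cam)
ZMServer_VideoMonitorId.sendCommand(id)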

It basically rotates through four zoneminder cameras, running every 10 seconds. I disabled it because I suspect it is the cause, but I have to keep it disabled for at least a day to confirm this. It is based on the examples on the zoneminder help page, so the logic should not cause the binding to fail.
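The rotation logic can be checked in isolation from openHAB. The following Python sketch mirrors the rule above under some assumptions: the item state is modelled as a plain string (or `None` when missing), and `next_camera` and `CAMERA_COUNT` are hypothetical names used only for this illustration.

```python
from typing import Optional

CAMERA_COUNT = 4  # the rule cycles monitor IDs 1 through 4


def next_camera(state: Optional[str]) -> str:
    """Return the monitor ID to send next, given the item's current state."""
    # A missing state, UNDEF or NULL restarts the cycle at camera 1,
    # and so does reaching the last camera.
    if state is None or state in ("UNDEF", "NULL") or state == str(CAMERA_COUNT):
        return "1"
    # Otherwise advance to the next camera.
    return str(int(state) + 1)


# Simulate a few runs of the rule, starting from an uninitialized item.
state = None
sequence = []
for _ in range(6):
    state = next_camera(state)
    sequence.append(state)
print(sequence)
```

If the simulated sequence keeps cycling through 1 to 4, the rule logic itself is not the problem, and any memory growth would have to come from what happens after the command is sent.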

Bruce_Osborne · December 13, 2020, 6:14pm

Since you have Datacenter, I am surprised you are not running OH in a Hyper-V Linux VM. My production one is in a Debian Hyper-V VM.

cinadr · December 13, 2020, 8:40pm

This setup came about because of a problem with the Z-Wave and Zigbee USB sticks. I could not get Windows USB sharing over the network working (no working set2net server or signed com2net driver, etc.), and Hyper-V has no USB passthrough (I planned to move to VMware but had no time to migrate everything). Now that I have Zwave2MQTT and Zigbee2MQTT running natively on Windows, I can move openHAB to docker in Hyper-V. That is the plan right now.

Bruce_Osborne · December 13, 2020, 8:43pm

I am using a Z-Wave stick with the Windows driver on the Datacenter server. The resulting COM port is then bridged to my Hyper-V VM. Probably not ideal, but it works.

cinadr · December 14, 2020, 5:31am

Can you share some tips/links on how to do that? Thank you, I would appreciate it…

Bruce_Osborne · December 14, 2020, 10:57am

My son owns the server. We worked on it some together but he manages it.

cinadr · December 20, 2020, 10:41am

Hi!

After testing for 6 days I think this may be a zoneminder binding issue. If I enable my rule to cycle 4 cameras every 10 seconds, memory accumulates and reaches 2-3 GB in a day. If I stop the rule, the memory increase stops as well. CPU usage peaks only after memory allocation reaches 2.2 GB. I assume this is related to Java garbage collection settings, but I could not test this. If I restart OH3 with the rule disabled, memory usage stays at 1 GB and no CPU power is consumed. Disabling other bindings or rules has no effect on memory/CPU usage.
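For a rough sense of scale, these numbers can be turned into growth per rule run. This is only back-of-the-envelope arithmetic in Python. It assumes the 1 GB idle figure is the baseline, treats GB as GiB, and assumes the growth is spread evenly over the runs, none of which has been measured. The function names are made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def runs_per_day(interval_s: float) -> float:
    """How many times a rule fires per day at the given interval."""
    return SECONDS_PER_DAY / interval_s


def growth_per_run_kib(start_gib: float, end_gib: float, interval_s: float) -> float:
    """Average memory growth per rule run in KiB, assuming even growth over one day."""
    grown_kib = (end_gib - start_gib) * 1024 * 1024
    return grown_kib / runs_per_day(interval_s)


# Baseline of 1 GiB, ending the day at 2 or 3 GiB, rule firing every 10 seconds.
for end_gib in (2.0, 3.0):
    per_run = growth_per_run_kib(1.0, end_gib, 10)
    print(f"{end_gib} GiB after a day: about {per_run:.0f} KiB retained per run")
```

At a 10-second interval the rule fires 8640 times a day, so under these assumptions the growth works out to roughly 120 to 240 KiB per run.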

@mhilbush: Can you look into it? Do you need any more data on this? Where should I file a bug report? I’m willing to test this further. Thank you.

J-N-K · December 20, 2020, 11:01am

Just to be sure: can you unlink the item from the channel and let the rule run? If memory usage stays low, it’s the binding, otherwise it’s the rule engine or something like that.

mhilbush · December 20, 2020, 12:39pm

I’ve responded to you on the Zoneminder thread, where you also posted about this issue.