openHAB crashing with out of memory exception (Sonos, 1.7.1)

guessed · December 7, 2015, 3:52pm

It’s interesting to see what the Sonos Binding is reaching out to on your Network during it’s UPnP discovery phase.

I was installing something on Saturday, and it required me to run a special tool on my Mac. In the middle of this, I noticed a bunch of errors in my openHAB logs… all pointing to my Mac. It looked like the UPnP Discovery process used by the Sonos Binding was reaching out to a few more devices than I’d planned.

These problems were only visible because I’m running this in my logback.xml (to debug something else):

<logger name="org.apache.http" level="TRACE" />

These calls were occurring every minute, so perhaps there’s something on the Network that’s triggering a bug in the Sonos code.

I haven’t looked, but there may also be a way to disable the UPnP discovery, and instead rely upon the manually configured list.

Marty56 · December 8, 2015, 8:16pm

Confirmed. Also the new binding is not working.
Have to switch back to my php workaround.
a bit disappointing that the openhab poster child binding is not working.

steve1 · December 8, 2015, 8:21pm

The binding does discover all UPnP devices on the LAN. For example it discovers my Synology NAS, Fire TV, and so on. However, from what I see in the code it should be ignoring those devices once they are discovered. It’s not clear to me at this point if the discovery aspect of UPnP is the cause of the problems or just some periodic task in the binding that transitions to a bad state and starts generating numerous tasks.

guessed · December 9, 2015, 8:19am

@steve1,
My logic here was that, given the bogus calls it was making to my Mac, it’s possible it’s making similar calls to something else that causes it to mis-behave in some manner.

In my case, it seems benign, it’s making a call to myMac/upnp, which wasn’t part of the SSDP advertisement. If you’re seeing leakages over time, then it might pay to enable the extra logging, and see if there’s a correlation between the machine becoming overwhelmed, and what’s being called at/near that point in time.

Anyhow, the extra logs would give you the data to say either way, since it goes on silently otherwise.

steve1 · December 9, 2015, 11:33am

I have a memory dump of the JVM process when the issue is occurring. I used it to verify that the Sonos binding was the culprit. I also had a (very large) log file for the jupnp-based binding that showed the command that was overflowing the task queue but I deleted it since it was about to fill up my server disk. I should have noted the command but at that point I was convinced I wasn’t going to be running this binding any more or investing more time trying to fix it given it will be soon replaced by the OH2 version. I’ve also seen some indications that it’s not always the same command.

bernd_boehm · January 6, 2016, 5:16am

In the meantime an upgrade of the Java VM to the most recent 1.8 fixed this issue (for now)

steve1 · January 6, 2016, 11:09am

Can you give more details? I don’t know how a JVM upgrade can fix logic problems in the code.

DieterL · January 6, 2016, 11:38am

Even if I can’t really think off it as a solution, it might be possible…

As long as we don’t know the root cause (why the threads were created, or have i missed something) it could also be a problem within the network stack of the jvm.

I still did not get the problem with my installation but I’ve had a (strange) problem last week: I didn’t get status updates from the Sonos to my item, especially the Play/Pause switch. Interestingly for me the switch itself worked and could control the device, only if I changed the status manually on the device or via Sonos App, there were no events in OH. But after rebooting the server (due to kernel update) everything works again.

Regards
Dieter

steve1 · January 6, 2016, 12:32pm

Sure, it’s not impossible, but I think it’s very unlikely. In any case, it would be good to know the context of this claim (operating system, specific version of JVM, which JVM bug fix is believed to fix the bindign problem, etc.). On the other hand, it wouldn’t hurt to upgrade the JVM and see if openHAB continues crashing or not. I’m also wondering about the “for now” qualification. What behavior was he seeing before (maybe it was a different issue, like yours) and how did he verify it’s been fixed? It sometimes took 3-5 days before the Sonos binding crashed the server.

bernd_boehm · January 7, 2016, 9:20am

So my 2 cents about this: The problem is apparently a memory leak in the uPnP library. Since JVM 1.8 there were some changes in the garbage collection and the general memory management. It may be that these changes can compensate for the memory hole.
My weak statement: “for now” only means: I used the binding previously without any problems - even though it had the bug ever since. Then at some point I got the OoM issues - and I can’t tell the reason. ->

Now I upgraded the JVM from 1.7 to 1.8_65 and the issue didn’t occur for 4 days now. This is a lot better than before where I could only run OH for about 12h without restart. Even more - I could not use the binding at all. Even if it didn’t crash yet - the sonos remained silent.

The specific setup is: Raspi 2 on Raspbian (most recent kernel, can’t check the exact version currently) OH 1.7.1, JVM 1.8_65 (Oracle)

steve1 · January 7, 2016, 11:36am

Thanks for extra information. The problem I analyzed was not a memory leak (where a “leak” is memory usage that should have been reclaimed, but wasn’t). The OOM condition occurred because 5000+ active threads were created to handle thousands of mostly identical tasks being created by the uPnP library in a short period of time. Each of these threads consumes native memory for the thread stack and this causes the crash. The native memory usage is not subject to garbage collection and since the threads were active, even the native memory could not be released through other memory management. The native memory usage is why we saw that the Java heaps were not excessively large at the time of the OOM condition.

My setup is RPi 2 (Raspbian/wheezy) , Java 8, OH 1.7.1 + 1.8.0 patches for several bindings.

As described earlier in this thread, I also have a modified version of the Sonos binding that uses the openhab fork of the jupnp UPnP library. However, the excessive task creation issue is still there. It just fills up the disk with log messages instead of running out of memory (because the task queue is bounded and writes to the log when it overflows).

At this point, based on the log file information about the type of tasks being initiated (mostly “SendingRenewal” tasks), it appears that the issue is in the UPnP library and not the binding layer. Unfortunately, if this is true it probably still exists in the ESH Sonos binding. Since the OH1 Sonos binding is no longer actively maintained, I’m planning to do some testing with the ESH Sonos implementation to determine if the problem is solved there or not.

bernd_boehm · January 9, 2016, 5:29am

Update: I guess it was wishful thinking… the OoM exception is back again. So you were right steve, my issue is the same. Myriads of threads…

steve1 · January 9, 2016, 1:24pm

@bernd_boehm I’m sorry to hear that. I was hoping for the best since I’d like to use this binding. I’m still planning to try it with OH2/ESH but I don’t know any reason why the UPnP layer will behave differently given I tried the same UPnP layer with the OH1 binding and had similar problems.

Zvika · July 14, 2016, 3:08pm

Hi,

My Sonos is on its way…
I assume this problem is still relevant? (at least, the Wiki says so)

If so, maybe it’s worthy to consider fixing it in OH1, since it seems that it will take time until OH2 will really replace OH1.

I’ve seen a possible workaround, mentioned at https://community.openhab.org/t/sonos-tts-and-say/7661 -

@Lolodomo

On my side, I continue using the Sonos Vera plugin through the MiOS binding. And so I can use Microsoft translator.

Does it mean that we can use the Sonos Vera plugin instead of the buggy default Sonos plugin?
If I’m correct, could someone modify the Wiki to recommend this (and explain how, for newbies ) ?

Thanks

Marty56 · July 14, 2016, 3:52pm

The songs binding 1.0 is not maintained and I have given up the hope that it will ever be fixed.
Whether the version 2.0 is more stable is not clear for me.

The Sonos Vera plugin would require a dedicated Mimos bridge.
This seems to be an very expensive workaround.

I am using PHP scripts which are started via http post.
This solution is not perfect but at least openhab is not crashing.

hmerk · July 14, 2016, 4:16pm

I don’t know all of your use cases and I have only one Sonos Play:1, just using openHAB controlling play/pause/volume, but I can say that it never crashed my system.

Marty56 · July 15, 2016, 3:09am

And probably you are having a system that is generous on RAM.
I my case (3 songs speaker) and 1 GB RAM running Windows (which is not much) openhab crashed every 1-2 days as far as I can recall.

hmerk · July 15, 2016, 6:10am

It is just a CubieBoard2, so not generous on RAM.

Marty56 · July 15, 2016, 12:04pm

Then you are just lucky.

steve1 · July 15, 2016, 1:49pm

It seems like the binding is relatively reliable with a single Sonos device. I have 3 Play:1 devices and the binding will typically not run more than 24 hours before it crashes the server.