openHAB crashing with out of memory exception (Sonos, 1.7.1)

HI
As you menioned the sonos binding, at least it caused a memory leak with the upnp library (see wiki). I’ve also found this thread about the problem: http://knx-user-forum.de/forum/supportforen/openhab/37306-upnp-verursacht-outofmemory-exception (German). There is a way to check the problem (reply #7).

As all information I’ve found about it where quite old (not after Jan. 2015) I’m not sure whether or not this is still a problem. I hope not, as I’m also using the binding but just for a few days now…

Regards
Dieter

Thanks @DieterL, that was an interesting discussion. At this point I’m becoming more convinced that the problem is with the Sonos binding. This is what I saw in the log immediately before the OutOfMemoryError:

2015-11-15 23:02:33.072 [ERROR] [c.protocol.sync.ReceivingEvent] - Invalid subscription ID, no active subscription: (IncomingEventRequestMessage) NOTIFY /dev/RINCON_B8E937E4C19201400/svc/upnp-org/ZoneGroupTopology/event/cb.xml SEQUENCE: 0
2015-11-15 23:31:42.319 [ERROR] [c.protocol.sync.ReceivingEvent] - Invalid subscription ID, no active subscription: (IncomingEventRequestMessage) NOTIFY /dev/RINCON_B8E937E4C19201400/svc/upnp-org/ZoneGroupTopology/event/cb.xml SEQUENCE: 0
2015-11-15 23:31:42.486 [ERROR] [c.protocol.sync.ReceivingEvent] - Invalid subscription ID, no active subscription: (IncomingEventRequestMessage) NOTIFY /dev/RINCON_B8E937E4C19201400/svc/upnp-org/ZoneGroupTopology/event/cb.xml SEQUENCE: 0
2015-11-15 23:35:51.386 [WARN ] [efaultUpnpServiceConfiguration] - Thread terminated org.teleal.cling.registry.RegistryMaintainer@15a5352 abruptly with exception: java.lang.OutOfMemoryError: unable to create new native thread
2015-11-15 23:35:51.406 [WARN ] [efaultUpnpServiceConfiguration] - Root cause: java.lang.OutOfMemoryError: unable to create new native thread

The virtual memory spiked to 1.6 GB and dropped immediately, although not back to the level it was before the spike. None of the Java JMX heap nor nonheap memory usage metrics showed a significant increase at the time.

Real memory usage also spiked. Surprising, openHAB didn’t crash and has continued running.

Given the warnings I see around each memory incident about Upnp being unable to create native threads, I think the following chart of the number of active threads is interesting. Normally there are about 100 threads running, but that spiked to about 3000 threads right at the time of the OutOfMemoryError and then quickly dropped again to normal.

The 3000 threads could easily account for the spike in the virtual memory usage. I’m going to continue to monitor the situation, but I’m increasingly thinking that the Sonos binding is the culprit.

I just had a thought given the other Sonos discussion. They mentioned the problem happened when the Sonos is offline. I replaced a secondary wifi AP over the weekend. I’m wondering if the Sonos went offline during that switch. The timing seems about right and we hadn’t used the Sonos for the last 48 hours. I’ve reconnected the Sonos and I’ll see if this memory/thread problem happens again.

HI Steve
I hope that solved the problem, keep me up to date. I’m not really interested in running in such a problem myself :wink:

Regards
Dieter

I’m about 98% sure that was the problem. With the exception of a small spike earlier today, the virtual memory usage is flat for the last 24+ hours. I suspect the Sonos as the cause of the small spike because the thread count also spiked, but nothing like when the Sonos wasn’t reachable.

Well, my server crashed again this morning. After doing more searching, it appears this is a known and relatively common problem with the Sonos binding and the old version of Cling library that it uses. I’m going to start investigating how to fix this, starting with upgrading the upnp library to the latest version. In the meantime, I may write some rules to stop the bundle during the night when we aren’t using the Sonos. In the worse case, I’ll just remove the Sonos binding until I or someone can find a fix.

As an aside, it’s slightly impressive that a RPi 2 Java process can be running 5000+ active threads before it dies. :slight_smile:

Here are some links to related discussions. I was surprised that I could find no open issues related to this problem. If someone knows an existing issue number, please let me know. Otherwise, I create one in the next day or two.

Potential Memory leak? (Sonos?)
Redirecting to Google Groups

Hi @steve1,

great to hear you are working on this issue. I have the same problem here and removed the binding.

I personally unplug some of my Sonos speakers, when I am away to avoid unnecessary standby power consumption. It would be great, if the binding could handle offline speakers. :smile:
If I can help testing, please tell me.

Best regards
Ollie

Hi @steve1

That’s not what I wanted to hear from you :frowning:
Until now, I didn’t have this problem but after your posting, I’m waiting for it…

For now, I’m interested how do you manage to deactivate the binding based on rules? Is it a real deactivation of the binding or just a deactivation of the rules? I’m also thinking about unplug the power of the sonos when I’m away but didn’t do it because of this problem. If that would be possible I also have other bindings to deactivate (onkyo av), that writes every 3 seconds a log message when the receiver has no power. Doesn’t cause any real problem, but…

Regards
Dieter

@oliver_kuhl,

I probably will need some help testing.

I have found the cause of the symptoms I’m seeing. The Cling upnp library uses a thread pool with unbounded size. Based on information I’ve seen in their forums, it was intentionally designed to be able to create threads until the machine or process runs out of resources. The rationale for this decision isn’t clear but it’s obviously dangerous in an openHAB context. It is possible for the binding code to provide it’s own bounded threadpool. That would fix the current symptom but it doesn’t fix the root cause of why so many upnp tasks are being created in the first pace. We’d also need to decide what to do when the threadpool has hit it’s maximum capacity.

My server crashed again last night due to this problem. I’m going to add the -XX:-HeapDumpOnOutOfMemoryError and related arguments to my openHAB JVM startup script. Hopefully, this will do a memory dump when the error occurs. If so, I’ll be able to use the Eclipse Memory Analyzer to look at the thread stack traces and see which code is generating all the upnp requests. If we can identify the cause, then we’ll have a better idea if a Cling upgrade will fix the problem or not.

Hi,

I like the general idea of using upnp for an improved usability. However, the binding must work flawlessly with Sonos speakers, that are temporarily not available. If the thread issue cannot be solved right now, I would prefer a working binding without upnp, but a static IP configuration.

I personally may not have enough knowledge about threads and especially not upnp, so I am not a big help regarding coding unfortunately…

If you have a first version ready for testing, pls send me a jar or link to your git-repo and I’ll compile it myself.

AFAICT from the Sonos forums, upnp is the only choice for interacting with the Sonos devices. For now, I’m hoping that once we identify the source of excessive threadpool usage we’ll be able to fix the Cling library behavior.

You may be right. I just had the simple php-script here in mind, which works without upnp, but seems to be limited: Sonos PHP Script - HomeMatic-Forum / FHZ-Forum

So let’s try to fix upnp. :wink:

@steve1, maybe this here could be an alternative to cling?
http://bitletorg.github.io/weupnp/

In openHAB 2.0 / ESH, the SONOS binding uses jUPnP, which is a fork of cling 2.0.
It might be worth to try refactoring the SONOS 1.x binding to use JUPnP as well.

Best
Hans-Jörg

That’s a good idea. I can investigate that possibility.

I’m still waiting for the issue to happen again so I can hopefully get a thread dump and determine the root cause of the excessive thread creation.

I spend endless hours, trying to workaround the Sonos bugs and was never sucessful. Finally I removed the sonos binding and am controlling the sonos speakers via php scripts that are call via http puts.
But if you are Java programmer, you may have more powerful possibilities.

Since then all my memory overflows are gone. Before that I had a crash every 24-48 hours.

@martin_klimke Yes, it is frustrating. I have two more Sonos devices arriving today. I’m a little concerned it will aggravate the memory/threading issue. What types of capabilities do you have with the external scripts? Can you control volume, get the current track, etc? In general, how do the capabilities compare with the binding (assuming there was no memory leak)?

@hmerk I’ve been able to compile the Sonos binding with the forked jupnp. I’ve started doing some testing and it’s updating my Sonos test item, so that’s encouraging.

Hi Steve, that sounds great.
Best
Hans-Jörg

I can control volume, play back sound files and start a radio station.
I cannot get anything back from the sonos.
This is a disadvantageous because I cannot read out the volume and therefore I cannot really restore the state the sonos speaker had before I had applied a command.

this was my post:

So if you have good results with the forked Jupnp, I would be interested in trying it.

Testing is going well. It appears that the binding is working with the openhab jupnp library fork.

There are still quirks that probably are related to the binding implementation itself.

  • Items lose their states and appear to not be restored if an items file is reloaded (seems like relatively common behavior in openHAB bindings).
  • Sonos players aren’t recognized if they are started after openHAB starts.
  • If the Sonos player goes offline while openHAB is running, there are repeated log warning messages (~1/second) from the UPNP layer. This could be an issue for users who want to turn off their Sonos at certain times.
  • There’s an NPE from the SonosZonePlayer if player availability changes while openHAB is running. I haven’t completely identified the pattern of available that triggers the NPE but I’ve seen it several times.

The good news is that the binding seems to work about as well as it did before the jupnp upgrade and now the OutOfMemoryError should be avoided because the UPNP task queue is bounded. I’m going to update my primary openHAB installation and run with the updated implementation for several days to see what happens.

With the older implementation my openHAB crashed again. Unfortunately, the HeapDumpOnOutOfMemoryError did not work (I’ve seen info online that this happens sometimes). However, the hotspot log file shows that there were 4000+ Cling (UNPNP) threads at the time the JVM crashed. I’ve set another tripwire with a rule that’s monitoring the thread count via JMX and will manually dump the heap if the thread count crosses a threshold. This analysis is somewhat low priority since moving to the new library should fix the memory problem anyway.

1 Like