Too much time before a Sonos thing becomes definitively ONLINE

Could you rename the topic into “Too much time before a Sonos thing becomes definitively ONLINE” ?

Issue created, in ESH as I don’t know exactly where is the problem except that it was reproduced for Sonos.

1 Like

Is it possible to downgrade the installed 2.5.1 jupnp library in current 2.5.0 snapshot?
If so, I would try to install the 2.4.0 version.

Thanks to @Lolodomo I got the information that the jUPnP bundle is somehow broken - see ESH issue above.
So I was able to implement a hopefully temporary (:wink:) workaround:
I implemented a rule which always restarts the jUPnP bundle when the sonos Thing goes offline.

This is working because the first time after a power on the sonos thing is always recognized immediately. The problem starts with the next power off/on. You then can either restart openhab or just restart the jUPnP bundle - that fixes the behaviour.

General question: How is the status regarding a possible jUPnP fix, does anybody know?

Hello Thomas,

I would be interessted in some more details about your workaround - is there a chance you share it here? I guess an offical fix will definetly take some time and I would like to see if this could help me too.

I’m in the same situation like you, since I updated to OH2.4 my Sonos speakers can no longer be controlled in a stable way via Basic UI.

My log is flooded ramdonly with below errors for any of my speakers:

[hingStatusInfoChangedEvent] - ‘sonos:PLAY3:XYZ’ changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): The Sonos player RINCON_XBE932481F6601400 is not available in local network.

[hingStatusInfoChangedEvent] - ‘sonos:PLAY3:XYZ’ changed from OFFLINE (COMMUNICATION_ERROR): The Sonos player RINCON_XBE932481F6601400 is not available in local network. to ONLINE

It looks even like the error message is partially upside down with that “to ONLINE” statement at the end of the string.

Thanks & Regards,
Christoph

Sure, I will do so.
I will post it tomorrow, now it’s time for bed :blush:

My rule to restart the jUPnP bundle:

rule "jupnp bundle restart"
when
  Thing "sonos:CONNECTAMP:amp" changed from ONLINE to OFFLINE
then
executeCommandLine("sudo -u openhab ssh -p 8101 -i /home/openhabian/rsh/openhab.id_rsa openhab@localhost bundle:restart org.jupnp",6000)
end

I am using openhabian on an raspberry pi 3.
That the rule is able to call the karaf call without entering a password you have to generate a ssh key.
I used this tutorial

I created a directory called rsh:
mkdir /home/openhabian/rsh
enter the directory and generate ssh key:
sudo -u openhab ssh-keygen -t rsa -f openhab.id_rsa
Add content of the public key file to /srv/openhab2-userdata/etc/keys.properties
openhab=<yourkey>,_g_:admingroup

Test it:
sudo -u openhab ssh -p 8101 -i /home/openhabian/rsh/openhab.id_rsa openhab@localhost

Additionally to the tutorial I had to do the following, that I was able to restart the bundle without entering a password:
First reset root password:
sudo passwd root
Next, add openhab to sudoers file:
su -
visudo
Finally add openhab user to sudoers file:
add: openhab ALL=(ALL) ALL

Then I was able to restart the bundle from the rule

1 Like

@Lolodomo: Do you have any news about a possible fix of the UPnP library ?

I’m having a similar but slightly different issue. My sonos speakers come on relatively quickly. After a few days they all go offline and stay offline. A reload of org.jupnp as mentioned above resolves the issue for a few days. I have a rule now that monitors the thing state as mentioned above and reloads it when the crash happens. I noticed this happened when I moved from 2.5.0M1 to one of the recent nightly builds.

Best, Jay

I’ve updated the bug with the following info:

I decided to turn on jupnp debugging for a few minutes while my Sonos PlayBar was in a bad state. Every second I get this message set:

2020-02-02 13:40:34.657 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control
2020-02-02 13:40:34.657 [DEBUG] [org.jupnp.transport.spi.StreamClient] - Will not attempt request because it failed in the last 600 seconds: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control

Then, every 600 seconds I get:

2020-02-02 13:44:11.804 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control
2020-02-02 13:44:21.804 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control

2020-02-02 13:54:22.222 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control
2020-02-02 13:54:32.222 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control

A CURL (with POST) to the URL gives me:
<s:Envelope xmlns:s=“http://schemas.xmlsoap.org/soap/envelope/” s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><s:Body><s:Fault>s:ClientUPnPError401</s:Fault></s:Body></s:Envelope>
NOTE: I get this from ALL Sonos speakers I have. I’m not sure if I’m missing something on the query to get a bad authentication.

I have verified that I have <1ms ping times to the bar (no packet loss) and they are on the same network so the Sonos TTL=1 is not an issue here.

I let this go for about an hour with the same result. Then I restarted OH2. All of my speakers immediately came online and are stable. Given this, I do NOT believe the issue to be the Sonos itself as a “random” restart of OH2 should not clear out the issue on the Sonos.

Going through the debug of the OH2 restart it looks like OH2 effectively registers as an endpoint with the Sonos speaker. My assumption (since I can’t just make one fail at will) is that this registration becomes invalid at some point. Assuming that this theory is correct, at this point I would suggest that the Sonos binding be modified to detect this issue and “self heal” by restarting the registration process with that specific speaker when the HTTP request to the control URL fails to reply.

For anyone who is having this issue, please try to move to 3.1.0-S2130 or later. Work has been done to resolve thread exhaustion issues which should be good after there.

Hi @morph166955

Just tested latest sonos binding version (I use build 2142).
I wanted to test the behaviour of my first post of this thread:
I switched of the power of my sonos device (unplugged device) and switched it on again. I still get the messages:

2021-01-14 16:51:46.130 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'sonos:CONNECTAMP:celler' changed from OFFLINE (COMMUNICATION_ERROR): Der Sonos Speaker RINCON_000 ist nicht im Netzwerk verfügbar. to OFFLINE (COMMUNICATION_ERROR)
2021-01-14 16:51:46.141 [INFO ] [openhab.event.InboxRemovedEvent     ] - Discovery Result with UID 'sonos:CONNECTAMP:RINCON_000' has been removed.
2021-01-14 16:51:49.715 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'sonos:CONNECTAMP:celler' changed from OFFLINE (COMMUNICATION_ERROR) to ONLINE
2021-01-14 16:51:49.722 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'sonos:CONNECTAMP:celler' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Der Sonos Speaker RINCON_000 ist nicht im Netzwerk verfügbar.

After about 10 minutes the device is recognized:

2021-01-14 17:00:42.673 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing ‘sonos:CONNECTAMP:celler’ changed from OFFLINE (COMMUNICATION_ERROR): Der Sonos Speaker RINCON_000 ist nicht im Netzwerk verfügbar. to ONLINE

Do you think the rediscovery of the ONLINE status can be improved, to become quicker?

Thanks,
Thomas

I submitted a PR for JuPnP yesterday to make this setting configurable. It should seriously reduce the timer.

1 Like

Do you have a link to the PR?

Thanks a lot!
Can you give an example how and where to set the configuration via config file?

in services/runtime.cfg:

org.jupnp:retryAfterSeconds=10

For those interested in testing, this is a very experimental image (although it seems to be stable on my system) that may resolve this issue. In fact, it may significantly improve performance of upnp devices.
This version of jupnp has some extra code on the threads that let them work without a pool. There is a chance that this can cause instant spikes in memory or cpu, but they should be short lived.

In addition to retryAfterSeconds, add:

org.jupnp:asyncThreadPool=false
org.jupnp:mainThreadPool=false

Please make the config addition with OH stopped, it will be highly disruptive if it’s active.

Please report success or failure so we can determine if this should get merged. Thanks!

To test the jar, do I have to copy it to the addon dir, like I do for bindings?

No. You need to login to the karaf console and do bundle:update.

1 Like