Web Socket Error on AmazonEchoControl binding every 65 seconds

J-N-K · June 14, 2020, 4:46pm

That is the time when every connect failed. Something must have changed, and the change is clearly not on the openHAB side. But if we can find out what changed, maybe we can adapt to it.

Flip · June 14, 2020, 4:48pm

Maybe I’m wrong, but we are all in the CEST time zone… So I guess most of us are using the German API URL:

https://alexa.amazon.de/api

right?

andyzle · June 14, 2020, 4:49pm

I’m referring to the web socket error; the OOM is only a severe side effect.

omatzyo · June 14, 2020, 4:53pm

I’m EST in the US

Flip · June 14, 2020, 4:56pm

So you are using alexa.amazon.com, right?

omatzyo · June 14, 2020, 4:56pm

Right

ggg · June 14, 2020, 5:06pm

But we need two changes then:

- Protect against the memory leak triggered by this or any similar scenario
- Identify specifically what is wrong with this Amazon integration and try to restore the functionality
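For the first point, one generic defensive pattern is to keep at most one connection attempt alive at a time and cancel the previous one before starting the next, so a connect that never completes cannot pile up on every retry. This is a minimal sketch under assumptions: `SingleAttemptSlot` is a hypothetical class (not part of the binding), and a `CompletableFuture` stands in for whatever the real connect returns. Cancelling a future does not by itself release a socket, so real code would also have to close the previous session.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper for illustration; not part of the amazonechocontrol binding.
class SingleAttemptSlot<T> {

    private final AtomicReference<CompletableFuture<T>> current = new AtomicReference<>();

    /**
     * Registers a new connection attempt and cancels the previous one, so at most one
     * attempt (and whatever it references) stays reachable through this slot.
     */
    public CompletableFuture<T> replace(CompletableFuture<T> next) {
        CompletableFuture<T> previous = current.getAndSet(next);
        if (previous != null) {
            // No-op if the previous attempt already completed; an open session
            // would still need to be closed separately.
            previous.cancel(true);
        }
        return next;
    }

    public static void main(String[] args) {
        SingleAttemptSlot<String> slot = new SingleAttemptSlot<>();
        CompletableFuture<String> first = slot.replace(new CompletableFuture<>());
        CompletableFuture<String> second = slot.replace(new CompletableFuture<>());
        System.out.println("first cancelled: " + first.isCancelled());
        System.out.println("second pending: " + !second.isDone());
    }
}
```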

Trinitus01 · June 14, 2020, 5:10pm

@J-N-K After “webSocketClient.connect” is called in the “WebSocketConnection” constructor, the handler should call “onWebSocketConnect”, but this never happens. So I think it gets stuck trying to establish the web socket connection. “initPongTimeoutTimer” is also called in the constructor and tries to close the connection after 60 seconds. When the connection was established, everything worked fine and “initPongTimeoutTimer” was able to close it. Cancelling the “Future” was certainly necessary to fix the memory issue, but it seems the memory issue was only a very bad side effect.
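The failure mode described here, a connect that neither succeeds nor reports an error, can be bounded with a per-attempt timeout that cancels the pending future. A minimal sketch, assuming a `CompletableFuture` stands in for the `Future` returned by Jetty's `WebSocketClient.connect`; it uses a `ScheduledExecutorService` instead of the `java.util.Timer` the binding's stack traces show, and `ConnectWithTimeout` / `connectWithTimeout` are hypothetical names.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of the amazonechocontrol binding.
class ConnectWithTimeout {

    /**
     * Starts a connection attempt and makes sure it does not stay pending forever:
     * if it has not completed after timeoutMs, the future is cancelled and onTimeout runs.
     */
    static <T> CompletableFuture<T> connectWithTimeout(Supplier<CompletableFuture<T>> connect,
            ScheduledExecutorService scheduler, long timeoutMs, Runnable onTimeout) {
        CompletableFuture<T> attempt = connect.get();
        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            // cancel() returns false if the attempt already completed,
            // so a working connection is left alone.
            if (attempt.cancel(true)) {
                onTimeout.run(); // e.g. close the client and log the failed connect
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        // Once the attempt is done (success, error or cancellation), drop the pending timer.
        attempt.whenComplete((result, error) -> timer.cancel(false));
        return attempt;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Simulates the reported behaviour: a connect that never succeeds and never fails.
        CompletableFuture<String> hanging = connectWithTimeout(CompletableFuture::new, scheduler, 100,
                () -> System.out.println("connect timed out, attempt cancelled"));
        try {
            hanging.join();
        } catch (CancellationException e) {
            System.out.println("attempt cancelled: " + hanging.isCancelled());
        }
        scheduler.shutdown();
    }
}
```

The timeout alone does not explain why the connect hangs; it only guarantees that each stuck attempt is released instead of accumulating.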

Wikibear · June 14, 2020, 5:14pm

Checked now when it starts:
2020-06-12 18:32:29.510 local German time, i.e. UTC+2.
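To line this up with reports from other time zones, the timestamp can be converted with `java.time`. Assuming the log time is `Europe/Berlin` (CEST, UTC+2 in June), 18:32:29.510 corresponds to 16:32:29.510 UTC, or 12:32:29.510 in New York (EDT). `LogTimeToUtc` is a hypothetical helper, and the pattern assumes the log timestamp format shown in this thread.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; the pattern assumes openHAB's log timestamp format.
class LogTimeToUtc {

    private static final DateTimeFormatter LOG_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Reads a log timestamp as wall-clock time in `from` and expresses the same instant in `to`. */
    static ZonedDateTime convert(String logTimestamp, ZoneId from, ZoneId to) {
        LocalDateTime local = LocalDateTime.parse(logTimestamp, LOG_FORMAT);
        return local.atZone(from).withZoneSameInstant(to);
    }

    public static void main(String[] args) {
        String firstFailure = "2020-06-12 18:32:29.510";
        ZoneId berlin = ZoneId.of("Europe/Berlin");
        System.out.println(convert(firstFailure, berlin, ZoneOffset.UTC));
        System.out.println(convert(firstFailure, berlin, ZoneId.of("America/New_York")));
    }
}
```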

tbbear · June 14, 2020, 5:24pm

Doesn’t fix it.

J-N-K · June 14, 2020, 5:27pm

Exactly. This was clearly a programming error. But as you said, the root cause is that the connect fails. This is a bit surprising, because in case of a failure onWebSocketError should have been called, and it isn’t called either.

So for debugging we probably have to check whether the Amazon servers don’t respond at all, or whether the response processing fails (before it reaches our own code). Even though encrypted connections don’t produce readable content, tcpdump or Wireshark might give some insight into that.
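Part of the response processing that happens inside the client library, before any binding code runs, is validating the server's handshake response. The sketch below implements a subset of the client-side checks from RFC 6455, section 4.1 (status 101, `Upgrade: websocket`, and the `Sec-WebSocket-Accept` value derived from the key that was sent). It is not Jetty's implementation: `HandshakeCheck` is a hypothetical class, and header names are assumed to be lower-cased by the caller.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; a subset of RFC 6455 section 4.1 client checks, not Jetty's code.
class HandshakeCheck {

    // Fixed GUID from RFC 6455, section 1.3.
    private static final String WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** The Sec-WebSocket-Accept value a server must return for the key the client sent. */
    static String expectedAccept(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest((secWebSocketKey + WS_GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 must be available on every Java platform", e);
        }
    }

    /** Returns null if the response completes the upgrade, otherwise why it was rejected. Header names must be lower-case. */
    static String rejectionReason(int status, Map<String, String> headers, String sentKey) {
        if (status != 101) {
            return "expected status 101, got " + status;
        }
        String upgrade = headers.get("upgrade");
        if (upgrade == null || !upgrade.equalsIgnoreCase("websocket")) {
            return "missing or wrong Upgrade header";
        }
        if (!expectedAccept(sentKey).equals(headers.get("sec-websocket-accept"))) {
            return "Sec-WebSocket-Accept does not match the key that was sent";
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample key and accept value from RFC 6455, section 1.3.
        String key = "dGhlIHNhbXBsZSBub25jZQ==";
        Map<String, String> ok = new HashMap<>();
        ok.put("upgrade", "websocket");
        ok.put("sec-websocket-accept", "s3pPLMBiTxaQ9kYGzzK+BZW6MmQ=");
        System.out.println(rejectionReason(101, ok, key)); // null
        System.out.println(rejectionReason(403, new HashMap<>(), key)); // expected status 101, got 403
    }
}
```

A packet capture would show whether such a response arrives at all; if it does and is rejected by checks like these, the client would normally report an error rather than hang.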

J-N-K · June 14, 2020, 5:27pm

Doesn’t fix what? Do you still see the OOM?

ggg · June 14, 2020, 5:29pm

I have seen sporadic instances of that error every day of the month (more or less, except June 8th), and even earlier in May… but then much more continuously from June 12th until I uninstalled the binding.

Wikibear · June 14, 2020, 5:30pm

Maybe it’s time to ask Amazon what they changed.

By the way… there is a user agent setting in the code. Maybe this is the cause of the error, because Amazon will not accept the user agent.

Trinitus01 · June 14, 2020, 5:33pm

The User-Agent header is not set for the webSocketConnection.
If I check this in Firefox, there is definitely a User-Agent header.
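For comparison, this is roughly what a WebSocket opening handshake with an explicit `User-Agent` looks like on the wire, following RFC 6455, section 4.1. It is only a sketch: the host, path and user-agent string are placeholders, `UpgradeRequestSketch` is a hypothetical class, and with Jetty the header would instead be set on the `ClientUpgradeRequest` passed to `WebSocketClient.connect`. Whether Amazon actually rejects upgrades without this header is the open question here, not something this code establishes.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper for illustration; host, path and user agent below are placeholders.
class UpgradeRequestSketch {

    /** Builds the HTTP/1.1 opening handshake from RFC 6455, section 4.1, plus a User-Agent header. */
    static String build(String host, String path, String userAgent, String secWebSocketKey) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Upgrade: websocket\r\n"
                + "Connection: Upgrade\r\n"
                + "Sec-WebSocket-Key: " + secWebSocketKey + "\r\n"
                + "Sec-WebSocket-Version: 13\r\n"
                + "User-Agent: " + userAgent + "\r\n"
                + "\r\n"; // blank line ends the header block
    }

    /** A fresh key: 16 random bytes, base64-encoded (RFC 6455, section 4.1). */
    static String newKey() {
        byte[] nonce = new byte[16];
        new SecureRandom().nextBytes(nonce);
        return Base64.getEncoder().encodeToString(nonce);
    }

    public static void main(String[] args) {
        // Placeholder values, not the ones the binding or Amazon use.
        System.out.print(build("example.invalid", "/ws", "Mozilla/5.0 (placeholder)", newKey()));
    }
}
```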

tbbear · June 14, 2020, 6:00pm

Yes, I still get the error every 65 seconds.

tbbear · June 14, 2020, 6:02pm

2020-06-14 20:00:24.732 [INFO ] [nternal.WebSocketConnection$Listener] - Web Socket error
java.nio.channels.AsynchronousCloseException: null
at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.close(HttpConnectionOverHTTP.java:181) ~[?:?]
at java.util.ArrayList.forEach(ArrayList.java:1257) [?:1.8.0_252]
at org.eclipse.jetty.client.AbstractConnectionPool.close(AbstractConnectionPool.java:208) [bundleFile:9.4.20.v20190813]
at org.eclipse.jetty.client.DuplexConnectionPool.close(DuplexConnectionPool.java:237) [bundleFile:9.4.20.v20190813]
at org.eclipse.jetty.client.HttpDestination.close(HttpDestination.java:385) [bundleFile:9.4.20.v20190813]
at org.eclipse.jetty.client.HttpClient.doStop(HttpClient.java:260) [bundleFile:9.4.20.v20190813]
at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:93) [bundleFile:9.4.20.v20190813]
at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:180) [bundleFile:9.4.20.v20190813]
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:201) [bundleFile:9.4.20.v20190813]
at org.eclipse.jetty.websocket.client.WebSocketClient.doStop(WebSocketClient.java:371) [bundleFile:9.4.20.v20190813]
at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:93) [bundleFile:9.4.20.v20190813]
at org.openhab.binding.amazonechocontrol.internal.WebSocketConnection.close(WebSocketConnection.java:171) [bundleFile:?]
at org.openhab.binding.amazonechocontrol.internal.WebSocketConnection$2.run(WebSocketConnection.java:200) [bundleFile:?]
at java.util.TimerThread.mainLoop(Timer.java:555) [?:1.8.0_252]
at java.util.TimerThread.run(Timer.java:505) [?:1.8.0_252]

J-N-K · June 14, 2020, 6:10pm

This is not an out-of-memory error but an error message caused by a failed connect. If openHAB crashes because it runs out of memory, that is a severe problem.

ggg · June 14, 2020, 6:14pm

openHAB was crashing due to an OOM error. I confirmed it did so originally (presumably no longer with the patch?). Everything started failing after a certain point and needed a restart; some people even had to restore from a backup.
If this fix avoids the OOM while still showing the web socket failure, that would be a good improvement.

 2020-06-12 16:43:28.276 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
    java.lang.OutOfMemoryError: Java heap space
 ...
 ...
    at org.openhab.binding.amazonechocontrol.internal.WebSocketConnection.<init>(WebSocketConnection.java:102) ~[?:?]

Edited, as I wasn’t sure whether you were referring to the situation after the patch or the original one.

Martin_Zobel-Helas · June 14, 2020, 6:16pm

I installed the new version you provided. So far no more OOMs, but I am still looking carefully at my OH installation.