AndrewFG
(Andrew Fiddian-Green)
October 3, 2022, 10:52am
I was recently contacted by a developer at Gardena / Husqvarna concerning the call rate issue on their server. So for the benefit of other users and developers, I am posting his question and my reply here. If you have further thoughts please let me know, and I will forward them as an annexe to my response below…
Gardena question …
We have encountered a problem with the openHAB binding with our GARDENA smart system API. I also stumbled upon a recent request from you about this binding, so I’m contacting you directly.
Some openHAB users have complained that they regularly encounter an error 403 in the simultaneous login context (with an error code WAF_FILTERED, which has been resolved in the meantime). Furthermore, our analysis has shown that the openHAB binding at least partially tries to authenticate the same user against our Authentication API several times per second (up to two times in 50ms).
Can you help us to understand why exactly this happens, or do you see a way to avoid this enormous traffic on the openHAB binding side?
My reply …
I think that essentially the root of the problem is that your servers impose call rate limits, both in a short time window (maximum calls per minute), and in a longer time window (maximum total number of calls in 24 hours).
The problem arises if (for whatever reason) the OH binding exceeds the call rate limit. At that point your server refuses the connection attempt. The OH binding treats the refusal as an error, so it tries again, the server refuses again, and so on ad infinitum in a mutual escalation. So ironically, when you strictly impose call rate limiting, you actually cause more calls than there would be if you did not impose such limiting.
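For anyone wondering how a client can break out of that refuse / retry loop, here is a minimal sketch of exponential backoff in Java. It is not taken from the Gardena binding; the class name `RetryBackoff` and the base and cap values are my own assumptions for illustration. Each consecutive refusal doubles the wait before the next attempt, up to a cap, and a successful call resets it.

```java
import java.time.Duration;

/** Hypothetical helper for illustration; not part of the openHAB Gardena binding. */
final class RetryBackoff {
    private final Duration base;
    private final Duration cap;
    private int failures = 0;

    RetryBackoff(Duration base, Duration cap) {
        this.base = base;
        this.cap = cap;
    }

    /** Call after a refused or failed request; returns how long to wait before retrying. */
    Duration nextDelay() {
        // Double the delay for each consecutive failure; bound the shift to avoid overflow.
        int shift = Math.min(failures, 20);
        long millis = Math.min(cap.toMillis(), base.toMillis() << shift);
        failures++;
        return Duration.ofMillis(millis);
    }

    /** Call after a successful request so the next failure starts from the base delay. */
    void reset() {
        failures = 0;
    }
}
```

With an assumed base of 1 second and a cap of 60 seconds, the waits would be 1 s, 2 s, 4 s and so on up to 60 s, instead of several attempts per second. In practice one would probably also add some random jitter so that many clients do not retry in lockstep.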
When the OH binding starts up, it initially makes a ‘flurry’ of calls: it establishes its credentials, connects the SSE socket (or is it a WebSocket? I don’t recall), polls the server to get the initial state of all objects, and issues any pending state update commands. Once this initial flurry has completed, the binding settles down to just receiving state updates at a slow rate from the server via the SSE/WebSocket call-back, plus the occasional command issued by the user or by an OH internal automation. I can imagine that in some systems the initial flurry of calls may already exceed your short term rate limits, thus immediately triggering the mutual escalation between server and client.
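One possible way to keep such a start-up flurry under a per-minute limit is to space the calls out on the client side. The following is only a sketch under my own assumptions: `CallSpacer` is a hypothetical name, the 500 ms interval is made up, and the clock is injected so the logic can be checked without sleeping.

```java
import java.util.function.LongSupplier;

/** Hypothetical client-side throttle for illustration; not the binding's actual code. */
final class CallSpacer {
    private final long minIntervalMillis;
    private final LongSupplier clock; // supplies the current time in milliseconds
    private long nextAllowedMillis = Long.MIN_VALUE;

    CallSpacer(long minIntervalMillis, LongSupplier clock) {
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
    }

    /** Reserves the next call slot and returns how many ms the caller should wait first. */
    synchronized long reserve() {
        long now = clock.getAsLong();
        // The slot is either right now, or the earliest time that respects the spacing.
        long slot = Math.max(now, nextAllowedMillis);
        nextAllowedMillis = slot + minIntervalMillis;
        return slot - now;
    }
}
```

A caller would do something like `Thread.sleep(spacer.reserve())` before each HTTP request, with the spacer created as `new CallSpacer(500, System::currentTimeMillis)`. Calls made in a burst then go out 500 ms apart, while a call made after a quiet period goes out immediately.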
Furthermore there are a number of issues that exacerbate the above problem…
In recent months, you switched from a password based authentication model to a certificate based model. This means that any OH users who have not upgraded their systems will still be attempting to connect via the no longer supported password model, thus triggering more server / client escalations.
During the summer there was a period when your server was even more aggressive than usual about its rate limits, which triggered more server / client escalations. I think you have since rolled back those changes, so perhaps that is now less of a problem?
Concerning the SSE/WebSocket connection: your authentication process grants tokens and certificates valid for 7 days and 1 day respectively, but in fact your server force disconnects the link after two hours. This causes the OH client to reconnect, re-authenticate, and restart the initialization flurry.
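On the last point, a client can at least avoid a fresh login on every forced disconnect by reusing a credential that is still within its validity period. The sketch below only illustrates the idea and is not the binding's actual code: `CachedToken`, the lifetime passed in, and the safety margin are all assumptions of mine.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical holder for an access token and its expiry; for illustration only. */
final class CachedToken {
    private final String value;
    private final Instant expiresAt;

    CachedToken(String value, Instant issuedAt, Duration lifetime) {
        this.value = value;
        this.expiresAt = issuedAt.plus(lifetime);
    }

    String value() {
        return value;
    }

    /** True if the token stays valid for at least the safety margin beyond 'now'. */
    boolean isUsableAt(Instant now, Duration safetyMargin) {
        return now.plus(safetyMargin).isBefore(expiresAt);
    }
}
```

On a disconnect, the reconnect logic would check `isUsableAt(Instant.now(), margin)` and only call the authentication API if that returns false. With a lifetime of one day, a disconnect after two hours would then cost a reconnect but no new login.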
Having said this, in August I made the following changes in the OH binding code to try to ameliorate the above issues…
• https://github.com/openhab/openhab-addons/pull/13016
• https://github.com/openhab/openhab-addons/pull/13253 ([gardena] Improve thread synchronization)
The OH code release cycle comprises three build/stability levels (daily snapshot, monthly milestone, and yearly stable release). The above mentioned changes were incorporated in release ‘v3.4.0 Milestone 2’ (see the releases page of openhab/openhab-distro on GitHub), which was released 22 days ago. So any OH users who update their systems on the monthly milestone cycle will now have those changes. But any OH users who wait for the v3.4.0 final release will only be updated in December (target date) this year, and OH users who stay on older versions will not get those changes at all.
So my recommendations to you are as follows…
Consider relaxing your call rate limits. Ironically I think that relaxing the limits will actually result in fewer calls on the server.
If users contact you directly concerning the above issue, please advise them to update their OH system to v3.4.0-M2 or later.
If the above does not solve the problem, then please advise users to open a formal issue in the openhab/openhab-addons issue tracker on GitHub.
… or, for a more informal discussion, to post on the OH community forum at https://community.openhab.org/ (search for ‘Gardena’).
Do not force disconnect the SSE/WebSocket connection after 2 hours, but rather allow the connection to remain open until the token or certificate actually expires.