Hi!
I have 3 LIFX light bulbs, and they have recently started responding slowly to openHAB commands (~6 s).
After some tinkering, I deactivated my Network binding, which checks around 100 devices on my network using ping and measures their latency.
Now the LIFX latency is gone and everything is responsive again.
Is there a limit on how many open connections I can have, and can it be increased?
First things first, the Network binding doesn't use HTTP. It uses ping and arping.
Secondly, when the LIFX lights are reacting slowly, is it just those or is everything slow? What's the computer doing at those times? (Run htop to see how much CPU and RAM are being used.)
Finally, openHAB makes a horrible network monitoring tool. That's not what it's made to do. There are times when knowing whether a device is reachable over the network might impact your home automations (e.g. when receiving a command to open a garage door, making sure OH can reach the garage door and, if not, generating an alert so the user knows why the door didn't open). But the use of bindings like Network and SystemInfo should be limited to those use cases. Use a proper network or system monitoring tool to monitor the health and status of your system overall. It will work better and faster, and will give you better information than openHAB ever could. And it'll be less work to set up.
If you just want to see which devices are online on your network, their latency and such, Fing, Zabbix, and Nagios are all pretty good choices.
I did check that, including the number of threads. CPU is low, RAM consumption is low, 290 threads is OK… I also see that HDD I/O is low. So I assume the computer is running fine. Also, I have a mix of other lights such as Shelly; those turn on immediately, only the LIFX are slow.
My concern is that if a binding is used this heavily and relies on an OS command (ping), it might slow down other bindings that also need OS features (maybe LIFX uses ping as well), like some kind of wait. I have no details, but maybe one session (openHAB) can only send one ping at a time? I can reproduce it: Network binding on -> LIFX slow…
I looked into other tools, check-mk as well, but ended up with openHAB, one reason being that I know it best and can analyse it automatically via Grafana… and the other reason being that I can monitor Z-Wave, Zigbee or other "devices" as well, independent of their protocol. I could use the UniFi binding or a little script if the Network binding is not suited… or maybe point out a problem so the developers find something to improve.
It seems reasonable to use openHAB to do some basic network monitoring, at ping level.
The exceptional part of your usage is having 100 devices instead of 3 or 4.
Which might well highlight some blocking behavior in some part of the mechanism. Give us a clue, how often are you pinging these things?
Hmm… I use the defaults: 5 s timeout, one-minute refresh interval, 1 retry. I assume those 100 are sent in parallel, though; otherwise it would be difficult to keep the refresh interval… but I don't see spikes in my threads (at least not in htop).
If I were writing the binding, I'd do them one at a time in sequence and treat the interval as a minimum refresh. But mostly each would be over and done with in a few milliseconds, and no queue would build up.
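To put numbers on the one-at-a-time idea: with the defaults quoted above, a strictly sequential pass only fits into the one-minute interval while very few devices are unreachable. A minimal sketch, assuming "1 retry" means one extra attempt and a guessed 5 ms round trip for reachable hosts (both assumptions, not taken from the binding's code):

```python
def sequential_cycle_seconds(n_devices, n_offline, timeout_s, retries, rtt_s=0.005):
    """Worst-case wall time for one pass if every ping runs one after another.

    Assumption: "1 retry" means one extra attempt, so an offline device
    costs (1 + retries) full timeouts. rtt_s is a guessed LAN round trip.
    """
    attempts = 1 + retries
    offline_cost = n_offline * timeout_s * attempts
    online_cost = (n_devices - n_offline) * rtt_s
    return offline_cost + online_cost


def max_offline_before_overrun(interval_s, timeout_s, retries):
    """How many unreachable devices a strictly sequential pass can absorb
    before it no longer fits into one refresh interval (online pings ignored)."""
    return int(interval_s // (timeout_s * (1 + retries)))


if __name__ == "__main__":
    # Defaults mentioned in the thread: 5 s timeout, 1 retry, 60 s interval.
    print(sequential_cycle_seconds(100, 100, 5.0, 1))  # 1000.0
    print(max_offline_before_overrun(60.0, 5.0, 1))    # 6
```

So a sequential pass over 100 unreachable hosts would take over 16 minutes, and even 7 unreachable hosts would overrun the interval; at this scale, parallel pings or a shorter timeout matter.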
I've no idea how it really works.
Even with one-at-a-time, I would expect them to come in bursts, as they'd all get scheduled for much the same time at initialization. You might see that simply by noting Item change timings in events.log.
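One way to see (and avoid) such a startup burst is to give each Thing an initial delay proportional to its position. This is a hypothetical mitigation sketch; the thread does not say whether the Network binding staggers its checks, and the helper names are illustrative:

```python
def staggered_delays(n_things, interval_s):
    """Initial delays that spread n_things evenly across one refresh interval,
    so their periodic checks do not all fire at the same instant."""
    if n_things <= 0:
        return []
    step = interval_s / n_things
    return [i * step for i in range(n_things)]


def peak_starts_per_second(delays):
    """Largest number of checks whose start falls into the same 1 s bucket."""
    counts = {}
    for d in delays:
        bucket = int(d)
        counts[bucket] = counts.get(bucket, 0) + 1
    return max(counts.values(), default=0)


if __name__ == "__main__":
    burst = [0.0] * 100                      # everything scheduled at startup
    spread = staggered_delays(100, 60.0)     # 0.0, 0.6, 1.2, ... seconds
    print(peak_starts_per_second(burst))     # 100
    print(peak_starts_per_second(spread))    # 2
```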
What kind of router do you have? 100+ devices is a lot for a cheap router to handle and you may be hitting a hardware limitation.
Also, I know you don't want to change the interval, but it is worth doing for fault-finding reasons, just to see the result. If you don't experiment, you don't learn and you won't progress.
Further thought: indirect effects. Updating, what, 200 Items per minute is not a huge load, but it is a very real one. Are you persisting these? That is further workload, this time tying up I/O resources. None of these is an obvious showstopper, but it all adds up.
Sometimes intelligent configuring can make a lot of difference.
Do you really need 100 latency values? (Why!)
Do you need to persist 100 unchanging ONLINE states every minute? Etc.
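To make the persistence point concrete: if the ONLINE states barely change, storing only changes removes almost all of the write load. A minimal sketch of the idea in plain Python (the item names such as `Device0_Online` are hypothetical); in openHAB itself this corresponds to choosing an `everyChange` strategy instead of `everyUpdate` or a per-minute schedule in the persistence configuration:

```python
_MISSING = object()


class ChangeOnlyRecorder:
    """Records a state only when it differs from the last recorded one,
    similar in spirit to an 'everyChange' persistence strategy."""

    def __init__(self):
        self._last = {}
        self.writes = 0

    def update(self, item, state):
        if self._last.get(item, _MISSING) == state:
            return False          # unchanged: skip the write
        self._last[item] = state
        self.writes += 1          # a real setup would hit the database here
        return True


if __name__ == "__main__":
    recorder = ChangeOnlyRecorder()
    for minute in range(60):
        for i in range(100):
            recorder.update(f"Device{i}_Online", "ON")
    print(recorder.writes)  # 100 writes instead of 6000
```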
Thanks for the Input!
I have UniFi switches and UniFi access points (about 30 clients per NanoHD). Routing to the subnet of the LIFX lights is done internally on the Proxmox server, and this server doesn't show any load.
I am all for it. I don't actually "need" those pings, I can live without them, but I want them because I can :) So I'm not really looking for a workaround, but for the source of the problem.
Interesting thought. At peak times I am updating about 500 items per second, which the system handles quite nicely. I also don't think this is the issue because other devices (Shelly lights) are not affected, only the LIFX connection.
Yeah, I'm not there yet. For now it is more like: put everything I can in there and figure out later whether that was helpful. I don't need any latency values, nor do I need 99% of the online statuses. But it was installed to find the problem with unresponsive Shelly devices (before a patch fixed that).
I can do workarounds: I can open MQTT sessions from most devices and work with Last Will messages or so. I could also do it with scripts, or use the UniFi binding… but for now I was interested in the "why".
I will analyse this a bit. Currently the binding is disabled; maybe there is a difference between enabling it with no devices, with 10, and with 100 devices. Maybe only offline devices slow the process down. I will keep you updated when I make progress.
The LIFX LAN protocol is based on UDP, which has no guaranteed delivery. So if there is also a lot of UDP ping traffic on your network, the devices routing the UDP traffic may opt to drop excess UDP packets. This may force the binding to send packets several times before they arrive (causing slowness), or they may not arrive at all (causing unreliability). LIFX lights also have a limit on how many packets per second they can handle: LIFX advises at most 20 messages per second, and older LIFX lights can handle fewer messages per second than newer ones.
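A limit like the 20 messages per second LIFX advises is typically enforced on the sending side with a token bucket. The sketch below is a generic illustration of that technique, not the LIFX binding's actual code; the rate and burst values come from the 20 msg/s figure above, and the injectable clock exists only to make it testable:

```python
import time


class TokenBucket:
    """Classic token-bucket limiter: at most `rate` sends per second on
    average, with bursts of up to `burst` messages."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False    # caller should queue or delay the packet


if __name__ == "__main__":
    fake_now = [0.0]
    bucket = TokenBucket(rate=20, burst=20, clock=lambda: fake_now[0])
    print(sum(bucket.try_acquire() for _ in range(30)))  # 20: the other 10 must wait
    fake_now[0] = 1.0
    print(sum(bucket.try_acquire() for _ in range(30)))  # 20 again after 1 s
```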
I wonder what the timeout period is before a retry is invoked. Seconds, likely?
It retries when there is no ACK after 250 ms, with a maximum of 3 retries. I've tested this with 40+ lights on my network, where it's very common for packets to be dropped. If you wait longer between retries, you get annoyed by slow light response times whenever packets are dropped.
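The retry scheme just described can be sketched as a send/ACK loop. This is a generic illustration under stated assumptions, not the binding's code: `send` and `wait_for_ack` are hypothetical callables standing in for the UDP socket, "max 3 retries" is read as one send plus three resends, and packet losses are treated as independent:

```python
def send_with_retries(send, wait_for_ack, ack_timeout_s=0.25, max_retries=3):
    """Send, wait up to ack_timeout_s for an ACK, resend on silence.

    Returns the attempt number that was acknowledged, or None if every
    attempt timed out. Assumption: "max 3 retries" = 1 send + 3 resends.
    """
    for attempt in range(1, max_retries + 2):
        send()
        if wait_for_ack(ack_timeout_s):
            return attempt
    return None


def worst_case_wait_s(ack_timeout_s=0.25, max_retries=3):
    """Time spent before giving up when no ACK ever arrives."""
    return ack_timeout_s * (1 + max_retries)


def delivery_probability(loss, max_retries=3):
    """Chance at least one attempt is acknowledged, assuming each
    send/ACK round trip is lost independently with probability `loss`."""
    return 1.0 - loss ** (1 + max_retries)


if __name__ == "__main__":
    acks = iter([False, False, True])
    sends = []
    result = send_with_retries(lambda: sends.append("pkt"), lambda t: next(acks))
    print(result, len(sends))            # 3 3
    print(worst_case_wait_s())           # 1.0
    print(delivery_probability(0.5))     # 0.9375
```

Under these assumptions a light that loses every packet is given up on after about one second, and even a 50% loss rate still gets through about 94% of the time.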
Very occasionally I still have a light that doesn't respond properly. I still have to investigate whether that can be improved by slightly increasing the max retries. But if the light was momentarily offline when the lights were switched, increasing the max retries will not solve it.
That's quite a short timeout really, although I'm sure it'll be sensible for most home networks, which are after all the target audience.
I get the feeling this isn't the average home network, though.
I am always impressed by how quickly a relevant developer finds and joins a discussion, so thanks, @wborn, for dropping in!
I had a little contact with the Yeelight protocol; it uses UDP as well, and it sends something like a thousand UDP requests per second… just to make sure one gets through… I guess that is bad practice, but it seems to work. I ran tcpdump on that and didn't count whether "all" of them were received, but the screen filled up quickly on both the sending and the receiving side, so I don't think packets are dropped here.
I would consider my home network to be built from professional hardware by someone with medium knowledge. I prefer IP-based protocols over Zigbee/Z-Wave, so all my outlets, switches and lights use IP, and that grows quickly. Hence the bigger investment in hardware that can support the load, but apart from that it is still just a little home here.
Based on this input, I will probably ssh into the different devices on my network (openHAB itself, Proxmox, switches, access points) and use tcpdump to see where the UDP packets are lost, to confirm or refute that theory.
So, a little update:
I ran tcpdump on the openHAB host to see whether the UDP packets are even created.
There is a constant UDP packet every 3 seconds that polls the state of the light. All good.
Then I activate some Network Things: the 3 s polling pauses for a couple of seconds, then it's back to normal, 3 s polling.
Then I activated some more Network Things… same thing, there is a pause… but then back to normal, 3 s polling.
At around 65 Network Things there is no "back to normal" anymore: the polling interval is between 10 and 60 s…
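To quantify these pauses instead of eyeballing them, the capture can be reduced to the gaps between poll packets. A small sketch, assuming the capture was taken with `tcpdump -tt` (which prints each packet's timestamp as seconds since the epoch in the first field), for example filtered on `udp port 56700`, the LIFX LAN port; the sample lines are made up for illustration:

```python
def poll_timestamps(tcpdump_lines):
    """Extract epoch timestamps from `tcpdump -tt` output lines,
    whose first field is seconds since the epoch."""
    stamps = []
    for line in tcpdump_lines:
        fields = line.split()
        if fields:
            try:
                stamps.append(float(fields[0]))
            except ValueError:
                continue   # skip headers or non-packet lines
    return stamps


def find_gaps(stamps, expected_s=3.0, tolerance=1.5):
    """Return (start, end, gap) for every gap longer than tolerance x expected."""
    gaps = []
    for a, b in zip(stamps, stamps[1:]):
        if b - a > expected_s * tolerance:
            gaps.append((a, b, b - a))
    return gaps


if __name__ == "__main__":
    sample = [  # made-up lines in tcpdump -tt style
        "100.0 IP 10.0.0.5.56700 > 10.0.0.20.56700: UDP, length 36",
        "103.0 IP 10.0.0.5.56700 > 10.0.0.20.56700: UDP, length 36",
        "106.0 IP 10.0.0.5.56700 > 10.0.0.20.56700: UDP, length 36",
        "131.0 IP 10.0.0.5.56700 > 10.0.0.20.56700: UDP, length 36",
    ]
    print(find_gaps(poll_timestamps(sample)))  # [(106.0, 131.0, 25.0)]
```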
If I go back down to 60 or 50 Things, it does not recover from that easily. But at around 10, the 3 s polling comes back.
I varied which Network Things I added to see whether "certain" Things are the problem, but I suspect it is the number, not the Things.
So I monitored openHAB more closely. The UDP packets are not created, but why?
pidstat -d 10 shows that my NVMe SSD is not doing much: about 100 kB/s read/write with 0% I/O delay.
htop shows 3% CPU with 1 of 2 GB memory used, and 51 tasks, 255 threads, with or without the Network Things; no problem here.
I am running out of ideas. Maybe I have to switch to a different concept.
It appears you're using Linux. I know you can run into things like port exhaustion on Windows; I'm wondering whether the same thing is possible on Linux. The only issue is that it would be per remote host rather than across all remote hosts. Maybe some other limit on the network side? UDP isn't a "connection", so it wouldn't be anything related to that. It could be in the networking gear, though, if you have firewalls and/or NAT segregating things.
Thread management, blocking methods, or a combination of the two could explain it as well, as you hint. You could reach a point where things can't be "processed" fast enough: before one cycle is finished, the next one has been added, and things start to snowball. This would also explain why things don't immediately clear up even when you lower the number.
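The backlog idea can be checked with a small queueing model. This is a deliberately simplified sketch: it assumes, without confirmation from the thread, that the Network pings and the LIFX poll share one fixed-size FIFO worker pool and that an offline ping blocks its worker for the full timeout on every attempt. The pool size of 5 is a made-up example value:

```python
import heapq


def worst_poll_wait(pool_size, ping_durations, ping_interval_s,
                    poll_period_s, horizon_s):
    """Worst queueing delay (s) seen by a cheap periodic poll when it shares a
    FIFO pool of `pool_size` workers with blocking ping jobs.

    Simplified model (assumption): every ping job is queued at the start of
    each ping interval, the poll is queued every poll_period_s, and jobs run
    in submission order on the first free worker.
    """
    jobs = []   # (submit_time, kind, seq, duration); kind 0 = ping, 1 = poll
    seq = 0
    t = 0.0
    while t < horizon_s:
        for d in ping_durations:
            jobs.append((t, 0, seq, d))
            seq += 1
        t += ping_interval_s
    t = 0.0
    while t < horizon_s:
        jobs.append((t, 1, seq, 0.0))
        seq += 1
        t += poll_period_s
    jobs.sort()

    free_at = [0.0] * pool_size          # min-heap of worker free times
    worst = 0.0
    for submit, kind, _, duration in jobs:
        start = max(submit, heapq.heappop(free_at))
        heapq.heappush(free_at, start + duration)
        if kind == 1:
            worst = max(worst, start - submit)
    return worst


if __name__ == "__main__":
    # Hypothetical pool of 5 workers; online ping ~10 ms,
    # offline ping = 5 s timeout x 2 attempts = 10 s of blocking.
    online = [0.01] * 100
    mixed = [0.01] * 35 + [10.0] * 65
    print(worst_poll_wait(5, online, 60.0, 3.0, 600.0))  # well under 1 s
    print(worst_poll_wait(5, mixed, 60.0, 3.0, 600.0))   # minutes: backlog grows
```

In this model the queue only stays bounded while the blocking time per interval (here 65 × 10 s = 650 worker-seconds) is below pool size × interval (5 × 60 s = 300 worker-seconds); beyond that, every cycle adds to the backlog, which would fit the slow recovery after reducing the number of Things. The absolute delays it prints are not meant to match the 10 to 60 s observed above.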
Can you set the max RAM in Linux for Java like you do in Windows? I would assume so.
Also, 5 seconds is an insane timeout, especially on an internal network. If something doesn't ping back within 1 s, something's up. I'd suspect you could actually get away with 500 ms or even 100 ms.
I do have a "network" of devices, from firewall to NAT etc. But because the UDP packets are not being sent (I ran tcpdump on the openHAB host), I think this cannot be the problem.
I could, but the current RAM is not exhausted, at least as far as I can see. The heap size has been increased, although I don't have the number at hand right now…
I agree, but some devices (like the Shelly devices) have latencies of more than 1 s… not often, maybe 2 or 3 times a day, but they do. I don't want offline warnings when that happens.
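An alternative to a long timeout for devices like these is a short timeout combined with a consecutive-failure threshold: an occasional reply slower than 1 s then costs one failed check rather than an alert. A minimal sketch of that debouncing logic; the class name and the threshold of 3 are illustrative choices, not Network binding settings:

```python
class OfflineDebouncer:
    """Report a device offline only after `failures_required` consecutive
    failed checks, so one slow reply does not raise an alert."""

    def __init__(self, failures_required=3):
        self.failures_required = failures_required
        self.consecutive_failures = 0
        self.online = True

    def observe(self, ping_ok):
        if ping_ok:
            self.consecutive_failures = 0
            self.online = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_required:
                self.online = False
        return self.online


if __name__ == "__main__":
    d = OfflineDebouncer(failures_required=3)
    results = [True, False, False, True, False, False, False]
    print([d.observe(ok) for ok in results])
    # [True, True, True, True, True, True, False]
```

With a 1 s timeout and a 60 s interval, a device would be flagged after two to three minutes of silence, while each failed check holds a worker for about 1 s per attempt instead of 5 s.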
I missed the part where zero UDP packets are being sent. Hmmmm.