Just to add my 2 cents, although i have not read this whole thread, so apologies if this is redundant, but my observation is similar in that after a certain amount of cameras, our shared serial logic that zwave uses becomes stuck when establishing a serial connection, and the java process has to be restarted while those cameras are disabled to make it work again. Once the system is up , i can enable the cameras, and everything works, until the next reboot or if the zwave binding gets restarted… This is also 100% reproducible on my system, which is a core-i5 intel Nuc with 8gb of RAM running Debian.
I have a very strong feeling that Serial ports getting blocked after some re-connecting · Issue #1842 · openhab/openhab-core · GitHub is the culprit here, despite the issue being around reconnections, I have the exactly same sequence of logs messages that is mentioned in that issue, and the behavior is also the same. While this is purely observational, and I have not done any deep debugging, my guess is that somehow load, either CPU, number open files, IO load, or some other resource constraint is causing this same problem in the nrjavaserial library. Hopefully the fix mention will get pulled in and tested soon…
attached the log: maybe interesting @matt1 I see those “alarm stream not running” quite often immediately before the out of memory errors.
also really not sure what atually runs out of memory … I see that the RAM is constantly getting more used … but the graph below shows the %tage of free mem of the machine
so over 60% free mem at the point I see those oom errors.
2021-05-26 00:23:56.768 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.1.110, re-starting it now
2021-05-26 00:23:58.110 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.1.110, re-starting it now
2021-05-26 00:24:08.083 [WARN ] [io.netty.channel.AbstractChannel ] - Force-closing a channel whose registration task was not accepted by an event loop: [id: 0x2fadbedf]
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.lang.Thread.start0(Native Method) ~[?:?]
at java.lang.Thread.start(Thread.java:803) ~[?:?]
at io.netty.util.concurrent.ThreadPerTaskExecutor.execute(ThreadPerTaskExecutor.java:33) ~[bundleFile:4.1.42.Final]
at io.netty.util.internal.ThreadExecutorMap$1.execute(ThreadExecutorMap.java:57) ~[bundleFile:4.1.42.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.doStartThread(SingleThreadEventExecutor.java:1033) ~[bundleFile:4.1.42.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.startThread(SingleThreadEventExecutor.java:1002) ~[bundleFile:4.1.42.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:882) ~[bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:472) [bundleFile:4.1.42.Final]
at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87) [bundleFile:4.1.42.Final]
at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81) [bundleFile:4.1.42.Final]
at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86) [bundleFile:4.1.42.Final]
at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:311) [bundleFile:4.1.42.Final]
at io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:157) [bundleFile:4.1.42.Final]
at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:141) [bundleFile:4.1.42.Final]
at org.openhab.binding.ipcamera.internal.onvif.OnvifConnection.sendOnvifRequest(OnvifConnection.java:517) [bundleFile:?]
at org.openhab.binding.ipcamera.internal.onvif.OnvifConnection.processReply(OnvifConnection.java:308) [bundleFile:?]
at org.openhab.binding.ipcamera.internal.onvif.OnvifCodec.channelRead(OnvifCodec.java:54) [bundleFile:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [bundleFile:4.1.42.Final]
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438) [bundleFile:4.1.42.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328) [bundleFile:4.1.42.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:302) [bundleFile:4.1.42.Final]
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [bundleFile:4.1.42.Final]
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [bundleFile:4.1.42.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [bundleFile:4.1.42.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) [bundleFile:4.1.42.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044) [bundleFile:4.1.42.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [bundleFile:4.1.42.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [bundleFile:4.1.42.Final]
at java.lang.Thread.run(Thread.java:834) [?:?]
2021-05-26 00:24:08.083 [WARN ] [io.netty.channel.AbstractChannel ] - Force-closing a channel whose registration task was not accepted by an event loop: [id: 0x5312b89a]
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.lang.Thread.start0(Native Method) ~[?:?]
at java.lang.Thread.start(Thread.java:803) ~[?:?]
at io.netty.util.concurrent.ThreadPerTaskExecutor.execute(ThreadPerTaskExecutor.java:33) ~[bundleFile:4.1.42.Final]
at io.netty.util.internal.ThreadExecutorMap$1.execute(ThreadExecutorMap.java:57) ~[bundleFile:4.1.42.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.doStartThread(SingleThreadEventExecutor.java:1033) ~[bundleFile:4.1.42.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.startThread(SingleThreadEventExecutor.java:1002) ~[bundleFile:4.1.42.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:882) ~[bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:472) [bundleFile:4.1.42.Final]
at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87) [bundleFile:4.1.42.Final]
at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81) [bundleFile:4.1.42.Final]
at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86) [bundleFile:4.1.42.Final]
at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:311) [bundleFile:4.1.42.Final]
at io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:157) [bundleFile:4.1.42.Final]
at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:141) [bundleFile:4.1.42.Final]
at org.openhab.binding.ipcamera.internal.onvif.OnvifConnection.sendOnvifRequest(OnvifConnection.java:517) [bundleFile:?]
at org.openhab.binding.ipcamera.internal.onvif.OnvifConnection.processReply(OnvifConnection.java:305) [bundleFile:?]
at org.openhab.binding.ipcamera.internal.onvif.OnvifCodec.channelRead(OnvifCodec.java:54) [bundleFile:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [bundleFile:4.1.42.Final]
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438) [bundleFile:4.1.42.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328) [bundleFile:4.1.42.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:302) [bundleFile:4.1.42.Final]
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [bundleFile:4.1.42.Final]
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [bundleFile:4.1.42.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [bundleFile:4.1.42.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [bundleFile:4.1.42.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552) [bundleFile:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) [bundleFile:4.1.42.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044) [bundleFile:4.1.42.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [bundleFile:4.1.42.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [bundleFile:4.1.42.Final]
at java.lang.Thread.run(Thread.java:834) [?:?]
Please do not raise what looks to be a separate issue about out of memory in someone else’s thread that is not related, open a new thread. Will not reply here also I have done many replies on this topic before, do a search on “OOME” before posting a new thread.
This thread is about a FRESHLY rebooted system having zwave issues when a large number of cameras are setup. Your issue is not the same where it takes a week for your memory to run out. Will not reply further.
I uploaded a new/alternative version of the ipcamera binding. It is identical with 1 exception is that it will not start until the openHAB core reaches start level 95 (80 is the norm).
I tried the updated version without success. The system would reboot fine with one HIKVISION camera, but once I added a second HIKVISION camera the Z-Wave network wouldn’t start after a reboot.
Thanks for testing it, I uploaded another one with the start delay set to 100 which is the maximum I believe, which means the binding should only start after everything else has finished. Do you notice any change in the order of the logs when the cameras handshake in the logs VS when ZWave does its startup? Since I have never tested this setting before I have no idea if it actually will make a difference to when the cameras will start, so checking the logs will be important to confirm if it is doing anything different.
Also you will need to UNINSTALL the binding and have the jar UNZIPPED in the addons folder. If you don’t uninstall the binding from the UI it will still use the merged binding from what I understand. Its normal for the manually added binding to not show up in the installed list when it in the addons folder.
Lastly I did mention in an earlier post to try setting the camera up as a GENERIC camera thing and not one of the thing types that has ONVIF, that will remove the ONVIF handshake when the binding starts. You may potentially find this gets you running till a solution is found, but since I can not reproduce due to not using Zwave your going to have to do all testing. Reporting back what you find will help others and also narrow down where the problem is to help someone come up with a long term fix.
It does appear to be this as I am hitting it with the Zigbee binding. 2 cameras it works, any more and the zigbee coordinator will stay at UNKNOWN status and not go online. When this happens if you uninstall and reinstall the zigbee binding, it then will not offer the serial port as a suggested port to use so I am pretty certain this points to the serial library being the root cause and hence why Zigbee and Zwave are both affected.
There was a change merged for the modbus binding here that may be worth looking into for any other affected bindings…
I seem to have fixed this issue (at least on my system with Zigbee binding) by throwing away the netty server and moving to the OH3 8080 servlet. Precompiled jar is here if anyone wants to help test, but there is a breaking change as the serving is now done differently, see github description…
@msexton187 Are you sure the newer version is being used? Does the camera now only work on the newer urls mentioned in the github?
I don’t use Zwave but the same issue was happening for me with Zigbee. The zigbee coordinator stick would stay as unknown and would not come online until I paused all cameras except 2, rebooted and then after the zigbee came online I could then unpause the cameras and get everything working.
I did an interesting test where if I uninstalled zigbee, rebooted and got all cameras working and then installed and setup zigbee, the issue was still present and I could not get zigbee working. I don’t believe it is caused by network traffic when the serial port is getting opened, as this test ruled that theory out as the zigbee binding and serial ports were not touched until after the network traffic was zero.
After using this newer jar I can leave all 4 cameras enabled and reboot or restart as many times as I wish and it all stays working.
Since the changes made should not have anything to do with zigbee or serial ports it is not clear why this has made a difference on my system. It also does not make sense why the number of cameras makes a difference.
How many cameras do you have? Do you have more than 4 when you reboot? How fast is your server, is it a PI?
When I tested it I confirmed I was using the latest. Initially I had two versions running, but then I uninstalled both and reinstalled the git-hub jar. The SNAPSHOT version was …828…
I didn’t test the exact scenario that you did. I removed all things for z-wave and ipcamera. Added the serial controller and let it auto detect. Once they all showed up I rebooted. All came back online. Then I added my 3 cameras. All were online. I rebooted and all z-wave things stayed offline. Since this was the same behavior as last time I decided not to test any further.
As for hardware, this is a small server running an i7 and debian 64bit loaded using openHABian. My real system is a Pi4, and it works fine with all cameras and all z-wave things. It’s never had an issue. It’s only been my non-PI system. It blows my mind that crappy Raspberry PI4 performs without issue, but the much beefier server acts up. Presently I’m hanging my hat on version 5.3 of NVSerial, but I’m also skeptical that will help.
Smells like a concurrency issue, where the bigger box doing more things in parallel is is the root cause. The little box plodding along one step at a time gets results in the order it expected.
e.g.
Start process A
Start process B (but not much happens because A has the resource)
A results
B results
all good.
-vs-
Start process A
Start process B (this time we can run freely)
B results
mess up processing B results because A not back yet.
Could be. I seem to recall @matt1 adding the largest startup delay he could to the ipcamera binding, but that didn’t help either. Once the new nvserial library is added I’ll start testing again. At this point I don’t know what else to try.
The startup delay would not have done anything. I can install and setup all zigbee AFTER everything has started and it fails if I use the older binding. New binding and I can no longer reproduce any issues. I took a deeper look into the Netty library and it actually can be used to talk to serial ports so perhaps it is conflicting in some way, but still it is weird that a few cameras work and it only poses a problem when going over a certain number. I think it would be worth updating the Netty library to the latest version on a system to see if that solves the issue.
If installing the all in one jar does not work, then installing just the Jars that is needed will work…
All these are needed, but they are inside the all in one bundle.
netty-buffer
netty-codec
netty-codec-http
netty-common
netty-handler
netty-resolver
netty-transport
You can either download and drop the jar files into the addons folder, OR you can use the console to download and install them. Either way will work but it looks like your syntax is wrong…