Rules stop processing

I’ve been running on OH2 for a couple of weeks now fairly successfully. This morning my rule engine just stopped which is very disconcerting!

Openhab log shows rules triggering like normal. I see a pair of error entries that appear to be related to homekit which could be related. I see a few more rules trigger over the next 10-20 minutes though. A rule fires that has a timer. I see the entry for the timer starting but it never returns to do anything. That’s the last entry.

Basic UI is very sluggish at this point. I can interact with items and they work but no rules fire. System load is .07 which is mostly due to my log viewing and top. 0.00 for 15 minutes.

Item states continue to change in events.

I setup homekit last Friday I think so it is “newer.”

Running OH2 stable on oracle java on Debian 8. dual core 2gb memory, 700MB+ free.

A service restart took a good 5 minutes which is very unusual.

2017-03-08 07:15:25.001 [INFO ] [home.model.script.Override setpoints] - Rule Triggered
2017-03-08 07:15:25.002 [INFO ] [home.model.script.Override setpoints] - Alarm not armed away
2017-03-08 07:15:25.014 [INFO ] [home.model.script.Override setpoints] - No changes needed.
2017-03-08 07:15:47.741 [INFO ] [del.script.Monitor Thermostat States] - Rule Triggered
2017-03-08 07:16:47.749 [INFO ] [del.script.Monitor Thermostat States] - Rule Triggered
2017-03-08 07:17:47.739 [INFO ] [del.script.Monitor Thermostat States] - Rule Triggered
2017-03-08 07:18:07.748 [INFO ] [del.script.Monitor Thermostat States] - Rule Triggered
2017-03-08 07:18:27.743 [INFO ] [del.script.Monitor Thermostat States] - Rule Triggered
2017-03-08 07:18:47.771 [INFO ] [del.script.Monitor Thermostat States] - Rule Triggered
2017-03-08 07:19:07.737 [INFO ] [del.script.Monitor Thermostat States] - Rule Triggered
2017-03-08 07:19:56.803 [ERROR] [lfe.hap.impl.http.impl.BinaryHandler] - Exception in binary handler
java.io.IOException: No route to host
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.8.0_121]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.8.0_121]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.8.0_121]
at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.8.0_121]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)[:1.8.0_121]
at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)[208:org.openhab.io.homekit:2.0.0]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:880)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)[208:org.openhab.io.homekit:2.0.0]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)[208:org.openhab.io.homekit:2.0.0]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[208:org.openhab.io.homekit:2.0.0]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
2017-03-08 07:19:56.828 [ERROR] [.hap.impl.http.impl.AccessoryHandler] - Exception caught in web handler
java.io.IOException: No route to host
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.8.0_121]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.8.0_121]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.8.0_121]
at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.8.0_121]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)[:1.8.0_121]
at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)[208:org.openhab.io.homekit:2.0.0]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:880)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[208:org.openhab.io.homekit:2.0.0]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)[208:org.openhab.io.homekit:2.0.0]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)[208:org.openhab.io.homekit:2.0.0]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[208:org.openhab.io.homekit:2.0.0]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
2017-03-08 07:20:00.002 [INFO ] [lipse.smarthome.model.script.CaseFan] - Rule Triggered
2017-03-08 07:20:25.001 [INFO ] [home.model.script.Override setpoints] - Rule Triggered
2017-03-08 07:20:25.002 [INFO ] [home.model.script.Override setpoints] - Alarm not armed away
2017-03-08 07:20:25.013 [INFO ] [home.model.script.Override setpoints] - No changes needed.
2017-03-08 07:21:33.006 [INFO ] [odel.script.Convert ModeState to num] - Rule Triggered
2017-03-08 07:21:33.013 [INFO ] [odel.script.Convert ModeState to Num] - Second Floor Thermostat updated to Heating
2017-03-08 07:21:34.015 [INFO ] [odel.script.Convert ModeState to Num] - SecondFloorThermostate changed state
2017-03-08 07:25:25.002 [INFO ] [home.model.script.Override setpoints] - Rule Triggered
2017-03-08 07:25:25.002 [INFO ] [home.model.script.Override setpoints] - Alarm not armed away
2017-03-08 07:30:00.002 [INFO ] [marthome.model.script.TheaterMonitor] - System is off
2017-03-08 07:30:00.005 [INFO ] [lipse.smarthome.model.script.CaseFan] - Rule Triggered
2017-03-08 07:46:39.504 [INFO ] [ome.model.script.Garage Door Toggle] - Door closing
2017-03-08 07:46:39.504 [INFO ] [ome.model.script.Garage Door toggle] - Starting Timer
(EOF)

Any thoughts out there on this? I don’t think it is homekit related. It stopped again last night around 9:30pm. No homekit errors. No warnings, just stopped. I had restarted it about 16 hours prior so it hadn’t been running all that long.

I have no idea what to do but I no longer can trust it.

The log entry:
java.io.IOException: No route to host
followed by all the lines with:
[208:org.openhab.io.homekit:2.0.0]

would lead me to think your OH can not talk to the homekit box any more.
maybe you have DHCP and the box gets a new IP because of renewed lease?

Thanks for your response!

The second time the rules stopped, there was no homekit errors. Otherwise I would just disable it (may anyway for testing).

OH server has a static IP and isn’t the homekit “box” OH itself?

I don’t use homekit so that was a guess. It does need to talk to all your iOS devices and Siri, doesn’t it? maybe one of your devises went offline?

Sorta, I haven’t dug into how homekit works fully but my high level understanding is that my ios devices (two iphones currently) talk to the homekit bridge that is being created by the homekit binding. So yes, if my phones leave the wifi, they would time out but I suspect the phones check in with the bridge, not the other way around.

I would hope this wouldn’t lock up eclipse.

Homekit has been working fine, I really thing that was a coincidental stab in the dark for me. There was no homekit errors this time that it locked up.

I want to increase logging until I figure this out. Which service should I debug or trace to closer monitor rules?

Hard to give any advice on that. If during the second outage you had “No route to host” as well, i would say your culprit is your network in general.

It’s not a network issue. There was no errors the second time. It just stopped processing.

I’m fairly certain the homekit error was a coincidence the first time.

I’m looking for what service is the “rules” engine that I can increase logging on.

maybe this:
log:exception-display
can help?
Taken from:
http://karaf.apache.org/manual/latest/#_commands_2

I think that just shows the last error (which was nothing for me… not sure why). The engine has run for days or hours before crashing… but it doesn’t put any errors, it just stops. I need to find a way to get a better understanding of what happens right as it crashes. Obviously it isn’t logging a fault. But maybe I could isolate it down to an interaction with a specific item or binding…

I’d start by putting individual logging to each rule start, like:
logInfo(“my.rules”, “start rule rulename”)
once you know which one is causing the problems, log every step this rule is making.
long and winding road…

I already have a loginfo with rule triggered as the first line of each rule. Been doing that for over a year. I’ll add one showing the rule completed I guess.

I also have the problem that rules stop working. Running the latest OH2.1.0-SNAPSHOT in a docker container. Right now I haven’t figured out why it stops. I also habe Homekit support activated. Restarting the docker container always solves the problem.

Excellent to know you have homekit running as well. This started after I setup homekit BUT, I only had been on OH2 for about a week maybe 2 and restarted often as I was still perfecting the transition from OH1.

I will probably disable homekit and see if it helps. It’s very frustrating that logging just doesn’t say anything.

I’m on stable on Debian VM so seems like this isn’t solved in newer code which was on the table to try.

Just crashed again. One minute rules are running, next they aren’t. I see nothing in the logs. This is so infuriating. I have removed homekit.

I’m at a loss. I may have to go back to OH 1. It seems OH2 just isn’t ready or something. The rules are all copied right over from OH1, the items are all from OH 1. My OH1 environment ran perfectly for well over a year.

I can’t fix something that doesn’t at least throw an error, I’m not a programmer.

I’ve burned substantial time and have learned nothing.

I started with OH2/OH2.1.0-SNAPSHOT. So I can’t go back to OH1.

In my case it seams that not all rules stops working. I haven’t found a rule which of the rules is crashing. But today all my time related rules stopped working. Moxified, do you know, if all of your rules crashed or only some of them?

I opened an issue. Let’s see what the developers are saying.

Okay, I got an answer and the issue was closed:

There is no code in this repo that is related to rule execution at all.
I also guess you will have to analyse a bit deeper and come up with a reproducible scenario.
Possibly your issue is a duplicate of eclipse/smarthome#3132.

So if our problem it a duplicate and it’s only related to “Time cron” rules it will hopefully be fixed soon. But the problem is that I don’t know how to dig deeper into it.

Yeah, we have different issues. ALL rules stop for me.

@Kai, I’m sorry to bug you but can you explain a little better what you mean by there is no code related to rule execution? My rules stop processing… all of them and no matter what features I have tried restarting, they will not start again until I restart the openhab service. Items still update, mqtt and other bindings all still work.

I admit I don’t have a clue how the core code of this software works but it seems something is in charge of rule execution and handling and it is getting jammed without throwing any error or warning. A hint on something to turn logging up on. I have logging up on anything I could think of and I still see nothing.

Please help!!! The only thing that reproduces it is a random amount of time.