Rules stop processing

Yes indeed. In fact, I was initially on the wrong foot here, and I now remember that I did switch this over as part of the PR I once prepared on the Automation part of the core. But it was never merged, cf. the discussion we had afterwards on the contributions of Johann (scriptable automation by smerschjohann · Pull Request #1783 · eclipse-archived/smarthome · GitHub)

@kgoderis, @Kai could we help somehow? Or should we wait for your analysis and bug fix for further testing?

@bbubbat. Quartz is a Java scheduler library that is apparently used for the timer triggers in the OH1-style rules. You could try adding org.quartz to the log traces. I’d also go back through your log and look for any references to quartz in the logs.
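For going back through an existing log, here is a minimal sketch of a filter that keeps only the lines mentioning quartz. The function name `find_quartz_lines` is made up for illustration, and the sketch assumes nothing about the log format beyond plain text lines.

```python
import re
from typing import Iterable, List


def find_quartz_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention quartz, ignoring case."""
    pattern = re.compile(r"quartz", re.IGNORECASE)
    # Strip the trailing newline so the result prints cleanly.
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

To use it, open your own log file (the location depends on your install) and pass the file object to `find_quartz_lines`, then print the returned lines.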

@steve1 Okay, here are two logs. One with minutely cron jobs working. The other with only one job execution. Both logs with org.quartz set to TRACE.

You can find them here: https://www.dropbox.com/s/0fo9ep308rxm49b/openhab%20logs.zip?dl=0

Note: It takes several docker restarts to get the cron rules working. Because we are only focusing on the cron rules for now, I didn’t test the other rules.

Nope, no TRACE entries whatsoever. The rules are still running though so they have to be parsing?

I haven’t restarted openhab since setting the trace level but I haven’t needed to before for other log changes.

It might be a coincidence, but it seems that createTimer() also stops working when the cron-related rules stop working. I just mention it because it may help to narrow down the problem.

I do use a lot of timers in my rules. The first time the system crashed was right around running several rules that contain timers as it was handling closing my garage door. I’m not so sure about the other times.

@bbubbat do you by any chance use the harmony binding?

@Moxified yes, I use it and it works great (if only the HomeKit integration wouldn’t crash)

Interesting. I am tracking an issue with harmony that others claim also causes their rules to stop.

I expect OH to crash today based on the historical timelines. If it does, that rules out HomeKit, and I will disable harmony and try again.

Okay, before I used OH, I used ioBroker and after that homebridge. They all have problems with the harmony hub. I don’t know if it was exactly the same problem, but at some point there was a problem parsing the XML from the harmony hub and it stopped working. So what I did then, and also do with OH, is restart the application (in this case OH) as soon as such an error occurs in the log file.
Right now, all of my rules have been working since yesterday. But HomeKit crashed. So unfortunately I can’t use HomeKit, but at least everything else is running.

Harmony ran perfectly for me for over a year on oh1. It’s only been a problem since migrating to 2 a month ago…

I trust he will fix it but it seems like there is a rewrite of xmpp or smack or something he needs to dig through which is bogging him down.

Well, perhaps the issue is homekit. My OH install has been running for over a week now with no crashing since I removed the homekit binding.

The harmony plugin is another potential cause, but that’s been running this whole time. They also supposedly fixed that, so I will upgrade to the binding snapshot next time I restart OH.

The only other thing I can think of is that I probably had changed items and rules on the fly without restarting OH. I stayed completely hands off over the last week other than to tail the logs. Last night I changed a bunch of rules and a couple of items that needed some tweaking and didn’t restart. I’ll give it a few more days to see if it crashes.

@Moxified My installation has now been up and running for 44 hours. No need for a restart so far. I found this Thread. There is a tip from digitaldan, saying to turn off IPv6. And in fact this solved my problem, at least with HomeKit. I also installed the latest OH snapshot. So I can’t say if the rules now work because HomeKit doesn’t crash anymore or because the developers fixed some stuff or both. Nevertheless, if you still experience HomeKit crashes you may give disabling IPv6 a try.

Interesting. I’ll give that a go as well.

I’m still testing but I’m actually leaning towards it being an issue with changing a config file (rules, items, sitemap) while OH is running.

After the previously mentioned week of it working, I changed a couple of config items without rebooting, and two days later it crashed, still without homekit.

About a week ago I set up a second OH install and set the two to monitor each other through some rules and MQTT switches: one instance turns a switch on every hour and the other should shut it back off. If this doesn’t happen, push notifications are sent by the surviving rules engine. The system ran for several days without issue.
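The mutual watchdog described above can be sketched roughly as follows. This is only the decision logic under simplifying assumptions, not the actual rules or MQTT wiring: the class `HeartbeatWatchdog`, its method names, and the `notify` callback are all hypothetical stand-ins for the hourly rule, the switch, and the push notification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HeartbeatWatchdog:
    """One direction of the mutual check between two instances."""

    notify: Callable[[str], None]  # stand-in for a push notification
    switch_on: bool = False        # stand-in for the shared MQTT switch

    def hourly_check(self) -> None:
        """Run once per hour by the watching instance."""
        if self.switch_on:
            # The peer never switched it back off since the last check.
            self.notify("peer rules engine did not reset the heartbeat switch")
        # Arm the switch again for the next hour.
        self.switch_on = True

    def peer_ack(self) -> None:
        """Run by the watched instance when it sees the switch turned on."""
        self.switch_on = False
```

A notification only fires when a full hour passes without the peer resetting the switch, so a frozen rules engine is noticed within about one check interval.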

Two days ago I made changes to the backup OH’s rules, items and sitemap without restarting OH service. I also turned homekit back on on the production OH and restarted it. I’m expecting the backup OH install to crash sometime today but it is slightly different than my production one and may behave differently as such.

Just to “close the loop”: not that I have definitive proof, but my system has been running for about 20 days now without any issues. The only thing that really changed was upgrading the harmony plugin. Others had seen rules processing issues as well.

For the time being I keep a second copy of OH running simply to watchdog my production version. Since VMs are lightweight, I may just adopt this strategy moving forward for additional peace of mind.

I am currently facing the same issue: cron-based rules stop working after several hours or days.
After a non-deterministic time the rules are not triggered anymore.
The log does not show any info on why the cron-based rules are not triggered anymore.

I have 2 rules with
Time cron "0 0/1 * * * ?" // every minute

Is there any conclusion on how to get these rules up and running again?
I am using Ubuntu 16.04.2 LTS with openHAB 2.1.0-1.

thx
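For readers unsure how the trigger above is read: Quartz cron expressions have six mandatory fields (seconds, minutes, hours, day of month, month, day of week) plus an optional year. A small sketch that labels the fields is below; the helper name `split_quartz_cron` is made up for illustration and it only splits the expression, it does not validate the individual values.

```python
from typing import Dict

# Field order used by Quartz cron expressions; the year is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def split_quartz_cron(expr: str) -> Dict[str, str]:
    """Map each whitespace-separated part of a Quartz cron string to its field name."""
    parts = expr.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    # zip stops at the shorter list, so "year" is omitted when absent.
    return dict(zip(QUARTZ_FIELDS, parts))
```

For "0 0/1 * * * ?" this gives seconds "0" and minutes "0/1", i.e. at second 0 of every minute, starting from minute 0.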

This is slightly different, as mine were all stopping. Unfortunately, a random rule engine freeze without warning is something I have come to live with for now. It has me thinking about moving to a different solution too, which is sad, but all historical attempts to figure it out leave me scratching my head, with the same recommendation to add more logging, which I have done and which never shows anything. I love what automation adds to my life but I’m getting sick of babysitting an ornery rules engine.

I do all of my rules through designer and they all check out. They are mostly very simple rules. No warnings or errors or any common denominator. Sometimes I’ll go a few weeks, sometimes a day, sometimes an hour. I have a second OH instance that literally sets an mqtt switch every hour and if the primary doesn’t switch it back, I get a text message. Oh and I have to have the primary do the same for the secondary as it goes down randomly (although much less) too even though it has virtually no rules.

1.8 worked great for over a year with none of this.

Maybe the common denominator is the Debian OS. Or perhaps that it’s running in a VM? I ran my 1.8 on a physical box.

Well, I guess we have the same issue.
All other rules that are not cron-based are still working.
All cron-based rules stop working.
I am using Ubuntu with no VM.

Same for me, no error or warning giving any hint when the triggers stop working.
I am trying to read through your thread to get some hints on how to resolve it, or to collect some more info to trace the root cause.

So far I understood that there is something wrong with the internal scheduler, right?
Which component/bundle needs to be set to debug logging to get more info on what causes the issue?

I have been facing the issue since the last update from 2.0.0 to 2.1.0.

No, all my rules stop, cron or otherwise.

My issue was never related to cron only. The thread early on was steered towards a known cron issue at the time but that was a red herring for me.

This initial thread was solved by updating the harmony binding to a snapshot. The issue seems to have returned again with OH 2.1. It happened much less frequently once I had the new harmony snapshot 2.1 binding until I upgraded to OH 2.1 with native bundled harmony binding.