Rules stop processing

kreutzer_peter · July 31, 2017, 7:12pm

Well, I guess we have the same issue.
All rules that are not cron-based are still working.
All cron-based rules have stopped working.
I'm using Ubuntu, not in a VM.

Same for me: no error or warning gives any hint when the triggers stop working.
I'm trying to read through your thread to get some hints on how to resolve it, or how to collect more information to trace the root cause.

So far I understand that it has something to do with the internal scheduler, right?
Which component/bundle needs DEBUG logging to trace more information on what causes the issue?

I have been facing the issue since the last update from 2.0.0 to 2.1.0.

Moxified · July 31, 2017, 7:15pm

No, all my rules stop, cron or otherwise.

My issue was never related to cron only. The thread early on was steered towards a known cron issue at the time but that was a red herring for me.

The issue in the initial thread was solved by updating the Harmony binding to a snapshot. It seems to have returned with OH 2.1. It happened much less frequently once I had the new 2.1 snapshot of the Harmony binding, until I upgraded to OH 2.1 with the natively bundled Harmony binding.

kreutzer_peter · July 31, 2017, 7:49pm

I do not have the Harmony binding installed.
I guess the cron-based rule issue should not be related to any other binding, right?

FrankR · August 21, 2017, 12:01pm

@Moxified and @kreutzer_peter have your problems been solved?

All my cron rules stop working every few days, while other rules triggered by my Z-Wave items keep running.

This happened after I switched from 2.1.0-SNAPSHOT to 2.1.0-stable and still exists with 2.2.0-SNAPSHOT (#1003).

I am running:
Pushover action 1.11.0.snapshot
Astro binding 2.2.0.snapshot
Exec binding 2.2.0.snapshot
Harmony Hub binding 2.2.0.snapshot
Http binding 1.11.0.snapshot
Network binding 2.2.0.snapshot
Ntp binding 2.2.0.snapshot
Weather binding 1.11.0.snapshot
Zwave binding 2.2.0.snapshot
MapDB persistence 1.11.0.snapshot
VoiceRSS Text-to-speech 2.2.0.snapshot

on a Raspberry Pi 3 running Debian Jessie.

Moxified · August 21, 2017, 12:55pm

My problem has never really been cron-related. I keep trying to track down what causes rule processing to stop for me. At this point, my system has been running for 3 weeks without stopping. I have even tried provoking it, and it hasn't seemed fazed. I just can't isolate what makes it stop within days sometimes and run for months other times.

FrankR · August 21, 2017, 1:56pm

I know that your problem does not only affect cron rules. Sounds frustrating.
I guess I will use a Linux crontab job to check whether OH rules are still running and restart the service if they are not. Not a solution to be proud of, just a shabby workaround, but I hope it will work for me.

bbubbat · August 25, 2017, 9:22am

@FrankR I still have the problem that rules stop working, especially time-based rules. Does your crontab workaround work?

rockster · August 25, 2017, 9:49am

I had a similar problem too; changing the persistence service from mysql to jdbc-mysql solved it.

FrankR · August 25, 2017, 3:28pm

@bbubbat: this workaround is still a to-do, but I'm pretty sure it will help. I have an OH cron rule firing every minute that writes its name to the log file. Via a shell or Python script I can check, say every 10 minutes, the difference between now and the last entry of my cron rule in openhab.log, and restart openhab2.service if the difference exceeds 10 minutes.
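A minimal Python sketch of the check described above. It assumes the every-minute cron rule logs a fixed marker text (the hypothetical `CronHeartbeat` below), that openhab.log lines start with a `yyyy-MM-dd HH:mm:ss` timestamp (as in the default OH2 log layout, but check your own log), and that the service is managed by systemd as `openhab2.service`. The log path is passed on the command line.

```python
import re
import subprocess
import sys
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Hypothetical text logged by the every-minute cron rule (adapt to your rule).
MARKER = "CronHeartbeat"
# Assumed line prefix, e.g. "2017-08-25 15:28:07.123 [INFO ] [...] - CronHeartbeat"
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def last_heartbeat(lines: Iterable[str], marker: str = MARKER) -> Optional[datetime]:
    """Return the timestamp of the last line containing the marker, or None."""
    last = None
    for line in lines:
        if marker in line:
            match = TS_RE.match(line)
            if match:
                last = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return last


def is_stale(last: Optional[datetime], now: datetime,
             max_age: timedelta = timedelta(minutes=10)) -> bool:
    """True if no heartbeat was found or the newest one is older than max_age."""
    return last is None or now - last > max_age


def restart_openhab() -> None:
    # Assumes systemd and sufficient privileges (e.g. run from root's crontab).
    subprocess.run(["systemctl", "restart", "openhab2.service"], check=True)


def main(log_path: str) -> None:
    with open(log_path, encoding="utf-8", errors="replace") as log:
        last = last_heartbeat(log)
    if is_stale(last, datetime.now()):
        restart_openhab()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

It could be scheduled from crontab every 10 minutes, for example `*/10 * * * * python3 /path/to/oh_watchdog.py /var/log/openhab2/openhab.log` (paths are examples). Note that right after the log is rotated the marker may be briefly missing from openhab.log, which this sketch treats as stale.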

@rockster: my persistence service is MapDB, which shouldn't be a problem… hopefully. How and why did you relate persistence to cron?

rockster · August 25, 2017, 4:17pm

It took weeks before I figured out that somehow the persistence service was the cause of my problem. Rule execution stopped completely; after changing the persistence service, everything ran smoothly. I don't know how the two were related.

FrankR · August 25, 2017, 5:00pm

So you switched your bindings and services off and on, one after the other?

Moxified · August 25, 2017, 5:44pm

I will keep your persistence finding in mind, but in my experience this is very hit or miss: I have been dead set on a fix that seemed to work, only to go through freezing hell again a month or so later.

At one point, my system crashed several times in a row over a week. I changed nothing except restarting the service the last time, and it has been up for over a month since. Explain that…

I kind of hope some enhancement to the rules engine finally makes it throw an error instead of just crapping out.

FrankR · August 25, 2017, 5:52pm

Me too! I opened an issue here.

Moxified · August 25, 2017, 9:03pm

I'm a pessimist for sure, but good luck. I've been met with mostly silence on this type of issue. The few responses I get are: add logInfo lines to your rules and turn up logging levels on xyz. I have done all of the above, and it just creates ten times the logs while it works, with the same eerie silence when it stops.

That's why I keep saying I hope they enhance rules logging so that it throws some form of error instead of just stopping. I don't mind chasing or fixing something, but when it just plain stops with no rhyme, reason or consistency… it seems impossible for me (a non-developer) to figure out.

FrankR · August 26, 2017, 6:27am

@Moxified you are absolutely right. I am a developer (part-time) and I would of course enhance logging first before changing anything. Analysis always has to be the first step.
Nevertheless, I guess they are planning a complete replacement of the cron scheduler, as you can see here
…

Tdsnet · August 30, 2017, 12:47pm

Hi All,
I have the same problem. Three days ago I switched from 1.8.3 to 2.1.0. After some minor changes (color for iOS, channel for Hue) everything ran perfectly for 8 hours, and then rules stopped working.
I only use the MQTT, UDP, TCP, HTTP, Weather and Hue bindings. Items still update and commands are correct, but rules stop.
I have to log out and restart the console (I use Windows 10 x64), and the rules stop again after one or more hours…
In the log: no messages… it just stops logging anything from rules.
If I modify a rule and save it, I get this message in the log:
[INFO ] [el.core.internal.ModelRepositoryImpl] - Refreshing model 'ograff.rules'
but the rules stay inactive…
So I think I'll wait another year before using 2.x.x; today I restored 1.8.3.
Good luck

chrisslh · February 9, 2018, 6:44pm

It has been some time since this topic was discussed.
I updated to openHAB 2.2 weeks ago and have exactly the same issue.
Rules stop working… no error message, nothing… randomly, sometimes daily, sometimes after 2 or 3 days.
events.log still has entries… openhab.log nothing.
Items are updated normally… but SQL persistence doesn't record anything.

Does anyone have a hint on how to deal with that?
Thanks in advance.

Moxified · February 9, 2018, 7:08pm

My problem went away with 2.2. BUT, I changed a fair amount with the cut over.

I would do two things:

1. Start using VS Code if you aren't already, so that you get solid error checking.
2. Look through your rules for any use of "Thread::sleep" and eliminate as many as possible.

There was a known issue I stumbled across a while ago reporting that the rules engine would just stop if too many of its threads were tied up by this method.
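To illustrate why many sleeping rules can make the engine look dead, here is a small Python analogy, not openHAB code. It assumes, as is often said about openHAB 2.x, that rules execute on a small fixed-size thread pool; the pool size of 2 below is hypothetical and deliberately tiny. Each "sleeping" task holds a worker the way a long Thread::sleep would, and a quick task submitted afterwards simply waits in the queue, with no error anywhere.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Tuple

POOL_SIZE = 2  # hypothetical, deliberately tiny pool


def demo_starvation(pool_size: int = POOL_SIZE, wait_s: float = 0.5) -> Tuple[bool, str]:
    """Return (quick task was stuck while all workers were busy, its eventual result)."""
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Occupy every worker, like rules blocked in Thread::sleep
        # (an Event is used instead of a real sleep to keep the demo deterministic).
        for _ in range(pool_size):
            pool.submit(release.wait)
        # A short "rule" submitted now has no free worker and just sits in the queue.
        quick = pool.submit(lambda: "quick rule ran")
        done, _ = wait([quick], timeout=wait_s)
        stuck = quick not in done
        # Once the "sleeps" end, the queued work runs normally.
        release.set()
        result = quick.result(timeout=5)
    return stuck, result


if __name__ == "__main__":
    print(demo_starvation())
```

The queued task is not lost; it is only delayed until a worker frees up, which from the outside looks exactly like rules silently stopping.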

AndyMt · March 14, 2018, 10:05am

I have the same issue: every now and then OH (2.2) stops processing cron rules, and rules are unable to create timers. To mitigate this I've installed a “watchdog” which forces the system to reboot. Ugly, but it's a workaround.
I'm using the Expire binding for this. Strangely enough, its timers seem to work even when the OH scheduler is down…

This is what I do:

1. Install the Expire binding.
2. Add a watchdog item:

    Switch EX_Watchdog_Scheduler "Watchdog for Scheduler [%s]" (G_Watchdog) { expire="4m, state=OFF" }

3. Add a rule which sets the watchdog periodically:

    //-------------------------------------------------------------
    // Check if cron'ed rules are still working
    rule "CheckCronScheduler"
    when
    	Time cron "7 * * * * ?"
    then
    	// this signals that the scheduler is still working
    	EX_Watchdog_Scheduler.sendCommand(ON)
    end

Then add a rule that gets triggered when the watchdog changes to OFF, which happens when it has not been updated for longer than the expire time (4 minutes here):

    //------------------------------------------------
    // this fires if the scheduler is offline
    rule "Actions.Force.SchedulerRestart"
    when
        Item EX_Watchdog_Scheduler changed to OFF
    then
        // skip if this is a fresh reboot, otherwise we would reboot in a loop
        if (!(SI_CPU_SystemUptime.state instanceof DecimalType) || (SI_CPU_SystemUptime.state as DecimalType).doubleValue < 5.0)
            return;
        logError("Actions.Force.SchedulerRestart", "Scheduler error: reboot!")
        Thread::sleep(500)
        executeCommandLine("sudo -S /sbin/reboot", 5000)
    end

How to force a reboot is a whole different topic; it of course depends on your installation and platform. I had to add the reboot command to sudoers, but be very careful with that!

As an alternative, I tried to restart the OH scheduler bundle via Karaf. This works when I do it manually, but never when done by a script, even though the bundles are effectively stopped and started (I checked that manually).
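The watchdog logic above can be modeled in a few lines of Python, which makes its timing easy to reason about. This is a sketch under assumptions: the 240 s default mirrors the item's `expire="4m, state=OFF"`; the item stays NULL until it first receives an update, so no expire timer runs before that; and the uptime guard assumes `SI_CPU_SystemUptime` reports minutes. `ExpireWatchdog` and `should_reboot` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpireWatchdog:
    """Model of the Switch item with expire="4m, state=OFF"."""
    expire_s: float = 240.0
    last_kick: Optional[float] = None

    def kick(self, now: float) -> None:
        # What the cron rule does every minute with sendCommand(ON).
        self.last_kick = now

    def state(self, now: float) -> str:
        if self.last_kick is None:
            return "NULL"  # never updated, so no expire timer is running yet
        return "OFF" if now - self.last_kick >= self.expire_s else "ON"


def should_reboot(watchdog_state: str, uptime_min: Optional[float],
                  min_uptime_min: float = 5.0) -> bool:
    """Guard of the restart rule: reboot on OFF, but never right after a boot."""
    if watchdog_state != "OFF":
        return False
    # Unknown or small uptime: skip, otherwise the system could reboot in a loop.
    return uptime_min is not None and uptime_min >= min_uptime_min
```

Because the cron rule kicks every minute and the expiry is 4 minutes, a few missed runs are tolerated before the watchdog fires; the trade-off is that a dead scheduler is only detected after about 4 minutes.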

magd1978 · May 17, 2018, 7:11am

Hello.
I will tell you my experience:
I was having the same problems: since openHAB 2.2, the rules stopped working randomly.
First I realized that it had to do with Internet connection failures: when openHAB lost its Internet connection for a while, the rules stopped working. Then I realized that it happened whenever I stopped receiving weather information through the "WeatherUnderground" binding. Finally I traced the problem to a rule that determines whether it is cold or hot based on the weather information. I have come to the conclusion that this rule, despite having a try…catch block, caused the rule engine to stop when trying to convert a "NULL" or "UNDEF" state to a double value. To avoid this, I added a synchronized() block on the item state, so that it would not change until the rule has done the necessary checks and the assignment to a double variable. So far, the scripting engine has not stopped again. I attached the script as an example:

    /* OUTDOOR COLD/HEAT SENSOR (openHAB 2 rule)
     * Author: Manuel Alberto Guerrero Díaz
     * Version: 1.0.20180515
     * Description: Determines whether it is cold or hot outside based on
     * the items "Weather_Temperature", "Temp_Confort_Min" and
     * "Temp_Confort_Max", with a hysteresis given by the constant
     * "histeresisFCE". The item "Info_Meteo", which tells us whether the
     * weather information is up to date, is also taken into account.
     * If the outdoor temperature is not initialized or the weather
     * information is not valid, both results are set to UNDEF; if a comfort
     * limit is not initialized, the corresponding result is OFF.
     * The rule runs at system startup and whenever the items
     * "Weather_Temperature", "Temp_Confort_Min", "Temp_Confort_Max" or
     * "Info_Meteo" change.
     * The result is posted as a state update to the items
     * "Frio_Exterior" and "Calor_Exterior".
     * NOTE: To trace, run in Karaf "log:set DEBUG org.eclipse.smarthome.model.script.frio_calor_exterior" (afterwards set it back to "INFO")
    */

    import java.util.concurrent.locks.ReentrantLock

    val double histeresisFCE = 1.00
    val ReentrantLock frio_calor_exterior_lock = new ReentrantLock()

    rule "Sensor frío/calor exterior"
    when
        Item Weather_Temperature changed or
        Item Temp_Confort_Min changed or
        Item Temp_Confort_Max changed or
        Item Info_Meteo changed or
        System started
    then
        frio_calor_exterior_lock.lock()
        try {
            // Snapshot the states once, so the NULL/UNDEF checks and the casts
            // below see the same values even if an item is updated meanwhile
            val tempState = Weather_Temperature.state
            val minState = Temp_Confort_Min.state
            val maxState = Temp_Confort_Max.state

            synchronized(Weather_Temperature.state) {
                if (!(tempState instanceof DecimalType) || Info_Meteo.state != ON) {
                    logDebug("frio_calor_exterior", "Outdoor temperature unavailable or weather information outdated, so we don't know whether it is cold or hot (UNDEF)")
                    Frio_Exterior.postUpdate(UNDEF)
                    Calor_Exterior.postUpdate(UNDEF)
                } else {
                    if (Frio_Exterior.state == NULL || Frio_Exterior.state == UNDEF) Frio_Exterior.postUpdate(OFF)
                    if (Calor_Exterior.state == NULL || Calor_Exterior.state == UNDEF) Calor_Exterior.postUpdate(OFF)
                    val double tempExterior = (tempState as DecimalType).doubleValue
                    logDebug("frio_calor_exterior", "Outdoor temperature is " + tempExterior + "ºC")
                    if (!(minState instanceof DecimalType)) {
                        logDebug("frio_calor_exterior", "No minimum comfort temperature defined, so we cannot compare; we say it is not cold")
                        Frio_Exterior.postUpdate(OFF)
                    } else {
                        val double tempConfortMin = (minState as DecimalType).doubleValue
                        logDebug("frio_calor_exterior", "Minimum comfort temperature is " + tempConfortMin + "ºC and the hysteresis is " + histeresisFCE + "ºC")
                        if (tempExterior <= (tempConfortMin - histeresisFCE)) {
                            logDebug("frio_calor_exterior", "Outdoor temperature did not exceed minimum comfort - hysteresis: setting Frio_Exterior to ON")
                            Frio_Exterior.postUpdate(ON)
                        } else if (tempExterior >= (tempConfortMin + histeresisFCE)) {
                            logDebug("frio_calor_exterior", "Outdoor temperature reached minimum comfort + hysteresis: setting Frio_Exterior to OFF")
                            Frio_Exterior.postUpdate(OFF)
                        } else {
                            logDebug("frio_calor_exterior", "Outdoor temperature within the hysteresis of the minimum comfort temperature: Frio_Exterior left unchanged")
                        }
                    }
                    if (!(maxState instanceof DecimalType)) {
                        logDebug("frio_calor_exterior", "No maximum comfort temperature defined, so we cannot compare; we say it is not hot")
                        Calor_Exterior.postUpdate(OFF)
                    } else {
                        val double tempConfortMax = (maxState as DecimalType).doubleValue
                        logDebug("frio_calor_exterior", "Maximum comfort temperature is " + tempConfortMax + "ºC and the hysteresis is " + histeresisFCE + "ºC")
                        if (tempExterior >= (tempConfortMax + histeresisFCE)) {
                            // compare the item's state, not the item itself
                            if (Frio_Exterior.state == ON) {
                                logDebug("frio_calor_exterior", "Outdoor temperature reached maximum comfort + hysteresis, but Calor_Exterior stays OFF because it would overlap with Frio_Exterior=ON")
                                Calor_Exterior.postUpdate(OFF)
                            } else {
                                logDebug("frio_calor_exterior", "Outdoor temperature reached maximum comfort + hysteresis: setting Calor_Exterior to ON")
                                Calor_Exterior.postUpdate(ON)
                            }
                        } else if (tempExterior <= (tempConfortMax - histeresisFCE)) {
                            logDebug("frio_calor_exterior", "Outdoor temperature did not exceed maximum comfort - hysteresis: setting Calor_Exterior to OFF")
                            Calor_Exterior.postUpdate(OFF)
                        } else {
                            logDebug("frio_calor_exterior", "Outdoor temperature within the hysteresis of the maximum comfort temperature: Calor_Exterior left unchanged")
                        }
                    }
                }
            }
        } catch (Throwable t) {
            // t.toString is used because getLocalizedMessage() may return null
            logError("frio_calor_exterior", "runtime error: " + t.toString)
        } finally {
            frio_calor_exterior_lock.unlock()
        }
    end
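Stripped of openHAB specifics, the rule's decision logic can be sketched in Python as a pure function, which makes the hysteresis easy to test. Assumptions: item states are modeled as either a number or one of the strings "NULL"/"UNDEF" (standing in for openHAB's special states); `classify` and `to_float` are hypothetical names; and, unlike the rule (which reads Frio_Exterior.state), the "no heat while cold" check uses the freshly computed cold value.

```python
from typing import Optional, Tuple, Union

State = Union[float, str]  # a number, or "NULL"/"UNDEF"


def to_float(state: State) -> Optional[float]:
    """Numeric value, or None for NULL/UNDEF instead of a failing cast."""
    return float(state) if isinstance(state, (int, float)) else None


def classify(prev_cold: str, prev_hot: str, outdoor: State, comfort_min: State,
             comfort_max: State, weather_ok: bool, hyst: float = 1.0) -> Tuple[str, str]:
    """Return the new (cold, hot) states as "ON", "OFF" or "UNDEF"."""
    t = to_float(outdoor)
    if t is None or not weather_ok:
        return "UNDEF", "UNDEF"
    # Uninitialized outputs start from OFF, as in the rule.
    cold = prev_cold if prev_cold in ("ON", "OFF") else "OFF"
    hot = prev_hot if prev_hot in ("ON", "OFF") else "OFF"

    low = to_float(comfort_min)
    if low is None:
        cold = "OFF"
    elif t <= low - hyst:
        cold = "ON"
    elif t >= low + hyst:
        cold = "OFF"
    # else: inside the band around the minimum, keep the previous value

    high = to_float(comfort_max)
    if high is None:
        hot = "OFF"
    elif t >= high + hyst:
        hot = "OFF" if cold == "ON" else "ON"  # never report cold and hot together
    elif t <= high - hyst:
        hot = "OFF"
    # else: inside the band around the maximum, keep the previous value
    return cold, hot
```

The band of width 2 × hyst around each comfort limit is what keeps the outputs from flapping when the temperature hovers near a limit.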