Rules stop running in openHAB 2.5.1

Hi All.

I have ONE rule in openHAB and it stops running after a while. There are NO OTHER RULES. No events log entries for the issue, no openhab log entries for the issue. I have read through Rich’s posts on “Why do rules stop” and applied as best I can as follows;

I have defined two basic items as follows;

Switch motionControl “”
Switch emailControl “”

I have defined the rule as follows;

var Timer clearEmail = null
var Timer motionDetect = null

rule “controlRules” when

    System started


    then

    if(motionDetect === null){
    motionDetect = createTimer(now, [|
    if(motionControl.state == null || motionControl.state == OFF){sendCommand(motionControl, ON)}
    else {if(motionControl.state !== OFF){sendCommand(motionControl, OFF)}}
    motionDetect.reschedule(now.plusSeconds(10))
    ])}


    if(clearEmail === null){
    clearEmail = createTimer(now, [|
    if(emailControl.state == null || emailControl.state == OFF){sendCommand(emailControl, ON)}
    else {if(emailControl.state !== OFF){sendCommand(emailControl, OFF)}}
    clearEmail.reschedule(now.plusMinutes(5))
    ])}

The idea is that these two items are “flip, flopped” every 10 seconds and 5 minutes respectively as watchdog controls.

The rule runs with Qaurtz Threads set to 100, Quartz thread priority set to 6, and Rule Engine threads set to 100. The Virtual machine is a Ubuntu 19.10, 4Gb RAM and 16GB storage, with 50% free Disk space.
CPU’s run on average 20% utilization and RAM around 940MB of the 4096MB.

The logs for event and openhab show the execution running smoothly for around 6 hours and then everything stops with no errors in the logs. There are no backup or other processes running in this time frame or consecutive test time frames. The six hours to 8 hours seems to be constant.

If I adjust the rules threads, or quartz threads, or both, to lesser then the corresponding time between stoppages also reduces linearly, suggesting that there is thread dependancies or thread ageing is a possible issue?

According to the posts this is textbook and I don’t understand why this rule “only one and one in isolation” should be problematic?

Any help would be greatly appreciated.

Hi,

could you try and add some log output on when the rule is fired and when the timer is rescheduled?
I have seen some issues where System started is triggered not only on system start, but when e.g. z-wave things are reconfigured.

Other than that, I would maybe recommend having a look at expire items, and at triggering your motion detection with something else than system started… (e.g. motionDetect received update)

All the best,

Bob

Hi Bob.

System start works fine, the rule runs fine. The problem is it just stops. Multiple System starts should not affect it as init is null value.

/var/log/openhab2/events.log:

You have new mail.
Last login: Sun Feb 23 09:32:43 2020 from 192.168.2.170
root@ubuntu:~# systemctl restart openhab2
root@ubuntu:~# tail -f /var/log/openhab2/events.log
2020-02-23 09:43:33.123 [ome.event.ItemCommandEvent] - Item ‘motionControl’ received command ON
2020-02-23 09:43:33.125 [vent.ItemStateChangedEvent] - motionControl changed from OFF to ON
2020-02-23 09:43:43.126 [ome.event.ItemCommandEvent] - Item ‘motionControl’ received command OFF
2020-02-23 09:43:43.127 [vent.ItemStateChangedEvent] - motionControl changed from ON to OFF
2020-02-23 09:43:53.126 [ome.event.ItemCommandEvent] - Item ‘motionControl’ received command ON
2020-02-23 09:43:53.127 [vent.ItemStateChangedEvent] - motionControl changed from OFF to ON
2020-02-23 09:44:03.133 [ome.event.ItemCommandEvent] - Item ‘motionControl’ received command OFF
2020-02-23 09:44:03.134 [vent.ItemStateChangedEvent] - motionControl changed from ON to OFF
2020-02-23 09:44:13.134 [ome.event.ItemCommandEvent] - Item ‘motionControl’ received command ON
2020-02-23 09:44:13.139 [vent.ItemStateChangedEvent] - motionControl changed from OFF to ON
2020-02-23 09:44:23.140 [ome.event.ItemCommandEvent] - Item ‘motionControl’ received command OFF
2020-02-23 09:44:23.143 [vent.ItemStateChangedEvent] - motionControl changed from ON to OFF
2020-02-23 09:44:33.140 [ome.event.ItemCommandEvent] - Item ‘motionControl’ received command ON
2020-02-23 09:44:33.152 [vent.ItemStateChangedEvent] - motionControl changed from OFF to ON
2020-02-23 09:44:43.142 [ome.event.ItemCommandEvent] - Item ‘motionControl’ received command OFF
2020-02-23 09:44:43.143 [vent.ItemStateChangedEvent] - motionControl changed from ON to OFF

/var/log/openhab2/openhab.log

2020-02-23 09:43:01.807 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Time zone set to ‘#####’.
2020-02-23 09:43:01.817 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Locale set to ‘####’.
2020-02-23 09:43:06.413 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model ‘VideoMonitor.items’
2020-02-23 09:43:07.040 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model ‘VideoMonitor.sitemap’
2020-02-23 09:43:08.683 [INFO ] [thome.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
2020-02-23 09:43:13.362 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model ‘control.rules’
2020-02-23 09:43:14.099 [INFO ] [.dashboard.internal.DashboardService] - Started Dashboard at http://xxx.xxx.xxx.xxx:8080
2020-02-23 09:43:14.100 [INFO ] [.dashboard.internal.DashboardService] - Started Dashboard at https://xxx.xxx.xxx.xxx:8443
2020-02-23 09:43:14.146 [INFO ] [penhab.io.transport.mqtt.MqttService] - MQTT Service initialization completed.
2020-02-23 09:43:14.147 [INFO ] [t.mqtt.internal.MqttBrokerConnection] - Starting MQTT broker connection ‘localhost’
2020-02-23 09:43:14.394 [INFO ] [t.mqtt.internal.MqttBrokerConnection] - Starting MQTT broker connection ‘mqtt’
2020-02-23 09:43:14.692 [WARN ] [mqtt.internal.MqttPersistenceService] - mqtt-persistence:broker
2020-02-23 09:43:14.693 [WARN ] [mqtt.internal.MqttPersistenceService] - mqtt-persistence:topic
2020-02-23 09:43:14.694 [WARN ] [mqtt.internal.MqttPersistenceService] - mqtt-persistence:message
2020-02-23 09:43:15.073 [INFO ] [ui.habmin.internal.servlet.HABminApp] - Started HABmin servlet at /habmin
2020-02-23 09:43:15.179 [INFO ] [panel.internal.HABPanelDashboardTile] - Started HABPanel at /habpanel
2020-02-23 09:43:15.290 [INFO ] [openhab.ui.paper.internal.PaperUIApp] - Started Paper UI at /paperui

/etc/logrotate.conf;

hourly
su root adm
rotate 1
create
dateext
compress
include /etc/logrotate.d

When these timers stop, what is the condition of the rest of the system? UI still responsive? can you update your test Items from REST API ?

Just an aside really, that can never be true. Maybe meant
motionControl.state == NULL

I agree. Would you complete the experiment, and set the 10-second timer to say 5-seconds? If altering the thread pool alters the run duration, then so should altering the rate of whatever is going wrong, if timer connected.

I would not start on System startup, and wouldn’t check for == null.
This can be different when you use mapdb persistence for example.
And if the timer expires, you cannot .reschedule() it any more.
Think of a completely different approach using cron or the like.
Go study the Design Patterns in the Solution category.

An executing code block of a timer can reschedule itself.
Now, maybe there is some problem with that we’re looking for, but this an often used technique.

@rossko, (thanks for the NULL tip, you are correct, a Item state cannot be null, but NULL, I have used a sendCommand in the init to force default boot state to OFF for the Item, and removed the “NULL” piece of code).,

The experiment worked fine, as you suspected, this due to the fact that I forgot to disable a older rule, that uses several executeCommand’s (each as a timer, wrapped in a main timer code loop) with return and timeout set to 1 second apiece.(there are currently 5 of them = 5 seconds total + whatever processing time with no reschedule on each execCommand timer). The main timer containing the exeCommand timers then loops every 10 seconds. various techniques of triggering applied varying from crom, system start as init.

Basically the executeCommand’s grep’s a text file for various key words (one key word per executeCommand) and returns the first matching value (-m 1) of each unique keyword, assigns it to a variable string, which the code in each executeCommand timers code loop (within hours of a day schedule) then uses to sendCommand in order to change a Item.state to ON.

Then there is a timer rule that then turns the Item off.
The timer rule works fine as when the Item received a ON command it uses two other Item.states to prevent multi firing of the timer rule. (The “autoTimer” rule is not included here, just the “motionDetect” ruke )

The issue appears to be the same as in the post “Why do my rules stop running” basically the quartz thread pool where available threads are used up by the odd executeCommand that fails to return a value in time or a failure of the system to process said backlog.

Having read all the posts it appears that (especially Rich) tries every way possible to address avoiding the need to run code in this fashion ranging from reducing rule execution time to alternative methods such as having the executeCommand run the rule with no return that fires a script that then returns the value to openHAB via LSB, MQTT or REST.

There is no real answer for resolving this relating to bug fixes or better error handling from openHAB’s side, however older versions of openHAB (2.1) appear to have had better error handling and been more resilient.

The ultimate question I have is what are the true limmitations in openHAB relating to executeCommand, Quartz and thresholds?

below I have attached the code for reference.

var Timer timer1 = null
var Timer timer2 = null
var Timer timer3 = null
var Timer timer4 = null
var Timer timer5 = null
var Timer motionDetect = null

rule “motionDetect” when

    System started // Use Startup or
    //Time cron "0/10 * * * * ?" //Every 10 seconds using cron

    then

    if(motionDetect === null){
    motionDetect = createTimer(now, [|

    if(timer1 === null){
    timer1 = createTimer(now, [ |var readValue_Front_Garages =  executeCommandLine(" sudo /bin/grep -m 1 \"Front_Garages\" /path/text.file",1000)
    if (readValue_Front_Garages == ''){if(Front_Garages.state !== OFF){sendCommand(Front_Garages, OFF)}} else {if(Front_Garages.state !== ON){sendCommand(Front_Garages, ON)>
    var Number hour = now.getHourOfDay
    if(readValue_Front_Garages == ''){} else { if(hour >= 19 || hour <= 05){if(Driveway_Floodlight.state !== ON){sendCommand(Driveway_Floodlight, ON)}}}
            ])} else {if(timer1 !== null){timer1.cancel timer1 = null}}

    if(timer2 === null){
    timer2 = createTimer(now, [ |var readValue_Front_House =  executeCommandLine(" sudo /bin/grep -m 1 \"Front_House\" /path/text.file",1000)
    if (readValue_Front_House == ''){if(Front_House.state !== OFF){sendCommand(Front_House, OFF)}} else {if(Front_House.state !== ON){sendCommand(Front_House, ON)}}
    var Number hour = now.getHourOfDay
    if(readValue_Front_House == ''){} else { if(hour >= 19 || hour <= 05){if(Driveway_Floodlight.state !== ON){sendCommand(Driveway_Floodlight, ON)}}}
            ])} else {if(timer2 !== null){timer2.cancel timer2 = null}}

    if(timer3 === null){
    timer3 = createTimer(now, [ |var readValue_Patio_Courtyard =  executeCommandLine(" sudo /bin/grep -m 1 \"Patio_Courtyard\" /path/text.file",1000)
    if (readValue_Patio_Courtyard == ''){if(Patio_Courtyard.state !== OFF){sendCommand(Patio_Courtyard, OFF)}} else {if(Patio_Courtyard.state !== ON){sendCommand(Patio_Cour>
    var Number hour = now.getHourOfDay
    if(readValue_Patio_Courtyard == ''){} else { if(hour >= 19 || hour <= 05){if(Patio_Floodlight.state !== ON){sendCommand(Patio_Floodlight, ON)}}}
            ])} else {if(timer3 !== null){timer3.cancel timer3 = null}}

    if(timer4 === null){
    timer4 = createTimer(now, [ |var readValue_Rear_House =  executeCommandLine(" sudo /bin/grep -m 1 \"Rear_House\" /path/text.file",1000)
    if (readValue_Rear_House == ''){if(Rear_House.state !== OFF){sendCommand(Rear_House, OFF)}} else {if(Rear_House.state !== ON){sendCommand(Rear_House, ON)}}
            ])} else {if(timer4 !== null){timer4.cancel timer4 = null}}

    if(timer5 === null){
    timer5 = createTimer(now, [ |var readValue_Side_House =  executeCommandLine(" sudo /bin/grep -m 1 \"Side_House\" /path/text.file",1000)
    if (readValue_Side_House == ''){if(Side_House.state !== OFF){sendCommand(Side_House, OFF)}} else {if(Side_House.state !== ON){sendCommand(Side_House, ON)}}
    var Number hour = now.getHourOfDay
    if(readValue_Side_House == ''){} else { if(hour >= 19 || hour <= 05){if(Patio_Floodlight.state !== ON){sendCommand(Patio_Floodlight, ON)}}}
           ])} else {if(timer5 !== null){timer5.cancel timer5 = null}}

    motionDetect.reschedule(now.plusSeconds(10)) //Reschedule when System start is used, this line disabled when cron is used.
    ])} else {if(motionDetect !== null){motionDetect.cancel motionDetect = null}}

    end

Okay so it turns out not to be quite like that, you’ve got complex script calls going on.

This is all a bit stinky. I say that because at heart openHAB is an event driven system - stuff happens in the world, triggers consequences - not clock driven.
Can you outline what this arrangement is for? The scripts are polling devices, or it’s a random simulation, or?

This sounds like something the LogReader Binding can do perhaps. Or, as you hint at, I’d probably write a simple cron script to push these values to OH using the REST API rather than have OH Rules sit there and load them every ten seconds. There are just some things that are better done outside of OH.

If you are running out of threads than it is likely the case that the timing just worked out in older versions of OH and because of various changes are no longer working out. This part is actually better in 2.5 now as the thread pool for Timers has increased from 2 to 10.

However, overall based on your description, I think the whole approach is a little suspect. It sounds way more complicated than necessary and it is apparently brittle.

Here are some simplification/improvements. These are not comprehensive nor are they all applicable at the same time.

  1. Rewrite the Rules using Scripted Automation, the limited thread pool issue doesn’t exist with Scripted Automation.

  2. See if the LogReader binding can read the lines from these files and issue the command.

  3. Use the Exec binding instead of executeCommandLine to execute the command every 10 seconds trigger the Rule on changes to an Item linked to the out Channel. Bindings don’t consume a Rule or Timer thread to execute the command every ten seconds.

  4. Use meaningful names for all Items and variables (e.g. frontGaragesTimer instead of timer1).

  5. There is a lot of duplicated code here it can be made much more generic.

  6. Only use === or !== for null.

  7. Always use the function on the Item instead of the action for sendCommand and postUpdate where possible.

  8. Use new lines and indentation. You’ve got two/three? layers of nested if statements all on one line. It’s all but impossible to figure out what goes with what. That’s what indentation is for.

  9. You have apparent and significant syntax errors. Does this Rule even parse? Maybe it’s truncating the really long lines?

  10. Consider something like the Time of Day Design Pattern to centralize the calculation of time periods.

So, for example, if we used the Exec binding configured to execute every 10 seconds instead of executeCommandLine and Timers the Rule would become:

import org.eclipse.smarthome.model.script.ScriptServiceUtil

rule "Front Garages"
when
    Item FrontGarages_Out received update or
    Item FrontHouse_Out received update or
    Item Patio_Courtyard_Out received update or
    Item Rear_House_Out received update or
    Item Side_House_Out received update
then
    val device = ScriptServiceUtil.getItemRegistry.getItem(triggeringItem.name.replace("_Out", ""))

    // No output, turn off the device
    if(triggeringItem.state.toString == '' && device.state != OFF) device.sendCommand(OFF)

    // Output, turn on the device and the floodlight if it's night
    else {
        if(device.state != ON) device.sendCommand(ON)

        // Get the secondary device
        var SwitchItem floodlight = null
        if(triggeringItem.name.contains("Front")) floodlight = Driveway_Floodlight
        else if(triggeringItem.name.contains("Patio") || triggeringItem.name.contains("Side")) floodlight = Patio_Floodlight

        // Turn on device2 if it's night time
        if(device2 !== null &&vTimeOfDay.state == "NIGHT" && device2.state != ON) device2.sendCommand(ON)
    }
end

You could add all of the Out Items above to a Group and use Member of Out received update instead of listing each Item individually.

The above code is a little easier to follow, much shorter, and avoids all duplicate code. Furthermore, it doesn’t use any Timer threads and it doesn’t consume a Rule thread for more than a few hundred milliseconds.

Now, I did just type this in, there is likely to be errors.

1 Like

Thanks Rich, I am sorted. I have built a bash script running from OS cron to push the events via MQTT to openHAB. I just found that Python classes were a little too long winded for my taste.

This topic was automatically closed 41 days after the last reply. New replies are no longer allowed.