Rules stop executing after a while

i have already crons in crontab on my pc and they works fine.

i will open an issue on github

Here’s the issue: Rules stops executing after a while #660

I’m adding to this thread here, because I have been seeing similar problems over the last few weeks. I’m running OH 2.2 release, and after a few days rules simply stop triggering. My cron rules still work, and events are coming in, but all rules that are being triggered by events stop.

No error, or anything. It mostly happens at about 3:30 am (don’t know why), but I have had it happen at other times.

I’m running on a quad core proliant gen 8 server, with thread pool set to 50.

I can’t pinpoint any one cause - but I have set up a cron rule, that detects when rules stops triggering, and logs it (it only runs every 5 minutes though, so it takes a little while to detect that rules have stopped).

Here is my rules log output when this happens:

14-May-2018 03:34:14.568 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Parse Json Data from MCT 111BC Module'
14-May-2018 03:34:14.569 [INFO ] [pse.smarthome.model.script.JSON Data] - Received from module: MCThings111BC : {"time":"1526268850473","Uptime":13251347,"UptimeString":"153:8:55:47","BatteryVoltage":3596,"Temperature":24.250000,"LowBattery":false}
14-May-2018 03:34:14.570 [INFO ] [pse.smarthome.model.script.JSON Data] - item: 0 : "time"->"1526268850473"
14-May-2018 03:34:14.570 [INFO ] [pse.smarthome.model.script.JSON Data] - item: 1 : "Uptime"->13251347
14-May-2018 03:34:14.570 [INFO ] [pse.smarthome.model.script.JSON Data] - item: 2 : "UptimeString"->"153:8:55:47"
14-May-2018 03:34:14.571 [INFO ] [pse.smarthome.model.script.JSON Data] - item: 3 : "BatteryVoltage"->3596
14-May-2018 03:34:14.571 [INFO ] [pse.smarthome.model.script.JSON Data] - item: 4 : "Temperature"->24.250000
14-May-2018 03:34:14.571 [INFO ] [pse.smarthome.model.script.JSON Data] - item: 5 : "LowBattery"->false
14-May-2018 03:34:14.572 [INFO ] [pse.smarthome.model.script.JSON Data] - Converted to JSON List: ["time":"1526268850473", "Uptime":13251347, "UptimeString":"153:8:55:47", "BatteryVoltage":3596, "Temperature":24.250000, "LowBattery":false] of size: 6
14-May-2018 03:34:14.572 [INFO ] [pse.smarthome.model.script.JSON Data] - Converted to HashMAP: {Temperature=24.250000, time=1526268850473, UptimeString=153:8:55:47, Uptime=13251347, LowBattery=false, BatteryVoltage=3596}
14-May-2018 03:34:14.572 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 24.250000 to: MCThings111BCTemperature
14-May-2018 03:34:14.573 [INFO ] [pse.smarthome.model.script.JSON Data] - Received time_string: 1526268850473
14-May-2018 03:34:14.573 [INFO ] [pse.smarthome.model.script.JSON Data] - Received time, milliseconds: 1526268850473
14-May-2018 03:34:14.573 [INFO ] [pse.smarthome.model.script.JSON Data] - posting json timestamp: 2018-05-13T23:34:10.473-04:00 to: MCThings111BCJsonTimestamp
14-May-2018 03:34:14.575 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 153:8:55:47 to: MCThings111BCUptimeString
14-May-2018 03:34:14.575 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 13251347 to: MCThings111BCUptime
14-May-2018 03:34:14.576 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: CLOSED to: MCThings111BCLowBattery
14-May-2018 03:34:14.576 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 3596 to: MCThings111BCBatteryVoltage
14-May-2018 03:34:14.577 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'MCThings 111BC Basement Entry Battery Monitor Temperature'
14-May-2018 03:34:14.577 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'MCThings 111BC Basement Entry Battery Monitor Last Updated'
14-May-2018 03:34:14.578 [INFO ] [e.smarthome.model.script.Last Update] - MCThings111BCLastUpdate DateTimeItem updated at: Mon 03:34:14
14-May-2018 03:34:14.578 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'MCThings 111BC Basement Entry Battery Monitor Status'
14-May-2018 03:34:14.578 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'MCThings 111BC Basement Entry Battery Monitor Status'
14-May-2018 03:34:14.579 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Battery Monitor (111BC) Status'
14-May-2018 03:34:15.615 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Monitor LED Current'
14-May-2018 03:34:19.299 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Compute garden humidex'
14-May-2018 03:34:19.301 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Compute garden humidex'
14-May-2018 03:34:20.724 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Glass Break Received Update'
14-May-2018 03:34:20.725 [INFO ] [e.smarthome.model.script.Last Update] - GlassBreakDetectorLastUpdate DateTimeItem updated at: Mon 03:34:20
14-May-2018 03:34:20.725 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Glass Break Detected'
14-May-2018 03:34:20.725 [INFO ] [e.smarthome.model.script.Glass Break] - Glass break Detector changed to: NULL
14-May-2018 03:34:20.725 [INFO ] [e.smarthome.model.script.Glass Break] - Glass break Sensor changed to: NULL
14-May-2018 03:34:20.725 [INFO ] [e.smarthome.model.script.Glass Break] - Glass break Alarm General changed to: OFF
14-May-2018 03:34:20.836 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Glass Break Received Update'
14-May-2018 03:34:20.837 [INFO ] [e.smarthome.model.script.Last Update] - GlassBreakDetectorLastUpdate DateTimeItem updated at: Mon 03:34:20
14-May-2018 03:34:21.454 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Update In Use Light Sensor Back Garden'
14-May-2018 03:34:25.512 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Monitor LED Current'
14-May-2018 03:34:26.117 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'MCThings 111BC Basement Entry Battery Monitor Status'
14-May-2018 03:34:26.118 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'MCThings 111BC Basement Entry Battery Monitor Status'
14-May-2018 03:34:28.146 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Real Item Changed'
14-May-2018 03:35:00.000 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'Reset homekit lock'
14-May-2018 03:35:00.000 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'Test Rules operation'
14-May-2018 03:35:00.001 [INFO ] [smarthome.model.script.Testing Rules] - rules_operating status: True
14-May-2018 03:35:00.001 [ERROR] [ntime.internal.engine.ExecuteRuleJob] - Error during the execution of rule 'Reset homekit lock': null
14-May-2018 03:36:00.001 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'Test Rules operation'
14-May-2018 03:36:00.002 [INFO ] [smarthome.model.script.Testing Rules] - rules_operating status: False
14-May-2018 03:36:00.002 [INFO ] [eclipse.smarthome.model.script.Rules] - Rules have stopped triggering
14-May-2018 03:36:00.176 [DEBUG] [model.script.actions.ScriptExecution] - Scheduled code for execution at 2018-05-14T15:36:00.176-04:00
14-May-2018 03:37:00.001 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'Test Rules operation'
14-May-2018 03:37:00.002 [INFO ] [smarthome.model.script.Testing Rules] - rules_operating status: False

This is the full openhab.log for that time:

2018-05-14 03:34:28.080 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored 'dining_room_humidity' with state '35' in rrd4j database
2018-05-14 03:34:28.099 [DEBUG] [nternal.rest.NestStreamingRestClient] - Received 'put' event, data: {"path":"/","data":{"devices":{"thermostats":{"1jI-LvQvFyOH19V0pvhvs7iEqXFuKc3X":{"humidity":35,"locale":"en-GB","temperature_scale":"C","is_using_emergency_heat":false,"has_fan":true,"software_version":"5.8-8","has_leaf":false,"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0_n4GuwLjLAw","device_id":"1jI-LvQvFyOH19V0pvhvs7iEqXFuKc3X","name":"Dining Room","can_heat":true,"can_cool":true,"target_temperature_c":23.0,"target_temperature_f":73,"target_temperature_high_c":24.0,"target_temperature_high_f":75,"target_temperature_low_c":22.0,"target_temperature_low_f":72,"ambient_temperature_c":22.0,"ambient_temperature_f":72,"away_temperature_high_c":28.0,"away_temperature_high_f":83,"away_temperature_low_c":18.0,"away_temperature_low_f":65,"eco_temperature_high_c":28.0,"eco_temperature_high_f":83,"eco_temperature_low_c":18.0,"eco_temperature_low_f":65,"is_locked":false,"locked_temp_min_c":20.0,"locked_temp_min_f":68,"locked_temp_max_c":22.0,"locked_temp_max_f":72,"sunlight_correction_active":false,"sunlight_correction_enabled":true,"structure_id":"9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA","fan_timer_active":true,"fan_timer_timeout":"1970-01-01T00:00:00.000Z","fan_timer_duration":15,"previous_hvac_mode":"","hvac_mode":"heat-cool","time_to_target":"~0","time_to_target_training":"ready","where_name":"Dining Room","label":"","name_long":"Dining Room Thermostat","is_online":true,"last_connection":"2018-05-14T07:34:24.620Z","hvac_state":"off"}},"smoke_co_alarms":{"8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X":{"locale":"en-US","structure_id":"9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA","software_version":"3.1.4rc3","where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0odBTxTsbLKg","device_id":"8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X","where_name":"Upstairs","name":"Upstairs","name_long":"Upstairs Nest Protect","is_online":true,"last_connection":"2018-05-14T07:12:50.007Z","battery_health":"ok","co_alarm_state":"ok","smoke_alarm_state":"ok","ui_color_state":"green","is_manual_test_active":false,"last_manual_test_time":"2018-04-11T20:57:38.000Z"},"8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X":{"name":"Basement","locale":"en-US","structure_id":"9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA","software_version":"3.1.4rc3","where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3FLJ8DEciPxQ","device_id":"8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X","where_name":"Basement","name_long":"Basement Nest Protect","is_online":true,"last_connection":"2018-05-14T07:28:03.901Z","battery_health":"ok","co_alarm_state":"ok","smoke_alarm_state":"ok","ui_color_state":"green","is_manual_test_active":false,"last_manual_test_time":"2018-04-11T20:58:08.000Z"},"8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X":{"name":"Master Bedroom","locale":"en-US","structure_id":"9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA","software_version":"3.1.4rc3","where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0zdeHTwYOSzQ","device_id":"8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X","where_name":"Master Bedroom","name_long":"Master Bedroom Nest Protect","is_online":true,"last_connection":"2018-05-13T09:54:50.262Z","battery_health":"ok","co_alarm_state":"ok","smoke_alarm_state":"ok","ui_color_state":"green","is_manual_test_active":false,"last_manual_test_time":"2018-04-11T20:58:08.000Z"},"8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X":{"name":"Hallway","locale":"en-US","structure_id":"9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA","software_version":"3.1.4rc3","where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3Qaq6Lx4pdGw","device_id":"8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X","where_name":"Hallway","name_long":"Hallway Nest Protect","is_online":true,"last_connection":"2018-05-14T07:27:49.094Z","battery_health":"ok","co_alarm_state":"ok","smoke_alarm_state":"ok","ui_color_state":"green","is_manual_test_active":false,"last_manual_test_time":"2018-04-11T20:58:08.000Z"}}},"structures":{"9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA":{"smoke_co_alarms":["8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X","8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X","8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X","8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X"],"name":"Home","country_code":"CA","time_zone":"America/Toronto","away":"home","thermostats":["1jI-LvQvFyOH19V0pvhvs7iEqXFuKc3X"],"structure_id":"9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA","co_alarm_state":"ok","smoke_alarm_state":"ok","wheres":{"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0mM4BlNe1TRQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0mM4BlNe1TRQ","name":"Attic"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3ccPkgs1sspw":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3ccPkgs1sspw","name":"Back Door"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3onmDw4TT6ZQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3onmDw4TT6ZQ","name":"Backyard"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3FLJ8DEciPxQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3FLJ8DEciPxQ","name":"Basement"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3o5HrFfpd2Nw":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3o5HrFfpd2Nw","name":"Bathroom"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2QL5v41HS1fQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2QL5v41HS1fQ","name":"Bedroom"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT1dV32SltYJWQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT1dV32SltYJWQ","name":"Deck"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2n5_KbWdE6rg":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2n5_KbWdE6rg","name":"Den"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0_n4GuwLjLAw":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0_n4GuwLjLAw","name":"Dining Room"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT1oVWjd8Ex4dg":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT1oVWjd8Ex4dg","name":"Downstairs"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2hru6sFEYjUw":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2hru6sFEYjUw","name":"Driveway"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0xRpYZ4QMpIg":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0xRpYZ4QMpIg","name":"Entryway"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2CBVUk5u3D7g":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2CBVUk5u3D7g","name":"Family Room"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT05MeEeCndvkg":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT05MeEeCndvkg","name":"Front Door"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3h3SOh2QjejQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3h3SOh2QjejQ","name":"Front Yard"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2a4T1M8ouMXg":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2a4T1M8ouMXg","name":"Garage"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3Wgt1j-K8PiQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3Wgt1j-K8PiQ","name":"Guest House"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2V0p3nUzlJVA":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2V0p3nUzlJVA","name":"Guest Room"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3Qaq6Lx4pdGw":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3Qaq6Lx4pdGw","name":"Hallway"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0dH2sJBC_QjQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0dH2sJBC_QjQ","name":"Kids Room"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2AQ3L4Dq52Fw":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2AQ3L4Dq52Fw","name":"Kitchen"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3fRrIw-7lgrQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3fRrIw-7lgrQ","name":"Living Room"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0zdeHTwYOSzQ":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0zdeHTwYOSzQ","name":"Master Bedroom"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3Xyjrff4wqdA":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3Xyjrff4wqdA","name":"Office"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0SWS-x8CrA7A":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0SWS-x8CrA7A","name":"Outside"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3-utsyfoWVdA":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT3-utsyfoWVdA","name":"Patio"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0ujcN9WMedww":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0ujcN9WMedww","name":"Shed"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2A4vW6hL16qw":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT2A4vW6hL16qw","name":"Side Door"},"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0odBTxTsbLKg":{"where_id":"S0hMrb9lbmOOY87fdBLOoB-tZaXq9zh64cUs0gTUgT0odBTxTsbLKg","name":"Upstairs"}}}},"metadata":{"access_token":"c.yK9vahY7O6Lq4QMBKyHMJRLFyOcbbY0fOapNCqQT0r73UJIXUPeJYnIqvmHoYnZUYn5g0JTRT2PCHrlUt9yF1f2ygfxkul48ikxSQSp2Ku36yagfdzKJAGwo4jh7TZPC5No6xFj4ZMGHc8Yd","client_version":1,"user_id":"z.1.1.lwQVaC7IEn51rvBZPQrH59GldiS3necyiEwI3VKSbuw="}}}
2018-05-14 03:34:28.099 [DEBUG] [nternal.rest.NestStreamingRestClient] - Data has changed (or initial data sent)
2018-05-14 03:34:28.100 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:thermostat:2e8429ee:1jI-LvQvFyOH19V0pvhvs7iEqXFuKc3X, properties={firmwareVersion=5.8-8, deviceId=1jI-LvQvFyOH19V0pvhvs7iEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Dining Room Thermostat, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268099]
2018-05-14 03:34:28.100 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:thermostat:2e8429ee:1jI-LvQvFyOH19V0pvhvs7iEqXFuKc3X, properties={firmwareVersion=5.8-8, deviceId=1jI-LvQvFyOH19V0pvhvs7iEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Dining Room Thermostat, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268099]
2018-05-14 03:34:28.100 [DEBUG] [ernal.discovery.NestDiscoveryService] - thingDiscovered called for thermostat
2018-05-14 03:34:28.100 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:smoke_detector:2e8429ee:8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X, properties={firmwareVersion=3.1.4rc3, deviceId=8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Upstairs Nest Protect, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268100]
2018-05-14 03:34:28.100 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:smoke_detector:2e8429ee:8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X, properties={firmwareVersion=3.1.4rc3, deviceId=8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Upstairs Nest Protect, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268100]
2018-05-14 03:34:28.100 [DEBUG] [ernal.discovery.NestDiscoveryService] - thingDiscovered called for smoke detector
2018-05-14 03:34:28.100 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:smoke_detector:2e8429ee:8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X, properties={firmwareVersion=3.1.4rc3, deviceId=8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Basement Nest Protect, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268100]
2018-05-14 03:34:28.100 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:smoke_detector:2e8429ee:8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X, properties={firmwareVersion=3.1.4rc3, deviceId=8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Basement Nest Protect, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268100]
2018-05-14 03:34:28.100 [DEBUG] [ernal.discovery.NestDiscoveryService] - thingDiscovered called for smoke detector
2018-05-14 03:34:28.100 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:smoke_detector:2e8429ee:8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X, properties={firmwareVersion=3.1.4rc3, deviceId=8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Master Bedroom Nest Protect, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268100]
2018-05-14 03:34:28.100 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:smoke_detector:2e8429ee:8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X, properties={firmwareVersion=3.1.4rc3, deviceId=8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Master Bedroom Nest Protect, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268100]
2018-05-14 03:34:28.100 [DEBUG] [ernal.discovery.NestDiscoveryService] - thingDiscovered called for smoke detector
2018-05-14 03:34:28.101 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:smoke_detector:2e8429ee:8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X, properties={firmwareVersion=3.1.4rc3, deviceId=8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Hallway Nest Protect, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268100]
2018-05-14 03:34:28.101 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:smoke_detector:2e8429ee:8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X, properties={firmwareVersion=3.1.4rc3, deviceId=8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X}, representationProperty=deviceId, flag=NEW, label=Hallway Nest Protect, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268100]
2018-05-14 03:34:28.101 [DEBUG] [ernal.discovery.NestDiscoveryService] - thingDiscovered called for smoke detector
2018-05-14 03:34:28.101 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X is not handling update for 8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X
2018-05-14 03:34:28.101 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X is not handling update for 8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X
2018-05-14 03:34:28.101 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Updating smoke detector 8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X
2018-05-14 03:34:28.102 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X is not handling update for 8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X
2018-05-14 03:34:28.102 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X is not handling update for 8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X
2018-05-14 03:34:28.102 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X is not handling update for 8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X
2018-05-14 03:34:28.102 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X is not handling update for 8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X
2018-05-14 03:34:28.102 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Updating smoke detector 8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X
2018-05-14 03:34:28.103 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X is not handling update for 8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X
2018-05-14 03:34:28.103 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Updating smoke detector 8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X
2018-05-14 03:34:28.103 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X is not handling update for 8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X is not handling update for 8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Updating smoke detector 8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X is not handling update for 8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X is not handling update for 8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X is not handling update for 8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_C-7YHfxZDciriEqXFuKc3X is not handling update for 8DPl0FPVZ_D5iDYvVC6mobiEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_C-7YHfxZDciriEqXFuKc3X is not handling update for 8DPl0FPVZ_Bo3v0uQN7fDriEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_C-7YHfxZDciriEqXFuKc3X is not handling update for 8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_C-7YHfxZDciriEqXFuKc3X is not handling update for 8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X
2018-05-14 03:34:28.104 [DEBUG] [g.nest.handler.NestThermostatHandler] - Updating thermostat 1jI-LvQvFyOH19V0pvhvs7iEqXFuKc3X
2018-05-14 03:34:28.108 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored 'dining_room_ambient_temperature_c' with state '22.0' in rrd4j database
2018-05-14 03:34:28.108 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:structure:2e8429ee:9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA, properties={structureId=9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA}, representationProperty=structureId, flag=NEW, label=Home, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268107]
2018-05-14 03:34:28.108 [DEBUG] [g.samsungtv.handler.SamsungTvHandler] - thingDiscovered: DiscoveryResult [thingUID=nest:structure:2e8429ee:9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA, properties={structureId=9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA}, representationProperty=structureId, flag=NEW, label=Home, bridgeUID=nest:account:2e8429ee, ttl=-1, timestamp=1526283268107]
2018-05-14 03:34:28.108 [DEBUG] [ernal.discovery.NestDiscoveryService] - thingDiscovered called for structure
2018-05-14 03:34:28.108 [DEBUG] [ng.nest.handler.NestStructureHandler] - Updating structure 9AY0QSwA9NI6NBmBKWwpk6v8xIwHbh7zqN5wtLN1Bu63Tk5oOFPDrA
2018-05-14 03:34:28.146 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Real Item Changed'
2018-05-14 03:34:28.199 [DEBUG] [nternal.rest.NestStreamingRestClient] - Received 'keep-alive' event, data: null
2018-05-14 03:34:28.199 [DEBUG] [nternal.rest.NestStreamingRestClient] - Received message to keep connection alive
2018-05-14 03:34:29.142 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored 'dining_room_ambient_temperature_c' with state '22.0' in rrd4j database
2018-05-14 03:34:29.179 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored 'dining_room_humidity' with state '35' in rrd4j database
2018-05-14 03:34:29.616 [DEBUG] [ab.binding.http.internal.HttpBinding] - item 'Driveway_camera_motiondetect_status' is about to be refreshed now
...
2018-05-14 03:35:00.000 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'Test Rules operation'
2018-05-14 03:35:00.000 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'Reset homekit lock'
2018-05-14 03:35:00.001 [INFO ] [smarthome.model.script.Testing Rules] - rules_operating status: True
2018-05-14 03:35:00.001 [ERROR] [ntime.internal.engine.ExecuteRuleJob] - Error during the execution of rule 'Reset homekit lock': null
2018-05-14 03:35:00.002 [DEBUG] [inding.expire.internal.ExpireBinding] - Item rules_operating received update 'False'; stopping any future expiration.
2018-05-14 03:35:04.465 [DEBUG] [nternal.rest.NestStreamingRestClient] - Check: Receiving streaming events, millisSinceLastEvent=6263
2018-05-14 03:35:08.189 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored 'basementLeakDetector' with state 'OPEN' in rrd4j database (again)
2018-05-14 03:35:08.190 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored 'basementLeakDetector' with state '1' in rrd4j database
2018-05-14 03:35:21.475 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job

ignore the ‘Reset homekit lock’ error, I just added that yesterday to try to figure out what is happening.

Nothing except cron rules are triggered after 2018-05-14 03:34:28.146 and the last rule triggered is almost always ‘Real Item Changed’ - but I have had this rule for years, and it has always worked just fine (it is being triggered by the nest thermostat update). I also don’t know the significance of the 3:30am time.

Is there any log which will tell me why rules just stop triggering? it seems silent. I’m also not sure how to start them again (I can detect this happening, and a restart of OH fixes it for a day or two, or three - but I don’t want to continually restart OH because of this).

I have 7000 lines of rules, so going through them is not really an option, Visual Studio says that the rules are OK, and this only just started happening in the last few weeks. I was suspecting the last java update, but that doesn’t explain the 3;30am (ish) problem.

Any suggestions on where to look to find out why the rules stop triggering?

It could be that your nest is restarting/booting at 3:30am and OH needs to reconnect and somehow it freezes the eventbus???
Have you tried to uninstall and re-install the nest binding?

Anything is possible. I did replace a nest smoke detector a few weeks ago - it’s possible there are some remnants of that still lying around, and the nest binding is the experimental 2.2 version…

I don’t know what:

2018-05-14 03:34:28.102 [DEBUG] [est.handler.NestSmokeDetectorHandler] - Smoke detector 8DPl0FPVZ_APWnZpNnOVQ7iEqXFuKc3X is not handling update for 8DPl0FPVZ_A8AHtgt3zcNLiEqXFuKc3X

means.

Remove the nest binding
Stop OH
Clear the cache
Start OH
Install Nest

Are the events still occuring? Look in events.log to see that your Items are still changing state and receiving the commands that would trigger your rules.

Do you have any Thread::sleeps, while(true) (or the like), or other long running commands like executeCommandLine or sendHttp*Request in your Rules?

Do you see any spike in memory or CPU usage when the Rules top working?

Yes the events are still occurring, I can see them being logged - just the rules stop triggering on them. They get recorded in the persistence files as well. Cron based rules still fire, and timers work, (just new ones do not get created because the rules that create them never fire).

I do have rules with Thread::sleeps, but they are all to do with remote control commands, none of them are automatic, and don’t seem to be anything to do with this (which only started a few weeks ago). There are a couple of executeCommandLine but they don’t seem to be associated with the issue, I have a few sendHttp requests, but they are all manually triggered (ie click a control, and it sends an http request to home a camera).

No spike in memory or CPU that I can see, but this is always at about 3:30am.

It’s frustrating, as if i could find what is causing it, I could fix it, but the rules just silently stop triggering. No messages, errors or anything, and I can’t find a way to restart rule triggers without stopping and restarting OH as a whole.

If the theory is that I’m running out of threads, or a rule is getting into a deadlock, how can I find the culprit? is there a way to tell how many rule threads are in use at any one time, so I could see when it starts to rise?

I changed the number of rule threads from 5 to 50, but it doesn’t really seem to make a difference.

Thanks,

Just to eliminate them as being a problem (this is the number 1 cause of this behavior) I’d add some logging at the ends of your Rules that have these commands, even the manually triggered ones, just to verify that these Rules are exiting and doing so in a timely manner.

I do think it is unlikely to be the cause but it is the usual cause and I’m hesitant to eliminate it as a possibility without some more confirmation.

There may be some tools (profilers) that can be hooked up to OH while it is running to see the thread counts and such but it has been so long since I’ve done anything like that I don’t know what modern Java programs use. There may be some commands that you can run in the Karaf console to get the thead counts.

But before spending too much time researching and testing on this, I would recommend backing up (always backup), and upgrading to 2.3 to verify that it is still a problem there. This may be something that has been previously identified and fixed. 2.3 is nearing release so it shouldn’t be too unstable. Note: it will require some changes to your Rules, mainly you need to put a space after all [ and before all ].

I have already added the spaces to my rules in anticipation of 2.3 - but I’m waiting for the release, it’ll probably take a few days to sort OH out after the update.

Regarding Thread::sleep, everyone seems to think this is a terrible thing - does it sleep all threads? or just the rule that’s running? How would I replace it? (I know some people say “use timers” but how?)

Here is an example rule where I use it:

rule "Skype State Machine"
when
    Item TV_Skype_State changed
then
    /* States for State machine
        0 - idle
        1 - just switch to Anynet
        2 - incoming Skype call, switch to Anynet and set up for receive
        3 - switch back to HDMI 1 (end of Skype Call?)
    */
    
    logInfo("Skype", "Skype state changed to: " + TV_Skype_State.state)
    // 0 is idle state - do nothing
    if(TV_Skype_State.state == 0) return;
    
    if(TVON.state != ON && TV_Power.state != ON) {
        logInfo("Skype", "But Samsung TV is OFF, so ignore")
        postUpdate(TV_Skype_State, "0")
        return;
    }
    if (TV_Skype_State.state == 1 || TV_Skype_State.state == 2) {
        logDebug("Skype", "Answering Skype Call")
        if (TV_Status.state != "ONLINE") {
            sendCommand(TVTools, "KEY_TOOLS")
            Thread::sleep(250)
            sendCommand(TVKey_Enter, "KEY_ENTER")
            Thread::sleep(250)        //might need this, might not...
            sendCommand(TVKey_DOWN, "KEY_DOWN")
            Thread::sleep(250)
            sendCommand(TVKey_Enter, "KEY_ENTER")
            Thread::sleep(250)
            sendCommand(TVKey_Enter, "KEY_ENTER")
        }
        else
            sendCommand(TVSource_Name, "HDMI-CEC")
        if(TV_Skype_State.state == 2) {
            //wait for source to switch to HDMI-CEC
            // use google home mini 1 to announe incoming call
            say("Incoming Skype Call, Incoming Skype Call")
            var int count = 0
            if (TV_Status.state != "ONLINE") count = 31
            while(TVSource_Name.state != "HDMI-CEC"  && count <= 30) {
                Thread::sleep(100)  //100 ms delay
                count = count + 1
            }
            logDebug("Skype", "Answer Exit Loop: TV Source is: " + TVSource_Name.state)
            Thread::sleep(2000)
            //select video answer button on Skype screen
            sendCommand(TVKey_DOWN, "KEY_DOWN") 
            Thread::sleep(250)
            sendCommand(TVKey_DOWN, "KEY_DOWN")
            Thread::sleep(250)
            sendCommand(TVKey_DOWN, "KEY_DOWN")
            Thread::sleep(250)
            sendCommand(TVKey_LEFT, "KEY_LEFT")
        }
    }
    else if (TV_Skype_State.state == 3){
         logDebug("Skype", "Hangup Skype Call")
         var int count2 = 0
         if (TV_Status.state != "ONLINE") {
             sendCommand(TVInput_HDMI, "KEY_HDMI")
             count2 = 31
         }
         sendCommand(TVSource_Name, "HDMI1/DVI")
         var prev_source = TVSource_Name.state
         //sendCommand(TVInput_HDMI, "KEY_HDMI")
         while(TVSource_Name.state != "HDMI1/DVI"  && count2 <= 30) {
            Thread::sleep(100)  //100 ms delay
            if (TVSource_Name.state != prev_source && TVSource_Name.state != "HDMI1/DVI") {
                prev_source = TVSource_Name.state
                logDebug("Skype", "Hangup: Sending HDMI command")
                sendCommand(TVInput_HDMI, "KEY_HDMI")
            }
            count2 = count2 + 1
         }
         //Thread::sleep(2750)
    	 //sendCommand(TVInput_HDMI, "KEY_HDMI")
         logDebug("Skype", "Hangup Exit Loop: TV Source is: " + TVSource_Name.state)
    }
    else postUpdate(TV_Skype_State, "0")
    logInfo("Skype", "Skype Rule Ended")
end

This rule reacts to an update from an android box, that there is an incoming Skype call, the rule then sends the remote control commands to put the TV on the correct input. The delays are due to the hardware response timing.

How would you do this without Thread::sleep?

Thanks,

In your case it is an appropriate use of sleep.

It only sleeps the one thread but unless you change the Karaf config you only get 5 meaning only 5 rules can run at the same time. When you have lots of Thread::sleeps or long Thread::sleeps you increase the chances that you run out of threads. When you run out of threads, no new rules can start executing until the ones that are blocked by sleeps or long running commands complete and the running rules start to execute. It really becomes a problem when you inadvertently create a feedback loop or other types of choke points.

Symptoms include noticeable latency between the time an event occurs and a rule runs, rules start to run erratically, events are processed out of order, or the rules simply stop executing.

Your pool is 50 so you have a lot larger of a buffer before problems might show up, but if you have a feedback loop or a whole lot of events you could run out even at 50.

I’m not convinced this is the problem, but I’d like to eliminate it.

It largely depends on why you are sleeping and surrounding context.

For example, you could replace “Answering Skype Call” with something like the following:

createTimer(now.plusMillis(1), [ |
  sendCommand(TVTools, "KEY_TOOLS")
  Thread::sleep(250)
  sendCommand(TVKey_Enter, "KEY_ENTER")
  Thread::sleep(250)        //might need this, might not...
  sendCommand(TVKey_DOWN, "KEY_DOWN")
  Thread::sleep(250)
  sendCommand(TVKey_Enter, "KEY_ENTER")
  Thread::sleep(250)
  sendCommand(TVKey_Enter, "KEY_ENTER")
])

This will move the sleeps to the Timer threads which have their own pool so you won’t be consuming a Rule thread and allow your Rule to exit immediately.

“Incoming Skype Call, Incoming Skype Call” gives me some concern. If I read this correctly, you can sleep up to six seconds. So, dealing with this section might require a bit more cleverness, though the same trick as above could work.

Personally, I like to have events occur and separate Rules trigger rather than while loops waiting for something to happen. At least you have a max wait in place. But you can shunt the while loop off to a Timer pretty easily with something like:

  say("Incoming Skype Call, Incoming Skype Call")
  val loopStart = now
  var answerCallTimer = createTimer(now.plusMillis(1), [|
    if(TVSource_Name.state != "HDMI-CEC" && now.minusSeconds(3).isBefore(loopStart) {
        answerCallTimer.reschedule(now.plusMillis(100)) // loop
    }
    else {
        logDebug("Skype", "Answer Exit Loop: TV Source is: " + TVSource_Name.state)
        Thread::sleep(2000)
        //select video answer button on Skype screen
        sendCommand(TVKey_DOWN, "KEY_DOWN") 
        Thread::sleep(250)
        sendCommand(TVKey_DOWN, "KEY_DOWN")
        Thread::sleep(250)
        sendCommand(TVKey_DOWN, "KEY_DOWN")
        Thread::sleep(250)
        sendCommand(TVKey_LEFT, "KEY_LEFT")
    }

There are other approaches that could help as well but I don’t think we need to go down that path.

One approach that doesn’t really alleviate the sleeps but could make the code a little less prone for unexpected interactions (e.g. you have other Rules that need to send the command to the TV) is Design Pattern: Gate Keeper. Instead of having your Rules send command with sleeps scattered throughout your Rules you send commands to a Design Pattern: Proxy Item and a Rule that triggers and centralizes the Thread::sleeps so you can have multiple Rules issuing commands to the TV at the same time and they won’t stomp on each other. For example:

import java.util.concurrent.locks.ReentrantLock
val lock = new ReentrantLock

rule "TV Command"
when
    Item TV_Commands received command
then
    lock.lock // Ensures only one instance of the Rule can run at a time
    try {
        val cmds = TV_Commands.state.toString.split(" ")
        cmds.forEach[ cmd | 
            sendCommand("TV"+cmd.replace("KEY_", "").toLower.charAt(0).toUppercase(), cmd)
            Thread::sleep(250)
        ]
    }
    catch(Exception e) {
        logError("TV", "Error handling TV command: " + e)
    }
    finally {
        lock.unlock
    }
end

rule "Skype State Machine"
when
    Item TV_Skype_State changed
then
    ...
        if (TV_Status.state != "ONLINE") {
            TV_Commands.sendCommand("KEY_TOOLS KEY_ENTER KEY_DOWN KEY_ENTER KEY_ENTER")
        }
    ...  
end

NOTE that I’m using Design Pattern: Associated Items to come up with the Item to send the command to based on the cmd string to be sent (it will require normalization of how you do capitalization of your Item names.

You could get even more clever to remove the Thread::sleep in the Rule thread:

import java.util.concurrent.locks.ReentrantLock
import java.util.List

val lock = new ReentrantLock
val List<String> cmds = createArrayList
var cmdsTimer = null

rule "TV Command"
when
    Item TV_Commands received command
then
    // add the commands to the queue
    cmds.addAll(TV_Commands.state.toString.split(" ") 

    // Start the timer thread to work off the commands
    if(cmdsTimer == null) {
        cmdsTimer = createTimer(now.plusMillis(1), [ |
            try {
                val cmds = TV_Commands.state.toString.split(" ")
                cmds.forEach[ cmd | 
                      sendCommand("TV"+cmd.replace("KEY_", "").toLower.charAt(0).toUppercase(), cmd)
                      Thread::sleep(250)
                ]
            }
            catch(Exception e) {
                logError("TV", "Error handling TV command: " + e)
            }
            finally {
                lock.unlock
            }
        ])
    }

end

thanks,

There are some interesting things to try in your examples. I notice you put your lock before the try{} catch(){} , I usually put mine as the first entry in the try{} section - is your example better? I ask because I have been trying to eliminate locks in case that is the source of the problem, but maybe I’m putting them in the wrong place.

many thanks for putting time into this…

No difference really.

Either place could be a problem, but locks are another thing that can cause you to quickly run out of execution threads. It will be exacerbated even further if you are using locks with sleeps.

Update from last night:

15-May-2018 03:26:40.925 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting ISO timestamp: 2018-05-15T03:26:37.041-04:00 to: McTrack12733Time
15-May-2018 03:26:40.926 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 207.149 to: McTrack12733aM
15-May-2018 03:26:40.927 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 10.6627 to: McTrack12733dP
15-May-2018 03:26:40.927 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 0 to: McTrack12733lux
15-May-2018 03:26:40.928 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 00000064 to: McTrack12733DestinationUid
15-May-2018 03:26:40.928 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 23.2163 to: McTrack12733t
15-May-2018 03:26:40.929 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 1059771153 to: McTrack12733SequenceNumber
15-May-2018 03:26:40.929 [INFO ] [pse.smarthome.model.script.JSON Data] - Posting: 17300 to: McTrack12733MessageId
15-May-2018 03:26:40.929 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Convert MCT1 Lux to text value'
15-May-2018 03:26:45.485 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'MCThings 111BC Basement Entry Battery Monitor Status'
15-May-2018 03:27:00.000 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'Test Rules operation'
15-May-2018 03:27:00.001 [INFO ] [smarthome.model.script.Testing Rules] - rules_operating status: True
15-May-2018 03:28:00.000 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'Test Rules operation'
15-May-2018 03:28:00.001 [INFO ] [smarthome.model.script.Testing Rules] - rules_operating status: False
15-May-2018 03:28:00.002 [INFO ] [eclipse.smarthome.model.script.Rules] - Rules have stopped triggering
15-May-2018 03:28:00.219 [DEBUG] [model.script.actions.ScriptExecution] - Scheduled code for execution at 2018-05-15T15:28:00.218-04:00
15-May-2018 03:29:00.001 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'Test Rules operation'
15-May-2018 03:29:00.001 [INFO ] [smarthome.model.script.Testing Rules] - rules_operating status: False
15-May-2018 03:29:00.002 [INFO ] [eclipse.smarthome.model.script.Rules] - Rules have stopped triggering - so restarting OH2
15-May-2018 03:29:31.437 [DEBUG] [org.eclipse.smarthome.model.rule.ide] - BundleEvent STOPPING - org.eclipse.smarthome.model.rule.ide
15-May-2018 03:29:31.437 [DEBUG] [org.eclipse.smarthome.model.rule.ide] - BundleEvent STOPPED - org.eclipse.smarthome.model.rule.ide
15-May-2018 03:29:31.439 [DEBUG] [g.eclipse.smarthome.model.script.ide] - BundleEvent STOPPING - org.eclipse.smarthome.model.script.ide
15-May-2018 03:29:31.440 [DEBUG] [g.eclipse.smarthome.model.script.ide] - BundleEvent STOPPED - org.eclipse.smarthome.model.script.ide
15-May-2018 03:29:31.874 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'check for Workshop outlet overload'
15-May-2018 03:29:31.910 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'Driveway Camera Motion Detect Status'
15-May-2018 03:29:31.910 [INFO ] [arthome.model.script.Driveway Camera] - Driveway Camera Motion Detection status is: ACTIVE
15-May-2018 03:29:32.007 [DEBUG] [ntime.internal.engine.RuleEngineImpl] - Executing rule 'McModule McTrack 12733 Location Tracking'
15-May-2018 03:29:32.008 [INFO ] [e.smarthome.model.script.Last Update] - McTrack12733LocationTimestamp DateTimeItem updated at: Tue 03:29:32

This is driving me crazy - I wish I knew what was happening at 3:30am!

Interestingly, as soon as these stop:

15-May-2018 03:29:31.437 [DEBUG] [org.eclipse.smarthome.model.rule.ide] - BundleEvent STOPPING - org.eclipse.smarthome.model.rule.ide
15-May-2018 03:29:31.437 [DEBUG] [org.eclipse.smarthome.model.rule.ide] - BundleEvent STOPPED - org.eclipse.smarthome.model.rule.ide
15-May-2018 03:29:31.439 [DEBUG] [g.eclipse.smarthome.model.script.ide] - BundleEvent STOPPING - org.eclipse.smarthome.model.script.ide
15-May-2018 03:29:31.440 [DEBUG] [g.eclipse.smarthome.model.script.ide] - BundleEvent STOPPED - org.eclipse.smarthome.model.script.ide

Rules start executing again, even though OH is shutting down and restarting.

And - I have been inspecting system logs, and although I said no unusual load on the system - this does not seem to be true, because I found this:

May 15 03:29:41 proliant nagios: SERVICE ALERT: proliant;Current Load;WARNING;SOFT;1;WARNING - load average: 6.06, 5.55, 3.73

Which is not massively high, but normally the server runs at less than 1.0.

I’m pretty sure those bundles are the rule engine itself. So they shutdown which effectively resets the rules but you have a bunch of events in the queue so the rules start triggering again.

Something is happening at that time. It might be a fluke that it is always happening at 3:30am. Under other circumstances it might happen at midnight or 6 am.

Check the cron jobs on your machine. Perhaps there is something else being triggered that is causing a resource contention or the like.

Though the fact that resetting the Rules really points to there being a problem with your Rules. Despite your having 50 threads I think you might be running out. You can perhaps test this by reducing the number back to 5 and seeing if the rules stop running sooner and upping it to 100 to see if it takes longer.

Otherwise, you need to trace through the events that take place a few minutes prior to 3:30 and figure out if you have a run away feedback loop or something causing you to run out of threads.

I’d still recommend upgrading to the latest 3.2 snapshot before spending any more time on this. If it is a bug it might already be fixed. And if it isn’t, you will have to redo all of the above again on 2.3 to provide needed details to the devs anyway. In the long run it will save you some time, unless the problem is indeed your Rules.

Another thing to do is add some logging to verify that all your locks are being unlocked. That could be another source of problems.

I have added the logging as you suggested for the locks.

It seems from my investigation that something starts running at 3:00am, and slowly the system load goes up, not apparently due to CPU usage, but due to iowait hitting 40%. Once I hit that threshold, rules stop triggering - but when the iowait drops, rules start triggering again.

So OH may be the symptom of high iowait, not the cause.

My current suspect is crashplan, which does ramp up iowait when it’s running.

I’m running it now, and iowait hits 40% and OH stops triggering, I stop it, and iowait drops, and OH starts again…

I will probably bite the bullet and upgrade to 2.3 - it just took me 2 days to get everything working after 2.2 upgrade, so not keen to get into that again if there is some other ongoing issue.

Ha! found this on the crashplan web site:

The default file verification scan schedule, every 1 day at 3:00 AM, works for many users. However, there are situations when you may want to change these settings, as described in the sections below. To change the file verification schedule:

For what it’s worth, the upgrade to 2,3 went a whole lot smoother for me than going from 2.1 to 2.2. The only gotcha was the need for spaces inside the [ ].

I’m glad you found the culprit. It would have taken me quite some time to find that as the source of the problem.

Well it took me forever as well. I have just changed crashplan to verify every 7 days at 15:00, so if I have no problems until then, that will validate that this is the culprit.

If this is the case, then it is worth noting that if iowait goes above 40% (at least on my system), rules stop triggering.

Thanks for the help.

1 Like