I have a weird problem with the rules engine not running
2025-01-08 11:16:17.521 [DEBUG] [org.openhab.core.automation.internal.RuleEngineImpl ] - Rule engine not yet started - not executing rule 'meter-3',
2025-01-08 11:16:17.523 [DEBUG] [org.openhab.core.automation.internal.RuleEngineImpl ] - Rule engine not yet started - not executing rule 'Rechnung-3'
although the console says it does:
openhab> bundle:list|grep -i rule
166 x Active x 80 x 4.3.1 x openHAB Core :: Bundles :: Automation Script RuleSupport
205 x Active x 80 x 4.3.1 x openHAB Core :: Bundles :: Model Rules
206 x Active x 80 x 4.3.1 x openHAB Core :: Bundles :: Model Rule IDE
207 x Active x 80 x 4.3.1 x openHAB Core :: Bundles :: Model Rules Runtime
Actually NO rules are working. FWIW, I only use DSL rules.
Restarting, cache cleaning, even reinstalling oh packages did not help.
Any idea?
The OH install is actually fine except for the rules engine which is down.
Hell, I even purged-up-&downgraded the oh package (4.3.0/.1).
No change.
Rules are fine, too. Taken from a backup that used to work, plus they work on several other machines I have running.
My rules were also running fine. Initially I though the upgrade to 4.3.1 broke it but I think the restart was the issue.
I could solve the issue by:
mkdir rules/disabled
mv *.rules/disabled
then restarting, that should run now
then bring back rules in small batches. Once OH settles down, do restart it to validate. Move to the next batch. This is how I was left with one VERY simple rule that never was an issue but blocked OH on restart.
Here is a python script that may help check cron, it may need tweaking for the OH specifics though.
file: rules/check-cron.py
import os
import re
from croniter import croniter
from datetime import datetime
# ANSI escape code for red text
RED = '\033[31m'
RESET = '\033[0m'
# Function to validate cron expressions
def validate_cron(cron_expr):
try:
croniter(cron_expr, datetime.now())
return True
except Exception:
return False
# Function to search for "Time cron" in files and validate cron expressions
def search_and_validate_crons(root_dir='.'):
# Regex to find lines containing 'Time cron' and extract cron expression
cron_pattern = re.compile(r'Time cron\s+"([^"]+)"')
# Walk through all files in the given directory
for root, dirs, files in os.walk(root_dir):
for file in files:
file_path = os.path.join(root, file)
# Read file and search for cron expressions
with open(file_path, 'r', encoding='utf-8') as f:
for line in f:
# Search for cron expressions
match = cron_pattern.search(line)
if match:
cron_expr = match.group(1)
# Validate the cron expression
if validate_cron(cron_expr):
print(f"{file_path}: OK - {cron_expr}")
else:
print(f"{file_path}: {RED}NOT OK{RESET} - {cron_expr}")
# Run the script
if __name__ == "__main__":
search_and_validate_crons()
Yes, removing some rules makes the engine start up.
However on next or next-but-one restart, the engins starts but this time the dashboard is empty and the āpagesā admin menu is missing, too.
I had seen that before but Iām not getting to the bottom of why this is happening.
It does not seem to be a specific rule that causes the problem, itās somewhat random with which rules it appears so I think itās not related to the rule code but merely to something like memory usage or so.
Iāve opened an issue to merge the three threads which are reporting this behavior. Please participate there with information gathering and testing so we can hopefully get this fixed quickly.
thanks Rich. Iāll start investigating.
I have the problem on at least one installation, 4.3.1 or 4.3.0 doesnāt matter.
Went back to 4.2.3 there, that seems(?) to have solved the problem for now.
Then again, I have two other installations with 4.3.0 and 4.3.1 that do not show the problem.
They all run almost the same config so I wouldnāt think the difference is related to bindings with mem leaks or the OH software version.
As a first premature shot, the difference seems to be 64 bit OS & Java on the latter. cc @ndye
Java process on Raspi was huge ~4GB while on those systems where it works it was only 1,2GB.
@Will your screenshot shows a large Java, too. What hardware and OS are you on, whatās your output of java -version?
A bug was found in CronAdjuster, a PR submitted. But as far as I know, it only happens with āincorrectā cron field combinations such as setting date = 31, month = 2. This issue seems to have been around for at least 5 years.
Iām curious to know if there are other bugs causing the issue youāre having.
Anyone to see this, check is if the rules engine is running. In my case it wasnāt although in bundle:list was displayed to be āActiveā.
Use log:set debug org.openhab.core.automation thatāll show you when it does not run.
Iām currently inclined to believe itās only happening if you run in 64bit.
Let me know your HW, OS version (and if 32 or 64 bit!) and the output of java -version.
I mean the VIRT column which is the virtual process size. 5.6G ! Even the in-memory fraction (RES column) is 1.5G.
I, too, have a RPi4 of course with openHABian, too, but with a 32bit OS and Java.
Iām at only 1.2G and I donāt have the problem there.
On my other box with the 64bit OS+Java and a similar config, Iām at 4+ G and I DO have the problem there.
Iām wondering if some limit was hit along the way and if that caused our issue.
Maybe the rules engine thread tried to allocate even more memory but failed because it ran out of virtual space, then it might have gotten hung in that state.
I still have the same value today 5610492 but my system is working fine.
I could joke and say that I have many light bulbs but I donāt think my system is particularly huge. I am probably in the range of 100-200 items. I did install many addons to test but again nothing crazy and I cleaned up recently.
As I stated in that other thread, I got these problems after restoring a --full backup from OH 3 into OH 4.3. I later tried using a backup that was made without the --full option, and since then 4.3 has been running fine here, with hundreds of rules. Depending on how you got to your current installation, it might be worth to create a backup (without --full), do a clean/purged re-installation (again), restore the backup and see what that brings?
The two backups I experimented with were taken back-to-back, just a few seconds apart, from the same source installation, one with --full, the other one without. So that is definitely what made the difference in my case.
For completeness: My installation is working now (at least for the time being). Itās a relatively large setup with close to 1000 items and almost 10k lines of rules in DSL code. I do not see excessive CPU usage, but I do see the huge memory footprint mentioned in this thread. In my case itās 8.4G of virtual process size (on a 64 bit Ubuntu x86 machine, Zulu 17 JRE).
I donāt think the problems you experienced after doing this is related to the problem seen here. The restore of a full backup from a different version of OH, especially a different major version of OH, is never going to work. The full backup includes the actual installed bindings files. An OH 3 binding is never going to work on an OH 4 and it is known that a ābadā binding can cause OH to not come up or fail in all sorts of strange and interesting ways.
During an upgrade, the cache (which is where the add-ons get installed to) is deletec forcing OH to install the bindings anew as part of the upgrade process. When you restore a backup, you skip that process and fill up the cache with a bunch of jar files that canāt work.
This is why the supported and recommended approach is to always restore a backup to the same version of OH the backup was taken from. Then, if desired, upgrade OH post the restore. Then OH will do a bunch of stuff including clearing the cache to make your configs work with the new version of OH.