Massive rule performance issues in OH 2.2

rswennen · January 2, 2018, 8:41am

I’m having massive performance issues when dialing up the number of rules in OH 2.2.

I recently wrote a set of 4 rules to capture the state of my dumb screens (rollershutter items) based on timings of commands sent to the screens.
The screens are controlled by external button command (seen by OH).
What I do in essence is capturing those commands and capturing the timestamp they are sent. Based on that I can calculate what % the screen moved (e.g. it’s 50% open). There are some additional specificities (like the central control unit changes automatically from up/down to stop after eg. 60 sec or 90 sec, I covered that with the expire binding and I keep a current state and target state item for each screen which is inverse of OH shutterscreen % values for easy integration in homekit afterwards).

The rule works very nice for 1 screen => me very happy and very proud

However when I start applying the same rule on multiple screens the performance drops dramatically very fast. It’s literally the same set of 4 rules for another screen each time (no errors etc, tested them all separately).

For 2 screens it takes 3 minutes between the .rules refresh action and the .rules loading action and the rules actually only start responding after 8 minutes

For 4 screens time between refresh and loading is 5 minutes and rules only start responding after 12 minutes

For 8 screens OH never even gets to the stage of refreshing the rules based on the changed .rules file. When rebooting they never active. So either it takes more than hours or it doesn’t get there (not sure what because logging stops, no errors).

In either case this is not scalable at all (even when they would get through after hours) because I have 36 screens in total in my house to manage so no way I will ever get there.

So I have a few questions for the experts:

are rules in OH so resource intensive ? have rule performance issues come up anywhere else ? Is it the way OH deals with rules/rules logic => trying to understand what is underlying root cause
Would a different hardware help (currently raspbian on RPI 3). I’m sure it would improve performance but my fear is seen the exponential behavior on just a few screens any hardware how performant it might be would probably struggle just to do 36 stupid screens with 4 rules each
Would performance be better if I try to migrate the logic to node red, so keep the items in OH and do all rule logic in node red and send states back to OH?

I’m puzzled because imo 8 times 4 rules should not be something that kills OH.
@rlkoshak I know from other posts you don’t really believe in performance issues for home automation and I tended to agree with you but killing all by adding 32 rules (8x4 rules) made me change my view

Just FYI below the items and rules I’m using for 1 screen you’ll see it some calculations and info passing to items but it’s not like it’s rocket science or complex mathematical calculations. It’s fairly straighforward.

Items file

Group gScreen

Rollershutter screen_bureau "Screen bureau" (gScreen)
Number screen_bureau_current_position "current pos [%.1f]" (gScreen)
Number screen_bureau_target_position "target pos [%.1f]" (gScreen)
String screen_bureau_state "state  [%s]" (gScreen) { expire="60s,STOPPED" }
DateTime screen_bureau_time (gScreen)

Rules

val Number screen_bureau_runtime = 30
var Number screen_bureau_gelopen = 0
var Number screen_bureau_newposition = 0

// Screen bureau
rule "screen bureau openhab item received command" 
	when 
		Item screen_bureau received command
	then 
		if ((receivedCommand == UP) || (receivedCommand == 0))
            {  
				VB4_2C_screen_bureau_op.sendCommand(ON)      
            }
        if ((receivedCommand == DOWN) || (receivedCommand == 100))
            {  
				VB4_2D_screen_bureau_neer.sendCommand(ON)      
            }
		if ((receivedCommand == STOP) && (screen_bureau_state.state.toString !== "STOPPED"))
            {  
				VB4_2C_screen_bureau_op.sendCommand(ON)      
            }
	end


rule "screen bureau open" 
	when 
		Item NB_RB_bureaudeurlinks_BP8_A received update ON or
		Item VB4_2C_screen_bureau_op received update ON
	then 
		if ((screen_bureau_state.state.toString == "OPENING") || (screen_bureau_state.state.toString == "CLOSING"))
            {  
				screen_bureau_state.postUpdate("STOPPED")      
            }
        else
        	{ 
				screen_bureau_state.postUpdate("OPENING")   
				screen_bureau_time.postUpdate(now.toString())
				if (screen_bureau_current_position.state == NULL) { screen_bureau_current_position.postUpdate(0) }
				screen_bureau_target_position.postUpdate(100)
				
            }
	end

rule "screen bureau dicht" 
	when 
		Item NB_RB_bureaudeurlinks_BP8_B received update ON or
		Item VB4_2D_screen_bureau_neer received update ON
	then 
		if ((screen_bureau_state.state.toString == "OPENING") || (screen_bureau_state.state.toString == "CLOSING"))
            { 
				screen_bureau_state.postUpdate("STOPPED")      
            }
        else
        	{ 
				screen_bureau_state.postUpdate("CLOSING")
				screen_bureau_time.postUpdate(now.toString())
				if (screen_bureau_current_position.state == NULL) { screen_bureau_current_position.postUpdate(100) }
				screen_bureau_target_position.postUpdate(0)
				       
            }
	end

rule "screen bureau stopped" 
	when Item screen_bureau_state changed
	then 
		if (screen_bureau_state.state.toString == "STOPPED")
			{ 
				
				if ((now.millis- (screen_bureau_time.state as DateTimeType).zonedDateTime.toInstant.toEpochMilli)/1000 > screen_bureau_runtime) 
					{
  						screen_bureau_current_position.postUpdate(screen_bureau_target_position.state)
						screen_bureau.postUpdate(100 - (screen_bureau_current_position.state as DecimalType))
					}
				else 
					{
						if ((screen_bureau_target_position.state as DecimalType) > (screen_bureau_current_position.state as DecimalType))
							{
								screen_bureau_gelopen = 100*((now.millis - (screen_bureau_time.state as DateTimeType).zonedDateTime.toInstant.toEpochMilli)/1000)/screen_bureau_runtime
								screen_bureau_newposition = ((screen_bureau_current_position.state as DecimalType) + screen_bureau_gelopen).intValue
								if (screen_bureau_newposition > 100) { screen_bureau_target_position.postUpdate(100) } else { screen_bureau_target_position.postUpdate(screen_bureau_newposition) } 
								screen_bureau_current_position.postUpdate(screen_bureau_target_position.state)
								screen_bureau.postUpdate((100 - (screen_bureau_current_position.state as DecimalType)).intValue)
							}
						if ((screen_bureau_target_position.state as DecimalType) < (screen_bureau_current_position.state as DecimalType))
							{
								screen_bureau_gelopen = 100*((now.millis - (screen_bureau_time.state as DateTimeType).zonedDateTime.toInstant.toEpochMilli)/1000)/screen_bureau_runtime
								screen_bureau_newposition = ((screen_bureau_current_position.state as DecimalType) - screen_bureau_gelopen).intValue
								if (screen_bureau_newposition < 0) { screen_bureau_target_position.postUpdate(0) } else { screen_bureau_target_position.postUpdate(screen_bureau_newposition) } 
								screen_bureau_current_position.postUpdate(screen_bureau_target_position.state)
								screen_bureau.postUpdate((100 - (screen_bureau_current_position.state as DecimalType)).intValue)
							}	
					}
			}	
	end

mstormi · January 2, 2018, 10:21am

Well, for sure all by itself, rule execution performance is never the reason for any problem like this.
But just like with any other programmable system, your programming arts can create unwanted dependencies and even recursion.

You should ramp up logging level to see what’s really happening. Try at least DEBUG on
org.openhab.core, org.openhab.model.script.rules, org.openhab.model.script.engine, org.apache.karaf
Insert logging statements into your rules to see in openhab.log when they’re called.
I’m pretty sure you have created some sort of recursion, locking or similar, maybe a timing issue, too.

Oh, and for sure you shouldn’t be restarting OH when it’s slow… restarting indeed takes a couple of minutes or up to an hour to become fully functional again if e.g. you have a ZWave network that is being reinitialized on every restart.

PS: what’s your application anyway ? Didn’t catch that from your description. What’s a ‘dumb screen’ and what do you need rollershutter items for ? Or do you refer to those dumb screens all around your house just showing pictures of nature all day ? Guess they’re called windows.

rswennen · January 2, 2018, 11:03am

Thanks will definitely start with increasing the logging. I’m definitely not an expert in programming so pretty sure I can tune the code.

The application is for window coverings (screens similar to rollershutters, like sunscreens).

My homeautomation (NIKOBUS) system just responds on UP DOWN commands but has no intelligence, meaning you don’t get feedback about the screen actually being ope, close or x% open (like Gira controllers do for example). Hence the “dumb screens”.

My rules just try to capture timestamps of when commands are sent and based on number of seconds screen has been running to calculate the position of the sunscreen or rollershutter.

rossko57 · January 2, 2018, 11:56am

General comments.

The logging will probably reveal what you need, but at the moment it’s not clear if delay is in rule triggering or item updating etc. I’d be adding temporary logInfo() into the rules to follow progress.

I think it still applies in OH2, some factor in compilation gives the effect that the first time any given rule is run, it takes longer. That’s usually acceptable, but something to be aware if you are looking closely into timings - discount the first run.

If you’ve got eight or so identical rules operating on different devices, it’s time to look into lambda functions and name techniques that allow associated Items and so on. It’s an initial effort to do, but results in more easily maintained code, one set of rules and functions for N devices.
Having said that, step one is to get a conventional rule working before adding complexity, so this would be something to come back to later.

opus · January 2, 2018, 12:28pm

Looking at your rules code I think a (or the) problem is:
In your rule “screen bureau openhab item received command” you trigger on “screen_bureau received command”.
This rule forces updates of the item “screen_bureau_state” (via the rules “screen bureau open” or “screen bureau dicht”)
which is triggering the rule “screen bureau stopped” ( triggered by “screen_bureau_state changed”).
This rule forces updates of the item “screen_bureau”, which is starting the process at the top again.

luckymallari · January 2, 2018, 1:55pm

Correct. This is why adding logging helps in troubleshooting.

rswennen · January 2, 2018, 3:35pm

@opus that’s correct but on purpose I use the postUpdate so this should never trigger the screen_bureau received command as it’s a status update and not a command.

I’ll look through it again with logging and enhanced debugging but I can’t find any recursion for now.

opus · January 2, 2018, 5:27pm

I’m not sure on that special topic and because you observe performance issues I’d look into that as well, even if my understanding of that would be different.

rlkoshak · January 3, 2018, 4:37pm

This isn’t entirely correct. I don’t believe in trying to fix performance problems until you KNOW you have performance problems. I don’t believe in premature optimizations because in the home automation context, you almost never run into such performance problems. It needlessly complicates your rules and often causes difficult to diagnose problems unto themselves.

I’ve actually have encountered some performance problems with OH, but they usually have to deal with trying to dim a bunch of independent lights at the same time or something like that.

Typically not. Some people run with thousands of lines of Rules code that do some pretty heavy math and other calculations and they do not report performance problems.

Maybe but it is hard to say without knowing what is causing the problem in the first place.

Perhaps, but again, it is hard to say without knowing what the problem is in the first place. If the problem is some sort of unintended recursion or the like you would just be moving the recursion to NodeRed and the problem will happen again.

Also, watch CPU usage using the top command. This could tell you if your OH is pounding on the CPU too hard which would also point in the direction that you have an inadvertant loop somewhere.

Glancing through the rules I see nothing that should be causing performance issues.

But like Markus said, it isn’t clear when you are doing restarts of OH and your description doesn’t make it clear to me what exactly you are timing. Is it the time it takes the rules to load into OH or the time it takes the rules to execute?