Openhab works slow, CPU of the RPI is high

domeninini · September 7, 2018, 6:10am

For a few weeks now, my openHAB system has been getting slower and slower, and the CPU load of the RPi (model 3) rises above 50 %, sometimes above 100 %. Updating the .items file takes more than 30 minutes; sometimes I had to restart the Pi after a few hours because the new file still hadn’t been loaded. I edit the files with both Eclipse SHD and VS Code. Even when I do not update my files, openHAB is still slow; sometimes it takes a few minutes to turn a light on. My system isn’t the smallest, but half a year ago it worked fine with about 400 items and maybe 100 - 200 rules.
Is there a specific failing task I do not see ?

system:

RPi 3 B
openHABian: Raspbian GNU / Linux 9 (stretch)
openHAB 2.2.0-1 Release Build

mstormi · September 7, 2018, 7:29am

Most likely this is due to the way you edit files.
Any change to a config/rules file causes OH to reparse and rework all the rules affected by that file, and that consumes a lot of processing resources, particularly if you change .items files.
Also, changes queue up: every FILESAVE action will trigger another reparsing job to be added even if there are still older ones in the queue.
I also think that standard OH behavior wrt this has changed some time ago.
And particularly if you share your config directory via Samba, it’s even worse.

Try to work on copies of your config files (of different name or in different directory) and only ever copy those to their real destination when you’re done editing.
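
A minimal sketch of that workflow, assuming you keep your working copies in a staging directory outside OH's config tree. It copies over only the files whose content actually differs, so nothing else gets touched. The helper name `deploy_changed` is made up for illustration, and the paths in the usage line are assumptions about your install.

```shell
#!/bin/sh
# Copy files from a staging directory into the live config directory,
# but only those whose content differs (or that do not exist there yet).
# deploy_changed is a hypothetical helper name, not an openHAB tool.
deploy_changed() {
    src=$1
    dst=$2
    for f in "$src"/*; do
        [ -f "$f" ] || continue                 # skip subdirectories and unmatched globs
        target="$dst/$(basename "$f")"
        # cmp -s prints nothing; non-zero status means "different or missing"
        if ! cmp -s "$f" "$target" 2>/dev/null; then
            cp "$f" "$target"
            echo "updated $(basename "$f")"
        fi
    done
}
```

Usage would look something like `deploy_changed ~/oh-staging/rules /etc/openhab2/rules` (both paths are assumptions, adjust them to your setup). Files that are identical in both places are left alone.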

domeninini · September 7, 2018, 8:01am

Thank you for the answer. You’re right, I edit files over Samba. I’ll try it out. Do you recommend stopping openHAB (systemctl stop openhab2.service) and starting it again after copying, or can I just leave the process running ?

mstormi · September 7, 2018, 8:23am

Of course not. Keep it running.

domeninini · September 7, 2018, 9:03am

Thank you, let’s try it !

rlkoshak · September 7, 2018, 3:57pm

There is some evidence that the use of explicit primitives in Rules slows down the parsing of your Rules. I can’t say for certain that this is true, but it might be worth spending some time going through your Rules and avoiding the use of int, long, float, double, intValue, longValue, etc. except where absolutely necessary (i.e. when passing the number to a method that requires the primitive).

Letting the Rules DSL treat them as Numbers (BigDecimal to be specific) seems to be easier for the parser to work with.

I’ve only anecdotal evidence that this is in fact a problem but it has drastically reduced the load times for at least a couple of users.
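
To get a list of places worth reviewing, something like the following sketch could help. It is a plain text search, so it will also report hits inside comments and strings, and it says nothing about whether a given primitive is actually needed. The helper name `find_primitives` is made up, and floatValue/doubleValue are my guess at what the "etc." covers.

```shell
#!/bin/sh
# List lines in *.rules files that mention primitive types or the
# intValue/longValue-style conversions, as candidates for manual review.
# find_primitives is a hypothetical helper name; the match is purely textual.
find_primitives() {
    dir=${1:-.}
    # -w matches whole words only, so "print" does not count as "int";
    # /dev/null forces grep to print the file name even for a single file.
    find "$dir" -name '*.rules' -exec \
        grep -nwE 'int|long|float|double|intValue|longValue|floatValue|doubleValue' /dev/null {} +
}
```

Run it from the conf/rules folder as `find_primitives .` and work through the output one file at a time.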

Also, the nearly 2 to 1 ratio between Items and Rules seems off to me. Do you have a lot of duplicated code? If so, reducing the number of Rules and number of lines of code by using fewer generically written rules will also reduce your load times.

Following Markus’s advice is certainly going to address your immediate problem, but I’m constantly amazed at how patient OH users on RPis are when it comes to how long it takes to restart OH and load the configs. The advice I provided above might address that a little.

mstormi · September 7, 2018, 6:51pm

How do you come to think we’re patient ? Almost banged that thing to the wall a couple of times, but hey, you get what you pay for (well, most of the time).

rlkoshak · September 7, 2018, 7:47pm

I’d have expected there to be much wailing and gnashing of teeth on the forum, with multiple threads dozens of posts long full of complaints and brainstorming on how to make it better, and lots of open issues. Instead I see one user drop the little nugget that it takes 30 minutes to restart OH here, another one mention a problem like OP’s over there, and so on. Counting you, I think I’ve seen about five users mention these long startup/parsing times on the forum.

It seems like people are suffering in silence, at least to one who is not suffering with you all.

I’m on an adequately provisioned VM so my OH takes 1-2 minutes to start up. I also have what is probably on the small side in terms of number of Items (~300) and Rules (45 rules at ~1900 LOC), though it is big enough that if this were a general problem I would see something notable. I can’t imagine how frustrated I’d be if it took 30 minutes. I might consider another hub in that case and worry that lots of people silently are choosing another option.

Do you know if there is an issue open on this? Do we have any firm data pointing at the root cause? I’ve never run on an RPi so I can’t say how long it has been like this. I don’t remember anyone complaining back in the old 1.x days.

I wonder if it is related to the LSP…

For the curious here are some quick command lines to quickly get a count of your various Things, Items, and Rules.

# Things count
curl -s -X GET --header "Accept: application/json" "http://localhost:8080/rest/things" | python -m json.tool | grep \"UID\" | wc -l

# Items Count
curl -s -X GET --header "Accept: application/json" "http://localhost:8080/rest/items?recursive=false" | python -m json.tool | grep name | wc -l

# Rules Count - run from the conf/rules folder
grep -R --include='*.rules' -E '^[[:space:]]*rule[[:space:]]' . | wc -l

# Line of Rules Code Count - run from the conf/rules folder
find . -name '*.rules' | xargs wc -l
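
If a total looks off, a per-file breakdown can make it easier to see where the count comes from. This is only a sketch: it assumes a rule is any line whose first word is `rule`, and the helper name `rules_per_file` is made up.

```shell
#!/bin/sh
# Print "<count> <file>" for every *.rules file under a directory.
# A rule is taken to be any line whose first word is `rule` (an assumption).
# rules_per_file is a hypothetical helper name.
rules_per_file() {
    dir=${1:-.}
    find "$dir" -name '*.rules' | sort | while IFS= read -r f; do
        # grep -c prints the number of matching lines (0 if there are none)
        printf '%s %s\n' "$(grep -cE '^[[:space:]]*rule[[:space:]]' "$f")" "$f"
    done
}
```

Run it from the conf/rules folder as `rules_per_file .`.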

job · September 7, 2018, 11:13pm

Hmmm. Do i understand correctly, it may be best to use the type/class Number instead of all the other more technical numeric types in rules?

rlkoshak · September 8, 2018, 1:32am

Correct. I’ve found in general that the Rules DSL gets fewer errors and parses rules faster (I’ve only second-hand experience to draw on for this assertion) when you let the Rules DSL guess the type based on context instead of specifying the type everywhere. And at least one user saw a dramatic speed-up in rules parsing when they dropped primitives in favor of leaving the values as Numbers.

I suspect what is happening is that the Rules DSL parser is faster because it can defer the type checking to runtime instead of compile time, so it can skip a lot of work during parsing. But when you specify the type, it has more work to do to make sure the types match everywhere.

I’ve no explanation for why primitives exacerbate the problem. Perhaps it needs to convert them back and forth in order to do the parse-time type checking.

domeninini · September 11, 2018, 12:16pm

So, I have 423 items and 7 (?) rules - strange. There should be about 200 rules in 14 files …
@rlkoshak, could you explain your setup, or have you already posted about it somewhere ? It sounds to me as if you do not recommend an RPi as a server. I chose one because of the connection to the Arduino world (wireless power plugs, IR via lirc, …). Is there an alternative ?

mstormi · September 11, 2018, 12:30pm

good one

You can use almost any HW and there’s many threads on the forum to discuss the pros and cons (search for e.g. ‘best hardware platform’).
A RPi obviously isn’t as fast as a PC but it is fine in terms of power even for large installations, and it’s damn cheap (also to keep a spare unit at hand) and in widespread use, so a very good choice.
Take care of a couple of things such as backup, though. While that’s important for any server, it’s even more so on Pis to avoid SD corruption issues.

rlkoshak · September 11, 2018, 7:39pm

Are you using all Rules DSL Rules? Experimental Rules are stored somewhere else and my commands above do not work with those.

Markus’ joke has to do with the fact that I’m one of the more prolific posters on the forum so there are probably more copies of various parts of my setup on this forum than anyone else’s. Do you have any specific questions? Any one of us could write a book on their personal home automation setup.

I wouldn’t say that I don’t recommend an RPi. I just had no idea that startups took so long on the RPi and am surprised I haven’t seen more complaints about it on the forum here. The RPi is probably the most common platform to host OH on. It is very well supported on this forum. I don’t use an RPi because I have about a dozen other services I host, so it is easier for me to manage a bunch of VMs on a single desktop server than to manage a bunch of separate RPis.

As Markus indicated, OH will run on just about anything at least as powerful as an RPi 2B.

mark_leonard_tuil · September 11, 2018, 9:04pm

So I had a quick look at my system. I have a Raspberry Pi 3 Model B+. I have 82 things, 726 items, and 47 rule files with 245 rules containing 9918 lines of code. I would expect that the number of rules is a bit inflated, because I sometimes turn a rule ‘off’ by adding X to the triggering item name. Similarly, I have lots of comment lines in my rule files. I use alternative 1 of this thread to clean up my start-up process: Cleaning up the startup process / renaming rules (windows possible).

Anyway, the stats of my last boot were as follows:

2018-09-09 22:51:03.499 start openhab from a reboot of raspberry pi 3b+
2018-09-09 22:52:41.315 items and things loaded
2018-09-09 23:03:28.601 rules loaded

So in all that is a start-up time of roughly 12 minutes.
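
For anyone who wants to break their own log down the same way, here is a small sketch that turns two timestamps into elapsed whole seconds. It assumes you pass only the time-of-day part of each log line and that at most one midnight lies between them; the helper name `elapsed_seconds` is made up.

```shell
#!/bin/sh
# Seconds between two HH:MM:SS[.mmm] timestamps, on the same day or
# across a single midnight. elapsed_seconds is a hypothetical helper name.
elapsed_seconds() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s, ":"); split(b, e, ":")
        t0 = s[1] * 3600 + s[2] * 60 + int(s[3])   # int() drops the milliseconds
        t1 = e[1] * 3600 + e[2] * 60 + int(e[3])
        if (t1 < t0) t1 += 86400                   # crossed midnight once
        print t1 - t0
    }'
}

elapsed_seconds 22:51:03.499 22:52:41.315   # boot -> items and things loaded
elapsed_seconds 22:52:41.315 23:03:28.601   # items and things -> rules loaded
elapsed_seconds 22:51:03.499 23:03:28.601   # total
```

With the three timestamps above this gives 98, 647 and 745 seconds, so nearly all of the roughly 12 minutes is spent loading the rules.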

domeninini · September 12, 2018, 10:01am

If DSL Rules means that they are written in a text file with the ending .rules, then I wrote all my rules in DSL. I’m no friend of ‘graphical programming’. Hmm, now, one day later, there are 15 rules without my changing anything in the code. But the setup now works faster than before, since I edit the code offline and update it all at once instead of every few minutes - thanks to @mstormi .
@rlkoshak, you recommend reducing the lines of code. Do comments slow down OH, too ? I just read your topic about replacing Thread::sleep() in rules with createTimer(). I have now replaced every delay the way you described, but no rule works anymore. Is there something specific I have to watch out for in OH 2.2 ? Here’s a short excerpt of my rules. I did not write any ‘top-lines’ like in OH1… Is this right ?

rule "mood"
when
    Item mood received command
then
    if (receivedCommand == ON) {
        white_scaled_3.postUpdate(23)
        white_scaled_a39.postUpdate(53)
        white_scaled_a7.postUpdate(100)
        white_scaled_6.postUpdate(100)
        a1_scaled.postUpdate(100)
        Power_Plug_Socket_B12.sendCommand("ON")
        createTimer(now.plusSeconds(1), [ |
            Power_Plug_Socket_B2.sendCommand("ON")
            createTimer(now.plusSeconds(1), [ |
                Power_Plug_Socket_B4.sendCommand("ON")
                createTimer(now.plusSeconds(1), [ |
                    Power_Plug_Socket_B3.sendCommand("ON")
                    createTimer(now.plusSeconds(1), [ |
                        Power_Plug_Socket_B16.sendCommand("ON")
                        createTimer(now.plusSeconds(1), [ |
                            Power_Plug_Socket_B20.sendCommand("ON")
                        ])
                    ])
                ])
            ])
        ])
        mood.sendCommand("OFF")
    }
end

The delay is essential due to the transmitter of my wireless power sockets, which need some delay to separate different signals.

mstormi · September 12, 2018, 10:58am

Not sure what exactly you mean, so to be clear:
you only need to copy those files that you edited from your offline source. Don’t touch the others (if you do, they’ll get reparsed, too, no matter whether you really changed anything in the code).

rlkoshak · September 12, 2018, 3:25pm

Probably not. Comments don’t take much to recognize and skip over.

Please don’t make massive blanket changes to anything like that. Change one Thread::sleep to a Timer, then test that it still works. Then move on to the next one. When you make massive changes like that without testing, who knows how many errors you may have introduced or which of the dozens of changes you made is the cause.

I recommend Design Pattern: Gate Keeper to solve this particular problem. Not only is the code more generic, it will also handle the timing of commands to these devices from all of your Rules, not just from the one Rule.

Also, the nesting of the Timers is not necessary. You can create them all at once and greatly simplify the code. Probably the easiest thing to do is to put the plug Items into a Group, I’ll call it Plugs.

Plugs.members.forEach[ plug, index | createTimer(now.plusSeconds(index), [ | plug.sendCommand(ON) ]) ]

All that messy code replaced with a one-liner. It loops through all the members of the Plugs Group and creates a Timer to go off index seconds in the future, meaning the first Plug’s Timer goes off at zero seconds, the second one at one second, and so on.

mstormi · September 12, 2018, 3:42pm

To my knowledge no one has ever come up with a proper analysis, so no developer knows where to start looking and no one jumps in.
I once opened an issue on startup times related to the order in which config files are processed. That has since been reworked and seems to work at least most of the time, and most users who are still affected and aware of it are probably using the workaround I mentioned in my last post, so for them it’s no longer an issue.

You gave a good hint on primitives, but changing code is the userland approach, and I don’t think anyone ever raised a GitHub issue for this to tackle the root cause in the parser. Probably they are as unsure as you are, and no one wants to disgrace himself by claiming this to be a bug. When I once did, Kai’s answer was basically “hey, it’s just XX microseconds on my Mac”.

Even with those two issues removed or worked around, I still wouldn’t say it’s optimized.
But it’s so difficult to pinpoint the root cause of “slow parsing”, particularly as we don’t have sufficient insight into the parser (I don’t know of any proper debug settings to get meaningful output, nor any profiling tool).

mstormi · September 16, 2018, 5:56pm

FWIW, I just stumbled across this ancient post and tried it. I can’t provide any testing results of statistical relevance, but it does indeed seem to have a positive effect.
Convinced me enough to make a PR for openHABian of it right away

rlkoshak · September 17, 2018, 4:42pm

Nice find! If I could remember every posting I’ve read on this forum I could probably file 100 issues and PRs like this.