Openhab works slow, CPU of the RPI is high

mark_leonard_tuil · September 11, 2018, 9:04pm

So I had a quick look at my system. I have a Raspberry Pi 3 Model B+. I have 82 things, 726 items, 47 rule file with 245 rules containing 9918 lines of code. I would expect that the number of rules is probably a bit inflated, because I sometimes turn a rule ‘off’ by adding X to the triggering item name. Similarly, I have lots of comment lines in my rule files. I use alternative 1 of this thread to clean up my start-up process Cleaning up the startup process / renaming rules (windows possible).

Anyway, the stats of my last boot were as follows:

2018-09-09 22:51:03.499 start openhab from a reboot of raspberry pi 3b+
2018-09-09 22:52:41.315 items and things loaded
2018-09-09 23:03:28.601 rules loaded

So in all that is a start-up time of roughly 12 minutes.

domeninini · September 12, 2018, 10:01am

If DSL Rules means, that they are written in a text file with the ending .rules, I wrote all rules in DSL. I’m no friend of ‘programming graphical’. Hmm, now one day later, there are 15 rules without changing anything in the code. But the setup works faster, than before with editing the code offline and update it all at once instead of every few minutes - thanks to @mstormi .
@rlkoshak, you recommend to reduce the lines of code. Does comments slow down OH, too ? I just read your topic to replace the Thread::sleep() phrase in rules and use createTimer() instead. Now, I replaced every delay the way you wrote, but no rule works anymore. Is there something specific they have to think about in OH2.2 ? Here’s a short cut of my rules. I did not write any ‘top-lines’ like in OH1… Is this right ?

rule "mood"
		when 
    			Item mood received command 
		then	
     			if(receivedCommand == ON)
				 		{
						white_scaled_3.postUpdate(23)
						white_scaled_a39.postUpdate(53)
						white_scaled_a7.postUpdate(100)
						white_scaled_6.postUpdate(100)
						a1_scaled.postUpdate(100)
      					Power_Plug_Socket_B12.sendCommand("ON")
    					createTimer(now.plusSeconds(1),  [ |
      							Power_Plug_Socket_B2.sendCommand("ON")
    							createTimer(now.plusSeconds(1),  [ |
      									Power_Plug_Socket_B4.sendCommand("ON")
								    	createTimer(now.plusSeconds(1),  [ |
      											Power_Plug_Socket_B3.sendCommand("ON")
											    createTimer(now.plusSeconds(1),  [ |
      													Power_Plug_Socket_B16.sendCommand("ON")
													    createTimer(now.plusSeconds(1),  [ |
      															Power_Plug_Socket_B20.sendCommand("ON")
														])
												])
										])
								])
						])
      					mood.sendCommand("OFF")        
						} 
end

The delay is essential due to the transmitter of my wireless power sockets, which need some delay to separate different signals.

mstormi · September 12, 2018, 10:58am

Not sure what exactly you mean, so to be clear:
you only need to copy those files that you edited from offline source, don’t touch the others (if you do they’ll get reparsed, too, no matter if you really change anything in the code):

rlkoshak · September 12, 2018, 3:25pm

Probably not. Comments don’t take much to recognize and skip over.

Please don’t make blanked massive changes to anything like that. Change one Thread::sleep to a Timer, then test that it still works. Then move on to the next one. When you make massive changes like that without testing then who knows how many errors you may have introduced or which of the dozens of changes you made is the cause.

I recommend Design Pattern: Gate Keeper to solve this particular problem. Not only is the code more generic it will handling the timing on commands to these devices form all of your Rules, not just from the one Rule.

Also, the nesting of the Timers is not necessary. You can create them all at once and greatly simplify the code. Probably the easiest thing to do is to put the plug Items into a Group, I’ll call it Plugs.

Plugs.members.forEach[ plug, index | createTimer(now.plusSeconds(index), [ | plug.sendCommand(ON) ] ]

All that messy code replaced with a one-liner. It loops through all the members of the Plugs Group and creates a Timer to go off index seconds in the future meaning the first Plug’s Timer goes off at zero seconds, the second one at one seconds, and so on.

mstormi · September 12, 2018, 3:42pm

To my knowledge noone has ever come up with a proper analysis so even no developer knows where to start looking so noone jumps in.
I used to open an issue on startup times related to ordering of processing of config files. That now was reworked and seems to be working at least most of the time, and most users to still be affected and aware of this are probably using the workaround I mentioned in my last post so it’s no longer an issue.

You gave a good hint on primitives, but changing code is the userland approach and I don’t think anyone ever raised a Github issue for this to tackle the root cause in the parser. Probably they are as unsure as are you and noone wants to disgrace himself by claiming this to be a bug. When I once did, Kai’s answer was basically “hey, it’s just XX microseconds on my Mac”.

Even with those two issues removed or worked around, I still wouldn’t say it’s optimized.
But it’s so difficult to pinpoint the root cause of “slow parsing”, particularly as we don’t have sufficient insight into the parser (I don’t know of any proper debug settings to get meaningful output, nor any profiling tool).

mstormi · September 16, 2018, 5:56pm

FWIW, I just stumbled across this ancient post and tried. Can’t provide any testing results of statistical relevance but it indeed seems to have a positive effect.
Convinced me enough to make a PR for openHABian of it right away

rlkoshak · September 17, 2018, 4:42pm

Nice find! If I could remember every posting I’ve read on this forum I could probably file 100 issues and PRs like this.

job · September 18, 2018, 2:28pm

I tried that limiting, but on my system the patch had some negative effect. I guess I am already past the 650mb limit.

I currently moved openHAB to my file server Core2Duo/4gb ram. openHAB uses 1.5gb and about 300 threads. So this explains somehow my sporadic lags in rule execution. Looks like it is time to move to another hardware platform…

mstormi · September 18, 2018, 3:28pm

Huh?
Note the patch above only increases the static memory footprint (process size). It’s up to the OS to select what of that memory to page out.
On startup that can result in some additional paging, but once you access a (compiled) rule that was paged out, it just needs to be paged back in, and that should be a lot faster than to recompile at that stage.
Note I am not using 650MB but just 400M on my 1GB Pi, that resulted in an increase to the static process size of ~130MB. Also, I’m swapping (paging) to my NAS as described here.

There’s a thread somewhere people telling how large their Java process is, it’s usually around 650 MB as is mine.
If you’re at 1.5G it sounds like you need to investigate your system. It shouldn’t use 1.5 G. Do you use any statically defined large objects in your rules ? I’m referring some MP3 files, and Java sort of seems to pre-load them. At least I had choppy audio on those and the patch made that go away.
Or any binding to do some strange caching or to keep large objects such as database indices ? If so, try to reduce those settings.
Is it really 1.5G of resident memory use or just the process size?
And you’re still having issues with 4G ?? That should be more than enough.

job · September 18, 2018, 10:11pm

I understood the patch sets the min and max memory of the java process to 650mb, effectively making it fixed size. What i discovered when using the patch is a slow and unresponsive BasicUI (no icons etc.).

This is my output from htop, with the patch removed.

According to my interpretation, my installation of openHAB uses 1.3 MB of RAM. (And about 300 threads, because the number of threads goes down by about 300, if i stop openHAB)

The audio files in the /etc/sounds folder are about 7 MB, icons are about 2.5 MB.

Looking at my memory, i would guess i should use larger numbers, 1500 instead of 650, to make this patch work for me currently.

mstormi · September 19, 2018, 7:51am

1.3 GB resident indeed.
Are you running on ARM ? Which Java ?
And how much RAM does it use when started on your RPi ?

Try 3000m or even 4000m for the max value as it affects static process size and you already seem to use that much.

job · September 19, 2018, 9:32am

No, Amd64, Ubuntu 18.04.1 LTS, Core2Duo with 4 GB of Ram (Board Limit).
It’s Oracle

java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

I have to check that, i moved openHAB to the file server a few weeks ago, because of some extreme lagging of my motion sensors rule execution. Not nice if the light turns on 3 minutes after the motion is detected.

I think i should check, how to reduce the memory footprint. I have about 150 things and about 2000 items. Is it possible to check the used memory per binding somehow?

mstormi · September 19, 2018, 10:36am

Now sorry to tell but you sort of made a mess of this thread.
It was about running on RPis, and the memsize recommendations were specific to that (i.e. a box to have an ARM processor architecture and 1GB of RAM, with Zulu Java, usually resulting in ~650 MB process size).

But now you state you run on some 64bit x86 and Oracle Java which is totally different as it’ll use 64bit and thus much more memory for the same Java code and openHAB config.
No wonder it broke your system when you applied that RPi specific memory optimization setting.

Do you mean to say that you still have problems even with that 4GB Intel machine ?
(btw, amd64 on Intel CPU does not sound right either).

Given your htop output of 3,8GB(!) Java uses, that HW/OS/Java combo possibly was a worse choice than a RP is as 3,8 out of 4 GB is a higher fraction than 650MB is of 1GB.

Either way, if you want to compare your figures to ours you should do this on the same processor+Java basis.
So start your config on your Pi again to see how much RAM it takes there.

job · September 19, 2018, 12:29pm

Sorry, i will delete my posts if you prefer.

illnesse · September 19, 2018, 6:31pm

No worries. I’m not sure what’s up with his attitude, might be the language barrier. “German English” can sound kinda rude

rlkoshak · September 19, 2018, 6:52pm

Agreed with illnesse. Please leave your posts. But in the future when being helped to debug a problem, particularly one of this nature, you must tell those who are helping you EVERY modification you make, especially if you reinstall. Otherwise we are operating with incorrect information and assumptions.

Out of curiosity, you are not running in Docker, right? Because when running in Docker the amount of RAM returned is the total for the container, not just the OH process, which can be significantly higher.

Minimal OH installed on an RPi

Modest OH configuration running in Docker

domeninini · September 26, 2018, 6:40am

A short summarize of my origin problem: I only make modifications ‘offline’ and only load up file by file to the rpi, so I can check after every upload if the changes work. The system works better as before with OH2.2, but it’s still not as fast as OH 2.1. I think, this is what we have to deal with.
The contra of this technique is: while programming offline, there’s no error-highlighting in VS-Code, I think. So, I really have to be careful while writing. Eclipse Smart Home Designer does not work on my Mac. Is there an other possibility to find out errors in syntax ?

rlkoshak · September 26, 2018, 4:43pm

The only syntax checking available is VSCode which requires a connection to a running OH or saving the file in the appropriate folder and watching openhab.log for errors.

However, you do not need to be editing the files on the RPi where OH is running to get syntax checking. You just need to have a network connection to a running OH server. So you can edit the files on another machine and move them over as desired.

cafri · June 10, 2019, 6:56pm

Platform information:
- Hardware: RPi 3B+
- OS: Openhabian
- Java Runtime Environment:
  
  openjdk version “1.8.0_202”
  OpenJDK Runtime Environment (Zulu8.36.0.152-CA-linux_aarch32hf) (build 1.8.0_202-b152)
  OpenJDK Client VM (Zulu8.36.0.152-CA-linux_aarch32hf) (build 25.202-b152, mixed mode, Evaluation)
- openHAB version: Was 2.4 Stable and upgraded to 2.5 M1 (read inside)
- My system includes Z-Wave modules
- All of my Things were created using Paper UI
- All of my Items are defined in files
- All of my rules are defined in file
- Using SSD drive (no SD card at all)

My experience with OpenHAB started with version 2.4 Stable on a RPi 3B+ with Openhabian.

I have always been using VS on a Windows machine to directly edit the rule files, which resides on the RPi.

I was using DIRECT saves and SAMBA, both of which (I found out 2 days ago) are not recommended.

In any case, even though I was working this way, whenever I saved the changes I made to a specific rules file, it was immediately “refreshed”, and after a few seconds I could already see the new logic in action.

About 2 days ago, it was no longer the case.
After I saved some changes, the system stopped running all my rules (later on I understood it probably just became very slow).

Stopping and starting the openhab2 service (including clearing the cache) did not solve the issue (the system was slow even after restart), and I also came across the following error:

Dispatching event to subscriber 'org.eclipse.smarthome.io.monitor.internal.EventLogger@138f5b7' takes more than 5000ms.

After reading this, I upgraded to OH 2.5 M1, using the instructions here.

After that, when starting the openhab2 service, it works well (or at least it seems to).
I did come across these while writing this message:

2019-06-10 21:06:28.906 [ERROR] [org.eclipse.smarthome.core          ] - bundle org.eclipse.smarthome.core:0.11.0.oh250M1 (102)[org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl(106)] : doFindMethod: Suitable but non-accessible method found in class org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl
2019-06-10 21:06:28.910 [ERROR] [org.eclipse.smarthome.core          ] - bundle org.eclipse.smarthome.core:0.11.0.oh250M1 (102)[org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl(106)] : findMethod: Suitable but non-accessible method setScheduler found in class org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl, subclass of org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl
2019-06-10 21:06:28.914 [ERROR] [org.eclipse.smarthome.core          ] - bundle org.eclipse.smarthome.core:0.11.0.oh250M1 (102)[org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl(106)] : bind method [setScheduler] not found; Component will fail
2019-06-10 21:06:28.917 [ERROR] [org.eclipse.smarthome.core          ] - bundle org.eclipse.smarthome.core:0.11.0.oh250M1 (102)[org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl(106)] : bind method [setScheduler] not found
2019-06-10 21:06:28.921 [ERROR] [org.eclipse.smarthome.core          ] - bundle org.eclipse.smarthome.core:0.11.0.oh250M1 (102)[org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl(106)] : doFindMethod: Suitable but non-accessible method found in class org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl
2019-06-10 21:06:28.923 [ERROR] [org.eclipse.smarthome.core          ] - bundle org.eclipse.smarthome.core:0.11.0.oh250M1 (102)[org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl(106)] : findMethod: Suitable but non-accessible method unsetScheduler found in class org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl, subclass of org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl
2019-06-10 21:06:28.926 [ERROR] [org.eclipse.smarthome.core          ] - bundle org.eclipse.smarthome.core:0.11.0.oh250M1 (102)[org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl(106)] : unbind method [unsetScheduler] not found; Component will fail
2019-06-10 21:06:28.929 [ERROR] [org.eclipse.smarthome.core          ] - bundle org.eclipse.smarthome.core:0.11.0.oh250M1 (102)[org.eclipse.smarthome.core.internal.scheduler.PeriodicSchedulerImpl(106)] : unbind method [unsetScheduler] not found

But I believe these are a known issue, and are not related to my issue.

Still, whenever I change one of the rules files, it becomes slow.

I stopped editing the files directly on OH, and instead I upload them using MobaXTerm.

Sometimes when uploading the files, I receive the following warning:

[WARN ] [el.core.internal.ModelRepositoryImpl] - Configuration model 'filename.rules' is either empty or cannot be parsed correctly!

Sometimes, after this warning, the file is being re-loaded, and sometimes not.
This made me also try to upload the new rules files as

filename.rules.new

and then use the following command to rename it:

mv filename.rules.new filename.rules

I am not sure if it important or not, but the OWNER of my rules files is openhabian and the GROUP is openhab.

I modified it using:

sudo chown -R openhabian:openhab /etc/openhab2/rules

This is because when the owner was openhab I could not upload the files using MobaXTerm.

Still after working this way, sometimes the new rules file is not refreshed again, after the “is either empty or cannot be parsed correctly!” error.

Just to demonstrate how the system becomes slow, I have a Test switch item:

Switch ivTest_Rule_Trigger "Test Rule Trigger" {expire="2s,command=OFF"}

I created a test.rules file with the following content:

rule TestRule
when
        Item ivTest_Rule_Trigger changed to ON
then
        logInfo("Test Rule", "New Test 1")
end

Whenever I click the Test switch on Basic UI, all rule steps are executed very quickly:

2019-06-10 20:22:46.694 [ome.event.ItemCommandEvent] - Item 'ivTest_Rule_Trigger' received command ON
2019-06-10 20:22:46.717 [vent.ItemStateChangedEvent] - ivTest_Rule_Trigger changed from OFF to ON
2019-06-10 20:22:46.770 [INFO ] [pse.smarthome.model.script.Test Rule] - New Test 1
2019-06-10 20:22:49.663 [ome.event.ItemCommandEvent] - Item 'ivTest_Rule_Trigger' received command OFF
2019-06-10 20:22:49.681 [vent.ItemStateChangedEvent] - ivTest_Rule_Trigger changed from ON to OFF

However, once I update one of the other rules files (uploading a new version of it), it takes much longer for the logInfo comment to be executed:

2019-06-10 20:25:43.261 [ome.event.ItemCommandEvent] - Item 'ivTest_Rule_Trigger' received command ON
2019-06-10 20:25:43.270 [vent.ItemStateChangedEvent] - ivTest_Rule_Trigger changed from OFF to ON
2019-06-10 20:25:48.969 [ome.event.ItemCommandEvent] - Item 'ivTest_Rule_Trigger' received command OFF
2019-06-10 20:25:48.998 [vent.ItemStateChangedEvent] - ivTest_Rule_Trigger changed from ON to OFF
2019-06-10 20:27:47.160 [INFO ] [pse.smarthome.model.script.Test Rule] - New Test 1

(this is a 2-minute delay example, but I also had an almost-5-minute delay example)

Looking at the output of the TOP command, it seems like it is using high CPU, after uploading an updated rules filed (even a few minutes later):

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
32231 openhab   20   0  657600 513284  13932 S 139.1 51.4 135:19.98 java

Executing the commands listed here, it seems like I have:

41 Things
151 Items
139 Rules
2503 lines of rules

(note that since I’m learning OpenHAB, I have MANY commented-out lines, that I removed after making all kinds of tests during my learning)

I’d be happy to share any additional data that may be required to analyze the root cause of this issue, and see if it is a problem on my side, or a bug that has to be resolved.

rlkoshak · June 10, 2019, 7:43pm

This can be safely ignored. When saving the file what happens is there are two file events, one when it is is opened for writing and one when it is done being written. The warning you see for this is happening on that first event as OH tries to read the file before it’s been written to.

As long as the openhab group has read write permissions on the files it should be OK.

This is not an unexpected amount of a delay for an RPi. I’ve seen delays reported of over half an hour. That is why Markus in the post you linked to recommends against editing files using direct saves. Every file change results in a new parsing activity and if you don’t wait long enough between file saves they pile up.