Rule execution lags, possibly Java GC

Hi Markus

Thanks for the reply!!

I am on a non-raspberry pi system… just a generic ARM platform (as you suspected). I did try Openhabian about 1-2 years ago and there were issues (at the time) with certain file paths not existing during install, so the installer didn’t successfully complete. As such I just performed a standard install of Openhab and have gone that route ever since. I can try Openhabian again to see if it now if it gets through the install, though Im guessing you’re indicating that it wont do at this moment.

Not sure Ive been through the ZRAM stuff before… I should just follow your guide you link above?? ZRAM

As for the memory increase, I did use -Xms & -Xmx , though I also recall there was another parameter or so I used…though I cant recall immediately what they are. My system is running Azul zulu-embedded-8, which I believe is a 32 bit version of Java (but running on a 64 bit OS) so I allocated for a 32 bit system version of Java. Do you know of any articles I should look at or keywords I might look for on the community forum?

My system has about 2+GB free to play with, so plenty of overhead to adjust things.

Real memory 816.68 MiB used / 1.09 GiB cached / 3.27 GiB total

Thanks

No, that’s for openHABian only. I guess ZRAM would work on your box, too, but you’ll have to risk, try and do the work yourself without support unless you migrate to openHABian.

1 Like

Have you actually tried the cron-rule “keep active” workaround?
Curious to see if it works here.

1 Like

@rossko57 Ive not tried that… I assume its an actual bash file that you run on a frequency? I quickly searched the forum here for a match on that. Have you a code snippet or link to what I need to do/setup?

More than happy to give it a go (will take a few days of course)… but happy to try!

@mstormi Ill give @rossko57’s suggestion a go over a couple of days… see how that works and if it solves the problem. If not, Im happy to take a shot at trying Java/Zram or happy to see if OpenHabian now installs without problem. Ill keep feeding back here as I go!

Nope. In each rules file that seems to be affected by “swapping out” of rules, you add another rule with a periodic cron trigger, that does nothing much. The theory is that keeping that rule active also stops other rules being swapped out of memory.

Be careful if applying to multiple rule files, to keep every rule “name” unique.

1 Like

@rossko57 Ok… Ill add it in.

Ill try this one out from the list of whats on that link.

    rule "keep this thing in memory"
    //use the compiled code in this rule file every minute
    when
        Time cron "0 0/1 * 1/1 * ? *"
    then
        return;
    end

As only my lights are time sensitive really (I dont mind if other things take 10-20 seconds), Ill only apply it to my lighting rules file for now. I guess 1 minute is a bit overkill really… though ill try it with that frequency initially over a period of 4-5 days and report back here when I have some results.

Thanks

Some have reported the rule needs to do something, maybe a logInfo().
Maybe just making an unused var fred = null would do, don’t know.

You would think so, but I expect all this is very system dependent.

If leaving this kind of rule in service, I’d tune the cron to go off at second 45 or suchlike, to avoid everything else that fires on whole minutes.

1 Like

@rossko57 Fair point. Ive gone with a switch that has no purpose other than to change state every 1 minute. I can watch it change to check the rule is running and dont have to fill my logs up with anything… so an easy way to check at any point in time that things are kept in memory and interacting with variables in my items list.

Item

Switch 	keepalive

Rule

    rule "keep this thing in memory"
    //use the compiled code in this rule file every minute
    when
        Time cron "0 0/1 * 1/1 * ? *"
    then
    	if (keepalive.state == OFF) {keepalive.postUpdate("ON")}
    	else keepalive.postUpdate("OFF")
    end

Sitemap (Hidden in a sub-menu)

Switch item=keepalive			mappings=["ON"="On","OFF"="Off"]

You should look in your events.log :wink:
You could halve the traffic by using postUpdate instead of sendCommand

1 Like

@rossko57 Hah… ok… Have edited the code snippet on on my system and the above email in case anyone uses it in future. :slight_smile:

Ill let you know how things are going in a few days time!!

Thanks for the help.

Could you please help with finding optimum Java parameters?
Your system exhibits the “wanted” behavior we need to have to test if to change Java options makes a difference. We don’t have that chance very often that’s why I’d be thankful if you help here.

Could you please start your java with these extra options. Cumulative, i.e. try #1 first, if that does not make a difference try #1 + #2, finally #1 + #2 + #3
I don’t know your install method but there’s an environment variable EXTRA_JAVA_OPTS usually set in /etc/default/openhab2 where you can add options.

  1. -XX:-UseAdaptiveSizePolicy -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=95
  2. -XX:MinHeapFreeRatio=15 -XX:MaxHeapFreeRatio=30
  3. -XX:+UseCodeCacheFlushing

I’m not sure but I think you should remove the -Xms parameter before you test these new ones.
Try if removing Xms (and not adding anything else in the first place) results in “reliable” occurence of the delay problem, if so leave Xms out in further tests.

1 Like

@rossko57 The on/off switch rule
Whilst that rule has remained running fine; this morning, first thing I tried was pressing one of my RF buttons and sadly, I encountered the 20+ second delay. My rule for the on/off switch is in my lighting.rules file, where the rules for my RF button presses are…so… had it just kept the one rule in memory rather than the whole rules file? (Im not expecting an answer to that question, it was just my thought on how java/rules interact). More than happy to try doing something different with that rule if you have any other suggestions?

@mstormi Changing the Java options
More than happy to give these a go. It may take me anything up to 3-7 days to fully test them through, as theres the very occasional time where things do work flawlessly without any delays, so in that circumstance and with 3 x sets of things to try, having to wait for the issue to occur etc. Well, you get the idea.

What I will do is:

  • add/change 1 setting at a time as you request
  • perform a full restart of the system after each change
  • when its started up ill press an RF button so we know the rules are working/loaded,
  • test the change for 24-48 hours to see if they delay problem occurs, with any 20+ second delay classing as a failure (RF working normally is usually 0.5 a second to 1.5 seconds experience from button press to light on, so obviously ill note any 2 seconds upwards experiences).
  • If I encounter the delay issue, Ill move though setting 1, 2 until I reach 3.

Ill also turn off the on/off switch rule for now.

Ill feed back here when I can.

Thanks

That’s very long. The usual report is 5-8 seconds on a Pi (maybe others get same effect with shorter delay but it gets accepted/ignored).
We must not forget there may be multiple causes for similar symptoms.
Can we see the events.log, there should be a clear delay between rule trigger event and expected action in this case.

I don’t know how the “keep alive” rule is supposed to work, some report success and some like you don’t :smiley:
I don’t see the problem - but I run on fat Windows boxes, different resources, different java, etc. I do see the very-first-rule-run delay of a couple of seconds, but not delay after long rest.

1 Like

@rossko57 I will up my logging on the events.log (as I think I have it turned down) and gather that info.

@mstormi Further to my earlier post from about 1 hour ago, it looks as though -XX:+UseParallelGC is set on by default on Azul Zulu Java 8 and it looks as though if you send the command in the EXTRA_JAVA_OPTS telling it the same command again, Java doesnt like that and Openhab wont start. Though in a nutshell, as my 1st settings option Im going to use:

EXTRA_JAVA_OPTS="-XX:-UseAdaptiveSizePolicy -XX:InitiatingHeapOccupancyPercent=95"

Though if you want to know what I tested/how I reached this conclusion, my tests/reseach is below:

1st try (Openhab wont start, no http jetty message and nothing at all in the logs, but fresh log files are created if I delete them and start Openhab)

EXTRA_JAVA_OPTS="-XX:-UseAdaptiveSizePolicy -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=95"

2nd try (Same result as 1st try)

EXTRA_JAVA_OPTS="-XX:-UseAdaptiveSizePolicy -XX:+UseParallelGC"

3rd try (Openhab loads up and works)

EXTRA_JAVA_OPTS="-XX:-UseAdaptiveSizePolicy"

4th try (Openhab loads up and works)

EXTRA_JAVA_OPTS="-XX:-UseAdaptiveSizePolicy -XX:InitiatingHeapOccupancyPercent=95"

As such, I guess it doesnt like the -XX:+UseParallelGC

As mentioned Im on Zulu JDK8 from Azul. I found this site which allows you to search the list of options within Java builds/versions and that I think suggests UseParallelGC should be a valid option(?) and that it is a default on setting for Zulu JDK8(?).

https://chriswhocodes.com/zulu_jdk8_options.html

I noted there was a UseParallelOldGC, so thought id try

5th Try (OLD GC version)

EXTRA_JAVA_OPTS="-XX:-UseAdaptiveSizePolicy -XX:+UseParallelOldGC -XX:InitiatingHeapOccupancyPercent=95"

But Openhab wouldnt start with that either.

So finally, I tried a minus sign “-” against UseParallelGC and Openhab Started up…

6th Try (Used a - minus against Parallel GC)

EXTRA_JAVA_OPTS="-XX:-UseAdaptiveSizePolicy -XX:-UseParallelGC -XX:InitiatingHeapOccupancyPercent=95"

So Im assuming in some way that UseParallelGC is turned on as default within Azul Zulu 8 Java, and it doesnt like being told to use it twice on startup.

As such, Im going to omit it from the EXTRA_JAVA_OPTS and go with my command line from my 4th try (if you agree).

Thanks

You can try combinations of Java options offline like this.
This also shows the available -XX options.
Also note the output unlike what you say UseParallelGC = false on my box (Azul 8 on RPi 3)

[13:35:48] openhabian@openhabianpi:~$ java -XX:+PrintFlagsFinal -version|grep ParallelGC
    uintx ParallelGCBufferWastePct                  = 10                                  {product}
    uintx ParallelGCThreads                         = 0                                   {product}
     bool ParallelGCVerbose                         = false                               {product}
     bool UseParallelGC                             = false                               {product}
1 Like

@mstormi Ok, well I guess it may be turned off then!

java -XX:+PrintFlagsFinal -XX:+PrintFlagsInitial -version|grep ParallelGC

    uintx ParallelGCBufferWastePct                  = 10                                  {product}
    uintx ParallelGCThreads                         = 0                                   {product}
     bool ParallelGCVerbose                         = false                               {product}
     bool UseParallelGC                             = false                               {product}

And this following command line doesnt throw any errors or complain, though I appreciate its not actually running a java file. It does at least confirm my Java version and that -XX:+UseParallelGC doesnt throw any errors.

java -XX:-UseAdaptiveSizePolicy -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=95 -version

openjdk version "1.8.0_152"
OpenJDK Runtime Environment (Zulu Embedded 8.25.0.76-linux-aarch32hf) (build 1.8.0_152-b76)
OpenJDK Client VM (Zulu Embedded 8.25.0.76-linux-aarch32hf) (build 25.152-b76, mixed mode, Evaluation)

Java; I have to admit is not an vast area of expertise for me… and you’re welcome to shoot me down :slight_smile: Though Im trying to understand. I read the below article and tried -XX:+UseParNewGC which does work on the EXTRA_JAVA_OPTIONS… (as in, Openhab does start up).

is that any use for me to try/use for the purposes of these tests?

@rossko57 @mstormi

Im slowly working through the Java options, Im on step 2 currently, though I am using the -XX:+UseParNewGC

Rossko57, you asked me for an event log when the delay happened, which Ive managed to get one here for you:

2020-04-29 20:09:57.137 [vent.ItemStateChangedEvent] - harmonyhub_device_HarmonyHub_51535350_buttonPress changed from UNDEF to PowerOff
2020-04-29 20:09:57.567 [vent.ItemStateChangedEvent] - harmonyhub_device_HarmonyHub_51535350_buttonPress changed from PowerOff to UNDEF
2020-04-29 20:10:43.743 [vent.ItemStateChangedEvent] - RFDataLights changed from NULLFORTESTING to 3D1301
2020-04-29 20:10:56.670 [vent.ItemStateChangedEvent] - Current_DateTime changed from 2020-04-29T20:09:56.642+0100 to 2020-04-29T20:10:56.649+0100
2020-04-29 20:10:56.913 [ome.event.ItemCommandEvent] - Item 'gBedroom1' received command ON
2020-04-29 20:10:56.940 [ome.event.ItemCommandEvent] - Item 'lsBed1' received command ON
2020-04-29 20:10:56.954 [ome.event.ItemCommandEvent] - Item 'lsBed1a' received command ON
2020-04-29 20:10:56.961 [vent.ItemStateChangedEvent] - RFDataLights changed from 3D1301 to NULLFORTESTING
2020-04-29 20:10:56.964 [nt.ItemStatePredictedEvent] - lsBed1 predicted to become ON
2020-04-29 20:10:56.977 [nt.ItemStatePredictedEvent] - lsBed1a predicted to become ON
2020-04-29 20:10:56.992 [vent.ItemStateChangedEvent] - lsBed1 changed from OFF to ON
2020-04-29 20:10:56.994 [vent.ItemStateChangedEvent] - lsBed1a changed from OFF to ON

This one shows a 13 second delay. The 3rd line is me pressing the button.

On the line that says “RFDataLights changed from 3D1301 to NULLFORTESTING” I put this into my rules to reset the RFdataLights variable so that the event log would always show a change event when I pressed a button.

Ill keep on with the Java settings, but hope this provides the event.log detail you wanted.

Yes, that’s clear enough.

It is a curiosity, but I think only a coincidence, that your Current_DateTime variable fires up in the same second. (I’m guessing that’s from a once a minute cron rule).
Out of interest though - that could fire at any random second depending on rule startup, but then be consistent about it.
For example, in the session you’ve shown us the log snippet from, I’d expect to see that update at xx:xx:56 consistently.
If you check that, it will give added confidence that it’s not a general system clagginess, but purely a rule thing.
(Note that cron rules run in different thread pool to event triggered rules, but that doesn’t mater here)

I’m still convinced by the garbage collection and/or swapping out of memory theories, I wonder which is closest to truth.

1 Like

@rossko57 It took me a little while to track where its coming from, but I can confirm its the Astro binding. It does run this every 60 seconds, though I cant find where in the binding you would change the frequency of the ntp request.

EDIT: I found the “Local Time” thing in PaperUI and changed its frequency to every 6000 seconds, rather than every 60 seconds.

This is whats in my Items file that relates to the event.log entry:

DateTime Current_DateTime "Today [%1$tA, %1$td/%1$tm/%1$tY, %1$tH:%1$tM]" <calendar> (Astro) {channel="ntp:ntp:local:dateTime"}

and //'ing out the entry in the items file, results in:

2020-04-30 08:51:35.138 [vent.ItemStateChangedEvent] - Current_DateTime changed from 2020-04-30T08:50:35.093+0100 to 2020-04-30T08:51:35.109+0100
2020-04-30 08:52:21.053 [temChannelLinkRemovedEvent] - Link 'Current_DateTime => ntp:ntp:local:dateTime' has been removed.

with no more DateTime lookups occurring since.

As far as setup of the Astro binding, my system simply has an install of the binding and a copy paste of the recommended code snippet to use it.

Im happy to finish going through the suggested Java settings (forgot to test it first thing this morning and its too late now, I walked past a PIR sensor which would have updated the RFDataLights variable) though I am on Step 3, the final step of that. Once Ive confirmed does/doesnt that work… I can disable/remove Astro or the DateTime lookup and see if that has any impact.

P.S. I have no rules that interact with the Current_DateTime variable