Rule execution lags, possibly Java GC

Yes, that’s clear enough.

It is a curiosity, but I think only a coincidence, that your Current_DateTime variable fires up in the same second. (I’m guessing that’s from a once a minute cron rule).
Out of interest though - that could fire at any random second depending on rule startup, but then be consistent about it.
For example, in the session you’ve shown us the log snippet from, I’d expect to see that update at xx:xx:56 consistently.
If you check that, it will give added confidence that it’s not a general system clagginess, but purely a rule thing.
(Note that cron rules run in different thread pool to event triggered rules, but that doesn’t mater here)

I’m still convinced by the garbage collection and/or swapping out of memory theories, I wonder which is closest to truth.

1 Like

@rossko57 It took me a little while to track where its coming from, but I can confirm its the Astro binding. It does run this every 60 seconds, though I cant find where in the binding you would change the frequency of the ntp request.

EDIT: I found the “Local Time” thing in PaperUI and changed its frequency to every 6000 seconds, rather than every 60 seconds.

This is whats in my Items file that relates to the event.log entry:

DateTime Current_DateTime "Today [%1$tA, %1$td/%1$tm/%1$tY, %1$tH:%1$tM]" <calendar> (Astro) {channel="ntp:ntp:local:dateTime"}

and //'ing out the entry in the items file, results in:

2020-04-30 08:51:35.138 [vent.ItemStateChangedEvent] - Current_DateTime changed from 2020-04-30T08:50:35.093+0100 to 2020-04-30T08:51:35.109+0100
2020-04-30 08:52:21.053 [temChannelLinkRemovedEvent] - Link 'Current_DateTime => ntp:ntp:local:dateTime' has been removed.

with no more DateTime lookups occurring since.

As far as setup of the Astro binding, my system simply has an install of the binding and a copy paste of the recommended code snippet to use it.

Im happy to finish going through the suggested Java settings (forgot to test it first thing this morning and its too late now, I walked past a PIR sensor which would have updated the RFDataLights variable) though I am on Step 3, the final step of that. Once Ive confirmed does/doesnt that work… I can disable/remove Astro or the DateTime lookup and see if that has any impact.

P.S. I have no rules that interact with the Current_DateTime variable

Okay, no rule involved with Current_DateTime at all then. I was not suggesting this was in any way contributing to the problem, only wondering what diagnostic info we could glean from it. Do away with it or not, doesn’t matter.

1 Like

@mstormi @rossko57

After moving through the Java settings, I noticed some difference I think with option 2. Just to reiterate the issue, its slow response to rules after the rules havnt been processed for a period of hours, though direct on/off mqtt commands that arent rule driven, run perfectly fine.

So my initial experiences pre any changes was about a 22 second from button press to activation. After moving to setting 2 (add option 1 and option 2 to the EXTRA_JAVA_OPTS)… My response time seems to have dropped to 13 seconds (when the issue occurs) and has remained that way over testing, during the last 2 days. Ive not experienced a longer delay. (please note, I am using -XX:+UseParNewGC and not -XX:+UseParallelGC)

Options for EXTRA_JAVA_OPTS

  1. -XX:-UseAdaptiveSizePolicy -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=95
  2. -XX:MinHeapFreeRatio=15 -XX:MaxHeapFreeRatio=30
  3. -XX:+UseCodeCacheFlushing

Example from this morning with a 13 second delay

2020-05-02 10:14:21.201 [vent.ItemStateChangedEvent] - RFDataLights changed from NULLFORTESTING to 811301
2020-05-02 10:14:33.634 [ome.event.ItemCommandEvent] - Item 'gUpstairs' received command ON
2020-05-02 10:14:33.677 [ome.event.ItemCommandEvent] - Item 'lsDesk' received command ON
2020-05-02 10:14:33.716 [ome.event.ItemCommandEvent] - Item 'lsWillow' received command ON
2020-05-02 10:14:33.718 [vent.ItemStateChangedEvent] - RFDataLights changed from 811301 to NULLFORTESTING
2020-05-02 10:14:33.720 [nt.ItemStatePredictedEvent] - lsDesk predicted to become ON
2020-05-02 10:14:33.757 [nt.ItemStatePredictedEvent] - lsWillow predicted to become ON
2020-05-02 10:14:33.774 [vent.ItemStateChangedEvent] - lsDesk changed from OFF to ON
2020-05-02 10:14:33.787 [vent.ItemStateChangedEvent] - lsWillow changed from OFF to ON

My system has 2462mb ram free, or 72% of memory free, so it shouldnt be a memory issue.

As mentioned, once the rule has been used, things remain responsive for the next X hours, dropping down to a 1-2 second response time.

Also, on one of my tests where I experienced a delay, I had literally been on a remote desktop session on the server 1 minute before, though not interacting with Openhab… I only mention this, so as to point out its not like the system was idling in a low power state or something (which I dont think it drops into anyway).

So I guess its something else.

As an aside, I have a Raspberry Pi 3 A+ turning up for another project im working on. I can for a couple of days, setup Openhabian on it, take all my rules etc over and see if I experience the same issues… so we will have a comparative to see if the problem follows with the hardware/OS.

Happy to look at any other suggestions you guys have. Again, thanks for looking into this with me.

1 Like

Hmm, so none of the options made your problem go away, right?
Less delay but still delay still means it’s garbaged-out.

Albeit I don’t think it would make a difference, any reason not to try -XX:+UseParallelGC ?

Did you change -Xms meanwhile ? Ran this in addition to those new -XX ?
Which -Xms and -Xmx do you actually use ?

BTW how did you determine the delay? Differences in events.log only?
Can you insert specific logging at the beginning of your rule and correlate that with the button press in event.log, please.

Could you please add -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<file>, see

Finally, could you please just for a test try two extreme ones:
-Xcomp and -XX:+NeverTenure. You can start with both but if it makes a difference I need to know their individual effect

1 Like

@mstormi

1) Albeit I don’t think it would make a difference, any reason not to try -XX:+UseParallelGC ?
You may have missed my post above (linked below). The -XX:+UseParallelGC option caused my Openhab to never actually start. My testing/conclusion is in this post.
https://community.openhab.org/t/8-seconds-lag-always-after-few-minutes-without-script-activation/93882/47?u=ewrw

2) Did you change -Xms meanwhile ? Ran this in addition to those new -XX ?
Which -Xms and -Xmx do you actually use ?

Im not using anything else, so my EXTRA_JAVA_OPTS looks like this currently

EXTRA_JAVA_OPTS="-XX:-UseAdaptiveSizePolicy -XX:+UseParNewGC -XX:InitiatingHeapOccupancyPercent=95 -XX:MinHeapFreeRatio=15 -XX:MaxHeapFreeRatio=30 -XX:+UseCodeCacheFlushing"

3) BTW how did you determine the delay? Differences in events.log only?
Can you insert specific logging at the beginning of your rule and correlate that with the button press in event.log, please.
On my 1st post on here, I got the 22 second delay info from MQTT Browser showing the press of the button and then the MQTT response:
https://community.openhab.org/t/8-seconds-lag-always-after-few-minutes-without-script-activation/93882/29?u=ewrw

On another post, part way through the Java options, this is an event log, showing a 13 second delay, with RFDataLights changed from NULLFORTESTING to 3D1301 being the button press.

2020-04-29 20:10:43.743 [vent.ItemStateChangedEvent] - RFDataLights changed from NULLFORTESTING to 3D1301
2020-04-29 20:10:56.670 [vent.ItemStateChangedEvent] - Current_DateTime changed from 2020-04-29T20:09:56.642+0100 to 2020-04-29T20:10:56.649+0100
2020-04-29 20:10:56.913 [ome.event.ItemCommandEvent] - Item 'gBedroom1' received command ON
2020-04-29 20:10:56.940 [ome.event.ItemCommandEvent] - Item 'lsBed1' received command ON
2020-04-29 20:10:56.954 [ome.event.ItemCommandEvent] - Item 'lsBed1a' received command ON
2020-04-29 20:10:56.961 [vent.ItemStateChangedEvent] - RFDataLights changed from 3D1301 to NULLFORTESTING
2020-04-29 20:10:56.964 [nt.ItemStatePredictedEvent] - lsBed1 predicted to become ON
2020-04-29 20:10:56.977 [nt.ItemStatePredictedEvent] - lsBed1a predicted to become ON
2020-04-29 20:10:56.992 [vent.ItemStateChangedEvent] - lsBed1 changed from OFF to ON
2020-04-29 20:10:56.994 [vent.ItemStateChangedEvent] - lsBed1a changed from OFF to ON

On my post 2 posts up, I have another sample from the event log, with a similar result:

2020-05-02 10:14:21.201 [vent.ItemStateChangedEvent] - RFDataLights changed from NULLFORTESTING to 811301
2020-05-02 10:14:33.634 [ome.event.ItemCommandEvent] - Item 'gUpstairs' received command ON
2020-05-02 10:14:33.677 [ome.event.ItemCommandEvent] - Item 'lsDesk' received command ON
2020-05-02 10:14:33.716 [ome.event.ItemCommandEvent] - Item 'lsWillow' received command ON
2020-05-02 10:14:33.718 [vent.ItemStateChangedEvent] - RFDataLights changed from 811301 to NULLFORTESTING
2020-05-02 10:14:33.720 [nt.ItemStatePredictedEvent] - lsDesk predicted to become ON
2020-05-02 10:14:33.757 [nt.ItemStatePredictedEvent] - lsWillow predicted to become ON
2020-05-02 10:14:33.774 [vent.ItemStateChangedEvent] - lsDesk changed from OFF to ON
2020-05-02 10:14:33.787 [vent.ItemStateChangedEvent] - lsWillow changed from OFF to ON

The rule I have, changes RFDataLights, back to “NULLFORTESTING” so that I always know that variable is changed to a new variable and therefore logged each time its used by a device.

Im assuming you want me to ADD these new options onto the Java EXTRA_JAVA_OPTS that I have… I will read the document you sent, get that set up… I will get you a recording with and without the extreme options… Give me a few more days!!

Thanks :slight_smile:

@mstormi Im still working my way through these settings, though, I started the Java loging BEFORE I added in your extreme settings and I recorded a test until the issue occurred. Thankfully the logs are pretty small (maybe 100 lines long). I thought id at least give you this to check over initially as it may give you what youre looking for. There is a zip file that can be downloaded here for the next 10 days or so (its only 22kb in size and contains only 3 x text files). (and yes… Ive remove any identifiable information e.g. GPS coordinates, openhab cloud details etc).

Please note, in this log, I have NOT added any extra memory settings to Java startup (Xmx and Xms)

Ive added a little bit of extra info at the top of each file to show where to look for the button press occurring.

Thanks

A post was merged into an existing topic: Rules to reload on thing changes

@mstormi Sorry its take a while for me to get through this, things became busy for me for a few days.

JAVA bits
I have results though and I can say that the 1 line that appeared to have worked:

-XX:+NeverTenure -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-UseAdaptiveSizePolicy -XX:+UseCodeCacheFlushing -XX:+UseParNewGC

That seemed to stay solid without delays occurring, though, it probably requires a longer test run. Im uploading my 4 separate test result information for you to look at, though, one thing I cant explain is that some of the Java logs have lots of NULL NULL NULL or whitespace in them… perhaps this was a memory issue. So maybe they arent going to be useful without another run through of the tests.

File is here… should be a 32kb ZIP file “OpenhabTests.zip”…available for 30 days from today.

I tried cutting the line down to the -XX:+NeverTenure on its own… but that resulted in delays…so Im going to try this

EXTRA_JAVA_OPTS="-XX:+NeverTenure -XX:-UseAdaptiveSizePolicy -XX:+UseCodeCacheFlushing -XX:+UseParNewGC -Xms500m -Xmx800m"

(adding in the XMX and XMS)

Opehhabian - Why it failed on a generic ARM/Linux install
This is a side note… I had a separate arm system running Armbian (Ubuntu for ARM) and I just re-tried to load Openhabian on it to see what happened. I found out why Openhabian, in its current form doesnt work on Generic ARM systems like this.

On those systems, you have a BOOT volume, which contains files like “aml_autoscript” “s905_autoscript” etc…

In the Opehabian installation menus, menu item 13 “System Tweaks”… it deletes/changes some files on that Boot volume… and next time you reboot, your system doesnt boot because some of the files are now missing. Put a copy of your files from that Boot volume back (if you have a copy, which I did) and it all boots again and Openhabian install can be completed, as long as you dont run that option again.

As I bothered to test this, I thought it worth putting somewhere on the forums… or noting it to people.

Thanks

Thank you for your testing efforts … not sure yet what to make of it, though. Will try on my own box in the first place.
However, on your (intermediate?) result, I get a warning on my RPi that I do not get when I use -XX:+UseParallelGC instead of -XX:+UseParNewGC:
OpenJDK Client VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release

So that probably is not a recommendable combination.
Please keep me updated.

On Armbian, … well. We decided not to support openHABian on top of Armbian some time ago already for a number of reasons (one being that we do not have the dev and test capacities) and I wouldn’t think we’ll revert that.
Then again, it’s also not our intention to prohibit install on specific OS or HW so I wouldn’t mind though to eventually make a change that does not delete files you would need to have present in ARMbian.
Could you please be more specific what’s missing and after what operation it went missing?
Menu option 13 actually calls the routine misc_system_settings() in /opt/openhabian/functions/system.bash but at first glance it does not delete anything there so I’m not sure you’re right on that part.

@mstormi Apologies… you are correct, its not that option… I had to flash a new system to run through the install again. I used Khadas Linux this time, as I had a system I could quickly restore to factory (Ubuntu on ARM again nevertheless).

Its the “Packages” installation option… and whilst it flashed by pretty quickly, I would guess its one of the first 8-12 changes/items it installs that does it. Heres the Before and After of the Boot volume. As you can see, quite a few files go missing… though as mentioned, if you copy the files back to the Boot volume, it will boot again without issue.

Sorry for getting the wrong menu item (I was recalling from my test yesterday and thought I had noticed the correct one).

Thanks

P.S. And yes… if you DO restore your boot files back… from what I can tell, the rest of the Openhabian install is perfect and everything works fine… so it is possible to get Openhabian installed, with the one caveat to backup your boot files before you start and restore them after the packages bit.

@mstormi

If you can tell me where to find the packages script… and if its possible to pick out line by line what its doing/installing… and there arent 1000 packages… I can run through it line by line (copy/paste) and tell you which actual line is the one that does it.

Thanks

/opt/openhabian/functions/packages.bash, yeah sorry it’s a bit lengthy so cumbersome to work through. Thanks for taking on that.

set mode=debug_maximum in /etc/openhabian.conf to get verbose output.

@mstormi Ive found it!! Arping was the install package and Im guessing its the line Removing linux-board-package-bionic-vim3 (0.8.1) … as the files were deleted mid way through arping being installed.

apt-get -y install arping

Reading package lists… Done
Building dependency tree
Reading state information… Done
The following additional packages will be installed:
libnet1 libpcap0.8
The following packages will be REMOVED:
iputils-arping linux-board-package-bionic-vim3
The following NEW packages will be installed:
arping libnet1 libpcap0.8
0 upgraded, 3 newly installed, 2 to remove and 206 not upgraded.
Need to get 170 kB of archives.
After this operation, 439 kB of additional disk space will be used.
Get:1 http://ports.ubuntu.com bionic-security/main arm64 libpcap0.8 arm64 1.8.1-6ubuntu1.18.04.1 [104 kB]
Get:2 http://ports.ubuntu.com bionic/main arm64 libnet1 arm64 1.1.6+dfsg-3.1 [39.4 kB]
Get:3 http://ports.ubuntu.com bionic/universe arm64 arping arm64 2.19-4 [26.9 kB]
Fetched 170 kB in 0s (1,972 kB/s)
(Reading database … 110836 files and directories currently installed.)
Removing linux-board-package-bionic-vim3 (0.8.1) …
Removing iputils-arping (3:20161105-1ubuntu3) …
Selecting previously unselected package libpcap0.8:arm64.
(Reading database … 110687 files and directories currently installed.)
Preparing to unpack …/libpcap0.8_1.8.1-6ubuntu1.18.04.1_arm64.deb …
Unpacking libpcap0.8:arm64 (1.8.1-6ubuntu1.18.04.1) …
Selecting previously unselected package libnet1:arm64.
Preparing to unpack …/libnet1_1.1.6+dfsg-3.1_arm64.deb …
Unpacking libnet1:arm64 (1.1.6+dfsg-3.1) …
Selecting previously unselected package arping.
Preparing to unpack …/arping_2.19-4_arm64.deb …
Unpacking arping (2.19-4) …
Setting up libnet1:arm64 (1.1.6+dfsg-3.1) …
Setting up libpcap0.8:arm64 (1.8.1-6ubuntu1.18.04.1) …
Setting up arping (2.19-4) …
Processing triggers for libc-bin (2.27-3ubuntu1) …
Processing triggers for systemd (237-3ubuntu10.33) …
Processing triggers for man-db (2.8.3-2ubuntu0.1) …
Processing triggers for initramfs-tools (0.130ubuntu3.9) …
ln: failed to create hard link ‘/boot/initrd.img-4.9.206.dpkg-bak’ => ‘/boot/initrd.img-4.9.206’: Operation not permitted
update-initramfs: Generating /boot/initrd.img-4.9.206

Glad you found it. But it makes me wonder why it UNinstalls that, and more important, how to work around that.
Package dependencies are defined in packages themselves, i.e. depend on distribution (you are using what exactly?).
Just tested on my box and on Raspbian (buster at least), removing arping will NOT remove that package like on your box.

@Benjy do you know of a sleek way to avoid this uninstall to happen ? preferrably a command to work with any distro ?
I would like to avoid adding “if is_armbian()” exceptions in openHABian while we’re busy streamlining that.

@ewrw could you try apt-get install arping linux-board-package-bionic-vim3- ?
Note the ‘-’ at the end, eventually try a ‘+’ instead.

The Systems/OS’s
These are Amlogic based ARM chipset systems that have those boot files and the OS’s I have tried are Ubuntu based. Ive searched the internet for known issues, but havnt found anything after 5 minutes searching.

I have tried Khadas VIM OS https://www.khadas.com/ (they are a board manufacturer who makes their own Ubuntu variant for their boards dl.khadas.com - Index of /firmware/vim3/)

Ive also used Armbian OS, which my Opehab is running atm. And I know from previous experience, that the effect is the same with the Packages install… though the board package file name will be different… I can test/find the name on that install if you want? (I just need to prep it for a recovery setup, just in case, as its on internal storage).

Test Results
The - (minus) will install and DOES delete the files and the + (plus) errors and wont complete.

apt-get install arping linux-board-package-bionic-vim3-

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libnet1 libpcap0.8
The following packages will be REMOVED:
  iputils-arping linux-board-package-bionic-vim3
The following NEW packages will be installed:
  arping libnet1 libpcap0.8
0 upgraded, 3 newly installed, 2 to remove and 0 not upgraded.
Need to get 170 kB of archives.
After this operation, 439 kB of additional disk space will be used.
Do you want to continue? [Y/n] ***I said YES and it did delete the files***

apt-get install arping linux-board-package-bionic-vim3+

Reading package lists... Done
Building dependency tree       
Reading state information... Done
linux-board-package-bionic-vim3 is already the newest version (0.8.1).
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 linux-board-package-bionic-vim3 : Depends: iputils-arping but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

@mstormi I did just have a quick check…from the files that are deleted by the ARPING installation, boot.ini and boot.scr exist on both an SD card or NAND/EMMC installations of Ubuntu.

By this, I mean that if you have Ubuntu for Amlogic systems running off an SD card, you get files like aml_autoscript, which is removed from the boot area if you then install your Ubuntu OS to the onboard NAND/EMMC internal storage. But, the boot.ini and boot.scr are present in BOTH situations (SD card or EMMC/NAND installed OS)… and of course, if you then install ARPING, these files are deleted, be it on SD card or EMMC/NAND.

Obviously that could change on day I guess… but you could test for the presence of those files in /boot/ (no idea if those files exist on RasPi) (I thought this a bit more broad as a test than “is it Armbian” etc as it may target ALL Amlogic based Ubuntu, possibly other Linux variant systems).

I cant say what other linux variants (debian etc) may have this issue… and I cant speak for other chipsets like Rockchip, Allwinner etc… though I know other chipsets use a slightly different boot setup… but may still contain the boot.ini and boot.scr in the /boot/ area

Its not a problem for me, but Im happy to help you guys test and maybe understand what can/cant be done :slight_smile: Im very appreciative to have Openhab and the work/help you guys have given.

I wouldn’t think it is intentional or even required to remove that when you install arping.
Could you raise that as a bug with your OS ?
BTW what exactly is your OS right now you have this problem on ?

So an openHABian install on ARMbian also uninstalls required /boot files ?
Then yes please.

BTW check out Amanda inside (available in openhabian-config) for backup (incl. making a recovery SD copy).

@mstormi Ive just performed a quick test on my live Armbian system… It didnt delete the files, however, I think this is because I need to do a fresh install of the OS.

What I found on my test system (the Khadas one), is that once you have run the ARPING install once and then restored the boot area… if you uninstall and re-install ARPING… it doesnt perform the same action where it deletes the boot area. Its obviously satisfied it did whatever it needed to do on the 1st installation, so 2nd installation it doesnt bother.

My Armbian install is ??? maybe 18-24 months old and I think I did have to refresh the boot area at some point in that time… so it may be that the ARPING install thinks its done, whatever it needed to do to the boot files and isnt trying again.

If you can give me a couple of days with this, as I will have an spare system that I can wipe/trash/destroy/rebuild that does work with Armbian (my Khadas system wont boot Armbian).

Oh… Yes… my 1st OS I had the problem with was Armbian OS on a Beelink TV box…Though I just recently got my hands on the Khadas system and I thought off the back of our previous posts (Java and Openhabian specific customization), Id just for the fun of it, try Openhabian install to see what happens on Khadas-Ubuntu OS (does it work or not).

Not one for this situation, no. The package linux-board-package-bionic-vim3 (maintained by Khadas) requires iputils-arping, but installing arping conflicts with iputils-arping (they’re two different implementations that can’t co-exist) so apt must remove iputils-arping and anything that depends on it before it can install the package it’s been asked to install.