Java runs amok weekly

Hi Ross, …and others who contributed…time to give up on this I fear and close the topic as ‘unsolved’. I really thought Ross’s suggestion to disable ‘Nightly Z-wave heal’ had worked but today, 8 days later, Java is at it again. Sometime between 1 and 9 am it got out of control. Have examined all logs and nothing particular stands out. So time to pack it in and try another hardware platform or a different system altogether. Maybe I’ll be luckier with Openhab 3… Thanks to all who offered advice and were generous with their time.
regards, Denis

Final effort to solve Java runaway if anyone can help. Caught Java event in the act!
Java unravels around 01.47 am 27-Oct. I shut down system at 02.00 am.
Had installed binding ‘Systeminfo’ on previous day to log fan speed and CPU temperature and set up a rule to notify me if they went beyond certain limits.
Openhab log, Events log and CPU Temperature log attached for runaway period.
I had installed another Binding ‘Network Discovery’ shortly before the event but deleted the ‘things’ discovered and uninstalled it because I decided it wasn’t useful to me. Is this the culprit this time?? It’s all very odd the way the runaways seem to happen in the early hours even though Z-Wave heal is NOT enabled. Also, ‘Network Discover’ was not involved in previous runaways. All advice very welcome. regards, Java runaway-Events log.txt (226.1 KB) Javarunaway starts at 01.45-openhab-log.txt (5.3 KB) Javarunaway-CPU Temperature log.txt (21.1 KB)

Did you restart OH just before the log starts?

2020-10-27 01:45:29.922 [INFO ] [basic.internal.servlet.WebAppServlet] - Stopped Basic UI\
2020-10-27 01:45:38.538 [INFO ] [ui.habmin.internal.servlet.HABminApp] - Started HABmin servlet at /habmin\
2020-10-27 01:45:38.807 [INFO ] [panel.internal.HABPanelDashboardTile] - Started HABPanel at /habpanel\
2020-10-27 01:45:38.922 [INFO ] [openhab.ui.paper.internal.PaperUIApp] - Started Paper UI at /paperui\
2020-10-27 01:45:39.840 [WARN ] [lipse.smarthome.io.net.exec.ExecUtil] - Execution failed (Exit value: -559038737. Caused by java.io.IOException: Cannot run program "arping" (in directory "."): error=2, No such file or directory)\
2020-10-27 01:45:39.908 [WARN ] [lipse.smarthome.io.net.exec.ExecUtil] - Execution failed (Exit value: -559038737. Caused by java.io.IOException: Cannot run program "arping" (in directory "."): error=2, No such file or directory)\
2020-10-27 01:45:42.339 [INFO ] [zwave.handler.ZWaveControllerHandler] - Attempting to add listener when controller is null\
2020-10-27 01:45:47.500 [INFO ] [ve.internal.protocol.ZWaveController] - Starting ZWave controller\
2020-10-27 01:45:47.501 [INFO ] [ve.internal.protocol.ZWaveController] - ZWave timeout is set to 5000ms. Soft reset is false.\
2020-10-27 01:45:49.354 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = XXXXXXXXXXXXXXXXXXX, base URL = http://localhost:8080)\

These lines indicate that at least a few services re-/starts. Can you include the logs a few more minutes back?

1 Like

Deeply suspicious this is around the time when heals should often happen.

So far as I can see, what you are labelling “java runaway” appears to be a system boot up. We need that preceding info.

I do not like this entry -

2020-10-27 01:45:51.880 [me.event.ThingUpdatedEvent] - Thing 'zwave:serial_zstick:c0ec4738' has been updated.

as before, it suggests you recorded Thing configuration does not quite match the reality discovered.

Hello Anders, no I didn’t restart OH at all at any stage. I have posted the OH log info for earlier (from 22.20 actually) as you requested. Also now posting Events log from earlier till shutdown. I had been experimenting with my ‘fan speed’ rule by ramping up the fan with ‘Mac fans control’ to simulate a runaway during the day with no adverse affects.
Reminder: I’m running OH in its own Desktop folder on MacOSX. Restart done by simply dragging the ‘start.sh’ file into a Terminal window. Shutdown by ‘logout’ in Terminal window. regards and thanks, OpenHab log a bit earlier.txt (2.1 KB) Events log from 0030 until shutdown at 0200.txt (827.1 KB)

Well, let’s be clear …

So, us who cannot see what you are doing are going to assume the apparent reboot at 01:45 is not your doing, but the 02:00 one is your intervention.

Alright, logs show business as usual until 01:45 where we have a major reinitialize. Very similar to a reboot - but maybe it’s just all/many bindings reinitializing.
The first to speak up is zwave - and it is still suspicious that time of day relates to zwave heal activities.
I do not like the stick Thing updating.
But let’s call zwave ‘prime suspect’, not culprit.

There are I think other ways to cause chaos - binding version updates, or failure to to update, etc. What is in your addons.config? People have seen very similar affects with rubbish in there - but usually much more frequently.

In the middle of all thing initializations:

2020-10-27 01:45:39.913 [thome.event.ExtensionEvent] - Extension 'binding-network' has been installed.\

As you suspected, that might be the cause this time, but unfortunately doesn’t help with the issue the other times it has happened for you.

1 Like

Yes absolutely Ross.

Z-Wave heal definitely deactivated using Habmin.

Addons.config folder is empty. I have only 9 bindings installed: Astro, Automower, Expire, Harmony Hub, OpenWeatherMap map, Sysinfo, Tradfri (which I don’t use so far), Weather Binding ver 1.14.0 (which I don’t use and dunno why I installed it), Z-Wave. Also, Map & Scale Transformations, OpenHan cloud com add-on and MapDBpersistence.
Ross can you step me through your suggested process of deleting all my ‘things’ (I’m using paper UI for adding things)…do I delete the Z-Wave stick first or the other things first? Just wary of mucking up…I do have backups done. Thanks,

addons.config is not a folder, it is file. Squirelled away in some other place from the addons folder I think.
addons.config is a working copy that we are not supposed to edit, but if it gets messed up sometimes you have to.
It is supposed to mirror the similarly named addons.cfg file.

Don’t rush around deleting all Things. One step at a time.
Guessing you not yet attempted this as described earlier, focus on the stick Thing.
Note the UID.
Delete the Thing.
Reboot for belt and braces.
Shortly find the rediscovered stick I think? Zwave discovers? If not, manually recreate however you did it before.
Make sure the UID is the same and all its children should reattach.

The point here is to create the Thing under the current binding version, not use a historic Thing shape.

Haha isn’t it always the way that stuff gets squirrelled away :upside_down_face: I probably could hunt it down in unix but I’ll go with your advice and leave it alone for the moment.
I’ll follow your steps and just focus on the Aeotec stick Thing and delete & re-add etc. I’ll then do nothing else to the system and see how it all runs during the week…so that way we’re only changing one part at a time…does that sound like a good plan?
ps. do you suspect anything with that old ‘Weather binding’ ver 1.14.0? I may have installed that solely for Habpanel at some stage when experimenting. thanks again,

Why keep it if not using it.

You might also look into that network binding oddity @pacive noted - that sounds like the work of addons.cfg
Maybe it’s gone away after a reboot after reinstalling, maybe it hasn’t.

Hello Ross, more bad news. I deleted and reconnected the Z-Wave stick as you recommended 4 days ago. Everything fine until early hours of this morning…must be the halloween factor… :alien: My Java/temperature/fan speed rule alerted me at 02.27 am. I shutdown OpenHab at 03.28 am
I attach events log from 01.30 until shutdown at 03.28. You’ll see the CPU temperature creeping up from 02.27 onwards. MacMini doing nothing, just idling quietly. I wasn’t on it. Found Java hogging >220% of CPU!! (using Terminal app utility).
Also attach filtered events log to spot those ‘double astro’ events you mentioned in earlier posts. I can’t see anything drastic in the events log that might explain Java runaway. Also, I checked the Java garbage/thread situation before shutdown in the Karaf terminal using ‘shell:info’ and all seemed relatively normal. I’m really baffled and don’t want to abandon Openhab after all my time investment and designing a nice Habpanel interface too. Hope you have a few more suggestions. regards,Astro double events.txt (3.2 KB) Events log 31Oct.txt (377.0 KB)

Have you tried to find the thread which consume the CPU? E.g. top command in mac doesn’t support CPU usage of threads, but you can use openHAB (Karaf) console to get CPU information as well by command threads --list

openhab> threads --list
Id      │ Name                                                                                           │ State         │ CPU time │ Usr time
────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────┼──────────┼─────────
1       │ main                                                                                           │ WAITING       │ 3496     │ 1200
2       │ Reference Handler                                                                              │ WAITING       │ 16136    │ 14150
3       │ Finalizer                                                                                      │ WAITING       │ 35762    │ 28700
5       │ Signal Dispatcher                                                                              │ RUNNABLE      │ 0        │ 0
11      │ Active Thread: Equinox Container: ac6166e9-ea21-4362-80bb-4ff7b65ae114                         │ TIMED_WAITING │ 289277   │ 203940
  1. Execute threads --list command

  2. Keep a e.g. minute delay

  3. Execute threads --list command again

  4. Try find the thread which CPU time have increased most.

  5. Then you can get more information about thread by command threads <id>

     openhab> threads 11
     Thread 11 Active Thread: Equinox Container: ac6166e9-ea21-4362-80bb-4ff7b65ae114 TIMED_WAITING
     Stacktrace:
     sun.misc.Unsafe.park line: -2
     java.util.concurrent.locks.LockSupport.parkNanos line: 215
     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos line: 2083
     java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take line: 1093
     java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take line: 809
     java.util.concurrent.ThreadPoolExecutor.getTask line: 1074
     java.util.concurrent.ThreadPoolExecutor.runWorker line: 1134
     java.util.concurrent.ThreadPoolExecutor$Worker.run line: 624
     java.lang.Thread.run line: 748
    

I would bet that your problem is related to nrjavaserial (gnu.io.RXTXPort).

Hello Pali, thanks for your reply. If I understand you correctly, I need to do all this next time there’s a Java runaway?
Am I to specifically look at the ‘Equinox Container’ thread when that happens? What will we be expecting to see then? I’m not well versed in some of this tech detail, sorry, and thus I don’t understand the connection with this ‘nrjavaserial’ item. What can be done if this ‘nrjavaserial’ is to blame?? regards,

Yes

No, that was just an example. See point 4.

openHAB use nrjavaserial implementation for serial port communication and that have many know issues (also CPU related).

Ok Pali, thank you for the clarification. I’ll follow this next time. If you’re right, then it means I can’t have a stable system because there’s an intrinsic problem with OpenHab & Java. Why don’t many others seem to have this problem? Is it a MacOSX & Java & serial-USB ports issue…? I’m out of my depth now… :slightly_frowning_face:

I have run openHAB also in mac mini several years and nowadays I run it on Intel NUC/Linux. I have met serial port CPU issues in both systems time to time.

But let’s not make any sudden decision before there is any facts from your system.

Sorry if that‘s OT…

I am running my production environment on Virtualbox which is free. Host is a current Mac mini and Debian as Guest OS… Everything works like a charme - even accessing usb-devices connected to the host works without problems… Maybe it might be an option for you too…

Hello Flip, thanks for your interesting idea… I have tried out VirtualBox actually on my MacMini and took “Home Assistant” for a spin…oops…yes went to the dark side for a day… :shushing_face: …whilst the overall system ran fine, Z-Wave was absolutely cr*p…terrible latency and unreliable, so I quickly reverted back to OpenHab… :innocent: … indeed OpenHab’s Z-Wave performance is excellent. So yes I’ll try your idea…What packaged form of OpenHab should I install?.. is there a particular disk image version? regards,

Hi Pali, did you ever manage some workaround or modus operandi when these serial port issues arose?