OH3 Spontaneously Restarts

I am running 2 OH3.0.2 systems, 1 for my home security system and the other at my daughter’s house to run the pool/spa system. Both systems utilize the openhabcloud service. I notice on occasion that both systems restart spontaneously within minutes of one another. For example, today 1 at 5:38pm and the other at 5:51pm. Does anyone know why this is happening? Is something being pushed down from the cloud that causes a restart?

Can you show us your logfiles?

I am currently experiencing similar issues. I don’t know how long this has been going on, I was suspecting something like this happening for quite a while because I sometimes heard the switching of my relays used for my rollershutters at times when they really should not do anything (shutter down and relay activates the down switch).

Just recently I added a notification that sends me a message whenever the system has restarted and I found out that the system really restarts from time to time. However I could not find anything suspicious in the logs yet.

It seems however to occur more often in times when my server is at high load due to e.g. backup activity but I can’t really put my finget on it yet. Any hints how to debug this behaviour are really appreciated.

Oh and I am on 3.3.0 release build in docker.

You’re going to have to dig more to establish what goes on. If, say, the hardware ‘resets’ due to power blip, of course openHAB restarts too but no amount of poking at openhab.log will tell you why. Likewise if something like a backup causes some file system to timeout, etc.

We can rule out a power surge as 1 of my systems is running behind an UPS.

Please remember that my 2 totally independent systems restart within minutes of one another. This smells like it is somehow cloud related.

BTW - Both systems send me a notification and email when they restart. That’s how i know of this problem. I can probably check email history to report when the spontaneous restarts have occurred.

I can also rule out a power outage. The docker container and the whole machine keeps running, only OH seems to be restarting. This is annoying since this causes connection issues with serial devices as the lock files do not belong to the actual running OH instance.

I really suspect it might have something to do with my backup system… It is sort of reproducible that these reboots occur when I trigger the backup process and the backup is indexing the file system…

That would not explain why i experience it on 2 independent systems miles away from one another.

True, it seems that out issues are probably not related but we only see the same symptoms. But it might be that I am wrong… I can’t really proof yet that there is a connection between my backup and the restarts. I might just disable the backup for a week and see if it happens again.

Well I guess I would need to figure out how to configure log rotation for the openhab.log. Only my events.log does rotate so far… must be some change in the log4j config but I haven’t figured out what to change there yet.

should be default to keep 7 files of history:

               <!-- Rolling file appender -->
                <RollingFile fileName="${sys:openhab.logdir}/openhab.log" filePattern="${sys:openhab.logdir}/openhab.log.%i.gz" name="LOGFILE">
                        <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} [%-5.5p] [%-36.36c] - %m%n"/>
                        <Policies>
                                <OnStartupTriggeringPolicy/>
                                <SizeBasedTriggeringPolicy size="16 MB"/>
                        </Policies>
                        <DefaultRolloverStrategy max="7"/>
                </RollingFile>

                <!-- Event log appender -->
                <RollingRandomAccessFile fileName="${sys:openhab.logdir}/events.log" filePattern="${sys:openhab.logdir}/events.log.%i.gz" name="EVENT">
                        <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} [%-5.5p] [%-36.36c] - %m%n"/>
                        <Policies>
                                <OnStartupTriggeringPolicy/>
                                <SizeBasedTriggeringPolicy size="16 MB"/>
                        </Policies>
                        <DefaultRolloverStrategy max="7"/>
                </RollingRandomAccessFile>
ls -lt *log*
-rw-r--r-- 1 openhab openhab 10007563 Sep 15 08:06 events.log
-rw-r--r-- 1 openhab openhab   473986 Sep 15 08:00 openhab.log
-rw-r--r-- 1 openhab openhab  1780861 Sep  6 08:30 events.log.7.gz
-rw-r--r-- 1 openhab openhab  1791276 Aug 22 08:45 events.log.6.gz
-rw-r--r-- 1 openhab openhab   904107 Aug  6 22:20 events.log.5.gz
-rw-r--r-- 1 openhab openhab    21309 Aug  6 22:20 openhab.log.7.gz
-rw-r--r-- 1 openhab openhab  1738729 Jul 26 12:47 events.log.4.gz
-rw-r--r-- 1 openhab openhab     5174 Jul  8 23:55 events.log.3.gz
-rw-r--r-- 1 openhab openhab     2660 Jul  8 23:55 openhab.log.6.gz
-rw-r--r-- 1 openhab openhab   200603 Jul  8 23:45 events.log.2.gz
-rw-r--r-- 1 openhab openhab    15345 Jul  8 23:45 openhab.log.5.gz
-rw-r--r-- 1 openhab openhab  1724345 Jul  7 05:59 events.log.1.gz
-rw-r--r-- 1 openhab openhab    24574 Jun 20 08:12 openhab.log.4.gz
-rw-r--r-- 1 openhab openhab    50749 May 19 17:59 openhab.log.3.gz
-rw-r--r-- 1 openhab openhab     4726 Feb 28  2022 openhab.log.2.gz
-rw-r--r-- 1 openhab openhab      343 Feb 25  2022 openhab.log.1.gz

Yes I hab a look at my log4j config yesterday and came to the same conclusion. However it does not work for the openhab.log so there must be a difference. I cannot remember to have ever changed something within the file myself though. I’m not at home right now but will compare my config to the one in the official repo later.

Ok I don’t know what I was looking at… the logs are there… must be some sort of long covid symptoms or just my age…

However my restarts also seem to be dependent on the moon phase or something. Hadn’t had a single reboot today

I had another restart today but there’s absolutely nothing in the logs. It’s just running normally and then the log just ends, no signs of a shutdown, no exceptions or anything. In the new logging a see a normal startup sequence.

When the JVM crashes you’ll find a hs_err_pid log file in your userdata dir which may help with explaining restarts.

1 Like

Thank you, I found the core dump.

Apparently it has something to do with the enOcean binding:

---------------  T H R E A D  ---------------

Current thread (0x00007f1018061800):  JavaThread "OH-thingHandler-2" daemon [_thread_in_native, id=609, stack(0x00007f0fc554a000,0x00007f0fc564b000)]

Stack: [0x00007f0fc554a000,0x00007f0fc564b000],  sp=0x00007f0fc56493a0,  free space=1020k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libNRJavaSerial.so+0x7723]  read_byte_array+0x1f3

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 24000  gnu.io.RXTXPort.readArray([BII)I (0 bytes) @ 0x00007f106ccad90c [0x00007f106ccad8c0+0x000000000000004c]
J 30142 c2 gnu.io.RXTXPort$SerialInputStream.read([BII)I (230 bytes) @ 0x00007f106cbd7a04 [0x00007f106cbd78a0+0x0000000000000164]
J 35652% c1 org.openhab.binding.enocean.internal.transceiver.EnOceanTransceiver.receivePackets()V (45 bytes) @ 0x00007f10684ab524 [0x00007f10684ab120+0x0000000000000404]
j  org.openhab.binding.enocean.internal.transceiver.EnOceanTransceiver$1.run()V+4J 4524 c1 java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; java.base@11.0.15 (14 bytes) @ 0x00007f10657cd1cc [0x00007f10657cd0c0+0x000000000000010c]
J 4742 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.15 (123 bytes) @ 0x00007f1065865f4c [0x00007f10658658a0+0x00000000000006ac]
J 6984 c1 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V java.base@11.0.15 (57 bytes) @ 0x00007f1065d81c94 [0x00007f1065d81b80+0x0000000000000114]
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.15
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.15
J 8909 c1 java.lang.Thread.run()V java.base@11.0.15 (17 bytes) @ 0x00007f1066137eb4 [0x00007f1066137e40+0x0000000000000074]
v  ~StubRoutines::call_stub

It looks similar to nrjavaserial#136 but that is supposed to be fixed already. If you keep running into the same issue it would be worth to create a new issue in the nrjavaserial issue tracker for this.

I haven’t run into this myself with my serial devices using the OH 3.3 Docker images. Though I only actively use the 64-bit (amd64/aarch) Debian images. It could be a different experience when using the 32-bit ARM (armhf) image or the Alpine images. :upside_down_face:

I also use the Debian 64 bit images. If this happens again I will post another issue there. Thanks for your input!

Hi,

same or me, I have Openhab service restart form time to time without any explanation. Other services keep running smooth (essentially zigbee2mqtt) and the load is very low.
I don’t know where to search, I have no clue
grep ‘Started openHAB’ /var/log/syslog
Sep 11 18:19:50 openhabian systemd[1]: Started openHAB - empowering the smart home.
Sep 19 12:03:41 openhabian systemd[1]: Started openHAB - empowering the smart home.
Sep 19 13:06:06 openhabian systemd[1]: Started openHAB - empowering the smart home.
Sep 19 13:56:03 openhabian systemd[1]: Started openHAB - empowering the smart home.

Any help appreciated.
Dom

is there any crash log of the java virtual machine ?
Search/look for a file called hs_err_pid* in userdata directory as explained by wborn.
Depending on how you installed OH and architecture of your OS the userdata directory could have slightly different locations. Debian apt installations use /var/lib/openhab/ as userdata directory.