Weird issue with rotated logfiles disappearing

I have had a long-standing issue with my zwave devices reporting 100% battery until they die, but I will save that for another post. The motion sensor in my porch died sometime yesterday even though it reported 100% battery earlier in the day. So I thought I had the perfect opportunity to find out the contents of that last message from my logfiles, which also contain zwave debugging messages.

Unfortunately when I checked userdata/logs/openhab.log it only contains messages from 2018-02-26 05:14:48.562 to now, and earlier logfiles (.1, .2, .3) are missing:

/opt/openhab$ ls -l userdata/logs
total 152684
-rw-r--r-- 1 openhab openhab      856 Feb 25 10:51 audit.log
-rw-r--r-- 1 openhab openhab  2446310 Feb 26 10:11 events.log
-rw-r--r-- 1 openhab openhab 10485781 Aug  9  2017 events.log.4
-rw-r--r-- 1 openhab openhab 10485797 Jul  7  2017 events.log.5
-rw-r--r-- 1 openhab openhab 10485782 Jun  9  2017 events.log.6
-rw-r--r-- 1 openhab openhab 16777336 Feb 21 16:40 events.log.7
-rw-r--r-- 1 openhab openhab 10485917 Apr 12  2017 events.log.8
-rw-r--r-- 1 openhab openhab 10485790 Mar 15  2017 events.log.9
-rw-r--r-- 1 openhab openhab 15388313 Feb 26 10:11 openhab.log
-rw-r--r-- 1 openhab openhab 10485916 Dec 18 20:16 openhab.log.4
-rw-r--r-- 1 openhab openhab 10504620 Dec 18 18:12 openhab.log.5
-rw-r--r-- 1 openhab openhab 10485880 Dec 18 16:16 openhab.log.6
-rw-r--r-- 1 openhab openhab 16777304 Feb 26 05:14 openhab.log.7
-rw-r--r-- 1 openhab openhab 10485866 Dec 18 10:44 openhab.log.8
-rw-r--r-- 1 openhab openhab 10485770 Dec 18 07:45 openhab.log.9

Even stranger, openhab.log.7 does exist, and seems to be the one immediately before the current openhab.log, as it contains messages from 2018-02-25 23:07:21.914 to 2018-02-26 05:14:48.562. Unfortunately I needed messages from earlier on 25th.
I run openhab in a docker container:

$ docker ps | grep openhab
41c198235d6a        openhab/openhab:2.2.0-amd64-debian                       "/entrypoint.sh go..."   3 weeks ago         Up 16 hours                                                                                                                                                                                                                              openhab_openhab_1

(I rebooted the server yesterday).

It’s a fairly beefy physical server at my house with 16 vCPUs, 24GB RAM, ~300GB free space on that partition etc.

If you are on a SBC running off of an SD card, this is a classic symptom of a failing SD card.

Since that isn’t the case here I can’t say what the problem is. I’ve not seen reported nor have I experienced this sort of problem running off of SSDs or HDDs.

Me too.

/home/openhab/openhab2/userdata/logs
portland2% ls -l openhab.log*
-rw-r--r-- 1 openhab openhab  7334574 Feb 26 20:05 openhab.log
-rw-r--r-- 1 openhab openhab 16777233 Feb 26 19:53 openhab.log.7
portland2% 

Where’s the rest of them?

One thing I do notice is that it is pretty unusual to fill up openhab.log and have a rotate in one day. Do you have a lot of errors?

What is the time gap between the last entry in openhab.log.7 and the first entry in openhab.log?

Do either of you have any other log rotate scripts that might be impacting this folder running?

I have zwave debug logging enabled, and I’ve got ~40 active zwave devices.

I wiped out all of the logs yesterday and interestingly it already looks like it’s gone wrong. Notice the timestamps of the .1, .2 and .3 files are from yesterday and the rest are from today. So it’s possible that something is causing it to skip .1, .2 and .3, and then maybe logrotate eventually deletes them because they’re old.

$ ls -l /opt/openhab/userdata/logs/
total 120320
-rw-r--r-- 1 openhab openhab      856 Feb 25 10:51 audit.log
-rw-r--r-- 1 openhab openhab  3339256 Feb 27 20:13 events.log
-rw-r--r-- 1 openhab openhab  2360181 Feb 27 20:14 openhab.log
-rw-r--r-- 1 openhab openhab 16777333 Feb 26 20:06 openhab.log.1
-rw-r--r-- 1 openhab openhab 16777344 Feb 26 20:36 openhab.log.2
-rw-r--r-- 1 openhab openhab 16777280 Feb 26 22:49 openhab.log.3
-rw-r--r-- 1 openhab openhab 16777377 Feb 27 02:37 openhab.log.4
-rw-r--r-- 1 openhab openhab 16777248 Feb 27 08:35 openhab.log.5
-rw-r--r-- 1 openhab openhab 16777355 Feb 27 14:05 openhab.log.6
-rw-r--r-- 1 openhab openhab 16777286 Feb 27 19:32 openhab.log.7
for a in openhab.log*; do echo $a; head -1 $a | cut -d ' ' -f 1-2; tail -1 $a | cut -d ' ' -f 1-2; done
openhab.log
2018-02-27 19:32:59.230
2018-02-27 20:29:16.753
openhab.log.1
2018-02-26 20:06:05.395
2018-02-26 20:06:20.520
openhab.log.2
2018-02-26 20:06:20.520
2018-02-26 20:36:27.285
openhab.log.3
2018-02-26 20:36:27.285
2018-02-26 22:49:54.917
openhab.log.4
2018-02-26 22:49:55.917
2018-02-27 02:37:23.990
openhab.log.5
2018-02-27 02:37:23.991
2018-02-27 08:35:37.696
openhab.log.6
2018-02-27 08:35:37.696
2018-02-27 14:05:28.930
openhab.log.7
2018-02-27 14:05:28.930
2018-02-27 19:32:59.230

I noticed a huge number of ZWaveAssociationGroup messages in the files, this happened as I restarted it yesterday (I also pulled the latest openhab/openhab:2.2.0-amd64-debian as I noticed it had been updated).

When I say huge I really mean it, look at this:

/opt/openhab$ grep "NODE 25" /opt/openhab/userdata/logs/openhab.log.1 | wc -l
55126
/opt/openhab$ grep "NODE 25" /opt/openhab/userdata/logs/openhab.log.1 | grep ZWaveAssociationGroup | wc -l
52523
/opt/openhab$ grep "NODE 28" /opt/openhab/userdata/logs/openhab.log.1 | grep ZWaveAssociationGroup | wc -l
40006
/opt/openhab$ grep "NODE 28" /opt/openhab/userdata/logs/openhab.log.2 | grep ZWaveAssociationGroup | wc -l
42978

Nodes 25 and 28 are “ZMNHDD Flush Dimmer Plus”.

A (relatively) small section of one of the logs showing the repetition, I can provide the whole file on request.

2018-02-26 20:06:20.529 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: config_40_1, 5, 5
2018-02-26 20:06:20.529 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@2adf86e9, [], []
2018-02-26 20:06:20.529 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@4c1ad33, [], []
2018-02-26 20:06:20.529 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@11437afb, [], []
2018-02-26 20:06:20.529 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@14f75ad7, [], []
2018-02-26 20:06:20.529 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@2edadde5, [], []
2018-02-26 20:06:20.529 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@7523a82e, [], []
2018-02-26 20:06:20.529 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@508fdcfb, [], []
2018-02-26 20:06:20.530 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@78a6be59, [], []
2018-02-26 20:06:20.530 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@853a5b1, [], []
2018-02-26 20:06:20.530 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@14dbc983, [], []
2018-02-26 20:06:20.530 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@5a064b1a, [], []
2018-02-26 20:06:20.530 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@12300c54, [node_1_0], [node_1_0]
[... snip ~110 similar lines ...]
2018-02-26 20:06:20.545 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: config_10_2, 255, 255
2018-02-26 20:06:20.545 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@21d509f5, [], []
2018-02-26 20:06:20.545 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Property to update: group_org.openhab.binding.zwave.internal.protocol.ZWaveAssociationGroup@168fcbbe, [], []
[... etc. etc. ...]

@chris does that look normal?

Well duh, they are actually in the correct order, ascending by date. I’m still trying to work out what happened to the missing files in my first message. :slight_smile: Currently looking at the docs for RollingRandomAccessFileAppender.

You can shunt the zwave logs into their own file and see if the problem is related to how quickly they fill up or if it is caused by something specific to openhab.log.

### Custom Loggers ###
# ZWave
log4j2.logger.zwave.name = org.openhab.binding.zwave
log4j2.logger.zwave.level = INFO
log4j2.logger.zwave.additivity = false
log4j2.logger.zwave.appenderRefs = ZWAVE
log4j2.logger.zwave.appenderRef.zwave.ref = ZWAVE

# ZWave
log4j2.appender.zwave.name = ZWAVE
log4j2.appender.zwave.type = RollingRandomAccessFile
log4j2.appender.zwave.fileName = ${openhab.logdir}/zwave.log
log4j2.appender.zwave.filePattern = ${openhab.logdir}/zwave.log.%i
log4j2.appender.zwave.immediateFlush = true
log4j2.appender.zwave.append = true
log4j2.appender.zwave.layout.type = PatternLayout
log4j2.appender.zwave.layout.pattern = %d{dd-MMM-yyyy HH:mm:ss.SSS} [%-5.5p] [%-50.50c] - %m%n
log4j2.appender.zwave.policies.type = Policies
log4j2.appender.zwave.policies.size.type = SizeBasedTriggeringPolicy
log4j2.appender.zwave.policies.size.size = 10MB
log4j2.appender.zwave.strategy.type = DefaultRolloverStrategy
log4j2.appender.zwave.strategy.max = 10

Unless you are actively debugging a problem, I’m not sure I would recommend running at debug or trace level logging all the time.

Anyway, if the problem is caused by the log files rapidly filling up then the zwave.log file will be the one missing its versions. If it is a config problem them openhab.log will be the one with the problem. If both show problems something else is going on.

portland2 logs # head -1 openhab.log.7
2018-02-26 19:30:54.645 [DEBUG] [nternal.connection.OneWireConnection] - check if device exists '3A.E62018000000': 
portland2 logs # tail -1 openhab.log.7
2018-02-26 19:53:23.735 [DEBUG] [ding.onewire.internal.OneWireBinding] - received command OFF for item HVACHeat
portland2 logs # head -1 openhab.log
2018-02-26 19:53:23.735 [DEBUG] [ding.onewire.internal.OneWireBinding] - write to item HVACHeat
portland2 logs # tail -1 openhab.log
2018-02-27 16:45:37.697 [WARN ] [nal.protocol.ZWaveTransactionManager] - NODE 144: Not initialized (ie node unknown), ignoring message.
portland2 logs # 

AFAIK, I don’t have any other log rotating software in place for openhab.

@rlkoshak thanks, I’ve just made that change and restarted.

Interestingly these lines in your appender don’t appear in the OOTB config file:

I was able to find this doc: Log4j – Log4j 2 Appenders
which shows that the defaults are:

fileIndex = max (files with a higher index will be newer than files with a smaller index)
min = 1
max = 7

Anyway, I’m up and running now with a separate zwave.log. We will see what happens. Thanks!