openHAB freezing after 24 hours

I’ve suddenly hit an odd problem, with openHAB freezing after about 24 hours of running. Most rules fail to fire, and bindings bring up “Handler takes more than 5000ms for processing event” warnings.

I made three changes around the same time that could be the cause:

  • installed influxdb persistence
  • upgraded to snapshot #1312
  • installed the latest version of the zwave experimental binding (having previously been running an early July version of the binding without problems)

The only error I can see in the logs is zwave-related:

There is one error in the zwave log:

20-Jul-2018 02:50:42.259 [ERROR] [wave.handler.ZWaveSerialHandler$ZWaveReceiveThread] - Exception during ZWave thread. 
java.lang.IllegalStateException: Queue full
	at java.util.AbstractQueue.add(AbstractQueue.java:98) [?:?]
	at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:312) [?:?]
	at org.openhab.binding.zwave.internal.protocol.ZWaveTransactionManager.processReceiveMessage(ZWaveTransactionManager.java:394) [181:org.openhab.binding.zwave:2.4.0.201807151249]
	at org.openhab.binding.zwave.internal.protocol.ZWaveController.incomingPacket(ZWaveController.java:1053) [181:org.openhab.binding.zwave:2.4.0.201807151249]
	at org.openhab.binding.zwave.handler.ZWaveControllerHandler.incomingMessage(ZWaveControllerHandler.java:634) [181:org.openhab.binding.zwave:2.4.0.201807151249]
	at org.openhab.binding.zwave.handler.ZWaveSerialHandler$ZWaveReceiveThread.run(ZWaveSerialHandler.java:319) [181:org.openhab.binding.zwave:2.4.0.201807151249]

No other errors in the logs, no high CPU utilisation (it’s a fast i5 PC), no other signs of trouble. I haven’t had anything like this in the past, so would really appreciate thoughts on what the problem might be

thanks,

Dan

Have you recently added rules with Thread::sleep or while loops?

1 Like

Sorry no response towards any solution but I did similar things (upgrade to #1313, latest zwave dev binding) and experiencing similar problems.
Laggy zwave device response most of the time but not always. Yes either rule or rather it looked like a complete zwave lockup this morning To restart the binding only (not openHAB) made things work again for a while.
EDIT: Just saw that ‘Queue full’ exception you saw, too, yesterday, just once. As written, restarting the zwave bundle made things work again although lagginess remains, definitely something’s wrong with some queueing there (don’t know which one, though).

Ran #1302 before without problems, would like to go back there but it isn’t available in the repository any more I believe (apt-cache madison does not show it). @benjy, mind to put that up again in the repo ? Package is openhab2=2.4.0~20180616001001-1

2 Likes

good thought but no - just a few old rules with 500ms sleeps, none of which were running at the times in question.

At the moment, “latest” is not really definitive enough. Can you say exactly what version, or, can you ensure you use the “latest” right now :slight_smile: .

The “Queue Full” error is likely an effect, rather than the cause.

I saw this recently in a log and I believe it was caused by a synchronisation issue in ZWave for the recently added hold-off delay. This was updated recently (Wednesday night I think) which is why I say to use the very latest version. However, I don’t think that would impact other issues you see such as problems with rules - I might be wrong though.

It may also be caused by a processing problem in the ZWave binding - eg an exception while processing a frame that prevents further data from being processed. In this case, data continues to go into the queue, and at some point later, you will get this exception. I think the receive thread is fully protected, so this shouldn’t happen, but it’s worth looking out for.

Or, it could be something else :slight_smile: . I’d suggest to check debug logs to see if there’s any ZWave issue anyway…

1 Like

It was the version right after you fixed the holdoff processing issue earlier this week, but I’ve just upgraded to the 18 July version and will see if that makes a difference.

nothing very useful in the logs.

given it impacts the whole system, I was rather assuming this was a general openhab issue not a zwave binding issue

I think there were a couple of fixes so let’s see if the latest happens. I’ve not heard back from the person who reported the similar problem (@vossivossi if I remember correctly?).

If the problem persists, then please provide a log so I can see if it looks like the same thing or not. I don’t believe that a lock of the receive thread, and a timer thread in the ZWave binding will cause such wide issues as you’re seeing with rules etc, but I may of course be wrong and a log would probably allow me to tell if it is this issue…

thanks - I’ll do that

I think my problem was different as ONLY the Z-Wave binding stopped receiving status updates while all other bindings and rules kept on running. This problem is solved with the recent 201807182237 dev version of the Z-Wave binding. I also confirmed that it is working now to you in the large 3000+ posts Z-Wave thread.

1 Like

Yes, that’s also what I thought, but the error is the same, and that’s all I had to go on :sunglasses:. As above, I don’t really think the holdoff error is likely to cause this.

Great - thanks for testing :slight_smile: .

I upgraded to the latest zwave binding and the latest OH snapshot, and also added to /etc/default/openhab2:

EXTRA_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -Xmx512m"

and the system kept going for about 30 hours; then the zwave items all became unresponsive (with everything else working fine) and then the whole system halted. Zwave Log here.

Dan

I don’t think this is related to your original issue - it’s a thread lock in the holdoff code that’s simply preventing messages being processed. I’ll do another update later today to try again to rectify this later.

great news - thanks. Wasn’t thrilled at having an untraceable and unreproducable fatal problem with my system…

Sorry, but I don’t think this is related to the original issue you reported. This is a simple thread-lock where the ZWave binding stops processing received messages. I doubt it’s likely to have any influence on things like rules or other parts of the system…

I might of course be wrong, but I think it should be completely decoupled.

I’ve just updated the dev binding to change the way the holdoff synchronisation is managed - it’s at the usual download link for the dev binding.

I’m sorry I just experienced ‘Queue full’ with your today’s binding.
Quite some ‘controller rejected’ responses before, I’ll send the log.
FWIW, I’m running snapshot #1302.

EDIT: another ‘Queue full’ with snapshot #1316 and binding 2.4.0.201807242215.

yes, and me - after about 12 hours all zwave devices became unresponsive (but openhab is otherwise fine). Log here

Went back to snapshot #1302 but kept binding 2.4.0.201807242215.

Saw a couple of ‘Task already scheduled or cancelled’ (on one device that I fired two consecutive commands at, maybe these are treated together ? just a guess) but no lockup/need for restart since. Seems that only those transactions to generate that error weren’t executed.

EDIT: another lockup due to queue full, I’ve uploaded the logs.

EDIT: another lockup due to queue full with Aug 3rd dev binding, I’ve uploaded the logs.
Went back to stock 2.3.0 for other reasons than ZWave, and some June version of the dev binding (hopefully that’s before the holdoff stuff got in).

1 Like