Homekit or Openhab crashing in fatal way, nothing works without restart

I’ve got a real problem here that I’m loosing my patience with. Been ongoing since the end of February or so and I’m sure Chris is tired of me asking about it since I originally thought it was a zwave binding issue and he looked it over pretty well. Essentially off a fresh reboot of the server Openhab2 is on everything works just fine, all Zwave devices and Homekit interfacing.

Somewhere during a period of around 24 - 36 hours everything stops working, openhab can no longer control things from a sitemap and homekit can no longer control things. Nothing even gets logged to the log file other than homekit showing that it’s trying. It appears whatever crashes it’s so bad that it takes Openhab with it. I’ve tried restarting the zwave binding, the homekit service etc but nothing brings it back until I stop the entire openhab service and restart and/or restart the server.

Everytime this happens the main thing I find in the log is that things worked “normally” up to a certain point of time where the Zwave binding seems to “miss” some things it should do followed by Homekit saying its terminating a connection and then restarting a new one. Then it skips several hours with nothing logged before Homekit throws some crashes and nothing works again.

I’m at a loss for what to do. I’ve tried both the normal and secure zwave binding with the same results. I’ve tried nuking Homekit everything, clearing pairing, removing the app, removing the service and re-building it all. Doesn’t help. On a whim I made sure my IOS devices are all at the same software release. Same result. Any help??

Linking the log, yes there is one secure device but I really don’t care if I need to reset it when this gets fixed. You can see around 2:18 to 2:27 when it starts to freak out. And then skips to 6pm before some crashing and nothing works when I tried…

http://ntel.us/A/openhab.log

No one has any suggestions? I suppose my next test is to completely nuke Homekit and run without it and see if it still crashes that way.

No way to get Openhab to tell me from the console what’s hanging it up?

Couple of things, can you log into the console? (located somewhere like openHAB2/runtime/bin/client) .

Can you type ‘list’ and print a list of all the addons that are enabled. Maybe something else is the cause?

Also what is the memory and CPU usage of the box when it starts to crash? Is OH running at 100%?

Well, some interesting news… Homekit appears to not be related to the problem at all. In the latest test I completely removed it, unpaired the phones, cleared pairings on the server console, deleted the homekit add-on, and rebooted the server. Same problem came back around the same time period… @chris, do you think you could take another look at this log now that Homekit can be ruled out?

http://ntel.us/A/openhab.log2

To answer @digitaldan, the load average is at 0.00 on all 3 averages with java showing a 0.7% usage.
The list in the console while hung is here: (console access doesn’t hang up)

openhab> list
START LEVEL 100 , List Threshold: 50
ID | State | Lvl | Version | Name

9 | Active | 80 | 2.1.0.201704232237 | ZWave Binding
10 | Active | 80 | 2.3.0.201506221200 | JAX-RS Gson Provider
11 | Active | 80 | 5.3.1.201602281253 | OSGi JAX-RS Connector
12 | Active | 80 | 2.3.1 | Gson
13 | Active | 80 | 18.0.0 | Guava: Google Core Libraries for Java
14 | Active | 80 | 3.0.0.v201312141243 | Google Guice (No AOP)
15 | Active | 80 | 1.5.8 | swagger-annotations
16 | Active | 80 | 3.19.0.GA | Javassist
18 | Active | 80 | 3.5.1 | JmDNS
21 | Active | 80 | 1.1.0.Final | Bean Validation API
23 | Active | 80 | 2.0.1 | javax.ws.rs-api
25 | Active | 80 | 3.2.0.v201101311130 | ANTLR Runtime
28 | Active | 80 | 1.6.0 | Commons Codec
29 | Active | 80 | 3.2.1 | Commons Collections
30 | Active | 80 | 1.1 | Commons Exec
31 | Active | 80 | 2.2.0 | Commons IO
32 | Active | 80 | 2.6 | Commons Lang
37 | Active | 80 | 4.2.3 | Apache HttpClient OSGi bundle
38 | Active | 80 | 4.2.3 | Apache HttpCore OSGi bundle
56 | Active | 80 | 4.0.8 | Apache Karaf :: Shell :: Core
59 | Active | 80 | 4.0.8 | Apache Karaf :: Wrapper :: Core
60 | Active | 80 | 3.1.0.7 | Apache ServiceMix :: Bundles :: commons-httpclient
65 | Active | 80 | 2.11.0.v20150805-0538 | EMF Common
66 | Active | 80 | 2.11.1.v20150805-0538 | EMF Ecore
67 | Active | 80 | 2.11.1.v20150805-0538 | EMF XML/XMI Persistence
68 | Active | 80 | 3.7.0.v20150402-1709 | Common Eclipse Runtime
69 | Active | 80 | 3.6.0.v20150318-1503 | Extension Registry Support
96 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome Config Core
97 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome Configuration Discovery
98 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome Config Dispatcher
99 | Active | 75 | 0.9.0.201704201358 | Eclipse SmartHome Config XML
100 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome Core
101 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome Core Audio
102 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome AutoUpdate Binding
103 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome Core Binding XML
104 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome Core ID
105 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Core Persistence
106 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Scheduler Service
107 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome Core Thing
108 | Active | 75 | 0.9.0.201704201358 | Eclipse SmartHome Core Thing XML
109 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Transformation Service
110 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Core Voice
111 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome Console
112 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Console for OSGi runtime Karaf
113 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome JavaSound I/O, Fragments: 171
114 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Monitor
115 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Net I/O Bundle
116 | Active | 80 | 0.9.0.201704201358 | Eclipse SmartHome REST Interface Bundle
117 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Core REST API
118 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome REST mDNS Announcer
119 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Sitemap REST API
120 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome SSE REST API
121 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Voice REST API
122 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Bonjour/MDS Service Discovery Bundle
123 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Web Audio Support
124 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Model Core
125 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Item Model
126 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Item Model Runtime
127 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Persistence Model
128 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Persistence Runtime
129 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Rule Model
130 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Rule Runtime
131 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Script
132 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Script Runtime
133 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Sitemap Model
134 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Sitemap Runtime
135 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Thing Model
136 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Thing Model Runtime
137 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Json Storage Service
138 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome UI
139 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome UI Icons
140 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Classic IconSet
141 | Active | 80 | 2.9.2 | Xtend Runtime Library
142 | Active | 80 | 2.9.2 | Xtend Macro Interfaces
143 | Active | 80 | 2.9.2 | Xtext
144 | Active | 80 | 2.9.2 | Xtext Common Types
145 | Active | 80 | 2.9.2 | JSR-45 Smap installer
146 | Active | 80 | 2.9.2 | Xtext Utility
147 | Active | 80 | 2.9.2.v20160428-1452 | Xbase Model
148 | Active | 80 | 2.9.2 | Xbase Runtime Library
164 | Active | 80 | 2.2.0 | JUPnP Library
165 | Active | 80 | 1.9.6 | MIME streaming extension
166 | Active | 80 | 5.0.2 | Require-Bundle Fix :: org.objectweb.asm
168 | Active | 90 | 2.1.0.201704211358 | openHAB Core
169 | Active | 80 | 2.1.0.201704211358 | openHAB Karaf Integration
171 | Resolved | 80 | 2.1.0.201704211358 | openHAB Sound Support, Hosts: 113
172 | Active | 80 | 2.1.0.201704211358 | openHAB Dashboard UI
177 | Active | 80 | 3.2.0 | Commons Net
178 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Basic UI, Fragments: 182
179 | Active | 80 | 0.9.0.201704011656 | Eclipse SmartHome Paper UI, Fragments: 185
180 | Active | 80 | 2.1.0.201704211358 | Astro Binding
182 | Resolved | 80 | 2.1.0.201704211358 | openHAB Basic UI Fragment, Hosts: 178
183 | Active | 80 | 2.1.0.201704211358 | HABmin User Interface
184 | Active | 80 | 2.1.0.201704211358 | HABPanel User Interface
185 | Resolved | 80 | 2.1.0.201704211358 | openHAB Paper UI Theme Fragment, Hosts: 179
186 | Active | 80 | 3.12.0.OH | nrjavaserial

I’m not sure what the issue is. I run OH2 with a fairly large zwave setup on Chris’s dev branch and far more bindings without any problems, so what difference is there between your setup and mine (is what I’m thinking) ? Sorry if you have already relayed this in another thread, but what hardware are you running this on (make, model, memory, cores, os) ? What Java version and vendor? What clients do you have running against your instance (ios, android, paperui, basicui, habmin, panelui, custom, etc…) ? Can you list the running threads when its healthy, and again when it freezes up? (in the console the command is "threads: I believe)

I will take a look at the log tonight, but if it’s anything like the previous logs, it won’t really be very useful as there was nothing logged after the error occurred…

Of course I’ll claim it’s not related to the ZWave binding (already have ;)), but there are a lot of people running this version, and I’ve not seen this sort of error ever (other than your system). If I remember rightly, you also changed back to the current version of the ZWave binding and it still happened (right?).

Currently I’m trying to run this on a VM on Virtualbox with 1 core, 1gb ram, 8gb disk. When the problem is exhibited the available ram is still around 250mb with 1gb of swap entirely untouched so I don’t think thats the issue. OS is Ubuntu server with the Openhab2 dev snapshot builds. Java is:

foglem@openhab2:~$ java -version
java version "1.8.0_131"
Java™ SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot™ Client VM (build 25.131-b11, mixed mode)

Threads (shell:threads) in Crashed State:
https://www.pastiebin.com/5904f8e458c31

Threads in running state:
https://www.pastiebin.com/5904fa43ba115

The biggest thing I see there @chris is that zwave in the crashed state seems to have a lot of “BLOCKED” threads. I don’t know enough to understand what thats telling me… But yes, I seemed to see this same thing with the non secure binding.

Clients Dan asked about, I’m running primarily paperui, habpanel (not much, but recently after trying to ditch homekit), homekit (ios/siri integration). Thats about it…

Until I here from you all with something else to try I’m going to watch the VM logs this time around and see if I get any interesting activity about the time of the issue. My running theory is perhaps some host machine issue causes a “hiccup” of the VM usb device for the ZWave stick and the binding(s) are programming in a non-forgiving way that blocks without a graceful timeout?

I got no viable error from the VM. No hiccups in the USB that I see. Same ZWave binding blocking on the serial comms to the stick though. And if I try to restart the zwave binding those blocking threads remain… @chris any chance you know where in your code this is and could make a timeout to gracefully unblock in cases like this?

I’m not sure there’s much I can do. The serial thread will basically always block if you look at it at any instant in time - serial speed is low so it will block most of the time.

There is a timeout every second or two though, so it isn’t really blocked.

I looked at the log a few days back and there’s nothing in there that indicates any problem with zwave binding - just HomeKit errors if I remember correctly. The development binding and the master are also very different in the way they handle threads so if it’s happening on both bindings as you say then it’s strange.

I still think this must be a problem with your system. You seem to be the only one with this issue and I still don’t see anything that links the problem to the binding.

How long have you been using OH2 and when did this start? I would try and correlate the start of the problem with a change in your system.

Sorry, I don’t see anything else odd either, you don’t have many threads running and I don’t see anything that stands out. The Java version is up to date and I am assuming is Oracle (not openJDK) so that look fine as well. Not sure if there are issues with Virtual box and USB drivers?

Currently I’ve reverted to running openhab on the non-virtualized server with the restored config files from the VM instance. I’ll let you all know if it crashes again or goes away. If this fixes it I would find that rather troubling…

At this point I can say the Virtual machine seems to be causing the issue. Very odd, and it sure appears it’s the Zwave binding that doesn’t get along with Virtualbox and it’s USB passthrough driver. I guess I don’t have to virtualize it, but that’s really something that should have worked. Thanks for all your help in digging into this very annoying problem.

Curious if you resolved your issue. I’m not using a VM but having terrible issues with OH2 reliability when OH1 was flawless.