[homekit] "No response" error on all openHAB items

Hello Eugen,

I discovered some new behavior since I upgraded to OH 3.2.0 Stable release. Not sure this is related to the fix you added.

The Device are suddenly not assigned to the desired Room in Homekit anymore, but to the “default” OpenHAB room instead. Also the item names, icons etc. are reset.
This is happening sometimes, but I couldn’t identify the reason so far.

Could this be a side-effect of the recent changes?

Hi Stefan,

i checked the recent changes one more time and i dont think that these changes could lead to your issue. the changes are minor changes and fixes for specific issues.

reset of room and icons assignment means that home app has received an empty or short list of accessories from openhab. this typically happens in case of errors in the items configuration or if homekit binding starts before openHAB is done with config parsing. for this case, there is delay of 30 seconds (startDelay) which can be increased.

please make sure that you dont change item config in between, check openhab log files for number of homekit accessories identified and maybe try to increase startDelay.via homekit binding settings

I’ve been following this discussion for a while, but not it’s time to contribute some information that might help to shed some light on this issue (though not solve it).

A few remarks upfront:

  1. This is about the mDNS no response issue, not about the lost pairing/user info issue. I had the latter too, but this clearly seems to be a separate topic.
  2. Encouraged by the dicussion her, I switched to 64 bit, but this didn’t do the job for me
  3. I’m currently running openhabian 3.3.0-SNAPSHOT (#2659) on a Pi 3+
  4. I’ve configured openHAB not to use ipv6 adresses, but not on OS level, so the Pi still has ipv6 addresses
  5. My standard configuration is not to use openHAB mDNS service for the Homekit binding (option unchecked in Homekit integration preferences)

I noticed whenever I have the “no response” error in the Home app, not only is the openHAB Homekit binding _hap._tcp. not beeing advertised, but also the openhab-server isn’t being advertised under .openhab-server._tcp.(as reported by the Discovery app on iOS). So I went through the following steps:

  1. Intial situation: no response of devices in Home app
  2. Discovery app: _hap._tcp.: openhab missing, .openhab-server._tcp.: openhab not showing up
  3. Toggle “use openHAB DNS service” on in Homekit binding preferences: no change
  4. Togggle “use openHAB DNS service” off again;now
    _hap._tcp.: openHAB is visible again, but with an ipv6 address, and devices respond again in Home app,
    but in .openhab-server._tcp openhab still is missing
  5. Restart mDNS service (org.openhab.core.io.transport.mdns, not jmDNS) via console:
    now openhab under .openhab-server._tcp. is also available, with an ipv4 address
  6. Toggle “use openHAB DNS service” on again:
    now the Homekit binding shows up under _hap._tcp. as “openHAB (2)” with an ipv4 address
  7. Toggle “use openHAB DNS service” off once again:
    Homekit binding now is advertised as “openHAB” again, but with the ipv4 address

So whatever breaks mDNS on openhab seems to equally affect the openhab mDNS service and the Homekit binding’s mDNS service.

I also found out that a simple restart of org.openhab.core.io.transport.mdns (mDNS service) recovers the connection (however with an ipv6 address for the Homekit binding), but a restart of javax.jmdns does not.

Here is also some mDNS related WARN level log information around the occurence of the issue:

2021-12-31 14:00:28.184 [WARN ] [javax.jmdns.impl.tasks.Responder    ] - Responder(fe80-0-0-0-3913-e253-bd00-f7d0-wlan0.local.)run() exception 
java.io.IOException: Die angeforderte Adresse kann nicht zugewiesen werden
	at java.net.PlainDatagramSocketImpl.send(Native Method) ~[?:?]
	at java.net.DatagramSocket.send(DatagramSocket.java:695) ~[?:?]
	at javax.jmdns.impl.JmDNSImpl.send(JmDNSImpl.java:1665) ~[bundleFile:3.5.7]
	at javax.jmdns.impl.tasks.Responder.run(Responder.java:154) [bundleFile:3.5.7]
	at java.util.TimerThread.mainLoop(Timer.java:556) [?:?]
	at java.util.TimerThread.run(Timer.java:506) [?:?]
2021-12-31 14:00:28.194 [WARN ] [.jmdns.impl.tasks.state.DNSStateTask] - Canceler(fe80-0-0-0-3913-e253-bd00-f7d0-wlan0.local.).run() exception 
java.io.IOException: Die angeforderte Adresse kann nicht zugewiesen werden
	at java.net.PlainDatagramSocketImpl.send(Native Method) ~[?:?]
	at java.net.DatagramSocket.send(DatagramSocket.java:695) ~[?:?]
	at javax.jmdns.impl.JmDNSImpl.send(JmDNSImpl.java:1665) ~[bundleFile:3.5.7]
	at javax.jmdns.impl.tasks.state.DNSStateTask.run(DNSStateTask.java:131) [bundleFile:3.5.7]
	at java.util.TimerThread.mainLoop(Timer.java:556) [?:?]
	at java.util.TimerThread.run(Timer.java:506) [?:?]
2021-12-31 14:00:33.208 [WARN ] [StatefulObject$DefaultImplementation] - Wait for canceled timed out: DNS: fe80-0-0-0-3913-e253-bd00-f7d0-wlan0.local. [fe80:0:0:0:3913:e253:bd00:f7d0%wlan0/fe80:0:0:0:3913:e253:bd00:f7d0%wlan0] state: canceling 1 task: null
2021-12-31 14:00:38.220 [WARN ] [StatefulObject$DefaultImplementation] - Wait for canceled timed out: DNS: fe80-0-0-0-3913-e253-bd00-f7d0-wlan0.local. [fe80:0:0:0:3913:e253:bd00:f7d0%wlan0/fe80:0:0:0:3913:e253:bd00:f7d0%wlan0] state: canceling 1 task: null
2021-12-31 14:01:15.111 [WARN ] [x.jmdns.impl.constants.DNSRecordType] - Could not find record type for index: 65
2021-12-31 14:01:15.113 [WARN ] [javax.jmdns.impl.DNSIncoming        ] - Could not find record type: dns[query,fe80:0:0:0:10b7:98de:e0ff:f03e%3:5353, length=70, id=0x0]
   0: 0000000000030000 00000001056b7563 6865056c6f63616c 0000418001c00c00     ........ .....kuc he.local ..A.....
  20: 1c8001c00c000180 0100002905a00000 119400120004000e 006faa817e706369     ........ ...).... ........ .o..~pci
  40: 22a8baa86173                                                            "...as

And this is what I get when I restart mDNS:

2021-12-31 14:28:01.086 [WARN ] [javax.jmdns.impl.JmDNSImpl          ] - fe80-0-0-0-3913-e253-bd00-f7d0-wlan0.local. removing unregistered service info: openhab._hap._tcp.local.

I have the feeling @erikvdv1 might have pointed the right direction with his post above, though this doesn’t explain the root cause of mDNS breaking. Anyway, I will try his proposed buxfix and see if it makes a difference.

1 Like

thank you for sharing. good investigation!
all helps to nail down this issue. personally, i still not sure what the main root cause.

one more question: do you have an IPv4 address defined in the homekit binding settings? if “use openHAB DNS service” is off, binding should publish the IP address from the settings and not the ipv6

Yes, the IPv4 address is configured in the binding. And “use IPv6” is disbabled in the openhab network settings.

I have rechecked this: after system start openHAB._hap._tcp.local get’s published with an IPv6 address, openhab._openhab-server[-ssl]._tcp.local with the IPv4 address. After toggling “Use openHAB mDNS service” ON and OFF in the UI, also the hap-service has IPv4. Maybe the binding’s configuration doesn’t get read in time for the initial service start?

I’m now quite certain that at least in my case every connection loss goes hand in hand with a java.io.IOException in the jmdns responder. And as @erikvdv1 pointed out, in the current implementation of jmdns this causes the dns service to be closed.

However, the IO exceptions can be of different nature. In Erik’s example it was a “operation not permitted”, and I have seen at least seen “cannot assign requested address” and “network not reachable”. So I assume this depends on the individual network and the actual requests that jmdns receives.

Which leads to the question if it’s feasible to sort out all the possible root causes for the exceptions or if it makes more sense to handle them robustly. I think Erik’s approach with his “bug fix” goes in this direction, but I’m not able to judge if it’s sufficient and acceptable. At least, I have seen that “recover” instead of “close” is used in other places within jmdns to handle IO exceptions.

I’m working on getting a TRACE log around an exception, but this is a bit tough on the Pi as the log files grow really fast and I have to catch one event within maybe one day. Hopefully I succeed in creating a dedicated log file on a nfs share where file size isn’t an issue.

Guys, for those who have issues, how many iOS “server” devices do you have? I.e. do you have some appletv or HomePod besides the openhab instance?

In my case (I am who created this topic) i do not have any other server devices except openhab who serves as homekit bridge and clients (iPhones, macs etc) and with ipv4 disabled all works perfectly for quite a long time now

Thanks

Yes, I do have a HomePod mini and other HomeKit bridges (Hue, Netatmo). And I agree, this increases the chance to run into problems.

Regarding IPv4/v6: I also had the feeling that the system is more stable when openHAB’s HomeKit bridge publishes on IPv4. But I did experience issues even in that case.
@vlas : have you just configured openHAB + HomeKit binding to use only IPv4, or did you completely disable IPv6 on the system openHAB runs on?

Btw, last night I ran into the same issue as @stefan13 :

Definitely no interaction with openHAB configuration when that happened, as everybody was asleep. Even more annoying was that this stopped syncing the devices’ HomeKit properties (like locations) across Apple devices, so I had to exclude/include them from/to HomeKit again.

Anyway, I have set up the loggers and am generating GBs of log data now. Let’s see if that brings more insight.

Hi Christian, in my case I have openhab running on debian with customized kernel (in which ipv6 is just disabled completely). So that ipv6 related functionality just does not exist at all.

Thanks

Same here. Had that Issue now the second time. I Have IPv6 disabled and I use a HomePod mini.

At the moment I have no idea how to reproduce the problem. I restarted openHAB and also the HomePod.

The only Thing I know. Problems began after Upgrading to OpenHAB 3.2.0. the iOS Update to 15.2 a bit earlier.

I will give an Update here, if the devices loose its assignment to the rooms.

1 Like

Time to give an update. I had a few “no response” issues during the last days, maybe every second or third day. And it did happen also with IPv6 being completely disabled on kernel level.

Looking into the logs, each issue seems to be caused by the jmdns responder trying to send a message to the multicast DNS address (224.0.0.251:5353 and ff02::00fb:5353, respectively). One among thousands or ten thousands similar/identical messages usually causing no problems. I’ll copy just a few selected lines from the log to show this:

2022-01-07 10:34:07.454 [DEBUG] [javax.jmdns.impl.tasks.Responder    ] - Responder(schaltkiste-fritz-box.local.).run() JmDNS responding
2022-01-07 10:34:07.454 [DEBUG] [javax.jmdns.impl.tasks.Responder    ] - Responder(schaltkiste-fritz-box.local.).run() JmDNS responding
2022-01-07 10:34:07.454 [DEBUG] [javax.jmdns.impl.DNSIncoming        ] - DNSIncoming() questions:0 answers:1 authorities:0 additionals:0
2022-01-07 10:34:07.454 [DEBUG] [javax.jmdns.impl.DNSIncoming        ] - DNSIncoming() questions:0 answers:3 authorities:0 additionals:0
2022-01-07 10:34:07.454 [TRACE] [javax.jmdns.impl.JmDNSImpl          ] - send(schaltkiste-fritz-box.local.) JmDNS out:dns[response,224.0.0.251:5353, length=55, id=0x0, flags=0x8400:r:aa, answers=1
answers:
	[IPv4Address@1589910590 type: TYPE_A index ...
 
2022-01-07 10:34:07.455 [TRACE] [javax.jmdns.impl.JmDNSImpl          ] - send(schaltkiste-fritz-box.local.) JmDNS out:dns[response,224.0.0.251:5353, length=175, id=0x0, flags=0x8400:r:aa, answers=3
answers:
	[Service@1687574434 type: TYPE_SRV index 33, ...
 
2022-01-07 10:34:07.455 [WARN ] [javax.jmdns.impl.tasks.Responder    ] - Responder(schaltkiste-fritz-box.local.)run() exception 
java.io.IOException: Das Netzwerk ist nicht erreichbar
	at java.net.PlainDatagramSocketImpl.send(Native Method) ~[?:?]
	at java.net.DatagramSocket.send(DatagramSocket.java:695) ~[?:?]
	at javax.jmdns.impl.JmDNSImpl.send(JmDNSImpl.java:1665) ~[bundleFile:3.5.7]
	at javax.jmdns.impl.tasks.Responder.run(Responder.java:154) [bundleFile:3.5.7]
	at java.util.TimerThread.mainLoop(Timer.java:556) [?:?]
	at java.util.TimerThread.run(Timer.java:506) [?:?]
...
2022-01-07 10:34:07.596 [DEBUG] [javax.jmdns.impl.JmDNSImpl          ] - Cancelling JmDNS: 
	---- Local Host -----
	local host info[schaltkiste-fritz-box.local., wlan0:192.168.178.87, DNS: schaltkiste-fritz-box.local. [schaltkiste.fritz.box/192.168.178.87] state: announced task: Renewer(schaltkiste-fritz-box.local.) state: announced]
	---- Services -----
		Service: openhab._hap._tcp.local.: [ServiceInfoImpl@1030685467 name: 'openHAB._hap._tcp.local.' address: 'schaltkiste.fritz.box/192.168.178.87:9123 null:9123 ' status: 'DNS: schaltkiste-fritz-box.local. [schaltkiste.fritz.box/192.168.178.87] state: announced task: Renewer(schaltkiste-fritz-box.local.) state: announced', has data
	ff: 0
	ci: 1
	sh: N4sWPw==
	sf: 0
	md: openHAB
	s#: 1
	c#: 58
	id: ba:63:f8:d5:da:b4]
...

The actual exception is thrown on the network transport level (java.net.PlainDatagramSocketImpl.send), so it most likely has nothing to do with the Homekit binding and seems to be caused by a temporary flaw of the network or network interface. However, the exception is caught by the reponder, which in turn requests the jmdns service to cancel.

From that point of time, the openHAB hap-service can no longer be discovered in the network. Typically, you can still access you accessories from the home app for a certain time, as the clients have cached their connection information to the bridge. Only after this information expires, they cannnot discover the service any more.

Other jmdns modules than the responder handle such network transport level exceptions more tolerantly. I have examples where, after a similar exception, the module catching the exception requests jmdns to recover the service, and everything is fine afterwards.

So my next step will be to test the modified responder, where exceptions also should lead to a service recovery.

I think I have found a common cause/trigger both for the “no response” and the new “accessories reset to default room” issue in my system.

I had a look into the /var/log/syslog file and found that always when a problem occured the Pi had shortly lost it’s WiFi network connection (“wlan0: carrier lost”), which caused a re-initialization of the network interface. This typically happens several times a day but usually is handled well by the system, with two exceptions:

  1. When the JmDNS repsonder tries to send just in the moment when the network interface is down, the JmDNS service get’s cancelled as described in my previous posts. This leads to the “no response” issue.

  2. The temporary loss of the network connection also causes a “restart of the homekit bridge on network interface change”, a feature that to my understanding was introduced with OH 3.2.0. This restart usually works well and seamlessly, but under bad circumstances seams to go wrong. At least I found my accessories reset to default (and unsynced between Apple devices) again after such an event. The log file shows this (note: order of the time stamps get’s messed up, three minute time lag, and the restart is triggered twice before one stop was finished):

2022-01-11 05:25:16.633 [TRACE] [nhab.io.homekit.internal.HomekitImpl] - restarting homekit bridge on network interface changes.
2022-01-11 05:28:05.755 [TRACE] [nhab.io.homekit.internal.HomekitImpl] - stop HomeKit bridge
2022-01-11 05:26:47.074 [TRACE] [nhab.io.homekit.internal.HomekitImpl] - restarting homekit bridge on network interface changes.
2022-01-11 05:28:05.756 [TRACE] [penhab.io.homekit.internal.Debouncer] - stop debouncer
2022-01-11 05:26:26.814 [TRACE] [ssories.AbstractHomekitAccessoryImpl] - getKeyFromMapping: characteristic PositionState, state 100, mapping {DECREASING=DECREASING, INCREASING=INCREASING, STOPPED=STOPPED}
2022-01-11 05:28:05.756 [TRACE] [nhab.io.homekit.internal.HomekitImpl] - stop HomeKit bridge
2022-01-11 05:25:26.437 [TRACE] [ssories.AbstractHomekitAccessoryImpl] - getKeyFromMapping: characteristic PositionState, state 100, mapping {DECREASING=DECREASING, INCREASING=INCREASING, STOPPED=STOPPED}
2022-01-11 05:28:05.762 [TRACE] [ssories.AbstractHomekitAccessoryImpl] - getKeyFromMapping: characteristic CurrentHeatingCoolingMode, state AUTO, mapping {OFF=Off, HEAT=HeatOn, COOL=CoolOn}
2022-01-11 05:28:05.762 [TRACE] [ssories.AbstractHomekitAccessoryImpl] - getKeyFromMapping: characteristic CurrentHeatingCoolingMode, state AUTO, mapping {OFF=Off, HEAT=HeatOn, COOL=CoolOn}
2022-01-11 05:28:05.762 [WARN ] [ssories.AbstractHomekitAccessoryImpl] - Wrong value AUTO for CurrentHeatingCoolingMode characteristic of the item vi_th_Dach_Bad_TCurrentThermostatMode. Expected one of following [Off, HeatOn, CoolOn]. Returning OFF.
2022-01-11 05:28:05.762 [TRACE] [ssories.AbstractHomekitAccessoryImpl] - getKeyFromMapping: characteristic TargetHeatingCoolingMode, state AUTO, mapping {OFF=Off, HEAT=HeatOn, COOL=CoolOn, AUTO=AUTO}
2022-01-11 05:28:05.762 [WARN ] [ssories.AbstractHomekitAccessoryImpl] - Wrong value AUTO for CurrentHeatingCoolingMode characteristic of the item vi_th_Dach_Bad_TCurrentThermostatMode. Expected one of following [Off, HeatOn, CoolOn]. Returning OFF.
2022-01-11 05:28:05.765 [TRACE] [ssories.AbstractHomekitAccessoryImpl] - getKeyFromMapping: characteristic TargetHeatingCoolingMode, state AUTO, mapping {OFF=Off, HEAT=HeatOn, COOL=CoolOn, AUTO=AUTO}
2022-01-11 05:28:05.925 [TRACE] [penhab.io.homekit.internal.Debouncer] - stop debouncer
2022-01-11 05:28:07.960 [TRACE] [kit.internal.HomekitAccessoryUpdater] - Removing existing subscription for it_wd_Dach_Regal (Type=RollershutterItem, State=100, Label=Dachfenster Regal, Category=window) / CurrentPosition
2022-01-11 05:28:07.960 [TRACE] [nhab.io.homekit.internal.HomekitImpl] - start HomeKit bridge
2022-01-11 05:28:07.961 [TRACE] [nhab.io.homekit.internal.HomekitImpl] - create HomeKit server with dedicated mDNS server
2022-01-11 05:28:07.961 [TRACE] [kit.internal.HomekitAccessoryUpdater] - Removing existing subscription for it_wd_Dach_Regal (Type=RollershutterItem, State=100, Label=Dachfenster Regal, Category=window) / TargetPosition
2022-01-11 05:28:07.961 [TRACE] [kit.internal.HomekitAccessoryUpdater] - Removing existing subscription for it_wd_Dach_Regal (Type=RollershutterItem, State=100, Label=Dachfenster Regal, Category=window) / PositionState
2022-01-11 05:28:07.961 [TRACE] [nhab.io.homekit.internal.HomekitImpl] - start HomeKit bridge
2022-01-11 05:28:07.961 [TRACE] [nhab.io.homekit.internal.HomekitImpl] - create HomeKit server with dedicated mDNS server
4 Likes

Wow. good troubleshooting and insights.
yes, “restarting homekit bridge on network interface changes.” was introduced in OH3.2.0. just to confirm, it does work correct but not always, right?

from your log i had the impression that 2 instances of homekit binding were running. could you check it?
maybe the restart on network changes not always stop the previous bridge correctly or if network change happens too fast, it starts 2 instances. i will check this

Yes, that’s what it looks like. I can find several restarts in the log file, which were not noticable for the user.

I’m not sure. I don’t think two instances were running before the event. My assumption is, that “onChanged” was called upon loss of the network interface, and before it had completed (almost 3 minutes form the first “restarting” log message to the first “stop HomeKit bridge” log message!), it was called a second time by another network interface change (about 1:30 min after the first one). Does that make sense?

If you are interested, I can upload the complete log files to a OneDrive share and pm you the link. Just let me know.

right, 2 changes on network interface in a “short” time could explain also the room reset in home app.

while first thread already done with configuration parsing and has started the bridge, the second thread could modify the list of accessories.

need to check the code and make sure that list of accessories is untouched.

@pzcn fix submitted

please test once it gets approved and merged.

1 Like

Thanks, very much appreciated. I will test.

I will also leave a comment on github. Some addtional observations/thoughts, but this goes beyond the discussion here.

@yfre I’m now testing your code changes. Just added some addtional logging for the added interfaces to find out when onChanged is actually triggered, as this has looked somewhat random so far.

The binding now starts directly with the IPv4 address as configured in the settings, not with an IPv6 address. I’ve seen that in the old code it looks like “networkInterface” was not initialized correctly.

I also made a few additional changes to jmdns within the last two weeks, and things are getting more robust now. Turned out that there are several reports on jmdns having issues handling IOExceptions (not only with openHAB/Homekit binding). I hope I’ll find a way to mitigate what’s relevant for the issues discussed here.

1 Like

thank you Christian! jmdns fixes would be great. have you committed them already to jmdns repo?

You can find what I currently work with here:
https://github.com/Sail-Away/jmdns/tree/work

Need further testing, because the state machine is complex, and the side effects of what I am doing are hard to predict.

Additionally needs some clean up, as I forced it to compile with JDK 17, and I also changed the version.

1 Like

Any updates on this problem? I currently have my homekit.json permissions set for read-only to prevent it, but this is not sustainable.