mDNS bug?


I’m running OH 2.5.4 in the IDE environment while developing the Shelly binding. It works fine so far, but when discovering a larger number of Shelly devices via mDNS, the scan doesn’t find them all. I need to re-run the discovery up to 5 times to get the complete device list. On another attempt, a single run may find all of them. This also happens with a 2.5.1 core, the latest addon bundle and my binding SNAPSHOT.

At the code level I see that the discovery event comes in, but the IP address is empty. The binding ignores those events, partly to prevent NPEs, so in the end the list is incomplete. In this situation the complete list is initially reported with empty IP addresses; a very short time later most of the devices are reported again with their IP address. On Linux all devices are discovered correctly and the list is always complete.
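To make the "ignore events with an empty IP" behaviour concrete, here is a minimal sketch of such a guard. It is not the code from the binding: the class and method names are hypothetical, and it assumes the addresses arrive as a `String[]` (in a real participant they would presumably come from JmDNS's `ServiceInfo.getHostAddresses()`).

```java
import java.util.Optional;

// Hypothetical helper, not part of the Shelly binding or JmDNS.
final class AddressGuardSketch {

    /** Returns the first non-blank address, or empty if none has been resolved yet. */
    static Optional<String> firstUsableAddress(String[] addresses) {
        if (addresses == null) {
            return Optional.empty();
        }
        for (String address : addresses) {
            if (address != null && !address.trim().isEmpty()) {
                return Optional.of(address.trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Service announced, but no address resolved yet: the event would be skipped.
        System.out.println(firstUsableAddress(new String[] {}).isPresent()); // prints false
        // A later event for the same service carries the address.
        System.out.println(firstUsableAddress(new String[] { "", "192.168.6.76" }).get()); // prints 192.168.6.76
    }
}
```

With a guard like this, a device only ends up in the list if a second event with a resolved address actually arrives, which would match the incomplete results described above.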

The mDNS client logs these exceptions:

07:46:23.553 [/192.168.6.152)] WARN  javax.jmdns.impl.DNSIncoming:266 - DNSIncoming() dump dns[response,192.168.6.76:5353, length=930, id=0x0, flags=0x8400:r:aa, answers=5
answers:
	[Pointer@1106519438 type: TYPE_PTR index 12, class: CLASS_IN index 1, name: _services._dns-sd._udp.local. ttl: '120/120' alias: '_http._tcp.']
	[Pointer@1328796608 type: TYPE_PTR index 12, class: CLASS_IN index 1, name: _http._tcp.local. ttl: '120/120' alias: 'shellyem3-DC4F227649A8._http._tcp.local.']
	[Service@1259097854 type: TYPE_SRV index 33, class: CLASS_IN index 1-unique, name: shellyem3-DC4F227649A8._http._tcp.local. ttl: '120/120' server: 'shellyem3-DC4F227649A8.local.:80']
	[Text@235854331 type: TYPE_TXT index 16, class: CLASS_IN index 1-unique, name: shellyem3-DC4F227649A8._http._tcp.local. ttl: '120/120' text: 'id=shellyem3-DC4...']
	[IPv4Address@1849050593 type: TYPE_A index 1, class: CLASS_IN index 1-unique, name: shellyem3-DC4F227649A8.local. ttl: '120/120' address: '192.168.6.76']]
	answer:        [Pointer@1106519438 type: TYPE_PTR index 12, class: CLASS_IN index 1, name: _services._dns-sd._udp.local. ttl: '120/120' alias: '_http._tcp.']
	answer:        [Pointer@1328796608 type: TYPE_PTR index 12, class: CLASS_IN index 1, name: _http._tcp.local. ttl: '120/120' alias: 'shellyem3-DC4F227649A8._http._tcp.local.']
	answer:        [Service@1259097854 type: TYPE_SRV index 33, class: CLASS_IN index 1-unique, name: shellyem3-DC4F227649A8._http._tcp.local. ttl: '120/120' server: 'shellyem3-DC4F227649A8.local.:80']
	answer:        [Text@235854331 type: TYPE_TXT index 16, class: CLASS_IN index 1-unique, name: shellyem3-DC4F227649A8._http._tcp.local. ttl: '120/120' text: 'id=shellyem3-DC4...']
	answer:        [IPv4Address@1849050593 type: TYPE_A index 1, class: CLASS_IN index 1-unique, name: shellyem3-DC4F227649A8.local. ttl: '120/120' address: '192.168.6.76']
   0: 0000840000000006 00000000095f7365 727669636573075f 646e732d7364045f     ........ ....._se rvices._ dns-sd._
  20: 756470056c6f6361 6c00000c00010000 0078000c055f6874 7470045f74637000     udp.loca l....... .x..._ht tp._tcp.
  40: 055f68747470045f 746370056c6f6361 6c00000c00010000 0078002916736865     ._http._ tcp.loca l....... .x.).she
  60: 6c6c79656d332d44 4334463232373634 394138055f687474 70045f746370056c     llyem3-D C4F22764 9A8._htt p._tcp.l
  80: 6f63616c00167368 656c6c79656d332d 4443344632323736 34394138055f6874     ocal..sh ellyem3- DC4F2276 49A8._ht
  a0: 7470045f74637005 6c6f63616c000021 8001000000780024 0000000000501673     tp._tcp. local..! .....x.$ .....P.s
  c0: 68656c6c79656d33 2d44433446323237 3634394138056c6f 63616c0016736865     hellyem3 -DC4F227 649A8.lo cal..she
  e0: 6c6c79656d332d44 4334463232373634 394138055f687474 70045f746370056c     llyem3-D C4F22764 9A8._htt p._tcp.l
 100: 6f63616c00001080 0100000078004d19 69643d7368656c6c 79656d332d444334     ocal.... ....x.M. id=shell yem3-DC4
 120: 4632323736343941 382566775f69643d 3230323030333039 2d3130343631392f     F227649A 8%fw_id= 20200309 -104619/
 140: 76312e362e304034 333035366435380c 617263683d657370 3832363616736865     v1.6.0@4 3056d58. arch=esp 8266.she
 160: 6c6c79656d332d44 4334463232373634 394138056c6f6361 6c00000180010000     llyem3-D C4F22764 9A8.loca l.......
 180: 00780004c0a8064c 167368656c6c7965 6d332d4443344632 3237363439413805     .x.....L .shellye m3-DC4F2 27649A8.
 1a0: 6c6f63616c00002f 8001000000780021 167368656c6c7965 6d332d4443344632     local../ .....x.! .shellye m3-DC4F2
 1c0: 3237363439413805 6c6f63616c000001 4000008400000000 0600000000095f73     27649A8. local... @....... ......_s
 1e0: 6572766963657307 5f646e732d736404 5f756470056c6f63 616c00000c000100     ervices. _dns-sd. _udp.loc al......
 200: 000078000c055f68 747470045f746370 00055f6874747004 5f746370056c6f63     ..x..._h ttp._tcp .._http. _tcp.loc
 220: 616c00000c000100 0000780029167368 656c6c79656d332d 4443344632323736     al...... ..x.).sh ellyem3- DC4F2276
 240: 34394138055f6874 7470045f74637005 6c6f63616c001673 68656c6c79656d33     49A8._ht tp._tcp. local..s hellyem3
 260: 2d44433446323237 3634394138055f68 747470045f746370 056c6f63616c0000     -DC4F227 649A8._h ttp._tcp .local..
 280: 2180010000007800 2400000000005016 7368656c6c79656d 332d444334463232     !.....x. $.....P. shellyem 3-DC4F22
 2a0: 373634394138056c 6f63616c00167368 656c6c79656d332d 4443344632323736     7649A8.l ocal..sh ellyem3- DC4F2276
 2c0: 34394138055f6874 7470045f74637005 6c6f63616c000010 800100000078004d     49A8._ht tp._tcp. local... .....x.M
 2e0: 1969643d7368656c 6c79656d332d4443 3446323237363439 41382566775f6964     .id=shel lyem3-DC 4F227649 A8%fw_id
 300: 3d32303230303330 392d313034363139 2f76312e362e3040 3433303536643538     =2020030 9-104619 /v1.6.0@ 43056d58
 320: 0c617263683d6573 7038323636167368 656c6c79656d332d 4443344632323736     .arch=es p8266.sh ellyem3- DC4F2276
 340: 34394138056c6f63 616c000001800100 0000780004c0a806 4c167368656c6c79     49A8.loc al...... ..x..... L.shelly
 360: 656d332d44433446 3232373634394138 056c6f63616c0000 2f80010000007800     em3-DC4F 227649A8 .local.. /.....x.
 380: 21167368656c6c79 656d332d44433446 3232373634394138 056c6f63616c0000     !.shelly em3-DC4F 227649A8 .local..
 3a0: 0140                                                                    .@

 exception 
java.io.IOException: Received a message with the wrong length.
	at javax.jmdns.impl.DNSIncoming.<init>(DNSIncoming.java:263)
	at javax.jmdns.impl.SocketListener.run(SocketListener.java:50)
07:46:23.554 [/192.168.6.152)] WARN  javax.jmdns.impl.SocketListener:69 - SocketListener(JmDNS-/192.168.6.152).run() exception 
java.io.IOException: DNSIncoming corrupted message
	at javax.jmdns.impl.DNSIncoming.<init>(DNSIncoming.java:268)
	at javax.jmdns.impl.SocketListener.run(SocketListener.java:50)
Caused by: java.io.IOException: Received a message with the wrong length.
	at javax.jmdns.impl.DNSIncoming.<init>(DNSIncoming.java:263)
	... 1 common frames omitted

I can’t say whether those are “normal” or related to the problem. I see them frequently in the log. The fact that Linux discovers the complete list while OH shows the issue suggests that there is a bug in the mDNS implementation.
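For anyone who wants to check the hex dump by hand: a DNS message starts with a 12-byte header of six big-endian 16-bit fields (ID, flags, question count, answer count, authority count, additional count). The sketch below decodes the first 12 bytes of the dump at offset 0; the class and helper names are made up for illustration.

```java
// Hypothetical helper for reading the dump, not part of JmDNS.
final class DnsHeaderSketch {

    /** Reads the unsigned 16-bit big-endian value at the given byte offset. */
    static int u16(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    /** Converts a hex string without separators into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // First 12 bytes of the dump above (offset 0).
        byte[] header = fromHex("000084000000000600000000");
        System.out.printf("id=0x%x flags=0x%x qd=%d an=%d ns=%d ar=%d%n",
                u16(header, 0), u16(header, 2), u16(header, 4),
                u16(header, 6), u16(header, 8), u16(header, 10));
        // prints: id=0x0 flags=0x8400 qd=0 an=6 ns=0 ar=0
    }
}
```

The same 12 header bytes appear again in the dump at offset 0x1d1 (465), and 2 × 465 = 930 matches the reported length, so the datagram appears to carry the message twice. Whether that is what triggers the “wrong length” exception is only an assumption.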

This is my implementation of the DiscoveryParticipant: https://github.com/markus7017/openhab-addons/blob/shelly_snapshot/bundles/org.openhab.binding.shelly/src/main/java/org/openhab/binding/shelly/internal/discovery/ShellyDiscoveryParticipant.java

Is there a way to update to the latest jmdns implementation and do a cross-check? Does OH 3.0 have a newer jmdns package?

Any other ideas?

Wouldn’t it be better to report this at https://github.com/jmdns/jmdns ?

Both OH 2.5 and OH 3.0 use JmDNS 3.5.5, which is the latest release. It doesn’t seem like any additional fixes have been merged into the master branch since that release.

Maybe, but I wanted to get the view of the community first. Perhaps someone else has an opinion or has experienced a problem like this with different device types?