Homematic Binding: not all devices read from CCU

Rayk · December 31, 2021, 1:14pm

Hello,

my name is Rayk. I am a software developer and have been involved in home automation for 15 years now.
Since I lack the time to keep my self-developed home automation system up to date, I am currently looking at openHAB 3.2.
I mainly use devices of the Homematic system. Currently I have the following problems using the Homematic binding.

I have a CCU2 with about 70 Homematic and Homematic IP devices in use.
After configuring, a few devices are missing from the ‘INBOX’. Even with a rescan the devices are not found.
When debugging the binding I found that the devices are present in the device list sent from the CCU to the binding.
For all devices the binding reads the device description from the CCU individually. The whole process obviously takes so long that the reading of the device description is aborted. So I changed the parameter ‘Install Mode Duration’ in the binding configuration from 60s (default) to 120s. After that all devices are found during a new scan. In my opinion the parameter should only be used for teaching the devices to the CCU, but not for reading the devices from the CCU into openHAB.

The second (bigger) problem, for which I can’t find a solution, is the following:
After restarting openHAB, the Things (Homematic devices), which could only be created by setting the ‘Install Mode Duration’ parameter high, show the error “CONFIG: ERROR” (Device with address ‘ABC’ not found on gateway ‘xyz’) and show no reaction in openHAB.
After a new, manually triggered scan, the Things are ready for use again.

What can I do to work around this bug? Running a manual scan after every reboot is absolutely not an option.

Greetings,
Rayk

gitmaster2013 · March 10, 2022, 6:11am

I have the same problem since last week. Increasing the discover time helped me at first.
Now some devices are missing on every restart of openHAB!

Cplant · March 17, 2022, 7:33pm

I’m having a similar issue, not sure for how long, but I remember not having had this when I started with OH 3.1.0: After rebooting OH, not all things manage to get online; some fail with the error message “Device with address 'xxx' not found on gateway 'yyy'”:

2022-03-17 20:16:38.328 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HMIP-SWDO:MEQ0227988:0000DD898C64A9' changed from INITIALIZING to ONLINE
2022-03-17 20:16:38.457 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Thermostat_Garten_Humidity' changed from 69 to 69 %
2022-03-17 20:16:39.881 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-SMI:MEQ0227988:00091D899E19B2' changed from INITIALIZING to ONLINE
2022-03-17 20:16:40.039 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Rolladen_Kueche_Rechts_Stop' changed from OFF to NULL
2022-03-17 20:16:41.408 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-BSL:MEQ0227988:001A5BE9A65495' changed from INITIALIZING to ONLINE
2022-03-17 20:16:42.884 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-BROLL:MEQ0227988:00111A49A8DEF0' changed from INITIALIZING to OFFLINE (CONFIGURATION_ERROR): Device with address '00111A49A8DEF0' not found on gateway 'MEQ0227988'
2022-03-17 20:16:42.888 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-BROLL:MEQ0227988:00111A49A8DDCC' changed from INITIALIZING to OFFLINE (CONFIGURATION_ERROR): Device with address '00111A49A8DDCC' not found on gateway 'MEQ0227988'
2022-03-17 20:16:42.891 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-BROLL:MEQ0227988:00111A49A8DDD0' changed from INITIALIZING to OFFLINE (CONFIGURATION_ERROR): Device with address '00111A49A8DDD0' not found on gateway 'MEQ0227988'

This is especially odd because most devices manage to get online.
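
To see at a glance which Things are affected after a restart, the events log can be filtered. The following is a minimal sketch, assuming the log lines have the ThingStatusInfoChangedEvent format shown above; the function names are made up for illustration. It keeps the last reported status per Thing and lists those that are still OFFLINE with the "not found on gateway" message.

```python
import re

# Status names of openHAB's ThingStatus enum.
_STATUSES = "UNINITIALIZED|INITIALIZING|UNKNOWN|ONLINE|OFFLINE|REMOVING|REMOVED"

# Matches "Thing '<uid>' changed from <old> to <NEW><rest>" as in the log above.
STATUS_RE = re.compile(
    r"Thing '(?P<uid>[^']+)' changed from .+? to (?P<new>"
    + _STATUSES
    + r")\b(?P<rest>.*)$"
)


def last_status_by_thing(lines):
    """Return {thing_uid: (status, trailing_text)} using the last event per Thing."""
    last = {}
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            last[match.group("uid")] = (match.group("new"), match.group("rest").strip())
    return last


def not_found_things(lines):
    """List Things whose latest status is OFFLINE with 'not found on gateway'."""
    return sorted(
        uid
        for uid, (status, rest) in last_status_by_thing(lines).items()
        if status == "OFFLINE" and "not found on gateway" in rest
    )
```

Feeding it the lines of the events log (for example `not_found_things(open(path))`, with `path` pointing at your own log file) returns the UIDs that never recovered.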

Does anyone have an idea?

MHerbst · March 17, 2022, 7:43pm

Do you have CCU2 or CCU3? It seems that this problem only occurs if a CCU2 is used.

Can you enable TRACE log mode, restart OH and post the relevant parts of the openHAB log? Maybe it provides some more information. Maybe it is possible to raise an OH internal timeout.

Cplant · May 27, 2022, 8:07am

Thanks! I’m currently trying out something different, since I have the impression that this problem is somehow related to various other connection instability problems I keep having (also see here).

Since I’m running everything on Raspi/Docker, I could re-install and restore my entire system this morning in <30 minutes, so I’m curious to see whether all the connection problems are now (hopefully) solved.

Cplant · June 28, 2022, 8:14pm

I’m using a CCU2.

@MHerbst, even after completely reinstalling my entire openHAB (on Raspi 3 with CCU2 on Docker) the problem persists. Good news: I managed to trace/log a pretty nice example of a reboot of openHAB 3.3 that resulted in what I described above (only some things getting online properly, while some remain offline with “Device with address ‘xxx’ not found on gateway ‘yyy’”):

Below is the openhab.log (on Dropbox, due to the file size restriction):
https://www.dropbox.com/sh/znm4pd2h5nwt38n/AAA8J4Sl-MCc6s8E0f8xt_Uqa?dl=0

Let me know if I can help any further. Since I did a pretty standard re-install, I believe this could be something worth investigating further?

Piffer · June 28, 2022, 8:23pm

I had the same issue after the upgrade to the latest milestone. As nothing big was mentioned in the patch notes regarding Homematic, I assumed the problem was on my end.
I tried restarting both, first the CCU3 and then openHAB, which didn’t fix the issue.
I then changed the buffer size from 2048 to 4096 and this worked for me. I assume this is due to the number of Homematic Items I have in my setup.

I had the same error.

Cplant · June 28, 2022, 8:27pm

Thanks for the hint. Since I have a number of homematic devices (25+) I somehow also suspected a timeout/load/buffer problem.

Where can I change the buffer size?

Piffer · June 28, 2022, 8:31pm

It’s in the Thing configuration of the bridge. You need to tick “show advanced”. It’s at the very bottom.

Cplant · June 28, 2022, 9:16pm

Thanks.

Your hint (increasing the buffer size from 2048 kb to 4096) unfortunately did not change the picture for me.

Some observations:

Still, independent of the buffer size setting, a rather large share (> 50% of all homematic devices) does not manage to go online. This also seems pretty reproducible.
When I was at openHAB 3.2, the number of homematic devices that did not manage to go online was either much lower, or 0, so at least for me it seems as if something has changed with openHAB 3.3. Needless to say that this is now a pretty big issue.
Sometimes, some time after the reboot, some (but not all) of the homematic devices still manage to go online: 22:55:04.045 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-PDT:MEQ0227988:000DD709B03918' changed from OFFLINE (CONFIGURATION_ERROR): Device with address '000DD709B03918' not found on gateway 'MEQ0227988' to ONLINE. So the information “not found on gateway” is clearly misleading, as it seems more like a connection problem.
To make sure that it’s not a problem with my CCU2, I did a complete factory reset / restore from backup (though no change).

Is there someone who can comment on that?

MHerbst · June 29, 2022, 5:50pm

As far as I remember there have not been any relevant changes in the Homematic binding between 3.2 and 3.3, especially regarding connection problems.
Maybe some timings have changed with OH 3.3 (I have not had the time to test this release in my environment).
Another possible cause is the CCU. Maybe it takes longer than expected until the HmIP service is ready. You can try to increase the “Callback Reg. Timeout” value in the advanced settings of the bridge configuration.
If this does not help, I will try to figure out what’s wrong with the help of the log file.

Cplant · June 30, 2022, 10:07am

First of all: Thanks a lot!

And good news first: After having performed 20+ restarts, I managed to reproduce the problem at least a bit, though did not manage to solve it.

You’re right, I kept having the same problems also in previous versions. Now that I think about it, the more homematic devices I kept adding, the more persistent the problem became.

So here’s what I found out (funny coincidence: I mixed up the parameters “Timeout” with “Callback Reg. Timeout” at first, so I did some tests with both):

Increasing the “Callback Reg. Timeout” even up to 600 (to be absolutely sure) does not solve the problem.
Increasing the “Timeout” (also even up to 600, also to be absolutely sure) also does not solve the problem.

Here’s something funny, which makes me think that the problem is timeout-related, but maybe not related to the two parameters I kept testing:

While the problem of some homematic things remaining in “offline” always persists if I either reboot the Raspi or just the openHAB container…
… the problem never persists if I just disable the CCU2-thing in openHAB and enable it again, to trigger openHAB to re-initialize all homematic things.

MHerbst · June 30, 2022, 5:52pm

I suspect that the problem only occurs with HmIP devices. The cause of the problems is the behaviour of the HmIP service in the CCU and some devices. After a restart of the CCU it takes some time until the HmIP service is ready (this is what the Callback Reg. Timeout setting is for).
But depending on the device it can take about 5 min. until the device is fully available in the CCU itself. This means that such devices will be shown as offline in OH. If one of these devices later sends an event (like a changed temperature), its state should change to online.

So, after a restart of the CCU you should wait some minutes (about 5) until you restart OH or disable and enable the CCU2 thing.

I have some ideas to solve this problem in OH but this would mean a reimplementation of the connect handling and possibly some breaking changes.

Cplant · July 2, 2022, 9:41pm

I could spend some time today, trying to figure out how to reproduce the various things I observed. From what I can see, fixing this maybe isn’t that complicated after all?

Here are my reproducible findings:

1. Rebooting the entire Raspi: ~50% of all my HmIP devices always remain offline.
2. Restarting just the openHAB container: ~50% of all my HmIP devices always remain offline.
3. Doing the above with / without rebooting the CCU shortly before: No impact at all!
4. Playing around with the CCU parameters “Callback Reg. Timeout” or “Timeout” in openHAB: No impact at all!
5. Disabling the CCU thing and enabling it again: Always solves the problem and results in all HmIP devices coming online.

Doesn’t #5 mean that it’s “just” a timing issue?

One way of fixing this could be to have a rule that, x minutes after openHAB has started, disables the CCU and re-enables it again. But then again, this would be a rather “dirty” solution?
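
Such a delayed disable/enable could also be scripted from outside openHAB. The following is a minimal sketch, assuming the openHAB 3 REST endpoint `PUT /rest/things/{thingUID}/enable` with a plain-text `true`/`false` body and an API token sent as a Bearer token; the bridge UID, host name, and function names are hypothetical and need to be adapted.

```python
import time
import urllib.request
from urllib.parse import quote


def build_enable_request(base_url, thing_uid, enabled, token):
    """Build the PUT request that enables or disables one Thing."""
    url = f"{base_url.rstrip('/')}/rest/things/{quote(thing_uid, safe=':')}/enable"
    return urllib.request.Request(
        url,
        data=b"true" if enabled else b"false",  # plain-text boolean body
        method="PUT",
        headers={
            "Content-Type": "text/plain",
            "Authorization": f"Bearer {token}",  # assumes an openHAB API token
        },
    )


def _send(request):
    """Default sender: perform the HTTP call and return the status code."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def restart_bridge(base_url, bridge_uid, token, pause_s=10.0,
                   send=_send, sleep=time.sleep):
    """Disable the bridge Thing, wait pause_s seconds, then enable it again."""
    send(build_enable_request(base_url, bridge_uid, False, token))
    sleep(pause_s)
    send(build_enable_request(base_url, bridge_uid, True, token))
```

A call like `restart_bridge("http://localhost:8080", "homematic:bridge:MEQ0227988", token)` could then be scheduled a few minutes after openHAB has started, for example from a cron job or a systemd timer. The `send` and `sleep` parameters exist only so the sequence can be tried out without a running openHAB.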

MHerbst · July 3, 2022, 11:54am

Yes, this fits with what I also suspect

I would call it a “temporary bypass”. Maybe it is possible to execute an action like this automatically by the binding if it detects that there are offline devices.
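
The detection step can be illustrated outside the binding as well. This is only a sketch of the idea, not how the binding works internally: it assumes the JSON returned by `GET /rest/things` contains `UID`, `bridgeUID` and `statusInfo` (with `status` and `statusDetail`) for each Thing, and the function names are made up.

```python
import json


def parse_things(payload):
    """Parse the JSON text of a things listing into a list of dicts."""
    return json.loads(payload)


def offline_children(things, bridge_uid):
    """Return UIDs of Things on the given bridge that are OFFLINE (CONFIGURATION_ERROR)."""
    result = []
    for thing in things:
        info = thing.get("statusInfo", {})
        if (
            thing.get("bridgeUID") == bridge_uid
            and info.get("status") == "OFFLINE"
            and info.get("statusDetail") == "CONFIGURATION_ERROR"
        ):
            result.append(thing["UID"])
    return result


def should_restart_bridge(things, bridge_uid):
    """True if at least one Thing behind the bridge is stuck offline."""
    return bool(offline_children(things, bridge_uid))
```

Only when `should_restart_bridge` returns `True` would the bridge need to be disabled and re-enabled, which avoids interrupting a setup where everything came online on its own.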

beckerm3 · July 27, 2022, 6:08pm

Hi there,

after my server’s mainboard failed, I moved my openHAB installation to a Raspi 4 with a Docker setup.
I started out with openHAB 3.2 but upgraded to 3.3 recently.

After the upgrade I also noticed the described issues with missing Homematic items.
My Homematic is also running on a Raspi with an older OCCU version.
Changing the timeouts and buffer sizes did not improve anything.

@MHerbst mentioned that the issue might be restricted to HM-IP only.
This is not the case for me, since I do not own a single HM-IP device, but the issues apply to my “old” devices as well.
The only “device” which comes online every time is “homematic:GATEWAY-EXTRAS”.
Others might come online later, and maybe not.

Not sure if it helps, but I stumbled on the following log entry after re-enabling the binding for the n-th time:

HomematicDeviceDiscoveryService] - Failed to set Homematic controller in install mode
java.io.IOException: Received no data from the Homematic gateway
	at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.send(XmlRpcClient.java:114) ~[?:?]
	at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.sendMessage(XmlRpcClient.java:73) ~[?:?]
	at org.openhab.binding.homematic.internal.communicator.client.RpcClient.setInstallMode(RpcClient.java:438) ~[?:?]
	at org.openhab.binding.homematic.internal.communicator.virtual.InstallModeVirtualDatapoint.handleCommand(InstallModeVirtualDatapoint.java:62) ~[?:?]

Cplant · July 27, 2022, 6:26pm

Good to know I’m not making the problem up, since you appear to have exactly the same setup (except that you have HM devices where I have HmIP).

What always resolves the problem, though being quite annoying: Disabling and re-enabling the CCU thing. Annoying, but works reliably.

MHerbst · July 28, 2022, 5:15pm

This exception indicates a network problem or a communication problem with the older OCCU version (which one is it?).

Please make sure that all necessary ports are open in the firewall(s) and that they are correctly mapped in docker configuration. Please check whether both Raspis have static IP addresses.

If the binding is running in a Docker container, you will probably have to set the “Callback network address” in the configuration of the bridge (it must be set to the IP address of the Raspi running Docker).
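
The reachability part of these checks can be sketched with a few lines of Python. This is a minimal example, assuming plain TCP connectivity is what needs verifying; the host address in the usage note is hypothetical, and the port numbers mentioned there are the defaults as I understand the binding documentation, so check them against your own bridge configuration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Return {port: reachable} for every port in ports."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Run from inside the openHAB container, something like `check_ports("192.168.1.50", [2001, 2010])` would test the gateway's RF and HmIP ports; the opposite direction (gateway to the binding's callback ports, by default 9125 and 9126) has to be tested from the gateway side with the Docker host's address.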

If this does not help, please enable DEBUG log mode for the binding, restart OH (or at least the HM binding) and attach the log file.

beckerm3 · August 12, 2022, 9:39pm

After trying multiple approaches, including a rollback to version 3.2 my setup is finally up and running again with V3.3.
Unfortunately I made so many changes to the system that I cannot trace back which exact changes ultimately led to success.

The hint with “Callback network address” was definitely a good one.
Somehow I forgot to adapt this entry after the initial re-installation. Might have fixed the issue.

But I am also pretty sure that OH ran into some timeouts.
In the meantime I blame this on a degrading SD card in the Raspi that hosts OCCU.
The OCCU UI got pretty slow after some minutes. I did not realize this at first, because I don’t run anything on OCCU except the devices. I think that happened to the API as well.
After switching to a new SD-Card, everything worked again.

Maybe it was a combination of multiple issues.

Thank you for the hints & the discussion,

Cplant · October 22, 2022, 6:57am

@MHerbst, quick follow-up question on this: I’ve been running openHAB on Raspi/Docker with CCU2 for almost 1.5 years now, and I do not recall having this problem from the beginning. Are you sure it’s not something else? Is there anything else I could do/log/test to help resolve this, so that I don’t have to go for the “temporary bypass” or wait for someone to re-implement the connect handling?

Edit: Disabling things from rules is not as easy as I thought (I hit this roadblock as well).

Edit 2: After installing the ECMAScript 2021 add-on, this worked fine. The only “problem” is a slight delay and a log entry (“Failed to retrieve script script dependency listener from engine bindings. Script dependency tracking will be disabled.”, also described here).