Homematic Binding: not all devices read from CCU

It’s in the Thing configuration of the bridge. You need to tick “show advanced”. It’s at the very bottom.

Thanks.

Your hint (increasing the buffer size from 2048 kb to 4096) unfortunately did not change the picture for me.

Some observations:

  • Regardless of the buffer size setting, a large share (> 50% of all Homematic devices) still does not manage to go online. This also seems pretty reproducible.
  • When I was at openHAB 3.2, the number of homematic devices that did not manage to go online was either much lower, or 0, so at least for me it seems as if something has changed with openHAB 3.3. Needless to say that this is now a pretty big issue.
  • Sometimes, some time after the reboot, some (but not all) of the homematic devices still manage to go online: 22:55:04.045 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'homematic:HmIP-PDT:MEQ0227988:000DD709B03918' changed from OFFLINE (CONFIGURATION_ERROR): Device with address '000DD709B03918' not found on gateway 'MEQ0227988' to ONLINE. So the information “not found on gateway” is clearly misleading, as it seems more like a connection problem.
  • To make sure that it’s not a problem with my CCU2, I did a complete factory reset / restore from backup (though no change).

Is there someone who can comment on that?

As far as I remember there have not been any relevant changes in the Homematic binding between 3.2 and 3.3, especially regarding connection problems.
Maybe some timings have changed with OH 3.3 (I have not had the time to test this release in my environment).
Another possible cause is the CCU. Maybe it takes longer than expected until the HmIP service is ready. You can try to increase the “Callback Reg. Timeout” value in the advanced settings of the bridge configuration.
If this does not help, I will try to figure out what’s wrong with the help of the log file.

First of all: Thanks a lot!

And good news first: After having performed 20+ restarts, I managed to reproduce the problem at least a bit, though I did not manage to solve it.

You’re right, I kept having the same problems also in previous versions. Now that I think about it, the more homematic devices I kept adding, the more persistent the problem became.

So here’s what I found out (funny coincidence: I mixed up the parameters “Timeout” with “Callback Reg. Timeout” at first, so I did some tests with both):

  • Increasing the “Callback Reg. Timeout” even up to 600 (to be absolutely sure) does not solve the problem.
  • Increasing the “Timeout” (also even up to 600, also to be absolutely sure) also does not solve the problem.

Here’s something funny, which makes me think that the problem is timeout-related, but maybe not related to the two parameters I kept testing:

  • While the problem of some homematic things remaining in “offline” always persists if I either reboot the Raspi or just the openHAB container
  • … the problem never persists if I just disable the CCU2-thing in openHAB and enable it again, to trigger openHAB to re-initialize all homematic things.

I suspect that the problem only occurs with HmIP devices. The cause of the problems is the behaviour of the HmIP service in the CCU and some devices. After a restart of the CCU it takes some time until the HmIP service is ready (this is what the Callback Reg. Timeout setting is for).
But depending on the device it can take about 5 minutes until the device is fully available in the CCU itself. This means that these devices will be shown as offline in OH. If one of them later sends an event (like a changed temperature), its state should change to online.

So, after a restart of the CCU you should wait some minutes (about 5) until you restart OH or disable and enable the CCU2 thing.

I have some ideas to solve this problem in OH but this would mean a reimplementation of the connect handling and possibly some breaking changes.

I could spend some time today, trying to figure out how to reproduce the various things I observed. From what I can see, fixing this maybe isn’t that complicated after all?

Here are my reproducible findings:

  1. Rebooting the entire raspi: ~50% of all my HmIP-devices always remain offline.
  2. Restarting just the openHAB container: ~50% of all my HmIP-devices always remain offline.
  3. Doing the above with / without rebooting the CCU shortly before: No impact at all!
  4. Playing around with the CCU parameters “Callback Reg. Timeout” or “Timeout” in openHAB: No impact at all!
  5. Disabling the CCU-thing and enabling it again: Always solves the problem and results in all HmIP-devices coming online.

Doesn’t #5 mean that it’s “just” a timing issue?

One way of fixing this could be to have a rule that, x minutes after openHAB has started, disables the CCU and re-enables it again. But then again, this would be a rather “dirty” solution?
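A minimal sketch of that workaround, run from outside openHAB rather than from a rule. It assumes the openHAB REST API endpoint `PUT /rest/things/{thingUID}/enable`, which takes a plain-text body of `true` or `false`, and an API token for authentication. The function names, the host, the token, and the bridge UID used in the note below are placeholders, not values from this thread.

```python
import time
import urllib.parse
import urllib.request


def build_enable_request(base_url, thing_uid, enabled, token):
    """Build (but do not send) the request that enables or disables a Thing."""
    # Thing UIDs contain colons, which are kept as-is in the path.
    uid = urllib.parse.quote(thing_uid, safe=":")
    url = f"{base_url.rstrip('/')}/rest/things/{uid}/enable"
    return urllib.request.Request(
        url,
        data=b"true" if enabled else b"false",
        method="PUT",
        headers={"Content-Type": "text/plain", "Authorization": f"Bearer {token}"},
    )


def restart_bridge(base_url, thing_uid, token, pause_s=10.0, send=urllib.request.urlopen):
    """Disable the bridge Thing, wait a moment, then enable it again."""
    send(build_enable_request(base_url, thing_uid, False, token))
    time.sleep(pause_s)
    send(build_enable_request(base_url, thing_uid, True, token))
```

Calling something like `restart_bridge("http://openhab.local:8080", "homematic:bridge:ccu", "<api token>")` a few minutes after openHAB has started would then do the same as toggling the CCU thing by hand. The `send` parameter only exists so the function can be tried without a running openHAB.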

Yes, this fits with what I also suspect.

I would call it a “temporary bypass”. Maybe it is possible to execute an action like this automatically by the binding if it detects that there are offline devices.

Hi there,

After my server’s mainboard failed, I moved my openHAB installation to a Raspi 4 with a docker setup.
I started out with openHAB 3.2 but upgraded to 3.3 recently.

After the upgrade I also noticed the described issues with missing Homematic items.
My Homematic is also running on a Raspi with an older OCCU version.
Changing the timeouts and buffer sizes did not improve anything.

@MHerbst mentioned that the issue might be restricted to HM-IP only.
This is not the case for me, since I do not own a single HM-IP device, but the issues apply to my “old” devices as well.
The only “device” which comes online every time is “homematic:GATEWAY-EXTRAS”.
Others might come online later, and maybe not.

Not sure if it helps, but I stumbled on the following log entry after re-enabling the binding for the n-th time:

HomematicDeviceDiscoveryService] - Failed to set Homematic controller in install mode
java.io.IOException: Received no data from the Homematic gateway
	at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.send(XmlRpcClient.java:114) ~[?:?]
	at org.openhab.binding.homematic.internal.communicator.client.XmlRpcClient.sendMessage(XmlRpcClient.java:73) ~[?:?]
	at org.openhab.binding.homematic.internal.communicator.client.RpcClient.setInstallMode(RpcClient.java:438) ~[?:?]
	at org.openhab.binding.homematic.internal.communicator.virtual.InstallModeVirtualDatapoint.handleCommand(InstallModeVirtualDatapoint.java:62) ~[?:?]

Good to know I’m not making the problem up, since you appear to have exactly the same setting (except HM instead of HMIP in my case).

What always resolves the problem, though being quite annoying: Disabling and re-enabling the CCU thing. Annoying, but works reliably.

This exception indicates a network problem or a communication problem with the older OCCU version (which one is it?).

Please make sure that all necessary ports are open in the firewall(s) and that they are correctly mapped in the docker configuration. Please check whether both Raspis have static IP addresses.

If the binding is running in a docker container, you will probably have to set the “Callback network address” in the configuration of the bridge (it must be set to the IP address of the Raspi running docker).

If this does not help, please enable DEBUG log mode for the binding, restart OH (or at least the HM binding) and attach the log file.

After trying multiple approaches, including a rollback to version 3.2, my setup is finally up and running again with V3.3.
Unfortunately I made so many changes to the system that I cannot trace back which exact changes ultimately led to success.

The hint with “Callback network address” was definitely a good one.
Somehow I forgot to adapt this entry after the initial re-installation. Might have fixed the issue.

But I am also pretty sure that OH ran into some timeouts.
In the meantime I blame this on a degrading SD card in the Raspi which hosts OCCU.
The OCCU UI got pretty slow after some minutes. I did not realize this at first, because I don’t run anything on OCCU except the devices. I think that happened to the API as well.
After switching to a new SD-Card, everything worked again.

Maybe it was a combination of multiple issues.

Thank you for the hints & the discussion,

@MHerbst, quick follow-up question on this: I’ve been running openHAB on Raspi/docker with CCU2 for almost 1.5 years now, and I do not recall that I had this problem from the beginning. Are you sure it’s not something else? Is there anything else I could do/log/test to help resolve this, so that I don’t have to go for the “temporary bypass” or wait for someone to re-implement the connect handling?

Edit: Disabling things from rules is not as easy as I thought (I hit this roadblock as well).

Edit 2: After installing ECMAScript 2021, this worked fine. The only “problem” is a slight delay and a log entry (“Failed to retrieve script script dependency listener from engine bindings. Script dependency tracking will be disabled.”, also described here).

Yesterday I faced this issue with a Homematic (not IP) device. I am not sure whether more items were affected. It’s a rain sensor, so I will wait until the next rain and see whether it is resolved, now that I have had to restart my system for another reason.
The device was just not updated in OH, but I could see the correct current status in the frontend of Raspberrymatic on my CCU3.
Unfortunately it’s quite hard to become aware of devices that don’t get updated anymore, because they don’t get updated anymore :)

These types of problems are really hard to trace and solve. They seem to depend on the CCU version (CCU2 seems to cause more problems than the faster CCU3), the network configuration and stability (on Raspis you should always use an ethernet cable and not WiFi) and also on the device types that are used (it seems that HmIP devices cause more trouble than the older devices).

For example, I am running Raspberrymatic and openHAB on two different Raspi 3 (no docker or VM) without any problems for weeks.

Let me explain how the communication between OH and the CCU works: when the binding is started, it asks the CCU for information about all devices and requests the current values of all data points. Then it registers itself as a receiver for events (strictly speaking, it has to register up to three receivers: for HM-RF, HmIP and Groups). Every time a device value changes, the CCU sends an event to the registered OH installation(s).
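To make the registration step more concrete, here is a rough sketch of what such an event-receiver registration looks like on the wire. It assumes the `init(url, interface_id)` method of the Homematic XML-RPC API and the commonly used default ports 2001 (HM-RF), 2010 (HmIP) and 9292 (Groups); neither is stated in this thread, and the interface ID naming is made up. It only builds the payloads and sends nothing, and it is not the binding's actual code.

```python
import xmlrpc.client

# Assumed default XML-RPC ports of the CCU interfaces (not stated in this thread).
CCU_PORTS = {"HM-RF": 2001, "HmIP": 2010, "Groups": 9292}


def build_init_calls(callback_host, callback_port, client_id):
    """Return, per CCU interface, the port to call and the XML-RPC 'init' payload."""
    # The CCU will later send its events to this URL.
    callback_url = f"http://{callback_host}:{callback_port}"
    calls = {}
    for name, port in CCU_PORTS.items():
        interface_id = f"{client_id}-{name}"  # hypothetical naming scheme
        payload = xmlrpc.client.dumps((callback_url, interface_id), methodname="init")
        calls[name] = (port, payload)
    return calls
```

The important detail is the callback URL: the CCU connects back to that address, so it has to be reachable from the CCU.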

If the network is stable, this can work perfectly for weeks. If the connection is broken, the binding tries to reconnect and register itself as an event receiver. But if the previous event registration was not removed correctly from the CCU, this can cause problems on the CCU side (and the binding no longer receives values). This is at least what I detected. During my connect handling tests, I even ended up with a CCU that I had to restart via ssh because the web interface was no longer working correctly.

If you detect such communication issues again, you can ssh into the CCU and go to the directory /var/log. There are two log files that can help in this situation: hmserver.log and messages.

Hello,
I am new to openHAB and Homematic and did a test setup at home before starting to implement the complete integration. I have exactly the same issue as the thread starter reports. In my test setup I have just about 10 devices and one system variable. Only 2 of them plus the variable are configured in openHAB. When I set up the binding, everything works just fine. It also continues working for days until I reboot openHAB. I guess it would never work if some ports were closed, but on the other hand I have observed that while openHAB tries to activate the binding and the devices, the CCU3 behaves very badly. By badly I mean that I can log in and up to this point all is fine, but when I try to check the device status or any other page, the CCU3 UI is just stuck. It shows some progress indicator like a rotating wheel or a popup and never comes back.
I have not dug into the issue yet, but I have the feeling that openHAB makes some API calls to the CCU3 that make the CCU3 hang. At the same time, the top command on the CCU3 shows that CPU usage is almost 0% and that there is a lot of free RAM. This could be a sign that the CCU3 is also waiting for something. Maybe it is doing some scan or similar. But it could also be stuck in the communication with openHAB. I will try to debug this if I decide to stay with openHAB (which is most probable…).

With best regards,
Mike.

Small update: I can reproduce it 100%. About 30 seconds after I start openHAB, CPU usage on the CCU3 drops to 0% and I can no longer see the green duty-cycle bar in the UI. This means some RF-related part is stuck. From this point on I cannot open any page. I have a second CCU3 with a freshly installed Raspberrymatic (the current one is a native CCU3 with the latest firmware) and will try to configure it with openHAB and test different scenarios: add the gateway without any device configured in the CCU3, add a device to the CCU3 but not to openHAB, add the device to openHAB, and check at which point it gets stuck. Theoretically it is also possible that my current CCU3 has some garbage on it, but as I am not the only one with this issue, I would guess there is some generic reason.
With best regards,
Mike.

UPDATE:
Indeed CCU3 is stuck. I want to remove all extensions before resetting it first.

  683     1 root     S    23588   2%   0% /bin/rfd -f /etc/config/rfd.conf -l 5
Oct 25 15:35:48 ccu3-webui daemon.info node-red[1578]: [ccu-connection:localhost] Interfaces: ReGaHSS, BidCos-RF, HmIP-RF, VirtualDevices
Oct 25 15:35:48 ccu3-webui daemon.info node-red[1578]: [ccu-connection:localhost] Interface ReGaHSS connected
        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686) ~[HMIPServer.jar:?]
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:488) ~[HMIPServer.jar:?]
        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:884) ~[HMIPServer.jar:?]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[HMIPServer.jar:?]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) ~[HMIPServer.jar:?]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.legacy.communication.rpc.internal.transport.http.HttpTransport.sendRequest(HttpTransport.java:106) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.legacy.communication.rpc.internal.rpc.RpcClient.sendRequest(RpcClient.java:94) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.legacy.communication.rpc.internal.rpc.RpcClient.invoke(RpcClient.java:82) ~[HMIPServer.jar:?]
        at com.sun.proxy.$Proxy41.newDevices(Unknown Source) ~[?:?]
        at de.eq3.cbcs.legacy.bidcos.rpc.internal.LegacyBackendClient.newDevices(LegacyBackendClient.java:157) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.legacy.bidcos.rpc.internal.DeviceUtil.synchronizedBackendDevices(DeviceUtil.java:232) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.legacy.bidcos.rpc.internal.InterfaceInitializer.handle(InterfaceInitializer.java:109) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.legacy.bidcos.rpc.internal.InterfaceInitializer.handle(InterfaceInitializer.java:26) ~[HMIPServer.jar:?]
        at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100) ~[HMIPServer.jar:?]
        at io.vertx.core.impl.WorkerContext.lambda$emit$0(WorkerContext.java:59) ~[HMIPServer.jar:?]
        at io.vertx.core.impl.WorkerContext$$Lambda$91/24996555.handle(Unknown Source) ~[?:?]
        at io.vertx.core.impl.WorkerContext.lambda$execute$2(WorkerContext.java:104) ~[HMIPServer.jar:?]
        at io.vertx.core.impl.WorkerContext$$Lambda$92/13535296.run(Unknown Source) ~[?:?]
        at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76) ~[HMIPServer.jar:?]
        at io.vertx.core.impl.TaskQueue$$Lambda$29/7182616.run(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[HMIPServer.jar:?]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
2022-10-25 15:37:29,223 io.vertx.core.impl.BlockedThreadChecker WARN  [vertx-blocked-thread-checker] Thread Thread[vert.x-worker-thread-0,5,main] has been blocked for 198574 ms, time limit is 60000 ms
io.vertx.core.VertxException: Thread blocked
        at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_202]
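
If you want to check a copy of hmserver.log for this kind of blocked-thread warning without reading it by hand, a small sketch like the following could help. The function name is made up, and the pattern is derived only from the single warning line shown above, so it assumes each warning sits on one unwrapped line in that format.

```python
import re

# Matches warnings of the form seen in the excerpt:
# "Thread[<name>] has been blocked for <n> ms, time limit is <m> ms"
BLOCKED = re.compile(
    r"Thread\[(?P<thread>[^\]]+)\] has been blocked for (?P<blocked>\d+) ms, "
    r"time limit is (?P<limit>\d+) ms"
)


def find_blocked_threads(log_text):
    """Return (thread, blocked_ms, limit_ms) for every blocked-thread warning."""
    return [
        (m["thread"], int(m["blocked"]), int(m["limit"]))
        for m in BLOCKED.finditer(log_text)
    ]
```

For the warning above this yields the worker thread name together with 198574 and 60000, i.e. the thread had been waiting on the socket for more than three minutes.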

The Homematic binding does nothing special to block the CCU. This can only happen if you have multiple connections to the CCU open at the same time. Or if you start/stop the binding very often and the binding was not able to unregister from the CCU.

I am using Raspberrymatic with > 10 devices (all 3 types) on a Raspi 3 together with OH 3.3 without any problems. Only when I am debugging the binding can the CCU be blocked after some time with lots of restarts.

Do you have node-red running on your CCU? Your log message shows “node-red”. Maybe this causes the problem together with OH.
Also, please tell us a bit more about your configuration (Hardware, network types, OH version).

Hello Martin,

Thank you for your help!

tl;dr: Problem solved, and I am kind of a noob :)

Anyway, some of the issues reported here may have the same root cause, so I will describe the steps I took to find the issue and how I solved it.

First, my configuration:
CCU3 original from Homematic
Intel Notebook with 2 cores and 4GB RAM running Linux Ubuntu 22.04.1 LTS on NVMe
OH latest 3.3.0

What I have tried:
CCU3 flashed with Raspberrymatic (latest) and several devices connected to the CCU3.
CCU3 with Raspberrymatic after reset to defaults and no devices.
CCU3 with ccu3-3.65.8
CCU3 with ccu3-3.65.11
Additionally, I have re-installed OH with completely wiping the configuration.

In all above cases, I had the following issue:
I can add the CCU3 and any other thing. It works fine and instantly.

As soon as I restart OH, the CCU3 is stuck. It does not matter which version of the CCU3 software I use. OH is not able to connect to the CCU3 and reports an error.

Next I decided to dig deeper, so I checked out the OH project, configured remote debugging and started the Java debugger to see what was going on. I did all this on my Windows PC and, as it had to be, I could not reproduce the issue. As soon as OH is up, it almost instantly connects to the CCU3 with all defined things. My first idea was that my PC is orders of magnitude faster than the notebook running Linux and that this could be the issue, but then I realized that I had forgotten the key point: the firewall. All the time I had thought about and checked the firewall on the CCU3, but not the one on my Linux server. The documentation for the Homematic binding says this:

And FROM the gateway to the binding:
XML-RPC: 9125
BIN-RPC: 9126

I don’t know why the initial Homematic setup works, but it is definitely an issue when OH starts and initializes the bridges. I would also say it is kind of a bug in the Homematic CCU3: even if the CCU3 cannot respond to the client, it should not kill the system. After opening the ports mentioned above on my Linux PC, everything now works like a charm! Moreover, when I check the open ports on my Linux server, I see that 9125 is open and in use. I feel ashamed, but who never makes a mistake…
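
For anyone who wants to test this quickly: a minimal sketch of a reachability check for the callback ports, meant to be run from another machine in the same network as the CCU while openHAB is running. The helper name and the address in the note below are placeholders. A successful connect only shows that something is listening and that the firewall lets the connection through, nothing more.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: all count as "not reachable".
        return False
```

For example, `can_connect("192.168.1.20", 9125)` and `can_connect("192.168.1.20", 9126)`, with the address of the openHAB host, should both return True once the firewall is opened.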

So thank you all for the collaboration in this topic and I do really hope my finding could help someone.

With best regards,
Mike

Hello everyone!
While searching for a solution to a similar problem with my setup, I stumbled across this thread. It appears I have managed to fix it, so I’d like to share the details of my setup and my solution:

TL;DR Entering the Address of my openHAB as Callback Network Address in the Thing config of the CCU seems to have solved the problem.

I have openHAB running as a docker container inside a Proxmox LXC container, which is working fine.
For the Homematic Stuff I use RaspberryMatic running in a Proxmox VM, with a HmIP-RFUSB.

Both systems work well independently, but when I enable the Homematic Binding in openHAB and add Things, seemingly random error messages appear in the openHAB Things and the WebUI of the CCU (RaspberryMatic) freezes on “Loading…” when trying to log in.

After disabling the Binding in OH and rebooting the CCU VM, I can log in to it again normally.
Sometimes the defined Things also show up as “Online” in openHAB but don’t get any Item updates, and controlling the Items also doesn’t work.

OH and the CCU are both on the same subnet, though OH also has an IP configured on another subnet (which I use for MQTT devices). The cause of the problem seems to be that when the Thing configuration parameter “Callback Network Address” of the CCU is left empty, the binding probably chooses the wrong address, one that isn’t on the same subnet as the CCU. Hence the communication between the two won’t work properly (or not at all). I do not know why that causes the WebUI of the CCU to freeze entirely.
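
To illustrate which address belongs in “Callback Network Address” on a host with several interfaces, here is a small sketch that picks the local address sharing a subnet with the CCU. This is not how the binding chooses its address; the function name and all addresses are made up for the example.

```python
import ipaddress


def pick_callback_address(local_interfaces, ccu_ip):
    """Return the first local address whose subnet contains the CCU, else None.

    local_interfaces: strings such as "192.168.1.20/24" (address with prefix length).
    """
    ccu = ipaddress.ip_address(ccu_ip)
    for entry in local_interfaces:
        iface = ipaddress.ip_interface(entry)
        if ccu in iface.network:
            return str(iface.ip)
    return None
```

With interfaces `["10.0.50.5/24", "192.168.1.20/24"]` and a CCU at `192.168.1.30`, the address to enter would be the second one, since only that one is on the CCU's subnet.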

Hope this maybe helps somebody!

OK, that was my issue: I am running OH inside a docker container with network mode “host”.
I guess that with all those network devices the binding does not know what the callbackHost parameter should be.
I put in the openHAB URL with port and now I have the missing 5 Homematic devices in my Inbox =)
Ty @mva_one, your post helped a lot!