Randomly changed or restored values of items

Netglider · December 16, 2021, 1:11pm

Hi,
I’ve been struggling with an OH3 instance for several days. I’ve been using OH2 for a few years, and a few weeks ago, as part of a migration, I installed OH3 on another system and started rebuilding my setup from scratch, testing new functionality etc. Everything was running fine, but at some point some items started to change their state “by themselves”, randomly, e.g. Switches changing state from off to on, and numerical values used as setpoints also changed. This is all logged in events.log as ‘Item xx received command XYZ’. openhab.log shows nothing at that point. Enabling DEBUG logging for everything has not helped so far, as I was not able to catch the logs when it happened before they rotated. I may need to keep hunting for potential errors this way for longer.

It’s starting to drive me crazy, as blinds going up and lights turning on at night in an uncontrolled manner is not fun. I have a feeling that this is some state being restored, because lights always turn on; I’ve never seen them turn off in an uncontrolled manner. It is not always the same items, but some items do keep recurring. It’s also sending states to a group (e.g. the group of rollershutters, which I only use to send a command to all rollershutters; I’m not using the state of the group in any way). This can sometimes happen in series, every few minutes; sometimes only once. I’ve tried restarting OH3 and clearing the cache, but from what I’ve read about the cache, it should not have anything to do with item states. If this is happening in series and I restart OH, it stops, but it returns in a few hours or a few days.

I also noticed that after OH3 has been running for some time, the following error appears in the GUI:
SSE subscription failed (503 Service Unavailable): running in fallback mode
But it disappears after a reboot, only to come back after some time (sometimes hours, sometimes days).

I have a feeling (but I may be wrong) that the random state changes happen only while this SSE message is showing.

I’m running out of ideas on how to trace it. Is there a way to check where a command for an item comes from? Any hints on how to track it down, apart from hunting through DEBUG logs?

Platform information:
- Hardware: Intel x86_64 GNU/Linux, 8 GB RAM
- OS: Debian GNU/Linux bookworm/sid (testing)
- Java Runtime Environment: From java -version: openjdk version “11.0.13” 2021-10-19 LTS, OpenJDK Runtime Environment Zulu11.52+13-CA (build 11.0.13+8-LTS), OpenJDK 64-Bit Server VM Zulu11.52+13-CA (build 11.0.13+8-LTS, mixed mode)
- openHAB version: 3.1.0-1

rossko57 · December 16, 2021, 1:29pm

Restoring states does not issue commands. Unless you have rules that would react to state changes by issuing commands - I’d assume that you would recognise the limited Items that might be affected by that.

GUI can issue commands, that’s its job - but it should not do so randomly, of course.
Have you a GUI open fulltime? Which one?

Netglider · December 16, 2021, 1:46pm

Thank you very much @rossko57, so that rules out MapDB restoration as a factor. Some of the affected items never receive commands from rules. I normally control OH3 from 3 points: my laptop, with a web browser and BasicUI open on the sitemap (often also MainUI, as I’m still migrating stuff); 2 Android phones with the OH app; and 2 in-wall Raspberry screens working in kiosk mode (BasicUI and the same sitemap, as I’m using only one). Last night when this happened both the laptop and the phones were off the wifi, so only the in-wall screens were up. Those are 2 identical Raspbian installs with Chrome in kiosk mode displaying the sitemap.
From your question, I suspect that a GUI left open full time may be a problem?

rossko57 · December 16, 2021, 1:48pm

No idea; if you see Item commands they come from somewhere. If no GUIs were online, it wouldn’t be from there.

Netglider · December 16, 2021, 1:53pm

No, there were 2 GUIs online, the in-wall screens. So those are candidates. I’m shutting them down for testing.

Netglider · December 16, 2021, 3:04pm

So it just happened again with those 2 in-wall screens disabled, which means it is not their fault.

rossko57 · December 16, 2021, 3:26pm

You’ve made a step forward in elimination.

What “it” happened? Study of an incident can sometimes get you further than trying to make sense of a lot of other stuff.

This thread may give you some ideas to look at
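
For studying a single incident, a small script can pull the command events out of events.log and group them into bursts, so you can see which Items were hit together and in what order. This is only a sketch: the line format in the regex is an assumption about what events.log looks like on a default OH3 install, and `parse_commands`, `group_bursts` and the 5-second gap are made-up names and values, not anything openHAB provides.

```python
import re
from datetime import datetime

# Assumed events.log line shape (check it against your own log):
# 2021-12-16 15:02:11.123 [INFO ] [openhab.event.ItemCommandEvent] - Item 'Light_Kitchen' received command ON
COMMAND_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Item '(?P<item>[^']+)' received command (?P<cmd>.+)$"
)


def parse_commands(lines):
    """Return (timestamp, item, command) tuples for every command line."""
    events = []
    for line in lines:
        match = COMMAND_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, match.group("item"), match.group("cmd")))
    return events


def group_bursts(events, gap_seconds=5.0):
    """Group events whose timestamps are at most gap_seconds apart."""
    bursts = []
    for event in sorted(events, key=lambda e: e[0]):
        # Compare with the last event of the current burst.
        if bursts and (event[0] - bursts[-1][-1][0]).total_seconds() <= gap_seconds:
            bursts[-1].append(event)
        else:
            bursts.append([event])
    return bursts
```

Feeding it the lines of events.log (for example `group_bursts(parse_commands(open(path)))`, with `path` pointing at your own log file) gives one list per burst. A burst that touches many unrelated Items at once points more towards a GUI or an event bus than towards a single rule.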

Netglider · December 16, 2021, 3:49pm

@rossko57 Thanks for the hint, but I consider this unlikely, taking into account how deeply this box is buried in my network. I do Incident Response and Forensics for a living, so I consider myself less exposed, but of course never say never.

Is there any way to log where the commands come from? I guess I would need to log API requests as well?

rossko57 · December 16, 2021, 4:16pm

Nope, this is a deductive exercise. Most likely sources: GUI - because that handles many Items. Rules - but it’s hard to mangle a rule to affect more than a limited set of Items. Bindings - few bindings can even emit commands, and then similar constraints apply to the breadth of Items involved.
Beware Event Bus if you have it - full scope for naughtiness.

Yep, big elimination step

rlkoshak · December 16, 2021, 4:54pm

Are the two instances still running side by side or is only the OH 3 instance running?

One other suggestion you can try is to enable logging of rule events. Change the logging level of the RuleStatusInfoEvent from ERROR to INFO in log4j2.xml and events.log will start to include lines when a rule runs and stops. You might be able to correlate the Item events with a rule or two that are running at times you don’t expect.
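
Once that logging is on, the correlation could be done with a script along these lines. It is a rough sketch, not something openHAB ships: the regexes assume events.log lines of the form shown in the comments (in particular that a rule start is logged as `<rule uid> updated: RUNNING`), and `rules_before_commands` and the 2-second window are hypothetical.

```python
import re
from datetime import datetime

TS = r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
# Assumed line shapes (verify against your own events.log):
# ... [openhab.event.ItemCommandEvent] - Item 'Light_Kitchen' received command ON
# ... [openhab.event.RuleStatusInfoEvent] - night_mode updated: RUNNING
COMMAND_RE = re.compile(TS + r".*Item '(?P<item>[^']+)' received command (?P<cmd>.+)$")
RULE_RE = re.compile(TS + r".*RuleStatusInfoEvent\s*\] - (?P<rule>\S+) updated: RUNNING")


def _stamp(match):
    """Convert the matched timestamp text into a datetime."""
    return datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")


def rules_before_commands(lines, window_seconds=2.0):
    """For each command, list rules that started within window_seconds before it."""
    rule_starts = []  # (timestamp, rule uid)
    commands = []     # (timestamp, item, command)
    for line in lines:
        line = line.strip()
        rule = RULE_RE.match(line)
        if rule:
            rule_starts.append((_stamp(rule), rule.group("rule")))
            continue
        command = COMMAND_RE.match(line)
        if command:
            commands.append((_stamp(command), command.group("item"), command.group("cmd")))

    report = []
    for ts, item, cmd in commands:
        nearby = [
            uid for started, uid in rule_starts
            if 0 <= (ts - started).total_seconds() <= window_seconds
        ]
        report.append((ts, item, cmd, nearby))
    return report
```

Commands that come back with an empty rule list are the ones no rule start accounts for, at least within the chosen window. A long-running rule that started earlier would be missed, so widen the window before drawing conclusions.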

Not directly. You can in some cases enable other logging which might help you correlate events with other actions (e.g. enabling logging of rule events, as above).

Did you set up Remote openHAB binding during the transition?

Are you using MQTT and/or the MQTT Event Bus?

Have you changed anything about authentication on your openHAB instance (e.g. enabled Basic Auth)?

Do you expose this OH instance to the Internet, even if through a reverse proxy?

Do you use myopenhab.org?

Netglider · December 16, 2021, 10:42pm

Thanks @rlkoshak for your response.
I’m using MQTT only for reading sensors, not using the MQTT event bus, and not sending any commands over MQTT. The instance is not exposed to the internet, just sitting locally. I have remote access over VPN if needed. All the software, like the router box, VPN etc., I run myself. I’m not using myopenhab.org. I’ve not touched Basic Auth on OH3; it is running in the default security setup, with credentials created during installation.

Yes, both instances are still running (OH3 and OH2). The old OH2 is sitting on another box, also Debian Linux, in the same network segment. It has been running for months without issues. I’m still using the Remote openHAB binding to connect some items from OH2 for a very simple reason: I was not able to get my Z-Wave stick (Aeotec Z-Stick Gen5) working in OH3 (this is the last component preventing me from being fully migrated). I managed to get it working for a few minutes, which allowed it to discover all Z-Wave things, but then it went offline (I’ve checked a few threads here on the forum, but without success yet). For this reason, I’m using Remote openHAB pointing to OH2 to handle the Z-Wave items. I’ve tried to rule this component out, and some of the items affected by the “state changes” are not Z-Wave items, nor are they linked to a remote item from OH2.

I’ll enable logging per your suggestion and see how it goes.

I’ve also found something interesting today in the logwatch output from the system where OH2 is running, namely the following log entries:

Requests with error response codes
    502 Bad Gateway
       /proxy?sitemap=uicomponents_page_932f28ec9 ... t=1639230129060: 2 Time(s)

I don’t know how things should work, but the sitemap I have in OH3 has the id ‘page_932f28ec9’. Why are there such logs on the system with OH2? I’ve removed all active controls from the sitemap in OH3 today, but I was surprised not to be able to find a button to delete a custom page I created - any hint how this can be done? Also, the box running OH2 is the default gateway in the network.

rlkoshak · December 16, 2021, 11:24pm

I assume you are not using ser2net or anything like that so that both instances of OH are trying to access the Zwave controller at the same time?

Have you deleted the Zwave Things on the OH 3 instance?

If both instances are somehow trying to talk to the same controller at the same time unexpected things can happen.

The Remote openHAB binding is also a suspect here. Does the behavior stop if you disable that?

Enable and check the logging on the OH 2 instance too and correlate the behaviors if you can.

Settings → Pages

Choose “Select” at the top. Check the box next to the Page you want to delete and then click “Delete” at the bottom. It’s unfortunately well hidden how to do that.

Netglider · December 17, 2021, 9:52am

I know that ser2net works only for one connection, as I’m using it for a remote rfxcom. But no, I’m not trying to access the controller from both OH instances - I was physically moving the controller between machines, plugging it directly into a USB port, with udev rules to always map it to the same port (alias).

@rlkoshak I deleted the Thing for the Z-Wave controller several times while trying to make it work, but once OH3 had pulled all the nodes from the network I did not delete them. Is there a good reason to delete everything again? The controller did not change to online after being added. The lock file for the serial port is created, the OH user is in the groups that allow access to the serial port, and the box was restarted to ensure that the user picks up the group membership correctly. But I’m still missing something to make it work.

So far I’ve not disabled the Remote openHAB binding, as I rely too heavily on Z-Wave devices in my OH solution, and I did not want to wait days to confirm whether this is the factor. But I may eventually need to do it.

Thank you @rlkoshak for the hint on deleting it. I had tried “Select” while searching for such an option, but it seems I missed it somehow.

rossko57 · December 17, 2021, 10:00am

This is the leading suspect. “Mysterious Item state changes and commands” is exactly its job, after all. How many Items are duplicated across systems?

Netglider · December 17, 2021, 10:07am

Roughly 40 things, giving, I think, around 80 items.

rlkoshak · December 17, 2021, 2:46pm

We are totally at the process of elimination stage. We can’t explain the behavior so everything is on the table until we can eliminate it as the source. We can’t make assumptions; if we could we would have found the cause already.

Deleting the Things is one more thing we can eliminate as a potential source for the behaviors.

Unfortunately for the problem you had in OH3 with the controller all I can say is it just worked right out of the gate for me. I can’t offer more than that.

It would probably be sufficient to disable the Things, but with Zwave they are discovered automatically so very easy to get back. Just make sure the Controller Thing has the same UID and all the discovered Things as well as the Links to your Items will be preserved.

Once the Items are severed from the Thing watch events.log for further commands. If the Items are still receiving commands we know that the source is coming from something else.
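
That check can be scripted rather than eyeballed. A minimal sketch, assuming events.log lines look like the one in the comment; `commands_after`, the Item names and the cut-off time are placeholders for illustration only.

```python
import re
from datetime import datetime

# Assumed events.log line shape (verify against your own log):
# 2021-12-17 15:02:11.123 [INFO ] [openhab.event.ItemCommandEvent] - Item 'Blind_Bedroom' received command UP
COMMAND_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Item '(?P<item>[^']+)' received command (?P<cmd>.+)$"
)


def commands_after(lines, cutoff, watched_items):
    """Return commands received by watched_items at or after cutoff."""
    hits = []
    for line in lines:
        match = COMMAND_RE.match(line.strip())
        if not match or match.group("item") not in watched_items:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        if ts >= cutoff:
            hits.append((ts, match.group("item"), match.group("cmd")))
    return hits
```

Set `cutoff` to the moment the Things were disabled or deleted and `watched_items` to the Items that were linked to them. Any hit after that point means the commands are coming from somewhere other than those Things.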

I do agree with @rossko57 though, the lead suspect is the Remote openHAB binding, so confirming or eliminating that as a source would probably be the best place to focus.

Maciej_Krochmalski · January 4, 2022, 2:12pm

Hey there, I have been facing similar problems for about 1.5 months now. Every now and then (roughly once a day, sometimes every few minutes) some of my items receive an ON or 100% command.
Since then I have reinstalled OH 3.2, keeping my items and things config - didn’t help. I made some minor improvements - no luck either. Yesterday I followed the “hacked” path and changed all passwords. I am still waiting for the result and nothing wrong has happened so far, but I guess this will not be it.
To compare experiences, what bindings do you have? Maybe we have some in common and it is one of those?

Myself, I have installed Astro, CoronaStats, Denon/Marantz, Exec, GPSTracker, HPPrinter, HTTP, iCloud, Kodi, Modbus, Network, OpenWeatherMap, Samsung, System Info, Telegram, Xiaomi Wifi and ZoneMinder. Rules: Telegram: Proces Answer. UI: Basic UI, and yesterday I uninstalled HABot and HABPanel. In addition to the above: openHAB Cloud Connector, RRD4j Persistence, JDBC Persistence MariaDB, JDBC Persistence MySQL. Transformations: Javascript, JSONPath, RegEx, Map and Scale.
If we can rule out those that are not in common, maybe it will lead us to a solution, unless you have already found one… As I have a few hundred items and it’s a live organism, cutting them out one by one is a process I would love to avoid.

Maciej_Krochmalski · January 5, 2022, 9:38am

Just as suspected - changing passwords didn’t help. My lights went on during the night and blinds opened wide. Still seeking a solution…

ianst · April 30, 2022, 10:32am

Same here, and it’s not fun at all. Once a month ALL switches are turned on. Randomly. At night. Now at 5.50am. As well as water PUMPS and BOILERS that are connected by KNX to OH… those are serious things. This can lead to a very dangerous situation, as our boilers are fully controlled by KNX.
Nothing interesting in logs. At 5.50 all switches are turned on one by one, group switches and their child switches. Previously I made an empty switch not connected to ANYTHING and it was turned on too.
How to debug this? How to fix it already?!
Clean installation - done.
Clear configuration from scratch - done.
Logging KNX (the only binding in OH) - done.
Changing passwords - done.
Praying to all the gods - done.
What’s next?

rossko57 · April 30, 2022, 2:12pm

Never true. The absence of commands in logs is just as interesting for the symptoms described. Share what you have, if you want opinions.

I understand the frustration, but we’ll have to hunt down details.