Ok, so we seriously take this thread off topic, then.
(Of course, we can continue this elsewhere if advisable.)
I’ll give you my full “rant” then, knowing full well that some of this will be justified, and other parts will just feel justified to me. I trust that you’re able to take it with the appropriate barrel of salt. And while you may, of course, disagree with me in as many regards as you wish, I hope that you’ll nevertheless get the general points.
I’d also like to say upfront that I love what OH accomplishes. I wouldn’t have gotten as far as I am without it. So it’s love and hate at the same time.
The key problem is that whenever something changes (an update, a new binding that I install, an overhaul in some parts of my rule set,…), I must hold my breath whether there will be several other things that suddenly (or not-so-suddenly, noticeable only after days or weeks) fall apart. Usually there will. To some extent, this is clearly due to the complexity of my setup: we’re not talking about a Raspi with an SD card, we’re talking about a full-grown server system where OH is one service among several, and an OH setup that manages 1000+ items, 350+ automation rules, 500+ events per minute. 10+ years ago I started with things like switching the Christmas tree lights on iff the lights in the living room or the TV are on. By now my setup controls virtually everything in and around my home. Heating, rollershutters, PV incl. battery, car charging, smoke detectors, gates, lights, water supply, ventilation, home appliances, phones, doorbell, home entertainment in multiple rooms, alarm clocks, and quite a bit more. Yes, I enjoy that as a hobby. No, I am not expecting open source software made by enthusiasts to come with military-grade stability. Yes, I appreciate the incredible amount of hard work put into this, by so many skilled people. Yes, I know how to build and administer complex IT systems, and how hard it is to build them “right”. And yes, of course I have a plan B if things go very seriously wrong. So even in the worst case I will still be able to get through my front gate, I will still be able to switch on the lights, and with some manual pressing of buttons I won’t sit in the cold without heating. But over the years it has still become really, really mission critical for my entire household.
And yes, you will say that it’s natural that upon an upgrade things may go wrong. That’s true. I have that with other services, too, occasionally. But OH is different in one significant way: there’s no clean separation between code, configuration, and data. And this makes it super hard a) to go back to an older version if it turns out that things did go wrong, and b) to seriously skip updates (as you rightly suggest as a natural strategy for stuff you depend on).
When I do a version upgrade, I want to do it with my distribution’s package manager. I want to do it without having to read up in forums, in release notes, and perhaps even in the code first, hoping not to miss something that comes back to haunt me later. I want to stay in the user role. I spend more than enough time in the guts of other systems, so for this one my desire is that it just does its job, with all the flexibility that it allows me to leverage on the user side.
But if I upgrade OH, things like the infamous “upgradetool” start shoveling around stuff - because OH has more and more started to treat configuration like data since the advent of PaperUI, with different, incompatible formats every now and then and stuff getting stirred together in oftentimes surprising ways and places. An upgrade may also result in an unexpected restructuring of my persistence storage, and so on. Of course I will make sure I have a good backup before I upgrade. But if three weeks after the upgrade I figure that something in the new version is not really doing what it’s supposed to do, I can either roll back with the old backup, losing all my changes and likely also most persistence data from the time since the upgrade (even if it’s in a separate database, as the schema may have changed since the upgrade, or because of meanwhile added items, or…). If I don’t want the full roll-back, I’m bound to stick with whatever version I got myself into. Carefully, piece-by-piece transferring configuration data? Perhaps even version-controlling configuration data like I do for most other services? No way, at least not dependably without the constant risk of missing something. As we’ve seen two days ago, it even sucks new binary code from a directory into a place somewhere in the intestines of the system. No way back but by downloading the old version again, hoping that it gets integrated correctly, and still expecting some things to be different (like IDs), with possible consequences that are hard to predict. (Yes, shit happens, I am optimistic, but at the same time appalled by the general systems design approach as such.)
So I have learned to live with, for instance:
- unexplainable missed triggers, sometimes more often, sometimes less often (has happened from time to time, happened often after upgrading to 5.1.0, seems to have gotten better again for unknown reasons; system load related? uptime related?)
- OH not starting occasionally, instead flooding my log with tons of errors - and if I kill it and try a second time right away, it suddenly boots as if nothing had happened (still a phenomenon that I experience from time to time, especially often when it first starts after upgrades, but nondeterministically)
- error messages by the Shelly binding that “Channel types or config descriptions for thing ‘…’ are missing in the respective registry for more than 120s.”, and that this “should be fixed in the binding”
- sometimes my EnOcean binding becomes deaf, and only restarting OH helps that it starts receiving again; I haven’t looked into this one deeply, but it seems that in the recent version(s) scanning for new EnOcean devices also stopped working (I worked around this by manually editing addresses, after trying a few times without success)
- when I open some sitemaps in the Android app, everything works as expected, but something called “IconServlet” occasionally complains in the logs that it “Failed sending the icon byte stream as a response: Reset cancel_stream_error”, with tons of errors afterwards about “IllegalStateExceptions: ABORTED” (as I said: nothing noticeable that anything doesn’t work, despite all the log messages; no indication which specific items/states/situations might cause this and when)
- the configuration of some MQTT channels silently got rewritten and became non-functional sometime recently, so that the channels did not process updates anymore; this messed up my heating control without me noticing for the first few days, and burned lots of gas unnecessarily (I implemented a workaround, reported the bug, and after some discussions a fix seems on the way)
- very frequent error messages (every few minutes) in my logs that “Handler HomeConnectHoodHandler of thing homeconnect:Hood:*** tried accessing its bridge although the handler was already disposed.”, also without anything apparently becoming non-functional - the hood works as it should. But I lost trust that it will continue to work, and am half-expecting this to fail at some point.
This is not a complete list. It’s also for sure not a to-do list. For most of these, there are (often quite ancient) issues or forum threads with people seeing the same problems, but without solutions that would have helped me nail it down or fix it. And I just don’t have the nerves and the time to pursue reporting every single such thing, especially since the past showed that this seems not unrestrictedly welcome from “outsiders” like me, and the typical first reaction to other people’s reports as well as to mine is along the lines of “I can’t reproduce it, that’s fishy, that can’t be true, works for me, it must be your fault”. So, I’ve just come to accept this as a sign of an overcomplex system, where my scale of using it is apparently exceeding the scope within which it can still be used without encountering overly many “strange” effects.
If I upgrade to a new version of virtually any other service that I am operating, then I can more or less count on being able to at least find a safe way back, usually without losing critical data. Yes, there are exceptions (and I try to avoid them). In particular, it’s highly unusual that a service automatically re-writes and changes configuration information in uncontrolled ways. Most backend software systems are simply doing a much, much better job at keeping stuff (especially configuration, most often also data) forward and backward compatible without resorting to “conversions” or other automatic editing.
It may be that I am particularly sensitive in that regard, because what I do for a living is working with IT systems and data formats in an area where there must be room for unforeseen extensions over long time spans, without compromising on compatibility with (long) past and (far into the) future versions. So I am getting a cold sweat when I install a new version of OH, and notice that yet again it’s telling me that something in my configuration has just been automatically re-written. It just feels like: “This would very likely be totally unnecessary had there been a clean design right from the start, and a few well thought through design rules.”
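One way to at least see what an upgrade rewrote is to fingerprint the configuration tree before and after. The following is a minimal sketch of that idea, not something OH provides: the function names are made up for illustration, and which directories to watch (for example the text configuration folder and the JSON database folder) is an assumption that depends on the installation.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map every file below root (by relative path) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(before, after):
    """Return (added, removed, changed) lists of relative paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    # A file counts as changed if it exists in both snapshots with different hashes.
    changed = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return added, removed, changed
```

Taking one snapshot right before the upgrade and one after the first start of the new version shows which files were touched. It does not make a rollback any safer, but it turns a silent rewrite into a visible one.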
Given these experiences, I am so much with you when it comes to “not running after every new version”. A clear YES. That is also my general philosophy, especially for things on which I really depend every day. It is a philosophy that I had to give up for OH, though. It’s clearly not foreseen and not appreciated to do upgrades to anything but the next release, especially not in a system with a complex existing configuration where “just reconstructing it” in a higher release if necessary is not an option. The upgradetool (or whatever else) may mess up badly when you jump over releases, I was told and I learned. And spending a weekend catching up on release after release after release, step by step is a) hard with typical package management systems, because it’s not what these package managers expect the managed software to expect, and b) it doesn’t feel like it’s anywhere more likely to result in a really functional system. A while ago I got badly scolded in this forum for daring not to go through every single minor upgrade separately, but instead waited for things to become stable, and then asked for help when something - as far as I can tell from all that I know and saw, completely unrelated - stopped working after I finally did a somewhat longer ranging upgrade. (That, by the way, was also related to rules that didn’t trigger - but that’s coincidental.) So I typically delay my updates for a few days, but then I do them when my distribution offers them, so that I don’t risk missing any apparently so important steps along the mandatory upgrade ladder.
Essentially, what I would expect is: I can freeze a version of OH in my Linux distribution’s package manager, to prevent automatic upgrades even if they appear in the repositories. When I decide that it’s time for an upgrade, I do one, again using the package manager of my distribution, without digging through all the footnotes in all the release notes of all the intermediate releases that I might be jumping over. I’d expect that nevertheless (a) my configuration will stay intact, (b) I get a message if something in my specific configuration is at risk of breaking, before anything is getting modified, (c) I don’t lose the option to go back later without losing configuration changes and persistence data that happened between the upgrade and the decision to go back (unless, obviously, they depend on features that haven’t been there in the old release). I can do that for virtually any other mission-critical software that I am using, but not really with OH. At least I have lost any confidence that I can, and I got quite explicit statements from people here in this forum that it’s entirely my fault if I assume that I should be able to.
So, my way out is that I started developing something tailored, lightweight, which handles persistence and automation rules based on MQTT, and I started porting my automation rules out of OH and into this much more overseeable, less feature-(over)loaded and hopefully also in the longer term more easily debuggable environment. My intention is to separate device/“thing” interfaces from automation logic, and both from user interfaces. I might keep OH as a “bridge” to some types of devices where there’s a binding available, but where I can’t find a more lightweight alternative to interface with the device. I might also keep it for user interfaces, like panels on tablets (though I will evaluate alternatives; for instance, I like the fact that HAss has a working Android Auto integration, something I have really been missing).
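To give an idea of what such a separation could look like, here is a minimal sketch. It is purely illustrative and not the actual implementation: the `RuleEngine` class, the decorator, and all topic names are hypothetical, and the topic matching is a simplified version of MQTT wildcards (`+` for one level, `#` for the rest) that ignores special cases such as `$`-prefixed topics. The rule logic only sees topics and payloads; whatever MQTT client is used would call `on_message` and supply the `publish` function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def topic_matches(pattern: str, topic: str) -> bool:
    """Simplified MQTT matching: '+' matches one level, '#' matches the rest."""
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    for i, p in enumerate(p_parts):
        if p == "#":
            return True
        if i >= len(t_parts):
            return False
        if p != "+" and p != t_parts[i]:
            return False
    return len(p_parts) == len(t_parts)


@dataclass
class RuleEngine:
    """Holds the last payload per topic and runs the rules whose pattern matches."""
    publish: Callable[[str, str], None]
    state: Dict[str, str] = field(default_factory=dict)
    rules: List[Tuple[str, Callable]] = field(default_factory=list)

    def rule(self, pattern: str):
        def register(fn):
            self.rules.append((pattern, fn))
            return fn
        return register

    def on_message(self, topic: str, payload: str) -> None:
        self.state[topic] = payload
        for pattern, fn in self.rules:
            if topic_matches(pattern, topic):
                fn(self, topic, payload)


# Stand-in for a real MQTT client: outgoing messages are just collected in a list.
sent = []
engine = RuleEngine(publish=lambda topic, payload: sent.append((topic, payload)))


@engine.rule("home/livingroom/+/state")
def christmas_tree(engine, topic, payload):
    # Tree lights are on iff the living room lights or the TV are on.
    lights = engine.state.get("home/livingroom/lights/state") == "ON"
    tv = engine.state.get("home/livingroom/tv/state") == "ON"
    engine.publish("home/livingroom/tree/set", "ON" if lights or tv else "OFF")
```

Because the rule never talks to a device or a broker directly, it can be exercised by calling `engine.on_message(...)` with made-up messages, which is a large part of what makes such a setup easier to debug.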