Stability -- openhab knows nothing but crapping itself

I think you’re doing that wrong. I run the test version, then when it breaks I tell the better half that I need to spend $1,000 to fix it. Asking for money to change something that’s working is a fool’s errand. :slight_smile:

(Of course, not true at all, the better half is quite indulgent)


I guess it’s a matter of balancing the time without the production setup against the time spent on the test setup. But since this is a smart home system, the acceptable down time is often very limited.
Going back to the stable version is of course an option when you have a decent backup procedure. But going back only makes sense if you found some problems and are able to file an issue from them… This is where things get tricky… If you found a problem and you file an issue, the developer would need someone to test fixes etc… And you might end up going forward and back many times, and each time it means yet another down period for the production setup (i.e. your system, which perhaps drives all the lights in your home).

I know, these are the rules of the game for things like this… It can hardly be anything else… But it’s always a problem with developing like this.

I wish I could agree on this, but I don’t…
Think about the problematic serial driver issues which suddenly appeared in one of the 2.5 M1 builds. If you’re “lucky” this error would have shown up the first time you booted the test instance. But not always. Then it would maybe show the next time you boot.
Most new builds are for the core, I guess. The rest are the bindings. Bindings are in my opinion a lot easier to test, except if you start up with a binding which doesn’t work at all, of course.

But isn’t that the goal of openHAB as well?
And isn’t this the reason why it’s important to have a stable version which doesn’t need fixes?
If openHAB is for testing purposes only, a lot of this doesn’t really make any sense to me.
Every time a piece of software reaches a stable stage, developers should have cleared all (known) bugs and troubles, and start focusing on developing new features only.
At that stage, the software should be ready for any kind of production use, and all documentation should have been updated. But every time something fails which requires the user to update the core system to a later version, the development has failed!

It depends on the nature of the problem. If it’s a problem not involving a shared resource like a Zwave dongle, then you can always run them both at the same time. Just use different networking ports. In all likelihood, once you’ve identified a problem well enough to file an issue, you don’t need a fully functional configuration in your test instance to further debug the found problem.
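
A minimal sketch of running a second instance on different ports, assuming the test instance runs in Docker (the post above does not say how the second instance is started). The host ports 8081/8444, the container name, the image tag and the host directory are all placeholders chosen for illustration; /openhab/conf and /openhab/userdata are the volume paths used by the official openHAB image. The function only prints the command so it can be reviewed before it is run.

```shell
#!/bin/bash
# Print a docker command for a throwaway test instance that listens on
# different host ports than the production instance (8080/8443).
# All arguments are illustrative assumptions, not values from the thread.
test_instance_cmd() {
    local image="$1"              # e.g. an openhab/openhab tag of your choice
    local data_dir="$2"           # host folder holding a COPY of conf/userdata
    local http_port="${3:-8081}"  # host port mapped to the container's 8080
    local https_port="${4:-8444}" # host port mapped to the container's 8443
    printf 'docker run -d --name openhab-test -p %s:8080 -p %s:8443 -v %s/conf:/openhab/conf -v %s/userdata:/openhab/userdata %s\n' \
        "$http_port" "$https_port" "$data_dir" "$data_dir" "$image"
}
```

Calling `test_instance_cmd <image> /opt/openhab-test` prints the command with the default ports. Anything that owns a shared resource such as a Z-Wave dongle should stay disabled in the copy used by the test instance.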

If down time is really that important to you, then you need at least two completely separate yet fully functional instances of OH which can become quite a challenge. So that becomes a place where you should probably stick with the release versions and only rarely update. You can’t be a tester and have a perfectly stable OH instance with limited down time at the same time. The one precludes the other.

There is one prominent OH user (who has spoken at several of the openHAB days in Germany) whose mantra is “never ever update.” If you don’t want downtime, get it to work and never touch it again.

Another way to mitigate this is to avoid putting too much into the sole control of OH itself. This is why I constantly advocate for putting/keeping as much of the smarts in the end devices as possible. Even if OH goes down, I can operate the light switches manually, turn on/off the heat (and the heat will still automatically maintain the last commanded target temp), and so on. Down time isn’t such a big deal when the house itself is still operable even when OH goes offline. It just might be a little less convenient.

Well, that problem isn’t exactly something new. It’s been known since OH 2.0 two years ago that the startup order is undefined. The problem here occurs because the serial bundle doesn’t start before some other bundles that depend upon it. There is no further testing or debug that needs to be done for this. It’s a long term known problem so all you have to do as a user is watch the issues for a fix.

There is a new build of everything every day. The snapshot includes all the latest merged changes for both the core and the bindings.

It has “home” in the name. I’ve not seen any maintainer advocate for its use in a commercial or industrial setting. As is probably clear, I’d advocate against that.

There is almost no such thing in software. Even mission critical systems upon which human lives depend have instabilities and need fixes. An open source project rarely has the manpower or the expertise to achieve anything like the rigor and quality control of those systems. If you think development of OH is slow, consider this. A ten line function written for something like a NASA rocket launch control will require about six months from the time it is written until it gets merged. In those six months it will have been independently reviewed by at least a dozen different people including peers, testers, integration and test engineers, quality control, and an engineering review board. It will have undergone hundreds of hours of dedicated testing on specially built hardware and test suites, which usually took longer to build and involve orders of magnitude more code and engineering effort than the functional code does.

In short, what commercial program, let alone an open source project, is going to dedicate 3/4 of all of its effort to testing and review of the production code instead of writing said production code?

I’ve worked in the defense industry my entire career. I’ve seen what it takes to write stable code like this. It could never happen in an open source project.

Then there will never be a release of OH, ever.

In a commercial project you can enforce something like this. In an open source project like OH, we depend on volunteer efforts. Volunteers are going to only donate their time to work on what they want to work on. Consequently some known bugs never get fixed and some important features never get implemented because no one volunteers to fix them. If we say “no, you can’t work on that new binding or new feature until someone fixes Issue X” then we will shed volunteers like an oak tree in the autumn.

A successful open source project without commercial backing is as much about managing the community and keeping the volunteers motivated to continue to contribute as it is about the code. Consequently, there will never be a time where there are no known bugs. There will never be a time where the docs are complete, clear, and accurate. This is because developers will always be more passionate and motivated to work on something new than on fixing bugs in someone else’s (or sometimes even their own) code.

Even if someone is willing to work on it, sometimes the developers cannot reproduce the reported behavior. If they can’t reproduce it they can’t fix it.

And there are always tons of latent bugs lurking in the software that are not known. And this is why I say that all users of OH, or any software, open source or commercial, are all testers.

If you want stable and don’t want to be a “tester” you need to go out and purchase an industrial control software with a service contract. If you use a commercial or open source home automation software, you will never get to 99% up time. It’s simply never going to happen.


The shit comes from java and linux…would be nice to have a real platform supported by real computer pros.

pros = paid for. What did you have in mind?

I’ve been a loyal OH user for many years. My skills are totally homegrown, IT is not my profession but is definitely my obsessive hobby. I have a core automation/alarm panel (Omnipro) that I try to use for any heavy lifting/must always work type stuff and then use OH to bridge enhanced functionality my board won’t support, like controlling my mix-mash of multimedia devices, z-wave (thank you @chris) and provide push notifications to our cell phones for various happenings around my house.

I will say that I have only had a handful of “melt-down” moments using snapshots and I just had one happen July 3rd when I updated to snapshot #1630. Suddenly, at 10 pm, during what I routinely do many times during the week, my OH system decided to crap the bad news out via the logs. Something wasn’t working and there was no real (obvious) clue as to what was now broken. I backed up my conf folder (only :frowning: ) and proceeded to install, reinstall, purge, and revert to previous snapshots, all without success. This lasted until well after midnight and (after many bad OH dreams) I awoke to continue working to get my system up and running again.

Under normal circumstances, I know the routine and what to save, however, time pressures mounted because my wife and I were hosting the neighborhood 4th of July party with 50+ guests expected to show midday the next day. I wanted my remotely located HDMI switch, Onkyo receiver, and 6 zone audio amp to work with my Harmony remote and LG televisions, as well as to show off my fancy colored Milight LEDs (thank you @matt1) and have my guests be in awe of my fancy house.

I did eventually get up and running about 30 minutes before my guests arrived by reverting back to 2.5M1 but (because I didn’t do a proper back up) I had to rebuild all my channels and Paper configurations manually. The big loss for me, however, was I neglected to back up my Habpanel.config which, for some frustratingly unknown reason, resides in the bilges of the OH user settings and not the conf folder with everything else. I’m willing to do all the other set up for a fresh install but that one file…it’s always that one file…

I share this with all of you because I do think it would be helpful to have a snapshot status page on the forum for the developers to report (known) breaking changes and let those of us who don’t live in the IT world by trade have some obvious place to go look for issues before we make a plunge. I do scour Github daily to try to see what changed, but breaking changes are at best cryptic in a hidden Github issue or pull request and not easy to determine until actually making the plunge to upgrade.

I think it would also be helpful if someone could share a layman’s version of how to set up a Git repository and/or some other backup strategies that are tested and proven. Obviously, backing up via the OH CLI would have been the smart thing to do, but I’d be interested in figuring out a strategy for having a test and production environment set up and how others manage devices like z-wave sticks with the two environments.

End of the day, I am super appreciative of the OH developers and community as I am constantly learning (and plagiarizing) something off this forum (thank you @rlkoshak, et al). I will say, however, I think we are missing basic information for snapshots. This is the first place I go when I have problems like this and this time I appeared to only have two other users reporting an issue with 1630 but no real idea as to what changed in 1630 that I should have been cautious of before upgrading.

It’s funny how the most used/in demand programming language (according to Tiobe) is somehow the problem. I mean no one has ever had problems with Django (written in Python) or Ruby on Rails, or C# programs, right? I guess the tens of thousands of companies and developers in the world using Java are wrong.

Linux runs on more computers than any other OS (if you count Android), and it runs more servers than any other OS (even on Azure more VMs are running Linux than Windows Server, and OH is a server application). But you are aware you can run OH on Windows or OSX too? Hell, you can even run it on FreeBSD and I bet with a little work you could make it run on any Unix variant of your choice that runs Java 8.

Yet despite these facts they are somehow the root of every OH problem.

I’m not saying that OH is perfect and doesn’t have problems. But blaming Java and Linux is lazy fan-boyism.

You should be using openhab-cli backup and restore for this. It will grab everything from both conf and userdata to preserve all of the stuff that you lost.
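
A minimal sketch of taking a timestamped backup before each upgrade. The `backup` and `restore` subcommands are the ones named above; the target folder /var/backups/openhab2, the file name pattern and both function names are assumptions made for illustration.

```shell
#!/bin/bash
# Build a timestamped archive path. The default folder is a hypothetical
# location, not one that openHAB creates for you.
backup_path() {
    local dir="${1:-/var/backups/openhab2}"
    printf '%s/openhab2-backup-%s.zip\n' "$dir" "$(date +%Y%m%d-%H%M%S)"
}

# Take a backup of conf and userdata before touching the installation.
backup_before_upgrade() {
    local target
    target="$(backup_path "$1")"
    mkdir -p "$(dirname "$target")"
    sudo openhab-cli backup "$target" && echo "Backup written to $target"
}

# To roll back later (stop openHAB first):
#   sudo openhab-cli restore /path/to/the/archive.zip
```

Keeping one archive per upgrade makes it possible to go back more than one step, and to a known file name rather than whatever happens to be newest.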

The problem is there is a new snapshot every day. It could be a week or more before a problem is detected. Even if there were such a list it would never be complete or accurate enough to be useful. The best I can offer is if you want to run the snapshots you are doing so at risk. If you can’t afford the down time and don’t want to be an active tester, you should not be running the snapshots. Or if you absolutely must run a snapshot for something, don’t upgrade it again until the next milestone or full release.

Yes, the snapshots are usually really stable so we all get lulled into a false sense of security. But the snapshots are essentially untested code and come with all the risks of running untested code.

I wrote a tutorial on how I set my system up a few years ago and darned if I can’t find it now.

At a high level:

  • I run gogs as my server. If you don’t care about a nice web interface you can set up almost any old machine as a git server. https://git-scm.com/book/en/v1/Git-on-the-Server GitLab is another popular git server package. I like gogs because it is a bit more light weight.
  • I run OH in Docker so it’s easy to have all my configs ($OH_CONF and $OH_USERDATA) in one subdirectory. You can do the same and use symbolic links to mount them from, for example, /opt/openhab2/conf to /etc/openhab2 and /opt/openhab2/userdata to /var/lib/openhab2. I put them in the same directory so I only need one git repo to store both. You could set up two repos if you prefer.
  • This is my .gitignore which prevents git from checking in stuff that doesn’t belong like cache and temp.
userdata/*
!userdata/etc
!userdata/jsondb
userdata/jsondb/backup
userdata/backup

This ignores everything in userdata except for etc and jsondb but the jsondb/backup folder is ignored. userdata/backup is a folder you will only have if you are running in Docker. When you upgrade or downgrade a container, it takes a backup of your userdata folder so you can easily revert should something go wrong.

  • As I make changes I check them in and push them to my gogs server. When I want to set up a test, I can pull from gogs and have the most recent changes. Then I start up a new container with the new image and am ready to go.
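
The steps above can be sketched roughly as follows, assuming conf and userdata live side by side under one directory as described. The function name, the commit identity and the remote URL in the comment are placeholders; the .gitignore content is the one listed above.

```shell
#!/bin/bash
# Turn a directory that contains conf/ and userdata/ into a git repository,
# ignoring cache, tmp and backup folders via the .gitignore from the post.
init_config_repo() {
    local root="$1"
    cat > "$root/.gitignore" <<'EOF'
userdata/*
!userdata/etc
!userdata/jsondb
userdata/jsondb/backup
userdata/backup
EOF
    git -C "$root" init -q
    git -C "$root" add -A
    # The identity below is a placeholder so the commit works on a fresh box.
    git -C "$root" -c user.name="openhab-admin" -c user.email="admin@example.invalid" \
        commit -q -m "Initial openHAB config snapshot"
}

# Afterwards, point the repo at your own server (URL is hypothetical):
#   git -C /opt/openhab2 remote add origin <url-of-your-gogs-repo>
#   git -C /opt/openhab2 push -u origin HEAD
```

On a test machine, cloning that repository gives a copy of the current configuration to start a new container against.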

Hopefully this will be enough to find some more tutorials to get started. If not, over the next few weeks I may write a new tutorial since I can’t find the old one.

It sounds like a great idea but remember a snapshot only lasts a day before it’s replaced with a new one the next day. That’s not a whole lot of time for testing and reporting. Literally anything that gets merged into the baseline before the snapshot build starts gets included in the snapshot. You are likely running code that is less than 24 hours old. This is why I just don’t think that the extra effort to maintain a separate list of known issues about snapshots is worth the effort. At best it would only be a partial list. At worst it gives users a false sense of security that the snapshot does not contain killer bugs.

Better to assume that any snapshot can kill your system, and don’t try to upgrade snapshots unless you have time to roll back or troubleshoot some problems. And in order to roll back you must have a good backup and a working backup and restore procedure.

Ok folks, I’m on vacation and twice my RPi 3B+ has set our alarm off. I’m running the 2.4 stable release and my rules report via e-mail which sensor was tripped. I need to look to see which rule sends the generic “home alarm triggered” e-mail alert, but none of the motion or door sensors were triggered. The system would have specified the exact sensor that was triggered.
How do you capture an event and look back at it a few days later? How long are troubleshooting logs retained?

Any thoughts will be appreciated!

Thanks in advance, Mike

paid for something like it (openhab) that does not use java and linux

Gaston

You’d be much better off starting a new thread for your specific problem.

openhab.log and events.log are the current versions.
In the same folder you will find the dated archived versions.

I don’t know. Probably depends how much capacity your system has. I get many months on mine, and clear out sometimes, but it’s hard disc based.
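
A small sketch for looking back through the current and archived logs for an event. It assumes the logs live in /var/log/openhab2 (the usual location for a package install; adjust for your setup) and that the archived files are plain text whose names start with events.log or openhab.log. The function name and the Item name in the usage line are made up.

```shell
#!/bin/bash
# Search current and archived openHAB logs for a pattern.
# Prints matching lines prefixed with the file they came from.
search_oh_logs() {
    local pattern="$1"
    local log_dir="${2:-/var/log/openhab2}"   # assumed default location
    grep -H -- "$pattern" "$log_dir"/events.log* "$log_dir"/openhab.log* 2>/dev/null
}

# Example (hypothetical Item name):
#   search_oh_logs "FrontDoor_Sensor changed"
```

Searching events.log for the Item that drives the alarm shows when it changed state, and openhab.log around the same timestamp shows what the rules did.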

For a very long period of time the earth was flat and everybody believed that, even if it did not make any sense (everything in the sky was round… except the earth that was flat, no questions asked).

The only value I see in Linux is that it’s free… it still calls a display a TTY device and mounts devices like we used to mount old tape drives… As for Java, it is not a language, it is a piece of symbolic jargon that does not resemble any language a human being would speak. It is a complicated derivative of the stupid Fortran of the 60’s, produced by university teachers who did not know what to do in their free time. You can like Java, fine, but I do not… you can like Linux, I don’t.

Gaston
p.s sorry for my english, I speak french


Consider yourself lucky.

I struggle to get an extra block of cheese in the groceries.

**** Update ****

Apparently that statement isn’t true.

The official line is,

“I can have as much cheese as is available in our fridge”


I agree with this mantra as opposed to the opposite end of the scale that occurs too often.

Some users seem to have the mantra that ‘newer is always better’ and will update the kernel, Java, countless other processes, the openHAB snapshot etc… Then after changing 20 different things all at the same time they will blame openHAB for being buggy. A few months back there was a memory leak in Mosquitto that was crashing my setup; it was fixed in 2 days with a new Mosquitto version, but it still took my openHAB down.


Except even the early Greeks and Chinese knew it wasn’t.

You don’t have to like it (language or operating system), but you can’t ignore the fact that both are exceptionally successful. If either or both were completely useless, as you describe, then this could not be the case. Hundreds of thousands of developers and administrators would not stand for it. To believe that either is the root cause of OH problems is to believe in conspiracy theories akin to the lizard people, for how else could either achieve and retain its continued success?

No one says you have to like Linux, though I assume you are not using an Android phone (Android is Linux) or using any commercial grade WiFi router (they all run Linux), or have a smart TV (yep, Linux there too), or use any of the major websites (Google, Facebook, Netflix, Amazon, Reddit, you name it, all running Linux). Fly in a commercial airplane? Drive a car with a sat nav? You’ve used Linux.

Just imagine what could be accomplished in this world if only we had a modern operating system that used something less old than TTY devices. I mean come on, Windows uses the ultra modern COM ports (largely unchanged since MS-DOS in the early 1980s). (Sarcasm, in case you missed it.) Oh, you like OSX? Guess what? It has TTYs too because it’s based on BSD, a Unix derivative.

I could go through a similar list like the above for Java.

You don’t have to like it. Personally, I don’t like coding in Java myself. But don’t confuse your dislike for something to mean that there is no value or worth in it. That it is not a good tool for the job. Or that everything would be unicorns and gum drops if only the world were using something more modern like ??? Windows? Python?

I do like Linux because I can (and do) actually understand how it works from the kernel space through to user space and beyond. Half of that stuff is proprietary in Windows and honestly OSX induces rage in me every time I have to use Finder. But hey, just because I hate it doesn’t mean I think no one should use it or that OSX is the root of all problems on a Mac.

I think the right place is a happy medium. There will always be reasons one needs to upgrade at some point. Bugs are fixed, severe security vulnerabilities are discovered and fixed, or a new feature is required to get a system running or add a new capability to a system. Never upgrading ever seems as bad to me as upgrading every day.

There is also the problem that the longer you wait to upgrade, if you fall into the upgrade seldom camp, the more work it is to perform an upgrade when you finally get around to it. Each person needs to find their happy medium.

But overall it really surprises me what expectations people have of openHAB. I’ve seen what it takes to build software like that used for air traffic control, missile launch systems, and flight control. It feels like people are expecting the same sort of quality and rigor to go into this volunteer effort as goes into systems like that. It’s simply an unrealistic expectation.


That’s a bit ouch given Boeing’s recent demonstrations.


Really?

It’s derived from C / C++ and is generally considered a modern programming language. I guess ultimately all languages probably derive back to some language from the '60s - that’s evolution - but I don’t think it’s in any way linked to Fortran.

I did not read all the comments in here, but I just want to give my point of view on openHAB 2 and stability.

I have been working with openhab2.5 testing and openhab2.5-SNAPSHOT, and I have to say it is really stable for me. I’m also using a bunch of bindings and other services (see lists below).

My “production” server is running openhab2.5-testing and has been running for the past 8 days with no issues. But I think it has had over a month of uptime.
My “development” server is running openhab2.5-SNAPSHOT and gets restarted often, but 2 - 3 days of uptime has not been any problem.

I did not like using PaperUI, so I spent some time in the documentation and wrote all my configs in files, only using the UI to check that icons and links looked correct.

It is a really good idea to restart and clear the cache while configuring, as some things can get a bit strange after too many changes.

I made a small script to easily restart openhab2 (restart_openhab.sh).

About Java: I started out using openjdk-11-jre-headless, but I got a lot of errors in the log. After switching to openjdk-8-jre-headless everything was great.

The community is also a great place to get help if you can’t find the answer you are looking for.

Hope some of you can use this, else send me a message and I can try to help if you have any issues with stability :slight_smile:

restart_openhab.sh [soft | hard]:

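#!/bin/bash
# Restart openHAB 2. "hard" also clears the cache and tmp folders.
# Must be run as root (service, chown and the paths below require it).
timeStamp=$(date +"%Y-%m-%d %H:%M:%S")
logFile="/tmp/restart_openhab.log"

# Optional first argument: soft | hard (anything else is treated as Default)
action="${1:-}"

service openhab2 stop
case "${action}" in
        hard | HARD)
                # Remove cached bundles and temp files while openHAB is stopped
                rm -rf /var/lib/openhab2/cache/*
                rm -rf /var/lib/openhab2/tmp/*
                # clean-cache asks for confirmation, so answer "y"
                echo "y" | openhab-cli clean-cache
                sleep 2
        ;;

        soft | SOFT)
                sleep 5
        ;;

        *)
                action="Default"
                sleep 2
        ;;
esac

# Print the message and append it to the log file
logString="${timeStamp} - Restarting Openhab ${action}"
echo "${logString}" | tee -a "${logFile}"

# Make sure the openhab user owns the config files before starting
chown -R openhab:openhab /etc/openhab2/*
service openhab2 start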

System (VM):

  • Ubuntu server 18.04 LTS
  • 4 core
  • 2 GB ram
  • 20 GB SSD

Extra packages:

  • openjdk-8-jdk-headless
  • openjdk-8-jre-headless
  • mosquitto
  • mosquitto-clients
  • mono-complete
  • unzip
  • zip
  • nut-snmp
  • python-nut
  • nut-xml
  • nut-monitor
  • nut-ipmi
  • nut-cgi
  • python-pip
  • python-dev
  • build-essential
  • python-setuptools
  • python3-pip
  • mysql-common
  • php7.2-cli

Bindings:

  • networkupstools1
  • unifi
  • chromecast
  • mqtt
  • openweathermap
  • http1
  • sonyaudio
  • astro
  • mihome
  • systeminfo
  • kodi
  • ihc
  • verisure

Other services:

  • openhabcloud
  • restdocs
  • jdbc-mysql
  • influxdb
  • map
  • javascript
  • jsonpath
  • regex
  • exec
  • googletts
  • pushbullet

Whether or not it’s intended for home use or anything else shouldn’t have anything to do with how stable it’s going to be.
Of course you’re right, when this is an open source project, things are not like commercial ones. But that sounds more like a question of obligations and responsibility, rather than of creating good software.

I know the rules of the open source game… And I accept them as well. But they should not all be excuses for bad and unstable software. (Mind you, I don’t think openHAB is bad. I think it’s great. It has a few issues which I personally would like to see solved or changed. Besides that I really do appreciate everything the developers are doing. And if there was an easier way to support the foundation, I would as well.)

Fortunately, you don’t build smart homes out of cheese :smiley:

Exactly why I have stayed away from Windoze and stick with linux. It is much more stable and constant updates aren’t a thing (or, typically do not require a system reboot when there are updates.) :wink:
