Reliability of openHAB

SteMo · July 21, 2016, 6:08pm

Hi,

I want to open a discussion about the reliability of openHAB and your experiences with it.
I have a lot of items in operation: wall plugs, dimmers, switches, roller shutters, motion sensors and remote controls.
As the controller I use a Raspberry Pi 2.
I do not believe that openHAB itself is unreliable; it must be something else. What happens from time to time is that some rules do not trigger. For a light that’s not really a problem, but for my alarm system it is.
When I check the system it is running, but later on some of the rules fail, and I can’t find a reason. Could the Raspberry Pi be the problem? If so, which system is better? Has anyone had similar problems and solved them?

rlkoshak · July 21, 2016, 7:19pm

When you have inconsistent behavior you must look in the logs to see what actually happened. Your saying “but later on the rules partly fails” makes me think there is an error in your rules themselves, and if so there is likely an error printed to the logs when this happens.

Are you using persistence with restoreOnStartup? If not, a restart of OH or a reload of the rules files could cause a bunch of your Items to become undefined, and if you don’t test for that in your rules it can cause problems.
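
A minimal sketch of such a test inside a rule, assuming openHAB 1.x rule syntax. The Item names Wall_Button and Heater_Switch are hypothetical placeholders, not taken from anyone’s configuration in this thread:

```
// Hypothetical Items: Wall_Button (trigger) and Heater_Switch (a Switch Item).
rule "Turn heater on, tolerating an undefined state"
when
    Item Wall_Button received command ON
then
    // Right after a restart without restoreOnStartup the state is
    // neither ON nor OFF, so check for that before relying on it.
    if (Heater_Switch.state != ON && Heater_Switch.state != OFF) {
        logWarn("heater", "Heater_Switch has no usable state yet: " + Heater_Switch.state)
    } else if (Heater_Switch.state == OFF) {
        sendCommand(Heater_Switch, ON)
    }
end
```

Comparing against ON and OFF explicitly avoids depending on how a particular openHAB version names the undefined state.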

Now I have had some problems with the reliability of my Pis (original gen and gen 3), but it has more to do with them falling off the network than with running programs unreliably. I run OH on an old laptop running Ubuntu and pretty much only have to restart OH when I change openhab.cfg or logback.xml; it usually runs for months between those times.

noemark1 · July 22, 2016, 2:57am

All-

I am using a Raspberry Pi 3 and find openHAB to be quite reliable (at least for my purposes), running version 1.8.3. I have about 10 lights on an Aeotec Zstick 5, an MQTT server hosting several Arduinos with climate sensors, 2 Ecobee thermostats, a Logitech Harmony Hub, etc. I had this configuration running on a Raspberry Pi 2 and found it to sometimes execute the rules more slowly (sometimes taking 20 to 30 seconds to implement a simple rule for adjusting lights on a scene button press), but I have noticed it works much better on the Pi 3. Just my observation…

-Mark

mstormi · July 22, 2016, 7:13am

I wouldn’t question OH in terms of reliability. It used to have issues, and it probably still has as does any complex system.
But as with any complex system, you need to understand the dependencies and identify the root cause before you can tell who’s actually the culprit.
Sometimes it’s the environment (such as partially bad radio coverage resulting in lost messages), but only very rarely do you encounter true OH software bugs.
In versions before 1.7.1, there used to exist a thread concurrency problem. And there’s one bug that I believe still exists in 1.8.3: if you have a trigger “if Time XXX OR Time YYY then ZZZ”, this may lead to malfunction. You may work around that by turning that into two rules with a single “Time” trigger each.
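
The workaround could look roughly like this. It is only a sketch: the cron times and the Item name Scene_Light are made up for illustration, and the rule body is simply repeated in both rules:

```
// Instead of one rule with two Time triggers joined by "or",
// use two rules with a single Time trigger each.
rule "Scene at 07:00"
when
    Time cron "0 0 7 * * ?"
then
    sendCommand(Scene_Light, ON)   // Scene_Light is a hypothetical Item
end

rule "Scene at 19:00"
when
    Time cron "0 0 19 * * ?"
then
    sendCommand(Scene_Light, ON)
end
```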

But the point (sorry for the longish intro) is: almost all of the time, it’s a user programming error. The rule doesn’t execute properly because it is not implemented carefully enough. That is, while it can be syntactically correct (and quite often it isn’t even that but goes unspotted) and seems to work in general, it will fail if specific parameters change (such as item states). Note they can change for a number of non-obvious reasons, e.g. if sensors fail. Quite often it’s also related to data types and their lack of correct conversion.
The way to get around that is to debug rules. Enable all appropriate debugging in logback.xml, and insert debug statements in your rule to output parameter values that might be related. Add a virtual switch to instantly trigger rule execution, search the logs to find variable states at time of rule execution and repeat as needed.
Btw, long delays are often a hint: once you enable rule engine debugging, you suddenly notice Java exceptions (i.e. your rule crashes during execution, resulting in the CPU being busy for seconds, plus various unpredictable results).
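
As a rough illustration of the virtual-switch approach, assuming openHAB 1.x syntax. All Item names here (Debug_Trigger, Heater_Switch, Room_Temperature) are hypothetical; substitute whatever your rule actually reads:

```
// In an .items file: a virtual switch with no binding.
Switch Debug_Trigger "Run heater rule now"

// In a .rules file: the rule under test, triggered on demand.
rule "Heater rule (debug run)"
when
    Item Debug_Trigger received command ON
then
    // Log every value the rule depends on before acting on it.
    logInfo("debug", "Heater_Switch state = " + Heater_Switch.state)
    logInfo("debug", "Room_Temperature state = " + Room_Temperature.state)

    // ... the logic under test goes here ...

    // Reset the virtual switch so it can be pressed again.
    postUpdate(Debug_Trigger, OFF)
end
```

Pressing the switch in the UI then gives a log entry with the relevant states at exactly the moment the rule ran.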

a0174741 · July 22, 2016, 3:49pm

I run openHAB 1.8.3 on Windows 10 and have found it to be very reliable, with the exception of some HTTP or NTP communication. I have the following in use: InsteonPLM (USB, managing 62 devices), Logitech Harmony, DSCAlarm, weather to Weather Underground, Chamberlain MyQ, and SamsungTV. About once every week or two the IP connection to the DSCAlarm EnvisaLink 3 security system breaks or loses connectivity. But the neat thing is, I can detect the connection loss in a rule and run a command file that reboots the EnvisaLink interface and recovers the connection. Overall it is very reliable, but since I am using openHAB to tie together so many other systems, it can be hard to determine whether the failures are in openHAB, the configuration, the rules or the interfaced-to systems.

As was noted before, reviewing the logs can be very helpful.
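
The poster did not share the rule, so the following is only a sketch of the idea. The Item Envisalink_Online and the script path are assumptions, and how the connection status reaches an Item depends on the binding in use:

```
// Hypothetical Item that goes OFF when the link to the EnvisaLink drops.
rule "Recover EnvisaLink connection"
when
    Item Envisalink_Online changed to OFF
then
    logWarn("dsc", "EnvisaLink connection lost, running recovery script")
    // Hypothetical path to the command file that reboots the interface.
    executeCommandLine("C:/openhab/scripts/reboot-envisalink.bat")
end
```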

notanatheist · July 22, 2016, 10:48pm

Running OH on a Pi is more akin to ‘hobby’ status. With the assorted issues plaguing SD cards, Ethernet over USB, the limited amount of RAM and the low-power CPU, it doesn’t take much to choke a Pi. That said, I like my Pis for other jobs but not for my home automation controller. I don’t have as vast a setup (about a dozen Insteon devices, Wemo, MQTT, and a few other things), but running on an Intel NUC or other small form factor PC, where you can toss in a 4GB stick (or more) of RAM and a better-performing SSD, will make a world of difference compared to a Pi. I know I could probably run my system reliably for years without reboots if I didn’t enjoy constantly tweaking it. Intel, Zotac, Asus, and others all have x86 boxes with multi-core processors that you can fully assemble for ~$200 USD.

Max_G · July 23, 2016, 3:39am

Well…

I am running stats on my Raspberries; the one with OH on it, plus MQTT, SQUID, dnsmasq, NTP, rsyslog, SAMBA, Webmin, PHP, MySql, etc., hardly ever does any work over 1%… yes, I have an SSD connected to it… hence, no performance bottleneck. They are Pi 2s and have a dedicated Ethernet port (USB or not), working with enough throughput to run OH on it.
I also run OH on Windows… I have yet to see / experience that the Raspberry platform is failing me (or the OH system).

mstormi · July 23, 2016, 9:13am

Pis aren’t the most reliable systems in terms of hardware reliability, but with respect to the thread topic, Pis are fine.
They provide enough processing power for home automation. If you encounter overload, delays, timeouts or similar, it’s not the Pi but either bad programming (rule loops and crashes) or environment (such as bad radio coverage).
The effect can be observed sooner and more dramatically on low-performance CPUs like that of the Pi (compared to a NUC), but lack of CPU power is a symptom in these cases and not their cause.
If you’re interested in that, there’s a thread on ‘best hardware’.
Btw, I just got an Odroid C2 for almost the price of a Pi. They have native GigE and eMMC to overcome Pi limitations.

Max_G · July 24, 2016, 5:47am

What evidence makes you say that?

Sounds very subjective to me… what are the Pi limitations when it comes to the question here: reliability of OH?

If you need a faster network; fine… I wonder how you would saturate 100MBit/sec. in OH…

In short: I can’t see how you diminish the reliability of a piece of hardware by picking on its specs.

→ Just a thought on generalisations.

mstormi · July 24, 2016, 8:26am

I was actually only referring to SD card crashes. Odroid has eMMC. Ok, I see my wording was easy to misunderstand.

EDIT: in the ‘best hardware’ thread, some people mention they lose Ethernet connectivity on Pis.

I also had this in mind when I initially wrote down my statement, but couldn’t recall where I had read about it, so I left it out.

Here I was referring to limitations in network and storage speed in general. Depending on use case, they might apply to you or they might not. If you use such a box in parallel to OH for a media player like some do, it can impact OH performance, and that’s where these specs are of relevance.
I don’t wanna blame your Pis, I’m using them myself.

Max_G · July 24, 2016, 9:21am

My point exactly; there is no limitation on running OH, and no limitation on the reliability of the Pi… however, whack some other resource-intensive stuff on it and you hit limits, but not in the context of the original post.

… and it’s not my Pi… as far as I am concerned “Pi” can be replaced with any other technology; if the technology is overburdened, it is not a limitation of the system but an overload.
I know: it seems like splitting hairs…

So I would summarise: the Pi is an adequate platform for running OH.

vespaman · July 24, 2016, 9:47am

I really struggle to understand the bad reputation the RPI has with regard to the uSD. We have been testing quality uSD cards (such as SanDisk etc.) at work and had no success in destroying them (constantly writing to them with a test program) as long as there is space available for wear leveling.
Cheap uSD cards can be destroyed within days or hours with the same test. So in my mind, if a uSD, or any kind of disk media, fails, blame the media manufacturer.

I have had RPI running for years with SanDisk here at home. Of course YMMV as usual.

This is not a response to anyone in this thread, just my own observation. And I may be totally wrong of course.

mstormi · July 24, 2016, 10:56am

Well, that’s not a yes-or-no thing but a matter of probability. Quality cards lower the probability, but they don’t remove the risk.
Probably it would just have taken more testing to destroy them.
Given the fair number of people reporting this issue even just in this forum, it would be bad advice to ignore it.
Starting with eMMC, a USB stick or disk, or another ‘safer’ storage medium is a good idea.

vespaman · July 24, 2016, 1:19pm

It was last summer, so my memory is a little hazy, and I don’t have the figures at hand here since I’m on vacation, but the test wrote to the uSD continuously (24/7) for weeks; my guess is at least three weeks, most likely a month, before we ended it. The cheaper ones did not last a weekend in the same test. (And the cheaper ones were still well-known brands.)
Nothing ever removes the risk, simple fact. Everything can, and eventually will, break. The problem with consumer uSD is that only a few brands make their own memory cells, so the quality changes between batches, and very few specify wear. Industrial uSD cards are tighter and better specified, but terribly expensive.

We did this test, since we rely on them in our business, but I think this is also common knowledge with photographers.

Edit:
And in my opinion, USB is just as bad, if not worse. The only thing that sets eMMC apart is that it is so far manufactured by good brands. USB may be, or may not.

Max_G · July 24, 2016, 11:24pm

It is a myth: a USB stick is no different from SD!

mstormi · July 25, 2016, 10:59am

This discussion is leading nowhere because you never know what’s inside.
All I meant to say is: improve your chances by starting with a safer medium. And given the number of bad experiences people have had, SD cards are probably the most unreliable among those alternatives, most likely because there’s more competition on pricing than for the others. So starting with one of those others will still be a good idea. But it’s your choice.

Anyway, this is completely getting off-topic. The thread is on software reliability.

marcolino7 · July 25, 2016, 3:59pm

Hi,
I’m running openHAB 1.6.2 on an ESXi virtual machine with Debian 7. It has 2 processors and 3GB of RAM.
It has been running for 6 months without a reboot or any issue. I have a lot of bindings: HTTP, Souliss, Netatmo, integration with Zoneminder and some other custom integrations. I also have Zabbix, which pulls data from the openHAB REST API to monitor items and handle triggers.
I’m very satisfied with the product and its stability.

Marco

george.erhan · July 27, 2016, 1:41pm

Hi,

After 2 years of “playing” around with OH, I can state that the only unreliable things in OH are the mistakes made by the user in configurations, items, or rules!
My experience with OH covers about 6 setups, ranging from a Raspberry Pi to powerful VMs (from a single core to 8 cores), with item counts ranging from a couple of hundred to tens of thousands and multiple bindings (TCP, KNX, NTP, WEATHER, network health, exec, etc.).
I never had a situation where OH had to be restarted because of unreliability in the OH core or add-ons!
Needless to say, one of the most important things is to match the hardware to the needs in order to achieve performance and reliability!

George

SteMo · July 27, 2016, 5:33pm

Hi,

I didn’t expect to start such a huge thread. Thanks for all the answers.

I also think that the mistake is on my side, not in openHAB. But I can’t find it.
As proposed, I have now revised my persistence configuration so that it stores only the items I really need, using this strategy:

strategy = everyChange, restoreOnStartup

I thought that this would solve the problems, but today I had a problem again. I turned on my heater via a wall switch, but the rule didn’t start, and after 30 minutes the heater didn’t switch off. What happened was not a problem with the rule but with the item. The heater, which is turned on and off via a relay switch, was definitely ON, but in the UI this switch was displayed as OFF. That’s the reason why the rule didn’t start.
But how can it be that the item sometimes works and sometimes doesn’t?
Has any of you seen similar behavior?

This is the item, a Fibaro 1 x 2.5 kW relay switch:

Switch    heizung_1    { zwave="24:command=switch_binary" }
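
The rule itself was not posted, so the following is only a guess at its general shape, assuming openHAB 1.x syntax: a timer that switches heizung_1 off 30 minutes after openHAB sees it turn ON. The rule name and the timer variable are made up for illustration:

```
// Needed on openHAB 1.x so that the Timer type resolves.
import org.openhab.model.script.actions.Timer

var Timer heaterTimer = null

rule "Heater off after 30 minutes"
when
    Item heizung_1 received update ON
then
    // Restart the countdown if the heater is switched on again.
    if (heaterTimer != null) heaterTimer.cancel()
    heaterTimer = createTimer(now.plusMinutes(30)) [|
        sendCommand(heizung_1, OFF)
        heaterTimer = null
    ]
end
```

A rule like this can only fire when openHAB actually receives the ON report from the relay. If that report never arrives, the Item stays OFF in the UI and the timer is never started, which matches the behavior described above.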

rlkoshak · July 27, 2016, 5:50pm

There can be any number of causes. Since you are using zwave my first guesses would be:

a poor mesh network with one or more single points of failure
one or more dead nodes

There isn’t always a lot of feedback or confirmation when sending commands to remote devices and in my experience Zwave doesn’t guarantee delivery. So what likely happened was openHAB sent the OFF command and that command failed to reach the device, but openHAB didn’t know that so its internal state indicated the relay was OFF.

Put your zwave binding into DEBUG or TRACE mode (see the wiki page) and look for errors. Also, look at your network in Habmin and make sure there are at least two paths through mains powered zwave devices between all of your devices and the controller. If there is a choke point (i.e. only one mains powered device can see the controller) that device might be getting overwhelmed. NOTE: Battery powered devices will not relay messages so they do nothing to improve the mesh network.

I had this problem in my network. There was only one outlet that was a neighbor to the controller and occasionally zwave messages would get lost and devices marked as dead when they really were fine. I added a relay giving everything at least two paths to the controller and my zwave network has been rock solid ever since.