Restart Openhab regularly

borderbridge · November 19, 2022, 8:14pm

Hi There,
I dont know wether this is the right place to post this, tell me if there is a better section of the forum to post a wueation like this.

I have more of a general question:

I am wondering right now if it is a good idea to restart openhab or the whole rasperry regularly (for example: every night at 3:00)

Is this generally a good or a bad idea?

Is there any recommended best practice?

Do you restart openhab or machine running oh automatically?

What are Pro’s and Cons in your opintion?

Greetings
Manuel

Pedro_Liberal · November 19, 2022, 8:25pm

Olá Manuel, tudo bom?

I never restart the machine running openHAB.
I restart for updates, and that is about it.

Now, it’s true that you may eventually want to consider a safe shutdown procedure and an uninterruptible power supply to prevent forced shutdown when the lights go out.

Other than that… keep it up

rpwong · November 19, 2022, 8:52pm

It’s a bad idea. If you’re running openHABian with ZRAM, then your system is recording activity (e.g., item updates, log entries) to RAM to prevent wear on your SD card. Every time you shut down the system, those activities are saved to the SD card so that your system can restore them after startup. Restarting every 24 hours won’t immediately kill your SD card, but it will absolutely reduce its lifespan.

Obviously this matters less if you’re using an SSD or HDD instead of an SD card, but it’s still not necessary. OH is not a demanding application, and on a Raspberry Pi it can run for months on end without needing to be restarted. Some people have gone more than a year, but I don’t encourage that as it means they aren’t updating their systems regularly.

If you’re having problems with your system that are requiring regular reboots, then we need to figure out what’s wrong with it (which could be the SD card).

Some people suggest that the SD card should be changed annually, but with ZRAM I don’t think that’s necessary any more. However, I do recommend backing up regularly so that you’re always prepared if your system becomes unrecoverable. Even if you’re using an SSD/HDD, it’s always worth having a recent backup.

borderbridge · November 19, 2022, 9:15pm

Thanks for your input. @rpwong @Pedro_Liberal

yes i use openhabian with ZRAM.
I have a Backup Raspbery with SD card ready.
I dont have the need for regular restarts at the moment. (Although twice in this year the loggin in fontail
stopped and after restart it worked again)
I have an ups and my raspberry should shutdown at powerloss (Although it does not boot again automatically if power comes back, but that is an other topic, and in the last 5 years i had 1 power outage i can remember…)

But back to my question:
Part of the background of my question is: There are times in my live where i am not at home for alonger “timespan” but the rest of the family is.

I still want openhab to run without supervision for lets say weeks, months, not neccesarry but better even years… Maybe someday i die one day(will happen eventually to all of us) would be nice if things keep working for a while…

So i thought it might be a good idea to restart it regularly. I thought " If something goes wrong it would maybe fix itself + my persistence is written to sd card → less dataloss if something catastropic happened")

So from your posts i read that it is not neccesary to restart it refgularly might even hurt…

rpwong · November 19, 2022, 10:42pm

You’ve done everything you should to keep it running properly. If you have 99% uptime, there’s no reason to expect that the 1% will strike when you’re not there (particularly since you have reliable power and a UPS). It’s a possibility, but it’s not a likely possibility.

I get what you’re saying, but what happens when it eventually does fail? Or if a security issue is discovered that compromises your entire network? Someone needs to know what to do when you’re not there.

I suggest training your family members how to properly restart your OH server if it becomes unresponsive (ideally with instructions that you leave next to the RPi), so that they’re prepared for your short-term absences. If they do it a few times with you, they’ll be much more comfortable doing it when you’re not around.

In the event that you do pass away or become incapacitated, then your family has two options:

Learn how openHAB works.
Replace openHAB with something else (which could mean simply removing it entirely).

You might be able to buy them some time, but one of these things will eventually happen. And when it does, they’ll figure out.

This happened to me recently. It was during a period when I was running DEBUG on a binding and generating a lot of log entries, so I suspect that ZRAM maxed out and couldn’t dump them to a file.

If you’re concerned that it might happen again, then restarting infrequently might be worthwhile (just not every day). I clone my SD card every 3-4 months (sometimes corresponding with OH updates), which takes care of that. If you want to automate the process, you could set up a cron rule to do it.

You can also add filters to your log4j2.xml file to reduce the number of entries.

pleedell · November 20, 2022, 12:57am

Thanks for posting this…

I’ve had an issue since openhabian 3.3 with ram usage and thread count increasing to failure over the course of about a week or so. After working with the java versions, removing and re-adding add-ons, and disabling and re-enabling rules I’ve had no definitive success in tracking down the cause. So I’ve implemented the reboot rule in the OH3 gui to trigger when threadcount exceeds a threshold AND time is during a set window overnight to avoid any noticeable interruptions. We’ll log and see just how often it fires

matt1 · November 20, 2022, 5:42am

Yes it will technically reduce its lifespan if you want to get picky, but not by any amount of time that you need to worry about. There is wear levelling of some type in all decent branded cards and the amount written will be a small percentage of their total space.

The bigger issue is data corruption IF you loose power in the middle of the flash card doing the writes. If you use a UPS to stop power outages, I don’t see a big issue doing a daily restart.

I generally will only reboot once a month after doing the milestone updates, currently my system has been running for 28 days uptime without a reboot.

There are differences in a cheap computer like a raspberry pi and an expensive server. Look at what ECC ram is. There are other reasons why doing a reboot every now and then is a good idea, how often you do it is up to you.

Ram usage in Linux always goes up and is normal as it keeps cache and does not flush old stuff out unless it is needed. I doubt it is a ram issue.

I believe someone posted that the thread count increasing and never going back down happens if your using Zulu java. The recommended is now OpenJDK 11 and not zulu. Perhaps give that a go and see if it solves the thread count never going down.

Currently the docs are saying to use Zulu which I believe is now outdated advice.
Installation Overview | openHAB

Lastly search this forum for the HEAP and how to watch it with the system info binding, this is the memory that java uses. This is what my heap looks like, if yours is always increasing, then a reboot will be needed unless you find and fix the cause. Please post in a thread about the heap and dont hijack this one if it turns out to be your cause.

rpwong · November 20, 2022, 6:57am

Good point. So the corollary to it would be, “get a decent, branded SD card that has wear-levelling”. Without that, it’ll write over and over to the same spot.

Saying that, I still don’t see a good reason to reboot daily, or even weekly. If an OH server can’t run for longer than a month without crashing, something’s wrong and needs to be fixed.

matt1 · November 20, 2022, 7:47am

Potentially, but there are different types of spreading the wear and tear over the flash and good luck getting an answer as to what each card is doing internally. Some cards may actively make sure all blocks are used evenly, whilst other cards may simply write to a random free space and not track how often it is used, just using a random location will help increase lifespan and even cheap cards are probably doing this.

I don’t see a good reason to reboot daily either, nor do I see a reason why it should not be done. You will get glitches/holes in your persistence/history data when you’re restarting for a reason why you would not want to. However, there will be an advantage to doing a restart at some point as computers are not perfect and do make mistakes and the errors can accumulate. This is why big companies spend lots of $ on decent hardware that can detect and correct errors in realtime. Sorry to say it but a $50 computer is not high-quality hardware, but it is excellent value. How often is a good time to restart? I have no idea as I am not an expert, but I think that you should do regular security updates and doing a reboot after completing updates is going to be often enough.

As a side story, I did go away for 2+ weeks recently and openHAB worked great, however my modem had a fit and stopped working on the 2nd day I was away, so I could not remote in. Power cycled the modem when I got home and all was good again without restarting openHAB… So many devices have to work flawlessly to keep a smart home running.

mstormi · November 20, 2022, 10:02am

Any reboot means service interruption. Any reboot has the potential that your system fails to return to a proper working state. There’s gazillions of things that can go wrong on a restart and the least thing you will want is that it happens when you’re not there on standby to fix problems right when they show up.

Proactive restarting is 80’s or at best 90’s thinking and a real No-Go in modern system design.
Companies to operate internet services would never ever think of proactively (needless) rebooting their servers. There is a lot of wisdom in the saying “never touch a running system”.

Availability isn’t about HW reliability any more. It’s all software design. If services fail today it’s software 99+% of the time. Yes HW can break so you need to prepare and have a spare ready, but companies don’t use $5k servers because they’re more reliable than a $50 SBC.
The only reason they pay a premium for HW redundancy like dual power supplies is that the server is located in some remote datacenter and that it’s more expensive to repair it even if that’s only needed once in some years.

openHAB is designed to run 24/7 and openHABian is designed for system availability.
The SD mirroring feature is built the way it is because it even allows you to talk some layman like a family member or neighbour of your summer cottage through a replacement process from remote.

You cannot because no SD vendor tells and even changes without notice. It’s an irrelevant side branch leading away from the main points anyway. For the whole story see Corrupt FileSystems every 2-3 month? - Setup, Configuration and Use / Beginners - openHAB Community

pleedell · November 20, 2022, 2:39pm

Thanks. I’ve checked the java heap allocation as well in the karaf console often and it is always well below max so agreed, likely not a heap issue (although I’ve added this to my logging, good tip). If ram isn’t the issue and heap isn’t the issue then the only constantly increasing metric I have to correlate with weekly failures is threadcount. But I don’t think it’s an OH issue based on heap metrics. I’d tend to point a finger at openhabian but I’m sure it would be far more frequently reported, and it isn’t. At any rate I’ll continue to monitor it and see if I can come up with anything else.

Would you happen to have a trace of your threadcount over time that you could share?
Cheers.

rlkoshak · November 20, 2022, 6:09pm

I just want to throw in a perspective that hasn’t been addressed yet. Things have a greater chance of breaking when undergoing a change. A reboot can result in a pretty significant number of changes so the likelihood of something breaking increases when you reboot.

From that perspective, preemptively rebooting the machine increases the likelihood of something going wrong.

That’s all inline with what has already been discussed.

This is something that I do not think gets enough attention in home automation in general which I’ve been spending a good deal of time considering. My SO has flat out stated “I don’t know what I’m going to do if you are gone.” and the home automation part is the least part of it. She is not technical and has no interest in the routine maintenance required to keep all this stuff going so, if I’m ever incapacitated and cannot maintain the system, I’ve instructions for what she needs to do to basically pull everything out and return to a “normal” house and network (the part she’s really worried about is the firewall and ad blocking and parental controls I’ve put on the network).

For me, that’s not too hard actually because I practice what I preach and create escalators instead of elevators (an escalator can still be used as stairs when it’s broken). So all she has to do is change a couple of wires, one setting on the WiFi (turn it into a router instead of just an AP) and turn off the computers in my office.

Unless you have family that is as interested in keeping it up, the best you can hope for is a graceful way to rip it all out. So keep that in mind as you design and build your home automation system.

As for handling cases where you are away and something goes wrong.

VPN can help solve a ton of problems. I’ve managed to resurrect my system from a weird breakdown from 1000 miles away using Tailscale and JuiceSSH on my phone.
Documented and simple procedures that a child could follow in a pinch is an additional layer of protection. Stuff like I described above. In an OH context, maybe having a spare SD card that’s ready to just swap in and run when the need arises.
Make sure your home automation fails gracefully so even if nothing is working, you can still control the lighting, HVAC, etc. in the traditional manual ways.

pleedell · November 20, 2022, 6:47pm

IMO the likelihood of a large change related impact increases the longer you go without rebooting. The choice here is more frequent but hopefully less impactful and easier to diagnose change impacts, or less frequent but more impactful, compounded, and more difficult to diagnose impacts. There are pros and cons to each approach, but I think saying ‘I don’t know what will happen if/when I reboot’ isn’t the correct thought process.

The ‘what if I’m gone’ question though is something that may warrant its own page in the documentation though. I’ve told my family that if I’m hit by a bus I’m quite happy to continue to turn off lights, turn down the temperature, and close garage doors from beyond the grave. But I’m sure there are better approaches

mstormi · November 20, 2022, 7:11pm

No and no. It’s quite the opposite: hyperactivity creates many of these problems in the first place.
If you don’t change anything, there aren’t “new” problems. They have either been there all the time or they are not. They do not appear out of nowhere and pile up.

That sort of argumentation applies in a discussion how often to upgrade - but not to restart frequency of running, working systems. They don’t change while running.

Noone said that.

Don’t mix with but keep in mind there’s also an ongoing need for maintenance, e.g. to proactively fix security issues, to upgrade to obtain new features or to intentionally rework your system.
That’s usually not happening on a schedule at regular intervals and time intervals vary greatly depending on your attitude, time investment etc. But you will update software and restart the system as part of your works at some point in time, for most that’ll be some months at most.
To maximize uptime and minimize overall efforts at the same time, these should be the only occasions when to reboot. And you will be available to validate workings and fix issues right away.

matt1 · November 20, 2022, 11:28pm

I could create one but let’s not hijack this thread and go off topic. My threads were going up above 1500 and always rising. Start your own thread if you have an issue.