OHv2: Machine running OHv2 hangs w/o warning after 4 to 6 weeks

Are you using openHABian?

Just looked up my installation notes…

This is what I did:

181110-1240	downloaded 2018-10-09-raspbian-stretch-lite

# add the openHAB 2 Bintray repository key to your package manager and allow Apt to use the HTTPS Protocol:
wget -qO - 'https://bintray.com/user/downloadSubjectPublicKey?username=openhab' | sudo apt-key add -
sudo apt-get install apt-transport-https


# Add the openHAB 2 Stable Repository to your systems apt sources list:
echo 'deb https://dl.bintray.com/openhab/apt-repo2 stable main' | sudo tee /etc/apt/sources.list.d/openhab2.list

# resynchronize the package index:
sudo apt-get update

# install openHAB with
sudo apt-get install openhab2

# When you choose to install an add-on, openHAB will download it from the internet on request.
# If you plan on disconnecting your machine from the internet, then you will want to also
# install the add-ons package.
# sudo apt-get install openhab2-addons


# Systems based on systemd (e.g. Debian 8, Ubuntu 15.x, Raspbian Jessie and newer):
# start the service and check that it is running
sudo systemctl start openhab2.service
sudo systemctl status openhab2.service
# register the service so that it starts at boot
sudo systemctl daemon-reload
sudo systemctl enable openhab2.service

# The first start may take up to 15 minutes, this is a good time to reward yourself
# with hot coffee or a freshly brewed tea!

# You should be able to reach the openHAB 2 Dashboard at http://openhab-device:8080 at this point.
http://192.168.1.4:8080

If you are using a Pi, I highly recommend that you use openHABian and then try migrating your configuration again.

My understanding is that this was a valid installation option at the time.
I had installed openHABian before, but it seemed too restrictive; I can’t remember exactly what issues I had with it that made me decide against it.
I think it wasn’t the lite version, so it came with a lot of things I do not need, and there was something about the username/password that had to be used (I can’t recall).

Just checked, this is what I used: https://www.openhab.org/docs/installation/linux.html

It is still a valid installation option, but you have to do the whole setup with its dependencies manually, which is of course error-prone. I don’t know whether the manual installation and an accidental misconfiguration are the root cause of your problem. But I also struggled with the manual installation and then decided to give openHABian a chance. Since then I have never experienced problems; everything just works flawlessly. And you can use and configure it like a normal Raspbian. It is just a very clever script which saves a lot of manual effort and mistakes :wink:

If you’re posting for help, you at least need to provide a comprehensive and detailed description of WHAT does not work, and provide all the details that might be of relevance.

All we can know from your description (on 2nd attempt) is that your machine is broken (or “hangs”) at that point in time and that you deliberately chose a custom (non-openHABian, SSD) OS+HW setup.
A machine hanging does not have anything to do with openHAB in the first place.
To answer your question, if I was harsh I’d say I expect you to google for help outside of this forum …

Well, as I said, it’s not OH that stopped working but your machine. And we’re in May now.

I am sorry, but - still willing to help - I don’t think anyone can with that little information - it’s digging in the dark.
I’d check system logs again for hints. And I’d connect a console to be prepared for logging into the system next time this happens to view/analyse the state it is in.

That’s the thing and reason why I’d suggest moving to openHABian as well.
You can but don’t have to use the RPi image. You can also install it on top of your (custom) OS as that’s Debian derived, too.

Thank you all for your help.

My question in essence was: OH died twice this year; something I did not experience in OHv1. The logs do not give me any pointers… where should I look, or what should I monitor / install to get more info when it happens next time.
I have searched and read the forum; googled the issue as well.

What I am hearing is: use openHABian for better support…

Thanks… I may come back if it happens again or if I have more information.

What bindings do you run?

binding = expire1,fritzboxtr0641,mqtt1,weather1,astro,exec,network,ntp,systeminfo,logreader
ui = paper,basic,classic
persistence = rrd4j,mapdb
action = mail,mqtt
transformation = map,javascript,xslt,scale,jsonpath

Happy to update any bindings if required or recommended.

No, I was just curious; I don’t see any of the typical troublemakers / memory hogs.

And I’m guessing openHAB, possibly a separate MQTT broker, and the database are the only things running on the Pi.

(Is rrd4j an external database?)

It depends on the symptoms when it stops. Is just OH unresponsive or is the whole machine unresponsive? That will determine if you can put the monitor on the same machine or not.

Do calls to the web server time out or get refused? Is there unusually high CPU usage? We need to find something we can check to determine when OH has become unresponsive. Once we have that we can write a script that gets triggered by cron and when it sees that OH is unresponsive it will restart OH or reboot the machine if necessary.
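A minimal sketch of what such a cron-driven check could look like, assuming OH answers on http://localhost:8080 and the service is named openhab2.service (both taken from the install notes earlier in the thread). The state file path, the thresholds of 3 and 6 failed checks, and the function names are placeholders of mine, not anything openHAB ships:

```bash
#!/bin/sh
# Watchdog sketch: probe openHAB's web server and decide what to do.
# Assumed values (not from this thread): state file path and thresholds.

OH_URL="${OH_URL:-http://localhost:8080}"
STATE_FILE="${STATE_FILE:-/var/tmp/oh_watchdog.failures}"

# Print the HTTP status code; curl prints 000 when the call
# times out or the connection is refused.
probe_openhab() {
    curl --silent --output /dev/null --max-time 10 \
         --write-out '%{http_code}' "$1" || true
}

# Map (status code, consecutive failures so far) to an action:
#   ok      -> server answered with 2xx/3xx
#   wait    -> failed, but no action is due on this run
#   restart -> this is failure number RESTART_AFTER in a row
#   reboot  -> REBOOT_AFTER or more failures in a row
decide_action() {
    code="$1"
    failures="${2:-0}"
    case "$code" in
        2??|3??) echo ok; return ;;
    esac
    failures=$((failures + 1))
    if [ "$failures" -ge "${REBOOT_AFTER:-6}" ]; then
        echo reboot
    elif [ "$failures" -eq "${RESTART_AFTER:-3}" ]; then
        echo restart
    else
        echo wait
    fi
}

run_watchdog() {
    failures=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
    action=$(decide_action "$(probe_openhab "$OH_URL")" "$failures")
    case "$action" in
        ok)      echo 0 > "$STATE_FILE" ;;
        wait)    echo $((failures + 1)) > "$STATE_FILE" ;;
        restart) echo $((failures + 1)) > "$STATE_FILE"
                 logger -t oh_watchdog "openHAB unresponsive, restarting service"
                 systemctl restart openhab2.service ;;
        reboot)  echo 0 > "$STATE_FILE"
                 logger -t oh_watchdog "openHAB still unresponsive, rebooting"
                 systemctl reboot ;;
    esac
}

# Only act when called with --run, so the functions can be tried out safely.
if [ "${1:-}" = "--run" ]; then run_watchdog; fi
```

Saved as, say, /usr/local/bin/oh_watchdog.sh (a made-up path), it could be run from root's crontab every five minutes with `*/5 * * * * /usr/local/bin/oh_watchdog.sh --run`. Keep in mind this only helps while the machine itself is still alive; if the whole box freezes, the check has to run on a different machine.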

Are the spikes in the graph from before or after restarting OH? It looks like it gradually uses more and more memory until it becomes unresponsive and you restart it. Correct?
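To have numbers for this the next time it happens, something like the following could log overall memory use to disk once a minute. It is only a sketch: the log path and function names are my own, and it assumes a kernel that reports MemAvailable in /proc/meminfo (anything reasonably recent does).

```bash
#!/bin/sh
# Memory logging sketch. The log file path used below is an assumption.

# Print used memory as a whole-number percentage, computed from the
# MemTotal and MemAvailable lines of a meminfo-style file.
mem_used_percent() {
    awk '/^MemTotal:/     {total = $2}
         /^MemAvailable:/ {avail = $2}
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Append one timestamped line to the given log file.
log_memory() {
    printf '%s mem_used=%s%%\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$(mem_used_percent)" >> "$1"
}

# Only log when called with --run, e.g. from cron.
if [ "${1:-}" = "--run" ]; then log_memory "${2:-/var/log/mem_usage.log}"; fi
```

Because the log sits on persistent storage, the last lines before a freeze show whether memory was actually exhausted at that point.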

IIRC you moved to OH 2 because your OH 1 instance had the same problem. This points to that pesky memory leak still existing. What OH1 bindings are you using? It’s gotta be one of those causing the problem. If we can identify it perhaps we can get this fixed for good. But, given your graph, you are nowhere near using up enough RAM to cause a system-wide failure, so I think that is probably a red herring.

It looks like peak memory consumption is 58%, so you should have room to raise the amount of RAM that Java allows itself to consume. This thread hopefully can help you get that working. If not, it should give you the terms to search for. But this would only be a band-aid and would really only give you an extra week or two before OH crashes. And it would only work if it were just OH that was running out of memory. You are experiencing a near-complete system failure.
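For reference, on an apt-based install like the one described above, the Java options are normally set through EXTRA_JAVA_OPTS in /etc/default/openhab2. A sketch only; the heap sizes here are illustrative assumptions and have to fit the RAM of your Pi:

```bash
# /etc/default/openhab2 (excerpt)
# -Xms is the initial heap size, -Xmx the maximum heap size.
# The values are example figures, not a recommendation.
EXTRA_JAVA_OPTS="-Xms400m -Xmx650m"
```

The setting is picked up after `sudo systemctl restart openhab2.service`.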

This is critical information. The problem is much larger than just OH, it’s the whole machine freezing or at least becoming degraded.

Are you writing your syslog to persistent storage? If an error is detected, it is going to be logged there. Focusing solely on OH is not going to reveal the problem. This is a bigger, system-wide problem.
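A quick way to check this, as a sketch: with journald's default settings the journal is only kept across reboots if /var/log/journal exists, and on a stock Raspbian rsyslog additionally writes /var/log/syslog. The function name is made up, and the paths are the usual defaults, which may differ on a customised system.

```bash
#!/bin/sh
# Report whether logs from before a hang will still be there after the reboot.
check_log_persistence() {
    journal_dir="${1:-/var/log/journal}"
    syslog_file="${2:-/var/log/syslog}"
    if [ -d "$journal_dir" ]; then
        echo "journal: persistent"
    else
        echo "journal: volatile"
    fi
    if [ -s "$syslog_file" ]; then
        echo "syslog: present"
    else
        echo "syslog: missing"
    fi
}

# To make the journal persistent:
#   sudo mkdir -p /var/log/journal && sudo systemctl restart systemd-journald
# After the next hang and reboot, look at the end of the previous boot:
#   journalctl --list-boots
#   journalctl -b -1 -e
```

Calling `check_log_persistence` with no arguments checks the default locations.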

Are you running Grafana by chance?

The thing is if you install openHABian then we will have a basis of comparison. openHABian will give you the same configuration from the operating system on up that many hundreds or thousands of other OH users are using.

And there is nothing restrictive about using openHABian. It’s just a stock Raspbian Lite with a bunch of scripts to install and configure OH and Mosquitto and some other third party applications.

Because the problem is affecting the entire system, I don’t think the problem is caused by openHAB itself. It is probably caused by something else outside of openHAB, and we don’t know the full set of configuration or changes you’ve made to this machine. If you use openHABian, we will know because it’s all scripted and standard. It’s also very well tested given the number of users.

I should note that if you want a more hands-on installation of openHABian, you can download a Raspbian Lite SD card image, then follow the manual installation instructions for openHABian. That too will give you a near-standard configuration.

And once openHABian is installed, you can add in any other software or packages you may need. Though, with the information we have now, I’d guess that it’s one of those that might be causing the problem.

This is a good one. If the RPi is plugged into a screen then there might be something written to the screen (e.g. Kernel panic!) that might not end up in the logs.

Monitor the syslog and the other general Linux logs. The problem isn’t that openHAB failed, it’s that your entire machine has failed or is at least degraded. That means the problem is outside of OH. Since you are not running openHABian, we do not and cannot have enough information about your configuration to guess at the cause.

No, it’s embedded. As is mapdb.

Are you running mosquitto, maybe? That was the issue in my case: some versions have a memory leak, and the latest version has it fixed.
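To see whether mosquitto (or Java) is the process that keeps growing, the resident memory of each can be recorded over time. A rough sketch that reads VmRSS from /proc; the function names and the log path in the usage line are my own:

```bash
#!/bin/sh
# Per-process memory sketch for spotting a slow leak.

# Print the resident set size in kB recorded in a /proc/<pid>/status file.
rss_kb_from_status() {
    awk '/^VmRSS:/ {print $2}' "$1"
}

# Print "name pid rss_kb" for every running process with the given name.
report_rss() {
    for pid in $(pidof "$1"); do
        echo "$1 $pid $(rss_kb_from_status "/proc/$pid/status")"
    done
}
```

Run from cron, for example `report_rss mosquitto >> /var/log/rss.log` and `report_rss java >> /var/log/rss.log`. A value that only ever climbs over days points at the leaking process.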


OK, thought so. Thanks for that, Rich.

He is Karman! from Max’s first post

BTW, welcome to the openHAB community, Karman.


I had the same problem with freezes after some days/weeks.

Thanks @karman for the mosquitto hint! Uninstalling mosquitto solved the problem for me.


Did you ‘uninstall’ mosquitto entirely, or did you ‘re-install’ it?

I’ve done a complete uninstall (apt remove mosquitto --purge).

And no problems since the uninstall, up to today.

Max, do you know that OHv2 has an integrated (built-in) MQTT server? Running an external mosquitto server is no longer the only option. To my understanding the built-in version is just an easy-to-configure version (no security) of mosquitto anyhow, but it may work better for you if your external one turns out to be the source of the memory leak.


Thanks… will have a look at it.

I understand the embedded broker is Moquette (not mosquitto)
