OHv2 hangs w/o warning after 4 to 6 weeks

(Karl Nickel) #6

Ah, your graph also shows free memory and not memory in use. So the peak of memory in use grows over several weeks. There are one or more processes which do evil stuff and lead OH2 to freeze. But it will be hard to find such processes because of the long time frame. It’s a classical race condition. So you should follow @matt1’s advice: run OH in a kind of “safe mode”, which means disabling everything (rules, bindings, etc.) and enabling them one by one while watching the memory consumption.
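
To watch memory consumption while re-enabling things one by one, something like the following could be run from cron. This is only a minimal sketch: the function names, the CSV path and the `pgrep` pattern are assumptions for illustration, not part of openHAB.

```shell
#!/bin/sh
# Extract the resident set size (kB) from /proc/<pid>/status-formatted text on stdin.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }'
}

# Append one "epoch-seconds,rss-kB" sample for PID $1 to CSV file $2.
log_rss() {
    printf '%s,%s\n' "$(date +%s)" "$(rss_kb < "/proc/$1/status")" >> "$2"
}

# Hypothetical usage: sample the oldest process whose command line mentions openhab.
# log_rss "$(pgrep -o -f openhab)" /var/log/openhab2/rss.csv
```

Plotting that file after enabling each rule or binding should show which step makes the curve start climbing.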

0 Likes

(Markus Storm) #7

Ok, but do you still use the internal SD for booting ?

If you did search the forum you would get to notice that there’s no general problem like that known.
So your problem is specific to your setup, and most likely it’s not OH but the underlying OS or HW.
Sorry but asking for help like that without giving details of the problem first is a little naive.

What do you mean by “OH sh!t itself”? Was the java process still running (hanging around)? Was OH restarted? Is your system set up to restart it …?
Do you use openHABian? How are the java -Xmx and -Xms parameters set?
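
One way to check the heap settings of the running JVM is to look at its command line. A small sketch, assuming a single Java process on the machine; `heap_flags` is just an illustrative helper name:

```shell
#!/bin/sh
# Print any -Xms/-Xmx flags found in a command line read from stdin.
heap_flags() {
    tr ' ' '\n' | grep -E '^-Xm[sx]'
}

# On a live system (assumes exactly one Java process):
# ps -o args= -C java | heap_flags
```

If nothing is printed, no explicit heap limits were passed and the JVM is using its defaults. On package installs these options are commonly set via EXTRA_JAVA_OPTS in /etc/default/openhab2, but that is worth verifying on your own system.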

Raise the log level to debug to possibly get more log data on the next occurrence. Use something like
sudo strace -fp <java pid> to see if java is still doing anything.

And upgrade to 2.5M1 while you’re at it.

0 Likes

(Skinah) #8

I see no evidence that this is related to memory. Heap sizes look great after 21 days of uptime, and the main graph probably includes cache, which Linux does not free up unless more RAM is needed. As said, I see no evidence, so looking at RAM is most likely a waste of time.
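
To separate cache from memory that is really in use, the MemAvailable figure in /proc/meminfo can be compared with MemTotal. A rough sketch (the function name is made up, and it assumes a kernel recent enough to report MemAvailable):

```shell
#!/bin/sh
# Percentage of RAM that is not available, i.e. in use excluding reclaimable cache.
# Reads /proc/meminfo-formatted text from stdin.
mem_used_percent() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total > 0) printf "%d\n", (total - avail) * 100 / total
        }
    '
}

# On a live system:
# mem_used_percent < /proc/meminfo
```

A graph based on MemFree alone will look like a leak on almost any long-running Linux box, because the page cache keeps growing until something else needs the RAM.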

Agree with Markus that the milestone build is worth it as a next step.

What you need to do is find something that either makes the issue worse or makes it go away.

0 Likes

(Max G) #9

no; boots off SSD :slight_smile:

OH sh!t itself = OH stopped working… in fact the whole machine failed; no SSH, ping replied (I have to write this down in the future, as I can’t remember exactly).

Set-up to restart it? No, wouldn’t know how to do that.

java -Xmx and -Xms parameters? = whatever the default is…

“did search the forum you would get to notice that there’s no general problem like that know”
I did read up on it in January but had no time to dig deeper; a Google search returns over 100k hits on “openhab stops working”, which led me to believe there is a problem.

My java version is:

java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (Zulu 8.31.1.122-linux_aarch32hf) (build 1.8.0_181-b122)
OpenJDK Client VM (Zulu 8.31.1.122-linux_aarch32hf) (build 25.181-b122, mixed mode, Evaluation)

sudo strace -fp … no idea how to read the output… :frowning:

If only I knew how … I installed per instructions, nothing extra other than Raspbian Lite, Zulu Java, mosquitto, OH2.4 …

What am I expected to do when OH no longer works?

0 Likes

(Karl Nickel) #10

In your first post you wrote “I migrated from OH1 to OH2”. Did you do a completely fresh install or did you use the upgrade function from “openhabian-config” (if you use openHABian)?

0 Likes

(Max G) #11

… with a fresh install on a new machine, and copied the items, rules, etc.

0 Likes

(Karl Nickel) #12

Are you using openHABian?

0 Likes

(Max G) #13

Just looked up my installation notes…

This is what I did:

181110-1240	downloaded 2018-10-09-raspbian-stretch-lite

# add the openHAB 2 Bintray repository key to your package manager and allow Apt to use the HTTPS Protocol:
wget -qO - 'https://bintray.com/user/downloadSubjectPublicKey?username=openhab' | sudo apt-key add -
sudo apt-get install apt-transport-https


# Add the openHAB 2 Stable Repository to your systems apt sources list:
echo 'deb https://dl.bintray.com/openhab/apt-repo2 stable main' | sudo tee /etc/apt/sources.list.d/openhab2.list

# resynchronize the package index:
sudo apt-get update

# install openHAB with
sudo apt-get install openhab2

# When you choose to install an add-on, openHAB will download it from the internet on request.
# If you plan on disconnecting your machine from the internet, then you will want to also
# install the add-ons package.
# sudo apt-get install openhab2-addons


Systems based on systemd (e.g. Debian 8, Ubuntu 15.x, Raspbian Jessie and newer):
sudo systemctl start openhab2.service
sudo systemctl status openhab2.service
sudo systemctl daemon-reload
sudo systemctl enable openhab2.service

# The first start may take up to 15 minutes, this is a good time to reward yourself
# with hot coffee or a freshly brewed tea!

# You should be able to reach the openHAB 2 Dashboard at http://openhab-device:8080 at this point.
http://192.168.1.4:8080

0 Likes

(Karl Nickel) #14

If you are using a Pi I highly recommend you to use openHABian and then try the migration of your configuration again.

0 Likes

(Max G) #15

My understanding is that this was a valid installation option at the time.
I had installed openHABian before but it seemed too restrictive; I can’t remember what issues I had with it that made me decide against it.
I think it wasn’t the lite version, hence it had a lot of things I do not need; there was also something with the username/password that had to be used (I can’t recall).

Just checked, this is what I used: https://www.openhab.org/docs/installation/linux.html

0 Likes

(Karl Nickel) #16

It is still a valid installation option, but you have to do the whole setup with its dependencies manually, which is of course error-prone. I don’t know if the manual installation and an accidental misconfiguration is the root cause of your problem. But I also struggled with the manual installation and then decided to give openHABian a chance. Since then I have never experienced problems; everything just works flawlessly. And you can use and configure it like a normal Raspbian. It is just a very clever script which saves a lot of manual effort and mistakes :wink:

0 Likes

(Markus Storm) #17

If you’re posting for help, you at least need to provide a comprehensive and detailed description of WHAT does not work, and provide all the details that might be of relevance.

All we can know from your description (on 2nd attempt) is that your machine is broken (or “hangs”) at that point in time and that you deliberately chose a custom (non-openHABian, SSD) OS+HW setup.
A hanging machine does not have anything to do with openHAB in the first place.
To answer your question, if I was harsh I’d say I expect you to google for help outside of this forum …

Well, as said, it’s not OH that stopped working but your machine. And we’re in May now.

I am sorry, but - still willing to help - I don’t think anyone can with that little information - it’s digging in the dark.
I’d check system logs again for hints. And I’d connect a console to be prepared for logging into the system next time this happens to view/analyse the state it is in.

That’s the thing and reason why I’d suggest moving to openHABian as well.
You can but don’t have to use the RPi image. You can also install it on top of your (custom) OS as that’s Debian derived, too.

0 Likes

(Max G) #18

Thank you all for your help.

My question in essence was: OH died twice this year; something I did not experience in OHv1. The logs do not give me any pointers… where should I look, or what should I monitor / install to get more info when it happens next time.
I have searched and read the forum; googled the issue as well.

What I am hearing is: use openHABian for better support…

Thanks… I may come back if it happens again or have more information.

0 Likes

(Andrew Rowe) #19

What bindings do you run?

0 Likes

(Max G) #20
binding = expire1,fritzboxtr0641,mqtt1,weather1,astro,exec,network,ntp,systeminfo,logreader
ui = paper,basic,classic
persistence = rrd4j,mapdb
action = mail,mqtt
transformation = map,javascript,xslt,scale,jsonpath

Happy to update any bindings if required or recommended.

0 Likes

(Andrew Rowe) #21

No I was just curious, don’t see any of the typical trouble makers / memory hogs

And I’m guessing OpenHAB, possibly a separate mqtt broker and the database is the only thing running on the pi

( is rrd4j an external database?)

0 Likes

(Rich Koshak) #22

It depends on the symptoms when it stops. Is just OH unresponsive or is the whole machine unresponsive? That will determine if you can put the monitor on the same machine or not.

Do calls to the web server time out or get refused? Is there unusually high CPU usage? We need to find something we can check to determine when OH has become unresponsive. Once we have that we can write a script that gets triggered by cron and when it sees that OH is unresponsive it will restart OH or reboot the machine if necessary.
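
As a rough idea of what such a cron-triggered script could look like, here is a sketch that treats "no HTTP answer within a timeout" as unresponsive. The URL, timeout, script path and function names are all assumptions, and it can only help while the machine itself is still alive enough to run cron.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: restart openHAB when its web server stops answering.
OH_URL="${OH_URL:-http://localhost:8080}"   # assumed default HTTP port
TIMEOUT="${TIMEOUT:-20}"                    # seconds to wait for an answer

# Return 0 if the URL gives any HTTP response within the timeout.
oh_responsive() {
    curl --silent --output /dev/null --max-time "$TIMEOUT" "$1"
}

# Probe URL $1; on failure log to syslog and run the remaining arguments as a command.
check_and_restart() {
    url=$1
    shift
    if ! oh_responsive "$url"; then
        logger -t oh-watchdog "openHAB unresponsive at $url, restarting"
        "$@"
    fi
}

# To activate, uncomment the call below and schedule the script from root's crontab, e.g.
# */5 * * * * /usr/local/bin/oh-watchdog.sh
# check_and_restart "$OH_URL" systemctl restart openhab2.service
```

The restart command is passed in as arguments so it can be swapped for a reboot if restarting the service turns out not to be enough.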

Are the spikes in the graph pre restarting OH or post restarting OH? It looks like it gradually uses more and more memory until becoming unresponsive until you restart. Correct?

IIRC you moved to OH 2 because your OH 1 instance had the same problem. This points to that pesky memory leak still existing. What OH1 bindings are you using? It’s gotta be one of those causing the problem. If we can identify it perhaps we can get this fixed for good. But, given your graph, you are nowhere near using up enough RAM to cause a system-wide failure, so I think that is probably a red herring.

It looks like the most memory consumed is 58%, so you should have room to expand the amount of RAM that Java allows itself to consume. This thread hopefully can help you get that working. If not, it should give you the terms to search for. But this would only be a band-aid and really only give you an extra week or two before OH crashes. And it would only work if it were just OH that was running out of memory. You are experiencing a near-complete system failure.

This is critical information. The problem is much larger than just OH, it’s the whole machine freezing or at least becoming degraded.

Are you writing your syslog to persistent storage? If an error is detected, it’s going to be logged there. Focusing solely on OH is not going to reveal the problem. This is a bigger, system-wide problem.
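
If the system logs through systemd-journald, a quick check for persistence could look like the sketch below. It assumes journald's default Storage=auto behaviour, where logs survive a reboot only if /var/log/journal exists; a classic rsyslog setup writes to /var/log/syslog instead. The function name is made up.

```shell
#!/bin/sh
# Return 0 if a persistent journal directory exists (default path, or the one given as $1).
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# After a freeze and reboot, look at warnings and errors from the previous boot:
# journal_is_persistent && journalctl -b -1 -p warning
```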

Are you running Grafana by chance?

The thing is if you install openHABian then we will have a basis of comparison. openHABian will give you the same configuration from the operating system on up that many hundreds or thousands of other OH users are using.

And there is nothing restrictive about using openHABian. It’s just a stock Raspbian Lite with a bunch of scripts to install and configure OH and Mosquitto and some other third party applications.

Because the problem is affecting the entire system, I don’t think the problem is caused by openHAB itself. It is probably caused by something else outside of openHAB, and we don’t know the full set of configuration or changes you’ve made to this machine. If you use openHABian, we will know because it’s all scripted and standard. It’s also very well tested given the number of users.

I should note that if you want a more hands-on installation of openHABian, you can download a Raspbian Lite SD card image, then follow the manual installation instructions for openHABian. That too will give you a near-standard configuration.

And once openHABian is installed, you can add in any other software or packages you may need. Though, with the information we have now, I’d guess that it’s one of those that might be causing the problem.

This is a good one. If the RPi is plugged into a screen then there might be something written to the screen (e.g. Kernel panic!) that might not end up in the logs.

The syslog and the other general Linux places to monitor. The problem isn’t that openHAB failed, it’s that your entire machine has failed or is at least degraded. That means the problem is outside of OH. Since you are not running openHABian, we do not and cannot have enough information about your configuration to guess at the cause.

No, it’s embedded. As is mapdb.

0 Likes

(Karman de Lange) #23

Are you running Mosquitto, maybe? That was the issue in my case; some versions have a memory leak, and the latest version has it fixed.

1 Like

(Andrew Rowe) #24

k, thought so, thanks for that Rich

0 Likes

(Andrew Rowe) #25

He is Karman! from Max’s first post

BTW welcome to the OpenHAB community Karman

1 Like