RasPi 3B+ does not run stable

I am using OH 3 (openhabian image) on RasPi 3B+

###############################################################################
##        Ip = 10.1.0.21
##   Release = Raspbian GNU/Linux 10 (buster)
##    Kernel = Linux 5.10.103-v7+
##  Platform = Started Raspberry Pi bluetooth helper.
##    Uptime = 2 day(s). 14:12:21
## CPU Usage = 0% avg over 4 cpu(s) (4 core(s) x 1 socket(s))
##  CPU Load = 1m: 0.11, 5m: 0.46, 15m: 0.44
##    Memory = Free: 0.05GB (6%), Used: 0.89GB (94%), Total: 0.94GB
##      Swap = Free: 2.15GB (96%), Used: 0.09GB (4%), Total: 2.24GB
##      Root = Free: 50.01GB (89%), Used: 5.85GB (11%), Total: 58.27GB
##   Updates = 18 apt updates available.
##  Sessions = 2 session(s)
## Processes = 122 running processes of 32768 maximum processes
###############################################################################

                          _   _     _     ____   _
  ___   ___   ___   ___  | | | |   / \   | __ ) (_)  ____   ___
 / _ \ / _ \ / _ \ / _ \ | |_| |  / _ \  |  _ \ | | / _  \ / _ \
| (_) | (_) |  __/| | | ||  _  | / ___ \ | |_) )| || (_) || | | |
 \___/|  __/ \___/|_| |_||_| |_|/_/   \_\|____/ |_| \__|_||_| | |
      |_|                  openHAB 3.3.0 - Release Build

But my system is not stable. Intermittently I lose the interface on eth0 and can no longer reach the host, not even by ping. The system seems to keep running (logging in with an HDMI monitor and a USB keyboard is still possible), but I can’t see what is causing the problem. I suspect a memory problem. I can pretty much rule out hardware problems or corrupt SD cards. I’m not a Linux expert. Could the large memory space of ZRAM be causing the problem?

A sufficient and stable power supply is the most important thing for stable operation, so this is what I would check first.

This is what I checked before opening this topic. I did a lot of hardware checks in the last few weeks (RasPi, power supply, SD card). I can pretty much rule out hardware problems.
But what about ZRAM, which uses most of the RAM space?

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       61103412 6137188  52441676  11% /
devtmpfs          464768       0    464768   0% /dev
tmpfs             498048       0    498048   0% /dev/shm
tmpfs             498048    1552    496496   1% /run
tmpfs               5120       0      5120   0% /run/lock
tmpfs             498048       0    498048   0% /sys/fs/cgroup
/dev/mmcblk0p1    258095   49347    208749  20% /boot
/dev/zram1        330768   66880    238800  22% /opt/zram/zram1
overlay1          330768   66880    238800  22% /var/lib/openhab/persistence
/dev/zram2        840320   76088    704328  10% /opt/zram/zram2
overlay2          840320   76088    704328  10% /var/log
tmpfs              99608       0     99608   0% /run/user/1000

openHAB is not the only service running on my system, but everything runs stably when the openhab service is down.
I can only say that I never had unstable effects in a similar configuration with openhab2 and no ZRAM.

But you did not mention it in your opening post :wink:

With ZRAM I cannot help, though. I don’t use it and have no experience with that.
As you are using openHABian, it may help to tag the topic accordingly. That may attract other users with the appropriate background.

Zram isn’t free. It takes a chunk out of memory, and there really is almost no memory to spare when running openHABian on an RPi 3. Assuming you really have eliminated hardware, and given that you have deviated from stock openHABian by installing additional services, it is likely that you are running out of memory and the machine is just barely able to limp along using swap.
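To put a number on the "running out of memory" theory, a minimal sketch like the following could be run on the Pi. It reads `/proc/meminfo` and reports `MemAvailable` (the kernel's estimate of memory usable without swapping) rather than plain free memory, since a login banner's "Free" figure may not count reclaimable cache. The function names are made up for this example.

```python
import os


def parse_meminfo(text):
    """Return the numeric value of each /proc/meminfo field (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(':')
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def memory_summary(fields):
    """Return (percent of RAM still available, percent of swap in use)."""
    available_pct = 100.0 * fields['MemAvailable'] / fields['MemTotal']
    swap_total = fields.get('SwapTotal', 0)
    swap_used = swap_total - fields.get('SwapFree', 0)
    swap_used_pct = 100.0 * swap_used / swap_total if swap_total else 0.0
    return available_pct, swap_used_pct


# Only touch the real file when run directly on a Linux machine
if __name__ == '__main__' and os.path.exists('/proc/meminfo'):
    with open('/proc/meminfo') as handle:
        available, swap_used = memory_summary(parse_meminfo(handle.read()))
    print('RAM available: %.0f%%, swap used: %.0f%%' % (available, swap_used))
```

If the available percentage stays in the single digits while swap use keeps growing, that would support the memory theory; if not, it points back towards hardware.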

Though, as @stefan.oh hit upon, this sort of behavior is the stereotypical behavior when one has an insufficient power supply on an RPi. You don’t say how you’ve eliminated that as a problem but the symptoms scream a power supply problem.

You probably won’t get much more than this for help though. When you deviate from openHABian by, for example, installing additional services outside of openhabian-config, as you indicate you’ve done, it significantly limits our ability to help. We don’t know what you’ve done and we don’t know how you’ve done it and it’s pretty much impossible to remotely debug such a setup. That’s why the docs warn against doing so so strongly.

*What you must not do, though, is to mess with the system, OS packages and config and expect anyone to help you with that. Let’s clearly state this as well: when you deliberately decide to make manual changes to the OS software packages and configuration (i.e. outside of openhabian-config), you will be on your own. Your setup is untested, and no-one but you knows about your changes. openHABian maintainers are really committed to providing you with a fine user experience, but this takes enormous efforts in testing and is only possible with a fixed set of hardware. You don’t get to see this as a user.

So if you choose to deviate from the standard openHABian installation and run into problems thereafter, don’t be unfair: don’t waste maintainer’s or anyone’s time by asking for help or information on your issues on the forum. Thank you !*

ZRAM is used to avoid too many write cycles to your SD card. It acts like a buffer that keeps data in memory and reduces the number of write cycles.
As you can see, you have two entries, zram1 and zram2. One ‘mirrors’ the /var/log folder, the other one holds your persistence data.
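Note that the sizes shown by df for the zram devices are filesystem capacities, not RAM actually consumed; zram allocates memory as data is written. A rough sketch to see the real footprint, assuming the devices expose the usual `/sys/block/zram<N>/mm_stat` file (only the first five columns are used here; the helper names are invented for this example):

```python
import glob

# Leading columns of /sys/block/zram<N>/mm_stat, all in bytes
MM_STAT_FIELDS = ('orig_data_size', 'compr_data_size', 'mem_used_total',
                  'mem_limit', 'mem_used_max')


def parse_mm_stat(text):
    """Map the first five mm_stat columns to their names."""
    values = [int(value) for value in text.split()]
    return dict(zip(MM_STAT_FIELDS, values))


def zram_usage():
    """Return {device name: parsed mm_stat} for every zram device found."""
    usage = {}
    for path in sorted(glob.glob('/sys/block/zram*/mm_stat')):
        device = path.split('/')[3]  # e.g. 'zram1'
        with open(path) as handle:
            usage[device] = parse_mm_stat(handle.read())
    return usage


if __name__ == '__main__':
    for device, stats in sorted(zram_usage().items()):
        print('%s: %.1f MiB of RAM holding %.1f MiB of data' % (
            device,
            stats['mem_used_total'] / 1048576.0,
            stats['orig_data_size'] / 1048576.0))
```

`mem_used_total` is the figure to compare against the roughly 1 GB of RAM on a 3B+.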

I had exactly the same problem for over a year now, and only two weeks ago (after having exchanged / tried almost everything) seem to have solved it. See here for what I tried: Raspberry Pi 2 loses ethernet connection - #26 by Cplant . Let me know if this helps. Error logs would be helpful as well. Spoiler: It now only seems to run smoothly after having replaced the router (acting as an ethernet switch / probably a random incompatibility).

Thanks Rick for the clear statement. I haven’t read all the docs. I understand the developers’ position that they can only support efficiently if the scenario is comprehensible and reproducible. But it’s not that easy from my point of view. With the Exec binding, openhab provides a powerful tool that nobody would want to do without. As soon as the binding calls scripts, the “protected” environment is left.
My question is not about looking for errors in openhab, but about asking the community if anyone knows of similar effects and has already found solutions. Maybe my environment is to blame for the misbehaviour, maybe not.

My suspicion that ZRAM creates a problem has not been confirmed. I reduced the RAM space for log files. The memory utilization of the Rasp Pi looks good.
I still sometimes lose the interface on eth0.
I forgot to mention that I use habPanel. Now I have the suspicion that it could be related to the browser on my computer. I’m using MS Edge and the loss of interface could be related to the browser automatically terminating (due to lack of traffic) and reconnecting.
Firefox has been running in this scenario for the last two days without any problems …

To a degree that you lose link / can’t ping the Raspi anymore?

Anything relevant in the Raspi logs?
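One way to answer that without watching the box is to record the kernel's own view of the link. The sketch below polls `/sys/class/net/eth0/operstate` and writes a line whenever the state changes. The log path and the function names are assumptions made for this example; the file is placed in the home directory rather than /var/log so it is written straight to the SD card and does not live in ZRAM.

```python
import time


def read_operstate(interface='eth0'):
    """Return the kernel's link state, e.g. 'up' or 'down'."""
    try:
        with open('/sys/class/net/%s/operstate' % interface) as handle:
            return handle.read().strip()
    except IOError:
        return 'missing'  # interface not present at all


def state_changes(samples):
    """Yield (timestamp, state) only when the state differs from the last one."""
    previous = None
    for timestamp, state in samples:
        if state != previous:
            yield timestamp, state
            previous = state


def poll(interface='eth0', interval=5):
    """Endless stream of (timestamp, state) samples."""
    while True:
        yield time.strftime('%Y-%m-%d %H:%M:%S'), read_operstate(interface)
        time.sleep(interval)


def monitor(logfile='/home/openhabian/eth0-link.log'):
    """Run until interrupted, appending one line per link state change."""
    for timestamp, state in state_changes(poll()):
        with open(logfile, 'a') as log:
            log.write('%s eth0 %s\n' % (timestamp, state))
```

Calling `monitor()` in a screen session or from a small service would do. If the log shows `down` at the time of an outage, the link itself dropped (cable, switch, power); if it stays `up` while ping fails, the cause is more likely higher up the stack.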

I experienced losing connectivity with my Raspi 3b in the past, very likely due to SD corruption. After a long list of attempts to improve stability, I ended up with a 4 GB Raspi 4 connected to a UPS (not a powerbank).
In addition to openhab, there is a script running to read BLE devices. I also use habapp.

This is very stable, for my application of course, even though after some weeks the zram directories become read-only, possibly due to some memory leak that I have not been able to find so far (the system does not report memory, Java heap size or disk usage as being full, though).

I would suggest that you upgrade to a raspi v4 with more memory or split the services on a second raspberry V3.

This! It still very much sounds like a hardware problem.

But you keep shooting in the dark.
Unless and until you have an error message or some other proper indication of what else it might be, asking us to help you with that is a waste of everybody’s resources here.
Sorry, but that’s a very inefficient approach.

So first and foremost, please reproduce your problem with a different RPi and a 2.5A or better power supply. Exchanging your SD is a good idea, too.

Does the system ‘return’ after you cannot ping it anymore ?
Does the red LED flicker at times ? Do you see ‘undervoltage’ messages in syslog ?
Read up on the background here.

you running Python?
just paste this in a file and run it (tested with python2):

#!/usr/bin/env python2

import subprocess

GET_THROTTLED_CMD = 'vcgencmd get_throttled'
MESSAGES = {
    0: 'Under-voltage!',
    1: 'ARM frequency capped!',
    2: 'Currently throttled!',
    3: 'Soft temperature limit active',
    16: 'Under-voltage has occurred since last reboot.',
    17: 'ARM frequency capping has occurred since last reboot.',
    18: 'Throttling has occurred since last reboot.',
    19: 'Soft temperature limit has occurred'
}

print("Checking for throttling issues since last reboot...")

throttled_output = subprocess.check_output(GET_THROTTLED_CMD, shell=True).decode()
# Output looks like "throttled=0x50005"; base 0 lets int() honour the 0x prefix
throttled_value = int(throttled_output.strip().split('=')[1], 0)

warnings = 0
for position, message in sorted(MESSAGES.items()):
    # Report the message if the bit at this position is set
    if throttled_value & (1 << position):
        print(message)
        warnings += 1

if warnings == 0:
    print("Looking good!")
else:
    print("Houston, we may have a problem!")

otherwise just type vcgencmd get_throttled and look the output up:

11100000000000001010
||||            ||||_ under-voltage
||||            |||_ arm frequency capped
||||            ||_ currently throttled
||||            |_ soft temperature limit active
||||_ under-voltage has occurred since last reboot
|||_ arm frequency capping has occurred since last reboot
||_ throttling has occurred since last reboot
|_ soft temperature limit has occurred since last reboot

hint:

  • 0x0 means nothing wrong
  • 0x50000 means under-voltage and throttling have occurred since the last reboot.
  • 0x50005 means you are currently under-voltage and throttled.

Thanks guys for your support. The behavior remains suspect to me.
The only thing I can say for sure is that the effect hasn’t happened since I’ve been using Firefox as the frontend for habPanel.
I had previously exchanged the Rasp Pi 3B+.
I hadn’t seen any signs (flickering LEDs) or indications in the logs (kernel log) of problems with the power supply. I will still order and install another power supply in the coming days.

be aware: AFAIR there’s no logging of undervoltage. Only thing is the above mentioned method. There’s also only some slowly blinking red LED-activity (sometimes not even that) indicating undervoltage => if you’re headless, that’s pretty much the only way to “see” undervoltage.
There’s a symbol appearing in the upper left corner, if you plug in a monitor - that’s mostly all.
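Since the problem is intermittent and the box is headless, polling `vcgencmd get_throttled` on a timer and keeping a timestamped record may be more practical than checking by hand. A minimal sketch follows; the log path, the interval and the function names are assumptions made for this example, and the bit positions follow the Raspberry Pi documentation for `get_throttled`.

```python
import subprocess
import time

# Bit positions as listed in the Raspberry Pi documentation for get_throttled
FLAGS = {
    0: 'under-voltage now',
    1: 'ARM frequency capped now',
    2: 'throttled now',
    3: 'soft temperature limit active',
    16: 'under-voltage has occurred',
    17: 'ARM frequency capping has occurred',
    18: 'throttling has occurred',
    19: 'soft temperature limit has occurred',
}


def decode_throttled(output):
    """Turn 'throttled=0x50005' into a list of flag descriptions."""
    value = int(output.strip().split('=')[1], 16)
    return [FLAGS[bit] for bit in sorted(FLAGS) if value & (1 << bit)]


def log_throttling(logfile='/home/openhabian/throttled.log', interval=60):
    """Run until interrupted, appending a line whenever the flags change."""
    previous = None
    while True:
        output = subprocess.check_output(['vcgencmd', 'get_throttled']).decode()
        flags = decode_throttled(output)
        if flags != previous:
            with open(logfile, 'a') as log:
                log.write('%s %s\n' % (time.strftime('%Y-%m-%d %H:%M:%S'),
                                       ', '.join(flags) or 'no flags set'))
            previous = flags
        time.sleep(interval)
```

Logging only on change keeps the file short, because the "has occurred" bits stay set until the next reboot. Comparing the timestamps with the moments eth0 disappeared would show whether the two are related.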