Raspi stops working - help needed to identify reason

Hi guys,

I know that this might be more of a topic for the Raspberry Pi community, but maybe some of you can give me a hint.

Hardware / SW Overview

  • Release = Raspbian GNU/Linux 9 (stretch)
  • Kernel = Linux 4.14.98-v7+
  • Platform = Raspberry Pi 3 Model B Plus Rev 1.3
  • openHAB 2.4.0-1 (Release Build)
  • Original Raspberry Pi power supply
  • Raspberry Pi boots directly from an SSD, without an SD card
  • Razberry Z-Wave board
  • Original Raspberry Pi case

Every now and then (roughly every few weeks) my Raspberry Pi crashes completely: SSH is not possible and the network (LAN) is down.
Today it happened again and I’m looking at the logs to identify the reason, but I can’t figure it out (or maybe I am looking at the wrong logs). The “booting” entries in the logs are me disconnecting and reconnecting the power.

Syslog

Jun 20 13:03:55 openHABianPi influxd[527]: [httpd] 127.0.0.1 - openhab [20/Jun/2019:13:03:55 +0200] "POST /write?consistency=one&db=openhab_db&p=%5BREDACTED%5D&precision=n&rp=autogen&u=openhab HTTP/1.1" 204 0 "-" "okhttp/2.4.0" 185fa337$
Jun 20 13:03:57 openHABianPi influxd[527]: [httpd] 127.0.0.1 - openhab [20/Jun/2019:13:03:57 +0200] "POST /write?consistency=one&db=openhab_db&p=%5BREDACTED%5D&precision=n&rp=autogen&u=openhab HTTP/1.1" 204 0 "-" "okhttp/2.4.0" 199117f9$
Jun 20 13:02:52 openHABianPi kernel: [    0.000000] Booting Linux on physical CPU 0x0

Messages (ttyAMA0 is the Razberry board)

Jun 20 12:17:50 openHABianPi kernel: [1170492.527093] ttyAMA ttyAMA0: 1 input overrun(s)
Jun 20 13:00:04 openHABianPi kernel: [1173027.215670] ttyAMA ttyAMA0: 1 input overrun(s)
Jun 20 13:02:52 openHABianPi kernel: [    0.000000] Booting Linux on physical CPU 0x0

daemon.log:

Jun 20 13:03:52 openHABianPi influxd[527]: [httpd] 127.0.0.1 - openhab [20/Jun/2019:13:03:52 +0200] "POST /write?consistency=one&db=openhab_db&p=%5BREDACTED%5D&precision=n&rp=autogen&u=openhab HTTP/1.1" 204 0 "-" "okhttp/2.4.0" 16c3aba2$
Jun 20 13:03:55 openHABianPi influxd[527]: [httpd] 127.0.0.1 - openhab [20/Jun/2019:13:03:55 +0200] "POST /write?consistency=one&db=openhab_db&p=%5BREDACTED%5D&precision=n&rp=autogen&u=openhab HTTP/1.1" 204 0 "-" "okhttp/2.4.0" 185fa337$
Jun 20 13:03:57 openHABianPi influxd[527]: [httpd] 127.0.0.1 - openhab [20/Jun/2019:13:03:57 +0200] "POST /write?consistency=one&db=openhab_db&p=%5BREDACTED%5D&precision=n&rp=autogen&u=openhab HTTP/1.1" 204 0 "-" "okhttp/2.4.0" 199117f9$
Jun 20 13:02:52 openHABianPi fake-hwclock[97]: Do 20. Jun 10:17:01 UTC 2019
Jun 20 13:02:52 openHABianPi systemd[1]: Started Create Static Device Nodes in /dev.
Jun 20 13:02:52 openHABianPi systemd[1]: Starting udev Kernel Device Manager...

So I can’t find anything helpful in those logs.
Since the power supply should be sufficient, I checked the CPU temperature as well, but I wouldn’t say it is unusually high. The peak at 70 °C comes from rebooting. At the time of the problem itself (where you also see the reboot in the log messages) the temperature is lower, at 64 °C:
image

Is there anywhere else I could find useful information, or should I think about replacing the Pi, since it might be a hardware-related problem?

Thanks for any suggestions.

I have a similar problem with an almost identical setup, except for the Z-Wave board.

My system has crashed 3 times this year. The third time I noticed the system load was very high, but I was unable to SSH in to check the reason. OH seemed to run fine for several more hours with load and number of threads gradually increasing. Eventually I powered it off and on, but when it restarted all of the logs had stopped 5 hours previously (system and OH logs). Also on restart the system clock was initially 5 hours behind before it updated to the correct time.
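
To see exactly where the logs went quiet before a power cycle, something like the following might help. It is only a rough sketch: it assumes the kernel's "Booting Linux on physical CPU" line (as seen in the syslog excerpts above) marks each boot, and the function names and the default path are made up for illustration. It avoids newer Python syntax so it should also run on the Python 3.5 that ships with stretch.

```python
from collections import deque

# Kernel line that appears at the start of every boot (taken from the logs above).
BOOT_MARKER = "Booting Linux on physical CPU"


def lines_before_boot(lines, context=3):
    """For each boot marker, return the `context` lines logged just before it."""
    recent = deque(maxlen=context)
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER in line:
            results.append(list(recent))
            recent.clear()
        else:
            recent.append(line)
    return results


def report(path="/var/log/syslog", context=3):
    """Print the last lines written before each boot found in `path`."""
    with open(path, errors="replace") as fh:
        for number, block in enumerate(lines_before_boot(fh, context), 1):
            print("--- last lines before boot #{} ---".format(number))
            print("\n".join(block))
```

Calling `report()` (with sudo if the log is not readable otherwise) prints the last few entries before every boot, which makes it easy to compare the timestamp of the last entry with the time the Pi was actually power cycled.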

Thanks - it seems I am not alone.
I think I will monitor the CPU load as well as the CPU temperature.
It seems that my system clock is also a bit behind…
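
A small logger along these lines could record load and temperature so that the last values before a hang survive the power cycle. This is a sketch under assumptions: the output path, the interval and the function names are hypothetical, and it assumes the temperature is exposed in millidegrees Celsius under /sys/class/thermal/thermal_zone0/temp, as it is on Raspbian.

```python
import os
import time

# Assumed location of the SoC temperature (millidegrees Celsius) on Raspbian.
TEMP_PATH = "/sys/class/thermal/thermal_zone0/temp"


def read_cpu_temp_c(path=TEMP_PATH):
    """Return the CPU temperature in degrees Celsius."""
    with open(path) as fh:
        return int(fh.read().strip()) / 1000.0


def format_sample(timestamp, load1, temp_c):
    """Build one log line from a Unix timestamp, 1-minute load and temperature."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return "{} load1={:.2f} temp={:.1f}C".format(stamp, load1, temp_c)


def log_forever(out_path="/var/tmp/pi_health.log", interval_s=60):
    """Append one sample per interval and force it to disk each time."""
    with open(out_path, "a") as out:
        while True:
            load1 = os.getloadavg()[0]
            out.write(format_sample(time.time(), load1, read_cpu_temp_c()) + "\n")
            out.flush()
            os.fsync(out.fileno())  # so the last sample survives a hard power cut
            time.sleep(interval_s)
```

Running `log_forever()` in the background (for example from a systemd unit) leaves a file whose last line shows the load and temperature shortly before the system stopped responding. If the system clock is behind, the timestamps in that file will be behind as well.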

If it is not the CPU load, I could imagine that under certain conditions the USB SSD draws too much current, but I do not know how to verify that other than by trying another power supply.
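
One way to check for power problems without swapping hardware is the firmware's under-voltage flag, which `vcgencmd get_throttled` reports as a hex bitmask (for example `throttled=0x50005`). The sketch below decodes that value; the function names are made up for illustration, and it assumes `vcgencmd` is installed and usable by the current user, as it normally is on Raspbian.

```python
import subprocess

# Bit positions of the value reported by `vcgencmd get_throttled`.
THROTTLE_FLAGS = [
    (0, "under-voltage detected now"),
    (1, "ARM frequency capped now"),
    (2, "currently throttled"),
    (3, "soft temperature limit active now"),
    (16, "under-voltage has occurred since boot"),
    (17, "ARM frequency capping has occurred since boot"),
    (18, "throttling has occurred since boot"),
    (19, "soft temperature limit has occurred since boot"),
]


def decode_throttled(output):
    """Turn a string like 'throttled=0x50005' into a list of set flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in THROTTLE_FLAGS if value & (1 << bit)]


def read_throttled():
    """Run vcgencmd and return its raw output (requires a Raspberry Pi)."""
    return subprocess.check_output(
        ["vcgencmd", "get_throttled"], universal_newlines=True
    )
```

On the Pi, `decode_throttled(read_throttled())` returns an empty list when nothing has been flagged. The "since boot" bits are cleared by a reboot, so this has to be checked (or logged periodically) while the system is still running, not after the power cycle.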

It’s a stab in the dark. The logs don’t point to anything on the software side.
I’d try replacing the Pi (you should have a second one handy anyway for backup purposes).
Double-check that you have a strong power supply: since you also use an SSD, it should meet the recommended 2.5 A (or more).