openhabian/OH2 not reachable

Good Morning,

I use openhabian and OH2.1 on a raspberry pi 3 B.

For the second time, I cannot reach the system via SSH or openHAB (port 8080). Ping still works, however. I first noticed this two days ago. It always happens around 6 or 7 o'clock, when our son wakes us up. Last time I had to go to work, and in the evening everything was running again. Before I hard-reboot the Pi now, I would like to ask a few questions:

  1. Is there some kind of sleep state? How do I wake the Pi?
  2. Could the Amanda backup be the cause? Where can I see when it has finished? How long does it take? My memory card has 32GB and is supposed to be backed up over LAN to a NAS (although the folders (slot1-3) are still empty). I configured the backup via openhabian-config three days ago …
  3. Which logs should I look at after a hard reboot? (I am a Linux beginner …)
  4. How long would you wait before hard-rebooting your system?

Thank you

OK, I could not wait for an answer :smile: Too many of our everyday scenes at home depend on the server.

After the hard reset:

2017-07-01 04:30:00.351 [INFO ] [e.smarthome.model.script.Luftfeuchte] - TempK: 21.7 LuftK: 69.2 MittelwA: 17 LuftA: 81.00
2017-07-01 04:30:00.365 [INFO ] [e.smarthome.model.script.Luftfeuchte] - Abs. Luftfeuchte - in: 13.20081822668095 g/m3, out: 11.720773167605614 g/m3 dif: 1.480045059075336
2017-07-01 04:33:00.324 [INFO ] [Manager$ExpressionThreadPoolExecutor] - Expression '0 33 4 1 7 ? 2017' has no future executions anymore
2017-07-01 04:33:00.339 [INFO ] [Manager$ExpressionThreadPoolExecutor] - Expression '0 33 4 1 7 ? 2017' has no future executions anymore
2017-07-01 04:33:00.360 [INFO ] [Manager$ExpressionThreadPoolExecutor] - Expression '0 33 4 1 7 ? 2017' has no future executions anymore
2017-07-01 04:33:00.375 [INFO ] [Manager$ExpressionThreadPoolExecutor] - Expression '0 33 4 1 7 ? 2017' has no future executions anymore
2017-07-01 04:33:00.399 [INFO ] [Manager$ExpressionThreadPoolExecutor] - Expression '0 33 4 1 7 ? 2017' has no future executions anymore
2017-07-01 05:00:00.346 [INFO ] [e.smarthome.model.script.Luftfeuchte] - TempK: 21.7 LuftK: 69.2 MittelwA: 17 LuftA: 80.00
2017-07-01 05:00:00.362 [INFO ] [e.smarthome.model.script.Luftfeuchte] - Abs. Luftfeuchte - in: 13.20081822668095 g/m3, out: 11.576072264301843 g/m3 dif: 1.624745962379107
2017-07-01 08:53:51.335 [INFO ] [.dashboard.internal.DashboardService] - Started dashboard at http://192.168.1.166:8080
2017-07-01 08:53:51.353 [INFO ] [.dashboard.internal.DashboardService] - Started dashboard at https://192.168.1.166:8443
...

I cannot copy the empty line, so here is a screenshot:

Which logs in the system should I look at?

Last night the log ends at 3:30 am … I had to hard-restart the server again. Does no one have an idea?

So according to the web :slight_smile: the syslog is the decisive one …

The last line before the crash is: “dhcpcd[633]: eth0: no IPv6 Routers available”

Yesterday:

.
.
.   
Jul  1 05:17:12 JBHome ntpd[866]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Jul  1 05:17:12 JBHome systemd[1]: Started LSB: Start NTP daemon.
Jul  1 05:17:12 JBHome ntpd[866]: Listen and drop on 1 v6wildcard :: UDP 123
Jul  1 05:17:12 JBHome ntpd[866]: Listen normally on 2 lo 127.0.0.1 UDP 123
Jul  1 05:17:12 JBHome ntpd[866]: Listen normally on 3 eth0 192.168.1.166 UDP 123
Jul  1 05:17:12 JBHome ntpd[866]: Listen normally on 4 eth0 fe80::23d0:3c47:15be:c5c2 UDP 123
Jul  1 05:17:12 JBHome ntpd[866]: Listen normally on 5 lo ::1 UDP 123
Jul  1 05:17:12 JBHome ntpd[866]: peers refreshed
Jul  1 05:17:12 JBHome ntpd[866]: Listening on routing socket on fd #22 for interface updates
Jul  1 05:17:12 JBHome dphys-swapfile[769]: done.
Jul  1 05:17:12 JBHome kernel: [   14.267164] Adding 102396k swap on /var/swap.  Priority:-1 extents:5 across:200700k SSFS
Jul  1 05:17:12 JBHome systemd[1]: Started LSB: Autogenerate and use a swap file.
Jul  1 05:17:14 JBHome exim4[768]: Starting MTA: exim4.
Jul  1 05:17:14 JBHome systemd[1]: Started LSB: exim Mail Transport Agent.
Jul  1 05:17:16 JBHome systemd[1]: Started LSB: start Samba daemons for the AD DC.
Jul  1 05:17:16 JBHome nmbd[772]: Starting NetBIOS name server: nmbd.
Jul  1 05:17:16 JBHome systemd[1]: Started LSB: start Samba NetBIOS nameserver (nmbd).
Jul  1 05:17:16 JBHome systemd[1]: Starting LSB: start Samba SMB/CIFS daemon (smbd)...
Jul  1 05:17:18 JBHome smbd[1154]: Starting SMB/CIFS daemon: smbd.
Jul  1 05:17:18 JBHome systemd[1]: Started LSB: start Samba SMB/CIFS daemon (smbd).
Jul  1 05:17:18 JBHome systemd[1]: Starting Multi-User System.
Jul  1 05:17:18 JBHome systemd[1]: Reached target Multi-User System.
Jul  1 05:17:18 JBHome systemd[1]: Starting Graphical Interface.
Jul  1 05:17:18 JBHome systemd[1]: Reached target Graphical Interface.
Jul  1 05:17:18 JBHome systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jul  1 05:17:18 JBHome systemd[1]: Started Update UTMP about System Runlevel Changes.
Jul  1 05:17:18 JBHome systemd[1]: Startup finished in 1.520s (kernel) + 18.389s (userspace) = 19.910s.
Jul  1 05:17:19 JBHome dhcpcd[633]: eth0: no IPv6 Routers available
Jul  1 08:53:35 JBHome systemd[1]: Time has been changed
Jul  1 08:54:37 JBHome kernel: [   83.317373] random: crng init done
.
.
.

Today:

.
.
.
Jul  2 03:17:13 JBHome ntpd[839]: Listen normally on 2 lo 127.0.0.1 UDP 123
Jul  2 03:17:13 JBHome ntpd[839]: Listen normally on 3 eth0 192.168.1.166 UDP 123
Jul  2 03:17:13 JBHome ntpd[839]: Listen normally on 4 eth0 fe80::23d0:3c47:15be:c5c2 UDP 123
Jul  2 03:17:13 JBHome ntpd[839]: Listen normally on 5 lo ::1 UDP 123
Jul  2 03:17:13 JBHome ntpd[839]: peers refreshed
Jul  2 03:17:13 JBHome ntpd[839]: Listening on routing socket on fd #22 for interface updates
Jul  2 03:17:13 JBHome systemd[1]: Started LSB: Start NTP daemon.
Jul  2 03:17:13 JBHome systemd[1]: Received SIGRTMIN+21 from PID 252 (plymouthd).
Jul  2 03:17:13 JBHome systemd[1]: Started Hold until boot process finishes up.
Jul  2 03:17:13 JBHome systemd[1]: Started Terminate Plymouth Boot Screen.
Jul  2 03:17:13 JBHome ntp[775]: Starting NTP server: ntpd.
Jul  2 03:17:13 JBHome dphys-swapfile[772]: want /var/swap=100MByte, checking existing: keeping it
Jul  2 03:17:13 JBHome systemd[1]: Starting Getty on tty1...
Jul  2 03:17:13 JBHome systemd[1]: Started Getty on tty1.
Jul  2 03:17:13 JBHome systemd[1]: Starting Login Prompts.
Jul  2 03:17:13 JBHome systemd[1]: Reached target Login Prompts.
Jul  2 03:17:13 JBHome kernel: [   15.191408] Adding 102396k swap on /var/swap.  Priority:-1 extents:5 across:200700k SSFS
Jul  2 03:17:13 JBHome dphys-swapfile[772]: done.
Jul  2 03:17:13 JBHome systemd[1]: Started LSB: Autogenerate and use a swap file.
Jul  2 03:17:14 JBHome exim4[770]: Starting MTA: exim4.
Jul  2 03:17:14 JBHome systemd[1]: Started LSB: exim Mail Transport Agent.
Jul  2 03:17:16 JBHome systemd[1]: Started LSB: start Samba daemons for the AD DC.
Jul  2 03:17:17 JBHome nmbd[774]: Starting NetBIOS name server: nmbd.
Jul  2 03:17:17 JBHome systemd[1]: Started LSB: start Samba NetBIOS nameserver (nmbd).
Jul  2 03:17:17 JBHome systemd[1]: Starting LSB: start Samba SMB/CIFS daemon (smbd)...
Jul  2 03:17:18 JBHome systemd[1]: Started LSB: start Samba SMB/CIFS daemon (smbd).
Jul  2 03:17:18 JBHome systemd[1]: Starting Multi-User System.
Jul  2 03:17:18 JBHome systemd[1]: Reached target Multi-User System.
Jul  2 03:17:18 JBHome systemd[1]: Starting Graphical Interface.
Jul  2 03:17:18 JBHome systemd[1]: Reached target Graphical Interface.
Jul  2 03:17:18 JBHome systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jul  2 03:17:18 JBHome smbd[1158]: Starting SMB/CIFS daemon: smbd.
Jul  2 03:17:18 JBHome systemd[1]: Started Update UTMP about System Runlevel Changes.
Jul  2 03:17:18 JBHome systemd[1]: Startup finished in 1.717s (kernel) + 18.646s (userspace) = 20.363s.
Jul  2 03:17:19 JBHome dhcpcd[633]: eth0: no IPv6 Routers available
Jul  2 10:58:17 JBHome systemd[1]: Time has been changed
Jul  2 10:58:56 JBHome kernel: [   60.809208] random: crng init done
.
.
.
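One way to spot where a log goes quiet is to look for large jumps between consecutive timestamps. The following is only a sketch: the function name find_log_gaps is made up for this illustration, and it assumes classic syslog lines of the form "Jul  1 05:17:19 host ..." and only compares lines from the same day. Note that a jump can mean either a silent period or a clock correction (see the "Time has been changed" lines above).

```shell
# Hypothetical helper: report jumps of at least $2 seconds between
# consecutive lines of a syslog-style file ($1).
find_log_gaps() {
    awk -v min="$2" '{
        # Field 3 is the HH:MM:SS part of the syslog timestamp.
        split($3, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        day = $1 " " $2
        # Only compare lines that carry the same month and day.
        if (NR > 1 && day == prev_day && now - prev >= min)
            printf "gap of %d s after: %s\n", now - prev, prev_line
        prev = now; prev_day = day; prev_line = $0
    }' "$1"
}
```

For example, find_log_gaps /var/log/syslog 3600 would list every line that is followed by more than an hour of silence on the same day.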

Does anyone have an idea what my problem is?

/var/log/messages:

Jul  1 05:17:11 JBHome kernel: [   13.350791] FS-Cache: Netfs 'cifs' registered for caching
Jul  1 05:17:11 JBHome kernel: [   13.351594] Key type cifs.spnego registered
Jul  1 05:17:11 JBHome kernel: [   13.351623] Key type cifs.idmap registered
Jul  1 05:17:12 JBHome kernel: [   14.267164] Adding 102396k swap on /var/swap.  Priority:-1 extents:5 across:200700k SSFS
Jul  1 08:54:37 JBHome kernel: [   83.317373] random: crng init done

Jul  2 03:17:11 JBHome kernel: [   13.493566] FS-Cache: Netfs 'cifs' registered for caching
Jul  2 03:17:11 JBHome kernel: [   13.494393] Key type cifs.spnego registered
Jul  2 03:17:11 JBHome kernel: [   13.494427] Key type cifs.idmap registered
Jul  2 03:17:13 JBHome kernel: [   15.191408] Adding 102396k swap on /var/swap.  Priority:-1 extents:5 across:200700k SSFS

Bad SD card? Bad power supply?

Thanks for the hint.
I have now ordered a new, recommended power supply: “Wicked Chili 3100mA / 15.5W for Raspberry Pi 3 B”. I will test it over the next few days.
The “SanDisk Ultra Android microSDHC 32GB” is new. I hope it is not the problem… Is there a way to test it?
Could it also be the Pi itself?
I am puzzled that the crashes always happen at a similar time in the morning.

Solution:
Yesterday I uninstalled the Amanda backup with:

sudo apt-get purge --auto-remove amanda-server
sudo apt-get purge --auto-remove amanda-client

No more freeze/hold today :wink:

We should watch closely whether this happens on other systems, too. @mstormi ?

Well, it’s unlikely to be related to Amanda itself.
But if your backup is too large, or if you put too much stress on the SD card, the Pi's I/O, or even the NAS I/O, this can lead to lockups (temporary or permanent) like the ones you observed.
Actually, backing up a 32GB disk is possibly just too much (I never thought anyone would use such a big one; I thought my own 16GB was already way too large).

My suggestion would be to re-install Amanda but remove the raw ‘disk’ backup (remove the /dev/mmcblk0 line from /etc/amanda/openhab-dir/disklist) so Amanda will only back up the (way smaller) config directories.
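Removing that line could look roughly like the sketch below. The function name remove_raw_disk_entry is made up for this illustration, and the helper takes the file path as an argument so it can be tried on a copy of the disklist first; it keeps the previous version as a .bak file next to it.

```shell
# Hypothetical helper: drop every line mentioning /dev/mmcblk0 from the
# disklist file given as $1, keeping the old version as "$1.bak".
remove_raw_disk_entry() {
    disklist="$1"
    # Keep an untouched copy so the change can be undone.
    cp -p "$disklist" "$disklist.bak" || return 1
    # Write back everything except the raw SD card entry.
    sed '/\/dev\/mmcblk0/d' "$disklist.bak" > "$disklist"
}
```

A cautious way to use it is to copy /etc/amanda/openhab-dir/disklist to a scratch location, run the helper on the copy, and compare both files with diff before touching the real one.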

To answer your question, Amanda is started from /etc/cron.d/amanda at 01:00 by default, but you can remove or change that entry, and you can start it manually by running 'amdump '.
Note you need to be the ‘backup’ user, so actually it’s sudo su - backup -c "amdump openhab-dir".
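To check at what time the job is actually scheduled on a given machine, the cron file can be summarised with a small helper. This is only a sketch: the function name list_cron_jobs is made up for this illustration, and it assumes the usual /etc/cron.d layout (minute, hour, day of month, month, day of week, user, command).

```shell
# Hypothetical helper: print minute, hour, user and command of every
# job line in a cron.d-style file ($1), skipping comments, blank lines
# and variable assignments such as MAILTO=...
list_cron_jobs() {
    awk '!/^[ \t]*(#|$)/ && $1 ~ /^[0-9*]/ {
        printf "minute=%s hour=%s user=%s command=", $1, $2, $6
        # Everything from field 7 onwards is the command itself.
        for (i = 7; i <= NF; i++)
            printf "%s%s", $i, (i < NF ? " " : "\n")
    }' "$1"
}
```

Calling list_cron_jobs /etc/cron.d/amanda then shows one line per scheduled job, which makes it easy to compare the backup start time with the time the system stops responding.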

PS: you could possibly enable the currently commented-out holding disk in /etc/amanda/openhab-dir/amanda.conf (make sure you insert a NAS-mounted directory name in the ‘directory’ line); that would split the big dump into multiple smaller segments (as large as the holding disk size definition).

Thank you for that information. I am using a 32GB SD because it is cheap :slight_smile:

Because I am not that good at debugging on Linux, I am happy to keep my system stable. So for now I will back up the files using the official recommendation:

sudo systemctl stop openhab2.service
BACKUPDIR="/mnt/JB-Data/JBHome/openhab2-backup/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUPDIR"
cp -arv /etc/openhab2 "$BACKUPDIR/conf"
cp -arv /var/lib/openhab2 "$BACKUPDIR/userdata"
rm -rf "$BACKUPDIR/userdata/cache"
rm -rf "$BACKUPDIR/userdata/tmp"
sudo systemctl start openhab2.service
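The copy steps above could also be wrapped in a reusable function, sketched below. The name backup_openhab_dirs and its three parameters are made up for this illustration; stopping and starting openhab2.service is deliberately left outside the function.

```shell
# Hypothetical helper: copy a conf directory ($1) and a userdata
# directory ($2) into a new timestamped folder below $3, leaving out
# the cache and tmp folders. Prints the path of the new backup folder.
backup_openhab_dirs() {
    conf_dir="$1"; userdata_dir="$2"; dest_root="$3"
    backup_dir="$dest_root/$(date +%Y%m%d_%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    cp -a "$conf_dir" "$backup_dir/conf" || return 1
    cp -a "$userdata_dir" "$backup_dir/userdata" || return 1
    # cache and tmp are removed from the copy, as in the commands above.
    rm -rf "$backup_dir/userdata/cache" "$backup_dir/userdata/tmp"
    printf '%s\n' "$backup_dir"
}
```

With the paths from this thread, the call between the stop and start commands would be backup_openhab_dirs /etc/openhab2 /var/lib/openhab2 /mnt/JB-Data/JBHome/openhab2-backup.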

But anyway, here is some more information to help you debug:
My infrastructure:

  • TP-Link TL-SG1024DE
  • FritzBox 6490 Cable
  • Gigabit network (1200 MHz)
  • Synology DS216j
  • //xxxx/xxxx /mnt/JB-Data/JBHome cifs username=xxxxx,password=yyyyy,file_mode=0777,dir_mode=0777 0 0
  • During the backup time (1am) I have no known traffic or disc I/O

From the configuration with openhabian-config --> Backup/Restore I remember some settings:

  • 40GB
  • 3 virtual folders or something like that
  • backup to nas

Sorry that I am too cautious to install Amanda again. I have little Linux experience and have to look up every error, which costs a lot of time.

Which leads me to my next question:

Does it make sense to remove the remaining packages (see “The following packages will be REMOVED”)?

FWIW, I’ve disabled SD backups per default now in openHABian. You can still enable it with an entry in the disklist file.