Corrupt file systems every 2-3 months?

Hello.
I use OH 1.7 / 1.8 with wheezy on an RPi 2. I have tried this setup on different RPis with different SD cards, always with the same result:
My problem is that every 2-3 months the system stops working. It’s not accessible via Samba, SSH, or the console. A restart reports only a file-system error. The SD card is also not accessible on my Mac.

Does anybody have an image of a stable system? (I have a backup of the config files, but not of the persistence DB :frowning:)

I’m also open to any suggestions.

A 2-3 month lifetime is very short, but it happens …
See this thread for a couple of solutions to move the root file system from the SD card to an external device:

I’m going strong into my 15th month using the RPi 2 with the latest version of Raspbian. It’s rarely ever been turned off, and when it has, I’ve shut it down via SSH first.

What persistence service are you using, and how many items are you persisting? Currently I’m using MySQL and only persisting 15 items. I’ve also never connected this system to my.openhab, since I’m able to generate my own TLS certificates and connect via HTTPS. I’m not sure if that changes anything, but I know many people do use this service.
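For reference, a rough sketch of what such a restricted persistence configuration could look like in OH1 (file name and item names below are made up; adjust strategies and items to your own setup):

    // persistence/mysql.persist (hypothetical example)
    Strategies {
        everyMinute : "0 * * * * ?"
        default = everyChange
    }
    Items {
        // persist only a handful of items instead of everything
        Temperature_Living, Temperature_Bedroom : strategy = everyChange, restoreOnStartup
    }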

If others can state how long their RPi systems have lasted, that’d be useful. If it’s a very common occurrence, I might consider moving root to an external device as well.

18 months, one corrupted SD card after 6 months, that’s it (not too bad, I think).
But I will move the root system to an external device soon, just to make sure …

I use rrd4j (moved from MySQL) with about 20 datapoints from one binding (Modbus).
I think I will try the solution with a Synology NFS share. I hope this will help.

BTW: I have been using an RPi B as a “phone blocker” for about two years without any problem.

You have to mount /var/log, openhab/logs and openhab/etc (or the whole openhab directory) as tmpfs folders.
This will significantly increase your SD card’s lifetime.
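For example, two sketched /etc/fstab entries (sizes and paths are assumptions; adjust them to wherever your openHAB installation actually writes):

    # keep write-heavy folders in RAM instead of on the SD card
    tmpfs /var/log          tmpfs defaults,noatime,size=50m 0 0
    tmpfs /opt/openhab/logs tmpfs defaults,noatime,size=50m 0 0

Note the caveat a few posts further down: on some systems the fstab approach fails because the mount point isn’t available early enough, in which case a startup script works better.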

I mounted /var/log to tmpfs a long time ago but had problems restarting the Raspberry, so I changed that to /var/log/openhab and that works fine. But don’t forget to save the log files before restarting, or they are gone :slight_smile:

Also, mounting the /tmp folder to tmpfs works great. In /etc/fstab:

    tmpfs /tmp tmpfs defaults,noatime,nosuid,size=100m 0 0

I have no problems with /var/log on any of my three Raspberry Pi 2s.
But I mount them via a script which runs on startup.

I also forgot to mention /tmp. Thanks for the hint! :slight_smile:

I did it with /etc/fstab and the system complained during restart because the folder was not available (tmpfs creates the folder too late in the boot sequence).
So it looks like doing it with a script is the better way …
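Such a script could be a minimal sketch like this (e.g. called from /etc/rc.local before openHAB starts; the path and size are assumptions):

    #!/bin/sh
    # create the mount point first, then mount a tmpfs over it
    mkdir -p /var/log/openhab
    mount -t tmpfs -o defaults,noatime,size=50m tmpfs /var/log/openhab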

If you run a Google search on “Raspberry Pi SD corruption” you will find tons of postings and articles on this problem. It is a well-known limitation of SD cards. Supposedly certain brands experience this wear-out much more frequently than others (apparently the major and more expensive brands work better/last longer).

Personally, while not running OH on Pis, I’m running other stuff on them, including an IP cam. I’ve never experienced an SD card corruption, though I have Samsung SD cards in them, so maybe that is one of the better brands.

There is a special Raspberry Pi distro called Nard (the Nard SDK) that basically runs everything in memory and only writes OS changes made in memory to the SD card. It claims to even allow one to hot-swap the SD card, and it will dump its current OS config to the new card automatically. Pretty cool stuff, actually.

I’m currently in the process of configuring my logging to go to a central rsyslog server so I can convert my Pis to run in read-only mode and use either a RAM disk or an NFS disk for data storage. I’m less concerned about the SD corruption issue than I am about security (I can trust a read-only platform more than one merely monitored by Tripwire). I can easily use Ansible to switch it to rw mode, install updates, then switch it back automatically. It’s slow going, though.
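For anyone wanting to do the same: forwarding everything to a central rsyslog server only takes one extra config line on each Pi (a sketch; the hostname is a placeholder, and ‘@@’ instead of ‘@’ would forward via TCP instead of UDP), followed by a restart of the rsyslog service:

    # /etc/rsyslog.d/90-forward.conf (hostname is hypothetical)
    *.*  @logserver.example.com:514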

Andreas,
which brand / type of uSD cards did you use, and which filesystem(s)?
Brands such as SanDisk, with free space (the more, the better), will normally (at least on later silicon) distribute the wear over the free area, whereas cheap uSD cards will not. The difference is substantial.
But the filesystem and mount options contribute to wear as well.

I mostly used SanDisk, but also tried Samsung. Always the same. I also have other RPis in use (Homematic (Raspimatic), CCU.IO, Telblocker). All of them have been running for about six months, some for more than two years. Only the RPi with OH has had this problem so far (that’s why I thought it was an OH problem). I have also exchanged the RPi. Same result.

But no matter. I will try the Synology as an NFS server.

Fair enough! It is understandable with SD cards, but I’ve never had the problem before, so I didn’t think it was a common occurrence. I might invest some time in an external boot in the next few weeks.

I assume you are not using virtual memory? Mounting an NFS share certainly should fix the issue, at the cost of a more complicated setup.
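For the NFS route, a sketched /etc/fstab entry (host, export path and mount point are placeholders; ‘nofail’ keeps the Pi booting even if the NAS is down):

    # hypothetical NAS export mounted over the openHAB folder
    nas:/volume1/openhab  /opt/openhab  nfs  defaults,noatime,nofail  0 0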

I stopped using a Raspberry Pi 2 because I was running short of memory. openHAB, being Java based, put rather a strain on it.

I switched to a PC Engines APU 2 with a 16 GB SSD, running under Ubuntu 14.04. The first SSD failed, and, after the second one had been running for a while, I realised there were periodic disk problems which caused the system to freeze.

So, periodically, I automatically restart OH in the middle of the night. The disk problems are a thing of the past. I recently had to switch off the system for some major rewiring: it had been running perfectly for 358 days according to uptime.

Steve

Hi Steve. Great idea! I have been using the same board with IPFire for almost three years. I think I will try it too :slight_smile:

I had what I thought was corruption, but it was actually my cards getting filled with logs. I thought I’d just throw that out in case it might help.

See the discussion about corrupted file systems on the RasPi 1-3 at

Even when you

  • shut down before pulling the power plug
  • use a good PSU / power plug (min. 2A)
  • use SD cards of known brands

in my experience there is no guarantee the file system doesn’t get corrupted.
With some SD cards (which worked well in other systems), running

    apt-get update
    apt-get upgrade

after a fresh install triggered enough write activity on my Raspi 2 that the file system got corrupted and the system didn’t come up after the reboot.

Something in the hardware and/or firmware of the Raspberries keeps eating SD cards; whether it’s voltage spikes while shutting down or timing problems while writing data to the card, I don’t know.

So my solution was to shift the system onto a USB stick and boot from a boot partition on an SD card mounted read-only (of which I have a backup).
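The idea in a sketch, in case it helps others (device names are examples and will differ per setup): the SD card’s boot partition points the kernel at the USB stick as root, and the boot partition itself is mounted read-only:

    # /boot/cmdline.txt on the SD card: root moved to the USB stick (example line)
    dwc_otg.lpm_enable=0 console=tty1 root=/dev/sda2 rootfstype=ext4 rootwait

    # /etc/fstab on the USB root: mount the SD boot partition read-only
    /dev/mmcblk0p1  /boot  vfat  ro,defaults  0 2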

BTW: OH on my Synology NAS proved to be too inflexible, and after each DSM (OS) update the risk was high that OH would not run anymore.

As already written in this thread, the log files and rrd4j will corrupt every SD card within a couple of months.
Frequent writes are what kills the card, so this has to be prevented.
This can be done via a read-only partition or by mounting the folders to tmpfs (a RAM disk).
To prevent data loss on reboot, the data can be persisted on shutdown or cyclically, once a day.
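The cyclic variant can be a simple cron job that syncs the RAM-disk folders back to permanent storage once a day (a sketch; all paths are assumptions):

    # /etc/cron.d/persist-ramdisk: copy tmpfs contents to disk at 03:00
    0 3 * * * root rsync -a --delete /var/log/openhab/ /home/pi/log-backup/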

Most statements on SD card wearout and filesystem corruption that you can read on this forum and the internet are missing the important points, so I’ll try to demystify and correct misconceptions with this post. It is constantly being updated and is meant to serve as a user reference. You probably got directed here by me or some other forum responder in response to a question or post of yours.
If you find any of the information contained herein to be incorrect, please let me know.
Note I’m assuming you’re using a Raspberry Pi booting off the internal SD reader.
The information may or may not apply to other computers or modified RPi setups.

Power

File storage corruption can happen when your server loses power while writing to disk, SD cards in particular, because every flash controller provides some caching memory; not every finished write command really means that the data was successfully written to the medium. Note this isn’t about file-system-level handling.
Power losses happen a lot in home automation setups, particularly if you’re in a build phase and you or others are working on the electrical system of the house.
Fortunately there’s a simple solution: get a UPS. There are RPi add-on HATs such as the one from Waveshare, but you should also consider getting an external unit. Most of those add surge protection, too, and allow you to run the OH server, Internet router and other critical systems on battery for at least a couple of minutes.
If you’re using a RPi, you might be tempted to use a simple powerbank as a UPS, but make sure to get one that allows for charging and powering at the same time; most do not provide this capability, and none of the vendors tells you upfront.
Any power supply (uninterruptible or not) must be able to provide the full amount of power the regular power adapter for your RPi provides. The Raspberry Pi Foundation recommends 2.5A for a RPi3 with power-hungry USB peripherals, and for the RPi4 they even had to move to USB-C and supplies providing 3A. Common supplies are 1A or 2.1A at most.
It’s true that you usually get away with 1A, but you must not forget to factor in all your RPi HATs (Hardware Attached on Top) and USB-attached devices, and remember that you need to size your system for peak power consumption, such as at boot or backup time, and not for the average value.
Note that if underpowered, a RPi3 or older will power down components, the USB chipset being among the first. On RPi 1-3, Ethernet is connected via USB, so a first symptom of this usually is network problems.
You’ll usually also get to see ‘under-voltage’ messages in syslog, as well as the lightning symbol on the screen if you connect HDMI. The red power LED on newer RPis (3, 4) will also flicker; OFF means the input current is insufficient.
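You can also query the firmware’s throttling flags directly on Raspbian; the bit meanings below follow the official vcgencmd documentation:

    vcgencmd get_throttled
    # throttled=0x0     -> no problem detected
    # bit 0  (0x1)      -> under-voltage right now
    # bit 16 (0x10000)  -> under-voltage has occurred since boot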

SD and other media

Second, with maybe one exception (see #1 below), there’s no way to increase the reliability of an SD card. They suffer from wearout leading to corruption, and you can do little about it. Reality is even worse because this is not an SD thing but a memory chip thing: the very same technology is used in USB sticks, eMMC cards and even SSDs, so the following applies even if you use one of those.
There are variances w.r.t. error-free runtime (some cards, models or brands are better than others), but all but the very cheapest SD cards do wear levelling to some extent already. Read on if you’re interested in details.
Unfortunately, that isn’t enough. All of this is ignoring the fact that, once set up, openHAB keeps writing to the SD again and again in rapid succession (logs, data recording and paging); wear levelling may simply not be enough in this case.
For what it’s worth, and unlike many believe, SD size is not a good indicator for buffer size: twice the size does not imply all the extra capacity is available as extra buffer, and it’s only doubling the buffer at best. And even cards with a large buffer fail at some point in time.
You should get a card that the vendor has tagged as suited for video recording. These have larger buffers, often also use more resilient electronics (storing fewer bits per cell), and can thus stand more writes. The major vendors (SanDisk, Samsung, Kingston) sell these with an ‘Endurance’ name extension.
Don’t confuse these with cards tagged as ‘Industrial’. While those may have a larger buffer, too, they’re not guaranteed to. ‘Industrial’ merely refers to the use environment, i.e. temperature range, mechanical stability etc., and not to buffers and wearout.
Also, don’t think eMMC gives you any advantage in terms of reliability; it’s rather the opposite.
eMMC is essentially the same flash chips as are used in SDs, but soldered to the controller.
So it’s usually a daughterboard rather than a card, hence it’s much more cumbersome and expensive to exchange (as it is when your Tesla is hit by this very same wearout problem …).
It’s sometimes faster, but then again, disk speed is not important to openHAB; it’s all about reliability. However, since we’re touching on this: get an “A1” or even “A2” rated SD card. It’s not more reliable, but it is faster than traditional ones rated UHS-1 or less under the random-access conditions we have in openHAB(ian).
Be aware that SSDs also use the same flash memory. But SSDs have a DRAM cache in addition, which effectively results in relatively few writes to flash memory, which is why flash wearout affects them far less than it affects SD and eMMC.
SSDs have drawbacks, though, including some that pose yet more threats to overall system availability, such as power requirements and boot issues, in addition to cost and packaging and the fact that an SSD just isn’t there in a RPi by default. All of these are reasons why I will not recommend using SSDs over SDs.
Your mileage may vary as to whether you feel this is of relevance to your own situation or not.


Advertising break
If you shop on Amazon, use smile.amazon.de and select the openHAB Foundation to donate to. Thanks!


Either way, selecting a ‘better’ card or a ‘proper’ medium is no solution to the corruption problem.
You need to take a complementary measure (#2 below).

There are two really useful things you can do to fight corruption:

  1. reduce write operations (to SD or flash memory in general)
  • Ideally, put persistence, logs and swap into RAM and sync them to a permanent medium.
    You can use any permanent medium (USB stick, SSD or an NFS mount on a NAS) to put these on.
    Losing RAM (on reboot) or the medium with these files is not critical: openHAB usually keeps working, and you can restore them from backup.
    Corruption of the system, and of data you need to keep, on the other hand is critical.

  • In a nutshell: use ZRAM.
    That’s a RAM disk with compression for swap and the most active directories.
    See this thread.
    I recommend keeping the existing swap as a fallback solution. Note that the ZRAM swap is created with a higher priority, so it’s used first.

  • Adding an option like commit=60 to /etc/fstab will result in files being written to the medium only every 60 seconds, greatly reducing the number of writes (see the sketch after this list). Note it doesn’t apply to swap or NFS.
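As referenced in the last bullet, a sketch of such an fstab entry (the UUID is a placeholder for your root partition; ext4 flushes every 5 seconds by default):

    # /etc/fstab: flush the journal every 60s instead of every 5s
    UUID=xxxx-xxxx  /  ext4  defaults,noatime,commit=60  0 1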

Backup

Reducing writes and moving write-intensive files off the boot medium is a small one-time effort and will greatly reduce the risk of a crash caused by SD card corruption, but it won’t fully mitigate it. The takeaway point is that offloading logging and persistence all by itself is not sufficient. So either way, you also need to

  2. make daily backups
    This will not increase runtime, but it will mitigate the impact of an SD (or USB stick, or USB-attached SSD, or other disk) crash or an accidental admin failure.
    openHABian now comes with Amanda, a professional backup system.

You might be unaware that openHABian is not just a RPi disk image; it is a set of scripts that can be installed on top of any Debian-like UNIX as well. Once you have installed these, you don’t have to migrate to an openHABian-based setup: you can choose to install only some of the optional components, such as Amanda.

Use the new auto-backup feature in openHABian to clone your SD card right at installation time or via a menu option. In case of a crash, you just need to exchange cards and you are good to go.

openHABian also provides you with a menu option to set up SD disk mirroring or to run a one-time copy.
