Sd-card corruption?

(E. Gerland) #1

Today my system stopped working and I wonder why.
It’s a raspberry pi 2 with sd-card (openhabian 1.3 with openhab 2.1 stable)

It ran stable for about a year. Recently it lost the network connection but a reboot solved the issue.

Today it stopped completely. It revived after unplugging and replugging the network cable but not for long.
Even a restart and copying the last image to the card did not help.
I was able to login right after startup, but the network collapsed after a while.

So, if a Card is corrupt, is the hardware itself crap?
My assumption was that the data is corrupt!?

(Markus Storm) #2

What do you mean by “copying the last image to the card” ? The last WHAT image, openHABian ?
Or a backup image you’ve taken at a time when possibly the original data already WAS corrupted ? You won’t repair bad data by copying it.
Use the current card in your spare Pi (you have one, don’t you ?).
If you don’t, get a fresh card, install openHABian from scratch and try it with that one.
If that doesn’t solve it I believe it’s the Pi HW.

(Rich Koshak) #3

When an SD card goes bad, it means that write operations stop working. Not all write operations fail, and lots of weird behavior can result. For example, logrotate stops working and your files grow and grow until you fill your card. You delete a file and on a reboot the file comes back. You make a change to a file and save it, but the file doesn’t change.

Now lots of services depend upon writing to files to work, so these write failures can cause them to stop working. I don’t know if networking is one of them.

So I would test with a fresh openHABian on a new SD card and see if your network remains flaky. If not, it is most likely the SD card.

[SOLVED] Question integrity of SD card to save data
(Skinah) #4

SD cards wear out, and better, larger cards have what’s called “wear levelling” in them.

  1. Always use quality cards that are not FAKE, from a reliable store. Samsung EVO Plus 32GB is what I use, as they are great at random write speeds compared to most other cards.
  2. NEVER overclock the SD card or the Pi.
  3. Use a UPS so the Pi does not lose power in the middle of writing a file.
  4. Don’t yank the power out; shut down gracefully unless you have the OS set up as read-only.
  5. Stop as much write activity to the SD card as you can if you’re not setting up as read-only.

One way is to use the noatime mount option in /etc/fstab; plenty of guides on the Raspbian forum cover this as well as other ideas.
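A minimal sketch for checking whether noatime is already in effect on the root filesystem, assuming findmnt (part of util-linux) is installed. The helper name has_noatime is made up for illustration.

```shell
#!/bin/sh
# has_noatime is a hypothetical helper: it succeeds if the comma-separated
# mount option string passed as $1 contains "noatime".
has_noatime() {
    case ",$1," in
        *,noatime,*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Mount options currently in effect for the root filesystem
# (left empty if findmnt is missing or fails).
opts=$(findmnt -no OPTIONS / 2>/dev/null || true)

if has_noatime "$opts"; then
    echo "noatime is active on /"
else
    echo "noatime is NOT active on / (options: $opts)"
fi
```

If it is not active, adding noatime to the options column of the root entry in /etc/fstab (for example defaults,noatime) and rebooting should turn it on.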

(E. Gerland) #5

Thanks for your help.
Sorry, by copying I mean the last backup image (openhab running fine) I made 2 weeks ago.

Good point, running the card in another Pi. I will try it.

Tonight I tried to run the Raspi with OH shut down. The network was down anyway.
Unplugging / plugging the cable helped again.
Does this sound familiar?

Any recommendations regarding SD cards? I used pretty expensive Samsung cards (PRO Plus Micro SDHC 32GB, Class 10, U3), but this did not help (IF the card is the root cause).

I guess you are right to start from scratch. The sad thing is, I started several weeks ago and it takes a while until I am “back on track” (owntracks with TLS and some other stuff).
I thought that maintenance efforts would disappear; that’s why I am also looking for another HW platform.

(E. Gerland) #6

Hi Matthew,

thanks for your response.
I actually used exactly this Samsung card and did not overclock at all.

I recently switched to a USB stick for the root filesystem, but only after I had seen the network dropping about one week ago. So that was probably too late already.

I guess read only would be possible when using a USB stick, so I will consider this after I have it up and running again.

(Skinah) #7

USB sticks are still flash and hence will:

A. Corrupt if you remove power while writing.
B. Wear out. Writes wear out flash faster than reads do, which is why I use techniques to reduce writes to the flash. noatime is one method and RAM drives are another.
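To get a feel for how much is actually being written, the kernel keeps per-device counters under /sys/block. A rough sketch, assuming the SD card shows up as mmcblk0 (usual on a Pi, but check with lsblk); written_mib is a made-up helper name:

```shell
#!/bin/sh
# Device name is an assumption: mmcblk0 is usually the SD card on a Pi.
dev=${1:-mmcblk0}
stat_file="/sys/block/$dev/stat"

# written_mib is a hypothetical helper: it takes the contents of a block
# device stat line and prints the MiB written. Field 7 is "sectors written",
# which the kernel always counts in 512-byte units.
written_mib() {
    # Split the stat line into positional parameters on whitespace.
    set -- $1
    echo $(( $7 * 512 / 1024 / 1024 ))
}

if [ -r "$stat_file" ]; then
    echo "$dev: $(written_mib "$(cat "$stat_file")") MiB written since boot"
else
    echo "no stats for $dev (nothing readable at $stat_file)"
fi
```

Running it before and after a change such as moving the logs gives a simple before/after comparison.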

Some people will use USB sticks to hold the log files so they wear out instead of the sd card. You can also use a NAS.

Read only is not really an option for me, as you need to load security patches and other changes, plus you need somewhere to store persistence and log files. So for my setup I reduce writes to the flash and take the approach that I have a spare Pi 2 and SD card sitting on the shelf, tested, that can be swapped in at a moment’s notice. Pis and SD cards are cheap, so having a full backup of hardware and software is easy. A UPS stops power outages or surges from becoming an issue.

(Markus Storm) #8

As already referenced in the other thread:

(Seaside) #9

If you want a 24/7 environment without worries you have two options as I see it.

  1. A server/nas to hold logs/db etc, I do this successfully over nfs.
  2. An external usb harddrive preferably an ssd.

Using a large SD card and minimizing writes will work most of the time, but IMO it’s not guaranteed to work; eventually, with time, the card will become corrupted.


(E. Gerland) #10

I have a NAS but it’s usually running during the day only, so a usb stick might be the better choice.

I am confused about SSD though.
Some people say that USB sticks are not better than sd-cards and some even say, that this is the same for ssd.
So does it help to use a ssd instead of a USB stick of the same size or is the reliability on the same level?

(Seaside) #11

With an SSD, writes will not be an issue. It does not even compare with an SD card or USB stick.

In the example 900 Terabytes were written before failure.

(Rich Koshak) #12

SSDs use a slightly different technology from flash and therefore, based on testing, are reported to last as long if not longer than HDDs.

(Markus Storm) #13

Huh? A NAS to NOT run 24/7 ?
Keep it turned on, and move logs and persistence db over there.
=> way less write activity on your SD, way minimized risk it’ll hit ground again.

That’s how I run my system, too.

(Rich Koshak) #14

I agree with Markus. That is what a NAS is for. You already have in place everything you need to add reliability to your system. And the extra cost of running your NAS 24/7 is insignificant compared to the cost of a new NUC.

(E. Gerland) #15

Yeah, I usually don’t need my NAS at night - maybe this might change soon :wink:

(E. Gerland) #16

Alright, alright, you two made a point. (@rlkoshak, @mstormi) :wink:
I will check the option with the NAS

(E. Gerland) #17

Thanks @Seaside and @rlkoshak

(Daniel Malmgren) #18

I’m currently running OH on a CuBox (Solidrun) with an SD card, but I’m planning to buy an Odroid instead and keep OH on an eMMC card. I understand they’re much less prone to corruption. And no need for a NAS with a noisy old magnetic disk :slight_smile:

(Markus Storm) #19

An Odroid C2 is a fine box (got one, too), but don’t be mistaken. eMMC is not better than SD or USB storage, the problem essentially is that those memory chips used everywhere just wear out after a number of writes. The interface does not make a difference w.r.t. reliability.

(Skinah) #20

QUOTE: I have a NAS but it’s usually running during the day only, so a usb stick might be the better choice.
I am confused about SSD though.
Some people say that USB sticks are not better than sd-cards and some even say, that this is the same for ssd.
So does it help to use a ssd instead of a USB stick of the same size or is the reliability on the same level?

NCO: There is USB, SD and SSD. They all run flash of different qualities, BUT an SSD will do far better active wear levelling AND most likely has power supplies that keep the power running long enough to dump the caches back onto the flash before the power fails, or it may use the wear levelling to write the changes to a new area before erasing the old area. Better ECC and CRC error checks and things like that make SSDs far better than platter drives on everything except price. USB sticks and SD cards do NOT have those very important things, so they are essentially the SAME as each other, just with different plugs, but both are vastly different from an SSD. Sure, you can get better flash controllers and industrial-grade flash, but they still don’t have the space for what is needed to make a rock-solid SSD, and this is reflected in the price. Having said this, SD cards can work very well if they are understood, and they are very cheap.

Here is why flash causes issues: it has to prepare a whole BLOCK of flash before it can write to even a small area. So if 1 file needs updating, the BLOCK of flash has to be read into RAM, and this block may contain pieces of 5 other files that do not need to be changed! The BLOCK of flash is then prepared, which wipes it clean, and the contents of RAM are written back onto the newly prepared flash. If the power fails at the wrong time, before the RAM can be written back to flash, you lose parts of 5 files that were never going to change on the drive! Since it was only pieces of files, the result is a file that exists but is corrupt.
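The sequence described above can be played through with a toy model. This is only an illustration of the read, erase, write-back idea, not how any real flash controller is implemented; update_block and the file names are made up.

```shell
#!/bin/sh
# Toy model of one flash erase block holding pieces of three files.
block="fileA fileB fileC"

# update_block BLOCK OLD_PIECE NEW_PIECE POWER_OK
# Mimics read-modify-write: copy the block to RAM, erase it, write it back.
update_block() {
    ram=$(echo "$1" | sed "s/$2/$3/")   # 1. read block into RAM and modify it
    flash=""                            # 2. erase the whole block
    if [ "$4" = "yes" ]; then
        flash=$ram                      # 3. write the RAM copy back
    fi                                  #    (never happens if power failed)
    echo "$flash"
}

echo "normal update : '$(update_block "$block" fileB fileB2 yes)'"
echo "power failure : '$(update_block "$block" fileB fileB2 no)'"
```

In the power-failure case the block comes back empty, so the untouched pieces fileA and fileC are gone along with the one being updated.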

This is the reason for a number of points:

  1. Putting log files onto a USB drive or RAM drive means that if the power fails during a log write, it cannot affect other files, as the USB drive only contains log files. The same idea applies to directing log files to a NAS: if you remove the need to write to the flash, you don’t have the flash changing blocks that contain other file pieces as well. I opt to use RAM disks, as I don’t care about the logs and it is easy to redirect them if needed. USB sticks are cheap, so some people prefer to use them.

  2. By default, when Linux READS a file, it will then WRITE the time that file was last accessed, so every read of a file results in a write. Imagine how many opportunities for corruption that creates, as every block of flash may contain parts of other files as well. Years ago when I started playing with Linux I was stunned at how quickly Linux would corrupt flash when Windows never/rarely did. Many people use the “noatime” switch, and most of the time it causes no issues with how a Linux machine runs. From doing some quick checks, the openHABian image for the Pi by default appears not to update the atime, so the noatime switch appears to do nothing, as it must be achieved another way. Edit: the latest builds of Raspbian now use noatime by default when not specified, which is great news; if you roll your own you still need to check this.

  3. The Pi has a stupid tiny USB port for power, and when plugging in it may cut the power in and out. Many USB cables and power supplies can’t supply enough current to properly power the Pi. I have been using OSMC for years on a Pi 2 and they have a great feature that draws a rainbow icon on the screen to show when the power is not enough. Does openHABian do this in any way, maybe to a log file? Raspbian, last time I used it, also used a rainbow icon, so if you want to check your power supply and USB cable it is quick to load an SD card with Raspbian and have a play.
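On checking for low power without a screen attached: the Raspberry Pi firmware tools include vcgencmd, whose get_throttled subcommand reports under-voltage flags as a hex value. A minimal sketch, assuming vcgencmd is installed and the firmware is recent enough to support get_throttled; decode_throttled is a made-up helper name.

```shell
#!/bin/sh
# decode_throttled is a hypothetical helper: it takes the hex value from
# "vcgencmd get_throttled" (e.g. 0x50005) and reports the under-voltage bits.
# Bit 0  (0x1)     = under-voltage detected right now
# Bit 16 (0x10000) = under-voltage has occurred since boot
decode_throttled() {
    val=$(( $1 ))
    if [ $(( val & 0x1 )) -ne 0 ]; then
        echo "under-voltage right now"
    fi
    if [ $(( val & 0x10000 )) -ne 0 ]; then
        echo "under-voltage has occurred since boot"
    fi
    if [ $(( val & 0x10001 )) -eq 0 ]; then
        echo "no under-voltage reported"
    fi
}

if command -v vcgencmd >/dev/null 2>&1; then
    # Output looks like: throttled=0x50005
    raw=$(vcgencmd get_throttled)
    decode_throttled "${raw#throttled=}"
else
    echo "vcgencmd not found (is this a Raspberry Pi with its firmware tools?)"
fi
```

Run from cron and redirected to a file, this would give a crude power log.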

This is why I love the Pi range so much. Cheap SD cards you can pull out, back up, and use to change the entire system in seconds to another OS and purpose. If you buy a NUC and you get a hardware failure, you either have downtime for days whilst you source another identical NUC that costs heaps, or you need to have a spare NUC sitting in your house, costing you way more upfront. A spare Raspberry Pi is cheap, and you can even use your spare Pis for Kodi or other uses around the house; all it takes is an SD card swap and you have replaced a failed device in seconds. Why change to an SSD, when you then have to pay more for a spare SSD in case it fails? What happens when your NAS is not reachable because of a network issue? Be careful not to make things overly complex, as you could make doing backups far harder and swap one cause of issues for another that will occur way more often.

To check which files get written, use these commands:

sudo apt-get update
sudo apt-get install inotify-tools
# Raise the watch limit a hundredfold (same as adding two zeros to the number)
echo $(( $(cat /proc/sys/fs/inotify/max_user_watches) * 100 )) | sudo tee /proc/sys/fs/inotify/max_user_watches
# Watch the whole filesystem, skipping the virtual /proc, /sys and /dev trees
inotifywait -mr --exclude '^/(proc|sys|dev)(/|$)' -e modify,attrib,close_write,move,create,delete /

You need to increase the number in the file /proc/sys/fs/inotify/max_user_watches if you want to do a global watch; otherwise you can watch a specified folder instead of /. The new limit only lasts until the next reboot. Press Ctrl+C to stop watching with the last command.