I've been using an SSD on my RPi4 to host a Docker-based deployment.
Since yesterday I keep seeing weird errors that remind me of the days of SD card issues.
Can it be? It has been running for a year now.
Can I check the SSD for errors that would cause this?
And most importantly, can the errors below be the result of a bad config? I do update regularly.
EDIT: I've just tried to check with smartctl:
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
However, now I get the issue again: all commands are "not found", just like in the SD card wear days.
pi@raspberrypi:~/pod $ smartctl -l selftest /dev/sda
-bash: smartctl: command not found
pi@raspberrypi:~/pod $ ls
-bash: ls: command not found
pi@raspberrypi:~/pod $ whoami
-bash: whoami: command not found
Do you also do regular backups?
Did you suffer a power outage while doing an update/upgrade at the OS level?
I do not think your SSD is suffering from the same effects as SD cards. A German IT magazine tested this a while ago, and they had to stress the SSDs far beyond the guaranteed limits before the SSDs showed issues or errors. That said, the test did not cover the most current technology, which brought higher capacities but also lower endurance per memory cell.
Is the SSD running the OS in your configuration? If so, I suggest connecting the SSD to a different computer with a similar OS and doing a file system check there. Run the SMART check there as well.
If this proves the SSD is healthy, you might need to set up the SSD again from backups.
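For anyone following along, the checks suggested above could look roughly like this on the second machine. This is only a sketch under assumptions: /dev/sdX and /dev/sdX2 are placeholders (find the real names with lsblk), check_ssd and smart_verdict are helper names made up for illustration, the partition is assumed to be unmounted, and the health line being matched is what smartctl -H prints for ATA/SATA drives. Some USB adapters may need smartctl's -d sat option before SMART commands get through.

```shell
#!/bin/sh
# Sketch only: device names below are placeholders, not real paths.

# Read `smartctl -H` output on stdin and reduce it to a one-word verdict.
smart_verdict() {
    if grep -q 'test result: PASSED'; then
        echo healthy
    else
        echo suspect
    fi
}

# Run the SMART health check, show the self-test log, then force a
# file system check. The partition must NOT be mounted for fsck.
check_ssd() {
    dev=$1
    part=$2
    sudo smartctl -H "$dev" | smart_verdict
    sudo smartctl -l selftest "$dev"
    sudo fsck -f "$part"
}

# Usage (hypothetical device names):
#   check_ssd /dev/sdX /dev/sdX2
```

A "suspect" verdict only means the PASSED line was not seen, which can also happen when the adapter blocks SMART, so it is worth reading the full smartctl output before drawing conclusions.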
There is more than just wear that can cause strange errors on any flash-based storage. The biggest and probably most common cause is losing power in the middle of a write. Flash works by writing a whole sector at a time, so when a file changes, the sector that changed has to be copied to a new place. The problem is that there are likely parts of more than one file in that sector, so if power is lost in the middle of a write, all of the file parts in that sector can be lost.
And due to wear leveling, it doesn't matter how old the files are. They all move around during writes.
I doubt that the SSD itself is wearing out, though it could be. But it does seem that bash or some part of your file system may be corrupted, which could have been caused by a power loss.
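When every command suddenly reports "not found", the shell's builtins still work, because they do not have to be loaded from the disk. Here is a small sketch that relies only on builtins ([, echo, for) to see whether the binaries are still reachable. probe_cmd is a made-up helper name, not a standard tool.

```shell
# Walk $PATH by hand and report where a command lives, using builtins only.
probe_cmd() {
    local name=$1 dir
    local IFS=:                      # split $PATH on colons
    for dir in $PATH; do
        if [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    echo "missing: $name"
    return 1
}

# Usage while the problem is happening:
#   echo "$PATH"      # is PATH itself still sane?
#   probe_cmd ls      # can the binary still be found on disk?
```

If PATH looks normal but probe_cmd ls prints "missing", that would suggest the file system under / has become unreadable or has gone away, rather than a bad shell config.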
Thanks for the comments.
The RPi is sitting behind a UPS, though it is an inline one. And I understand that when there is a power outage, the transition to battery does take a toll on end devices.
Could it be that something happened during that AC-to-battery switch?
We did have a few outages recently, so it does make sense.
How can I resolve this? Format and reinstall, or is there a way to fix it?
Since everything is dockerized, I'm not too concerned if it's just OS corruption.
But wait, if it's OS corruption due to a write failure or something, isn't it strange that it works well again after a reboot?
Yes, if your system is running near the margins for safe operation, a small drop in power supply (even for a few milliseconds while the system switches from external supply to battery) can cause write errors.
I think your best way forward would be to set up the SSD from backups that are known to be "good".
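On a Raspberry Pi, the firmware keeps track of under-voltage events, which is one way to test the power-dip theory. A sketch, assuming vcgencmd is available (it ships with Raspberry Pi OS). decode_throttled is a hypothetical helper, and it decodes only the two under-voltage bits: bit 0 (under-voltage right now) and bit 16 (under-voltage has occurred since boot).

```shell
# Decode the under-voltage bits from the text that
# `vcgencmd get_throttled` prints, e.g. "throttled=0x50005".
decode_throttled() {
    flags=$(( ${1#throttled=} ))     # strip the prefix, parse the hex value
    found=0
    if [ $(( flags & 0x1 )) -ne 0 ]; then
        echo "under-voltage right now"
        found=1
    fi
    if [ $(( flags & 0x10000 )) -ne 0 ]; then
        echo "under-voltage has occurred since boot"
        found=1
    fi
    if [ "$found" -eq 0 ]; then
        echo "no under-voltage recorded"
    fi
}

# Usage on the Pi:
#   decode_throttled "$(vcgencmd get_throttled)"
```

The value resets on reboot, so it is best read while the Pi is still up after an outage.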
That it works again after a reboot is a good clue that it is probably not corruption of the drive. Flash on an SSD that costs $$$ is well built compared to a 12-buck flash drive, and for the extra money spent they build in the ability to handle wear and power issues. "You get what you pay for" is often true.
It could be that the RAM has a damaged area, but that is unlikely, as it would translate to corrupt files over time. I mention it because it is possible, and people always forget about hardware failures and having backup hardware.
Most likely a software bug.
Reload from a known image, and if it stays fixed until you apply updates, you can then say where the fault is.
Worth looking at the system log file as it may hold a clue.
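A rough way to sift the kernel log for storage and power trouble is sketched below. The pattern list is a non-exhaustive guess at messages worth a look, and scan_kernel_log is a made-up name. Reading the previous boot with journalctl only works if the journal is persistent, and dmesg may need sudo on some systems.

```shell
# Filter kernel messages on stdin for lines that often accompany
# disk, USB, or power problems. Prints matching lines unchanged.
scan_kernel_log() {
    grep -Ei 'I/O error|EXT4-fs error|Remounting filesystem read-only|USB disconnect|Under-voltage detected'
}

# Current boot:
#   dmesg | scan_kernel_log
# Previous boot (needs a persistent journal):
#   journalctl -k -b -1 | scan_kernel_log
```

USB disconnect lines around the time the commands went "not found" would point at the adapter, cable, or power rather than at the SSD's flash.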