Help with RPi openHAB full hang (was: Rule for longest ever uptime?)

The problem is that the bigger value gets overwritten by a smaller one; it should have stayed at the high value.
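
What is wanted here is a high-water mark: overwrite the stored record only when the new reading is larger. Below is a minimal Python sketch of that comparison. It assumes uptime is taken from Linux's /proc/uptime, whose first field is seconds since boot. read_uptime_seconds and update_record are hypothetical helper names, not openHAB API.

```python
def read_uptime_seconds(proc_uptime_text):
    """Parse the contents of /proc/uptime; the first field is seconds since boot."""
    return float(proc_uptime_text.split()[0])


def update_record(stored, candidate):
    """Return the new all-time record.

    A missing record (None) is replaced by the candidate. Otherwise the stored
    value is only overwritten when the candidate is strictly larger, so a
    reboot (uptime back near zero) never lowers the record.
    """
    if stored is None or candidate > stored:
        return candidate
    return stored


record = None
# Readings as they might arrive: uptime grows, drops after a reboot, grows again.
for reading in [3600.0, 259200.0, 120.0, 129600.0]:
    record = update_record(record, reading)
print(record)  # 259200.0
```

On the Pi itself, read_uptime_seconds(open("/proc/uptime").read()) gives the current value. In an openHAB rule, the same comparison would sit just before the item holding the record is updated.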

OK, back to my original problem.

I woke up this morning and my setup was frozen again. How can I troubleshoot this problem?

At first I thought it was time-related, as it seems to happen after about 3 days; that’s why I wanted to make the rule above. But this time it happened after just 1.5 days.

The system goes fully unresponsive: no SSH, no HABPanel or Paper UI, no Frontail log, and rules don’t run. Nothing happens; the system just locks up.

When this happens, the RPi is still powered: the network jack LEDs are flashing away, and so is the SD card light.

The only way to get the system running again is a forced power-off (pulling the plug).

@watou, will a watchdog solve this problem? Is this common?

And every time you do that, you can damage your SD card.

You can do a clean power-off with a switch wired to the GPIO pins.
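
Below is a minimal sketch of such a shutdown button in Python. It assumes the RPi.GPIO library, a normally open push button wired between a GPIO pin and ground (BCM 21 is a hypothetical choice), and a user allowed to run shutdown via sudo. It only shuts down after the button has been held for about two seconds, so a brief accidental press is ignored. The hold check is a separate pure function so it can be tried off the Pi.

```python
import subprocess
import time

BUTTON_PIN = 21       # hypothetical BCM pin number; adjust to your wiring
POLL_INTERVAL = 0.1   # seconds between reads
HOLD_SAMPLES = 20     # about 2 s of continuous press before shutting down


def held_long_enough(samples, hold_samples):
    """True if the last `hold_samples` readings are all 0.

    With the internal pull-up enabled, the pin reads 1 when idle and 0 while
    the button connects it to ground.
    """
    if len(samples) < hold_samples:
        return False
    return all(s == 0 for s in samples[-hold_samples:])


def main():
    # Imported here so the helper above can be used on machines without RPi.GPIO.
    import RPi.GPIO as GPIO

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    samples = []
    try:
        while True:
            samples.append(GPIO.input(BUTTON_PIN))
            samples = samples[-HOLD_SAMPLES:]  # keep only the recent window
            if held_long_enough(samples, HOLD_SAMPLES):
                # Clean shutdown: the OS syncs and unmounts filesystems first.
                subprocess.run(["sudo", "shutdown", "-h", "now"], check=False)
                break
            time.sleep(POLL_INTERVAL)
    finally:
        GPIO.cleanup()
```

Call main() on the Pi, for example from a systemd service. Note that this only helps while the OS can still run the script. In a complete lock-up like the one described above, it will most likely not react either.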

If you have a recent backup, I would get a new SD card, do a clean openHABian install and restore openHAB to get back up and running on a fresh SD card.

Have a look at the system logs; they may give you an indication of what happens before the "lock-down".
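
One way to act on this is to pull out the last lines written before each reboot. Below is a rough Python sketch. It assumes a Raspbian-based image where rsyslog writes system messages to /var/log/syslog. openHABian keeps openHAB's own logs separately, under /var/log/openhab2 for openHAB 2. It also assumes the first kernel line of each boot contains "Booting Linux"; treat that marker as something to verify in your own log.

```python
def lines_before_boots(lines, boot_marker="Booting Linux", context=20):
    """For every boot found in `lines`, return the `context` lines logged just before it.

    `boot_marker` is an assumed substring of the first kernel line of a new
    boot; check your own log and adjust it if needed.
    """
    chunks = []
    for i, line in enumerate(lines):
        if boot_marker in line and i > 0:
            chunks.append(lines[max(0, i - context):i])
    return chunks


def show_last_words(path="/var/log/syslog", context=20):
    """Print the tail end of the log before each reboot (run this on the Pi)."""
    with open(path, errors="replace") as f:
        lines = f.read().splitlines()
    for chunk in lines_before_boots(lines, context=context):
        print("\n".join(chunk))
        print("-" * 40)
```

Reading /var/log/syslog usually needs sudo, and older entries are in rotated files such as /var/log/syslog.1. After a hard lock-up, the final lines may never have reached the SD card, so a log that simply stops is itself a clue.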

Hi again @vzorglub

I know, and that’s a big worry; SD corruption is already a problem.

Would the RPi still power off cleanly when it’s unresponsive? What’s going on here: is openHABian unresponsive, and that’s why everything’s down while the Pi itself is still running fine? Or has something gone wrong on the RPi and brought it all down?

My backup is semi-recent. The system is fully working now, and the SD card is only two months old (SanDisk Class 10, 16 GB).

I do have spares :slight_smile:

Where are the system logs stored? Are they openHAB- or RPi-related?

1. It might successfully restart the Pi.
2. I’ve never experienced symptoms like yours. My Pi hardware failures have only been due to worn-out/faulty SD cards or iffy power supplies.
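
On point 1: with the Linux watchdog interface, a user-space process writes to /dev/watchdog at regular intervals. If those writes stop, for example because the whole system has hung, the hardware timer runs out and resets the board. In practice the watchdog daemon (Debian package watchdog) handles this, so the Python sketch below is only meant to show the mechanism. It assumes the Pi's watchdog driver is enabled and exposes /dev/watchdog. It also relies on the "magic close" convention: writing V just before closing disarms the timer, unless the kernel was built with CONFIG_WATCHDOG_NOWAYOUT. is_healthy is a hypothetical callback, for example a check that openHAB still answers.

```python
import time


def feed_watchdog(device_path, beats, interval, is_healthy=lambda: True, disarm=True):
    """Write a heartbeat to the watchdog device `beats` times, `interval` seconds apart.

    If `is_healthy()` returns False, feeding stops and the device is closed
    without the magic character, so the hardware timer keeps running and will
    reset the board. When all beats succeed and `disarm` is True, 'V' is written
    before closing to ask the driver to stop the timer.
    Returns the number of heartbeats written.
    """
    written = 0
    # Unbuffered binary mode so each heartbeat reaches the driver immediately.
    with open(device_path, "wb", buffering=0) as dev:
        for _ in range(beats):
            if not is_healthy():
                return written  # stop feeding: the watchdog will fire
            dev.write(b"\n")
            written += 1
            time.sleep(interval)
        if disarm:
            dev.write(b"V")
    return written
```

A plain heartbeat only helps with a hang that also stops this process. If only openHAB locks up while Linux keeps running, the heartbeat continues and the board is never reset. That is why a health check is worth having.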

Hi @watou

My setup is less than three months old:

Brand-new everything
Official RPi PSU
SanDisk Class 10 16 GB SDHC A1

It doesn’t really matter if you have corrupted a critical area of the SD card while unplugging.


@vzorglub

OK, I need to add a button first; otherwise I will just be corrupting card after card.

But is there no way to check the health of the card, like you can with a standard HDD? And wouldn’t there be more symptoms? The system does seem to run fine apart from this one thing.

Would my card be uncorrupted if I formatted it? I don’t want to throw it away :slight_smile:

No. But you can still format it and use it for semi-permanent storage of non-critical stuff.

It all boils down to your willingness to accept risk. If your SD card got corrupted because of a power outage, then the SD card is probably just fine. The problem with power outages lies in how SD cards work (the following is a gross simplification, but it illustrates the problem). An SD card’s storage is divided into blocks, and each block may hold parts of more than one file. When you change a file and save it, everything in that block gets moved to a new block with the changes made. If the RPi loses power while the new block is being written, then not only is the file being written lost, but so are all the other files in that block. Thus, even though the only thing being written is your log file, you can mangle system-critical files like config files or executables.
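
To make the simplified model above concrete, here is a toy Python simulation of it. It is not how any real card's firmware works. Several files share a block, saving one file rewrites the whole block, and a power cut before the rewrite finishes loses every file in that block. The file names are hypothetical. The "old copy is unusable once the move starts" rule is an assumption taken from the simplification.

```python
class ToyCard:
    """Toy model: several files share one block, and saving any of them
    rewrites the whole block."""

    def __init__(self):
        self.blocks = []  # logical block number -> {file name: contents}, or None if lost

    def write_block(self, files):
        self.blocks.append(dict(files))
        return len(self.blocks) - 1

    def save(self, block_no, name, contents, power_cut=False):
        """Save one file by copying its whole block, with the change, to a new block.

        Toy assumption: once the move starts, the old copy is no longer usable,
        so a power cut before the new copy is finished loses every file in it.
        """
        old = self.blocks[block_no]
        self.blocks[block_no] = None
        if power_cut:
            return False
        new = dict(old)
        new[name] = contents
        self.blocks[block_no] = new
        return True

    def read(self, name):
        for block in self.blocks:
            if block is not None and name in block:
                return block[name]
        return None


# Only the log is being saved when the power goes...
card = ToyCard()
b = card.write_block({"openhab.log": "old log", "addons.cfg": "config", "java": "binary"})
card.save(b, "openhab.log", "new log", power_cut=True)
print(card.read("addons.cfg"))  # None: the config is gone too

# ...whereas an uninterrupted save keeps its neighbours intact.
safe = ToyCard()
s = safe.write_block({"openhab.log": "old log", "addons.cfg": "config"})
safe.save(s, "openhab.log", "new log")
print(safe.read("openhab.log"), safe.read("addons.cfg"))  # new log config
```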

The SD card wearing out is a different problem. In this case (again, a gross simplification), each block on the SD card supports a finite number of writes. Once you exceed that number, the block will no longer store new data. To avoid wearing out parts of the SD card too fast, the card’s controller spreads the writes out, load balancing them across all the available blocks (wear leveling). That greatly extends the life of the SD card. However, if you have a write-heavy application (persistence, lots of logging, etc.), eventually the blocks will wear out. When they do, we can see some of the same problems described above: the OS thinks the data was written, but the block was worn out, so the changes never actually got stored. And, as described above, the other parts of files that were in that same block didn’t get written either, and we have corruption.
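
A toy Python model of the wear-leveling idea described above follows. It is a simplification; real controllers are proprietary and far more complex. Every block has a fixed write budget. With leveling on, each write goes to the least-worn block; with it off, the same block is hammered every time. The block count and endurance are arbitrary assumptions.

```python
class ToyWearCard:
    """Toy model: `n_blocks` blocks, each tolerating `endurance` writes."""

    def __init__(self, n_blocks, endurance, level=True):
        self.endurance = endurance
        self.level = level
        self.wear = [0] * n_blocks  # writes performed on each block

    def write(self):
        """Store one block's worth of data.

        Returns False when the chosen block is already exhausted, modelling a
        write that did not stick (a real card may not report this at all).
        """
        if self.level:
            # Wear leveling: pick the least-worn block.
            block = min(range(len(self.wear)), key=self.wear.__getitem__)
        else:
            block = 0  # no leveling: always the same block
        if self.wear[block] >= self.endurance:
            return False
        self.wear[block] += 1
        return True


def writes_until_failure(card):
    count = 0
    while card.write():
        count += 1
    return count


print(writes_until_failure(ToyWearCard(8, 100)))               # 800
print(writes_until_failure(ToyWearCard(8, 100, level=False)))  # 100
```

Leveling multiplies the usable lifetime by the number of blocks, but it does not remove the limit. A write-heavy setup simply reaches the n_blocks * endurance total sooner.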

SD cards wearing out is probably far less common than assumed, but corruption from loss of power is very common.

But here on the forum we usually don’t have enough information to know which one caused the problem, so we tend to assume the worst and assume the SD card is worn out. Most likely your SD card is fine and just needs a reformat, but there is always the chance that it is worn out. So you have to decide if the cost of a new SD card is worth the risk of using an SD card that might be worn out.


@rlkoshak well said!

Thanks for that, Rich.

I’m going to look at making my system more stable and reliable. I’m going to add a switch to reboot the Raspberry Pi when it’s not working properly, if it even responds enough for that. I’m also planning on moving over to Amanda backup from the method I’m currently using; this will keep my backups more up to date than they are now.

I’ll just see what happens there.

Did you ever look into this for me?

I forgot. I just created a rule:

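```
rule "Test InfluxDB"
when
    System started
then
    // Log the highest persisted vWeather_Temp value of the last 28 days
    logInfo("Test", "Max temp for last 28 days is " + vWeather_Temp.maximumSince(now.minusDays(28)).state.toString)
    // Run the same query again 5 minutes later to see whether the result is stable
    createTimer(now.plusMinutes(5), [ | logInfo("Test", "Max temp for last 28 days is " + vWeather_Temp.maximumSince(now.minusDays(28)).state.toString) ])
end
```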

Every time I run it, I get a reasonable value. vWeather_Temp gets updated every couple of minutes, so between runs I sometimes get a slightly different value, which reflects that the temperature changed over those five minutes 28 days ago.
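
The effect described here can be reproduced with a plain sliding-window maximum: as the window start moves forward, the oldest readings drop out, and the maximum can go down. Below is a minimal Python sketch. It assumes persisted samples are (timestamp, value) pairs and uses made-up values. It only mirrors the arithmetic and is not how openHAB's maximumSince is implemented.

```python
from datetime import datetime, timedelta


def maximum_since(samples, start):
    """Largest value among (timestamp, value) samples taken at or after `start`.

    Returns None when no sample falls inside the window.
    """
    in_window = [value for ts, value in samples if ts >= start]
    return max(in_window) if in_window else None


# Made-up samples: two readings about 28 days back, one more recent.
t0 = datetime(2018, 6, 1, 12, 0)
samples = [
    (t0, 30.0),
    (t0 + timedelta(minutes=5), 29.5),
    (t0 + timedelta(days=10), 25.0),
]

now = t0 + timedelta(days=28)
first = maximum_since(samples, now - timedelta(days=28))
later = now + timedelta(minutes=5)
second = maximum_since(samples, later - timedelta(days=28))
print(first, second)  # 30.0 29.5
```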

My most recent run showed:

```
Max temp for last 28 days is 81.8
Max temp for last 28 days is 81.6
// restart OH
Max temp for last 28 days is 81.6
Max temp for last 28 days is 81.6
```

But I’ve got over a year’s worth of data in this database. Maybe something weird is going on because you are querying back to a time before there is any data in your database.


Thanks for that, Rich. No worries about forgetting; I’m pretty prone to it myself :slight_smile:

Maybe you are right about that; it seems to work for you.

I would like to get this right, as I would also like to add a max temperature like the one you just posted :slight_smile:

All I can suggest is this: don’t use a time span you know is longer than the amount of data you have in the database, and see if you get a consistent value. If you only have 5 days of data, don’t go further back than now.minusDays(5). That will show whether the problem is that you are searching past where you have data.
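
The check suggested here amounts to never asking for a window that starts before the oldest stored sample. Below is a small Python sketch of that clamp. It assumes you already know the timestamp of the first persisted entry and that all times are naive local times. effective_start and days_of_data are hypothetical helpers, not part of openHAB, and the dates are made up.

```python
from datetime import datetime, timedelta


def effective_start(now, lookback_days, first_entry):
    """Start of the query window, clamped so it never predates the first persisted entry."""
    requested = now - timedelta(days=lookback_days)
    return max(requested, first_entry)


def days_of_data(now, first_entry):
    """Whole days of history stored between the first entry and `now`."""
    return (now - first_entry).days


# Hypothetical dates: five days of history.
first_entry = datetime(2018, 8, 1, 12, 0)
now = datetime(2018, 8, 6, 12, 0)
print(days_of_data(now, first_entry))         # 5
print(effective_start(now, 28, first_entry))  # 2018-08-01 12:00:00 (clamped)
print(effective_start(now, 3, first_entry))   # 2018-08-03 12:00:00
```

With five days of history, any lookback of five days or more yields the same window. Comparing results for a lookback below and above that boundary shows whether missing data is what makes the value move.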

I will edit the rule.

How can I find when the first entry is?


Hi again @5iver

I will have a look into that; it looks nice.

Thanks for that

I followed the instructions and checked when my earliest persistence entry was added; it looks like it was on Tue Jul 17 2018 23:27:00 GMT+0100 (British Summer Time).

I also changed my rule to only check 10 days back and restarted my system. The value changed again, so I’m not sure what’s wrong here.

Nice little tool.
