Help with RPI Openhab full hang. (WAS Rule for longest ever uptime?)

rlkoshak · July 18, 2018, 4:39pm

I don’t think you can show it on Grafana or the sitemap without creating a separate Item. So I think you can do the following:

Longestuptime28daysDate.postUpdate(Systemuptime.maximumSince(now.minusDays(28)).getTimestamp.toString)

I’m making an assumption that Java’s Date toString is ISO 8601 formatted. If not we need to do a little bit of a conversion.
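
For what it’s worth, java.util.Date’s toString is not ISO 8601 (it prints something along the lines of Tue Jul 24 23:10:40 BST 2018), so a conversion is probably needed. A minimal sketch, assuming Longestuptime28daysDate is a DateTime Item and an openHAB 2.x Rules DSL where Joda’s DateTime is available (as it is for now); the rule name and trigger are only illustrative:

```
// Sketch only: the rule name and the trigger are illustrative assumptions.
rule "Record when the longest uptime in 28 days occurred"
when
    Item Systemuptime changed
then
    // maximumSince returns a HistoricItem; guard against null defensively
    val max = Systemuptime.maximumSince(now.minusDays(28))
    if (max === null) return;

    // Wrapping the java.util.Date in a Joda DateTime yields an ISO 8601 string,
    // the same format that now.toString produces
    Longestuptime28daysDate.postUpdate(new DateTime(max.timestamp).toString)
end
```

The Joda wrapper is used here only because it produces the same string format as now.toString; any other route to an ISO 8601 string should work just as well.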

I can’t find where I posted this before so here it is again.

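import java.util.Map

// Assumed declarations: the rule uses these globals, but the post does not show them
val String logName = "doors"
val Map<String, Timer> flappingTimers = newHashMap

rule "Keep track of the last time a door was opened or closed"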
when
  Member of gDoorSensors changed
then
  if(previousState == NULL) return;

  val name = triggeringItem.name
  val state = triggeringItem.state

  // Update the time stamp
  postUpdate(name+"_LastUpdate", now.toString)

  // Set the timer if the door is open, cancel if it is closed
  if(state == OPEN) sendCommand(name+"_Timer", "ON")
  else postUpdate(name+"_Timer", "OFF")

  // Set the message
  val msg = new StringBuilder
  msg.append(transform("MAP", "en.map", name) + " was ")
  msg.append(if(state == OPEN) "opened" else "closed")

  var alert = false
  if(vTimeOfDay.state.toString == "NIGHT" || vTimeOfDay.state.toString == "BED") {
    msg.append(" and it is night")
    alert = true
  }
  if(vPresent.state == OFF) {
    msg.append(" and no one is home")
    alert = true
  }

  // Alert if necessary
  if(alert){
    msg.append("!")
    val timer = flappingTimers.get(name)
    if(timer !== null) {
      logWarn(logName, name + " is flapping!")
      timer.cancel
      flappingTimers.put(name, null)
    }
    else {
      flappingTimers.put(name, createTimer(now.plusSeconds(3), [ |
        aAlert.sendCommand(msg.toString)
        flappingTimers.put(name, null)
      ]))
    }
  }
  // Log the message if we didn't alert
  else {
    logInfo(logName, msg.toString)
  }
end

The above uses Design Pattern: Associated Items, Design Pattern: Human Readable Names in Messages and some other DPs I’m sure.

Theory of operation: Whenever a sensor changes state I postUpdate now.toString to the associated LastUpdate Item. So what you are asking about is in the first 11 lines of the Rule including the rule header stuff and spaces. The rest of the Rule sets a flapping timer (my back door timer flaps when it gets below 25 degrees F) and sends an alert when the Timer goes off if an alert is warranted (no one is home, it is night time, etc).

No. Keep MapDB for restoreOnStartup and only use InfluxDB for data you want to chart, analyze or save for some reason. MapDB never grows and is faster than InfluxDB in most circumstances.

That’s a pretty concise guide. Thanks for posting it. I’ve been meaning to install them on my RPis. But this will only work if the RPi itself goes unresponsive. It can’t, for example, reboot if the RPi’s network only goes offline, right? For something like that I’d need to use something like systemd’s watchdog, right? It’d be awesome if it did work for just certain subsystems instead of the whole system.

Sharpy · July 18, 2018, 5:10pm

I’m not using Grafana yet. It’s installed but not connected, and I’m not up for learning that too at the moment (is it complex? I might have a look). I’m just using the basic charts and timelines built into HABPanel. I also don’t mind creating a separate Item.

OK, I will keep it. I already knew it was fast and well suited to restoreOnStartup. I was recommended not to replace MapDB with rrd4j, and was wondering if the advice was different with a different DB.

I’m planning on using most of your design patterns. At the moment I have just been getting things to work, not finding the better ways of doing it. I have also found them complicated, to be honest. Your coding is better than mine, so they are daunting.

I plan on coming back to them once I understand them more. As you know, I don’t like adding code to my setup if I don’t understand how it works.

I will look at the other rules etc. soon; I need to be at the computer.

Sharpy · July 23, 2018, 10:26pm

@5iver @rlkoshak @vzorglub
I didn’t know who to quote, so I just tagged you all.

I thought the code posted was working, but it’s not working as expected.

It works fine in that it updates the Items, but once I have restarted OH the value of the max uptime in 28 days gets overwritten by a smaller value. This should not have happened; it should only show the longest uptime in 28 days.

//Persistence loaded
2018-07-23 23:14:03.797 [vent.ItemStateChangedEvent] - Longestuptime28days changed from NULL to 4.25423611
// The new uptime replaced the record
2018-07-23 23:17:40.902 [ome.event.ItemCommandEvent] - Item 'Longestuptime28days' received command 0.00409722

2018-07-23 23:17:40.913 [vent.ItemStateChangedEvent] - Longestuptime28days changed from 4.25423611 to 0.00409722

This should have stayed at 4.25423611

The rule and Items are posted below.

Number Longestuptime28days "Longest uptime in 28 days" <calendar>
Number Systemuptime "System Uptime" <time> { channel="REMOVED" }

rule "Convert System Uptime from Min to Days"
when
		Item Systemuptime changed
then
		Systemuptimedays.postUpdate((Systemuptime.state as DecimalType) / 1440)
end

5iver · July 23, 2018, 11:25pm

You removed the channel… how are you getting the system uptime?

Sharpy · July 23, 2018, 11:48pm

I just removed it from the post lol

5iver · July 23, 2018, 11:53pm

I was thinking that the method you were using may be a factor. But more so, are you persisting the Longestuptime28daysDate item?

Sharpy · July 24, 2018, 9:35am

Yes, I’m persisting that Item. It got restored on startup but was overwritten by the new time.

5iver · July 24, 2018, 11:22am

I’m not sure what would cause this to occur after a restart of OH. If it were me, I’d probably look into the database to confirm the data. Or get it from here…

http://<your OH server>:8080/rest/persistence/items/Systemuptime?starttime=2018-06-22T00%3A00%3A00.000

What does your other rule look like? Did the time on your Pi jump forward 28 days?!

Sharpy · July 24, 2018, 12:01pm

I will try what you have posted

And the only rule used here is

rule "Convert System Uptime from Min to Days"
when
		Item Systemuptime changed
then
		Systemuptimedays.postUpdate((Systemuptime.state as DecimalType) / 1440)
end

Edit: I just realised I posted the wrong rule.

I’m on a mobile, so I can’t retrieve my rule at the moment.

I will post back later. Feeling stupid, lol.

Sharpy · July 24, 2018, 6:37pm

@5iver

OK, I’m at a computer now. Here’s my rule.

items

Number Systemuptime "System Uptime" <time> { channel="systeminfo:computer:openHABianPi:cpu#uptime" }
Number Systemuptimedays "System Uptime in days" <time>
Number Longestuptime28days "Longest uptime in 28 days" <calendar>

rule "Convert System Uptime from Min to Days"
when
		Item Systemuptime changed
then
		Systemuptimedays.postUpdate((Systemuptime.state as DecimalType) / 1440)
end

rule "Record System Uptime in 28 Days"
when
    Item Systemuptimedays changed
then
    Longestuptime28days.sendCommand(Systemuptimedays.maximumSince(now.minusDays(28)).state.toString)
end

The RPi reports uptime through the systeminfo binding using the Item “Systemuptime”. It’s presented in minutes, so I use the rule “Convert System Uptime from Min to Days” to display the uptime in days. I then use the rule “Record System Uptime in 28 Days” to present the longest uptime within 28 days, but this is not working, as you know.

I have also run the command you linked:

http://192.168.0.94:8080/rest/persistence/items/Systemuptime?starttime=2018-06-22T00%3A00%3A00.000

It returned a long string of stuff (too big to post here).

(I’m sorry, that’s the friendliest way I can upload it; there is still more.)

5iver · July 24, 2018, 9:49pm

Wow, that’s a lot of updates… I suggest you change the update interval to Medium (60s)!

The data is correct. The max value is 4.25 days (6126.1 min) at 2018-07-23 22:08:03. So you’re not getting correct data out of…

Longestuptime28days.sendCommand(Systemuptimedays.maximumSince(now.minusDays(28)).state.toString)

There’s nothing obvious here. The only idea I have is to check the system clock on the Pi.
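
One other thing that may be worth ruling out (an assumption; nothing in this thread confirms it): when no service is named, maximumSince queries the default persistence service. If that default happens to be MapDB, which keeps only the latest state of each Item, the "maximum" would simply track the current uptime and drop to a small value after a restart. A hedged sketch that names the service explicitly, assuming InfluxDB is installed under the service ID "influxdb" and that Systemuptimedays is persisted there:

```
rule "Record System Uptime in 28 Days"
when
    Item Systemuptimedays changed
then
    // "influxdb" is an assumed service ID; use whichever service stores the history
    val max = Systemuptimedays.maximumSince(now.minusDays(28), "influxdb")
    if (max === null) return;

    // Log both values so a wrong maximum is easy to spot in the log
    logInfo("uptime", "current=" + Systemuptimedays.state + " max28d=" + max.state)
    Longestuptime28days.postUpdate(max.state)
end
```

This sketch uses postUpdate rather than sendCommand because Longestuptime28days is not linked to a device, so there is nothing to command; that is a design choice, not a fix in itself.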

Sharpy · July 24, 2018, 10:04pm

Where would you change that? I did stop the systeminfo binding from updating every second. It now reports every minute, though from the logs it looks like it’s changing every 3 minutes.

No, 4.25 is correct. It just changed to 0 after a restart; it should have stayed at 4.25.

Seems to be fine. The logs are timed correctly, and so is the time in the SSH console:

[23:09:37] openhabian@openHABianPi:~$ date
Tue Jul 24 23:10:40 BST 2018

5iver · July 24, 2018, 10:15pm

Not sure… it was in the docs for the binding. But it looks like you’ve got it changed.

Not sure what you mean… the persistence data was correct but the maximumSince is not correct. This could be an issue in ESH (this works correctly for me with snapshot 1304… and no resets after restart), or somewhere else. Although, I don’t have it in a place that’s as visible as your usage, so I’d better double check. Your setup looks to be correct. I’m just guessing here, but maybe an OH update would help.

Sharpy · July 24, 2018, 10:19pm

Was that the systeminfo binding? There were two settings: High Priority, which I set to update every 60 seconds, and Medium Priority, which I set to 3 minutes. So yes, I think I changed that. I thought you meant change a setting for InfluxDB.

It seemed to be working fine. The longest uptime was 4.25 days until a restart.

The record uptime in 28 days was also 4.25 days, but this also got changed to 0. I wanted this to stay at 4.25, as this is the record uptime in 28 days.

5iver · July 24, 2018, 10:47pm

Yep… here.

I just tested and this is working for me with JDBC-MariaDB for persistence. I also don’t see anything in the OH or ESH issues for this. Maybe we can find someone else using Influx to test this? I think @rlkoshak uses Influx?

EDIT: I restarted again this morning and everything came up properly…

2018-07-25 08:20:51.514 [INFO ] [smarthome.event.ItemStateChangedEvent             ] - Systemuptime changed from NULL to 46656.3
2018-07-25 08:20:52.508 [INFO ] [smarthome.event.ItemStateEvent                    ] - Systemuptime updated to 46656.3
2018-07-25 08:21:06.268 [INFO ] [smarthome.event.ItemStateChangedEvent             ] - Systemuptimedays changed from NULL to 32.398958329999999250503606162965297698974609375
2018-07-25 08:21:06.273 [INFO ] [smarthome.event.ItemStateChangedEvent             ] - Longestuptime28days changed from NULL to 32.398958329999999250503606162965297698974609375
2018-07-25 08:21:52.515 [INFO ] [smarthome.event.ItemStateEvent                    ] - Systemuptime updated to 46657.3
2018-07-25 08:21:52.517 [INFO ] [smarthome.event.ItemStateChangedEvent             ] - Systemuptime changed from 46656.3 to 46657.3
2018-07-25 08:22:52.530 [INFO ] [smarthome.event.ItemStateEvent                    ] - Systemuptime updated to 46658.3
2018-07-25 08:22:52.541 [INFO ] [smarthome.event.ItemStateChangedEvent             ] - Systemuptime changed from 46657.3 to 46658.3
2018-07-25 08:22:56.828 [INFO ] [smarthome.event.ItemStateEvent                    ] - Systemuptimedays updated to 32.40159722
2018-07-25 08:22:56.831 [INFO ] [smarthome.event.ItemStateChangedEvent             ] - Systemuptimedays changed from 32.398958329999999250503606162965297698974609375 to 32.40159722
2018-07-25 08:22:56.864 [INFO ] [smarthome.event.ItemCommandEvent                  ] - Item 'Longestuptime28days' received command 32.40159722
2018-07-25 08:22:56.866 [INFO ] [smarthome.event.ItemStateEvent                    ] - Longestuptime28days updated to 32.40159722
2018-07-25 08:22:56.869 [INFO ] [smarthome.event.ItemStateChangedEvent             ] - Longestuptime28days changed from 32.398958329999999250503606162965297698974609375 to 32.40159722

rlkoshak · July 25, 2018, 3:26am

I do use InfluxDB. I’ll have a look at it sometime tomorrow and see if I can reproduce the behavior.

Sharpy · July 25, 2018, 9:04am

The problem is the bigger value getting overwritten by a smaller value. It should have stayed at the higher value.

Sharpy · July 28, 2018, 10:59am

OK (back to my original problem).

I woke up this morning and my setup was frozen again. How can I troubleshoot this problem?

I first thought it was a time-related thing, as it seemed to happen after 3 days. That’s why I wanted to make the rule above, but this time it happened after just 1.5 days.

The system goes fully unresponsive: no SSH, no HABPanel or Paper UI, no frontail log, rules don’t run, nothing happens. The system just locks up.

Looking at the RPi when this has happened, it is still powered; the network jack is flashing away, and so is the SD card light.

The only way to get the system running again is a forced power off (pull the plug).

@watou will watchdog solve this problem? Is this common?

vzorglub · July 28, 2018, 11:06am

And every time you do that, you damage your SD card.

You can do a clean power off with a switch on GPIOs:

If you have a recent backup, I would get a new SD card, do a clean openHABian install and restore openHAB to get myself back up and running on a fresh SD.

Have a look at the sys logs, they may give you an indication of what happens before the "lock-down"

Sharpy · July 28, 2018, 11:10am

Hi again @vzorglub

I know, and that’s a big worry. SD corruption is already a problem.

Would the RPi still power off cleanly when it’s unresponsive? What’s going on here: is openHABian unresponsive, which is why everything is down while the Pi itself is still running fine? Or has something gone wrong on the RPi and brought it all down?

Semi-recent. The system is fully working now, and the SD card is only two months old (a SanDisk Class 10 16 GB).

I do have spares.

Where are the sys logs stored? Are they OH or RPi related?