[Solved] Trouble with Persistence and stopping cron jobs

Hi,

I am fairly new to OpenHAB, coming from FHEM. I started with OpenHAB 2, so I have no OpenHAB 1 experience. When I start OpenHAB with my configuration, most things work as expected, and the rest can probably be explained by some wrong assumptions on my end. However, three errors keep appearing over time:

  1. RRD4J persistence “seems” to simply stop working. E.g. I am persisting the power consumption of my Homematic plug and have it drawn on the dashboard. It works like a charm and then all of a sudden stops. The corresponding Homematic events still pop up in the logs. After a restart things are back to normal.

  2. mysql persistence is constantly losing the connection after some time:

[sql.internal.MysqlPersistenceService] - mySQL: Connection is not valid!

It is not regaining the connection despite having set

the reconnection counter: reconnectCnt=300
the connection timeout (in seconds): waitTimeout=28800
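These values live in my mysql persistence configuration, which, with the connection details replaced by placeholders, looks roughly like the sketch below (services/mysql.cfg with the key names as I understand the mysql persistence add-on expects them; treat this as an illustration, not a verified reference):

url=jdbc:mysql://127.0.0.1:3306/openhab
user=openhab
password=********
reconnectCnt=300
waitTimeout=28800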

Again, a restart of OpenHAB and things start working again.

  3. Cron jobs (in rules) stop working. I have several rules with cron triggers like

Time cron "15,45 * * * * ? *"

The first line in each rule is a log output. This works for a few hours, then all of a sudden it stops and again I need to restart. On at least one occasion I was able to narrow this down to the day change: the last execution is at 23:59:45, then it simply stops, and the next one at 00:00:15 does not happen. A stripped-down example of such a rule is shown below.
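For illustration, such a rule is essentially the following (rule name and logger name are just placeholders here, not my actual ones):

import java.util.concurrent.locks.ReentrantLock

rule "Cron test"
when
    Time cron "15,45 * * * * ? *"
then
    logInfo("cron-test", "Cron trigger fired")
    // ... rest of the rule ...
end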

Any idea how I could start debugging this? I have tried many things and I believe I will need your assistance.

Regards,
JP

How do you determine that RRD4J stopped working?

The events still show up, however the graph simply does not plot any data… If you point me to how I could dump the RRD database or something like that, I might be able to see whether it is the “writing to it” part or the “plotting” part that stops working.

As to the cron jobs stopping: this might be related to unreleased locks somehow. I read that an exception inside a locked section might bring the scheduler to a halt. One lock (in a rule which might have triggered at midnight) was missing the unlock part, so this might be related. I changed this and added a few debug statements. Will keep looking.
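For reference, the pattern I changed the rule to looks roughly like this (simplified; rule, lock and logger names are placeholders):

import java.util.concurrent.locks.ReentrantLock

val ReentrantLock midnightLock = new ReentrantLock()

rule "Midnight housekeeping"
when
    Time cron "0 0 0 * * ?"
then
    midnightLock.lock()
    try {
        logInfo("housekeeping", "Midnight rule triggered")
        // actual work goes here
    } finally {
        // always release the lock, even if the body throws an exception
        midnightLock.unlock()
    }
end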

I assume that you have the logging for RRD4J set to DEBUG and do see the log entries for those events.
Please show the line that defines your graphing.

Hi Jürgen,

Well, you assume wrong, so thanks very much for pointing this out! Can you point me to the statement I need to set in order to enable debug logging for RRD4J only?

Chart item=Kaffeemaschine_Power period=h refresh=600

The “Kaffeemaschine_Power” events are logged to events.log

In order to check the entries in the RRD4J database you can use the REST API: in the persistence section use the GET /persistence/items/(itemname) tab. Enter “rrd4j” as the serviceID and an item name from your setup. StartTime defaults to 24 hours ago and EndTime to now; hit “Try” and you’ll see all the entries in the database for that time.
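If you prefer calling the API directly, the request has roughly the form below. The item name and times are only examples, and serviceId/starttime/endtime are the parameter names as I remember them, so double-check the exact spelling in the REST documentation:

GET /rest/persistence/items/Kaffeemaschine_Power?serviceId=rrd4j&starttime=2017-01-19T00:00:00&endtime=2017-01-20T00:00:00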


In order to see the log entries on the Karaf console you need to enter “log:set DEBUG org.openhab.persistence.rrd4j” and then use “log:tail” to follow the continuous logging.
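That is, on the console:

log:set DEBUG org.openhab.persistence.rrd4j
log:tail

(log:set INFO org.openhab.persistence.rrd4j should turn the extra output off again once you are done.)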

Could you show a graph (screenshot) where the charting “stopped”?

The log level is set. Let’s wait for the problem to reappear. Currently (after a restart) it is working again.

Regarding the REST API: the “REST API” entry in the start page is missing. I see Paper UI etc., but “REST API” is not there. I tried direct REST calls but probably my syntax is incorrect.

So I fail to run the REST calls you suggested. Graph: the last occurrence has already scrolled out. The data points were missing completely. Not 0, just nothing. I will take a screenshot when it comes up next.

Hmmm Somehow I got REST to work:

{
"time": 1484778720000,
"state": "0"
},
{
"time": 1484778960000,
"state": "0"
},
{
"time": 1484779200000,
"state": "0"
},
{
"time": 1484779440000,
"state": "0"
},
{
"time": 1484779680000,
"state": "0"
},
{
"time": 1484779920000,
"state": "0"
},
{
"time": 1484780160000,
"state": "0"
},
{
"time": 1484780400000,
"state": "0"
},
{
"time": 1484818800000,
"state": "2.4275000000000002131628207280300557613372802734375"
},

Does this mean the framework logs 0 (for whatever strange reason), even though the corresponding change entries in events.log look totally correct at first glance?

Sorry, false assumption again. You need to install the REST Documentation under Paper UI > Add-ons > Misc (that name could be better, I think).

Looking at the data points, it seems that you are NOT logging at least every minute, which is a requirement for RRD4J to work properly! The time difference between the shown entries is more than 60 000 ms, i.e. more than 60 seconds!
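Taking two consecutive entries from your output: 1484778960000 - 1484778720000 = 240000 ms, i.e. 4 minutes between data points, and before the last entry there is a jump of 1484818800000 - 1484780400000 = 38400000 ms, i.e. roughly 10.7 hours.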

Agreed, and I am aware of the restriction. However, the corresponding persistence file reads:

Strategies {
    everyHour : "0 0 * * * ?"
    everyDay : "0 0 0 * * ?"
    everyMinute : "0 * * * * ?"

    // If no strategy is given for an item defined below,
    // the default list is applied
    default = everyChange
}

Items {
    Kaffeemaschine_Power : strategy = everyMinute
    Netatmo_Outside_Temp : strategy = everyMinute
    Netatmo_WZ_Temp : strategy = everyMinute
    Netatmo_SZ_Temp : strategy = everyMinute
}

So it would be my expectation to find a datapoint per minute.

And the logging tells me:

2017-01-19 13:35:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘2.50’ in rrd4j database
2017-01-19 13:36:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘2.49’ in rrd4j database
2017-01-19 13:37:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘930.65’ in rrd4j database
2017-01-19 13:38:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘2.49’ in rrd4j database
2017-01-19 13:39:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘2.50’ in rrd4j database
2017-01-19 13:40:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘1390.039999’ in rrd4j database
2017-01-19 13:41:00.003 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘2.48’ in rrd4j database
2017-01-19 13:42:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘2.48’ in rrd4j database
2017-01-19 13:43:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘1247.24’ in rrd4j database
2017-01-19 13:44:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘2.48’ in rrd4j database
2017-01-19 13:45:00.002 [DEBUG] [sistence.rrd4j.internal.RRD4jService] - Stored ‘Kaffeemaschine_Power’ with state ‘2.49’ in rrd4j database

so this happens once per minute, currently at least. Maybe the underlying problem is that something in the OpenHAB 2 backend goes berserk at some point, resulting in data points no longer being written at 60-second intervals. Given that other things in my installation seem to be problematic (including the periodic cron triggers), this is a fair assumption; persistence and charting themselves would then be working fine and this is just a follow-up problem.

Is the persistence maybe also handled by the QuartzScheduler, which I suspect to be the general problem in my setup (maybe due to a locking problem sending the QuartzScheduler into nirvana)?

I can’t say anything about the QuartzScheduler; the only thing I can say is that my persistence using rrd4j is running smoothly without any hiccup, as is the charting. No change seen over several builds, currently on #717.

Ok. After fixing the rule which created a lock but never released it, the stopping cron jobs and the rrd4j persistence problems went away. It seems that with a simple forgotten “unlock” you can effectively kill the scheduler…
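For anyone running into the same thing, the offending rule boiled down to the pattern below (simplified; rule, lock and logger names are placeholders):

import java.util.concurrent.locks.ReentrantLock

val ReentrantLock midnightLock = new ReentrantLock()

rule "Midnight housekeeping (before the fix)"
when
    Time cron "0 0 0 * * ?"
then
    midnightLock.lock()
    logInfo("housekeeping", "Midnight rule triggered")
    // ... actual work ...
    // missing: midnightLock.unlock()
    // the lock was never released, so the next execution blocked forever
    // and apparently took the scheduler down with it
end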

Now the only remaining problem is the mysql persistence constantly losing connectivity to the local MySQL database. But I will create a new thread for that.

:blush: That’s what locks are made for! Not killing a scheduler, but preventing something from running twice. Good luck with mysql.

Sure they are. Thanks for pointing it out. It was my expectation, however, that the scheduler would trigger a rule no matter what. If the rule then gets stuck on a lock, so be it (a timeout would be great). But the scheduler not only stops firing the rule in question, it stops firing everything. That was surprising. :slight_smile:

I also have big problems with seemingly random cron rules.
E.g. a rule which should fire at 19:00 first fires at 19:35. Rules at 17:00 or 18:00 do not work, but then a rule at 21:00 fires very precisely.
How can I find these locks? What kind of locks did you have?