Sometimes rules don't get executed?

rkrisi · October 17, 2018, 7:52pm

Just for the record:
As a side effect now everything is sooo much faster, snappier…
For example before that, loading Things in PaperUI took about 5-6 seconds, now it is instant…

vossivossi · October 17, 2018, 8:08pm

My advice is to choose more powerful hardware. Everything will work on RPis once OH is started - ok. But the truth is that you will always optimize your system so it is quite normal to

edit item files which will cause everything to reload
edit rule files which will cause everything to reload
upgrade to newer versions which will cause a restart of OH
upgrade your OS, Java, Persistence DB… which will cause a restart
and so on and so on…

You will save lots of time and can test new situations much faster when moving to a more performant server. I am running OH on a Core i7 notebook with 8 GB RAM and a 256 GB SSD. Yes, it is a one-time investment, but you save time again and again. Power consumption is 8 W in operation (so no advantage for RPis here).
And I have debug logging ALWAYS activated for some bindings (e.g. Z-Wave, which alone produces 760 MB of logs a day). I have 1000+ items, all persisted with an every-update strategy to a MySQL DB, which grows by about 6 MB each day. With that setup I can follow everything in detail (error situations become much more transparent when you have detailed logs and every update persisted), and the system is outstandingly responsive. All this saves me heaps of time. A restart of OH takes less than a minute, and then you know whether things work. So I see no point in choosing weak hardware that will only waste time.
Additionally a notebook has a built-in battery so it is already safe for the case of power failures (you save the cost of a UPS).

rkrisi · October 17, 2018, 8:19pm

Thanks!

Yes, that’s what I’m looking for. Maybe I will invest more money now and end up with an ‘overkill’ setup, but it can last many years, is easy to upgrade, is faster, etc…

For example, I have bought a new TV and I have to buy a new media center (I have used an RPi) because it can’t handle 4K well… if I have something built I can easily replace GPU or CPU to increase computing capacity.

mstormi · October 18, 2018, 10:24am

Two times, actually. You should have spare HW ready at hand.

Well it’s no major point, but we all need to start small to save the planet - a RPi uses about half of that.

In most cases you do not since you need your WiFi AP and router to run as well (thus to be on UPS, too).

Granted, adding items or restarting takes significantly longer on an RPi, although that scales with the number of items, and the OP has nowhere near 1000+ items.

@rkrisi now that you are essentially running that RPi setup you know how fast it is and if you feel that to be sufficient for your daily use. Also no need to hurry, you can still upgrade at any later point in time.
BTW: move persistence data AND SWAP to NAS as well.

rossko57 · October 18, 2018, 10:40am

But if the RPi is made dependent on a NAS … we could save the planet faster by cutting out the RPi

The suggestion that the mystery logjam could originate in persistence or swap feels good. @rkrisi Just get a new SD card while pondering the future!

rkrisi · October 18, 2018, 10:44am

I will keep testing, because yesterday, after I posted, there were times when it seemed that the problem was back… I will try to move swap and persistence to the NAS.

rkrisi · October 19, 2018, 8:21pm

Unfortunately the problem is back, nothing really changed… I will replace the SD card if I have time to purchase another one which is fast enough.

Other observations during this issue - maybe someone can help if they recognize an error, or maybe this post can help other people later:

As I said before, SSH is really slow at those times (when rules are not executed), so it is now clear that this can’t be openHab’s problem…
The number of active threads during these times is 0. I executed this Karaf command to see the live threads:

shell:threads --list |grep "RuleEngine" |wc -l

It returns 1, so the number of live threads is 0, right? So it seems that the threads didn’t get stuck: it finishes all of them, but after a while it can’t start new ones… This confirms my theory above that this is not really an openHab problem. Maybe it is a hardware problem (SD) or something distro- or kernel-related…

Ps.: Maybe this can cause issues: when my phones receive an update, it starts 2 rules (one which returns the postal address, which we talked about earlier, and another one which calculates whether the phone is home or not). The home detection rule doesn’t have any sleep or HTTP request, but it is a longer rule (~50 lines with lots of local variables). So that means 6 rule instances are started at almost the same time. Can this be a problem if I only have 7? What happens if these 6 instances and two other rules are executed at almost the same time?

rossko57 · October 19, 2018, 11:51pm

This shouldn’t be a problem, nor are ‘big’ rules. OH will run as many rules as the thread pool allows; the others will queue for a thread. This does work as designed, pretty well proven by other users. In your case, when the cork pops out of whatever it is, your rules catch up nicely in a flurry, no?

There is scope for your HTTP to mess up with quickfire requests to Google. But this is not clagging rules up either, or you’d see the thread usage. The HTTP action keeps hold of the rule thread until completed or timeout. Again well proven.
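The queuing behaviour described above can be illustrated with a small Python sketch. This is not openHAB code: the pool size of 5, the count of 8 rules, and the `fake_rule` function are assumptions chosen only to show that extra work waits for a free thread and still completes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 5   # assumed pool size, for illustration only
RULE_COUNT = 8  # e.g. 6 phone-triggered rule instances plus 2 others


def run_rules(pool_size=POOL_SIZE, rule_count=RULE_COUNT, work_seconds=0.05):
    """Run rule_count fake rules on pool_size threads.

    Returns (number of rules completed, peak number running at once).
    """
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def fake_rule(rule_id):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(work_seconds)  # stands in for the rule body, e.g. an HTTP call
        with lock:
            state["running"] -= 1
        return rule_id

    # Tasks beyond pool_size wait in the executor's queue until a thread is free.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(fake_rule, range(rule_count)))
    return len(results), state["peak"]


if __name__ == "__main__":
    done, peak = run_rules()
    print(f"completed={done} peak_concurrency={peak}")
```

All 8 fake rules finish, but never more than 5 run at once. A rule that blocks (for example on a slow HTTP request) simply holds its thread longer, so the queued ones start later.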

Another area worthy of suspicion is networking - WiFi jamming, wonky ethernet cables, blah.

rkrisi · October 20, 2018, 9:11am

The Raspberry is connected with Ethernet directly to the router and I replaced the cable only a few months ago. I think it is going to be an SD card problem; nothing else comes to mind that can fail and that I haven’t tried an alternative for…
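One rough way to check the SD card suspicion is to time small writes that are forced to the storage device. The sketch below is only a diagnostic idea, not an openHAB tool: the function name `time_synced_writes` and the default count and size are made up for illustration, and the target directory must be changed to one that actually lives on the SD card.

```python
import os
import tempfile
import time


def time_synced_writes(directory, count=20, size=4096):
    """Write `count` small files into `directory`, fsync each one,
    and return the time in seconds that each write-plus-sync took."""
    payload = b"\0" * size
    timings = []
    for _ in range(count):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data onto the storage device
            timings.append(time.perf_counter() - start)
        finally:
            os.close(fd)
            os.remove(path)
    return timings


if __name__ == "__main__":
    # Assumption: replace this with a directory on the SD card.
    # The system temp directory may be a RAM disk on some distros.
    target = tempfile.gettempdir()
    t = time_synced_writes(target)
    print(f"max={max(t) * 1000:.1f} ms  avg={sum(t) / len(t) * 1000:.1f} ms")
```

Running it once while everything is snappy and once during a logjam gives two sets of numbers to compare. Very long or wildly varying sync times during the logjam would point towards storage rather than openHAB.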

rkrisi · October 21, 2018, 7:58pm

I have found another thing which I only see when this logjam happens. When it is back to normal, I can see one entry like this (always a different item):

Could not create rrd4j database file '/var/lib/openhab2/persistence/rrd4j/wifiled2_power.rrd': sync failed

Ps.: This may be a huge config error on my part, but I can’t find an answer for it. My rrd4j.cfg is empty, just comments, but I have a rrd4j.persist file with the following content:

Strategies {
	everyMinute : "0 * * * * ?"
	everyHour: "0 0 * * * ?"
	everyDay: "0 0 0 * * ?"
	default = everyChange
}

Items {
	* : strategy = everyUpdate, everyMinute, restoreOnStartup
}

Shouldn’t it be in the cfg file?

Ps2.: I have updated my rrd4j persist file so that it only persists data which is really used, not all items…
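To get a feel for what a `*` entry combined with `everyMinute` implies, a back-of-the-envelope calculation helps. The item counts below are hypothetical (the thread does not state the real number), and the result is only a lower bound on store operations: `everyUpdate` adds more, and how many physical SD card writes that turns into depends on rrd4j and OS caching.

```python
MINUTES_PER_DAY = 24 * 60


def min_stores_per_day(item_count):
    """Lower bound on persistence stores per day when every item
    is stored once per minute (the everyMinute strategy)."""
    return item_count * MINUTES_PER_DAY


if __name__ == "__main__":
    for items in (20, 200):  # hypothetical item counts
        print(items, "items ->", min_stores_per_day(items), "stores/day")
```

With these made-up numbers, going from 20 to 200 items raises the minimum from 28,800 to 288,000 stores per day, which is why trimming the Items section to what is really used reduces the load so much.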

rossko57 · October 21, 2018, 9:15pm

No, examples of rrd4j.cfg and rrd4j.persist here

That may be a big clue, or it may be a symptom (i.e. timeout due to seizure elsewhere). Certainly worth following up here. Empty cfg does not seem right; I do not know what a default cfg looks like.

rkrisi · October 21, 2018, 9:17pm

Thanks I will have a look at it!

I almost forgot that when I started testing out OH, I just told it to persist all items; now I have a lot more items… So writing that many item changes to the SD card might be a problem, I think…

rkrisi · October 22, 2018, 4:10pm

I think I have found the main issue in my config!
First of all I want to thank the community for helping me out, especially @rossko57, @mstormi, @rlkoshak. Thank you guys!

So in general: all of my items were set to be persisted by rrd4j, and it seems the SD card couldn’t handle it. I don’t know why; maybe the SD card is getting faulty, or I just have a lot of items now compared to the few I had when I started ‘playing’ with openHab. That’s why I quickly set it up like this back then; I knew it wasn’t the best idea, but I had since forgotten that it could cause a problem.