How to debug OH2?

I am using OH2 (latest stable release) via openHABian on a Raspberry Pi 3.
I have installed only a few bindings and set up the whole system.

Everything works fine… for some hours, sometimes even for 2-3 days.
After that, “some” rules are no longer executed (but a very few others still are!).
On the other hand, status updates (someone opens a window) are still recognized (I see the change in the event.log).

I don’t see any errors in the logfile. CPU and memory on the Raspberry are fine, and OH2 does not consume more than before.

I have no clue how to debug this. I already tried uninstalling one binding at a time, each for a few days, but that didn’t help.
I want to know what is causing this. How can I find out?

Best,

That is a tough one. You are already doing what I would recommend.

Do you see anything in common with the rules that stop working?

Are you using long Thread::sleep calls (anything more than 300 ms is considered long)?

Are you using ReentrantLocks in your rules?

Thanks for your reply.

I don’t see anything in common, but it feels like it occurs more often if my girlfriend and I return home at the same time. That triggers several rules (stop the phone answering machine, switch some power outlets on, disable the alarm system and some more). BUT: I have also disabled these rules for testing, and OH2 still stops working even when these rules are not loaded.

Indeed, I use longer sleeps in some rules, but they are only triggered if a window is opened while a blind is down (the blind then opens a bit after a while).

I already removed all reentrant locks because I thought they might block themselves (for whatever reason).

I’d like a way to look “inside” the rule engine at runtime, to understand what it is or isn’t doing.

First disable the rules with long sleeps or, even better, eliminate the long sleeps and replace them with Timers (Expire based timers are pretty easy to use).
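To illustrate the difference between sleeping and using a timer, here is a minimal sketch in plain Java (not openHAB code). It assumes a hypothetical `openBlindLater` helper and an `AtomicBoolean` standing in for the blind. The point is only that scheduling returns immediately instead of holding a thread for the whole delay. In the Rules DSL, `createTimer` plays the role that `schedule` plays here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimerInsteadOfSleep {

    // Hypothetical helper: schedules the delayed action and returns at once,
    // so the calling thread is not blocked for delayMillis.
    public static ScheduledFuture<?> openBlindLater(ScheduledExecutorService scheduler,
                                                    AtomicBoolean blindOpened,
                                                    long delayMillis) {
        return scheduler.schedule(() -> blindOpened.set(true), delayMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean blindOpened = new AtomicBoolean(false); // stands in for the blind item

        ScheduledFuture<?> timer = openBlindLater(scheduler, blindOpened, 300);
        System.out.println("right after scheduling, blind opened: " + blindOpened.get());

        timer.get(); // only the demo waits here; a rule would simply end
        System.out.println("after the timer fired, blind opened: " + blindOpened.get());

        scheduler.shutdown();
    }
}
```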

The behavior described sounds a lot like you are running out of execution threads. Thread::sleeps and ReentrantLocks can cause that to happen.

Are there any other calls in your rules that might take a long time to run and so tie up an execution thread? The executeCommandLine and sendHTTP* actions are two sets that come to mind.

There is a limited number of threads available for rules to run in. If you tie one up in one rule and tie another up in another rule and so on, eventually there are not enough threads left over to actually execute commands (which is why long Thread::sleeps are highly recommended against).
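The effect can be reproduced outside openHAB with a small Java sketch. It assumes a fixed pool of 5 threads purely for illustration (the real pool size and implementation in OH2 may differ), and `shortRuleRuns` is a hypothetical helper. Blocking tasks stand in for rules that sleep or wait on a slow call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationDemo {

    // Returns true if a short "rule" submitted after the blocking ones
    // manages to finish within waitMillis.
    public static boolean shortRuleRuns(int poolSize, int blockingRules, long waitMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockingRules; i++) {
                pool.submit(() -> {
                    try {
                        release.await(); // stands in for a long sleep or a slow HTTP call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            Future<?> shortRule = pool.submit(() -> { }); // a rule that would finish instantly
            try {
                shortRule.get(waitMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // still queued: no free thread picked it up
            }
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("4 of 5 threads blocked, short rule runs: " + shortRuleRuns(5, 4, 500));
        System.out.println("5 of 5 threads blocked, short rule runs: " + shortRuleRuns(5, 5, 500));
    }
}
```

With one thread still free the short task completes; once every thread is blocked it just sits in the queue, which from the outside looks like rules that silently stop firing while nothing is logged as an error.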

Thanks for that hint, I will re-organize my rules and try. I will update this thread if I have reliable results.

So I removed all Thread::sleeps from my rules, and no locks are used either.
I still have to restart OH2 once or twice a day because the rules are no longer executed.

Is there any chance to see what is going on “inside” somehow?

Otherwise I will reprogram my rules so that each one writes a line to the logfile when it starts, PLUS I will write one rule that is triggered by cron every minute. That way, when this cron-triggered rule stops, I can maybe find out which rules were executed just before…
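Once such a log exists, picking out the suspects can be automated. The following Java sketch assumes a made-up log format in which the cron rule writes a line containing `HEARTBEAT` and every other rule writes `START <rule name>`. Both markers and the `rulesAroundLastHeartbeat` helper are hypothetical. It lists the rules that started between the second-to-last heartbeat and the end of the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeartbeatLogCheck {

    // Hypothetical markers; adjust to whatever the rules actually log.
    static final String HEARTBEAT = "HEARTBEAT";
    static final String RULE_START = "START ";

    // Returns the names of rules that started after the second-to-last heartbeat.
    // With fewer than two heartbeats in the log, all rule starts are returned.
    public static List<String> rulesAroundLastHeartbeat(List<String> logLines) {
        int last = -1;
        int previous = -1;
        for (int i = 0; i < logLines.size(); i++) {
            if (logLines.get(i).contains(HEARTBEAT)) {
                previous = last;
                last = i;
            }
        }
        List<String> rules = new ArrayList<>();
        for (int i = previous + 1; i < logLines.size(); i++) {
            String line = logLines.get(i);
            int pos = line.indexOf(RULE_START);
            if (pos >= 0) {
                rules.add(line.substring(pos + RULE_START.length()).trim());
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "12:00:00 HEARTBEAT",
                "12:00:10 START window_opened",
                "12:01:00 HEARTBEAT",
                "12:01:20 START leaving_home",
                "12:01:25 START lights_off",
                "12:02:00 HEARTBEAT",          // last heartbeat before the engine went quiet
                "12:02:30 START coming_home");
        System.out.println(rulesAroundLastHeartbeat(log));
        // prints [leaving_home, lights_off, coming_home]
    }
}
```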

If someone has more ideas, you’re welcome… :slightly_smiling_face:

Before spending too much more time on debugging, I would recommend updating to the latest snapshot.

Assuming you do find a bug, you will have to update anyway to prove it is still a bug in the latest baseline.

I know of no ready way to look into OH internals short of setting up a Dev environment and running OH in a debugger. And even if you did that, these sorts of problems tend to be Heisenbugs (i.e. bugs that go away as soon as you start trying to look for them).

Agreed :slight_smile:

Just updated to the latest snapshot. I will wait 'til it’s stable and post again if this still occurs.
Thanks so far.

Just to update this thread: I tried some unstable daily snapshots of 2.2, but they are currently not good enough for production use (starting with a lot of graphics missing in HABPanel).

BUT I found something else: I have two rules that may run for a long time even without sleep commands, for example when someone leaves the house (and nobody is home anymore). OH should then switch the answering machine on, lights off, alert profile on, inform via Telegram about open windows, close the roller shutters of these windows, turn off the TVs and so on. All the if-then cases surely take some time. I split this rule into several smaller ones, and it feels like this runs better.

I have to rework another rule (first person coming home) the same way, and then I will see whether that is the solution.

Best, Bernd

OK, just want to report the latest experiences:

No matter what I try, the rule engine of openHAB stops working out of nowhere, sometimes twice a day.
I’ve read in several Google and Facebook groups that people restart their openHAB service via cronjob every night because of this (so the good thing is, I am definitely not alone with this problem).

I have installed a fresh 2.3 stable on a Windows system with much more RAM and CPU power, and things got even worse than on the Raspberry (the rule engine stops working every X hours).

There are no more sleeps in my rules, and I really took a lot of time to clean them up.

As I am not alone, I wonder if there is really no way to see what happens inside OH2?