[SOLVED] Strange rules behavior (including timers)

rossko57 · March 30, 2019, 10:47am

I’ll begin by saying other people use timers without problems.
You have quite complex cascading timers and interactions, the problems here are going to come from us not understanding the interactions properly.
Your ‘big picture’ is a bit too big to hold in the head at one time

I really would not go adding yet more variables here.
If you do have a need to “check up” on a timer, it does have methods to see if it is running or finished.

In your most recent example, if you set the lBramaWjazdowaTOpened to null outside of the timer code, and the timer code is running or will still run for whatever reason, then yes it will break the timer code.

When that kind of thing is likely to happen, you can code defensively. Test to see if’s null before using it.
Or you can change the logic - does that other rule need to set it null? If the timer is cancelled, does it really matter about the variable state?

Karol · March 30, 2019, 12:24pm

If main condition is “tSOBW === null”, then yes, I have to set it in the end no null (I think).

Nothing (non of the rules) is modyfing variables used inside each pair of gates rules (4 rules per gate, each pair have it’s own vars), so no outside modification are made.

As you see in last post, timer wasn’t cancel as it should:

 logInfo("KWI.debug.rules","R_DEBUG: tSOBG 9 (" + nSOBG + ") (" + tSOBG + ")")
            sendTelegramPhoto("Karol", "http://192.168.1.100/tmpfs/snap.jpg?usr=admin&pwd=C2H5CooH", "Brama garażowa pozostaje otwarta od " + lBramaGarazowaTPassed.intValue + " minut!" )

//somewere here timer should be canceled...

            sendTelegramPhoto("Ewelina", "http://192.168.1.100/tmpfs/snap.jpg?usr=admin&pwd=C2H5CooH", "Brama garażowa pozostaje otwarta od " + lBramaGarazowaTPassed.intValue + " minut!" )
            logInfo("KWI.debug.rules","R_DEBUG: tSOBG 10 (" + nSOBG + ") (" + tSOBG + ")")

Am I wright?

rossko57 · March 30, 2019, 1:19pm

I have not done any analysis to see why you think the timer was cancelled between those two lines; but sure, it could happen.

The important part is that timer.cancel does not work like that.
Cancel removes a scheduled future execution. If the timer code block is already running at that moment, it will not stop or abort it.
Indeed, in your case many of paths through the timer code block end up with reshedule. Then the “cancelled” timer will run again later. A zombie timer, if you like.

Or at least, it would unless you had also set the handle to null, which would cause the reshedule to fail in exactly the way we see also.

I think we need a “design review” here. What tasks do to in sequence, which ones depend on earlier actions or calculations, when and how a sequence should be aborted.
After that, we’ll work out how to do it.
Maybe start with this Gate one.

As an example, maybe you want gate opening to start turning a camera, wait for that to happen, then capture a photo.
Do we care if the gate closes before that finishes?
Do we check if camera turn completes, or just allow a fixed time?
There is more than one way to do things.

Karol · March 30, 2019, 3:46pm

Ok. I will decribe my logic, but did you see sequence decriptions on the right 4 post above?

There you can see when timer.cancel occures and that there is some strange delay between two entry near tags (only 2 function are executed between them).

rossko57 · March 30, 2019, 9:47pm

Alright, we talking about this?

After tSOBG 9 LOG, your action is sendTelegramPhoto()
I don’t know how long that takes to execute, but it sounds like a job that will take a while. It has to fetch the photo over http from somewhere else before sending it to telegram.

There is a lot of other stuff going on at the same time, so the photo send is competing for resources.

All the time this is going on - the timer code block is executing.
After tSOBG 9 log, but before tSOBG 10 log, we see tSOBG 15-16 etc.
These come from the “Gate Guard - Closing” rule (someone has opened and closed the gate within 12 seconds, not surprising).

This rule -

cancels the timer - which will have no effect while the timer code is running, doing the telegram.
sets the timer handle and the lBramaGarazowaTPassed variable both to null. That will cause an error the next time the still-running timer code tries to use one or the other of them.

So, yep, there’s the problem. Gate opening starts a long running sequence, gate closing tries to abort it and makes a mess.

We could add more complication to try and make a less messy abort process.

I think a more robust approach is to break up your sequence into simpler tasks and review what is needed, and what is not. Why would you want not to send photo just because the gate has closed? We can obviously avoid trouble if we don’t need to stop that part of the sequence.
Other tasks might really need to be stopped, that’s okay we can do that as smaller job.

Not do everything in one long-running job.

Karol · March 30, 2019, 10:22pm

In described scenario gate was initially open and stays that way from:

2019-03-30 09:43:07.959 [INFO ] [arthome.model.script.KWI.debug.rules] - REGRUN: Brama garazowa - Command
2019-03-30 09:43:07.961 [INFO ] [arthome.model.script.KWI.debug.rules] - REGRUN: Brama garazowa - Command
2019-03-30 09:43:08.175 [INFO ] [arthome.model.script.KWI.debug.rules] - R_DEBUG: FCTZZv3 3
2019-03-30 09:43:08.218 [INFO ] [arthome.model.script.KWI.debug.rules] - REGRUN: Ruch bramy garażowej - Otwarta

Then, after about 3m I closed gate at “2019-03-30 09:46:40.122”, what you can see in log. At this particular moment “Gate Guard - Opening” rule was somewhere about telegram lines. Execution of first photo send proces was triggered at “2019-03-30 09:46:34.285”, and the second photo action was triggered at “2019-03-30 09:46:48.543”.

Why in your opinion “Gate Guard - Closing” rule pauses execution of telegtram?
Why would the execution of the photo send functions of both photographs should take more than a few ms? After all, the execution of each command take place in the background, and their execution lasts several-dozen ms. Rule executes them one after another, not waiting them to end. - Am I right? That’s why we need to use timers?

Karol · March 30, 2019, 11:04pm

I think that in my configuration it’s highly probable that in such moments like this with telegram, I’m running out of some threads. Meybe rules threads or other. I don’t know yet, but lets see what was triggered at the same moment in above scenario (teoreticly - some I could miss, and some could just end earlier - don’t coexist with other from list in some moment):

Rule (1): Gate Guard - Open
Rule (2): TTS via Read_Trigger.sendCommand(ON) - Takes about 10-20s to complete
Command (1): CCTV_CamFront_Pos1
Command (2): sendTelegramPhoto
Rule (3): Gate Guard - Closed
Rule (4): Gate Closed
Rule (5): Gate Commands (its subject was not touched (it does not matter) but it is run at the same time).
Command (3): sendTelegramPhoto
and many others Commands (as I assume) and maybe few Rules…

So teoreticly, before 2’nd telegram there was 5 rules running and probalby over a dozen of commands Ii’m not sure).

Could this situation affect my OH to generate such delays, what in the meantime led to handler set to null and whereby exceptoin was generated?

It’s new issue for me so maybe you could share some knowledge with me:

How to monitor free:

threadpool:thingHandler=3
threadpool:discovery=3
threadpool:safeCall=3
threadpool:RuleEngine=15
and others(?)

To which category are included:

Rules,
Commands,
State updates,
Items refreshes,
etc.

Where and how I could trim those values?

My HW is:

Intel® Core™2 Duo CPU E6550 @ 2.33GHz (4vCPU),
4GB RAM.

System threads usage:

openhab> config:list "(service.pid=org.eclipse.smarthome.threadpool)"
> shell:threads --list |grep "discovery" |wc -l
> shell:threads --list |grep "safeCall" |wc -l
> shell:threads --list |grep "thingHandler" |wc -l
> shell:threads --list |grep "RuleEngine" |wc -l
----------------------------------------------------------------
Pid:            org.eclipse.smarthome.threadpool
BundleLocation: null
Properties:
   discovery = 5
   safeCall = 10
   service.pid = org.eclipse.smarthome.threadpool
   thingHandler = 5
       4      4
      10     10
       5      5
       0      0

rossko57 · March 30, 2019, 11:55pm

Making more threads etc. is a bandage to overcome long running threads. It might change your chances but it is not fixing the error source. Don’t do it, code differently, we’ll help.

I don’t think so. Each line completes before the next line can run.

It takes an understanding of what each line does.
Example
myLight.sendCommand(ON)
This only does what it says, which does lead to confusion.
It puts the command on the openHAB event bus, then ends.
It doesn’t wait for the light to turn on; it doesn’t even wait for the command to arrive at a binding.
So for our rule-writing purposes, it does look like it goes off and does the real work in the background, though the takes just a mS to execute.

Actions like
results = sendHttpGetRequest( "http://zzz.com" )
obviously take some time, and it will not move on to the next line until it’s fetched the remote data - which could take many seconds. You might use the results data in the next line, so it must wait.

sendTelegramPhoto("Ewelina", "http://192.168.1....
comes in the same category but worse.
It must fetch the photo from the (I guess) camera - how long does that take, is the camera busy sometimes? - then push the whole message to the remote telegram service - how often does that make you wait a few seconds, does it throttle your data speed? How many other TCP activities are going on on your machine, does it have to wait for network access?

You can’t do anything about all that, you can’t stop it hogging a thread while it’s happening. We can stop it causing errors, we can break threads into smaller chunks that gives everybody a turn.

Karol · March 31, 2019, 12:10am

Is there some way to run such “heavy” commands like sendTelegram or http get in backgroud?

PS After all, is there a way to change rules threads and safecalls in OH 2.4?

rossko57 · March 31, 2019, 12:15am

An example.
Requirement -
When gate opens, aim camera, allow 3 seconds, send a photo snapshot.
Don’t do this more frequently than every five minutes.


var Timer snapshotTimer = null

rule "snapshots"
when
   Item gate changed to OPEN
then
   camera.sendCommand("positioning")
         // we need to wait 3 secs for that
   if (snapshotTimer === null) {  // don't do it if timer already exists
      snapshotTimer = createTimer( now.plusSeconds(3) , [ |
         sendTelegramPhoto( blah blah )
            // now make an anonymous timer to unblock 
         createTimer( now.plusMinutes(5) , [ |
            snapshotTimer = null
          ] )
          // the snapshotTimer has finished but will stay non-null for 5 minutes
      ] )
   } else {
      logInfo("test", "Gate opened but telegram blocked")
   }
end

Karol · March 31, 2019, 12:29am

But thats another rule running. Too many small rules will consume all rule threads.
And what will happen if camera will not respond in 3s in that case? Will it bo or not the same effect as before?

rossko57 · March 31, 2019, 12:39am

But small rules don’t keep hold of the thread for very long. That little example finishes in a couple of milliseconds. The same work gets done in the end. The point is that you can’t break the timer by setting it to to null in some other rule, because the timer takes care of that by itself.

Well, that’s why I’m asking for your requirement. Is there any way for the camera to tell openHAB to that it’s ready? That would be great; instead of a timer we can trigger a telegram-sending rule from camera-move-complete event.

Karol · March 31, 2019, 12:43am

It could be problem because it’s possible that camera could be out of range sometimes for example.

Karol · March 31, 2019, 1:14am

To organize the acquired knowledge (a little):

Timer can be canceled ONLY when it’s in scheduled state? When it was already executed, lines of code inside him won’t stop running.
When (in my example) code from switch/case #2 clausule (“case 2 :”) is still running (truing to get/send image from camera - connection lost, waiting for timeout), canceling timer will do nothing?
If I will add additional veryfication like “if( !== null) { reschedule…” before reshedule I should avoid this exception…?

rossko57 · March 31, 2019, 12:07pm

mytimer.cancel just removes any future execution trigger. That’s all it does.

That’s right - see above

Yes. It’s obviously up to you how manage aborting one of your long sequences part way through, whatever tidying up may need to happen.

rlkoshak · April 1, 2019, 12:48am

In the background is still consuming a thread. You can either use a Rules thread, of which by default there is 5, it a timer thread which by default there are two.

Many small fast running rules as far far better than fewer long running rules.

I completely agree with rossko57. It would be far better for you to break this huge complicated rule into several smaller rules. These will be easier to understand, take less time to run (which is the key factor for avoiding running it if threads) and be easier to find errors.

I’ll also recommend again that switching to Expire binding fort Timers may help make the rules simpler and less error prone as well.

Ultimately, if you can convert this long and complicated set of rules into a series of simpler and faster running rules you will be far better off.

Karol · April 1, 2019, 8:47am

I’ll consider rewriting all to smaller Rules, but I have limited options, that all task included should run i propper sequence.

Afterr that I added aditional check that handle is not null just before each reschedule, problem dissapeared.

Once again… Thanks guys. It’s good to have such great support.

PS Enybody have knowledge how and where to trim threads setting? I mainly mean: org.eclipse.smarthome.threadpool:safeCall and org.eclipse.smarthome.threadpool:ruleEngine. Is it even possible in OH 2.4 (i read somewhere that there is some problem with lower uppercase variable name of Rules Threads, it looks like settings doesn’t take affect os system (changing rules threads = still max 5 threads running))?

rossko57 · April 1, 2019, 9:11am

Yes, of course.
But there is more than one way to do most tasks.

openHAB is event driven, the most “natural” way to do sequences is to kick off task A. When something responds “task A complete”, then you can trigger task B from that.
That’s not always possible, and you can turn to timers and delays as second-choice methods.

Karol · April 2, 2019, 8:03am

Enybody have knowledge how and where to trim threads setting? I mainly mean: org.eclipse.smarthome.threadpool:safeCall and org.eclipse.smarthome.threadpool:ruleEngine. Is it even possible in OH 2.4 (i read somewhere that there is some problem with lower uppercase variable name of Rules Threads, it looks like settings doesn’t take affect os system (changing rules threads = still max 5 threads running))?

rlkoshak · April 2, 2019, 3:00pm

This thread discusses all of this. Why have my Rules stopped running? Why Thread::sleep is a bad idea

The file is at the bottom of the first post but the rest of the thread has useful information as well.